Meknavin et al., 1998

Combining trigram and winnow in Thai OCR error correction

Document ID: 17369964603234630276
Author: Meknavin S; Kijsirikul B; Chotimongkol A; Nuttee C
Publication year: 1998
Publication venue: 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 2

Snippet

For languages that have no explicit word boundary such as Thai, Chinese and Japanese, correcting words in text is harder than in English because of additional ambiguities in locating error words. The traditional method handles this by hypothesizing that every …

Classifications

- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
        - G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
          - G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
            - G06F17/30634—Querying
              - G06F17/30657—Query processing
        - G06F17/20—Handling natural language data
          - G06F17/27—Automatic analysis, e.g. parsing
            - G06F17/2765—Recognition
            - G06F17/2705—Parsing
          - G06F17/28—Processing or translating of natural language
            - G06F17/2809—Data driven translation
              - G06F17/2827—Example based machine translation; Alignment
            - G06F17/2863—Processing of non-latin text
    - G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
      - G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
        - G06K9/62—Methods or arrangements for recognition using electronic means
          - G06K9/68—Methods or arrangements for recognition using electronic means using sequential comparisons of the image signals with a plurality of references in which the sequence of the image signals or the references is relevant, e.g. addressable memory
            - G06K9/6807—Dividing the references in groups prior to recognition, the recognition taking place in steps; Selecting relevant dictionaries
              - G06K9/6842—Dividing the references in groups prior to recognition, the recognition taking place in steps; Selecting relevant dictionaries according to the linguistic properties, e.g. English, German
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L15/00—Speech recognition
        - G10L15/08—Speech classification or search
          - G10L15/18—Speech classification or search using natural language modelling
            - G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
        - G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
          - G10L15/065—Adaptation
            - G10L15/07—Adaptation to the speaker
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
  - Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
    - Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
      - Y10S707/00—Data processing: database and file management or data structures
        - Y10S707/99931—Database or file accessing