Combining trigram and winnow in Thai OCR error correction
Meknavin et al., 1998
- Document ID
- 17369964603234630276
- Author
- Meknavin S
- Kijsirikul B
- Chotimongkol A
- Nuttee C
- Publication year
- 1998
- Publication venue
- 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 2
Snippet
For languages that have no explicit word boundary such as Thai, Chinese and Japanese, correcting words in text is harder than in English because of additional ambiguities in locating error words. The traditional method handles this by hypothesizing that every …
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30634—Querying
- G06F17/30657—Query processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2765—Recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2705—Parsing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2809—Data driven translation
- G06F17/2827—Example based machine translation; Alignment
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/68—Methods or arrangements for recognition using electronic means using sequential comparisons of the image signals with a plurality of references in which the sequence of the image signals or the references is relevant, e.g. addressable memory
- G06K9/6807—Dividing the references in groups prior to recognition, the recognition taking place in steps; Selecting relevant dictionaries
- G06K9/6842—Dividing the references in groups prior to recognition, the recognition taking place in steps; Selecting relevant dictionaries according to the linguistic properties, e.g. English, German
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2863—Processing of non-latin text
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99931—Database or file accessing
Similar Documents
Publication | Title |
---|---|
US20210390271A1 (en) | Neural machine translation systems |
US7536297B2 (en) | System and method for hybrid text mining for finding abbreviations and their definitions |
Peng et al. | Self-supervised Chinese word segmentation |
Tong et al. | A statistical approach to automatic OCR error correction in context |
US6233544B1 (en) | Method and apparatus for language translation |
EP1361522B1 (en) | A system for automatically annotating training data for a natural language understanding system |
US6816830B1 (en) | Finite state data structures with paths representing paired strings of tags and tag combinations |
CN112215013B (en) | A deep learning-based clone code semantic detection method |
Ekbal et al. | Language independent named entity recognition in indian languages |
EP1131812A2 (en) | Method and apparatus for improved part-of-speech tagging |
CN113657122B (en) | A Mongolian-Chinese machine translation method integrating pseudo-parallel corpus with transfer learning |
Theeramunkong et al. | Non-dictionary-based Thai word segmentation using decision trees |
Ernst-Gerlach et al. | Generating search term variants for text collections with historic spellings |
CN109684928A (en) | Chinese document recognition methods based on Internal retrieval |
Singh et al. | Review of real-word error detection and correction methods in text documents |
Gupta et al. | Designing and development of stemmer of Dogri using unsupervised learning |
Schaback et al. | Multi-level feature extraction for spelling correction |
JP5097802B2 (en) | Japanese automatic recommendation system and method using romaji conversion |
Meknavin et al. | Combining trigram and winnow in Thai OCR error correction |
CN112784227A (en) | Dictionary generating system and method based on password semantic structure |
CN114154503A (en) | A Sensitive Data Type Identification Method |
US5555345A (en) | Learning method of neural network |
US20230412633A1 (en) | Apparatus and Method for Predicting Malicious Domains |
CN118395987A (en) | BERT-based landslide hazard assessment named entity identification method of multi-neural network |
Zhuang et al. | An OCR post-processing approach based on multi-knowledge |