Liu et al., 2017 - Google Patents

A Bambara tonalization system for word sense disambiguation using differential coding, segmentation and edit operation filtering

Document ID: 17390814033771457807
Author: Liu L; Nouvel D
Publication year: 2017
Publication venue: Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Snippet

In many languages, such as Bambara or Arabic, tone markers (diacritics) may be written but are often omitted in practice. NLP applications are then confronted with ambiguities and the resulting difficulties when processing texts. To circumvent this problem, tonalization may be used, as …
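The ambiguity mentioned in the snippet can be illustrated with a minimal Python sketch. It is not the authors' system: it only shows how two distinct tone-marked forms collapse into one string once the diacritics are dropped, which is the input a tonalization step would have to disambiguate. The function name `strip_tone_marks` and the sample forms are hypothetical and chosen purely for illustration.

```python
import unicodedata


def strip_tone_marks(text: str) -> str:
    """Remove combining marks (Unicode category Mn) from text.

    Assumption: tone is written with combining diacritics, so NFD
    decomposition separates each mark from its base letter.
    """
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    # Recompose so the result is in the usual NFC form.
    return unicodedata.normalize("NFC", stripped)


# Hypothetical tone-marked forms, used only to show the collapse.
marked_forms = ["bá", "bà"]
unmarked_forms = {strip_tone_marks(w) for w in marked_forms}

# Both marked forms reduce to the same undiacritized string.
print(unmarked_forms)
```

Base letters such as ɛ and ɔ have no canonical decomposition, so only the tone marks are removed and the letters themselves are preserved.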

Classifications

- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2705—Parsing
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2765—Recognition
- G06F17/277—Lexical analysis, e.g. tokenisation, collocates
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/21—Text processing
- G06F17/22—Manipulating or registering by use of codes, e.g. in sequence of text characters
- G06F17/2217—Character encodings
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2872—Rule based translation
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2809—Data driven translation
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2795—Thesaurus; Synonyms
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/21—Text processing
- G06F17/211—Formatting, i.e. changing of presentation of document
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2863—Processing of non-latin text
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30781—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F17/30784—Information retrieval; Database structures therefor; File system structures therefor of video data using features automatically derived from the video content, e.g. descriptors, fingerprints, signatures, genre
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis

Similar Documents

Publication	Publication Date	Title
Brants	1999	Cascaded Markov models
JP2008504605A (en)	2008-02-14	System and method for spelling correction of non-Roman letters and words
Maamouri et al.	2006	Diacritization: A challenge to Arabic treebank annotation and parsing
Megyesi	2002	Shallow parsing with PoS taggers and linguistic features
Gamallo et al.	2018	Dependency parsing with finite state transducers and compression rules
Sawalha et al.	2014	Automatically generated, phonemic Arabic-IPA pronunciation tiers for the Boundary Annotated Qur’an Dataset for Machine Learning (version 2.0)
Baxi et al.	2024	Recent advancements in computational morphology: A comprehensive survey
Recski	2014	Hungarian noun phrase extraction using rule-based and hybrid methods
Alghamdi et al.	2010	Automatic restoration of Arabic diacritics: a simple, purely statistical approach
Elshafei	2006	Machine generation of Arabic diacritical marks
Liu et al.	2017	A Bambara tonalization system for word sense disambiguation using differential coding, segmentation and edit operation filtering
Vasiu et al.	2020	Enhancing tokenization by embedding Romanian language specific morphology
Ibrahim et al.	2014	Amharic sentence parsing using base phrase chunking
Siivola et al.	2007	Morfessor and VariKN machine learning tools for speech and language technology
Le et al.	2008	A maximum entropy approach to sentence boundary detection of Vietnamese texts
Adewole et al.	2017	Token validation in automatic corpus gathering for Yoruba language
JP3080066B2 (en)	2000-08-21	Character recognition device, method and storage medium
CN115577712A (en)	2023-01-06	Text error correction method and device
Özge et al.	2022	Diacritics correction in Turkish with context-aware sequence to sequence modeling
Ablimit et al.	2008	Partly supervised Uyghur morpheme segmentation
Hahn et al.	2009	Optimizing CRFs for SLU tasks in various languages using modified training criteria
Walentynowicz et al.	2019	Tagger for Polish computer mediated communication texts
Saraswathi et al.	2007	Comparison of performance of enhanced morpheme-based language model with different word-based language models for improving the performance of Tamil speech recognition system
CN115034239A (en)	2022-09-09	Hanyue neural machine translation method based on noise reduction prototype sequence
Kumar et al.	2021	Efficient text normalization via hybrid bi-directional LSTM