[go: up one dir, main page]

Lita et al., 2003 - Google Patents

Truecasing

Lita et al., 2003

View PDF
Document ID
10611822868773904723
Author
Lita L
Ittycheriah A
Roukos S
Kambhatla N
Publication year
Publication venue
Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics

External Links

Snippet

Truecasing is the process of restoring case information to badly-cased or noncased text. This paper explores truecasing issues and proposes a statistical, language modeling based truecaser which achieves an accuracy of∼ 98% on news articles. Task based evaluation …
Continue reading at aclanthology.org (PDF) (other versions)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20Handling natural language data
    • G06F17/27Automatic analysis, e.g. parsing
    • G06F17/2765Recognition
    • G06F17/277Lexical analysis, e.g. tokenisation, collocates
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/3061Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F17/30634Querying
    • G06F17/30657Query processing
    • G06F17/3066Query translation
    • G06F17/30669Translation of the query language, e.g. Chinese to English
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20Handling natural language data
    • G06F17/27Automatic analysis, e.g. parsing
    • G06F17/2765Recognition
    • G06F17/2775Phrasal analysis, e.g. finite state techniques, chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20Handling natural language data
    • G06F17/28Processing or translating of natural language
    • G06F17/2809Data driven translation
    • G06F17/2827Example based machine translation; Alignment
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20Handling natural language data
    • G06F17/27Automatic analysis, e.g. parsing
    • G06F17/2705Parsing
    • G06F17/271Syntactic parsing, e.g. based on context-free grammar [CFG], unification grammars
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20Handling natural language data
    • G06F17/27Automatic analysis, e.g. parsing
    • G06F17/2785Semantic analysis
    • G06F17/279Discourse representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20Handling natural language data
    • G06F17/28Processing or translating of natural language
    • G06F17/2872Rule based translation
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20Handling natural language data
    • G06F17/28Processing or translating of natural language
    • G06F17/2863Processing of non-latin text
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20Handling natural language data
    • G06F17/27Automatic analysis, e.g. parsing
    • G06F17/274Grammatical analysis; Style critique
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20Handling natural language data
    • G06F17/27Automatic analysis, e.g. parsing
    • G06F17/2755Morphological analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20Handling natural language data
    • G06F17/28Processing or translating of natural language
    • G06F17/289Use of machine translation, e.g. multi-lingual retrieval, server side translation for client devices, real-time translation
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L15/18Speech classification or search using natural language modelling
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L2015/088Word spotting

Similar Documents

Publication Publication Date Title
Lita et al. Truecasing
Banko et al. Mitigating the paucity-of-data problem: Exploring the effect of training corpus size on classifier performance for natural language processing
Habash et al. Arabic tokenization, part-of-speech tagging and morphological disambiguation in one fell swoop
Bikel et al. An algorithm that learns what's in a name
US20100332217A1 (en) Method for text improvement via linguistic abstractions
El Hadj et al. Arabic part-of-speech tagging using the sentence structure
Loftsson Correcting a POS-tagged corpus using three complementary methods
Bapat et al. A paradigm-based finite state morphological analyzer for Marathi
Mataoui et al. A new syntax-based aspect detection approach for sentiment analysis in Arabic reviews
Tennage et al. Transliteration and byte pair encoding to improve tamil to sinhala neural machine translation
Islam et al. A vocabulary-free multilingual neural tokenizer for end-to-end task learning
KR20100041019A (en) Document translation apparatus and its method
Comas et al. Sibyl, a factoid question-answering system for spoken documents
Palmer et al. Information extraction from broadcast news speech data
Brown et al. Capitalization recovery for text
Uchimoto et al. Morphological analysis of the Corpus of Spontaneous Japanese
Boulaknadel et al. Amazighe Named Entity Recognition using a A rule based approach
Tukur et al. Parts-of-speech tagging of Hausa-based texts using hidden Markov model
Brooke et al. Building a lexicon of formulaic language for language learners
Rimkutė et al. Morphological annotation of the Lithuanian corpus
Al-Arfaj et al. Arabic NLP tools for ontology construction from Arabic text: An overview
Olinsky et al. Non-standard word and homograph resolution for asian language text analysis.
Sankaravelayuthan et al. English to Tamil machine translation system using parallel corpus
Basnayake et al. Plagiarism detection in Sinhala language: A software approach
May et al. Surprise! What's in a Cebuano or Hindi Name?