Truecasing
Lita et al., 2003
- Document ID
- 10611822868773904723
- Author
- Lita L
- Ittycheriah A
- Roukos S
- Kambhatla N
- Publication year
- 2003
- Publication venue
- Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics
Snippet
Truecasing is the process of restoring case information to badly-cased or noncased text. This paper explores truecasing issues and proposes a statistical, language modeling based truecaser which achieves an accuracy of ∼98% on news articles. Task based evaluation …
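To make the task concrete, here is a minimal sketch of truecasing in Python. It is not the paper's method: the snippet describes a statistical, language-modeling-based truecaser, whereas this example assumes a much simpler unigram baseline that picks the most frequent cased form of each token. The function names (`train_truecaser`, `truecase`) and the three-sentence corpus are hypothetical and exist only for illustration.

```python
from collections import Counter, defaultdict


def train_truecaser(sentences):
    """Count how often each cased form appears for every lowercased token."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        # Skip the first token: sentence-initial capitalization is positional
        # and says little about the word's usual casing.
        for token in tokens[1:]:
            counts[token.lower()][token] += 1
    return counts


def truecase(text, counts):
    """Restore casing token by token using the most frequent observed form."""
    restored = []
    for i, token in enumerate(text.split()):
        forms = counts.get(token.lower())
        # Unknown tokens fall back to lowercase (an assumption of this sketch).
        best = forms.most_common(1)[0][0] if forms else token.lower()
        if i == 0:
            # Capitalize the first token of the input.
            best = best[0].upper() + best[1:]
        restored.append(best)
    return " ".join(restored)


# Toy training corpus (hypothetical data, for illustration only).
corpus = [
    "Analysts said IBM shares rose in New York",
    "The company moved to New York last year",
    "Shares of IBM fell on Monday",
]
model = train_truecaser(corpus)
print(truecase("shares of ibm rose in new york", model))
# prints "Shares of IBM rose in New York"
```

A unigram baseline like this ignores context, so it cannot distinguish a word that takes different casing in different contexts, for example "new" inside "New York" versus "a new car". Handling such cases is the kind of problem a language-model-based approach is meant to address.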
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2765—Recognition
- G06F17/277—Lexical analysis, e.g. tokenisation, collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30634—Querying
- G06F17/30657—Query processing
- G06F17/3066—Query translation
- G06F17/30669—Translation of the query language, e.g. Chinese to English
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2765—Recognition
- G06F17/2775—Phrasal analysis, e.g. finite state techniques, chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2809—Data driven translation
- G06F17/2827—Example based machine translation; Alignment
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2705—Parsing
- G06F17/271—Syntactic parsing, e.g. based on context-free grammar [CFG], unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2785—Semantic analysis
- G06F17/279—Discourse representation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2872—Rule based translation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2863—Processing of non-latin text
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/274—Grammatical analysis; Style critique
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2755—Morphological analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/289—Use of machine translation, e.g. multi-lingual retrieval, server side translation for client devices, real-time translation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L2015/088—Word spotting
Similar Documents
Publication | Title | Publication Date |
---|---|---|
Lita et al. | Truecasing | |
Banko et al. | Mitigating the paucity-of-data problem: Exploring the effect of training corpus size on classifier performance for natural language processing | |
Habash et al. | Arabic tokenization, part-of-speech tagging and morphological disambiguation in one fell swoop | |
Bikel et al. | An algorithm that learns what's in a name | |
US20100332217A1 (en) | Method for text improvement via linguistic abstractions | |
El Hadj et al. | Arabic part-of-speech tagging using the sentence structure | |
Loftsson | Correcting a POS-tagged corpus using three complementary methods | |
Bapat et al. | A paradigm-based finite state morphological analyzer for Marathi | |
Mataoui et al. | A new syntax-based aspect detection approach for sentiment analysis in Arabic reviews | |
Tennage et al. | Transliteration and byte pair encoding to improve Tamil to Sinhala neural machine translation | |
Islam et al. | A vocabulary-free multilingual neural tokenizer for end-to-end task learning | |
KR20100041019A (en) | Document translation apparatus and its method | |
Comas et al. | Sibyl, a factoid question-answering system for spoken documents | |
Palmer et al. | Information extraction from broadcast news speech data | |
Brown et al. | Capitalization recovery for text | |
Uchimoto et al. | Morphological analysis of the Corpus of Spontaneous Japanese | |
Boulaknadel et al. | Amazighe Named Entity Recognition using a rule based approach | |
Tukur et al. | Parts-of-speech tagging of Hausa-based texts using hidden Markov model | |
Brooke et al. | Building a lexicon of formulaic language for language learners | |
Rimkutė et al. | Morphological annotation of the Lithuanian corpus | |
Al-Arfaj et al. | Arabic NLP tools for ontology construction from Arabic text: An overview | |
Olinsky et al. | Non-standard word and homograph resolution for Asian language text analysis | |
Sankaravelayuthan et al. | English to Tamil machine translation system using parallel corpus | |
Basnayake et al. | Plagiarism detection in Sinhala language: A software approach | |
May et al. | Surprise! What's in a Cebuano or Hindi Name? |