Zhang et al., 2007 - Google Patents

Chinese segmentation with a word-based perceptron algorithm

Document ID
5834750668371237548
Author
Zhang Y
Clark S
Publication year
2007
Publication venue
Proceedings of the 45th annual meeting of the association of computational linguistics

Snippet

Standard approaches to Chinese word segmentation treat the problem as a tagging task, assigning labels to the characters in the sequence indicating whether the character marks a word boundary. Discriminatively trained models based on local character features are used …
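The tagging formulation described in the snippet can be illustrated with a minimal sketch. It assumes a two-label scheme in which `B` marks a character that begins a word and `I` marks any other character; the label names and the helper functions `words_to_tags` and `tags_to_words` are hypothetical and chosen only for illustration, not taken from the paper.

```python
from typing import List


def words_to_tags(words: List[str]) -> List[str]:
    """Label each character: 'B' if it begins a word, 'I' otherwise."""
    tags: List[str] = []
    for word in words:
        if not word:
            continue  # skip empty strings so tags stay aligned with characters
        tags.extend(["B"] + ["I"] * (len(word) - 1))
    return tags


def tags_to_words(chars: str, tags: List[str]) -> List[str]:
    """Rebuild the word sequence from characters and their boundary labels."""
    words: List[str] = []
    for ch, tag in zip(chars, tags):
        if tag == "B" or not words:
            words.append(ch)   # start a new word
        else:
            words[-1] += ch    # extend the current word
    return words


# Hypothetical example sentence, already segmented into words.
segmented = ["我们", "喜欢", "自然", "语言"]
tags = words_to_tags(segmented)
restored = tags_to_words("".join(segmented), tags)
```

Under this scheme a segmentation and its label sequence carry the same information, which is why a character tagger can serve as a segmenter.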

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20 Handling natural language data
    • G06F17/27 Automatic analysis, e.g. parsing
    • G06F17/2765 Recognition
    • G06F17/277 Lexical analysis, e.g. tokenisation, collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20 Handling natural language data
    • G06F17/28 Processing or translating of natural language
    • G06F17/2809 Data driven translation
    • G06F17/2827 Example based machine translation; Alignment
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20 Handling natural language data
    • G06F17/27 Automatic analysis, e.g. parsing
    • G06F17/2765 Recognition
    • G06F17/2775 Phrasal analysis, e.g. finite state techniques, chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20 Handling natural language data
    • G06F17/27 Automatic analysis, e.g. parsing
    • G06F17/2705 Parsing
    • G06F17/2715 Statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20 Handling natural language data
    • G06F17/21 Text processing
    • G06F17/22 Manipulating or registering by use of codes, e.g. in sequence of text characters
    • G06F17/2217 Character encodings
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20 Handling natural language data
    • G06F17/28 Processing or translating of natural language
    • G06F17/2863 Processing of non-latin text
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30 Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/3061 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F17/30634 Querying
    • G06F17/30657 Query processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20 Handling natural language data
    • G06F17/28 Processing or translating of natural language
    • G06F17/2872 Rule based translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20 Handling natural language data
    • G06F17/28 Processing or translating of natural language
    • G06F17/289 Use of machine translation, e.g. multi-lingual retrieval, server side translation for client devices, real-time translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30 Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30286 Information retrieval; Database structures therefor; File system structures therefor in structured data stores
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/62 Methods or arrangements for recognition using electronic means
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling

Similar Documents

Publication, Title
Zhang et al. Chinese segmentation with a word-based perceptron algorithm
Malmi et al. Encode, tag, realize: High-precision text editing
Faruqui et al. Morphological inflection generation using character sequence to sequence learning
Baniata et al. A neural machine translation model for Arabic dialects that utilises multitask learning (MTL)
Azmi et al. A survey of automatic Arabic diacritization techniques
US20060015317A1 (en) Morphological analyzer and analysis method
Na Conditional random fields for Korean morpheme segmentation and POS tagging
Zitouni et al. Arabic diacritic restoration approach based on maximum entropy models
von der Wense et al. The LMU system for the CoNLL-SIGMORPHON 2017 shared task on universal morphological reinflection
Shen et al. The role of context in neural morphological disambiguation
Susanto et al. Learning to capitalize with character-level recurrent neural networks: an empirical study
Shafi et al. UNLT: Urdu natural language toolkit
Paripremkul et al. Segmenting words in Thai language using Minimum text units and conditional random Field
Onyenwe et al. Toward an effective igbo part-of-speech tagger
Jibril et al. Anec: An amharic named entity corpus and transformer based recognizer
Wong et al. iSentenizer‐μ: Multilingual Sentence Boundary Detection Model
King et al. The iucl+ system: Word-level language identification via extended markov models
Nhat Minh A Feature-Rich Vietnamese Named Entity Recognition Model
Tran et al. Named entity recognition in Vietnamese documents
Garcia-Martinez et al. Addressing data sparsity for neural machine translation between morphologically rich languages
pal Singh et al. Naive Bayes classifier for word sense disambiguation of Punjabi language
Boroş et al. Large tagset labeling using feed forward neural networks. case study on romanian language
Yu et al. Identification of Code‐Switched Sentences and Words Using Language Modeling Approaches
Shivachi et al. Learning syllables using Conv-LSTM model for swahili word representation and part-of-speech tagging
Kaur et al. Roman to gurmukhi social media text normalization