Zhang et al., 2007 - Google Patents
Chinese segmentation with a word-based perceptron algorithm
Zhang et al., 2007
- Document ID
- 5834750668371237548
- Author
- Zhang Y
- Clark S
- Publication year
- 2007
- Publication venue
- Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics
Snippet
Standard approaches to Chinese word segmentation treat the problem as a tagging task, assigning labels to the characters in the sequence indicating whether the character marks a word boundary. Discriminatively trained models based on local character features are used …
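The tagging formulation described in the snippet can be illustrated with a minimal sketch. The two-label scheme (`B` for a character that begins a word, `I` for any other character), the function names, and the example sentence are assumptions made for illustration; the snippet does not name a label set, and this is not the word-based method of the paper's title.

```python
def words_to_tags(words):
    """Turn a segmented sentence into one boundary label per character.

    Assumed label scheme: 'B' = character begins a word, 'I' = character
    continues the current word.
    """
    tags = []
    for word in words:
        if not word:
            continue  # ignore empty strings rather than emit a stray label
        tags.extend(["B"] + ["I"] * (len(word) - 1))
    return tags


def tags_to_words(chars, tags):
    """Rebuild the word sequence from characters and their boundary labels."""
    words = []
    for ch, tag in zip(chars, tags):
        if tag == "B" or not words:
            words.append(ch)   # start a new word
        else:
            words[-1] += ch    # extend the current word
    return words


# Hypothetical example sentence, already segmented into three words.
example_words = ["我们", "喜欢", "北京"]
example_chars = "".join(example_words)
example_tags = words_to_tags(example_words)
recovered = tags_to_words(example_chars, example_tags)
```

Under this view a segmenter only has to predict one label per character, and the word sequence follows deterministically from the labels.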
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2765—Recognition
- G06F17/277—Lexical analysis, e.g. tokenisation, collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2809—Data driven translation
- G06F17/2827—Example based machine translation; Alignment
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2765—Recognition
- G06F17/2775—Phrasal analysis, e.g. finite state techniques, chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2705—Parsing
- G06F17/2715—Statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/21—Text processing
- G06F17/22—Manipulating or registering by use of codes, e.g. in sequence of text characters
- G06F17/2217—Character encodings
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2863—Processing of non-latin text
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30634—Querying
- G06F17/30657—Query processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2872—Rule based translation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/289—Use of machine translation, e.g. multi-lingual retrieval, server side translation for client devices, real-time translation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Zhang et al. | | Chinese segmentation with a word-based perceptron algorithm |
Malmi et al. | | Encode, tag, realize: High-precision text editing |
Faruqui et al. | | Morphological inflection generation using character sequence to sequence learning |
Baniata et al. | | A neural machine translation model for Arabic dialects that utilises multitask learning (MTL) |
Azmi et al. | | A survey of automatic Arabic diacritization techniques |
US20060015317A1 (en) | | Morphological analyzer and analysis method |
Na | | Conditional random fields for Korean morpheme segmentation and POS tagging |
Zitouni et al. | | Arabic diacritic restoration approach based on maximum entropy models |
von der Wense et al. | | The LMU system for the CoNLL-SIGMORPHON 2017 shared task on universal morphological reinflection |
Shen et al. | | The role of context in neural morphological disambiguation |
Susanto et al. | | Learning to capitalize with character-level recurrent neural networks: an empirical study |
Shafi et al. | | UNLT: Urdu natural language toolkit |
Paripremkul et al. | | Segmenting words in Thai language using minimum text units and conditional random field |
Onyenwe et al. | | Toward an effective Igbo part-of-speech tagger |
Jibril et al. | | Anec: An Amharic named entity corpus and transformer based recognizer |
Wong et al. | | iSentenizer‐μ: Multilingual Sentence Boundary Detection Model |
King et al. | | The iucl+ system: Word-level language identification via extended Markov models |
Nhat Minh | | A Feature-Rich Vietnamese Named Entity Recognition Model |
Tran et al. | | Named entity recognition in Vietnamese documents |
Garcia-Martinez et al. | | Addressing data sparsity for neural machine translation between morphologically rich languages |
pal Singh et al. | | Naive Bayes classifier for word sense disambiguation of Punjabi language |
Boroş et al. | | Large tagset labeling using feed forward neural networks. Case study on Romanian language |
Yu et al. | | Identification of Code‐Switched Sentences and Words Using Language Modeling Approaches |
Shivachi et al. | | Learning syllables using Conv-LSTM model for Swahili word representation and part-of-speech tagging |
Kaur et al. | | Roman to Gurmukhi social media text normalization |