Zhang et al., 2007 - Google Patents

Chinese segmentation with a word-based perceptron algorithm

Document ID: 5834750668371237548
Authors: Zhang Y; Clark S
Publication year: 2007
Publication venue: Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

Snippet

Standard approaches to Chinese word segmentation treat the problem as a tagging task, assigning labels to the characters in the sequence indicating whether the character marks a word boundary. Discriminatively trained models based on local character features are used …
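The character-tagging formulation mentioned in the snippet can be illustrated with a minimal sketch. The two-label scheme used here ("B" for a character that begins a word, "I" for any other character) and the function names `words_to_tags` and `tags_to_words` are assumptions chosen for illustration; they are not taken from the paper, which may use a different label set.

```python
def words_to_tags(words):
    """Map a segmented sentence to one boundary label per character.

    Assumed scheme: 'B' marks the first character of a word,
    'I' marks every following character of the same word.
    """
    tags = []
    for word in words:
        if not word:  # skip empty strings so no stray label is emitted
            continue
        tags.extend(["B"] + ["I"] * (len(word) - 1))
    return tags


def tags_to_words(chars, tags):
    """Rebuild the word sequence from characters and their labels."""
    words = []
    for ch, tag in zip(chars, tags):
        if tag == "B" or not words:
            words.append(ch)   # start a new word
        else:
            words[-1] += ch    # extend the current word
    return words


segmented = ["我们", "喜欢", "自然", "语言"]
chars = "".join(segmented)
tags = words_to_tags(segmented)
restored = tags_to_words(chars, tags)
```

Under this scheme a tagger only has to predict one label per character, and the segmentation is recovered by starting a new word at every "B".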

Classifications

- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2765—Recognition
- G06F17/277—Lexical analysis, e.g. tokenisation, collocates
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2809—Data driven translation
- G06F17/2827—Example based machine translation; Alignment
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2765—Recognition
- G06F17/2775—Phrasal analysis, e.g. finite state techniques, chunking
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2705—Parsing
- G06F17/2715—Statistical methods
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/21—Text processing
- G06F17/22—Manipulating or registering by use of codes, e.g. in sequence of text characters
- G06F17/2217—Character encodings
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2863—Processing of non-latin text
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30634—Querying
- G06F17/30657—Query processing
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2872—Rule based translation
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/289—Use of machine translation, e.g. multi-lingual retrieval, server side translation for client devices, real-time translation
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling

Similar Documents

Publication	Publication Date	Title
Zhang et al.	2007	Chinese segmentation with a word-based perceptron algorithm
Malmi et al.	2019	Encode, tag, realize: High-precision text editing
Faruqui et al.	2015	Morphological inflection generation using character sequence to sequence learning
Baniata et al.	2018	A neural machine translation model for Arabic dialects that utilises multitask learning (MTL)
Azmi et al.	2015	A survey of automatic Arabic diacritization techniques
US20060015317A1 (en)	2006-01-19	Morphological analyzer and analysis method
Na	2015	Conditional random fields for Korean morpheme segmentation and POS tagging
Zitouni et al.	2009	Arabic diacritic restoration approach based on maximum entropy models
von der Wense et al.	2017	The LMU system for the CoNLL-SIGMORPHON 2017 shared task on universal morphological reinflection
Shen et al.	2016	The role of context in neural morphological disambiguation
Susanto et al.	2016	Learning to capitalize with character-level recurrent neural networks: an empirical study
Shafi et al.	2023	UNLT: Urdu natural language toolkit
Paripremkul et al.	2021	Segmenting words in Thai language using Minimum text units and conditional random Field
Onyenwe et al.	2019	Toward an effective igbo part-of-speech tagger
Jibril et al.	2023	Anec: An amharic named entity corpus and transformer based recognizer
Wong et al.	2014	iSentenizer‐μ: Multilingual Sentence Boundary Detection Model
King et al.	2014	The iucl+ system: Word-level language identification via extended markov models
Nhat Minh	2022	A Feature-Rich Vietnamese Named Entity Recognition Model
Tran et al.	2007	Named entity recognition in Vietnamese documents
Garcia-Martinez et al.	2020	Addressing data sparsity for neural machine translation between morphologically rich languages
pal Singh et al.	2018	Naive Bayes classifier for word sense disambiguation of Punjabi language
Boroş et al.	2013	Large tagset labeling using feed forward neural networks. case study on romanian language
Yu et al.	2013	Identification of Code‐Switched Sentences and Words Using Language Modeling Approaches
Shivachi et al.	2021	Learning syllables using Conv-LSTM model for swahili word representation and part-of-speech tagging
Kaur et al.	2020	Roman to gurmukhi social media text normalization