Natarajan et al., 2007 - Google Patents
Finding structure in noisy text: topic classification and unsupervised clusteringNatarajan et al., 2007
View PDF- Document ID
- 6464854012431830434
- Author
- Natarajan P
- Prasad R
- Subramanian K
- Saleem S
- Choi F
- Schwartz R
- Publication year
- Publication venue
- International Journal of Document Analysis and Recognition (IJDAR)
External Links
Snippet
This paper addresses two types of classification of noisy, unstructured text such as newsgroup messages:(1) spotting messages containing topics of interest, and (2) automatic conceptual organization of messages without prior knowledge of topics of interest. In …
- 238000000034 method 0 abstract description 23
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30705—Clustering or classification
- G06F17/3071—Clustering or classification including class or cluster creation or modification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30705—Clustering or classification
- G06F17/30707—Clustering or classification into predefined classes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30634—Querying
- G06F17/30657—Query processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2765—Recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2705—Parsing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6267—Classification techniques
- G06K9/6279—Classification techniques relating to the number of classes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
- G06N99/005—Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06Q—DATA PROCESSING SYSTEMS OR METHODS, SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computer systems utilising knowledge based models
- G06N5/02—Knowledge representation
- G06N5/022—Knowledge engineering, knowledge acquisition
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99931—Database or file accessing
- Y10S707/99933—Query processing, i.e. searching
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Papagiannopoulou et al. | Local word vectors guiding keyphrase extraction | |
Liu et al. | Story forest: Extracting events and telling stories from breaking news | |
Grishman | Information extraction | |
Hoffart et al. | Discovering emerging entities with ambiguous names | |
Marquez et al. | A machine learning approach to POS tagging | |
CN107844533A (en) | A kind of intelligent Answer System and analysis method | |
US10417338B2 (en) | External resource identification | |
Tamboli et al. | Authorship analysis and identification techniques: A review | |
John et al. | A supervised keyphrase extraction system | |
CN111563361B (en) | Text label extraction method and device and storage medium | |
Beck et al. | From automatic keyword detection to ontology-based topic modeling | |
US10552459B2 (en) | Classifying a document using patterns | |
Shetty et al. | Auto text summarization with categorization and sentiment analysis | |
Roy et al. | An unsupervised normalization algorithm for noisy text: a case study for information retrieval and stance detection | |
Natarajan et al. | Finding structure in noisy text: topic classification and unsupervised clustering | |
Krishnamani et al. | Forum summarization using topic models and content-metadata sensitive clustering | |
Agirre et al. | Ubc-as: A graph based unsupervised system for induction and classification | |
HaCohen-Kerner et al. | Cross-domain authorship attribution: Author identification using char sequences, word unigrams, and POS-tags features | |
Trieschnigg et al. | Hierarchical topic detection in large digital news archives | |
Arumugam et al. | Similitude Based Segment Graph Construction and Segment Ranking for Automatic Summarization of Text Document | |
Gheni et al. | Suggesting new words to extract keywords from title and abstract | |
Zhukova et al. | ANEA: Automated (named) entity annotation for German domain-specific texts | |
Li et al. | Probabilistic topic model based approach for detecting bursty events from social media data | |
Bradshaw et al. | Evaluating Better Document Representation in Clustering with Varying Complexity. | |
Ding et al. | VSEC-LDA: boosting topic modeling with embedded vocabulary selection |