Uemura et al., 2008 - Google Patents
Unsupervised spam detection by document complexity estimation
Uemura et al., 2008
- Document ID
- 11862483610358179474
- Author
- Uemura T
- Ikeda D
- Arimura H
- Publication year
- 2008
- Publication venue
- International Conference on Discovery Science
Snippet
In this paper, we study content-based spam detection for a specific type of spam, called blog and bulletin board spam. We develop an efficient unsupervised algorithm DCE that detects spam documents from a mixture of spam and non-spam documents using an entropy …
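The snippet above is truncated before it defines the entropy measure DCE uses, so the sketch below is not the authors' algorithm. It is a hypothetical illustration of the general idea of scoring documents by an entropy-style complexity and flagging the unusually simple ones without labels. The choices here (character n-grams, Shannon entropy in bits, and a cutoff at `ratio` times the corpus median) are all assumptions made for illustration, as are the helper names `ngram_entropy` and `flag_low_complexity`.

```python
import math
from collections import Counter
from statistics import median


def ngram_entropy(text: str, n: int = 3) -> float:
    """Shannon entropy (in bits) of the empirical character n-gram distribution.

    Hypothetical complexity score: highly repetitive text yields few distinct
    n-grams and therefore low entropy.
    """
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    if not grams:
        return 0.0  # too short to contain a single n-gram
    counts = Counter(grams)
    total = len(grams)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def flag_low_complexity(docs: list[str], n: int = 3, ratio: float = 0.5) -> list[bool]:
    """Flag documents whose score falls below ratio * (corpus median score).

    Unsupervised in the sense that no labels are used: the threshold is derived
    from the mixture of documents itself. The rule is an assumption, not DCE.
    """
    scores = [ngram_entropy(d, n) for d in docs]
    cutoff = ratio * median(scores)
    return [s < cutoff for s in scores]


if __name__ == "__main__":
    corpus = [
        "abababababababab",  # mechanically repetitive text
        "the quick brown fox jumps over the lazy dog",
        "pack my box with five dozen liquor jugs",
    ]
    print(flag_low_complexity(corpus))
```

The median-based cutoff is used only because it needs no training data; real spam copies are usually edited variants rather than pure repetitions, which is why a per-document score like this one is at best a crude stand-in for the measure the paper develops.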
Classifications
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
        - G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
          - G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
            - G06F17/30705—Clustering or classification
              - G06F17/3071—Clustering or classification including class or cluster creation or modification
              - G06F17/30707—Clustering or classification into predefined classes
            - G06F17/30613—Indexing
              - G06F17/30619—Indexing indexing structures
          - G06F17/30943—Information retrieval; Database structures therefor; File system structures therefor details of database functions independent of the retrieved data type
            - G06F17/30946—Information retrieval; Database structures therefor; File system structures therefor details of database functions independent of the retrieved data type indexing structures
          - G06F17/30861—Retrieval from the Internet, e.g. browsers
        - G06F17/20—Handling natural language data
          - G06F17/27—Automatic analysis, e.g. parsing
            - G06F17/2705—Parsing
          - G06F17/21—Text processing
            - G06F17/22—Manipulating or registering by use of codes, e.g. in sequence of text characters
    - G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
      - G06N99/00—Subject matter not provided for in other groups of this subclass
        - G06N99/005—Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
      - G06N7/00—Computer systems based on specific mathematical models
        - G06N7/005—Probabilistic networks
      - G06N5/00—Computer systems utilising knowledge based models
    - G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
      - G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
        - G06K9/62—Methods or arrangements for recognition using electronic means
    - G06Q—DATA PROCESSING SYSTEMS OR METHODS, SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES, NOT OTHERWISE PROVIDED FOR
      - G06Q10/00—Administration; Management
Similar Documents
Publication | Title |
---|---|
Fournier‐Viger et al. | A survey of pattern mining in dynamic graphs |
US8386574B2 (en) | Multi-modality classification for one-class classification in social networks |
US8725666B2 (en) | Information extraction system |
CN103874994B (en) | Method and apparatus for automatically summarizing the content of an electronic document |
Hajishirzi et al. | Adaptive near-duplicate detection via similarity learning |
US9268749B2 (en) | Incremental computation of repeats |
Liu et al. | Who is .com? Learning to parse WHOIS records |
US8612364B2 (en) | Method for categorizing linked documents by co-trained label expansion |
Kantchelian et al. | Robust detection of comment spam using entropy rate |
Pérez-Díaz et al. | Rough sets for spam filtering: Selecting appropriate decision rules for boundary e-mail classification |
US20040111438A1 (en) | Method and apparatus for populating a predefined concept hierarchy or other hierarchical set of classified data items by minimizing system entrophy |
Somesha et al. | Classification of phishing email using word embedding and machine learning techniques |
Tsai et al. | D2S: document-to-sentence framework for novelty detection |
Choudhury et al. | How difficult is it to develop a perfect spell-checker? A cross-linguistic analysis through complex network approach |
Pandya et al. | Mated: metadata-assisted twitter event detection system |
Uemura et al. | Unsupervised spam detection by document complexity estimation |
Giannella | An improved algorithm for unsupervised decomposition of a multi‐author document |
Kumar et al. | Sentiment analysis using social and topic context for suicide prediction |
Burnside et al. | One Day in Twitter: Topic Detection Via Joint Complexity |
Prilepok et al. | Spam detection using data compression and signatures |
Narisawa et al. | Detecting blog spams using the vocabulary size of all substrings in their copies |
Santos et al. | Spam filtering through anomaly detection |
de Vel et al. | E-mail authorship attribution for computer forensics |
Rallapalli et al. | Sense: Semantically enhanced node sequence embedding |
Chae et al. | Incremental feature selection for efficient classification of dynamic graph bags |