WO2009145605A3 - A document categorization system - Google Patents
A document categorization system
- Publication number
- WO2009145605A3 (PCT/MY2009/000065)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- document
- ontology
- objects
- text
- attributes
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/353—Clustering; Classification into predefined classes
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Animal Behavior & Ethology (AREA)
- Computational Linguistics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Machine Translation (AREA)
Abstract
The document categorization system is a tool for matching objects in an ontology (101) against text documents. A relationship-based object document matcher (100) parses the text document and returns the name of the object in the ontology (101) to which the document is related. The attributes of each object comprise data objects and their relationships, which are either object or data type relationships. The relationship-based object document matcher (100) matches the attributes of all the objects in the ontology against the contents of the text document. Each time an object's attributes are matched in the text, the document matching probability index increases. For the document with the highest document matching probability index, a new relationship is created in the ontology (101) between the document and the matched object.
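The matching step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `OntologyObject` class, the `matching_index` and `categorize` functions, the `"categorizes"` relationship label, and the choice to compute the index as the fraction of an object's attributes found in the text are all assumptions made for illustration, since the abstract does not specify how the index is calculated.

```python
import re
from dataclasses import dataclass, field


@dataclass
class OntologyObject:
    """Hypothetical stand-in for an object in the ontology (101)."""
    name: str
    # Attribute values (data type values or names of related objects).
    attributes: list[str]
    # Relationships added by the matcher, e.g. ("categorizes", "doc-1").
    relationships: list[tuple[str, str]] = field(default_factory=list)


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def matching_index(obj: OntologyObject, text: str) -> float:
    """Assumed index: fraction of the object's attributes found in the text.

    An attribute counts as matched when all of its words occur somewhere
    in the document (adjacency is not checked in this sketch).
    """
    if not obj.attributes:
        return 0.0
    words = tokenize(text)
    hits = sum(1 for attr in obj.attributes if tokenize(attr) <= words)
    return hits / len(obj.attributes)


def categorize(doc_id: str, text: str, ontology: list[OntologyObject]):
    """Link the document to the best-matching object and return its name.

    Returns None when no attribute of any object is found in the text.
    Ties are resolved in favour of the object listed first.
    """
    scores = {obj.name: matching_index(obj, text) for obj in ontology}
    best = max(ontology, key=lambda o: scores[o.name], default=None)
    if best is None or scores[best.name] == 0.0:
        return None
    # New relationship between the matched object and the document.
    best.relationships.append(("categorizes", doc_id))
    return best.name


if __name__ == "__main__":
    ontology = [
        OntologyObject("Jaguar (car)", ["Coventry", "sedan", "engine"]),
        OntologyObject("Jaguar (animal)", ["rainforest", "predator", "big cat"]),
    ]
    text = "The big cat is an apex predator of the rainforest."
    print(categorize("doc-1", text, ontology))
```

A set-based word match keeps the sketch short; a real matcher would more likely need phrase matching, stemming, and a weighting of object versus data type relationships, none of which the abstract describes.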
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| MYPI20081851 | 2008-05-30 | | |
| MYPI20081851A MY158574A (en) | 2008-05-30 | 2008-05-30 | A document categorization system |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| WO2009145605A2 (en) | 2009-12-03 |
| WO2009145605A3 (en) | 2010-02-25 |
Family
ID=41377819
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/MY2009/000065 Ceased WO2009145605A2 (en) | 2008-05-30 | 2009-05-29 | A document categorization system |
Country Status (2)
| Country | Link |
|---|---|
| MY (1) | MY158574A (en) |
| WO (1) | WO2009145605A2 (en) |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20040088157A1 (en) * | 2002-10-30 | 2004-05-06 | Motorola, Inc. | Method for characterizing/classifying a document |
| US7213205B1 (en) * | 1999-06-04 | 2007-05-01 | Seiko Epson Corporation | Document categorizing method, document categorizing apparatus, and storage medium on which a document categorization program is stored |
2008
- 2008-05-30: MY application MYPI20081851A, published as MY158574A (status unknown)
2009
- 2009-05-29: WO application PCT/MY2009/000065, published as WO2009145605A2 (not active, ceased)
Also Published As
| Publication number | Publication date |
|---|---|
| MY158574A (en) | 2016-10-14 |
| WO2009145605A2 (en) | 2009-12-03 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| Read | A pruned problem transformation method for multi-label classification | |
| WO2011159516A3 (en) | Semantic content searching | |
| WO2009003072A3 (en) | Integrated platform for user input of digital ink | |
| WO2012068238A3 (en) | Shipping system and method with taxonomic tariff harmonization | |
| WO2005076161A3 (en) | System and method for comparative analysis of textual documents | |
| WO2007103352A3 (en) | Systems and methods for document annotation | |
| WO2006008733A3 (en) | A method for determining near duplicate data objects | |
| WO2010105216A3 (en) | System and method for automatic semantic labeling of natural language texts | |
| WO2008049023A9 (en) | Method and system for offline indexing of content and classifying stored data | |
| WO2009023344A3 (en) | Managing status of search index generation in handheld book reader device | |
| WO2008027503A3 (en) | Semantic search engine | |
| WO2008031062A3 (en) | System and method for building and retriving a full text index | |
| WO2007008492A3 (en) | Processing collocation mistakes in documents | |
| WO2005038668A3 (en) | Computer implemented methods and systems for representing multiple schemas and transferring data between different data schemas within a contextual ontology | |
| WO2009006030A3 (en) | A compliance management system | |
| GB0823706D0 (en) | Fast data entry | |
| WO2011119410A3 (en) | A system and methods thereof for mining web based user generated content for creation of term taxonomies | |
| WO2009145605A3 (en) | A document categorization system | |
| Chase Lipton et al. | Thresholding classifiers to maximize F1 score | |
| WO2008126262A1 (en) | Content explanation apparatus and method | |
| Casella et al. | Declining near-infrared flux from the black-hole candidate MAXI J1820+070 (ASASSN-18ey) in transition | |
| Elberrichi | Text mining using n-grams | |
| WO2008114316A1 (en) | Electronic document management device and electronic document management program | |
| Yanagimoto et al. | Information filtering using Kullback-Leibler divergence | |
| Mizuno et al. | Informing a robot of object location with both hand-gesture and verbal cues |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 09755068; Country of ref document: EP; Kind code of ref document: A2 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 09755068; Country of ref document: EP; Kind code of ref document: A2 |