Héroux et al., 2007 - Google Patents
Automatic ground-truth generation for document image analysis and understandingHéroux et al., 2007
View PDF- Document ID
- 10099098174959031542
- Author
- Héroux P
- Barbu E
- Adam S
- Trupin E
- Publication year
- Publication venue
- Ninth International Conference on Document Analysis and Recognition (ICDAR 2007)
External Links
Snippet
Performance evaluation for document image analysis and understanding is a recurring problem. Many ground-truthed document image databases are now used to evaluate algorithms, but these databases are less useful for the design of a complete system in a …
- 238000010191 image analysis 0 title abstract description 6
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/21—Text processing
- G06F17/22—Manipulating or registering by use of codes, e.g. in sequence of text characters
- G06F17/2247—Tree structured documents; Markup, e.g. Standard Generalized Markup Language [SGML], Document Type Definition [DTD]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/21—Text processing
- G06F17/22—Manipulating or registering by use of codes, e.g. in sequence of text characters
- G06F17/2205—Storage facilities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/21—Text processing
- G06F17/211—Formatting, i.e. changing of presentation of document
- G06F17/212—Display of layout of document; Preview
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/21—Text processing
- G06F17/211—Formatting, i.e. changing of presentation of document
- G06F17/218—Tagging; Marking up ; Designating a block; Setting of attributes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/21—Text processing
- G06F17/22—Manipulating or registering by use of codes, e.g. in sequence of text characters
- G06F17/2264—Transformation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2705—Parsing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/21—Text processing
- G06F17/24—Editing, e.g. insert/delete
- G06F17/248—Templates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30908—Information retrieval; Database structures therefor; File system structures therefor of semistructured data, the undelying structure being taken into account, e.g. mark-up language structure data
- G06F17/30914—Mapping or conversion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30861—Retrieval from the Internet, e.g. browsers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30011—Document retrieval systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00442—Document analysis and understanding; Document recognition
- G06K9/00463—Document analysis by extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics, paragraphs, words or letters
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T11/00—2D [Two Dimensional] image generation
- G06T11/60—Editing figures and text; Combining figures or text
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06Q—DATA PROCESSING SYSTEMS OR METHODS, SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US11210456B2 (en) | Method relating to preparation of a report | |
| US7305612B2 (en) | Systems and methods for automatic form segmentation for raster-based passive electronic documents | |
| Hurst et al. | Review of automatic document formatting | |
| US6466694B2 (en) | Document image processing device and method thereof | |
| Altamura et al. | Transforming paper documents into XML format with WISDOM++ | |
| US8166037B2 (en) | Semantic reconstruction | |
| US7831908B2 (en) | Method and apparatus for layout of text and image documents | |
| Hadjar et al. | Xed: a new tool for extracting hidden structures from electronic documents | |
| US20020029232A1 (en) | System for sorting document images by shape comparisons among corresponding layout components | |
| EP1696337A2 (en) | Document processing apparatus, document processing method and computer program | |
| US20070171473A1 (en) | Information processing apparatus, Information processing method, and computer program product | |
| US20030101203A1 (en) | Function-based object model for use in website adaptation | |
| Meunier | Optimized XY-cut for determining a page reading order | |
| Peels et al. | Document architecture and text formatting | |
| Héroux et al. | Automatic ground-truth generation for document image analysis and understanding | |
| Gemelli et al. | Datasets and annotations for layout analysis of scientific articles | |
| JP2009110500A (en) | Document processing apparatus, document processing method, and document processing apparatus program | |
| US6678699B2 (en) | Visual indexing of displayable digital documents | |
| Myka et al. | Automatic hypertext conversion of paper document collections | |
| CN119807447A (en) | A file retrieval method, system, product and readable storage medium | |
| JP4787955B2 (en) | Method, system, and program for extracting keywords from target document | |
| CN117634447A (en) | Method and system for automatically generating fine-grained annotated document layout analysis data sets | |
| Chanod et al. | From legacy documents to xml: A conversion framework | |
| Ingram et al. | Building datasets to support information extraction and structure parsing from electronic theses and dissertations | |
| KR20020061443A (en) | Method and system for data gathering, processing and presentation using computer network |