PE20161166A1 - DOCUMENT CHARACTERIZATION METHOD - Google Patents
DOCUMENT CHARACTERIZATION METHOD
- Publication number
- PE20161166A1
- Authority
- PE
- Peru
- Prior art keywords
- document
- class
- multiclass
- text
- contents
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/10—Office automation; Time management
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/353—Clustering; Classification into predefined classes
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/355—Creation or modification of classes or clusters
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/12—Computing arrangements based on biological models using genetic models
- G06N3/126—Evolutionary algorithms, e.g. genetic algorithms or genetic programming
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Business, Economics & Management (AREA)
- Data Mining & Analysis (AREA)
- Entrepreneurship & Innovation (AREA)
- Human Resources & Organizations (AREA)
- Strategic Management (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- General Business, Economics & Management (AREA)
- Economics (AREA)
- Quality & Reliability (AREA)
- Operations Research (AREA)
- Marketing (AREA)
- Multimedia (AREA)
- Databases & Information Systems (AREA)
- Tourism & Hospitality (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Health & Medical Sciences (AREA)
- Character Discrimination (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention describes a method for automatically characterizing a document. Given an unstructured input document, the method automatically determines: one or more document classes or categories to which the contents relate; a list of the names of natural or legal persons found in the text; other relevant information in the text; and the document's date of issuance, which depends on the class or multiclass assigned to the document. This method is faster, more complete, and more accurate than manual characterization or description performed by legal technical staff. The method comprises the following steps:
- receiving a digital document from a user connected to a Web application from their own computer, and processing the document with an Optical Character Recognition (OCR) application;
- running a rule-based automatic document characterization process within the application on the text of the received document, and assigning the document a class or multiclass;
- recognizing the names of people and organizations in the document's contents, based on its text;
- recognizing and extracting relevant information according to the class or multiclass definition and the set of rules defined for that class or multiclass;
- recognizing relevant dates in the document and assigning each one a score using the rules established for the previously assigned class or multiclass; and
- reviewing the document's contents with detection means that recognize text patterns, such as combinations of keywords, synonyms, or equivalent terms.
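The abstract does not disclose the format of its per-class rules, so the following is only a minimal Python sketch of two of the rule-based steps: assigning one or more classes from keyword/synonym patterns, and scoring dates according to the assigned class. Everything in it is a hypothetical illustration, not the patent's method: the `ClassRule` structure, the `assign_classes` and `score_dates` helpers, the dd/mm/yyyy date pattern, the 40-character cue window, and the scoring weights. OCR and person/organization name recognition are left out.

```python
import re
from dataclasses import dataclass
from datetime import date


@dataclass
class ClassRule:
    """Hypothetical rule set for one document class (not the patent's format)."""
    name: str
    # Each tuple groups a keyword with its synonyms/equivalent terms; the group
    # matches if any one of its terms occurs in the text.
    keyword_groups: list[tuple[str, ...]]
    # Number of groups that must match for the class to be assigned.
    min_groups: int
    # Hypothetical cue words that, shortly before a date, mark it as relevant.
    date_cues: tuple[str, ...] = ()


def _contains(text: str, term: str) -> bool:
    """Whole-word, case-insensitive search for a term."""
    return re.search(rf"\b{re.escape(term)}\b", text, re.IGNORECASE) is not None


def assign_classes(text: str, rules: list[ClassRule]) -> list[str]:
    """Return every class whose rule is satisfied; more than one means multiclass."""
    assigned = []
    for rule in rules:
        matched = sum(
            any(_contains(text, term) for term in group) for group in rule.keyword_groups
        )
        if matched >= rule.min_groups:
            assigned.append(rule.name)
    return assigned


# Assumed date format: dd/mm/yyyy only.
DATE_RE = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")


def score_dates(text: str, rule: ClassRule, window: int = 40) -> list[tuple[date, int]]:
    """Score each valid date: 1 point, plus 2 per class cue word in the
    `window` characters before it. Highest score first; ties keep text order."""
    scored = []
    for m in DATE_RE.finditer(text):
        day, month, year = (int(g) for g in m.groups())
        try:
            found = date(year, month, day)
        except ValueError:
            continue  # skip impossible dates such as 31/02/2015
        context = text[max(0, m.start() - window):m.start()]
        score = 1 + 2 * sum(_contains(context, cue) for cue in rule.date_cues)
        scored.append((found, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    rules = [
        ClassRule("lease", [("lease", "rental agreement"), ("tenant", "lessee"),
                            ("landlord", "lessor")], min_groups=2, date_cues=("signed",)),
        ClassRule("power_of_attorney", [("power of attorney",), ("agent", "proxy")],
                  min_groups=2, date_cues=("granted",)),
    ]
    text = ("Rental agreement between the lessor and the tenant. "
            "Received on 02/01/2015. Signed on 15/01/2015.")
    print(assign_classes(text, rules))         # ['lease']
    print(score_dates(text, rules[0])[0])      # (datetime.date(2015, 1, 15), 3)
```

Requiring only `min_groups` of the keyword groups, rather than every term, lets a document phrased with a synonym still match its class. The cue-word bonus is just one simple way to make the same date score differently under different classes; the abstract states only that scores depend on the class rules.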
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201461941002P | 2014-02-18 | 2014-02-18 |
Publications (1)
Publication Number | Publication Date |
---|---|
PE20161166A1 (en) | 2016-10-26 |
Family ID: 53877689
Family Applications (1)
Application Number | Publication Number | Title | Priority Date | Filing Date |
---|---|---|---|---|
PE2016001498A | PE20161166A1 (en) | DOCUMENT CHARACTERIZATION METHOD | 2014-02-18 | 2015-02-18 |
Country Status (3)
Country | Link |
---|---|
CL (1) | CL2016002090A1 (en) |
PE (1) | PE20161166A1 (en) |
WO (1) | WO2015125088A1 (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11017221B2 (en) | 2018-07-01 | 2021-05-25 | International Business Machines Corporation | Classifying digital documents in multi-document transactions based on embedded dates |
US11003889B2 (en) | 2018-10-22 | 2021-05-11 | International Business Machines Corporation | Classifying digital documents in multi-document transactions based on signatory role analysis |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6957384B2 (en) * | 2000-12-27 | 2005-10-18 | Tractmanager, Llc | Document management system |
US7284191B2 (en) * | 2001-08-13 | 2007-10-16 | Xerox Corporation | Meta-document management system with document identifiers |
JP2007233913A (en) * | 2006-03-03 | 2007-09-13 | Fuji Xerox Co Ltd | Image processing apparatus and program |
US8520979B2 (en) * | 2008-08-19 | 2013-08-27 | Digimarc Corporation | Methods and systems for content processing |
2015
- 2015-02-18 PE: PE2016001498A (PE20161166A1), not active, Application Discontinuation
- 2015-02-18 WO: PCT/IB2015/051239 (WO2015125088A1), active, Application Filing
2016
- 2016-08-18 CL: CL2016002090A (CL2016002090A1), status unknown
Also Published As
Publication number | Publication date |
---|---|
CL2016002090A1 (en) | 2016-12-30 |
WO2015125088A1 (en) | 2015-08-27 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
Luke et al. | Limits on lexical prediction during reading | |
CL2019003535A1 (en) | System and method for issuing a loan to a consumer who has been determined to be creditworthy. | |
MX2019001112A (en) | SYSTEM AND METHOD FOR THE IMPLEMENTATION OF CONTAINERS THAT EXTRACT AND APPLY KNOWLEDGE FROM THE SEMANTIC PAGE. | |
DOP2019000065A (en) | MULTILINGUAL CHARACTER ENTRY DEVICE | |
MX2017008583A (en) | Discriminating ambiguous expressions to enhance user experience. | |
ECSP18067575A (en) | SYSTEM AND METHOD FOR VERIFICATION OF AUTHENTICITY OF DOCUMENT INFORMATION | |
CO2017011036A2 (en) | Process and system to generate functional architecture documents and analysis specification and software design documents automatically | |
BR112018003372A2 (en) | method for providing staged shaving recommendations, computer program executable on a processing unit, personal care system, and shaving appliance | |
BR112017003650A2 (en) | keyboard input disambiguation | |
CO2019005833A2 (en) | Systems and methods to perform fingerprint-based user authentication using images captured using mobile devices | |
CO2017007037A2 (en) | Methods for understanding incomplete natural language query | |
WO2015200110A3 (en) | Techniques for machine language translation of text from an image based on non-textual context information from the image | |
GB2542288A (en) | Enhancing reading accuracy, efficiency and retention | |
MX2016014234A (en) | System and method for the creation and use of visually-diverse high-quality dynamic layouts. | |
PE20201181A1 (en) | PROCEDURE TO IDENTIFY AN OBJECT WITHIN AN IMAGE AND MOBILE DEVICE TO EXECUTE THE PROCEDURE | |
Gomaa et al. | Automatic scoring for answers to Arabic test questions | |
BR112017019015A2 (en) | system that facilitates the use of user-entered keywords to search for related clinical concepts, and method for facilitating the use of user-entered keywords to search for related clinical concepts | |
EP3038068A3 (en) | Barcode-based safety system and method | |
BR112016017972A8 (en) | METHOD FOR MODIFICATION OF COMMUNICATION FLOW | |
MX2022001419A (en) | CLUSTERING OF PAIRED SEGMENTS TO DETERMINE THE LINKAGE OF THE DATA SET IN A DATABASE. | |
MX2017007035A (en) | Method for text recognition and computer program product. | |
AR093815A1 (en) | METHOD AND DEVICES FOR CONVERSION OF FLAT BOOK TO ENRICHED BOOK IN ELECTRONIC READERS | |
AR113680A1 (en) | SYSTEMS AND METHODS TO IDENTIFY USERS BASED ON VOICE DATA AND MEDIA CONSUMPTION DATA | |
BR112019000188A2 (en) | computer-implemented, non-transient, computer-readable method and computer-implemented system | |
Araque et al. | Aspect Based Sentiment Analysis of Spanish Tweets. |
Legal Events
Code | Title | Date | Description |
---|---|---|---|
FD | Application declared void or lapsed | | |