PE20161166A1 - DOCUMENT CHARACTERIZATION METHOD - Google Patents

DOCUMENT CHARACTERIZATION METHOD

Info

Publication number
PE20161166A1
PE20161166A1 PE2016001498A PE2016001498A PE20161166A1 PE 20161166 A1 PE20161166 A1 PE 20161166A1 PE 2016001498 A PE2016001498 A PE 2016001498A PE 2016001498 A PE2016001498 A PE 2016001498A PE 20161166 A1 PE20161166 A1 PE 20161166A1
Authority
PE
Peru
Prior art keywords
document
class
multiclass
text
contents
Prior art date
Application number
PE2016001498A
Other languages
Spanish (es)
Inventor
Hargous Juan Ignacio Saa
Marin Jose Manuel Jimenez
Urrich Rodrigo Andres Sandoval
Original Assignee
Servicios Digitales Webdox Spa
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Family has litigation
First worldwide family litigation filed: https://patents.darts-ip.com/?family=53877689&utm_source=google_patent&utm_medium=platform_link&utm_campaign=public_patent_search&patent=PE20161166(A1) ("Global patent litigation dataset" by Darts-ip is licensed under a Creative Commons Attribution 4.0 International License.)
Application filed by Servicios Digitales Webdox Spa
Publication of PE20161166A1

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/355 Creation or modification of classes or clusters
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/12 Computing arrangements based on biological models using genetic models
    • G06N3/126 Evolutionary algorithms, e.g. genetic algorithms or genetic programming

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • General Business, Economics & Management (AREA)
  • Economics (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Marketing (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Tourism & Hospitality (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Character Discrimination (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention describes a method for automatic document characterization. The method receives an unstructured input document and produces: the automatic assignment of one or more document classes or categories to which the contents relate; a list of the names of natural or legal persons found in the text; other relevant information found in the text; and the date of issuance of the document, as it relates to the class or multiclass of the document. This method is faster, more complete and more accurate than manual characterization or manual description performed by legal technical personnel.

The method comprises the following steps: upon receiving a digital document from a user connected to a Web application from their own computer, processing the document with an Optical Character Recognition (OCR) application; executing an automatic document characterization process within the application and processing the text of the received document, on the basis of rules, to assign a class or multiclass; recognizing the names of people and organizations in the contents of the document, in accordance with the text; recognizing and extracting relevant information according to the assigned class or multiclass and the set of rules defined for that class or multiclass; recognizing relevant dates in the document and assigning a score to each of them according to the previously assigned class, using the rules established for that class or multiclass; and reviewing the contents of the document through detection means, recognizing different text patterns such as combinations of keywords, synonyms or equivalent terms.
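
Two of the steps in the abstract, rule-based class assignment from keyword and synonym patterns, and scoring of candidate dates according to the assigned class, can be illustrated with a minimal Python sketch. This is not the patented implementation: the rule format, the class names, the cue phrases, the ISO date pattern and the 40-character context window are all hypothetical choices made for illustration, and the OCR and name-recognition steps are omitted.

```python
import re
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ClassRule:
    """Hypothetical rule format; the patent does not specify one."""
    name: str
    # Each group holds a keyword and its synonyms; every group must match.
    keyword_groups: List[List[str]]
    # Phrases that, when found just before a date, raise that date's score.
    date_cues: List[str] = field(default_factory=list)


# Illustrative rules only (assumed classes and terms).
RULES = [
    ClassRule("lease",
              [["lease", "rental agreement"], ["lessor", "landlord"]],
              ["signed on", "dated"]),
    ClassRule("power_of_attorney",
              [["power of attorney"], ["grantor", "principal"]],
              ["granted on"]),
]

# Assumption: dates appear in ISO form (YYYY-MM-DD) in the OCR text.
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def assign_classes(text: str, rules: List[ClassRule] = RULES) -> List[str]:
    """Return every class whose keyword groups all occur in the text."""
    lowered = text.lower()
    return [
        rule.name
        for rule in rules
        if all(any(term in lowered for term in group)
               for group in rule.keyword_groups)
    ]


def score_dates(text: str, rule: ClassRule,
                window: int = 40) -> List[Tuple[str, int]]:
    """Score each date by the number of class cues found shortly before it."""
    lowered = text.lower()
    scored = []
    for match in DATE_RE.finditer(lowered):
        context = lowered[max(0, match.start() - window):match.start()]
        score = sum(1 for cue in rule.date_cues if cue in context)
        scored.append((match.group(0), score))
    # Highest score first; ties keep their order of appearance.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    sample = ("This rental agreement, dated 2015-02-18, is between the "
              "landlord ACME Spa and the tenant. Payment is due from 2015-03-01.")
    classes = assign_classes(sample)
    print(classes)
    print(score_dates(sample, RULES[0]))
```

Because a document can satisfy the rules of several classes at once, `assign_classes` returns a list, which corresponds to the "multiclass" case mentioned in the abstract. A matched class then selects the cue phrases used to rank the dates.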

PE2016001498A 2014-02-18 2015-02-18 DOCUMENT CHARACTERIZATION METHOD PE20161166A1 (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US201461941002P 2014-02-18 2014-02-18

Publications (1)

Publication Number Publication Date
PE20161166A1 (en) 2016-10-26

Family

ID=53877689

Family Applications (1)

Application Number Title Priority Date Filing Date
PE2016001498A DOCUMENT CHARACTERIZATION METHOD PE20161166A1 (en) 2014-02-18 2015-02-18

Country Status (3)

Country Link
CL (1) CL2016002090A1 (en)
PE (1) PE20161166A1 (en)
WO (1) WO2015125088A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11017221B2 (en) 2018-07-01 2021-05-25 International Business Machines Corporation Classifying digital documents in multi-document transactions based on embedded dates
US11003889B2 (en) 2018-10-22 2021-05-11 International Business Machines Corporation Classifying digital documents in multi-document transactions based on signatory role analysis

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6957384B2 (en) * 2000-12-27 2005-10-18 Tractmanager, Llc Document management system
US7284191B2 (en) * 2001-08-13 2007-10-16 Xerox Corporation Meta-document management system with document identifiers
JP2007233913A (en) * 2006-03-03 2007-09-13 Fuji Xerox Co Ltd Image processing apparatus and program
US8520979B2 (en) * 2008-08-19 2013-08-27 Digimarc Corporation Methods and systems for content processing

Also Published As

Publication number Publication date
CL2016002090A1 (en) 2016-12-30
WO2015125088A1 (en) 2015-08-27

Similar Documents

Publication Publication Date Title
Luke et al. Limits on lexical prediction during reading
CL2019003535A1 (en) System and method for issuing a loan to a consumer who has been determined to be creditworthy.
MX2019001112A (en) SYSTEM AND METHOD FOR THE IMPLEMENTATION OF CONTAINERS THAT EXTRACT AND APPLY KNOWLEDGE FROM THE SEMANTIC PAGE.
DOP2019000065A (en) MULTILINGUAL CHARACTER ENTRY DEVICE
MX2017008583A (en) Discriminating ambiguous expressions to enhance user experience.
ECSP18067575A (en) SYSTEM AND METHOD FOR VERIFICATION OF AUTHENTICITY OF DOCUMENT INFORMATION
CO2017011036A2 (en) Process and system to generate functional architecture documents and analysis specification and software design documents automatically
BR112018003372A2 (en) method for providing staged shaving recommendations, computer program executable on a processing unit, personal care system, and shaving appliance
BR112017003650A2 (en) keyboard input disambiguation
CO2019005833A2 (en) Systems and methods to perform fingerprint-based user authentication using images captured using mobile devices
CO2017007037A2 (en) Methods for understanding incomplete natural language query
WO2015200110A3 (en) Techniques for machine language translation of text from an image based on non-textual context information from the image
GB2542288A (en) Enhancing reading accuracy, efficiency and retention
MX2016014234A (en) System and method for the creation and use of visually-diverse high-quality dynamic layouts.
PE20201181A1 (en) PROCEDURE TO IDENTIFY AN OBJECT WITHIN AN IMAGE AND MOBILE DEVICE TO EXECUTE THE PROCEDURE
Gomaa et al. Automatic scoring for answers to Arabic test questions
BR112017019015A2 (en) system that facilitates the use of user-entered keywords to search for related clinical concepts, and method for facilitating the use of user-entered keywords to search for related clinical concepts
EP3038068A3 (en) Barcode-based safety system and method
BR112016017972A8 (en) METHOD FOR MODIFICATION OF COMMUNICATION FLOW
MX2022001419A (en) CLUSTERING OF PAIRED SEGMENTS TO DETERMINE THE LINKAGE OF THE DATA SET IN A DATABASE.
MX2017007035A (en) Method for text recognition and computer program product.
AR093815A1 (en) METHOD AND DEVICES FOR CONVERSION OF FLAT BOOK TO ENRICHED BOOK IN ELECTRONIC READERS
AR113680A1 (en) SYSTEMS AND METHODS TO IDENTIFY USERS BASED ON VOICE DATA AND MEDIA CONSUMPTION DATA
BR112019000188A2 (en) computer-implemented, non-transient, computer-readable method and computer-implemented system
Araque et al. Aspect Based Sentiment Analysis of Spanish Tweets.

Legal Events

Date Code Title Description
FD Application declared void or lapsed