WO1997038394A1 - Method for automatically evaluating an address written on a document after it has been converted into digital data
- Publication number
- WO1997038394A1 (application PCT/DE1997/000554)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- address
- pattern
- character string
- determined
- document
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/26—Techniques for post-processing, e.g. correcting the recognition result
- G06V30/262—Techniques for post-processing, e.g. correcting the recognition result using context analysis, e.g. lexical, syntactic or semantic context
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Definitions
- a system is known with which, for example, business letter documents can be categorized and then forwarded in electronic or paper form, or stored in a targeted manner.
- the system contains a unit for layout segmentation of the document, a unit for optical text recognition, a unit for address recognition and a unit for content analysis and categorization.
- a mixed bottom-up and top-down approach is used, which as the individual steps
- the address recognition is carried out with a unification-based parser that works with an attributed context-free grammar for addresses. Parts of the text that are correctly parsed in the sense of the address grammar are accordingly addresses. The contents of the addresses are determined using equations of the grammar. The procedure is described in [2]. Information retrieval techniques for automatic indexing of texts are used for content analysis and categorization. The details are as follows:
- a new business letter is then categorized by comparing the index terms of this letter with the lists of significant words for all categories. Depending on their significance, the weights of the index terms contained in the letter are multiplied by a constant and summed. Dividing this sum by the number of index terms in the letter yields a probability for each class. The exact calculations are given in [3].
- the result of the content analysis is then a list of hypotheses sorted according to probabilities.
- the runtime of the content analysis is stated as between half a second and two seconds of CPU time, with a maximum of 75 index terms per letter.
- the object on which the invention is based is to specify a method by which address recognition and address evaluation are improved. It is assumed that the address of the document already exists as digital data, which are then processed further. This object is achieved in accordance with the features of patent claim 1.
- the method according to the invention is based on the technique of approximate string matching.
- the method described by Bertossi et al. in [4] is used, which compares a word with a pattern and calculates the number of substitutions, omissions and insertions of letters in the word.
- the pattern is selected which most closely corresponds to the word w to be examined.
- a similarity or distance measure d is required for the two words, the pattern m and the word w to be examined.
- the absolute number of errors is not suitable for this, since the patterns can be of different lengths. This problem can be shown using examples:
- the reconstruction information of a letter is not a calculable measure. Therefore, according to the invention, the Markov entropy H(N) is used as a model for this.
- e_w is the number of errors in the word w to be examined.
- FIG. 1 shows a system with which the address on a paper document is recognized and evaluated.
- FIG. 2 shows a more detailed representation of the system for evaluating the address.
- a paper document Dok is scanned by a scanner SC and an image file BD is generated.
- the part of the image which contains the address is segmented.
- the layout segmentation is designated SG in FIG. 1.
- the result is an image file that only contains the address part A-SG of the document.
- This image data of the address is converted into ASCII data using OCR.
- the address in ASCII data is designated ADR in FIG. 1.
- the ASCII address file ADR still contains errors, so that by comparing this address file with stored patterns it is often not possible to identify the addressee uniquely.
- the address recognition is designated ADR-E, a
- the file can contain the addressees assigned to these patterns, both of which can be contained in a memory.
- the address recognition ADR-E emits an address hypothesis for each pattern, which is called ADR-H and which represents the measure of the similarity.
- the technique of "approximate string matching" is used in the exemplary embodiment.
- the method described by Bertossi et al. [4] is used, which compares a word with a pattern and calculates the number of substitutions, omissions and insertions of letters. According to FIG. 2, this is done in the unit MA, to which the address ADR in ASCII code and the patterns m are supplied. There is one pattern for every possible addressee, so the patterns form a set of unique addressee names.
- the patterns m_1 to m_n are thus compared with the address ADR, and a hypothesis ADR-H is formed for each pattern; that is, for each pattern the most similar word (hypothesis) in the address is determined.
- the distance measure d_inf is computed for each pattern-hypothesis pair according to the above formula for the similarity of two words.
- Patterns and the corresponding addressees are stored in one unit (memory).
- the hypotheses for the individual patterns are contained in ADR-H; in the unit DIST the distance measure is computed for each pattern-hypothesis pair.
- the distance measures d_inf are fed to a unit MIN for minimum calculation, which determines the smallest d_inf and subjects it to a threshold value check SW.
- the threshold value check SW rejects an address as unassignable if d_inf is above the threshold value (shown as rw); otherwise the addressee ADR-A corresponding to the pattern is output.
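The chain MA, DIST, MIN and SW described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented method: a plain dynamic-programming edit distance stands in for the procedure of [4], the address is split into whitespace-separated words to find the hypothesis, and the distance is simply the error count divided by the pattern length, because the entropy-based formula for d_inf is not reproduced in this text. The function names `edit_distance`, `hypothesis` and `assign_addressee` and the sample data are hypothetical.

```python
from typing import Optional


def edit_distance(word: str, pattern: str) -> int:
    """Count substitutions, omissions and insertions between word and pattern."""
    prev = list(range(len(pattern) + 1))
    for i, wc in enumerate(word, start=1):
        curr = [i]
        for j, pc in enumerate(pattern, start=1):
            curr.append(min(
                prev[j] + 1,               # extra letter in the word
                curr[j - 1] + 1,           # letter of the pattern omitted in the word
                prev[j - 1] + (wc != pc),  # substitution, or a match at no cost
            ))
        prev = curr
    return prev[-1]


def hypothesis(address: str, pattern: str) -> tuple[str, int]:
    """Unit MA (sketch): the address word most similar to the pattern."""
    words = address.split()
    if not words:
        return "", len(pattern)
    scored = ((w, edit_distance(w.lower(), pattern.lower())) for w in words)
    return min(scored, key=lambda pair: pair[1])


def assign_addressee(address: str, patterns: dict[str, str],
                     threshold: float) -> Optional[str]:
    """Units DIST, MIN and SW (sketch): pick the closest pattern or reject."""
    best_d: Optional[float] = None
    best_addressee: Optional[str] = None
    for pattern, addressee in patterns.items():
        _, errors = hypothesis(address, pattern)
        # Placeholder distance (an assumption): errors relative to pattern length.
        d = errors / max(len(pattern), 1)
        if best_d is None or d < best_d:
            best_d, best_addressee = d, addressee
    if best_d is None or best_d > threshold:
        return None  # rejected as unassignable
    return best_addressee


# Hypothetical pattern memory: pattern -> addressee.
PATTERNS = {"Mueller": "Dept. A / Mueller", "Schmidt": "Dept. B / Schmidt"}
print(assign_addressee("Herrn Mue1ler Siemens AG", PATTERNS, threshold=0.3))
```

Dividing by the pattern length is only one way to address the point made earlier that absolute error counts are unsuitable for patterns of different lengths; the source uses an entropy-based measure instead.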
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Character Discrimination (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
To identify and evaluate the character string contained in an address and to assign the address to an addressee, the character string of the address is compared with stored patterns, which contain a defined address designation for each addressee. The pattern selected is the one that most closely matches the character string. For this purpose, a distance measure is formed that defines the similarity between the address and the pattern, and this distance measure is checked to see whether it lies above or below a predetermined threshold. If the distance measure lies below the predetermined threshold, the addressee associated with that pattern is output.
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP97916350A EP0891599A1 (fr) | 1996-04-03 | 1997-03-18 | Procede d'evaluation automatique d'une adresse reportee sur un document apres avoir ete transformee en donnees numeriques |
| JP9535727A JP2000508100A (ja) | 1996-04-03 | 1997-03-18 | 宛先をデジタルデータに変換した後で文書に記載されたこの宛先を自動的に評価するための方法 |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| DE19613401.3 | 1996-04-03 | | |
| DE19613401 | 1996-04-03 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO1997038394A1 (fr) | 1997-10-16 |
Family
ID=7790414
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/DE1997/000554 Ceased WO1997038394A1 (fr) | 1996-04-03 | 1997-03-18 | Procede d'evaluation automatique d'une adresse reportee sur un document apres avoir ete transformee en donnees numeriques |
Country Status (3)
| Country | Link |
|---|---|
| EP (1) | EP0891599A1 (fr) |
| JP (1) | JP2000508100A (fr) |
| WO (1) | WO1997038394A1 (fr) |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| EP1843276A1 (fr) * | 2006-04-03 | 2007-10-10 | Océ-Technologies B.V. | Procédé de traitement automatisé des documents textes sur papier |
| US7436979B2 (en) | 2001-03-30 | 2008-10-14 | Siemens Energy & Automation | Method and system for image processing |
-
1997
- 1997-03-18 JP JP9535727A patent/JP2000508100A/ja active Pending
- 1997-03-18 EP EP97916350A patent/EP0891599A1/fr not_active Withdrawn
- 1997-03-18 WO PCT/DE1997/000554 patent/WO1997038394A1/fr not_active Ceased
Non-Patent Citations (4)
| Title |
|---|
| DOSTER W: "Contextual postprocessing system for cooperation with a multiple-choice character-recognition system", IEEE TRANSACTIONS ON COMPUTERS, NOV. 1977, USA, vol. C-26, no. 11, ISSN 0018-9340, pages 1090 - 1101, XP002035206 * |
| IMPEDOVO S ET AL: "Hand-written numeral recognition 'the organization degree measurement'", PROCEEDINGS OF THE 6TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, MUNICH, WEST GERMANY, 19-22 OCT. 1982, 1982, NEW YORK, NY, USA, IEEE, USA, pages 40 - 43, vol.1, XP002035209 * |
| JUMARIE G: "New results in the information theory of patterns and forms", SYSTEMS ANALYSIS - MODELLING - SIMULATION, 1987, EAST GERMANY, vol. 4, no. 6, ISSN 0232-9298, pages 483 - 520, XP002035208 * |
| ROSENBAUM W S ET AL: "Multifont OCR postprocessing system", IBM JOURNAL OF RESEARCH AND DEVELOPMENT, JULY 1975, USA, vol. 19, no. 4, ISSN 0018-8646, pages 398 - 421, XP002035207 * |
Also Published As
| Publication number | Publication date |
|---|---|
| JP2000508100A (ja) | 2000-06-27 |
| EP0891599A1 (fr) | 1999-01-20 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| DE3889092T2 (de) | Optische Zeichenlesevorrichtung. | |
| EP1665132B1 (fr) | Procede et systeme de detection de donnees provenant de plusieurs documents lisibles par ordinateur | |
| DE69428590T2 (de) | Auf kombiniertem lexikon und zeichenreihenwahrscheinlichkeit basierte handschrifterkennung | |
| DE69636057T2 (de) | Sprecherverifizierungssystem | |
| DE69814104T2 (de) | Aufteilung von texten und identifizierung von themen | |
| DE69600461T2 (de) | System und Verfahren zur Bewertung der Abbildung eines Formulars | |
| DE60204005T2 (de) | Verfahren und einrichtung zur erkennung eines handschriftlichen musters | |
| DE69423692T2 (de) | Sprachkodiergerät und Verfahren unter Verwendung von Klassifikationsregeln | |
| DE2541204A1 (de) | Verfahren zur fehlererkennung und einrichtung zur durchfuehrung der verfahren | |
| DE19511470C1 (de) | Verfahren zur Ermittlung eines Referenzschriftzuges anhand einer Menge von schreiberidentischen Musterschriftzügen | |
| DE19705757A1 (de) | Verfahren und Gerät für das Design eines hoch-zuverlässigen Mustererkennungs-Systems | |
| DE19511472C1 (de) | Verfahren zur dynamischen Verifikation eines Schriftzuges anhand eines Referenzschriftzuges | |
| DE2513566A1 (de) | Binaere referenzmatrix | |
| DE3246631C2 (de) | Zeichenerkennungsvorrichtung | |
| DE19933984C2 (de) | Verfahren zur Bildung und/oder Aktualisierung von Wörterbüchern zum automatischen Adreßlesen | |
| EP0891599A1 (fr) | Procede d'evaluation automatique d'une adresse reportee sur un document apres avoir ete transformee en donnees numeriques | |
| EP1076896B1 (fr) | Procede et dispositif d'identification par un ordinateur d'au moins un mot de passe en langage parle | |
| DE69625649T2 (de) | Verfahren zur Überprüfung von Unterschriften | |
| EP2259210A2 (fr) | Procédé et dispositif destinés à l'analyse d'une base de données | |
| EP2273383A1 (fr) | Procédé et dispositif de recherche automatique de documents dans un dispositif de stockage de données | |
| DE69901324T2 (de) | Vorrichtung, Verfahren und Speichermedium zur Sprechererkennung | |
| EP0731955B1 (fr) | Procede et dispositif de saisie et d'identification automatique d'informations enregistrees | |
| Sorensen et al. | Black-White Differences in the Occurrence of Job Shifts. | |
| EP1758688A1 (fr) | Procede pour determiner automatiquement des donnees de puissance operationnelles | |
| DE102009013390A1 (de) | Verfahren und Vorrichtung zum Klassifizieren eines physikalischen Objekts mittels eines parametrierten Klassifikators |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | AK | Designated states | Kind code of ref document: A1; Designated state(s): JP US |
| | AL | Designated countries for regional patents | Kind code of ref document: A1; Designated state(s): AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE |
| | DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | |
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | |
| | WWE | Wipo information: entry into national phase | Ref document number: 1997916350; Country of ref document: EP |
| | WWP | Wipo information: published in national office | Ref document number: 1997916350; Country of ref document: EP |
| | WWW | Wipo information: withdrawn in national office | Ref document number: 1997916350; Country of ref document: EP |