DE102023127343A1

DE102023127343A1 - Method for creating a database for document classes and methods for scanning and processing a document and a device for carrying out these methods

Info

Publication number: DE102023127343A1
Application number: DE102023127343.4A
Authority: DE
Inventors: Sebastian Georgi; Arnold Allweier; Timo Eckhard
Original assignee: Chromasens GmbH
Current assignee: Chromasens GmbH
Priority date: 2023-10-06
Filing date: 2023-10-06
Publication date: 2025-04-10
Also published as: WO2025073816A1

Abstract

The invention relates to methods and a device that allow partially or fully automatic processing of documents, even if they have been creased or folded and are therefore no longer flat. Scan images generated by scanning the documents can be automatically assigned to document classes and then processed, and in particular read out, in a class-specific manner. In particular, this allows the processing of machine-printed documents into which people have inserted handwritten text.

Description

The invention relates to a method for creating a database for document classes, a method for scanning and processing a document, and a device for carrying out these methods.

Methods exist for recognizing similar images on the basis of features. So-called content-based image retrieval (CBIR) systems are known, which find similar images in large image databases. Such CBIR methods are described, for example, in X. Wangming et al. (December 2008), "Application of Image SIFT Features to the Context of CBIR", in "2008 International Conference on Computer Science and Software Engineering" (Vol. 4, pages 552-555), IEEE, and in K. R. Reddy et al. (2016), "A Comparative Study of SIFT and PCA for Content-Based Image Retrieval", Inter. Refereed J. Eng. Sci. (IRJES) 5 (11), 12-19. Such CBIR systems often produce mismatches, but these are not problematic, because the systems are designed to return several matching images in case of doubt.

Methods for rectifying images sometimes use feature matching. In feature matching, features are extracted from two images that are supposed to show the same content at least in some regions, and these features are matched to one another. If one of the two images is a reference image, the pixels of the other image can be shifted on the basis of this match so that they are arranged in the same way as, or in a similar arrangement to, the pixels of the reference image. If the reference image is not distorted, the non-reference image is rectified after this shift of the pixels. The matching of features is described, for example, in R. C. Gonzalez, "Digital Image Processing", 4th edition, Pearson, pages 915-916.

Such feature matching can be used in image processing for a wide variety of applications. For example, it can be used for image stitching or for stabilizing video signals. It can also be used to track specific objects in a video signal or to generate 3D models from multiple images. When an input image is rectified with respect to a reference image, local displacements are determined from the pairwise matched features; these can comprise both linear displacements and rotations. The displacements are interpolated over the entire image area, so that a "dense displacement field" with a displacement for every pixel is obtained, as described, for example, in Seungyong Lee, George Wolberg and Sung Yong Shin, "Scattered Data Interpolation with Multilevel B-Splines", IEEE Transactions on Visualization and Computer Graphics 3.3 (1997), pages 228 to 244.

US 6,711,293 B1 discloses a method with which scale-invariant features (SIFT features; SIFT: Scale-Invariant Feature Transform) can be extracted. Such features have the advantage that they are present in the same way in images captured, for example, from different viewing directions of a camera. These features can therefore be correctly matched to one another across images taken from different viewing directions.

There are further types of features, such as ORB features or SURF features (E. Rublee et al., "ORB: An Efficient Alternative to SIFT or SURF", 2011 International Conference on Computer Vision, IEEE, 2011).

EP 1 594 078 B1 discloses a system and a method with which points of interest (POIs) are identified in each image at different resolutions. A point of interest is a point whose position within an image is defined by at least one property attributable to the pixels in a neighborhood of predefined size around the point. In addition, each point of interest is a point to which a unique orientation can be assigned on the basis of at least one property attributable to the pixels. This can be the property attributable to the point that is used to define the point position, or another property in a neighborhood around the point. Once the points of interest have been identified, a descriptor is created for each point of interest. This descriptor characterizes the point in a manner that is essentially invariant to changes in image position, orientation and scale, as well as to changes in image size. Furthermore, the descriptor is essentially invariant to scaling of the image and to changes in the intensity of the pixels in the region around the point of interest. These descriptors are also called feature vectors, a feature being defined by the feature vector, the associated coordinates and a principal direction.

The object of the present invention is to provide a method for creating a database for document classes, a method for scanning and processing a document, and a device for carrying out these methods, with which optically scanned documents can be reliably assigned to the corresponding document classes. In particular, the assignment should be as unambiguous as possible. The device should also be easy to operate.

According to a first aspect, the present invention relates to a method for creating a database for document classes, each document class being defined by several features. In this method, several copies of a document type to be classified are scanned with a scanning device, and a scan image is generated from each copy of the document type (= document). Each scan image is aligned to a digital reference image of the document type to be classified. The scan images aligned in this way are superimposed on one another and the individual pixels are averaged, so that a prototype image is generated. Features are then determined anew from the prototype image and entered in the database as the database features that define the document class.

With this method, several copies of a document type, i.e. several documents, are scanned, and the resulting scan images are aligned to one another, superimposed and averaged. This yields a prototype image that is prototypical for the document type. Because the prototype image has been generated by optical scanning with a scanning device, it reproduces the document type as the scanning device perceives it. In other words, the prototype image also exhibits properties that result from the scanning method and the scanning device, and it therefore need not match the digital reference image exactly. Such a prototype image is thus more similar to an image of such a document scanned with the scanning device than the digital reference image is. The features extracted from the prototype image are therefore more specific to the scan images that the scanning device generates from these documents than the corresponding features derived from the digital reference image alone. The result is a database whose database features define the individual document classes very reliably.

If documents are to be processed automatically, it should be possible to assign them unambiguously to a document class, because only then is it ensured that their content can be extracted automatically, reliably and correctly. The purpose of a database for document classes is to identify documents and to assign them to their document class. Further information may be available for the individual document classes that makes it easier to extract the information contained in the documents. This further information describes, for example, fields with checkboxes or fields with machine-readable text or numbers. However, this only works reliably if a given document is correctly assigned to a document class. In the event of a misassignment, the extraction instructions of an incorrect document class would be used, and these would not be suitable for the document in question. The features derived from the prototype image allow such an unambiguous assignment of a document to a specific document class.

Preferably, a single type of scanning device or a single specific scanning device is used to create a given database. The database features are then specific to that type of scanning device or to that single specific scanning device. Scanning devices for optically scanning documents can differ considerably, so that correspondingly different scan images are generated even when the same document is scanned with different scanning devices. The differences may lie in the color or brightness sensitivity and in the sharpness with which the individual pixels are captured. When a single type of scanning device or a single specific scanning device is used, such deviations are reduced or eliminated, and the documents can be captured reliably.

Each scan image can be aligned by means of feature matching or by means of other methods, such as an optical flow method. Because the individual scan images are aligned to the digital reference image, the documents can in principle be placed in any position in the scanning device. The automatic alignment brings the scan images into register with the digital reference image. Alignment by means of feature matching allows the scan images to be aligned very precisely to the digital reference image.

The database features can each comprise a feature vector that describes the respective feature and coordinates that define the location of the respective feature, the feature vector being obtained from the prototype image and the corresponding coordinates from the reference image. Because the feature vectors are obtained from the prototype image, they describe the individual features as the respective scanning device recognizes or sees them. Because the coordinates from the reference image are assigned to the respective features, on the other hand, the exact coordinates of the reference image are used; the corresponding coordinates of the features in the prototype image may be somewhat blurred or imprecise owing to the averaging of the many images. This yields an optimal combination of information in the features, so that the quality of the features is very high.
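A database feature of this kind can be pictured as a small record that pairs a descriptor taken from the prototype image with a position taken from the reference image. The following is only a minimal sketch of such a record. The names `DatabaseFeature` and `build_database_features` are hypothetical and do not appear in the application, and the sketch assumes that the correspondence between prototype descriptors and reference coordinates has already been established, so that both sequences are given in the same order.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class DatabaseFeature:
    """Hypothetical record for one database feature of a document class."""
    descriptor: Tuple[float, ...]  # feature vector, taken from the prototype image
    x: float                       # feature position, taken from the reference image
    y: float


def build_database_features(
    prototype_descriptors: Sequence[Sequence[float]],
    reference_coords: Sequence[Tuple[float, float]],
) -> List[DatabaseFeature]:
    """Pair each prototype descriptor with the reference coordinates of the same feature.

    Assumption: both sequences are already in corresponding order.
    """
    if len(prototype_descriptors) != len(reference_coords):
        raise ValueError("descriptors and coordinates must have the same length")
    return [
        DatabaseFeature(tuple(float(v) for v in desc), float(x), float(y))
        for desc, (x, y) in zip(prototype_descriptors, reference_coords)
    ]
```

Keeping the two sources separate in the constructor arguments makes it explicit which part of the record reflects the scanning device and which part reflects the exact layout of the reference image.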

When the pixels of the several scan images aligned to the reference image are averaged, the pixels of the superimposed scan images that have the same coordinates are averaged. The expression "averaging of the pixels" means that the color values and/or brightness values of the respective pixels are averaged. The averaging can be carried out as an equally weighted average of all pixels that have the same coordinates. The individual pixels can also be weighted differently. For example, it can be useful to analyze the distribution of the color values and/or brightness values of the pixels and to exclude values that deviate strongly from the mean. The averaging can also be carried out by determining the median of the pixels that have the same coordinates in the superimposed scan images.
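The three averaging variants mentioned above can be sketched with NumPy as follows. This is an illustrative sketch only: the function name `build_prototype`, the `method` switch and the `max_sigma` threshold are assumptions made for the example, and the application does not specify how strongly a value must deviate from the mean before it is excluded.

```python
import numpy as np


def build_prototype(aligned_scans, method="mean", max_sigma=2.0):
    """Combine aligned scan images of identical shape into one prototype image.

    method: "mean"    equally weighted average per pixel
            "median"  per-pixel median
            "trimmed" average after excluding values that deviate by more than
                      max_sigma standard deviations from the per-pixel mean
                      (max_sigma is an assumed, illustrative threshold)
    """
    stack = np.stack([np.asarray(s, dtype=np.float64) for s in aligned_scans])
    if method == "mean":
        return stack.mean(axis=0)
    if method == "median":
        return np.median(stack, axis=0)
    if method == "trimmed":
        mean = stack.mean(axis=0)
        std = stack.std(axis=0)
        keep = np.abs(stack - mean) <= max_sigma * std
        counts = keep.sum(axis=0)
        total = np.where(keep, stack, 0.0).sum(axis=0)
        # Fall back to the plain mean where every value was excluded.
        return np.where(counts > 0, total / np.maximum(counts, 1), mean)
    raise ValueError(f"unknown method: {method!r}")
```

The median and the trimmed average are less sensitive than the plain mean to a single scan that contains, for example, a stain or a handwritten entry at a given pixel position.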

Preferably, certain predefined areas of the aligned scan images are disregarded. These are primarily areas whose content differs between the individual copies of the document type. Typical examples of such areas are fields in which users of the document are to enter individual information, such as their name or address. The characters that represent this information differ from document to document, so it makes little sense to use them as a feature that classifies the document class. Disregarding these areas means that no features are extracted at least in these areas.

Meanings can be assigned to certain predefined areas. These meanings can be stored in a database and used when the content of the document is evaluated, either by assigning the respective meaning to the information contained in these areas or by evaluating this information in the knowledge of its meaning.

The "certain areas" that are disregarded and the "certain areas" to which meanings are assigned can be the same areas. However, they can also differ or coincide only in part.

These areas can be excluded before the superimposition, for example by cutting out the corresponding areas in the scan images that have not yet been aligned. It is equally possible to cut out these areas only once the individual scan images have been superimposed on one another. The areas are "cut out", for example, by assigning a predetermined color value to their pixels, such as a color value representing white or a color value representing black.
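"Cutting out" by assigning a fixed color value can be sketched as follows. The function name `mask_regions` and the representation of an area as an axis-aligned box `(top, left, bottom, right)` in pixel coordinates are assumptions made for this example; the application does not prescribe how the areas are stored.

```python
import numpy as np


def mask_regions(image, regions, fill_value=255):
    """Return a copy of `image` in which every region is set to `fill_value`.

    regions: iterable of (top, left, bottom, right) boxes in pixel coordinates,
             with bottom and right exclusive (an assumed convention).
    fill_value: 255 corresponds to white and 0 to black for 8-bit grayscale images.
    """
    masked = np.array(image, copy=True)
    for top, left, bottom, right in regions:
        masked[top:bottom, left:right] = fill_value
    return masked
```

Because the same constant value then appears at these positions in every scan image, the masked areas stay uniform in the averaged image and offer no texture from which features could be extracted.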

To align the scan images to the digital reference image, the following steps can be carried out for each scan image:

- rectifying and aligning the respective scan image by means of feature matching (= first feature matching), in which features of the scan image, each comprising a feature vector, are matched to corresponding features of the reference image of the document, and a homography matrix is determined from this feature matching, with which all pixels of the respective scan image are mapped to form a homography image,
- adding the coordinates of the features in the homography image to the respective feature vectors,
- adding the coordinates of the features in the reference image to the respective feature vectors,
- matching the features of the homography image once more to corresponding features of the reference image of the document (= second feature matching), the coordinates added to the feature vectors being taken into account in the matching, and
- rectifying the homography image in accordance with the match between the features of the homography image and the features of the reference image.

The homography matrix generated from the first feature matching allows a global correction of perspective distortion. Using the homography matrix, the scan image is corrected by rotation, translation and a global perspective rectification. As a result, the document itself can be placed in any position relative to the optical scanning device with which the optical scanning is carried out. The homography mapping rotates and shifts the scan image to such an extent that its orientation essentially matches the orientation of the digital reference image. Distortions that vary locally, such as those caused by creasing and folding of the document, are not corrected in the homography image. Such a homography mapping does not allow locally varying distortions to be corrected.
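How a homography matrix follows from matched feature positions, and how it maps points, can be sketched with NumPy. The sketch uses the standard direct linear transform (DLT) on already matched point pairs; the application does not state which estimation method is used, and practical implementations typically add outlier rejection (for example RANSAC) and coordinate normalization, both of which are omitted here. The function names are hypothetical.

```python
import numpy as np


def estimate_homography(src, dst):
    """Estimate a 3x3 homography H with dst ~ H @ src from at least 4 point pairs (DLT)."""
    src = np.asarray(src, dtype=np.float64)
    dst = np.asarray(dst, dtype=np.float64)
    if src.shape != dst.shape or src.shape[0] < 4:
        raise ValueError("need at least 4 corresponding points of equal shape")
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Two linear equations per correspondence in the 9 entries of H.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows))
    h = vt[-1].reshape(3, 3)
    return h / h[2, 2]  # assumes h[2, 2] != 0, which holds for non-degenerate cases


def apply_homography(h, points):
    """Map 2D points through H using homogeneous coordinates."""
    pts = np.asarray(points, dtype=np.float64)
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homog @ h.T
    return mapped[:, :2] / mapped[:, 2:3]
```

A single matrix of this kind moves all pixels with one global rule, which is why it can compensate for rotation, translation and perspective but not for a crease that displaces only part of the page.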

In the second feature assignment, in which the features of the homography image are assigned to the features of the reference image, the coordinates of the features of the homography image and the coordinates of the features of the reference image are both taken into account. In such an assignment, the distance between the corresponding feature vectors is determined. The smaller the distance, the better the assignment. The feature pairs selected from the features of the homography image and the features of the reference image are therefore those with the smallest distance. Because the coordinates of the features are taken into account, the distance between a feature of the homography image and the corresponding feature in the reference image is small if the two features are located at the same or a similar position in both images. Features can thus be assigned unambiguously even if identical or similar features occur repeatedly in the images, because the spatial assignment distinguishes them from one another. However, such a spatial assignment only makes sense if the scanned image is oriented essentially the same way as the reference image. If, for example, the scanned image is rotated by more than 45° relative to the reference image, the features to be assigned to each other have completely different coordinates, and the coordinates would carry no information about the similarity or assignment of the features. Only the alignment of the scanned image by mapping it to the homography image with the homography matrix permits the meaningful use of the feature coordinates as components of the feature vectors, since this mapping aligns the scanned image with the reference image.
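
One conceivable way of including the coordinates in the feature vectors is sketched below: weighted coordinates are appended to each descriptor, and the nearest neighbor is selected by Euclidean distance. The weighting factor `coord_weight`, the function names, and the toy data are assumptions for illustration; the application does not specify how the coordinates are weighted.

```python
import numpy as np

def augment(descriptors, coords, coord_weight):
    """Append weighted (x, y) coordinates to each descriptor vector."""
    return np.hstack([np.asarray(descriptors, float),
                      coord_weight * np.asarray(coords, float)])

def nearest_pairs(vec_a, vec_b):
    """For every row of vec_a, return the index of the closest row of vec_b."""
    # Pairwise Euclidean distances, shape (len(vec_a), len(vec_b)).
    dists = np.linalg.norm(vec_a[:, None, :] - vec_b[None, :, :], axis=2)
    return dists.argmin(axis=1)

# Two reference features look identical (same descriptor) but lie at different places.
ref_desc = np.array([[1.0, 0.0], [1.0, 0.0]])
ref_xy = np.array([[10.0, 10.0], [200.0, 10.0]])
# The same two features in the homography image, slightly displaced by a fold.
img_desc = np.array([[1.0, 0.0], [1.0, 0.0]])
img_xy = np.array([[203.0, 12.0], [11.0, 9.0]])

pairs = nearest_pairs(augment(img_desc, img_xy, 0.01),
                      augment(ref_desc, ref_xy, 0.01))
```

With descriptors alone, both image features would be equally close to both reference features; the appended coordinates resolve the ambiguity.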

After the second feature assignment, the features of the homography image and of the reference image are assigned to each other so precisely that the homography image can be rectified very precisely on the basis of the second feature assignment. Locally differing rectifications, as required for distortions caused by bending and folding a document, are also possible.

The homography image is preferably rectified with a freeform rectification method, for example by interpolating a displacement field as explained above (see the discussion of Lee, Seungyong, et al.).
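
The idea of interpolating a displacement field from sparse feature pairs can be sketched as follows. The sketch uses plain inverse-distance weighting as a stand-in; it is not the interpolation method discussed with reference to Lee et al., and the function name, the parameters `power` and `eps`, and the sample points are assumptions.

```python
import numpy as np

def interpolate_displacement(src_pts, dst_pts, query_pts, power=2.0, eps=1e-9):
    """Interpolate displacements at query points from sparse feature pairs.

    src_pts:   (N, 2) feature positions in the homography image
    dst_pts:   (N, 2) positions of the matched features in the reference image
    query_pts: (M, 2) positions at which a displacement is wanted
    """
    src = np.asarray(src_pts, float)
    disp = np.asarray(dst_pts, float) - src           # known displacements
    q = np.asarray(query_pts, float)
    d = np.linalg.norm(q[:, None, :] - src[None, :, :], axis=2)
    w = 1.0 / (d ** power + eps)                       # closer pairs weigh more
    w /= w.sum(axis=1, keepdims=True)
    return w @ disp

src = np.array([[0.0, 0.0], [10.0, 0.0]])
dst = np.array([[0.0, 0.0], [10.0, 2.0]])             # right side lifted by a fold
field = interpolate_displacement(src, dst, [[0.0, 0.0], [5.0, 0.0], [10.0, 0.0]])
```

Unlike a homography, the resulting displacement differs from point to point, so a region near a fold can be shifted differently from the rest of the image.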

The features are preferably scale-invariant and/or rotation-invariant features.

Preferably, the scanned images are rectified before being superimposed, wherein the rectification can be performed by means of a feature assignment and/or by means of an assignment of image sections. Rectification by feature assignment and rectification by assignment of image sections can also be combined, so that certain areas of the scanned images are rectified by a feature assignment and other areas by an assignment of image sections. Each of these rectification methods permits freeform rectification.

According to a further aspect, the present invention relates to a method for scanning and processing a document, comprising the steps

- scanning a document with an optical scanning device, whereby a scanned image of the document is generated,
- classifying the scanned image into a document class, wherein features are extracted from the scanned image and compared with database features of a feature database that define a plurality of document classes, and wherein the scanned image is assigned to the document class whose database features yield the best match.
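
The classification step can be sketched as follows. In this minimal sketch the "best match" is scored as the number of scan features that have a database feature within a distance threshold; this scoring rule, the threshold `max_dist`, the class names, and the two-dimensional toy feature vectors are assumptions and are not prescribed by the method.

```python
import math

def count_matches(features, class_features, max_dist):
    """Count scan features that have a database feature within max_dist."""
    return sum(
        1 for f in features
        if any(math.dist(f, c) <= max_dist for c in class_features)
    )

def classify(features, feature_db, max_dist=0.5):
    """Return the class whose database features match the scan best."""
    scores = {name: count_matches(features, feats, max_dist)
              for name, feats in feature_db.items()}
    return max(scores, key=scores.get), scores

# Hypothetical toy database with two document classes.
feature_db = {
    "lottery_ticket": [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)],
    "registration_form": [(5.0, 5.0), (6.0, 5.0)],
}
scan_features = [(0.1, 0.0), (1.0, 0.9), (5.1, 5.0)]
best_class, scores = classify(scan_features, feature_db)
```

Returning the per-class scores alongside the winning class makes it possible to reject a scan when no class reaches a sufficient score.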

Classifying the documents by means of a feature assignment permits a precise and correct classification. This applies in particular when a feature database as explained above is used.

Documents classified in this way can be registered with the digital reference. The digital reference can contain instructions according to which specific information in the document is to be read out and processed. These instructions can, for example, specify what type of information (e.g., name, address, telephone number, ...) is to be expected in particular areas of the document. This information can then be read out automatically and processed according to its meaning.
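
How such read-out instructions might be stored with a digital reference is sketched below. The record layout `FieldInstruction`, its field names, and the rectangular regions are hypothetical; the application only states that the instructions indicate which type of information is contained in which area.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FieldInstruction:
    """One read-out instruction of a digital reference (hypothetical layout)."""
    info_type: str      # e.g. "name", "address", "telephone"
    x: int              # left edge of the region in reference-image pixels
    y: int              # top edge of the region
    width: int
    height: int

def crop_fields(image_rows, instructions):
    """Cut the region of every instruction out of a rectified image.

    image_rows is a list of equally long rows (lists of pixel values).
    """
    return {
        ins.info_type: [row[ins.x:ins.x + ins.width]
                        for row in image_rows[ins.y:ins.y + ins.height]]
        for ins in instructions
    }

reference_instructions = [
    FieldInstruction("name", x=0, y=0, width=3, height=1),
    FieldInstruction("telephone", x=1, y=1, width=2, height=2),
]
image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12]]
fields = crop_fields(image, reference_instructions)
```

Fixed regions of this kind are only meaningful once the scanned image has been registered with the reference, since the coordinates refer to the reference image.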

Preferably, one type of scanning device or a single specific scanning device is used as the scanning device in each case. In particular, the type of scanning device or the single specific scanning device that was also used to create the feature database is used. This keeps the scanning quality uniform, so that the quality of the assignment is correspondingly high.

The invention is explained in more detail below by way of example with reference to the drawings. The drawings show:

Fig. 1a an embodiment of a scanning device for optically scanning documents in a highly simplified schematic representation,
Fig. 1b a scanning unit of the scanning device from Fig. 1a, with a scanning element and an illumination device in their geometric relationship to the scanning area,
Fig. 2 a method for creating a database for document classes in a flowchart,
Fig. 3 a scanned image and a digital reference image of a document, each after certain processing steps, and
Fig. 4 a method for scanning and processing a document with a database generated by the method according to Fig. 2, in a flowchart.

The invention is explained in more detail below with reference to a device for optically scanning documents (Fig. 1a, 1b). Such a scanning device 1 comprises a scanning unit 2 and an evaluation unit 3.

In the present embodiment, the evaluation unit 3 is formed by a conventional workstation or personal computer 4 with an input unit 5 and a display unit 6. The input unit 5 is a computer keyboard, and the display unit 6 is a computer screen.

However, the evaluation unit 3 can also be formed by any other general-purpose computer, such as a tablet computer or a smartphone, or can be fully integrated into the scanning unit 2.

In the present embodiment, the scanning unit 2 is designed as a cuboid body with a base plate 7, two side walls 8, 9, and a top wall 10. The base plate 7 and the walls 8-10 delimit a cavity, with the base plate 7 forming a scanning area 11 on which a document 12 to be scanned can be placed. A scanning element 13 and an illumination device 14 are arranged in the top wall 10 (Fig. 1b).

The scanning element 13 is a digital camera with a two-dimensional camera chip and a lens. The camera's viewing direction points toward the scanning area 11, so that the camera can optically scan the entire scanning area.

The illumination device 14 has light-emitting diodes as its light sources and is designed to emit white light. For this purpose, light-emitting diodes that emit white light can be provided, or light-emitting diodes of different colors that together generate white light. With several light-emitting diodes of different colors, it is possible to control them so that they also emit light in different color ranges or spectral ranges by adjusting the light intensity of the differently colored light-emitting diodes accordingly. The illumination device can have reflectors (not shown), which are preferably designed so that the light is directed onto the scanning area 11 as diffusely as possible. Such illumination is also referred to as dark-field illumination. If primarily or exclusively non-reflective documents are to be scanned, bright-field illumination can also be useful, since for the same luminous intensity of the light source it yields a higher light intensity at the scanning area 11 than dark-field illumination.

The surface of the scanning area 11 is preferably matt and of a uniform color (in particular white).

The surfaces of the side walls 8, 9 that face the interior are preferably also matt and have a light color, in particular white, so that the light emitted by the illumination device and reflected by the side walls 8, 9 is directed into the scanning area 11 as diffusely as possible.

The evaluation unit 3 is connected to the scanning unit 2 by a data line 15 for bidirectional data transmission, so that the evaluation unit 3 can control both the scanning element 13 and the illumination device 14 and can read out the data acquired by the scanning element 13. The evaluation unit 3 thus serves not only to evaluate the images captured by the camera 13 but also to control the entire process.

Other scanning devices can also be used within the scope of the invention, for example scanning devices with a line-scan camera that is moved parallel to the document during the scanning process, as is known from copiers, or a digital camera that is arranged, for example on a tripod, in a specific spatial relationship to an evaluation area. Nor does the scanning unit need to have an illumination device, since in principle a document could also be scanned under ambient light.

The scanning unit 2 according to the present embodiment has a specific, fixed geometric relationship between the scanning element 13, the illumination device 14, and the scanning area 11. In addition, the tubular design of the scanning device 1 with the two side walls, the top wall, and the base plate 7 keeps interference from ambient light low, while the scanning area 11 remains easily accessible from two sides, so that a user can easily place a document 12 to be scanned in the scanning area 11.

Such scanning devices 1 are particularly suitable for scanning identification documents, such as identity cards, driving licenses, or the like, or for scanning certain forms, such as lottery tickets, registration forms, or the like.

The scanning device explained above represents one possible embodiment. It can, however, be modified in a wide variety of ways, provided that a camera and a scanning area are present. For example, a single, vertically extending, rod-shaped support between the scanning area and the top wall can be provided instead of the side walls.

A method for creating a database for document classes is explained below. It is carried out on documents scanned with the scanning device 1 explained above.

The method begins in step S1 (Fig. 2).

In step S2, several documents 12 are scanned by the scanning element 13. For this purpose, a document 12 is first placed on the scanning area 11 of the scanning unit 2. The document 12 is illuminated by the illumination device 14 so that all details of the document 12 can be captured accurately and so that the scanning conditions are as uniform as possible when different documents are scanned. As soon as the document 12 is in place and correctly illuminated, a user initiates the scanning process, for example by pressing a button on the scanning unit 2 or on the input unit 5.

This process is repeated for several documents, i.e., for n documents 12, so that n scanned images of the respective documents are generated.

In the scanning device 1, the individual documents are placed on the scanning area 11 manually. It is of course also possible to provide an automatic feeder that feeds the documents into the scanning area 11, so that several documents can be scanned fully automatically.

A scanning process can also be triggered automatically, without a button having to be pressed. For example, a proximity sensor can be provided that detects that an object is located in the scanning area 11, whereupon a first, preliminary optical scan is performed to check whether the object is lying still in the scanning area 11, i.e., that it is not being moved. If this is the case, the object can be illuminated with the illumination device 14 and the actual optical scan can be carried out.
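
One simple way of checking whether the object is lying still is to compare two preliminary scans, as sketched below. The mean absolute difference as the criterion, the threshold `max_mean_diff`, and the function name are assumptions; the application does not specify how the check is performed.

```python
import numpy as np

def is_stationary(frame_a, frame_b, max_mean_diff=2.0):
    """Return True if two preliminary scans differ only by noise."""
    a = np.asarray(frame_a, dtype=float)
    b = np.asarray(frame_b, dtype=float)
    return float(np.mean(np.abs(a - b))) <= max_mean_diff

still = np.full((4, 4), 100.0)
moved = still.copy()
moved[:, 2:] = 180.0       # half of the frame changed between the two scans
```

In this sketch, `is_stationary(still, still + 1.0)` accepts a small brightness fluctuation, whereas `is_stationary(still, moved)` rejects the pair because a large part of the frame has changed.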

In the present embodiment, the scanning element 13 is a camera with a two-dimensional camera chip and a lens. The camera's viewing direction points toward the scanning area 11, so that, where possible, the entire scanning area, and thus the entire document 12, can be captured in a single image. The scanned image 16 generated in this way contains a representation of the document 12 that may initially be rotated and/or distorted in any way (Fig. 3, top row).

After the scanned image 16 has been captured, the data representing the scanned image 16 are transmitted to the evaluation unit 3 via the data line 15. The evaluation unit 3 evaluates the scanned images 16.

During this evaluation, a first feature assignment (feature matching) is performed on a scanned image of a first document (step S3). Features, also referred to as keypoints, are identified in the scanned image 16 and extracted with a so-called SIFT algorithm (Scale-Invariant Feature Transform algorithm).

This identifies features that are invariant to a change in scale. It is carried out, for example, by generating a scale space, which essentially comprises a series of images at different scales or resolutions that are calculated from the scanned image 16. For each scale, the scanned image 16 is convolved with Gaussian filters of increasing sigma value to generate a series of blurred images. The difference of Gaussians (DoG) is then calculated by subtracting two consecutive blurred images. The result is a series of DoG images. Candidate feature points are identified as local minima or maxima in the DoG images by comparing each pixel with its neighbors at the current scale and with its neighbors in the adjacent scale images.
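
The construction of the DoG images can be sketched as follows. The sketch covers a single scale level only; the sigma values, the truncation of the kernel at three sigma, the border handling, and the function names are assumptions.

```python
import numpy as np

def gaussian_kernel(sigma):
    """1-D Gaussian kernel truncated at 3 sigma and normalised to sum 1."""
    radius = max(1, int(round(3 * sigma)))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()

def gaussian_blur(image, sigma):
    """Separable Gaussian blur with edge replication at the border."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    padded = np.pad(np.asarray(image, float), r, mode="edge")
    rows = np.apply_along_axis(np.convolve, 1, padded, k, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="valid")

def difference_of_gaussians(image, sigmas):
    """Blur with increasing sigma and subtract consecutive blurred images."""
    blurred = [gaussian_blur(image, s) for s in sigmas]
    return [b1 - b0 for b0, b1 in zip(blurred[:-1], blurred[1:])]

img = np.zeros((15, 15))
img[7, 7] = 1.0                                   # a single bright blob
dogs = difference_of_gaussians(img, [1.0, 1.6, 2.56])
```

For the single bright blob, each DoG image has its extremum at the blob position, which is the kind of local minimum or maximum that is then taken as a candidate feature point.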

A feature point specifies the coordinates of a central point of a feature.

Candidate feature points are verified by fitting a quadratic function to the local image pattern around each candidate. Features that have low contrast or are poorly localized along an edge are discarded, because low-contrast features are sensitive to noise and feature points along edges are not spatially stable.
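
The fit of a quadratic function can be illustrated in one dimension: a parabola through three neighboring samples yields a sub-pixel position and a refined response value, and weak responses are discarded. The restriction to one dimension, the contrast threshold `min_contrast`, and the function names are assumptions made for brevity.

```python
def refine_extremum(left, centre, right):
    """Fit a parabola through samples at -1, 0, +1 and return (offset, value)."""
    curvature = left - 2.0 * centre + right
    if curvature == 0.0:
        return 0.0, centre            # flat neighbourhood, nothing to refine
    offset = 0.5 * (left - right) / curvature
    value = centre - 0.25 * (left - right) * offset
    return offset, value

def keep_feature(left, centre, right, min_contrast=0.03):
    """Discard candidates whose refined response is too weak (low contrast)."""
    _, value = refine_extremum(left, centre, right)
    return abs(value) >= min_contrast

offset, value = refine_extremum(0.5, 1.0, 0.75)
```

Here the larger right-hand neighbor pulls the extremum slightly to the right of the sampled position, so `offset` is positive and `value` is slightly above the sampled maximum.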

The final position of a feature point is given by the maximum or minimum of the fitted quadratic function. Each identified feature point is assigned one or more orientations based on the local gradient direction of the scanned image 16. This ensures rotation invariance. For this purpose, the gradient magnitude and orientation are calculated for the neighborhood of the feature point. An orientation histogram with 36 bins covering 360° is created from the gradient orientations of sample points within a region around the keypoint. The maxima of this histogram determine the orientation of the feature point: the highest peak in the histogram, and every further local maximum that reaches at least 80% of the highest peak, is used to assign an orientation. After the feature points have been identified and the orientations assigned, a descriptor, also called a feature vector, is created for each feature to capture its local appearance.
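
The orientation assignment can be sketched as follows, with 36 bins and the 80% rule as stated above. Weighting each sample by its gradient magnitude, reporting bin centers as orientations, and the function name are assumptions of this sketch.

```python
import math

def dominant_orientations(gradients, bins=36, peak_ratio=0.8):
    """Return orientations (degrees, bin centres) of all sufficiently high peaks.

    gradients is an iterable of (dx, dy) gradient samples around a feature point.
    """
    hist = [0.0] * bins
    width = 360.0 / bins
    for dx, dy in gradients:
        angle = math.degrees(math.atan2(dy, dx)) % 360.0
        hist[int(angle // width) % bins] += math.hypot(dx, dy)  # weight by magnitude
    top = max(hist)
    peaks = []
    for i, h in enumerate(hist):
        # Circular neighbours: index -1 wraps to the last bin.
        is_local_max = h > hist[i - 1] and h > hist[(i + 1) % bins]
        if h > 0 and is_local_max and h >= peak_ratio * top:
            peaks.append((i + 0.5) * width)
    return peaks

samples = [(1.0, 0.05)] * 10 + [(-0.05, 1.0)] * 9 + [(-1.0, -0.05)] * 3
orientations = dominant_orientations(samples)
```

The third group of samples forms a local maximum as well, but it stays below 80% of the highest peak and therefore does not produce an orientation.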

The descriptor is a unique fingerprint of each feature point and thus enables matching between two images, since corresponding feature points in the two images can be assigned to each other. The descriptor can be regarded as a histogram of the gradient orientations in a region around the feature point, which provides a high-level representation of the local image texture.

To create a descriptor, the region around the keypoint is first divided into subregions, for example in a 4 × 4 grid. For each subregion, the gradient magnitude and orientation of the pixels are calculated. An orientation histogram with 8 bins is then created for each subregion, which gives 4 × 4 × 8 = 128 bins, or dimensions, in total. The histogram values capture the dominant gradient directions in the local neighborhood of the feature point and form the elements of the descriptor vector. The descriptor is normalized to increase its robustness to variations in illumination and contrast. The normalized vector is the descriptor of the feature point and provides an invariant representation of the local image structure.
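
The structure of such a descriptor can be sketched as follows. The sketch assumes a 16 × 16 pixel patch and omits steps a full SIFT implementation would include, such as rotating the patch to the keypoint orientation and weighting or interpolating the histogram entries; the function name is likewise an assumption.

```python
import numpy as np

def build_descriptor(patch):
    """Build a 4 x 4 x 8 = 128-dimensional descriptor from a 16 x 16 patch."""
    patch = np.asarray(patch, dtype=float)
    dy, dx = np.gradient(patch)                    # gradients along rows, columns
    magnitude = np.hypot(dx, dy)
    angle = np.degrees(np.arctan2(dy, dx)) % 360.0
    bin_index = (angle // 45.0).astype(int) % 8    # 8 orientation bins of 45 degrees
    descriptor = np.zeros((4, 4, 8))
    for r in range(16):
        for c in range(16):
            descriptor[r // 4, c // 4, bin_index[r, c]] += magnitude[r, c]
    vec = descriptor.ravel()
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec         # normalise against contrast changes

ramp = np.tile(np.arange(16.0), (16, 1))           # brightness rises from left to right
desc = build_descriptor(ramp)
brighter = build_descriptor(3.0 * ramp + 20.0)     # same structure, other contrast
```

The two descriptors agree although the second patch has a different brightness offset and contrast, which illustrates the effect of using gradients and normalizing the vector.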

This feature detection is also performed on a digital reference image 17. The digital reference image is an ideal digital representation of the essential elements of the document 12. The individual documents 12 may additionally contain information that has been added, in particular by a user of the document, by hand or by machine. Otherwise, the content of the document 12 largely corresponds to the digital reference image 17.

The actual matching of the feature points of the scanned image 16 and the reference image 17 then takes place by comparing them with one another. In this comparison, a distance (for example, the Euclidean distance) between the corresponding vectors is determined on the basis of the descriptors, and those feature points of the two images whose distance is smallest are assigned to each other. This can be carried out, for example, by first treating pairs of feature points whose distance lies below a certain threshold value as potential matches. A ratio test can then be performed, in which the distance from a feature point of the scanned image 16 to the second-closest feature point of the reference image 17 is also calculated. The ratio of the distance to the closest feature point and the distance to the second-closest feature point is then formed. If this ratio is below a certain threshold value, for example 0.8, the match is considered valid. After the matches between features of the scanned image 16 and the reference image 17 have been found, the process is repeated in the opposite direction by matching feature points of the reference image 17 against the scanned image 16. Only those feature points that match in both directions may be retained.
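The ratio test and the check in both directions can be sketched as follows. This is a minimal brute-force illustration under stated assumptions: descriptors are given as NumPy arrays with one row per feature point, each image has at least two feature points, and the preceding absolute-distance threshold is omitted for brevity. The function names are hypothetical.

```python
import numpy as np

def ratio_test_matches(desc_a, desc_b, ratio=0.8):
    """Map each index in desc_a to its nearest index in desc_b if the ratio test passes."""
    # Pairwise Euclidean distances, shape (len(desc_a), len(desc_b)).
    dists = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    matches = {}
    for i, row in enumerate(dists):
        order = np.argsort(row)
        nearest, second = order[0], order[1]
        # Accept only if the nearest neighbor is clearly better than the second.
        if row[nearest] < ratio * row[second]:
            matches[i] = int(nearest)
    return matches

def mutual_matches(desc_scan, desc_ref, ratio=0.8):
    """Keep only matches that are found in both matching directions."""
    forward = ratio_test_matches(desc_scan, desc_ref, ratio)
    backward = ratio_test_matches(desc_ref, desc_scan, ratio)
    return [(i, j) for i, j in forward.items() if backward.get(j) == i]
```

For large numbers of feature points, an approximate nearest-neighbor index would normally replace the full distance matrix used here.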

The retained feature points can then be filtered further in order to remove outliers caused, for example, by noise. The RANSAC (Random Sample Consensus) algorithm, for example, can be used for this purpose to detect outliers and errors. Other outlier detection algorithms are also possible.

Matching features between two images is known per se. Other known matching methods can also be used, provided they reliably match similar features.

The feature matching is used to calculate a homography matrix. The homography matrix allows perspective rectification of the scanned image and its alignment with the reference image. By applying the homography matrix to the scanned image 16, the scanned image is aligned (rotated or shifted) with respect to the reference image 17 and also rectified, resulting in a homography image 18 (Fig. 3). This homography mapping, or homography rectification, can correct perspective distortions but not inhomogeneous distortions within the image. Distortions such as those caused by folding or creasing a document cannot be removed by homography rectification.

A direct linear transformation (DLT) is used to calculate the homography matrix. It requires at least four matching, non-collinear features from the scanned image 16 and the reference image 17. More matches can also be used, in which case a least-squares solution is calculated. Because of noise, false matches, and other factors, not all matches agree perfectly, so the homography matrix must be determined with methods such as the RANSAC algorithm mentioned above. The RANSAC algorithm repeatedly selects a random subset of matched feature points and calculates a homography matrix from it. It is then determined how many matches are consistent with this calculated homography matrix, and the homography matrix with the highest number of consistent matches is selected as the final result.
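The interplay of DLT and RANSAC can be sketched as follows. This is a minimal illustration, not the implementation used in the described device: the function names, the iteration count, and the pixel tolerance are assumptions, coordinate normalization is omitted, and the final least-squares refit on all inliers is an added design choice that combines the two variants mentioned above.

```python
import numpy as np

def dlt_homography(src, dst):
    """Least-squares homography mapping src -> dst, arrays of shape (N, 2), N >= 4."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    h = vt[-1].reshape(3, 3)
    with np.errstate(all="ignore"):
        return h / h[2, 2]  # non-finite for degenerate point sets

def project(h, pts):
    """Apply homography h to points of shape (N, 2)."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))]) @ h.T
    return pts_h[:, :2] / pts_h[:, 2:3]

def ransac_homography(src, dst, iters=500, tol=2.0, seed=0):
    """Return the homography supported by the most matches, and the inlier mask."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        sample = rng.choice(len(src), size=4, replace=False)
        h = dlt_homography(src[sample], dst[sample])
        if not np.all(np.isfinite(h)):
            continue  # skip degenerate samples
        with np.errstate(all="ignore"):
            err = np.linalg.norm(project(h, src) - dst, axis=1)
        inliers = err < tol  # NaN errors compare as False
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    if best_inliers.sum() < 4:
        raise ValueError("no consistent set of at least four matches found")
    # Assumption: refit on all inliers for a least-squares estimate.
    return dlt_homography(src[best_inliers], dst[best_inliers]), best_inliers
```

The returned inlier mask can also serve as the outlier filter for the matched feature points.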

During feature determination in step S3, certain predefined areas can be masked out. These are usually areas that contain individual information in the individual documents and therefore cannot be matched to one another. Masking can be carried out, for example, by setting the color values and/or brightness values of these areas to a specific value, which corresponds, for example, to the color white. These masked areas can thus be defined on the basis of the features.
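Masking by overwriting pixel values can be sketched as follows. This is a minimal illustration; the representation of a masked area as a rectangle `(top, left, bottom, right)` and the function name `mask_regions` are assumptions.

```python
import numpy as np

def mask_regions(image, regions, fill_value=255):
    """Return a copy of `image` in which each rectangular region is set to `fill_value`.

    Each region is (top, left, bottom, right) in pixel coordinates,
    with bottom and right exclusive. 255 corresponds to white for 8-bit images.
    """
    masked = np.array(image, copy=True)
    for top, left, bottom, right in regions:
        masked[top:bottom, left:right] = fill_value
    return masked
```

Feature detection run on the masked copy then finds no feature points inside the overwritten areas, apart from possible edge responses at the region borders if the surrounding paper is not white.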

Once the homography matrix has been determined, it is used to transform the scanned image 16 into the homography image 18. For this purpose, the coordinates of each pixel in the scanned image 16 are multiplied by the homography matrix to obtain the transformed point in the homography image 18. This mapping step constitutes the perspective rectification mentioned above (step S4). During this perspective rectification, the scanned image 16 can be adapted in size and orientation to the reference image 17, and individual parts of the captured scanned image 16 can be cropped. However, this is designed such that the information content of the scanned image 16 relating to the document 12 is not cropped. For example, edge detection is performed to detect the edges of the document 12. The scanned image 16 is then cropped such that all edges of the document 12 still remain in the scanned image 16.
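In homogeneous coordinates, the mapping of a single pixel described above can be written as follows. The symbols are notation introduced here for illustration and do not appear in the original text: H is the 3 × 3 homography matrix, (x, y) is a pixel position in the scanned image 16, and (x', y') is the corresponding position in the homography image 18.

```latex
\begin{pmatrix} \tilde{x} \\ \tilde{y} \\ \tilde{w} \end{pmatrix}
= H \begin{pmatrix} x \\ y \\ 1 \end{pmatrix},
\qquad
x' = \frac{\tilde{x}}{\tilde{w}}, \quad
y' = \frac{\tilde{y}}{\tilde{w}}
```

The division by the third component is what distinguishes a perspective mapping from a purely affine one, for which that component is always 1.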

After the transformation, there may be regions in the homography image 18 that have no corresponding pixels in the distorted scanned image. These regions are filled in from neighboring pixels or by other inpainting techniques.

According to this embodiment, the homography image 18 is sharpened using suitable known algorithms. Algorithms for contrast adjustment or for smoothing artifacts created during the transformation are known and can be applied here. The (location) coordinates of the features in both the homography image 18 and the reference image 17 are then added to the respective descriptors. This addition of the coordinates is referred to as stamping (step S5).

In the subsequent step S6, the features of the homography image 18 are again matched to corresponding features of the reference image 17, with the matching taking into account the coordinates added to the descriptors (second feature matching). The descriptors comprise, for example, 130 dimensions, 128 of which stem from a 4 × 4 grid of subregions with 8 bins each (see above). The two additional dimensions are the coordinates. These two additional dimensions can be weighted in the same way as all the other dimensions combined. However, it also makes sense to weight the two additional dimensions (= coordinates) more heavily, for example with a weighting factor that gives the coordinates at least five times or at least ten times the weight of the other dimensions. The coordinates can also be weighted so heavily that they have the same weight as the remaining 128 dimensions. The coordinates can thus enter the comparison of the features for matching with the same weight as the other dimensions.
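One way to realize the stamping and the weighting of the coordinates is sketched below. The text does not specify how the weighting is implemented, so the following are assumptions: the coordinates are normalized by the image size, and the weight is applied by scaling the two coordinate dimensions before a Euclidean comparison. The function name `stamp_descriptors` is hypothetical.

```python
import numpy as np

def stamp_descriptors(descriptors, coords, image_size, weight=10.0):
    """Append weighted (x, y) coordinates to each descriptor.

    descriptors: (N, 128) array, coords: (N, 2) array of (x, y) pixel positions,
    image_size: (width, height) used to normalize coordinates to [0, 1].
    """
    descriptors = np.asarray(descriptors, dtype=float)
    coords = np.asarray(coords, dtype=float) / np.asarray(image_size, dtype=float)
    # In a squared Euclidean distance, scaling a dimension by sqrt(weight)
    # multiplies its contribution by `weight`.
    return np.hstack([descriptors, np.sqrt(weight) * coords])
```

With such stamped descriptors, two features with identical local appearance, such as two identical checkboxes on a form, are distinguished by their position alone.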

Because the coordinates are taken into account, the location of the features carries considerably more weight than in the first feature matching, with the result that only features lying close to one another in the reference image and the scanned image are actually matched. This is reliably possible here because a perspective rectification has been carried out beforehand, in which the reference image and the scanned image were aligned with each other, so that corresponding features are located at similar positions in the respective images. Without this alignment, the two images could be rotated relative to each other, for example by 90°, so that corresponding features would lie at entirely different positions. In that case, taking the location coordinates into account in the feature vector would cause considerable mismatches.

On the basis of the newly matched features, displacement vectors are calculated between mutually assigned features of the homography image 18 and of the reference image 17, which is also referred to as vectoring.

Contiguous areas of the documents may have no features or only very few, so that no displacement vectors from the feature matching are available for these areas. In this case, it makes sense to match image sections of the homography image 18 and of the reference image 17 to one another and to determine a corresponding displacement vector for the matched image sections. The matching of image sections is also referred to as template matching.
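Template matching for a single image section can be sketched as follows. The text does not name a similarity measure; the sum of squared differences (SSD) and the exhaustive search within a small window used here are assumptions, as is the function name.

```python
import numpy as np

def match_template_ssd(image, template, top_left, search_radius):
    """Find the displacement (dy, dx) of `template` in `image`.

    `top_left` is the (row, column) position of the template in the reference
    image; the search covers +/- `search_radius` pixels around that position.
    """
    th, tw = template.shape
    y0, x0 = top_left
    best, best_shift = np.inf, (0, 0)
    for dy in range(-search_radius, search_radius + 1):
        for dx in range(-search_radius, search_radius + 1):
            y, x = y0 + dy, x0 + dx
            # Skip candidate positions that leave the image.
            if y < 0 or x < 0 or y + th > image.shape[0] or x + tw > image.shape[1]:
                continue
            window = image[y:y + th, x:x + tw].astype(float)
            ssd = np.sum((window - template) ** 2)
            if ssd < best:
                best, best_shift = ssd, (dy, dx)
    return best_shift
```

A small search radius is sufficient here because the preceding perspective rectification has already brought the two images into approximate alignment.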

A displacement vector field is generated from the displacement vectors of the feature matching and of the matching of image sections. The displacement vectors enter the displacement vector field either directly or in averaged or interpolated form. For regions in which no displacement vectors are available, corresponding displacement vectors are interpolated.
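The interpolation of sparse displacement vectors into a dense field can be sketched as follows. The interpolation scheme is not specified in the text; inverse distance weighting is used here purely as an assumed example, and the function name is hypothetical.

```python
import numpy as np

def dense_displacement_field(points, vectors, shape, power=2.0, eps=1e-9):
    """Interpolate sparse displacement vectors onto every pixel of an image.

    points: (N, 2) array of (row, column) positions,
    vectors: (N, 2) array of (d_row, d_column) displacements,
    shape: (height, width) of the output field.
    """
    points = np.asarray(points, dtype=float)
    vectors = np.asarray(vectors, dtype=float)
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    grid = np.stack([ys, xs], axis=-1).reshape(-1, 1, 2)      # (P, 1, 2)
    dist = np.linalg.norm(grid - points[None, :, :], axis=2)  # (P, N)
    # Nearby vectors dominate; eps avoids division by zero at the points themselves.
    weights = 1.0 / (dist ** power + eps)
    weights /= weights.sum(axis=1, keepdims=True)
    return (weights @ vectors).reshape(shape[0], shape[1], 2)
```

At a pixel that coincides with a measured point, the field reproduces the measured vector almost exactly; between points it blends the surrounding vectors.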

Using this displacement vector field, the homography image 18 is subjected to free-form rectification. First, control points are selected by means of the displacement vector field such that they are distributed as evenly as possible over the homography image 18 and the reference image 17. The correspondence between the control points in these two images is given by the displacement vectors. A system of linear equations is set up on the basis of the control points. This system of equations is solved to determine weights for each control point. These weights determine the strength and direction of the transformation at each control point.

For each pixel in the homography image 18, a new position in a transformation image 19 is calculated on the basis of the weights and a radial basis function of the control points. The pixel values are transferred directly if they coincide exactly with the grid of the transformation image 19. Since this is usually not the case, the pixel values for the transformation image 19 are interpolated. This can be done, for example, using bilinear or bicubic interpolation.

To avoid overfitting, especially when many control points are used, a regularization term can be added to the transformation. This smooths the transformation and avoids high-frequency deformations. Further post-processing steps can be performed on the transformation image 19, for example sharpening, cropping, or other image adjustments to improve quality. The transformation image 19 is thus a free-form rectified scanned image. With the method explained above with reference to steps S3 to S7, the scanned image 16 is aligned and rectified with respect to the reference image 17, and pixel-accurate agreement can be achieved.
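The linear system for the control-point weights, the evaluation with a radial basis function, and the effect of a regularization term can be sketched as follows. The text does not name a specific basis function or regularizer; the Gaussian kernel, its width `sigma`, and the ridge-type term `reg` are assumptions, as are the function names.

```python
import numpy as np

def gaussian_rbf(r, sigma):
    """Gaussian radial basis function of the distance r."""
    return np.exp(-(r / sigma) ** 2)

def fit_rbf_warp(control_pts, displacements, sigma=50.0, reg=0.0):
    """Solve the linear system for one weight vector per control point.

    control_pts: (N, 2) positions, displacements: (N, 2) displacement vectors.
    reg > 0 adds a ridge term that smooths the resulting transformation.
    """
    control_pts = np.asarray(control_pts, dtype=float)
    displacements = np.asarray(displacements, dtype=float)
    r = np.linalg.norm(control_pts[:, None] - control_pts[None, :], axis=2)
    phi = gaussian_rbf(r, sigma)
    return np.linalg.solve(phi + reg * np.eye(len(control_pts)), displacements)

def apply_rbf_warp(points, control_pts, weights, sigma=50.0):
    """Return the new positions of `points` under the fitted transformation."""
    points = np.asarray(points, dtype=float)
    control_pts = np.asarray(control_pts, dtype=float)
    r = np.linalg.norm(points[:, None] - control_pts[None, :], axis=2)
    return points + gaussian_rbf(r, sigma) @ weights
```

With `reg=0`, the control points are moved exactly by their displacement vectors. With `reg > 0`, they are only approximately matched, which yields smaller weights and a smoother deformation. The pixel values at the resulting non-integer positions would then be resampled, for example bilinearly.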

In step S8, it is checked whether a scanned image of a further document is available that is to be aligned and rectified with respect to the reference image 17. If this is the case, the method returns to step S3, and the alignment and rectification of the further scanned image 16 are carried out.

If it is determined in step S8 that the scanned images of all documents have been aligned and rectified, the method proceeds to step S9, in which the scanned images aligned and rectified in this way are superimposed on one another. Here, the pixels at the same locations of the aligned scanned images 19 are averaged, and the resulting average value is entered as the pixel value at the corresponding location in a prototype image. The average value can be a mean, a median, or a weighted mean in which the pixel values of the different aligned scanned images 19 are weighted differently. The prototype image thus represents an image of the document 12 that is rectified and aligned with respect to the reference image 17 and that shows the document 12 as it is captured by the scanning device 1. Unlike the reference image, the prototype image therefore contains the effects of the optical scanning by the scanning device 1. Compared with the reference image, the prototype image may contain a certain amount of blur, and/or brightness and/or colors may differ. The deviations may be small, but even small deviations can cause errors in further processing.
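The pixel-wise averaging of the aligned scanned images can be sketched as follows. This is a minimal illustration assuming that all aligned images have the same shape; the function name `prototype_image` is hypothetical.

```python
import numpy as np

def prototype_image(aligned_scans, method="mean", weights=None):
    """Combine aligned scans pixel by pixel into one prototype image.

    method: "mean" (optionally weighted per scan via `weights`) or "median".
    """
    stack = np.stack([np.asarray(s, dtype=float) for s in aligned_scans])
    if method == "median":
        return np.median(stack, axis=0)
    if method == "mean":
        # With weights=None this is the plain arithmetic mean.
        return np.average(stack, axis=0, weights=weights)
    raise ValueError(f"unknown method: {method}")
```

The median is less sensitive than the mean to content that appears in only a few of the scans, such as individual handwritten entries.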

Features are extracted from the prototype image generated in this way. The features are preferably SIFT features. However, other types of features, such as ORB features or SURF features, can also be used. The features are preferably scale-invariant and/or rotation-invariant features.

The features comprise at least a descriptor or feature vector and coordinates that define the location of the feature in the image. Preferably, the descriptor or feature vector is determined from the prototype image and the associated coordinates are determined from the reference image. This yields a description of the feature as it is seen by the scanning device 1, while the exact location in the reference image is used as its location. The features generated in this way are stored in a database, with the features of such a prototype image representing a document class. The representations that are common to the documents of a document class are contained in the respective prototype image. The features derived from it therefore apply to all documents of a document class and are thus representative of that document class.

The method ends with step S11.

This method can be repeated for different document classes, with several copies of the documents of a document class being scanned in each case and the scanned images being evaluated according to the above method in order to extract the features. Such a database can be used to classify documents, as explained below with reference to an exemplary method (Fig. 4).

In this method, a document of any document class can be scanned with the scanning device 1 (step S13).

Features are extracted from the scanned image generated in this way, and these features are compared with the feature groups of the different document classes. The comparison can be carried out, for example, by calculating a distance, in particular the Euclidean distance. The document is then assigned to the document class whose features show the smallest deviation from the features of the scanned image.
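The assignment to the document class with the smallest deviation can be sketched as follows. The text does not specify how the distances of the individual features are aggregated into one deviation per class; the mean nearest-neighbor distance used here is an assumption, and the class names used in the usage note are hypothetical.

```python
import numpy as np

def class_distance(scan_desc, class_desc):
    """Mean Euclidean distance from each scan descriptor to its nearest class descriptor."""
    d = np.linalg.norm(scan_desc[:, None] - class_desc[None, :], axis=2)
    return d.min(axis=1).mean()

def classify(scan_desc, database):
    """Return the key of the document class with the smallest deviation.

    database: dict mapping a class name to an (M, D) array of descriptors.
    """
    return min(database, key=lambda name: class_distance(scan_desc, database[name]))
```

A call such as `classify(scan_descriptors, {"invoice": invoice_desc, "form": form_desc})` returns the name of the best-matching class. In practice, a maximum permitted deviation would also be useful so that documents belonging to none of the stored classes can be rejected.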

In the next step, the scanned image is registered with, that is, brought into agreement with, the digital reference image of the corresponding document class. This can be carried out in a manner similar to steps S3 to S7 of the method explained above. The database preferably contains processing instructions for the individual document classes that specify how the individual documents are to be processed. These processing instructions typically include instructions as to which areas of the document are to be read out and what information they contain. For example, the areas that are to contain a surname, first name, address, telephone number, e-mail address, or the like can be defined, with the type of the respective information also being stored. This allows predetermined information content to be read out reliably and processed further. Because the scanned image is registered with the digital reference image, the corresponding instructions, which are defined on the basis of the reference image, can be applied directly to the respective area of the scanned image.
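Processing instructions of this kind can be represented, for example, as a list of named regions defined in reference-image coordinates. The following sketch shows one possible data structure; the field names, the `kind` labels, and the rectangular region format are assumptions and are not taken from the original text.

```python
from dataclasses import dataclass

import numpy as np

@dataclass(frozen=True)
class FieldRegion:
    """One area to be read out, defined in reference-image pixel coordinates."""
    name: str    # e.g. "surname" (hypothetical)
    kind: str    # type of information, e.g. "text" or "digits" (hypothetical)
    top: int
    left: int
    bottom: int  # exclusive
    right: int   # exclusive

def extract_fields(registered_scan, regions):
    """Cut the defined regions out of a scan that is registered to the reference image."""
    return {
        r.name: registered_scan[r.top:r.bottom, r.left:r.right]
        for r in regions
    }
```

Each extracted image section could then be passed to a recognizer suited to its `kind`, for example handwriting recognition restricted to digits for a telephone number.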

In step S16, the corresponding information is read out from the scanned image on the basis of these instructions and passed on for further processing.

The method ends with step S17.

The methods and the device explained above allow the partially or fully automatic processing of documents, even if they have been creased or folded and are therefore no longer flat. The documents can be assigned to document classes and accordingly processed, and in particular read out, in a class-specific manner. This allows, in particular, the processing of machine-printed documents into which people have inserted handwritten text.

List of reference numerals

1: Scanning device
2: Scanning unit
3: Evaluation unit
4: Workstation computer
5: Input unit
6: Display unit
7: Base plate
8: Side wall
9: Side wall
10: Top wall
11: Scanning area
12: Document
13: Scanning element
14: Illumination device
15: Data line
16: Scanned image
17: Reference image
18: Homography image
19: Transformation image

REFERENCES CITED IN THE DESCRIPTION

This list of the documents cited by the applicant was generated automatically and is included solely for the reader's information. The list is not part of the German patent or utility model application. The DPMA assumes no liability for any errors or omissions.

Cited patent literature

US 6,711,293 B1 [0005]
EP 1 594 078 B1 [0007]

Cited non-patent literature

X. Wangming et al. (December 2008), "Application of Image SIFT Features to the Context of CBIR," in 2008 International Conference on Computer Science and Software Engineering (Issue 4, pages 552-555), IEEE [0002]
K. R. Reddy et al. (2016), "A Comparative Study of SIFT and PCA for Content-Based Image Retrieval," Inter. Refereed J. Eng. Sci. (IRJES) 5 (11), 12-19 [0002]
E. Rublee et al. (2011), "ORB: An Efficient Alternative to SIFT or SURF," in 2011 International Conference on Computer Vision, IEEE [0006]

Claims

A method for creating a database for document classes, each document class being defined by multiple features, wherein multiple instances of a document type to be classified are scanned with a scanning device, each scan generating a scanned image; each scanned image is aligned with a digital reference image of the document type to be classified; the scanned images thus aligned are superimposed on one another, the individual pixels being averaged to generate a prototype image; and features are in turn determined from the prototype image and entered in the database as the database features defining the document class.
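By way of illustration only, the superimposing and pixel-wise averaging of aligned scanned images can be sketched as follows. The function name `average_images` and the toy grayscale values are assumptions; plain nested lists stand in for image arrays, and the alignment itself is presumed to have been done beforehand.

```python
def average_images(aligned_images):
    """Pixel-wise mean of equally sized grayscale images given as lists of rows."""
    if not aligned_images:
        raise ValueError("at least one aligned image is required")
    height = len(aligned_images[0])
    width = len(aligned_images[0][0])
    for img in aligned_images:
        if len(img) != height or any(len(row) != width for row in img):
            raise ValueError("all aligned images must have the same size")
    count = len(aligned_images)
    return [
        [sum(img[y][x] for img in aligned_images) / count for x in range(width)]
        for y in range(height)
    ]


# Three toy 2x3 scans: the top row (printed content) is identical in all
# instances, the bottom row (e.g. handwritten entries) differs per instance.
scans = [
    [[0, 255, 0], [30, 200, 90]],
    [[0, 255, 0], [90, 100, 30]],
    [[0, 255, 0], [60, 0, 60]],
]
prototype = average_images(scans)
```

In this toy example the content shared by all instances survives the averaging unchanged, while the instance-specific entries are blended toward their mean value.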

Method according to Claim 1, characterized in that one type of scanning device or a single specific scanning device is used to create a specific database.

Method according to Claim 1 or 2, characterized in that the alignment of each scanned image is carried out by means of a feature assignment or by means of an image section assignment.

Method according to one of Claims 1 to 3, characterized in that the database features each comprise a feature vector describing the respective feature and coordinates defining the location of the respective feature, the feature vector being obtained from the prototype image and the corresponding coordinates being obtained from the reference image.

Method according to one of Claims 1 to 4, characterized in that certain predefined areas of the aligned scanned images are not taken into account.

Method according to one of Claims 1 to 5, characterized in that a meaning is assigned to certain predefined areas.

Method according to one of Claims 1 to 6, characterized in that the following steps are carried out in each case to align the scanned images with the digital reference image: - rectifying and aligning the respective scanned image by means of a feature assignment, wherein features of the scanned image, each of which comprises a feature vector, are assigned to corresponding features of the reference image of the document, and a homography matrix is determined in accordance with this feature assignment, with which all pixels of the respective scanned image are mapped to form a homography image, - adding the coordinates of the features in the homography image to the respective feature vectors and adding the coordinates of the features in the reference image to the respective feature vectors, - re-assigning the features of the homography image to corresponding features of the reference image of the document, wherein the coordinates added to the feature vectors are taken into account in the assignment, and - rectifying the homography image in accordance with the assignment of the features of the homography image to the features of the reference image.
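Two of the operations named in these steps can be sketched in a few lines, purely as an illustration: mapping a pixel coordinate with a 3x3 homography matrix, and appending coordinates to a feature vector. The function names `apply_homography` and `augment_with_coordinates` and the example matrices are assumptions; how the homography matrix is estimated from the feature assignment is not shown here.

```python
def apply_homography(H, x, y):
    """Map the point (x, y) with the 3x3 homography matrix H (list of rows).

    The point is treated as the homogeneous vector (x, y, 1); the result is
    divided by its third component to return to pixel coordinates.
    """
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("point is mapped to infinity")
    return u / w, v / w


def augment_with_coordinates(feature_vector, point):
    """Return a new feature vector with the (x, y) coordinates appended."""
    return list(feature_vector) + [point[0], point[1]]


# Assumed example: scale by 2 and shift by (10, 5).
H_affine = [[2, 0, 10], [0, 2, 5], [0, 0, 1]]
mapped = apply_homography(H_affine, 3, 4)

# Assumed example with a perspective term in the last row.
H_projective = [[1, 0, 0], [0, 1, 0], [0.5, 0, 1]]
mapped_projective = apply_homography(H_projective, 2, 4)

# Append the mapped coordinates to a toy descriptor.
augmented = augment_with_coordinates([0.2, 0.7, 0.1], mapped)
```

With coordinates appended, a subsequent nearest-neighbor assignment compares position as well as appearance. The relative weighting of the two parts is a design choice that this sketch leaves open.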

Method according to Claim 7, characterized in that the rectification of the homography image is carried out using a free-form rectification method.

Method according to one of Claims 1 to 8, characterized in that the features are scale-invariant and/or rotation-invariant features.

Method according to one of Claims 1 to 6, characterized in that the scanned images are rectified before they are superimposed, wherein the rectification is carried out by means of a feature assignment and/or by means of an assignment of image sections.

A method for scanning and processing a document, comprising the steps of: - scanning a document with an optical scanning device, thereby generating a scanned image of the document, and - classifying the scanned image into a document class, wherein features are extracted from the scanned image and compared with database features of a feature database that defines multiple document classes, and the scanned image is assigned to the document class whose database features best match the extracted features.
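The classification step can be illustrated with a small sketch under simplifying assumptions: features are short numeric vectors, a scan feature counts as matched if its nearest database feature lies within a distance threshold, and the class with the most matches is selected. The names `count_matches` and `classify`, the threshold `max_distance`, and the toy data are hypothetical; the claim does not prescribe this particular scoring.

```python
import math


def count_matches(scan_features, class_features, max_distance):
    """Count scan features whose nearest class feature is within max_distance."""
    matches = 0
    for feature in scan_features:
        nearest = min(
            (math.dist(feature, candidate) for candidate in class_features),
            default=math.inf,
        )
        if nearest <= max_distance:
            matches += 1
    return matches


def classify(scan_features, database, max_distance=0.5):
    """Return (class_name, score) for the best-matching class, or (None, 0)."""
    best_name, best_score = None, 0
    for name, class_features in database.items():
        score = count_matches(scan_features, class_features, max_distance)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


# Toy feature database with two document classes (2-D vectors for brevity).
database = {
    "registration_form": [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)],
    "invoice": [(5.0, 5.0), (6.0, 5.0), (5.0, 7.0)],
}
scan_features = [(0.1, 0.0), (1.0, 0.9), (5.1, 5.0)]
label, score = classify(scan_features, database)
```

Returning `None` when no feature matches leaves room for rejecting scans that belong to no known class, which may be preferable to forcing an assignment.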

Method according to Claim 11, characterized in that the scanned image is registered to a reference image on the basis of a feature assignment to the corresponding database features.

Method according to Claim 11 or 12, characterized in that the feature database contains a database generated with the method according to one of Claims 1 to 10.

Method according to one of Claims 11 to 13, characterized in that a type of scanning device or a single specific scanning device is used as the scanning device, in particular a type of scanning device or a single specific scanning device that was also used to create the feature database.