HK1171850A - System for mobile image capture and processing of financial documents - Google Patents
- Publication number
- HK1171850A (application HK12112184.7A)
- Authority
- HK
- Hong Kong
- Prior art keywords
- image
- micr
- check
- document
- size
- Prior art date
Abstract
The present disclosure relates generally to automated document processing and, more particularly, to mobile image capture and processing of financial documents, in which an image captured on a camera-equipped mobile device is enhanced for data extraction. The systems comprise a mobile device that includes a capture device configured to capture color images of documents. The mobile device has a processor for performing certain operations, such as color reduction, and a transmitter for sending an image from the mobile device to a server. The server is configured to process the image to optimize and enhance it for data extraction. The server is configured to apply an improved binarization algorithm using a window within a relevant document field and/or a threshold for the document field. Orientation correction may also be performed at the server by reading the MICR line on a check and comparing a MICR confidence to a threshold. A check image may also be size corrected using features of the MICR line and expected document dimensions.
Description
Technical Field
The present invention relates generally to automated document processing and more particularly to mobile image capture and processing of financial documents to enhance images for data extraction from images captured on mobile devices with camera capabilities.
Background
Financial institutions typically operate the largest automated document processing systems, made possible by printing financial information (such as account numbers and bank routing numbers) on checks. The amount, account number, and other important information must be extracted from a check before the amount of the check can be deducted from the payer's account. This highly automated form of extraction is accomplished by a check processing control system that obtains information from the magnetic ink character recognition ("MICR") line. The MICR line consists of specially designed numerals printed on the bottom of the check using magnetic ink. The MICR data fields include a bank routing number, bank code number, account number, check serial number, check amount, processing code, and extended processing code.
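By way of illustration only (not part of the disclosed system), the delimiting role of the MICR symbols can be sketched in a few lines of Python. The characters "T" and "U" below are stand-ins for the E-13B transit and on-us symbols, and the field layout is an assumed simplification:

```python
# Illustrative sketch only: parses a simplified MICR-style line in which the
# transit symbol (rendered here as "T") brackets the bank routing number and
# the on-us symbol (rendered here as "U") terminates the account number.
# Real MICR uses the E-13B symbol set; this layout is an assumption made for
# illustration, not the exact encoding described in the disclosure.

def parse_micr(line):
    """Extract routing and account numbers from a simplified MICR string."""
    routing_start = line.index("T") + 1
    routing_end = line.index("T", routing_start)
    routing = line[routing_start:routing_end]
    rest = line[routing_end + 1:]
    account = rest[:rest.index("U")]
    return {"routing": routing.strip(), "account": account.strip()}
```

For example, `parse_micr("T123456789T 000123456U 1001")` yields the routing number `123456789` and account number `000123456`.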
Checks and other documents may be handled in large numbers by banks and other financial institutions. The documents that may be processed include checks, deposit slips, payment coupons, and the like. In some cases, a bank or other financial institution may be required to use the actual physical document. For example, checks may need to be transferred between multiple banks or other financial institutions. This can slow the processing of financial documents. In addition, other types of documents that are not financial in nature may be handled by businesses and other institutions in large quantities.
To facilitate processing of a document depicted in an image captured by a mobile device, image optimization and enhancement processing operations must be applied so that data can be extracted from the document. One method of processing images captured from a mobile device is described in U.S. Patent No. 7,778,457 to Nepomniachtchi et al., which is incorporated herein by reference in its entirety.
Nepomniachtchi et al. discloses performing multiple image processing operations on a mobile device or transmitting large color images to a server. Mobile devices are typically limited in available processing power and transmission bandwidth. Performing multiple image processing operations on a mobile device may take a long time due to limited processing power and may prevent a user from efficiently performing other tasks on the mobile device. Similarly, transmitting a large image also takes a long time, and the communication functions of the mobile device are limited while the image is being transmitted.
Nepomniachtchi et al. also discloses a binarization algorithm that applies the same processing to the entire document. Unfortunately, many images have a complex background or a weak foreground (some foreground pixels have gray values very close to those of some background pixels). In these cases, it is not possible to find a single threshold or window that completely separates the foreground from the background, which results in background noise in the bi-tonal image. In addition, specific document regions must be readable by computer processing, so these regions should have limited background noise.
Nepomniachtchi et al. also discloses a system and method for correcting the inverted orientation of a check in an image that relies on comparing the MICR confidence from the original image to the MICR confidence from the image rotated 180 degrees. Relying on comparing MICR confidence reads limits the speed of the algorithm when the method is executed on a server with multiple threads/processors.
The method in Nepomniachtchi et al. also does not address the case in which the MICR confidence of both images is too low to be acceptable for later processing.
Nepomniachtchi et al. also discloses a system and method for correcting the size of an image by using the width of the MICR characters. Using the width of the MICR characters may produce an inaccurate size transformation because geometric correction can distort the shape of the MICR characters. Nepomniachtchi et al. also depends on the aspect ratio of the geometrically corrected image, which may itself be slightly distorted, and it is difficult to distinguish particular MICR characters from other characters. Nepomniachtchi et al. also does not scale the document to correspond to a known or expected document or check size.
Disclosure of Invention
Accordingly, an improved system for processing images captured by a mobile device is provided.
According to a first aspect, a system for image capture and processing of financial documents by a mobile device is provided. The mobile device includes an image capture device configured to capture a color image of the financial document. The mobile device also includes a processor configured to generate a color-reduced image, and a transmitter that transmits the color-reduced image to a server. In some cases, the color-reduced image is a grayscale image. The server receives the color-reduced image from the mobile device, detects the financial document in the color-reduced image, geometrically corrects the color-reduced image, binarizes it to generate a bi-tonal image, and corrects the orientation and size of the bi-tonal image. In some cases, the server also corrects the orientation and size of the color-reduced image.
According to another aspect, the server is further configured to binarize the geometrically corrected image to generate a bi-tonal image. The server selects a pixel of the grayscale image and determines whether the selected pixel is located within a document field; if so, it selects a window within the document field over which to calculate a mean and a standard deviation for the selected pixel. If the standard deviation is too small, the pixel is converted to white; if the standard deviation is not too small, the selected pixel is converted to black or white based on its intensity. This process is repeated until no pixels remain to be selected. In another aspect, a threshold value is selected for the document field and is used by the determining operation to decide whether the standard deviation is too small. In the case where the document is a check, the document fields may be any of the following: MICR line, lower case amount, upper case amount, date, signature, and payee.
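As a rough sketch of the window-based binarization described above, the following Python fragment applies a Niblack-style rule: pixels in a low-variance window are assumed to be background and set to white, while the rest are thresholded against the local mean. The window radius and standard-deviation threshold are illustrative assumptions, not values taken from the disclosure:

```python
# Sketch of per-field adaptive binarization, assuming a Niblack-style rule.
# The 5x5 window (radius 2) and the std-deviation threshold of 10 are
# illustrative assumptions, not values specified by the disclosure.
import statistics

def binarize_field(gray, region, window=2, std_threshold=10.0):
    """Binarize pixels inside `region` ((top, left, bottom, right)) of a
    2-D grayscale image `gray` (list of lists, values 0-255). Returns a
    dict mapping (row, col) -> 0 (black) or 255 (white)."""
    top, left, bottom, right = region
    out = {}
    for r in range(top, bottom):
        for c in range(left, right):
            # Clamp the window to the document field so the statistics come
            # only from the relevant region, as the text suggests.
            rs = range(max(top, r - window), min(bottom, r + window + 1))
            cs = range(max(left, c - window), min(right, c + window + 1))
            vals = [gray[i][j] for i in rs for j in cs]
            mean = statistics.fmean(vals)
            std = statistics.pstdev(vals)
            if std < std_threshold:
                out[(r, c)] = 255          # flat window: treat as background
            else:
                out[(r, c)] = 0 if gray[r][c] < mean else 255
    return out
```

On a flat background the window statistics yield white, while pixels on a dark stroke fall below the local mean and become black.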
According to another aspect, correcting the orientation of the captured image includes correcting the orientation of the document within the image if the document is inverted. In some cases, correcting the orientation further includes determining the orientation of the document within the image using an associated object of known location on the document. In some cases, the server is further configured to correct the orientation of the bi-tonal image by reading the MICR line on the bottom of the financial document, generating a MICR confidence value for the read, and comparing the MICR confidence value to a threshold: if the MICR confidence value exceeds the threshold, the bi-tonal image is determined to be right-side up; if it does not, the bi-tonal image is determined not to be right-side up, and the server rotates the image 180 degrees, re-reads the MICR line, generates a new MICR confidence value, and compares the new MICR confidence value to the threshold, determining that the rotated bi-tonal image is right-side up when the new MICR confidence value exceeds the threshold. On the other hand, if neither MICR confidence value exceeds the threshold, the server is further configured to indicate that the orientation of the image is unknown.
According to another aspect, where the financial document in the image is a check, the server is further configured to correct the size of the bi-tonal image using the MICR line. In one aspect, the average height of the MICR characters is used to determine a scaling factor for calculating the size of the image. In another aspect, the scaling factor is determined using a distance relative to a MICR symbol, such as the distance between two transit symbols, or the distance between a transit symbol and the leading edge of the check. In another aspect, both the height and the width of the MICR characters are used to determine height and width scaling factors for calculating the size of the image. In another aspect, to adjust the scaling of the image, the calculated size is compared to an expected size. The expected size may be a known check size or a document size that is a multiple of 1/8 inch.
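A minimal sketch of MICR-based size correction follows. The nominal E-13B character height of 0.117 inch and the 200-dpi target are assumptions used for illustration; the disclosure itself only states that character height and expected document dimensions are used:

```python
# Sketch of MICR-based size correction. The expected MICR character height
# (0.117 inch, the nominal E-13B height, treated here as an assumption)
# and the 200-dpi target resolution are illustrative values.

def size_scale_factor(char_heights_px, dpi=200, expected_height_in=0.117):
    """Scale factor that brings the average measured MICR character height
    to its expected size at the target resolution."""
    avg_px = sum(char_heights_px) / len(char_heights_px)
    expected_px = expected_height_in * dpi      # e.g. 23.4 px at 200 dpi
    return expected_px / avg_px

def snap_to_eighth_inch(length_px, dpi=200):
    """Snap a measured document dimension to the nearest 1/8-inch multiple,
    reflecting the expected-size comparison described above."""
    eighth_px = dpi / 8.0
    return round(length_px / eighth_px) * eighth_px
```

For example, measured character heights of 20, 24, and 22 pixels average to 22 pixels, giving a scale factor of 23.4/22, and a measured width of 1245 pixels snaps to 1250 pixels (6.25 inches at 200 dpi).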
Other features and advantages will be apparent from the following description taken in conjunction with the accompanying drawings which illustrate various embodiments.
Drawings
For a better understanding of the various embodiments described herein, and to show more clearly how they may be carried into effect, reference will now be made, by way of example, to the accompanying drawings which show at least one exemplary embodiment, and in which:
FIG. 1 is a diagram illustrating an example check that may be imaged by the systems and methods described herein.
FIG. 2 is a diagram illustrating an example payment coupon that may be imaged using the systems and methods described herein.
FIG. 3 is a diagram illustrating an example out-of-focus image of the check shown in FIG. 1.
FIG. 4 is a diagram illustrating an out-of-focus image of the payment coupon shown in FIG. 2.
Fig. 5 is a diagram showing an example of perspective distortion in an image of a rectangular document.
Fig. 6 is a diagram showing an exemplary original image, a focus rectangle, and a document quadrilateral ABCD according to the example of fig. 5.
FIG. 7 is a flow chart illustrating an example method according to the systems and methods described herein.
FIG. 8 is a diagram illustrating an example bi-tonal image of the check of FIGS. 1 and 3 in accordance with the systems and methods described herein.
Fig. 9 is a diagram illustrating an example two-tone image of the payment coupons of fig. 2 and 4 in accordance with the systems and methods described herein.
FIG. 10 is a flow chart of an example method used at an image processing stage in accordance with the systems and methods described herein.
FIG. 11a is a flow chart illustrating a known method for automatic document detection in a color image from a mobile device according to the systems and methods described herein.
FIG. 11b is an example mobile image depicting a check in which various corners have been detected according to the systems and methods described herein.
FIG. 11c is a flow chart illustrating an improved method for automatic document detection in grayscale images from a mobile device according to the systems and methods described herein.
FIG. 12a is a flow chart illustrating an example method for converting a color image to a smaller "icon" image in accordance with the systems and methods described herein.
Fig. 12b is a mobile image depicting an example of the mobile image of fig. 11b after it has been converted to a color "icon" image according to the systems and methods described herein.
Fig. 13a is a flow chart illustrating an example method of color depth (color depth) reduction according to the systems and methods described herein.
Fig. 13b is a mobile image depicting an example of the color "icon" image of fig. 12b after the color depth reduction operation has divided it into a 3 × 3 grid of gray levels according to the systems and methods described herein.
Fig. 13c is a mobile image depicting an example of the color "icon" image of fig. 12b once it has been converted to a grayscale "icon" image by the color depth reduction operation according to the systems and methods described herein.
FIG. 14 is a flow chart illustrating an example method for finding document edges and corners from a grayscale "icon" image according to the systems and methods described herein.
FIG. 15a is a flow chart illustrating an example method for geometry correction according to the systems and methods described herein.
FIG. 15b is an example mobile image depicting a check in landscape orientation.
Fig. 15c is a mobile image depicting an example of the mobile image of fig. 11b after a geometry correction operation according to the systems and methods described herein.
FIG. 16a is a flow chart illustrating an example method for binarization according to the systems and methods described herein.
Fig. 16b is a mobile image depicting an example of the mobile image of fig. 15c after it has been converted to a bi-tonal image by a binarization operation according to the systems and methods described herein.
FIG. 16c is a flow chart illustrating additional operations of the binarization method of FIG. 16a.
FIG. 17a is a flow chart illustrating a known method for correcting the inverted orientation of a document in a mobile image according to the systems and methods described herein.
FIG. 17b is an example two-tone image depicting a check in inverted orientation.
FIG. 17c is a flow chart showing an improved method for correcting the inverted orientation of a document in a mobile image, or indicating that the orientation is unknown.
FIG. 18a is a flow chart illustrating an example method for performing size correction of an image using the height of a MICR character according to the systems and methods described herein.
FIG. 18b is a flow chart illustrating an example method for performing size correction of an image using the height and width of MICR characters according to the systems and methods described herein.
FIG. 19 is a simplified block diagram illustrating an example computing module.
Detailed Description
FIG. 1 is a diagram illustrating an exemplary check 100 that may be imaged by the systems and methods described herein. The mobile image capture and processing systems and methods may be used with a variety of documents, including financial documents such as personal checks, business checks, cashier's checks, registered checks, and vouchers. By using an image of the check 100, the check clearing process can be performed more efficiently. Those skilled in the art will appreciate that checks are not the only type of document with which these systems may be used. For example, other documents such as deposit vouchers may also be processed using the systems and methods described herein. FIG. 2 is a diagram illustrating an exemplary payment coupon 200 that may be imaged using the systems and methods described herein.
In some implementations, the check 100, payment coupon 200, or other instrument may be imaged using a mobile device. The mobile device may be a mobile telephone handset, personal digital assistant or other mobile communication device. The mobile device may include a camera or may include features that allow it to connect to a camera. The connection may be wired or wireless. In this way, the mobile device may connect to an external camera and receive images from the camera.
Images of documents captured using the mobile device or downloaded to the mobile device may be transmitted to a server. For example, in some cases, images may be transmitted over a mobile communication network, such as a code division multiple access ("CDMA") telephone network or other mobile telephone network. An image taken using, for example, the camera of a mobile device may initially be formatted as a 24-bit-per-pixel (24 bits/pixel) JPEG image. However, it should be understood that many other types of images may be taken using different cameras, mobile devices, etc.
The various documents may include various fields. Some fields in a document may be considered "primary" fields. For example, the primary areas of interest on the check 100 may include a lower case amount 102, a legal amount 104, and a MICR line 106. The MICR line 106 may include symbols that delimit regions within the line, such as a transit symbol 113 that delimits, for example, the routing number and an on-us symbol 115 that delimits the account number. Other areas of interest include the payee 108, the date 110, and the signature 112. The primary areas of interest on the payment coupon 200 may include payment amounts 202 such as the balance, minimum payment, and interest. The billing company name and address 204, account number 206, and code line 208 may also be areas of interest. In some embodiments, it is desirable to electronically read the information in these areas of the document. For example, to process a check for deposit, it may be necessary to electronically read the legal amount 104 and the lower case amount 102 on the check, the MICR line 106, the payee 108, the date 110, and the signature 112. In some cases, this information is difficult to read because, for example, the image of the check or other document is out of focus or otherwise of poor quality.
FIG. 3 is a diagram illustrating an exemplary out-of-focus image of the check shown in FIG. 1. In some cases, the document image may be out of focus. Out-of-focus document images may be difficult or impossible to read, process electronically, and so on. For example, it may be difficult to read the amounts 302 and 304 or the payee 306 in the image 300 of the check 100. Fig. 4 is a diagram illustrating an exemplary out-of-focus image of the payment coupon shown in fig. 2. Because the image 400 of the payment coupon 200 is out of focus, it may be difficult to properly credit the payment. For example, the payment may be credited to the wrong account, or an incorrect amount may be credited. This is especially true if both the check and the payment coupon are difficult to read or the scanning quality is poor.
Many different factors may affect the capabilities and image quality of mobile-device-based image capture and processing systems. Optical defects such as out-of-focus images (as described above), uneven contrast or brightness, or other optical defects can make it difficult to process images of documents (e.g., checks, payment coupons, deposit documents, etc.). The quality of the image can also be affected by the surface on which the document is positioned when photographed or by the angle at which the document is photographed. These factors can affect image quality by presenting the document, for example, in an incorrect orientation, inverted, or skewed. Furthermore, if the document is imaged while inverted, it may be impossible or nearly impossible for the system to determine the information contained in the document.
In some cases, the type of surface may affect the final image. For example, if the document is lying on a textured surface when the image is taken, the texture of the surface may show through the document. In addition, rough surfaces can cast shadows or cause other problems for the camera. These problems may make it difficult or impossible to read the information contained in the document.
Illumination also affects the quality of the image, for example, through the position of the light source and light-source distortion. A light source above the document can illuminate it in a manner that improves image quality, while a light source to the side of the document can produce images that are difficult to process. Illumination from the side may, for example, cause shadows or other illumination distortions. The type of light, such as sunlight, incandescent bulbs, or fluorescent lighting, may also be a factor. If the illumination is too bright, the document may appear washed out. On the other hand, if the illumination is too dark, the image may be difficult to read.
The quality of the image may also be affected by characteristics of the document itself, such as the type of document, the font used, the colors selected, etc. For example, an image of a white document with black text is easier to process than a dark document with black text. Image quality can also be affected by the mobile device used. Some mobile camera phones, for example, may have cameras that capture images with a large number of megapixels. Other mobile camera phones may have auto-focus features, automatic flash, etc. In general, these characteristics may improve the image when compared to a mobile device that does not include them.
Document images taken using a mobile device may have one or more of the defects discussed above. These or other defects may result in low accuracy when processing the image (e.g., processing one or more fields of the document). Thus, in some implementations, systems and methods for creating an image of a document using a mobile device can include the ability to identify low-quality images. If the quality of the image is determined to be low, the user may be prompted to take another image.
Various metrics may be used to detect out-of-focus images. For example, a focus measure may be used. The focus measure may be the maximum video gradient between adjacent pixels, measured over the entire image and normalized with respect to the gray level dynamic range of the image and its "pixel pitch". The pixel pitch may be the distance between dots on the image. In some implementations, the focus score can be used to determine whether the image is sufficiently focused. If the image is not sufficiently focused, the user may be prompted to take another image.
The image focus score may be calculated as a function of the maximum video gradient, gray level dynamic range, and pixel pitch. For example, in one embodiment: Image Focus Score = (Maximum Video Gradient) / [(Gray Level Dynamic Range) × (Pixel Pitch)] (Equation 1).
The video gradient may be the absolute value of the difference between the gray level of a first pixel "i" and the gray level of a second pixel "i+1". For example: Video Gradient = ABS[(gray level of pixel "i") − (gray level of pixel "i+1")] (Equation 2).
The gray level dynamic range may be the average of the "N" brightest pixels minus the average of the "N" darkest pixels. For example: Gray Level Dynamic Range = AVE("N" brightest pixels) − AVE("N" darkest pixels) (Equation 3).
In Equation 3 above, N may be defined as the number of pixels used to determine the average darkest and brightest pixel gray levels of the image. In some embodiments, N may be selected to be 64, in which case the 64 brightest pixels are averaged together and the 64 darkest pixels are averaged together to calculate the gray level dynamic range value.
The pixel pitch may be the inverse of the image resolution, e.g., in dots per inch: Pixel Pitch = 1 / (Image Resolution) (Equation 4).
In other words, as defined above, the pixel pitch is the distance between the dots on the image, since the image resolution is the inverse of the distance between the dots on the image.
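Equations 1 through 4 can be transcribed directly into code. The sketch below treats the image as a flat list of gray levels and follows the text's choice of N = 64:

```python
# Direct transcription of Equations 1-4 above. The image is represented as a
# flat list of gray levels (0-255) plus a resolution in dots per inch; the
# choice N = 64 follows the text.

def focus_score(pixels, resolution_dpi, n=64):
    # Equation 2: maximum absolute gray-level difference between neighbors.
    max_video_gradient = max(abs(a - b) for a, b in zip(pixels, pixels[1:]))
    # Equation 3: average of N brightest minus average of N darkest pixels.
    ordered = sorted(pixels)
    dynamic_range = (sum(ordered[-n:]) / n) - (sum(ordered[:n]) / n)
    # Equation 4: pixel pitch is the inverse of the image resolution.
    pixel_pitch = 1.0 / resolution_dpi
    # Equation 1: gradient normalized by dynamic range and pixel pitch.
    return max_video_gradient / (dynamic_range * pixel_pitch)
```

For a synthetic 200-dpi "image" of 64 black pixels followed by 64 white pixels, the gradient and dynamic range are both 255 and the score reduces to the resolution, 200.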
Fig. 5 is a diagram showing an example of perspective distortion in an image of a rectangular document. The image may include perspective transformation distortion 500 such that the rectangle appears as a quadrilateral ABCD 502, as shown in the figure. Perspective distortion can occur when the image is taken with the camera at an angle to the document rather than directly above it. A rectangular document viewed from directly above will typically appear rectangular. As the imaging device moves away from directly above the surface, the document appears increasingly distorted, until eventually only the edge of the page can be seen.
The dashed box 504 represents the image frame obtained by the camera. The image frame is of size h × w, as shown in the figure. Generally, it is preferred that the entire document be included within the h × w box of a single image. However, it should be understood that some documents may be too large, or include too many pages, for this to be preferable or even feasible.
In some implementations, the image can be processed or pre-processed to automatically find the quadrilateral 502. In other words, the document forming the quadrilateral 502 can be separated from the rest of the image so that only the document is processed. By separating the quadrilateral 502 from any background in the image, it can then be further processed.
The quadrilateral 502 may be mapped onto a rectangular bitmap to remove or reduce perspective distortion. Additionally, image sharpening may be used to improve the out-of-focus score of the image. Subsequently, the resolution of the image may be increased and the image converted to a black-and-white image. In some cases, black-and-white images may have a higher recognition rate when processed using an automated document processing system according to the systems and methods described herein.
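Mapping the quadrilateral onto a rectangular bitmap is a standard perspective (homography) transform. The sketch below estimates the 3 × 3 homography from the four corner correspondences using the direct linear transform; it is a generic technique consistent with, but not quoted from, the geometric correction described here, and the corner coordinates are invented for the example:

```python
# Sketch of removing perspective distortion: estimate the 3x3 homography H
# that maps the four corners of quadrilateral ABCD onto a w x h rectangle
# (standard direct linear transform with H[2][2] fixed to 1). A full
# implementation would then resample the source image through H.
import numpy as np

def homography(src_pts, dst_pts):
    """Solve for H such that dst ~ H @ src for four point pairs."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src_pts, dst_pts):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); rhs.append(u)
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y]); rhs.append(v)
    h = np.linalg.solve(np.array(rows, float), np.array(rhs, float))
    return np.append(h, 1.0).reshape(3, 3)

def apply_h(H, pt):
    """Apply homography H to a 2-D point (homogeneous divide)."""
    x, y, w = H @ np.array([pt[0], pt[1], 1.0])
    return (x / w, y / w)

# Map a tilted quadrilateral onto an 800 x 350 rectangle (check-like aspect).
quad = [(12, 40), (790, 10), (810, 330), (5, 360)]   # corners A, B, C, D
rect = [(0, 0), (800, 0), (800, 350), (0, 350)]
H = homography(quad, rect)
```

Applying `H` to corner A of the quadrilateral lands on the rectangle's origin, and interior points are interpolated correspondingly.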
A bi-tonal (e.g., black-and-white) image may be used in some systems. Such a system may require an image with a resolution of at least 200 dots per inch. Therefore, color images acquired using mobile devices need to be of sufficiently high quality that they can be successfully converted from, for example, a 24-bit-per-pixel (24 bits/pixel) RGB image to a bi-tonal image. The image is resized so that the document, such as a check or payment coupon, is scaled to 200 dots per inch.
Fig. 6 is a diagram showing an exemplary original image, a focus rectangle, and an exemplary document quadrilateral ABCD according to fig. 5. In some embodiments, certain requirements may be placed on the input image of the document being processed: all points A, B, C, and D are located within the image, and the focus rectangle 602 is located inside the quadrilateral ABCD 502. The document should also have a low out-of-focus score, and the background around the document may be chosen to be darker than the document. In this way, a lighter document will stand out from the darker background.
Fig. 7 is a flow diagram illustrating an exemplary method 700 in accordance with the systems and methods described herein. Referring now to FIG. 7, in operation 701, a user logs into a document capture system on a mobile communication device. According to various embodiments, the method and system for document capture on a mobile communication device may include an application that requires a user to log in. In this manner, access to the document capture system using the mobile communication device may be limited to authorized users.
In operation 702, in the illustrated embodiment, a type of document is selected. For example, the user may select the type of document as a check, a payment coupon, or a deposit document. By selecting the type of document, the mobile device can scan particular portions of the image to determine, for example, the payee, the amount of the check, the signature, and the like. However, in some embodiments, the device may determine which type of document image has been captured by processing the image.
In operation 704, an image is captured using, for example, the mobile communication device. In the illustrated embodiment, an application running on the mobile communication device may prompt the user of the device to capture an image of the front of the document. An image of the back of the document can also be acquired. For example, if the document is a check, an image of the back may be necessary because the back of the check needs to be endorsed. If an image of the back of the document needs to be captured, the application may prompt the user to capture it. The application may also perform some image processing to determine whether the quality of the image or images is sufficient for further processing according to the systems and methods described herein. The quality required for further processing may vary from one implementation to another. For example, some systems may be better than others at determining the information contained in poor-quality images.
In the illustrated embodiment, at operation 706, an amount is entered. When the document being processed is a check, the amount entered may be the amount of the check. Alternatively, the amount may be a payment amount or a deposit amount, depending on the type of document being processed.
In some embodiments, the system may determine the amount by processing the image. For example, in some cases, optical character recognition ("OCR") can be used to determine which characters and numbers appear on the document. The numbers located in the amount box of the check or payment coupon may then be determined using OCR or other computer-based character recognition. This may replace the need to manually enter the amount. In other implementations, manual input may be used to verify computer-generated values determined using, for example, OCR or other computer-based character recognition.
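Verifying a manually entered amount against an OCR result can be sketched as a simple comparison in normalized cents. The normalization policy below (exact match on two decimal places) is an illustrative assumption, and `ocr_text` stands in for the output of a real OCR engine:

```python
# Sketch of cross-checking a manually entered amount against an OCR'd
# courtesy-amount string. The exact-match policy and the string formats
# handled here are illustrative assumptions, not part of the disclosure.

def normalize_amount(text):
    """Turn '$1,234.56' / '1234.56' style strings into integer cents."""
    digits = text.replace("$", "").replace(",", "").strip()
    dollars, _, cents = digits.partition(".")
    return int(dollars or 0) * 100 + int((cents + "00")[:2])

def amounts_agree(ocr_text, user_entered_cents):
    """True when the OCR'd amount matches the user-entered amount."""
    return normalize_amount(ocr_text) == user_entered_cents
```

A mismatch would trigger the request for additional data described in operation 712 below.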
In operation 708, the image is transmitted to a server. The image may be transmitted from the mobile communication device (e.g., a camera phone) that captured the document image using, for example, hypertext transfer protocol ("HTTP") or multimedia messaging service ("MMS"). The server may then confirm that the message was received, for example by transmitting a message back to the mobile device.
In operation 710, image processing is performed. In an exemplary embodiment, the server may process the image by performing automatic rotation, tilting, perspective distortion correction, cropping, and the like. The server may also process the image to produce a bi-tonal image for use in extracting data.
In other embodiments, some or all of the data processing may be performed in the mobile communication device. For example, the mobile communication device may perform automatic rotation, tilting, perspective distortion correction, cropping, and the like. Additionally, the mobile device may also process the image to produce a bi-tonal image for extracting the data. In some cases, processing may be distributed between the mobile device and the server.
In operation 712, document processing using the mobile device is complete. For example, when the server has confirmed that all necessary data has been extracted from the received image, it may transmit a status message to the mobile device that transmitted the image. Alternatively, the server may transmit a request for additional data if some necessary data cannot be extracted. The request may include a request for additional images. In some cases, the request may be for data entered by the user, for example, an amount such as the amount of a check entered using a keyboard on the mobile communication device.
In some implementations, the quality of the image is determined at the mobile device. In this way, the number of requests for additional images from the server may be reduced, because the request can originate directly from the mobile device. This allows the request to be made more quickly, so additional images can be captured sooner after the earlier attempt. At that point the user is likely still physically near the document and still holding the mobile device, which makes it easier to recapture the image. If image quality processing is done at the server, it takes longer to determine that the image quality is unacceptable and to communicate that message back to the user, by which time the user may no longer be near the document or may have begun another task. It should be understood, however, that in some embodiments a server-based implementation may be used to offload processing requirements from the mobile device. Additionally, in some cases the server may determine image quality faster than, or as fast as, the mobile communication device.
FIG. 8 is a diagram illustrating an exemplary bi-tonal image 800 of the check of FIGS. 1 and 3 in accordance with the systems and methods described herein. FIG. 9 is a diagram illustrating an exemplary bi-tonal image 900 of the payment coupon of FIGS. 2 and 4 in accordance with the systems and methods described herein. As shown in the bi-tonal images of FIGS. 8 and 9, necessary information (such as the payee, the amount, the account number, etc.) is retained, while extraneous information is removed. For example, a background pattern that some people have printed on their checks is not present in the bi-tonal image 800 of the check.
FIG. 10 is a flow chart of an exemplary method 1000 used at the image processing stage. In particular, some or all of the operations illustrated in FIG. 10 may be performed during the various operations illustrated in FIG. 7. Referring now to FIG. 10, at operation 1001, method 1000 receives a color image (also referred to as a "mobile image") originally acquired by a mobile device. For example, the image may originate from a camera phone, which transmits the image to a server for post-capture processing. The mobile image has a document located somewhere within it. To detect the document, an automatic document detection module is applied at operation 1002. Depending on the embodiment, the automatic document detection module may be dedicated to detecting only certain types of documents, such as financial documents (e.g., checks or deposit coupons), or may detect multiple types of transaction documents generically. At the end of the automatic document detection operation, the positions of the document corners (e.g., check corners) are output as corners A, B, C and D of a quadrilateral ABCD (e.g., quadrilateral ABCD 502). Further details regarding the automatic document detection operation are provided with reference to FIG. 11a.
After automatic document detection, the method 1000 performs geometric correction on the mobile image at operation 1004. As previously described, the correction may include performing an automatic rotation operation, a skew correction operation, a perspective distortion correction operation, and a cropping operation. These are typically needed because of perspective distortion present in the original mobile image and the possibility of incorrect orientation of the document within the mobile image. The discussion of FIG. 15a gives further details regarding the geometric correction operation.
Next is image binarization at operation 1006. Binarizing an image means generating a bi-tonal image of the document at 1 bit per pixel. Remote deposit systems typically require a binarized image for processing. The binarization operation is discussed in more detail with reference to FIGS. 16a and 16c.
Since many processing engines are sensitive to image size, a size correction operation 1010 may be utilized. For example, in the case of checks, a processing engine for amount recognition may rely on check size to distinguish personal checks from commercial checks, while a processing engine for form recognition may rely on document size as an important feature in determining form type. The size correction operation 1010 is discussed in more detail with reference to FIG. 18.
The method 1000 ends at operation 1012 by outputting the document as both a bi-tonal image and a grayscale image. These images are then available for processing (e.g., financial processing) depending on the type of document present in the image. Financial processing is typically performed as part of completing the process described with respect to operation 712 of FIG. 7. The bi-tonal image is in a form that financial processing systems can readily recognize.
With continued reference to the automatic document detection operation previously described with respect to operation 1002 of FIG. 10, FIGS. 11-14 illustrate the automatic document detection operation in greater detail.
Referring now to FIG. 11a, a flow chart is provided illustrating a method 1100 for automatic document detection within a color image from a mobile device. Typically, the operations described in method 1100 are performed within an automatic document detection module; however, there are embodiments in which the operations reside in multiple modules. The automatic document detection module typically takes a number of factors into account when detecting a document in a mobile image. It may account for any positioning of the document within the mobile image, 3D distortion in the mobile image, the unknown size of the document, the unknown color of the background, and various other characteristics of the mobile imaging engine (e.g., resolution, dimensions, etc.).
Method 1100 begins at operation 1102 by receiving an original color image from a mobile device. Once received, the original color image is converted into a smaller color image, also referred to as a color "icon" image, at operation 1104. The color "icon" image preserves the color contrast between the document and the background while reducing the contrast within the document. A detailed description of the conversion process is given with reference to FIG. 12a.
Next, at operation 1106, a fade operation is applied to the color "icon" image. During this operation, the overall color of the image fades, while the contrast between the document and its background is preserved in the image. Specifically, the color "icon" image of operation 1104 is converted to a gray "icon" image (also referred to as a grayscale "icon" image) having the same size. The color depth reduction process is described in further detail with reference to FIG. 13a.
The method 1100 then locates the corners of the document in the gray "icon" image at operation 1108. As previously mentioned with respect to FIG. 6, these corners A, B, C and D form a quadrilateral ABCD (e.g., quadrilateral ABCD 502). The quadrilateral ABCD, in turn, constitutes the perimeter of the document. For example, FIG. 11b depicts a check 1112 with corners 1114 detected by operation 1108. Once the corners are detected, their positions are output at operation 1110.
Referring now to FIG. 11c, a flow chart illustrating an improved method 1101 for automatic document detection in images from a mobile device is provided. Method 1101 provides faster automatic document detection by converting the image to a faded image on the mobile device and then transmitting it to the server, which performs the remaining steps of the automatic document detection method 1101. Mobile devices are typically limited in available processing power and transmission bandwidth. If the method 1100 of FIG. 11a were performed entirely on the mobile device, automatic document detection would take a significant amount of time on the device and would tie up the mobile device's processor, which may be needed for other tasks. Sending full-color images from the mobile device to the server also takes significant transmission time and uses valuable bandwidth. Converting to a faded image and transmitting the faded image is faster than transmitting a full-color image or performing all the image processing on the mobile device. The improved method 1101 relies on sending faded images and using the server to perform the processor-intensive steps of the method 1101, thereby providing fast image processing and fast image-quality feedback from the server. The method 1101 also gives the user a perception of fast processing, because the mobile device is neither occupied with image processing operations nor transmitting the large amount of data of a full-color image.
The method 1101 begins at operation 1122 with receiving a color image on the mobile device. Upon receipt, a fade operation is applied to the color image at operation 1124. During this operation, the overall color of the image is faded, producing a faded image that requires less storage. The fading may be performed on a pixel-by-pixel basis using the RGB values of each pixel and preferred weights. Other methods of fading may also be used, including the method described in FIG. 13a. In some embodiments, the faded image may be a grayscale image.
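A minimal sketch of the per-pixel weighted fade follows. The luma-style weights are an assumption for illustration; the text says only that the RGB values of each pixel and preferred weights are combined.

```python
import numpy as np

def fade_to_gray(rgb, weights=(0.299, 0.587, 0.114)):
    """Fade an H x W x 3 RGB image to a single-channel image.

    The weights here are hypothetical luma coefficients; the patent
    leaves the preferred per-channel weights unspecified.
    """
    rgb = np.asarray(rgb, dtype=np.float64)
    gray = rgb @ np.asarray(weights, dtype=np.float64)  # weighted sum per pixel
    return np.rint(np.clip(gray, 0, 255)).astype(np.uint8)
```

The result is one byte per pixel rather than three, which is what reduces the storage and transmission size.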
Next, at operation 1126, the faded image is transmitted from the mobile device to a server. Since the color depth reduction operation produces an image of smaller size, transmission time is reduced compared to transmitting the full-color image. Once received at the server, the faded image is converted into a smaller faded image at operation 1128, which may be referred to as an "icon" image for convenience. The conversion process involves downscaling the faded image, similar to the process described with reference to FIG. 12a.
The method 1101 then determines the locations of the corners of the document in the "icon" image at operation 1130. As described above, the corners constitute a quadrilateral (e.g., quadrilateral ABCD 502) defining the document perimeter. Once these corners are detected, their locations are output at operation 1132. The server may use the corner locations to geometrically correct the document in the faded image it received.
Referring now to FIG. 12a, a flow chart describing an exemplary method 1200 for converting a color image into a smaller "icon" image is provided. This smaller "icon" image preserves the color contrast between the document depicted therein and its background, while reducing the contrast inside the document. Once a color image is received from a mobile device at operation 1201, the method 1200 eliminates over-sharpening in the image at operation 1202. Assuming the color input image I has a size of W × H pixels, operation 1202 averages the intensity of image I and reduces it to an image I′ having half the size of image I (i.e., W′ = W/2 and H′ = H/2). In a particular embodiment, the color transformation can be described as: C(p′) = ave{C(q) : q in the S × S window centered on p} (equation 5), where C is any of the red, green, or blue components of the color intensity; p′ is an arbitrary pixel of image I′ at coordinates (x′, y′); p is the corresponding pixel of image I: p = p(x, y), where x = 2x′ and y = 2y′; q is any pixel contained in the S × S window centered on p; S is established experimentally; and ave is the average over all q in the S × S window.
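Equation (5) can be sketched as a block average. The sketch below fixes the window at 2 × 2 (each output pixel averages the four input pixels it covers), whereas the patent determines the window size S experimentally.

```python
import numpy as np

def halve_by_averaging(img):
    """Reduce an H x W x 3 image to H/2 x W/2 by averaging, per equation (5).

    Sketch only: the window is fixed at 2 x 2, whereas the patent
    establishes the window size S experimentally.
    """
    img = np.asarray(img, dtype=np.float64)
    h, w = img.shape[:2]
    h2, w2 = h // 2, w // 2
    img = img[:h2 * 2, :w2 * 2]              # drop an odd row/column if any
    # average each 2 x 2 block, separately per color plane
    out = img.reshape(h2, 2, w2, 2, -1).mean(axis=(1, 3))
    return np.rint(out).astype(np.uint8)
```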
At the next operation 1204, smaller "dark" objects in the image are eliminated. Examples of such smaller "dark" objects include, but are not limited to, machine-printed characters and handwritten characters within the document. Assuming operation 1204 receives image I′ from operation 1202, operation 1204 creates a new color image I″, referred to as an "icon", having a width W″ set to a fixed small value and a height H″ set to W″ × (H/W), thereby preserving the original aspect ratio of image I. In some embodiments, the transformation may be described as: C(p″) = max{C(q′) : q′ in the S′ × S′ window centered on p′} (equation 6), where C is any of the red, green, or blue components of the color intensity; p″ is an arbitrary pixel of image I″; p′ is the pixel of image I′ corresponding to p″ under a transformation similar to the one defined above; q′ is any pixel of image I′ contained in the S′ × S′ window centered on p′; max is the maximum over all q′ in the S′ × S′ window; W″ is established experimentally; S′ is established experimentally for computing the intensity of I″; and I″(p″) is the intensity value defined for each color plane individually by taking the maximum of the intensity function I′(p′) over the window of the corresponding pixel p′ on image I′. The "maximum" is used instead of the "average" to make the "icon" whiter (white pixels have RGB values of (255, 255, 255)).
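The max-based reduction of equation (6) can be sketched the same way, replacing the mean with a per-block maximum. The fixed icon width of 16 is illustrative, since the patent sets W″ experimentally.

```python
import numpy as np

def shrink_by_maximum(img, target_w=16):
    """Shrink an image to a fixed "icon" width using per-block maxima.

    Sketch of equation (6): taking the maximum (rather than the average)
    over each window whitens small dark details such as printed and
    handwritten characters.  target_w = 16 is an illustrative choice.
    """
    img = np.asarray(img, dtype=np.uint8)
    h, w = img.shape[:2]
    target_h = max(1, int(round(target_w * h / w)))  # H'' = W'' * (H/W)
    bh, bw = h // target_h, w // target_w
    img = img[:target_h * bh, :target_w * bw]
    return img.reshape(target_h, bh, target_w, bw, -1).max(axis=(1, 3))
```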
At the next operation 1206, high local contrast of "smaller" objects, such as lines, text, and handwriting on the document, is suppressed, while other object edges in the "icon" are preserved. Typically, these other object edges are thick. Multiple dilation and erosion operations (also called morphological image transformations) are used to suppress the high local contrast of "small" objects. Such morphological image transformations are well known to, and commonly used by, those skilled in the art. The order and number of dilation and erosion operations are determined experimentally. After the suppression operation 1206, a color "icon" image is output at operation 1208. FIG. 12b depicts an example of the mobile image of FIG. 11b after conversion to a color "icon" image.
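The dilation/erosion suppression can be sketched as a grayscale morphological closing. The 3 × 3 neighborhood and single pass are assumptions; the patent determines the order and number of operations experimentally.

```python
import numpy as np

def gray_dilate(img, size=3):
    """Grayscale dilation: each pixel becomes the maximum of its neighborhood."""
    pad = size // 2
    padded = np.pad(img, pad, mode='edge')
    out = np.zeros_like(img)
    h, w = img.shape
    for dy in range(size):
        for dx in range(size):
            np.maximum(out, padded[dy:dy + h, dx:dx + w], out)
    return out

def gray_erode(img, size=3):
    """Grayscale erosion, via duality with dilation on the inverted image."""
    return 255 - gray_dilate(255 - img, size)

def suppress_small_dark_objects(img, passes=1):
    """Closing (dilate, then erode) removes dark details smaller than the
    structuring element, such as text strokes, while thick edges survive.
    One pass with a 3 x 3 element is an illustrative assumption."""
    for _ in range(passes):
        img = gray_erode(gray_dilate(img))
    return img
```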
Referring now to FIG. 13a, a flow diagram is provided illustrating an exemplary method 1300 that gives further details regarding the color depth reduction operation 1106 shown in FIG. 11a. At operation 1301, the method 1300 receives a color "icon" image for fading. The method divides the color "icon" image into a grid (or matrix) of equal-size grid elements of fixed length and width at operation 1302. In some embodiments, a preferred grid size is one that has a central grid element; for example, a grid size of 3 × 3 may be employed. FIG. 13b depicts an example of the color "icon" image of FIG. 12b after being divided into a 3 × 3 grid at operation 1302.
Next, at operation 1304, the colors of the "center portion" of the icon, which is typically the center-most grid element, are averaged. The method 1300 then calculates the average color of the remaining portion of the icon at operation 1306. More specifically, the colors of the grid elements "outside" the "center portion" of the "icon" are averaged. In general, where there is a central grid element (e.g., a 3 × 3 grid), the "outside" of the "center portion" includes all grid elements except the central grid element.
Next, the method 1300 determines a linear transformation of the RGB space at operation 1308. The linear transformation is defined such that it maps the average color of the "center portion" calculated during operation 1304 to white (i.e., 255), and the average color of the "outside portion" calculated during operation 1306 to black (i.e., 0). All remaining colors are linearly mapped to shades of gray. Once determined, the linear transformation is used at operation 1310 to transform all RGB values of the color "icon" into a grayscale "icon" image, which is then output at operation 1312. In certain embodiments, the resulting gray "icon" image (also referred to as a grayscale "icon" image) maximizes the contrast between the document (assuming the document is positioned near the center of the image) and the background. FIG. 13c depicts an example of the color "icon" image of FIG. 12b once converted to a gray "icon" image.
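A simplified sketch of the mapping in operation 1308 follows. The text requires only some linear transform of RGB space that sends the center average to white and the outer average to black; the per-channel construction below, with the channels then averaged, is one such assumption.

```python
import numpy as np

def icon_to_gray(icon):
    """Map a color icon to gray so that the average color of the central
    3 x 3 grid cell maps to white (255) and the average color of the
    outer cells maps to black (0).

    Simplified sketch: the linear map is applied per channel and the
    channels are averaged; the patent asks only for a linear transform
    of RGB space with the white/black endpoint property.
    """
    icon = np.asarray(icon, dtype=np.float64)
    h, w = icon.shape[:2]
    gh, gw = h // 3, w // 3
    center_cell = icon[gh:2 * gh, gw:2 * gw].reshape(-1, 3)
    all_px = icon.reshape(-1, 3)
    center = center_cell.mean(axis=0)
    outer = ((all_px.sum(axis=0) - center_cell.sum(axis=0))
             / (all_px.shape[0] - center_cell.shape[0]))
    span = np.where(center == outer, 1.0, center - outer)  # avoid divide-by-zero
    gray = ((icon - outer) * (255.0 / span)).mean(axis=2)
    return np.rint(np.clip(gray, 0, 255)).astype(np.uint8)
```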
Referring now to FIG. 14, a flow chart is provided illustrating an exemplary method 1400 for deriving the document corners from a gray "icon" image containing a document. Once the gray "icon" image is received at operation 1401, the method proceeds to operation 1402 by finding "voting" points on the gray "icon" image for each side of the document depicted in the image. Thus, operation 1402 finds all positions on the gray "icon" image that can be approximated by straight line segments representing the left, top, right, and bottom sides of the document.
According to one embodiment, operation 1402 accomplishes its goal by first finding "voting" points in the half of the "icon" that corresponds to the side currently of interest. For example, if the side currently of interest is the top side of the document, the upper half of the "icon" is examined (Y < H/2) and the bottom half of the "icon" is ignored (Y ≥ H/2).
Within the selected half of the "icon", operation 1402 then calculates, for each pixel, the intensity gradient (contrast) in the relevant direction. In some embodiments, this is done by considering a small window centered on the pixel and dividing the window into an expected "background" half (lower gray level, i.e., assumed to be darker) and an expected "document" half (higher gray level, i.e., assumed to be whiter). The dividing line between the two halves is horizontal or vertical, depending on which side of the document is sought. Next, the average gray level in each half-window is calculated, yielding the average image brightness of the "background" and the average image brightness of the "document". The intensity gradient of the pixel is calculated by subtracting the average image brightness of the "background" from the average image brightness of the "document".
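The half-window gradient can be sketched as follows for a horizontal (top/bottom) side; the window size and boundary handling are simplified assumptions.

```python
import numpy as np

def edge_gradient(gray, x, y, half=2, horizontal=True):
    """Contrast at pixel (x, y) for side "voting".

    For a top side (horizontal=True), the half-window above the pixel is
    the expected "background" and the half-window at/below it is the
    expected "document"; the gradient is document mean minus background
    mean.  The window size (half=2) is an illustrative assumption.
    """
    if horizontal:   # looking for a horizontal side (top or bottom)
        bg = gray[max(0, y - half):y, x - half:x + half + 1]
        fg = gray[y:y + half, x - half:x + half + 1]
    else:            # looking for a vertical side (left or right)
        bg = gray[y - half:y + half + 1, max(0, x - half):x]
        fg = gray[y - half:y + half + 1, x:x + half]
    return float(fg.mean()) - float(bg.mean())
```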
Finally, those pixels with a sufficiently large gray-level gradient in the correct direction are marked as "voting" points for the selected side. The gradient threshold that defines sufficiency is established experimentally.
Continuing with the method 1400, operation 1404 obtains candidate sides (i.e., line segments) that potentially represent the sides (i.e., left, top, right, and bottom) of the document. To do this, in some embodiments, a subset of the "voting" points determined in operation 1402 is obtained that can be approximated by straight line segments (linear approximation). In many embodiments, the threshold for the linear approximation is established experimentally. This subset of segments defines the "candidate" sides. To guarantee that the group of candidates is never empty, the left, top, right, and bottom sides of the gray "icon" image itself are also added to the group.
Next, operation 1406 selects the best candidate for each side of the document from the set of candidates selected in operation 1404, thereby defining the position of the document within the gray "icon" image. In some embodiments, the best candidate for each side of the document is selected using the following process.
The process begins by selecting a quadruple of line segments {L, T, R, B}, where L is one of the candidates for the left side of the document, T is one of the candidates for the top side, R is one of the candidates for the right side, and B is one of the candidates for the bottom side. The process then measures the following characteristics of the currently selected quadruple.
The number of "voting" points approximated by all the line segments of all four sides is measured. This value is based on the assumption that the sides of the document are straight and that there is significant color contrast along them. A larger value of this feature increases the overall quadruple rank.

The sum of all intensity gradients over all voting points of all the line segments is measured. This sum is likewise based on the assumption that the sides of the document are straight with significant color contrast along them. Again, a larger value of this feature increases the overall quadruple rank.

The total length of the segments is measured. This length value is based on the assumption that the document occupies a large portion of the image. Again, a larger value of this feature increases the overall quadruple rank.
The maximum gap in each corner is measured. For example, the gap in the left/top corner is defined by the distance between the uppermost point of the L segment and the leftmost point of the T segment. This value reflects how well the candidate sides fit the assumption that the document's shape is a quadrilateral. A smaller value of this feature increases the overall quadruple rank.
The maximum of the two angles between opposite segments (i.e., between the L and R segments, and between the T and B segments) is measured. This value reflects how well the candidate sides fit the assumption that the document's shape is close to a parallelogram. A smaller value of this feature increases the overall quadruple rank.
The deviation of the quadruple's aspect ratio from that of an "ideal" document is measured. This feature is useful for documents with a known aspect ratio, such as checks. If the aspect ratio is not known, this feature should be excluded from the quadruple rank calculation. The deviation is calculated in the following way: a) the quadrilateral is obtained by intersecting the elements of the quadruple; b) the midpoint of each of the quadrilateral's four sides is obtained; c) the distances between the midpoints of opposite sides, referred to as D1 and D2, are calculated; d) the larger of the two ratios is taken: R = max(D1/D2, D2/D1); e) assuming the aspect ratio of the "ideal" document is known, with MinAspectRatio and MaxAspectRatio denoting its minimum and maximum aspect ratios respectively, the deviation is defined as: 0 if MinAspectRatio ≤ R ≤ MaxAspectRatio; MinAspectRatio - R if R < MinAspectRatio; and R - MaxAspectRatio if R > MaxAspectRatio.
For checks, MinAspectRatio may be set to 2.0 and MaxAspectRatio may be set to 3.0.
This aspect-ratio feature relies on the assumption that the document's shape is preserved to some extent under the perspective transformation. A smaller value of this feature increases the overall quadruple rank.
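Steps a) through e) of the aspect-ratio deviation can be sketched as follows, using the check band MinAspectRatio = 2.0 and MaxAspectRatio = 3.0 given above.

```python
import numpy as np

def aspect_ratio_deviation(corners, min_ar=2.0, max_ar=3.0):
    """Deviation of a quadrilateral's aspect ratio from the allowed
    [min_ar, max_ar] band (2.0-3.0 is the check example in the text).

    corners: the quadrilateral vertices A, B, C, D in order.
    """
    a, b, c, d = [np.asarray(p, dtype=np.float64) for p in corners]
    # midpoints of the four sides
    m_ab, m_bc = (a + b) / 2, (b + c) / 2
    m_cd, m_da = (c + d) / 2, (d + a) / 2
    d1 = np.linalg.norm(m_ab - m_cd)    # distances between opposite-side midpoints
    d2 = np.linalg.norm(m_bc - m_da)
    r = max(d1 / d2, d2 / d1)
    if r < min_ar:
        return min_ar - r
    if r > max_ar:
        return r - max_ar
    return 0.0
```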
After measuring the quadruple features described above, the features are combined into a single value (referred to as the quadruple rank) using a weighted linear combination. Positive weights are assigned to the number of "voting" points, the sum of all intensity gradients, and the total length of the segments. Negative weights are assigned to the maximum gap in each corner, the maximum of the two angles between opposite segments, and the deviation of the quadruple's aspect ratio. The exact values of the weights are established experimentally.
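The weighted linear combination can be sketched as below. The unit-magnitude weights are placeholders, since the exact values are established experimentally.

```python
def quadruple_rank(features, weights=None):
    """Combine the measured quadruple features into a single rank.

    Positive weights reward voting points, gradient sum, and total
    length; negative weights penalize corner gaps, opposite-segment
    angles, and aspect-ratio deviation.  The magnitudes below are
    placeholder assumptions.
    """
    if weights is None:
        weights = {
            "voting_points": 1.0,
            "gradient_sum": 1.0,
            "total_length": 1.0,
            "max_corner_gap": -1.0,
            "max_opposite_angle": -1.0,
            "aspect_deviation": -1.0,
        }
    return sum(weights[name] * value for name, value in features.items())
```

The quadruple with the highest rank over all candidate combinations is then taken as the "best" quadruple.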
The above operations are repeated for all possible combinations of candidate sides, finally yielding the "best" quadruple, which is the quadruple with the highest rank. The corners of the document are defined as the intersections of the sides of the "best" quadruple (i.e., the best candidate sides).
Operation 1408 then defines the corners of the document using the intersections of the best candidate sides. Those of ordinary skill in the art will appreciate that the corners can be located on the original mobile image by transforming the corner locations found on the "icon" using the aforementioned similarity transformation. The method 1400 ends at operation 1410 by outputting the locations of the corners defined in operation 1408.
Turning to the geometric correction operation described in operation 1004 of FIG. 10, FIG. 15a provides a flow chart illustrating an example method 1500 for geometric correction. As previously mentioned, geometric correction is needed to remove any perspective distortion present in the original mobile image. In addition, geometric correction may correct the orientation of the document within the original mobile image (e.g., a document oriented at 90, 180, or 270 degrees, where the right-side-up orientation is 0 degrees). It should be noted that in some embodiments, the orientation of the document is determined based on the type of document depicted in the mobile image, as well as a relevant object associated with that document.
In cases where the document is oriented sideways (90 or 270 degrees), as with the check shown in FIG. 15b, geometric correction is appropriate for fixing the orientation of the document. Detection and subsequent correction of a 180-degree orientation is performed by attempting to locate a relevant object associated with the document, which is known to be at a particular position, and determining where that object is found when the document is in a 180-degree orientation. For example, the MICR line on a financial document may serve as the relevant object, since the MICR line is typically located at a particular position on such documents. Thus, when the financial document is a check, the MICR line (which is located at the bottom of the check) may be used as the relevant object to determine the current orientation of the check within the mobile image. In some implementations, the relevant object associated with the document depends on the document type. For example, when the document is a contract, the relevant object may be a notary seal, signature, or watermark located at a known position on the contract. More details regarding correcting an upside-down (180-degree) orientation of a document (specifically, a check) are provided in FIGS. 17a and 17c.
A mathematical model of the projective transformation is formulated that transforms the distorted image into a rectangular image of a predetermined size. For example, when the document depicted in the mobile image is a check, the predetermined size is specified as 1200 × 560 pixels, which is roughly equivalent to the size of a personal check scanned at 200 DPI.
With continued reference to method 1500, there are two independent paths of operations, which may be performed sequentially or in parallel, and whose outputs are ultimately combined as the final output of method 1500. One operational path begins at operation 1504, where the method 1500 receives the color original mobile image. Operation 1508 then reduces the color depth of the original mobile image from a color image with 24 bits per pixel (24 bits/pixel) to a grayscale image with 8 bits per pixel (8 bits/pixel). This image is then passed, via operation 1512, to operation 1516.
If the automatic document detection method 1101 shown in FIG. 11c is used, operations 1504 and 1508 may not be needed, because the server has already received a faded image of the original size.
The other operational path begins at operation 1502, where method 1500 receives the locations of the corners of the document within the gray "icon" image generated by method 1300. Based on the corner locations, the orientation of the document is determined and corrected at operation 1506. In some embodiments, this operation uses the corner positions to measure the aspect ratio of the document within the original image. Operation 1506 then obtains the midpoint between each set of corners (where each set of corners corresponds to one of the four sides of the depicted document), yielding left (L), top (T), right (R), and bottom (B) midpoints. The distance between the L and R midpoints and the distance between the T and B midpoints are then compared to determine which pair is farther apart. This gives operation 1506 the orientation of the document.
In some embodiments, the correct orientation of the document is based on the type of document detected. For example, as shown in FIG. 15b, where the document of interest is a check, the document is determined to be sideways when the distance between the top and bottom midpoints is greater than the distance between the left and right midpoints. For other types of documents, the opposite may be true.
If operation 1506 determines that an orientation correction is required, the corners of the document are rotated one position in a circular fashion, clockwise in some embodiments and counterclockwise in others.
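The midpoint comparison and circular corner move can be sketched as follows; the corner ordering A, B, C, D around the quadrilateral is an assumption.

```python
import math

def is_sideways_check(corners):
    """True when a check appears rotated 90/270 degrees.

    corners = (A, B, C, D) in order around the quadrilateral, with A-B
    the currently labeled top side and C-D the bottom side.  Per the
    text, a check is sideways when the top/bottom midpoint distance
    exceeds the left/right midpoint distance.
    """
    a, b, c, d = corners
    m_top = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)
    m_bottom = ((c[0] + d[0]) / 2, (c[1] + d[1]) / 2)
    m_left = ((d[0] + a[0]) / 2, (d[1] + a[1]) / 2)
    m_right = ((b[0] + c[0]) / 2, (b[1] + c[1]) / 2)
    return math.dist(m_top, m_bottom) > math.dist(m_left, m_right)

def rotate_corners(corners):
    """Shift the corner labels one position (a "circular" move)."""
    a, b, c, d = corners
    return (d, a, b, c)
```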
At operation 1510, the method 1500 builds a projective transformation to map the image of the document to a predetermined target image size of width W pixels and height H pixels. In some embodiments, the projective transformation maps corners A, B, C and D of the document as follows: corners a to (0, 0), corners B to (W, 0), corners C to (W, H), and corners D to (0, H). Algorithms for creating projective transforms are well known and commonly used by those of ordinary skill in the art.
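A generic construction of such a projective transformation (a standard direct-linear approach, not code from the patent itself) solves the eight-equation linear system for the homography with its last entry fixed to 1:

```python
import numpy as np

def projective_transform(corners, w, h):
    """Build the 3 x 3 homography mapping document corners
    A->(0,0), B->(w,0), C->(w,h), D->(0,h).

    Generic construction: two linear equations per point pair, with the
    homography's bottom-right entry fixed to 1.
    """
    dst = [(0, 0), (w, 0), (w, h), (0, h)]
    rows, rhs = [], []
    for (x, y), (u, v) in zip(corners, dst):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); rhs.append(u)
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y]); rhs.append(v)
    params = np.linalg.solve(np.array(rows, dtype=np.float64),
                             np.array(rhs, dtype=np.float64))
    return np.append(params, 1.0).reshape(3, 3)

def apply_homography(hmat, point):
    """Map one (x, y) point through the homography."""
    u, v, s = hmat @ np.array([point[0], point[1], 1.0])
    return (u / s, v / s)
```

For a check, w = 1200 and h = 560 would match the predetermined size given above.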
At operation 1516, the projective transformation built during operation 1510 is applied to the grayscale mobile image output from operation 1512. Applying the projective transformation maps all pixels lying within the quadrilateral ABCD depicted in the grayscale image into a geometrically corrected grayscale image containing only the document. FIG. 15c is an example of a grayscale image of the document depicted in FIG. 11b after geometric correction has been applied. The method 1500 ends at operation 1518, where the grayscale image of the document is output to the next operation.
Turning now to the binarization operation described in operation 1006 of FIG. 10, a flow chart illustrating an example method 1600 for binarization is provided in FIG. 16a. The binarization operation produces a bi-tonal image with a color depth of 1 bit per pixel (1 bit/pixel). In the case of documents such as checks and deposit coupons, a bi-tonal image is required for processing by automated systems, such as remote deposit systems. In addition, many image processing engines require such an image as input. Method 1600 illustrates how binarization of the grayscale image of the document produced by the geometric correction operation 1004 is carried out. This embodiment uses a variation of the well-known Niblack binarization method. It is assumed that the received grayscale image has a size of W pixels × H pixels, and that the intensity function I(x, y) gives the intensity at position (x, y) as one of 256 possible gray levels (8 bits/pixel). Using an output function B(x, y), the binarization operation converts the 256 gray levels into 2 levels (1 bit/pixel). In addition, for this method, a sliding window of size w pixels × h pixels is defined, along with a threshold T for the local (within-window) standard deviation of the grayscale image intensity I(x, y). The values of w, h, and T are determined experimentally.
Once method 1600 receives the grayscale image of the document at operation 1602, it selects a pixel p(x, y) within the image at operation 1604. The method 1600 then calculates, at operation 1606, the average (mean) value ave and the standard deviation σ of the intensities I(x, y) within the w × h window surrounding the current pixel p(x, y). If the standard deviation σ is determined to be too small at operation 1608 (i.e., σ < T), the pixel p(x, y) is considered low-contrast and thus part of the background. Accordingly, at operation 1610, the low-contrast pixel is set to white (i.e., B(x, y) is set to 1, which represents white). However, if the standard deviation σ is equal to or greater than the threshold T (i.e., σ ≥ T), the pixel p(x, y) is considered a candidate foreground pixel. At operation 1612, if I(p) < ave - kσ, the pixel p is treated as a foreground pixel, and B(x, y) is set to 0 (black); otherwise, the pixel is treated as background (and B(x, y) is set to 1). In the above formula, k is a coefficient established experimentally.
After the pixel is converted at operation 1610 or operation 1612, the next pixel is selected at operation 1614, and operation 1606 is repeated until all grayscale pixels (8 bits/pixel) have been converted to bi-tonal pixels (1 bit/pixel). Once there are no more pixels to convert at operation 1618, the bi-tonal image of the document is output at operation 1620. FIG. 16b shows an example of the image of the check shown in FIG. 15c after the binarization operation.
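The described Niblack variant can be sketched as below; the window size, threshold T, and coefficient k are illustrative assumptions, as the patent determines them experimentally.

```python
import numpy as np

def niblack_binarize(gray, w=15, h=15, T=8.0, k=0.2):
    """Binarize per the described Niblack variant.

    A pixel whose w x h window has standard deviation sigma < T is
    background (white, B=1); otherwise it is black (B=0) exactly when
    I(p) < ave - k*sigma.  The values of w, h, T, and k shown here are
    illustrative; the patent establishes them experimentally.
    """
    gray = np.asarray(gray, dtype=np.float64)
    rows, cols = gray.shape
    out = np.ones((rows, cols), dtype=np.uint8)    # 1 = white
    py, px = h // 2, w // 2
    padded = np.pad(gray, ((py, py), (px, px)), mode='edge')
    for y in range(rows):
        for x in range(cols):
            win = padded[y:y + h, x:x + w]         # window around (x, y)
            ave, sigma = win.mean(), win.std()
            if sigma >= T and gray[y, x] < ave - k * sigma:
                out[y, x] = 0                      # 0 = black foreground
    return out
```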
Referring now to FIG. 16c, there is illustrated a flow diagram of additional operations 1601 that may be incorporated into the method 1600 to provide improved binarization of an image for further processing. Additional operations 1601 provide for precise selection of the sliding window and threshold used in the binarization process described above, improving the quality of the bi-tonal image within selected document regions. Additional operations 1601 may be performed as part of the binarization operation 1006, or as part of the output of the bi-tonal image in operation 1012 of FIG. 10. Additional operations 1601 may be more applicable to operation 1012, where the document regions may be more clearly defined. Additional operations 1601 provide an improved bi-tonal image in which regions of the document may later undergo computer recognition techniques, such as OCR or handwriting recognition processing.
Additional operations 1601 are performed after a pixel on the grayscale image is selected in operation 1604. It is then determined whether the pixel is located within a document region in the image. A document region is an area of the document where information is expected to be located. For example, in the case of checks, the rectangular area where the MICR line is expected to be found may be a document region. Other document regions on a check may include, but are not limited to, the courtesy (numeric) amount 102, the legal (written) amount 104, the date 110, and the payee 108. If the pixel is not located within a document region, the default window and threshold for the document are selected for use in operation 1606.
At operation 1626, if the pixel is determined to be within a document region, a window lying within that document region is selected. The size of the window can be determined experimentally, but the window should not extend beyond the extent of the document region, to avoid capturing features of the document background within the window. Each document region may have its own respective window size. By limiting the window to the document region, background artifacts outside the relevant document region do not add noise to the binarization process. After the window size is selected, the window may be positioned such that the selected pixel lies near the center of the window.
In some embodiments, operation 1628 may further be included to select a threshold value specific to the document region containing the selected pixel. The threshold may be determined experimentally, and each document region may have its own respective threshold. Selecting a threshold per document region allows the binarization process to be tuned for machine processing of the information in that region. Some embodiments may select only one of the region-specific window or threshold, while other embodiments may select both the window and the threshold.
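Operations 1622-1628 amount to a lookup of region-specific parameters before the window statistics are computed. A minimal sketch follows, assuming a hypothetical table of regions; the coordinates, window sizes, and thresholds below are illustrative placeholders, not values from the disclosure:

```python
# Hypothetical region table: each document region (e.g. the MICR line on
# a check) carries its own experimentally chosen window size and threshold.
REGIONS = {
    "micr":   {"box": (0, 500, 1200, 560),   "win": (21, 9),  "T": 8.0},
    "amount": {"box": (900, 200, 1150, 260), "win": (15, 15), "T": 12.0},
}
DEFAULT = {"win": (15, 15), "T": 10.0}

def params_for_pixel(x, y):
    """Return (region name, window, threshold) for binarizing pixel (x, y).

    If the pixel falls inside a known document region, that region's
    window and threshold are used (operations 1626-1628); otherwise the
    document-wide defaults apply. A full implementation would also clip
    the window so it never extends past the region boundary."""
    for name, r in REGIONS.items():
        x0, y0, x1, y1 = r["box"]
        if x0 <= x < x1 and y0 <= y < y1:
            return name, r["win"], r["T"]
    return None, DEFAULT["win"], DEFAULT["T"]
```

A pixel inside the MICR band picks up the wide, short window tuned to MICR characters; anything else falls through to the defaults.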
With respect to the orientation correction operation 1008 in FIG. 10 described above, FIG. 17a is a flow chart illustrating a known method for correcting a document in an inverted orientation within an image. In particular, FIG. 17a shows a method 1700 for correcting a check with an inverted orientation within a bi-tonal image. FIG. 17b depicts an exemplary bi-tonal image of a check whose orientation is inverted. Those skilled in the art will appreciate that method 1700 may operate differently for other types of documents, such as deposit slips.
As described above, the geometric correction operation of FIG. 15 is one method for correcting a sideways document within a mobile image. However, even after such correction, the document may still be in an inverted (upside-down) orientation.
To correct the inverted orientation of such documents, some embodiments require that the image containing the document first be binarized. Accordingly, the orientation correction operation 1008 shown in FIG. 10 follows the binarization operation 1006.
Once the bi-tonal image of the check is received in operation 1702, the method 1700 reads the MICR line at the bottom of the image and generates a MICR confidence value in operation 1704. At operation 1706, this MICR confidence value (MC1) is compared to a threshold T to determine whether the check is right side up. If MC1 > T at operation 1708, the bi-tonal image of the check is right side up and is output at operation 1710.
However, if MC1 ≤ T at operation 1708, the image is rotated 180 degrees at operation 1712, the bottom MICR line is read again, and a new MICR confidence value (MC2) is generated. The image is rotated 180 degrees by methods known in the art. The rotated MICR confidence value (MC2) is compared to the previous MICR confidence value (MC1) plus a value Δ at operation 1714 to determine whether the check is now right side up. If MC2 > MC1 + Δ at operation 1716, the rotated bi-tonal image has the check right side up, and the rotated image is therefore output at operation 1718. Otherwise, if MC2 ≤ MC1 + Δ at operation 1716, the original bi-tonal image of the check is right side up, and it is output at operation 1710. Δ is an experimentally chosen positive value that reflects the higher prior probability that the check is initially right side up rather than upside down.
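Method 1700 reduces to two confidence readings and a biased comparison. The sketch below assumes hypothetical helpers `read_micr_confidence` (reads the MICR line at the bottom of a bi-tonal image and returns a confidence in [0, 1]) and `rotate_180`; the threshold T and bias Δ stand in for the experimentally chosen values:

```python
def correct_orientation(image, read_micr_confidence, rotate_180,
                        T=0.5, delta=0.1):
    """Sketch of method 1700. T and delta are placeholders for the
    experimentally chosen threshold and the bias toward the original
    (more probable) orientation."""
    mc1 = read_micr_confidence(image)      # operation 1704
    if mc1 > T:                            # operations 1706-1710
        return image                       # already right side up
    rotated = rotate_180(image)            # operation 1712
    mc2 = read_micr_confidence(rotated)
    # Operations 1714-1718: prefer the rotated image only if its MICR
    # reading is clearly better than the original's.
    return rotated if mc2 > mc1 + delta else image
```

With stub helpers over a two-state "image", an upright input is passed through unchanged, while an inverted input whose rotated reading is clearly stronger comes back rotated.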
Referring now to FIG. 17c, FIG. 17c is a flow chart illustrating an improved method for correcting a document with an inverted orientation within an image. In particular, FIG. 17c shows a method 1701 that provides faster operation and gives an indication when the MICR line of the document is not readable. Some of the steps of method 1701 may be performed in parallel to determine more quickly whether the image can be properly oriented. Another improvement over the method 1700 is the indication, based on the MICR line information, that the bi-tonal image cannot be properly oriented. Since the MICR line is used in subsequent processing (e.g., size correction and reading of financial account information), an indication that the MICR line cannot be read can be used to discard the image or to alert the user of the mobile device that the image is not accepted.
The method 1701 begins after a bi-tonal image of a check is received at operation 1722. The MICR line is then read at its expected location at the bottom of the bi-tonal image in operation 1724, yielding a MICR confidence value (MICR-Conf1). This confidence value is compared to a threshold T in operation 1726 to determine whether it exceeds the threshold. If it does, it is determined at operation 1732 that the original bi-tonal image of the check is right side up, and the original bi-tonal image of the check is output. Alternatively, operation 1732 may simply provide an indication by setting/clearing a flag associated with the bi-tonal image (e.g., the inverted flag may be cleared in operation 1732 to indicate that the image is properly oriented).
In operation 1728, the bi-tonal image is rotated 180 degrees, the MICR line at the bottom of the check is read again, and a new MICR confidence value (MICR-Conf2) is generated. The rotated MICR confidence value (MICR-Conf2) is compared to the threshold T at operation 1730 to determine whether it exceeds the threshold. If it does, it is determined at operation 1734 that the original bi-tonal image of the check is inverted, and the 180-degree-rotated bi-tonal image of the check is output. Alternatively, operation 1734 may simply provide an indication by setting/clearing a flag associated with the bi-tonal image (e.g., the inverted flag may be set in operation 1734 to indicate that the bi-tonal image is inverted).
To more quickly determine whether both MICR confidence readings are below the threshold T, operations 1724 and 1726 may be performed in parallel with operations 1728 and 1730. By performing the two operations in parallel, faster feedback may be provided from the server to the mobile device to indicate the suitability of the image provided from the mobile device. In addition, by comparing the MICR confidence value to a threshold in both directions, false positives can be avoided where it is unclear whether the image is oriented correctly.
If neither the MICR confidence value of the original image nor that of the rotated image is above the threshold, the orientation of the image is indicated as unknown in operation 1736. A flag may be set in association with the bi-tonal image to indicate that the orientation is unknown. In some embodiments, the bi-tonal image may then be provided to an alternative orientation correction module that relies on another feature of the document to correct orientation. If a readable MICR line is desired or required, the image can instead be discarded and the mobile device alerted that the image is unacceptable.
FIG. 17c also shows an alternative sequential embodiment that routes path 1740 through operation 1728. In this embodiment, the operation 1728 for rotating the image is performed only if the MICR confidence value for the original image is not above the threshold: the sequential embodiment proceeds along the "NO" path 1740 of the flowchart to operation 1728, where the bi-tonal image is rotated and the second MICR confidence value is generated.
Other embodiments may use the number of MICR characters that can be read as the MICR confidence value. For example, at operation 1724, the number of MICR characters read may be determined, and at operation 1726, the threshold may be the expected number of MICR characters, against which the actual number of characters read is compared.
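The character-count variant of method 1701, including the three possible outcomes of operations 1732, 1734, and 1736, can be sketched as follows. `read_micr_chars` and `rotate_180` are assumed helpers, and the `expected` count is illustrative (real checks vary):

```python
def orientation_by_char_count(image, read_micr_chars, rotate_180,
                              expected=30):
    """Sketch of method 1701 using the number of recognized MICR
    characters as the confidence value. Returns (oriented_image,
    status), where status is 'upright' (operation 1732), 'inverted'
    (operation 1734), or 'unknown' (operation 1736)."""
    # The two reads are independent and could run in parallel, as the
    # description notes; here they are sequential for simplicity.
    n1 = read_micr_chars(image)
    rotated = rotate_180(image)
    n2 = read_micr_chars(rotated)
    if n1 >= expected:
        return image, "upright"
    if n2 >= expected:
        return rotated, "inverted"
    # Neither reading reached the expected count: flag the image so it
    # can be discarded or routed to an alternative correction module.
    return image, "unknown"
```

When neither orientation yields a readable MICR line, the 'unknown' status lets the server reject the image or alert the mobile device, as described above.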
With respect to the size correction operation 1010 illustrated in FIG. 10, FIGS. 18a and 18b are flow charts illustrating example methods for size correction of an image. In particular, FIGS. 18a and 18b illustrate example methods 1800, 1801 for correcting the size of a check within a bi-tonal image, where the check is right side up. Those skilled in the art will appreciate that the methods 1800, 1801 may operate differently for other types of documents (e.g., deposit slips).
Since many image processing engines are sensitive to image size, it is important to correct the size of the document image before it can be processed correctly. For example, a form identification engine may depend on document size as an important feature for identifying the type of document being processed. Typically, for financial documents such as checks, the image size should be equal to that obtained from a standard scanner operating at 200 DPI.
Further, when the document is a check, the predetermined image size used during the geometric correction operation is 1200 × 560 pixels (see, e.g., FIG. 15), which is approximately the size of a personal check scanned at 200 DPI. However, the size of commercial checks tends to vary widely; most commercial checks are 8.75″ wide, which translates to 1750 pixels when scanned at 200 DPI. Therefore, in order to restore the correct size of commercial checks and other check types that have been geometrically corrected to the predetermined image size of 1200 × 560 pixels, a size correction operation is performed.
Referring now to FIG. 18a, after receiving a bi-tonal image containing a right-side-up check in operation 1802, the method 1800 reads the MICR line at the bottom of the check in operation 1804. This allows the method 1800 to calculate the average height of the MICR characters in operation 1806. The calculated average height is then compared in operation 1808 with the height of a MICR character at 200 DPI, and a scaling factor is calculated accordingly. The scaling factor SF is calculated as follows: SF = H200/AH (eq. 7), where AH is the computed average height of the MICR characters, and H200 is the corresponding "theoretical" height value at 200 DPI based on the ANSI X9.37 standard ("Specifications for Electronic Exchange of Check and Image Data").
The method 1800 uses the scaling factor in operation 1810 to determine whether the bi-tonal image of the check requires size correction. If the scaling factor SF is determined to be less than or equal to 1.0 + Δ, the method 1800 outputs the bi-tonal image of the check and the latest version of the grayscale image of the check in operation 1812. Here Δ defines the system's tolerance for incorrect image size.
However, if the scaling factor SF is determined to be above 1.0 + Δ, the preliminary dimensions of the check are calculated in operation 1811 as follows: AR = HS/WS (eq. 8), H′ = H × SF (eq. 9), W′ = H′/AR (eq. 10), where HS and WS are the height and width of the check snippet in the original image; AR is the check aspect ratio (it is desirable to maintain the aspect ratio while changing the size); H is the height of the image after geometric correction, before resizing; H′ is the preliminary adjusted check pixel height; and W′ is the preliminary adjusted check pixel width.
The preliminary height and width (H′ and W′) are then compared to the closest known check dimensions at 200 DPI to adjust the scaling factor. For example, if H′ and W′ are calculated to correspond to 2.48″ × 4.82″, these measurements are closest to the known check dimensions of 2.5″ × 5″. Since many check dimensions are multiples of 1/8 inch, alternative embodiments may simply round the estimated dimensions of the check to the nearest 1/8 inch to determine the closest known check dimensions. The scaling factor may then be adjusted as follows: AFH = HNK/H′ (eq. 11), AFW = WNK/W′ (eq. 12), H″ = AFH × H′ (eq. 13), W″ = AFW × W′ (eq. 14), where AFH and AFW are adjustment factors applied to the height and width, respectively; HNK and WNK are the closest known height and width, respectively; H″ is the final adjusted check pixel height; and W″ is the final adjusted check pixel width. Since separate adjustment factors are applied to the height and width, small errors in the aspect ratio of the image are also corrected.
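Equations 7-14 can be followed end to end in a short sketch. The value of H200 and the table of known check sizes below are illustrative stand-ins for the standard-derived values, not figures from the disclosure:

```python
H200 = 17.0  # illustrative MICR character height in pixels at 200 DPI;
             # the real value comes from the ANSI X9.37 standard.

def size_correction(avg_micr_height, hs, ws, h_geom,
                    known_sizes=((500, 1000), (732, 1750))):
    """Sketch of eqs. 7-14. avg_micr_height is the measured average
    MICR height (AH); hs, ws are the check snippet height/width in the
    original image; h_geom is the height of the geometrically corrected
    image. known_sizes is a hypothetical table of known check
    dimensions (height, width) in pixels at 200 DPI. Returns the final
    adjusted (H'', W'')."""
    sf = H200 / avg_micr_height            # eq. 7
    ar = hs / ws                           # eq. 8: aspect ratio
    h_prime = h_geom * sf                  # eq. 9: preliminary height
    w_prime = h_prime / ar                 # eq. 10: preliminary width
    # Snap to the closest known check size (eqs. 11-14).
    hnk, wnk = min(known_sizes,
                   key=lambda s: abs(s[0] - h_prime) + abs(s[1] - w_prime))
    afh, afw = hnk / h_prime, wnk / w_prime
    return afh * h_prime, afw * w_prime    # = (HNK, WNK) up to rounding
```

For a commercial check whose MICR characters measure 13 px in the 1200 × 560 geometrically corrected image, SF ≈ 1.31 and the preliminary size lands near the known 732 × 1750 dimensions, which the adjustment factors then snap to exactly.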
After the final scale is recalculated, operation 1814 repeats the geometric correction and binarization using the newly scaled check image. After these repeated operations, operation 1812 outputs the resulting bi-tonal image of the check and the grayscale image of the check.
Referring now to FIG. 18b, an example method 1801 for size correction of an image by scaling the image using separate height and width measurements is shown. Operations corresponding to those shown in FIG. 18a are numbered similarly. Method 1801 improves the scaling when the geometrically corrected image does not preserve the proportions of the original document. Instead of performing size correction based on the aspect ratio of the image, separate scaling factors for the height and width are calculated.
In operation 1807, the average height and width of the MICR characters are calculated. The calculated averages are compared in operation 1809 with the MICR character dimensions at 200 DPI, and scaling factors are calculated accordingly: SFH = H200/AH (eq. 7), SFW = W200/AW (eq. 15), where SFH and SFW are scaling factors applied to the height and width, respectively; AW is the computed average width of the MICR characters; and W200 is the corresponding "theoretical" width value at 200 DPI based on the ANSI X9.37 standard. The final height and width can then be calculated independently of each other (i.e., independent of the aspect ratio) as follows: H″ = H × SFH (eq. 16), W″ = W × SFW (eq. 17), where H and W are the height and width of the image after geometric correction.
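The independent scaling of method 1801 is a two-line computation. The H200 and W200 values below are illustrative placeholders for the standard-derived MICR character dimensions:

```python
H200 = 17.0  # illustrative MICR character height at 200 DPI
W200 = 13.0  # illustrative MICR character width at 200 DPI

def independent_scale(h_geom, w_geom, avg_h, avg_w):
    """Sketch of eqs. 7 and 15-17: height and width are scaled by
    separate factors, so an aspect-ratio error in the geometrically
    corrected image is corrected as well. avg_h and avg_w are the
    measured average MICR character height and width."""
    sfh = H200 / avg_h                  # eq. 7
    sfw = W200 / avg_w                  # eq. 15
    return h_geom * sfh, w_geom * sfw   # eqs. 16-17
```

If the MICR characters came out at the correct height but half the correct width, only the width is stretched, repairing the aspect ratio in the same step.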
As an alternative to calculating the average character dimensions in operations 1806 and 1807, greater accuracy may be obtained by using distances relative to the MICR symbols (e.g., the transit symbol 113 or the on-us symbol 115 shown in FIG. 1). Checks use standard distances between specific MICR symbols, or between a MICR symbol and the leading edge (opposite the right edge of the check) or bottom edge of the document. A scaling factor may be calculated using the theoretical distance specified by the standard and the distance measured in the image. Since these distances are greater than the width of a single MICR character, scaling using distances relative to the MICR symbols is less error-prone and thus more accurate.
As an example, in operation 1804, the MICR line is read to determine the distance between two transit symbols. The scaling factor SF is then calculated in operation 1808 or 1809 as follows: SF = TDist200/MD (eq. 18), where MD is the measured distance between the transit symbols and TDist200 is the corresponding distance according to the check MICR standard at 200 DPI. The scaling factor may be similarly calculated using the measured distance from the leading edge of the check to the first transit symbol and the corresponding distance at 200 DPI according to the standard. Some embodiments may use multiple such measurements in the check image to calculate multiple scaling factors, which may then be averaged into the scaling factor applied to the image.
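Equation 18, generalized to several measured distances and averaged as suggested above, can be sketched as follows. The distances used in the example are illustrative, not values from the MICR standard:

```python
def scale_from_symbol_distances(measured, standard):
    """Sketch of eq. 18 generalized: one scaling factor per measured
    MICR-symbol distance (SF = TDist200 / MD), averaged when several
    metrics are available. `measured` holds distances taken from the
    image; `standard` holds the corresponding 200 DPI distances from
    the applicable MICR placement standard."""
    factors = [std / md for std, md in zip(standard, measured)]
    return sum(factors) / len(factors)
```

A check captured at half scale would yield measured distances of half the standard values, so every per-metric factor, and hence the average, is 2.0.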
The term module as used herein may describe a designated unit of functionality that may be performed. Modules used herein may be implemented using any form of hardware, software, or combination thereof. For example, one or more processors, controllers, ASICs, PLAs, logic elements, software approaches, or other mechanisms may be used to construct a module. In implementation, the different modules described herein may be implemented as discrete modules or the functions or features described may be partially or fully shared among one or more modules. In other words, after reading this description, it will be apparent to those skilled in the art that the different features and functions described herein may be implemented in any given application, and in different combinations and permutations with one or more separate or shared modules. Even though different features or functional elements may be described or claimed separately as separate modules, those skilled in the art will appreciate that such features and functions may be shared among one or more common software or hardware elements, and that such description does not require or imply that separate hardware or software components are used to implement such features or functions.
Where components or modules of the processes described herein are implemented in whole or in part using software, in one embodiment these software elements can be implemented to operate with a computing or processing module capable of carrying out the functionality described with respect to them. One such exemplary computing module is shown in FIG. 19. Various embodiments are described in terms of this exemplary computing module 1900. After reading this description, it will become apparent to a person skilled in the relevant art how to implement the invention using other computing modules or structures.
Referring to FIG. 19, computing module 1900 may represent, for example, computing or processing capabilities found within desktop, laptop, and notebook computers; mainframe computers, supercomputers, workstations, or servers; or any other type of special-purpose or general-purpose computing device as may be appropriate for a given application or environment. Computing module 1900 may also represent computing capabilities embedded within, or otherwise available to, a given device; for example, a computing module may be found in other electronic devices. A computing module may include one or more processors or processing devices, such as a processor 1904. The processor 1904 may be implemented using a general-purpose or special-purpose processing engine such as, for example, a microprocessor, controller, or other control logic.
Computing module 1900 may also include one or more memory modules (referred to as main memory 1908). For example, Random Access Memory (RAM) or other dynamic memory may be used to store information and instructions to be executed by the processor 1904. Main memory 1908 also may be used for storing temporary variables or other intermediate information during execution of instructions by processor 1904. Computing module 1900 may also include a read only memory ("ROM") or other static storage device coupled to bus 1903 for storing static information and instructions for processor 1904.
The computing module 1900 may also include one or more forms of information storage mechanism 1910, which may include, for example, a media drive 1912 and a storage unit interface 1920. The media drive 1912 may include a drive or other mechanism that supports fixed or removable storage media 1914, such as a hard disk drive, a floppy disk drive, a magnetic tape drive, an optical disk drive, a CD or DVD drive, or another removable or fixed media drive. Accordingly, storage media 1914 may include, for example, a hard disk, floppy disk, magnetic tape, cassette, optical disk, CD or DVD, or other fixed or removable medium that is read by, written to, or accessed by the media drive 1912. As these examples illustrate, the storage media 1914 may include a computer-usable storage medium having particular computer software or data stored therein.
In alternative embodiments, information storage mechanism 1910 may include other similar means for allowing computer programs or other instructions or data to be loaded into computing module 1900. Such means may include, for example, a fixed or removable storage unit 1922 and an interface 1920. Examples of such storage units 1922 and interfaces 1920 include program cartridge and cartridge interfaces, removable memory (e.g., flash memory or other removable storage modules) and memory slots, PCMCIA slots and cards, and other fixed or removable storage units 1922 and interfaces 1920 that allow software or data to be transferred from the storage unit 1922 to the computing module 1900.
Computing module 1900 may also include a communications interface 1924. Communications interface 1924 may be used to allow software and data to be transferred between computing module 1900 and external devices. Examples of communications interface 1924 may include a modem or softmodem, a network interface (such as an Ethernet, network interface card, WiMedia, IEEE 802.XX, or other interface), a communications port (such as a USB port, IR port, RS232 port, Bluetooth® interface, or other port), or another communications interface. Software and data transferred via communications interface 1924 are typically carried on signals, which may be electrical, electromagnetic (including optical), or other signals capable of being exchanged over a given communications interface 1924. These signals may be provided to communications interface 1924 via a channel 1928. This channel may carry the signals and may be implemented using a wired or wireless communication medium. The signals may transfer software and data from a memory or other storage medium in one computing system to a memory or other storage medium in computing module 1900. Some examples of a channel include a telephone line, a cellular link, an RF link, an optical link, a network interface, a local or wide area network, and other wired or wireless communication channels.
The terms "computer program medium" and "computer usable medium" are used herein to refer generally to physical storage media such as memory 1908, storage unit 1922, and media 1914. These and other various forms of computer program media or computer usable media may be involved in carrying one or more sequences of one or more instructions to a processing device for execution. Such instructions embodied on the medium are generally referred to as "computer program code" or a "computer program product" (which may be grouped in the form of computer programs or other groupings). Such instructions, when executed, may enable the computing module 1900 to perform the features or functions discussed herein.
While various embodiments have been described above, it should be understood that they have been presented by way of example only, and not limitation. The breadth and scope of the present invention should not be limited by any of the above-described exemplary embodiments. The techniques to which this document refers encompass those apparent or known to the skilled artisan now or at any time in the future. Further, the invention is not limited to the exemplary constructions or configurations shown; various alternative constructions and configurations may be used to achieve the desired characteristics, and numerous modifications of the illustrated embodiments will become apparent to those skilled in the art upon reading the present disclosure. Those skilled in the art will also understand how to implement the desired features using alternative functional, logical, or physical partitioning and configurations.
Furthermore, although items, elements or components may be described or claimed in the singular, the plural is contemplated to be within the scope thereof unless limitation to the singular is explicitly stated. The presence of broadening words and phrases such as "one or more," "at least," "not limited to" or other phrases in some instances shall not be read to mean that the narrower case is intended or required in instances where such broadening phrases may be present. Although exemplary embodiments have been described herein, it should be understood that the invention is not limited to the disclosed embodiments. The invention is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims, which scope is to be accorded an interpretation that encompasses all such modifications and equivalent structures and functions.
Claims (17)
1. A system for image capture and processing of financial instruments, comprising:
a mobile device, the mobile device comprising:
an image capture device configured to capture a color image of a financial instrument;
a processor configured to generate a color reduced image; and
a transmitter configured to transmit the color reduced image to a server,
the server is configured to receive the color reduction from the mobile device
An image and is used to detect the financial instrument in the color reduced image, perform geometric correction on the color reduced image, binarize the color reduced image to produce a bi-tonal image, correct the orientation of the bi-tonal image, and correct the size of the bi-tonal image.
2. The system of claim 1, wherein the server is further configured to detect the financial instrument in the color reduced image by: converting the color reduced image into a smaller color reduced image; and detecting a document corner position of the financial document in the smaller color reduced image.
3. The system of claim 2, wherein the color reduced image is a grayscale image.
4. The system of claim 1, wherein the server is further configured to correct an orientation of the color reduced image and correct a size of the color reduced image.
5. The system of claim 1, wherein the server is further configured to binarize the geometrically corrected image to produce a bi-tonal image by:
selecting a pixel on the grayscale image;
determining whether the selected pixel is within the document region, and if the selected pixel is within the document region:
selecting a window within the document region; and
calculating a mean and a standard deviation of the selected pixels over the window;
determining whether the standard deviation is too small, and if the standard deviation is too small,
converting the selected pixel to white and, if the standard deviation is not too small,
converting the selected pixel to black or white based on its intensity; and
selecting another pixel and repeating the calculating and determining steps until there are no remaining pixels to select.
7. The system of claim 5, wherein the window is selected such that the window does not extend beyond the document region, to avoid capturing features of the document background within the window.
7. The system of claim 5, wherein a threshold is selected if the selected pixel is within the document region, and wherein the determining utilizes the threshold to determine whether the standard deviation is too small.
8. The system of claim 5, wherein the financial instrument is a check and the document region may be selected from the group consisting of a MICR line, a courtesy amount, a legal amount, a date, a signature, and a payee.
9. The system of claim 1, wherein the financial instrument is a check, and wherein the server is further configured to correct the orientation of the two-tone image by:
reading the MICR line on the bottom of the financial instrument;
generating MICR confidence values for the read MICR lines;
comparing the MICR confidence value to a threshold value;
when the MICR confidence value exceeds the threshold: determining that the two-tone image is right side up; and
when the MICR confidence value does not exceed the threshold:
determining that the two-tone image is not right side up;
rotating the image 180 degrees,
re-reading the MICR line,
generating a new MICR confidence value, and
comparing the new MICR confidence value to the threshold,
determining that the rotated two-tone image is right side up when the new MICR confidence value exceeds the threshold.
10. The system of claim 9, wherein, when the new MICR confidence value does not exceed the threshold, the server is further configured to indicate that the orientation of the image is unknown.
11. The system of claim 1, wherein the financial instrument is a check, and wherein the server is further configured to correct the size of the bi-tonal image by:
reading the MICR line on the bottom of the financial instrument;
calculating an average height of the MICR characters;
calculating a scaling factor based on the average height of the MICR characters and a desired height for the selected DPI;
when the scale factor is not greater than a threshold, outputting the two-tone image; and
when the scaling factor is greater than a threshold: calculating the size of the check based on the scaling factor; and repeating the geometric correction and the binarization, and outputting the two-tone image.
12. The system of claim 11, wherein calculating the size of the check further comprises comparing the calculated size to a desired size; adjusting a scaling factor based on the desired size to provide an adjusted scaling factor; and recalculating the size of the check based on the adjusted scaling factor.
13. The system of claim 12, wherein the desired size is any of a known check size or a size based on a multiple of 1/8 inches.
14. The system of claim 1, wherein the financial instrument is a check, and wherein the server is further configured to correct the size of the bi-tonal image by:
reading the MICR line on the bottom of the financial instrument;
calculating an average height and an average width of the MICR characters;
calculating a height scaling factor based on the average height of the MICR characters and a desired height for the selected DPI;
calculating a width scaling factor based on the average width of the MICR characters and a desired width for the selected DPI;
outputting the bi-tonal image when both the height scaling factor and the width scaling factor are not greater than a threshold; and
when both the height scaling factor and the width scaling factor are greater than the threshold: calculating a size of the check based on the height scaling factor and the width scaling factor, scaling the width by the width scaling factor and the height by the height scaling factor; repeating the geometric correction and the binarization; and outputting the bi-tonal image.
15. The system of claim 1, wherein the financial instrument is a check, and wherein the server is further configured to correct the size of the bi-tonal image by:
reading the MICR line at the bottom of the financial instrument;
determining the positions of MICR symbols;
calculating a scaling factor based on a distance relative to at least one MICR symbol and a desired distance for the selected DPI;
when the scaling factor is not greater than a threshold, outputting the bi-tonal image; and
when the scaling factor is greater than the threshold: calculating the size of the check based on the scaling factor; repeating the geometric correction and the binarization; and outputting the bi-tonal image.
16. The system of claim 15, wherein the distance relative to a MICR symbol is a distance between two transit MICR symbols.
17. The system of claim 15, wherein the distance relative to a MICR symbol is a distance between a leading edge of the document and at least one transit MICR symbol.
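Claims 15 through 17 derive the scaling factor from distances involving MICR symbols rather than character height. A minimal sketch follows; the 1.25-inch expected span between the two transit symbols (nine routing digits at the nominal 0.125-inch E-13B character pitch), the tolerance, and all names are assumptions for illustration, not values from the patent.

```python
# Illustrative sketch of size correction from MICR symbol spacing
# (claims 15-17). The 1.25 in expected span between the two transit
# symbols and the 2% tolerance are assumptions, not patent values.

def symbol_distance_scaling(transit_x1_px, transit_x2_px,
                            dpi=200, expected_span_in=1.25):
    """Scaling factor from the measured pixel distance between two
    transit symbols and the span expected at the selected DPI."""
    measured_px = abs(transit_x2_px - transit_x1_px)
    desired_px = expected_span_in * dpi
    return desired_px / measured_px

def needs_rescale(factor, tolerance=0.02):
    """True when the factor deviates enough from 1.0 that geometric
    correction and binarization should be repeated at the new size."""
    return abs(factor - 1.0) > tolerance
```

The claim-17 variant would substitute the distance from the document's leading edge to a transit symbol for the symbol-to-symbol span, with a correspondingly different expected distance.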
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US12/940,739 | 2010-11-05 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| HK1171850A (en) | 2013-04-05 |
Similar Documents
| Publication | Title |
|---|---|
| US8995012B2 (en) | System for mobile image capture and processing of financial documents |
| US7953268B2 (en) | Methods for mobile image capture and processing of documents |
| US11599861B2 (en) | Systems and methods for mobile automated clearing house enrollment |
| US11676285B1 (en) | System, computing device, and method for document detection |
| US12008827B2 (en) | Systems and methods for developing and verifying image processing standards for mobile deposit |
| US20230386239A1 (en) | Mobile image quality assurance in mobile document image processing applications |
| US8577118B2 (en) | Systems for mobile image capture and remittance processing |
| US8483473B2 (en) | Systems and methods for obtaining financial offers using mobile image capture |
| US20130148862A1 (en) | Systems and methods for obtaining financial offers using mobile image capture |
| EP2014082A1 (en) | Generating a bitonal image from a scanned colour image |
| HK1171850A (en) | System for mobile image capture and processing of financial documents |
| CA2648054C (en) | Apparatus and method for detection and analysis of imagery |