
HK1109670A - Photographic document imaging system - Google Patents


Info

Publication number
HK1109670A
Authority
HK
Hong Kong
Prior art keywords
imaged document
document
captured image
imaged
graphical information
Prior art date
Application number
HK08103751.5A
Other languages
Chinese (zh)
Inventor
小爱德华‧P.‧希尼
紮查理‧安德里
紮查理亚‧克里格
詹姆斯‧达里尼安
库尔特‧拉佩尔杰
威廉姆‧J.‧阿达姆斯
紮查理‧B.‧多德斯
Original Assignee
计算机连接管理中心公司
Priority date
Filing date
Publication date
Application filed by 计算机连接管理中心公司 filed Critical 计算机连接管理中心公司
Publication of HK1109670A

Abstract

An apparatus and method for processing a captured image and, more particularly, for processing a captured image comprising a document. In one embodiment, an apparatus comprising a camera to capture documents is described. In another embodiment, a method for processing a captured image that includes a document is described, comprising the steps of distinguishing an imaged document from its background, adjusting the captured image to reduce distortions created by use of a camera, and properly orienting the document.

Description

Photographic document imaging system
Technical Field
Apparatus and method for processing a captured image, in particular for processing a captured image comprising a document.
Background
FIG. 1A is a block diagram depicting typical components of a scanner. Scanners are commonly used to capture images of documents 110. The document 110 is placed on a scanner board 112. A scan head 120, which typically includes an optical subsystem 122 and a charge-coupled device ("CCD") 124, moves over the document 110. Although FIG. 1A depicts only a two-dimensional view, the scan head 120 may be moved in the direction shown by arrow 114 and in a direction perpendicular to the document 110. The optical subsystem 122 focuses the light reflected from the document 110 onto the CCD 124. The CCD 124 is often implemented as a two-dimensional array of photosensitive capacitive elements. When light is incident on the photosensitive elements of the CCD 124, charges are trapped in the depletion regions of the semiconductor elements. The amount of charge associated with a photosensitive capacitive element is related to the intensity of the light incident on that element during a sampling period. Accordingly, an image is captured by sampling the elements to determine the intensity of the incident light on each photosensitive capacitive element. The analog information produced by the photosensitive capacitive elements is converted to digital information by an analog-to-digital (A/D) converter 130. The A/D converter 130 may convert the analog information received from the CCD 124 in either a serial or a parallel manner. The converted digital information may be stored in the memory 140. The digital information is then processed by the processor 150 in accordance with control software stored in the ROM 180. The user can control the scan parameters through the user interface 170, and the scanned image is output through the output port 160.
A block diagram of a digital camera is depicted in FIG. 1B. As in a scanner, the optical subsystem 122 of the digital camera may be used to focus light reflected from the document 110 onto a CCD 124. Some digital cameras use devices other than CCDs, such as CMOS sensors, to capture the light reflected from an image. Unlike in a scanner, however, the optical subsystem 122 of a digital camera does not move along the surface of the document. Instead, the optical subsystem 122 is generally fixed relative to the object (e.g., a document) whose image is to be acquired. In addition to digital cameras, photographs captured with film cameras may also be digitized.
Cameras have significant advantages over scanners when used to capture document images and other images. For example, cameras are generally more portable than scanners. In addition, because scanners require that the imaged object be placed on the scanner plate, cameras are able to capture a wider range of images than scanners. However, capturing images with a camera presents difficulties that do not exist when using a scanner. For example, lighting conditions vary when using a camera, whereas in a scanner lighting conditions are generally controlled. In addition, using a camera can produce image distortions, which can depend on variables such as the angle of the camera relative to the image, the lens used by the camera and its distance from the image, whether the document being imaged lies on a flat or curved surface, and other factors. These distortions generally do not arise in scanners, because a scanner's moving scan head remains a fixed distance from the document being imaged.
Accordingly, there is a need for an apparatus and method for capturing an image of a document that retains the advantages of cameras over scanners while reducing the difficulties that cameras, unlike scanners, present when capturing document images.
Disclosure of Invention
An apparatus and method for processing a captured image including an imaged document are described. In one embodiment, the apparatus includes a stationary camera used to capture the imaged document. In another embodiment, the imaged document is captured using a non-stationary camera. In yet another embodiment, a method for processing a captured image including a document comprises distinguishing the imaged document from its background, adjusting the captured image to reduce distortion due to the use of a camera, and correctly orienting the document.
Drawings
FIG. 1A depicts a prior-art document scanner.
FIG. 1B depicts a prior-art digital camera.
FIG. 2 depicts a general flow diagram of a method for processing a captured image.
FIG. 3 depicts a flow diagram of another embodiment of a method for processing a captured image.
FIG. 4 depicts a flow diagram of a method of performing segmentation in accordance with one of the embodiments of the method of imaging a document described herein.
FIG. 5 depicts a flowchart of one method of performing the random sample consensus step shown in FIG. 4.
Fig. 6 depicts a flow chart of one method of performing the outlier removal step shown in fig. 4.
FIG. 7 depicts a flow diagram of another method of performing segmentation in accordance with the method of imaging a document described herein.
Fig. 8 depicts a flowchart of one method of performing the distortion removal steps shown in fig. 2 and 3.
FIG. 9 depicts a flowchart of one method of performing the "lines of text" step shown in FIG. 3.
FIG. 10 depicts a flowchart of one method of determining whether a document is properly oriented upright according to one embodiment of the method of imaging a document described herein.
FIG. 11 depicts one embodiment of an apparatus for capturing and processing an image including an imaged document.
FIG. 12 depicts a flowchart of one method of determining whether a document is upright according to one embodiment of the method of imaging a document described herein.
FIG. 13 depicts one embodiment of a system for processing a captured image.
Detailed Description
Embodiments described herein are directed to processing camera-captured images that include a document. The embodiments first identify the captured document image and separate it from its background. After the captured document image is isolated from its background, its distortion is reduced or removed. After the captured document image has been corrected, it is rotated to its correct orientation. In addition, the embodiments described herein provide the user with an assessment of how successfully each of these steps was performed.
FIG. 2 depicts a general flow diagram of a method for processing a captured image. After start 210, an image is received 220. Images may be received from a variety of sources. For example, in one embodiment, the image may be received from a digital camera. In another embodiment, the image may be received from a stationary unit comprising a digital camera. In yet another embodiment, the image may be received from a film photograph that has been digitized. The received image 220 includes a document image. Step 230 is used to identify the captured document image from the rest of the image, or background. Step 230 is referred to as segmentation. This step 230 may be used to detect edges of the captured image document. This step 230 may also be used to crop the background of the image from the captured document image so that the document is separated from its background. A step 240, referred to as distortion removal, is used to reduce or remove distortion of the captured document image. Some of the distortions that this step 240 can correct are perspective distortion, lens distortion, and optical distortion. Other distortions may also be corrected in this step 240. Step 250 is for correcting the orientation of the document. This step 250 may be used to determine whether the captured document image should be portrait or landscape and rotate the captured document image accordingly. This step 250 may also be used to determine if the captured document image is inverted and rotate the captured document image accordingly. In step 260, the processed document image is output. The processed document image may be output 260 by various means, such as displaying the processed document image on a monitor, saving the processed document image to a computer file, electronically transmitting the document image, or printing the processed document image.
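The flow of FIG. 2 can be sketched as a small pipeline. The stage functions below are illustrative stand-ins (the detailed embodiments of FIGS. 4 through 12 describe concrete implementations), and the function names and interfaces are assumptions, not from the patent.

```python
def process_captured_image(image, segment, remove_distortion, orient, output):
    """Illustrative sketch of the FIG. 2 flow (steps 220-260).

    Each stage is passed in as a callable so the pipeline itself stays
    implementation-neutral; the interfaces are assumptions for this sketch.
    """
    document = segment(image)               # step 230: isolate document from background
    document = remove_distortion(document)  # step 240: perspective/lens/curvature fixes
    document = orient(document)             # step 250: landscape/portrait and upright
    return output(document)                 # step 260: display, save, transmit or print
```

With identity functions for each stage, the pipeline simply passes the image through, which keeps the control flow easy to exercise in isolation.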
FIG. 3 depicts a flow diagram 300 of another embodiment of a method for processing a captured image. After the start 205, an image is received 310. In step 315, the received image is converted to a device-independent bitmap. In step 320, segmentation is performed using an edge-based segmentation process. The edge-based segmentation 320 identifies edges of the captured document image to distinguish it from its background.
FIG. 4 depicts a flow diagram of one embodiment of edge-based segmentation 320. In this embodiment, horizontal and vertical edge points are located by searching for edge points, which are determined by identifying portions of the received image that contain transitions between the background portion and the document portion of the received image. In one embodiment, the received image is scanned 410 starting from the center of the received image, and is also scanned 420 starting from the border of the received image. In one embodiment, it is assumed that the document image occupies the center of the received image. In another embodiment, it is assumed that the pixel density of the non-text portion of the captured document image is greater than the pixel density of its background. In the scan 410 starting from the center of the received image, once a region that can be identified as document pixels is found, the scan continues until a transition to background pixels is located. In the scan 420 beginning from the boundary of the received image, an area is first identified as background pixels and a transition to document pixels is then located. This process may be performed using either or both of the scans 410, 420. In one embodiment, the received image is scanned 410, 420 in both the horizontal and vertical directions.
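A minimal sketch of the border-inward scan 420 for a single pixel row follows, under the patent's stated assumption that document pixels are denser (brighter) than background pixels; the threshold value and the function name are illustrative assumptions.

```python
def find_edge_points_in_row(row, threshold=128):
    """Scan one row of grayscale pixels from both borders inward (step 420).

    A pixel above `threshold` is treated as a document pixel, following the
    patent's assumption that the document is denser than its background; the
    cutoff of 128 is an illustrative choice. Returns the (left, right)
    column indices where background transitions to document, or None if the
    row contains no document pixels.
    """
    document_cols = [i for i, p in enumerate(row) if p > threshold]
    if not document_cols:
        return None
    return document_cols[0], document_cols[-1]
```

Running the scan over every row (and, transposed, over every column) yields the horizontal and vertical edge-point sets that the later steps consume.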
Then, a random sample consensus step 430 is performed. FIG. 5 depicts one embodiment of the random sample consensus step. In this embodiment, random sample consensus 430 is performed by randomly selecting two points, per step 510, from the edge points found in steps 410 and 420. Then, per step 520, the line connecting the two randomly selected points is calculated. In one embodiment, an angle-distance coordinate system is used, where the angle value corresponds to the angle of the line segment about the center of the received image and the distance value corresponds to the distance from the center of the received image to the nearest point on the line segment. In other embodiments, other coordinate systems may be used, including Cartesian or polar coordinates. These values are then stored. The process of selecting two random points from the edge points obtained in steps 410 and 420 is repeated to obtain a sufficient set of samples 530. In one embodiment, this process is repeated five thousand times, although different sample sizes may be used. After sampling, all pairs of points lying on the same line are grouped into bins. If the initial edge points selected in steps 410 and 420 accurately represent the edges of the document in the received image, approximately one-quarter of the sampled pairs will be concentrated in four small regions corresponding to the four document edges, while the remaining pairs will be distributed evenly over the rest of the possible coordinates. The four bins containing the most line segments 540, provided each meets a minimum threshold of grouped line segments, are identified as representing the four edges of the document in the received image 550. In one embodiment, the grouped line segments are then assigned as the left, right, top, and bottom edges based on their relative positions in the received image.
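The sampling-and-binning procedure of FIG. 5 might be sketched as below. The quantization step sizes and the use of a signed normal to keep opposite document edges in separate bins are illustrative choices; the patent specifies only the angle-distance coordinates and the binning.

```python
import math
import random
from collections import Counter

def line_bin(p1, p2, center, angle_step=2, dist_step=5):
    """Quantize the line through p1 and p2 into an (angle, distance) bin
    measured about the image center (step 520). The normal is oriented
    away from the center so that opposite document edges fall into
    different bins."""
    (x1, y1), (x2, y2) = p1, p2
    dx, dy = x2 - x1, y2 - y1
    norm = math.hypot(dx, dy)
    nx, ny = -dy / norm, dx / norm                 # unit normal of the line
    dist = (x1 - center[0]) * nx + (y1 - center[1]) * ny
    if dist < 0:
        nx, ny, dist = -nx, -ny, -dist
    angle = math.degrees(math.atan2(ny, nx)) % 360
    return round(angle / angle_step), round(dist / dist_step)

def consensus_lines(edge_points, center, samples=5000, top=4, seed=0):
    """Steps 510-550: repeatedly sample two edge points, bin the line
    through them, and return the `top` most heavily populated bins,
    which should correspond to the four document edges."""
    rng = random.Random(seed)
    bins = Counter()
    for _ in range(samples):
        p1, p2 = rng.sample(edge_points, 2)
        if p1 != p2:                               # skip degenerate pairs
            bins[line_bin(p1, p2, center)] += 1
    return [b for b, _ in bins.most_common(top)]
```

For edge points scattered along the four sides of a rectangle, the four winning bins correspond to the rectangle's sides; the minimum-count threshold of step 550 would reject cases where no clear consensus emerges.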
In one embodiment, after the random sample consensus step 430 is performed, an outlier elimination step 440 is performed on each set of edge points to further refine the identification of the document edges. In one embodiment, depicted in FIG. 6, this is done by performing a linear regression on the set of edge points corresponding to one edge of the received document image. In the linear regression technique, a straight line is fitted to most accurately connect the set of edge points 610. If the point farthest from the linear regression line is determined to be sufficiently far from it 620, that point is deleted 630 and a new linear regression is performed. This process is repeated until the point farthest from the linear regression line is within a threshold distance, and the resulting linear regression line is taken as the edge line. This is performed for each of the four sets of edge points representing the four edges of the received document image.
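The iterative trim of FIG. 6 might look like the following for a roughly horizontal edge (a vertical edge would fit x as a function of y instead); the residual threshold and the closed-form least-squares fit are illustrative choices.

```python
def trim_edge_points(points, max_residual=2.0):
    """FIG. 6: fit a least-squares line y = a*x + b to one edge's points,
    and while the farthest point lies more than `max_residual` from the
    line, delete it (step 630) and refit. Returns the surviving points
    and the final (a, b). Assumes at least three points with non-identical
    x coordinates, i.e. a roughly horizontal edge."""
    pts = list(points)
    while True:
        n = len(pts)
        sx = sum(x for x, _ in pts)
        sy = sum(y for _, y in pts)
        sxx = sum(x * x for x, _ in pts)
        sxy = sum(x * y for x, y in pts)
        a = (n * sxy - sx * sy) / (n * sxx - sx * sx)   # slope
        b = (sy - a * sx) / n                           # intercept
        worst = max(pts, key=lambda p: abs(p[1] - (a * p[0] + b)))
        if abs(worst[1] - (a * worst[0] + b)) <= max_residual or len(pts) <= 3:
            return pts, (a, b)
        pts.remove(worst)
```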
Referring back to FIG. 3, in step 325, the accuracy of the identified edge lines, and thus of the edge-based segmentation 320, is calculated. This step 325 may be referred to as the calculation of confidence. In one embodiment, a confidence is calculated for each edge of the received document image, and the lowest value is taken as the overall confidence. In another embodiment, the highest confidence value among the edge lines is taken as the overall confidence. In yet another embodiment, a combination of the confidences of the edge lines, e.g., their average, is used to determine the overall confidence. One embodiment of calculating the confidence of a particular edge line is to compute the ratio between the number of pixels remaining in the edge's point set after outlier elimination 440 and the total number of pixels originally found for that edge. The confidence determination may be used to improve the distortion removal 240, 350 of the received document image, and may also be used to inform the user how accurately the system has performed on a particular received image. In step 330, if the confidence in the edge-based segmentation step 320 is not sufficiently high, the content-based segmentation of step 335 is performed.
A content-based segmentation step 335, one embodiment of which is depicted in FIG. 7, identifies the text of the captured document image and calculates the edges of the captured document image relative to that text. This is accomplished by identifying connected components in the received document image 710 and finding the nearest neighbors of these components 720. Connected components generally refer to black or dark pixels that are adjacent to each other. These adjacent pixels are then connected into lines 730, and the lines 730 are then used to determine the boundaries 740 of the text. Margins are added 750 beyond these boundaries to identify the locations of the edges of the received document image. Although the size of the margins may vary, in one embodiment standard margins are added in step 750.
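As a simplified sketch of steps 740 and 750, the following takes a one-bit bitmap (1 marking a dark text pixel), finds the bounding box of all text pixels, and pads it with a fixed margin. A fuller implementation would first group pixels into connected components and text lines (steps 710 through 730); the margin size here stands in for the patent's unspecified "standard margins".

```python
def content_bounds(bitmap, margin=2):
    """Steps 740-750 (simplified): bounding box of text pixels plus a
    fixed margin, clamped to the image. `bitmap` is a list of rows of
    0/1 values with 1 marking a text pixel; returns (top, left, bottom,
    right) row/column indices, or None if no text is found."""
    text_rows = [r for r, line in enumerate(bitmap) if any(line)]
    if not text_rows:
        return None
    text_cols = [c for c in range(len(bitmap[0]))
                 if any(line[c] for line in bitmap)]
    return (max(text_rows[0] - margin, 0),
            max(text_cols[0] - margin, 0),
            min(text_rows[-1] + margin, len(bitmap) - 1),
            min(text_cols[-1] + margin, len(bitmap[0]) - 1))
```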
In step 340, corners of the captured document image are computed. In one embodiment, the corners are calculated from the intersection of the edge lines.
The distortion removal 240, 350 steps may involve various adjustments to the received image. In one embodiment, distortion removal 240, 350 adjusts the received document image to correct for perspective distortion in the received image. For example, when the picture is taken at an angle rather than centered directly above the document, there will be perspective distortion in the received document image.
One embodiment for adjusting an image to correct for perspective distortion is depicted in FIG. 8. This embodiment involves mapping a set of image coordinates 810, e.g., (x, y), to a new set of image coordinates, e.g., (u, v). After the segmentation steps 230, 320, 335, the four corners of the document are determined per step 340. Typically, in a document containing perspective distortion, these four corners will form a trapezoid, whereas the document should generally have a rectangular shape. As such, in one embodiment, the mapping 810 is performed between the received trapezoid and the desired rectangle. One embodiment performs this mapping 810 using a homogeneous matrix representing the transformation from distorted pixel coordinates to non-distorted pixel coordinates. The transformation may be calculated by comparing the four corners determined during segmentation 230, 320, 335 with the corrected dimensions of the non-distorted document image. In one embodiment, the transformation need not be computed at every pixel: it may be computed once for each line, with linear interpolation supplying the new coordinates of the intermediate pixels. After the new coordinates corresponding to the document with reduced perspective distortion have been mapped, the pixels are resampled 815.
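The corner-to-corner transformation can be sketched with the standard direct linear method for a homogeneous (projective) transform, fixing the matrix's bottom-right entry to 1. This is a common formulation and an assumption here, as the patent does not spell out the solver.

```python
import numpy as np

def homography_from_corners(src, dst):
    """Solve for the 3x3 homogeneous matrix H that maps the four corners
    of the distorted document (`src`) onto the corners of the corrected
    rectangle (`dst`), as in step 810. Each correspondence contributes
    two linear equations in the eight unknowns of H (with H[2][2] = 1)."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); rhs.append(u)
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y]); rhs.append(v)
    h = np.linalg.solve(np.array(rows, float), np.array(rhs, float))
    return np.append(h, 1.0).reshape(3, 3)

def apply_homography(H, x, y):
    """Map one pixel coordinate through H; the division by w is what
    makes the transform projective rather than merely affine."""
    u, v, w = H @ np.array([x, y, 1.0])
    return u / w, v / w
```

Each trapezoid corner then lands exactly on its rectangle corner, and the interior pixels are resampled (step 815), optionally one line at a time with linear interpolation as described above.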
Another aspect of the received image that may be adjusted in the distortion removal 240, 350 step is distortion caused by the camera lens 820. Distortion caused by the camera lens may bend straight lines. This distortion depends on the particular lens used and the distance of the camera from the captured image. The curvature caused by lens distortion is generally radial, so the lens distortion can be adjusted uniformly and radially using a parameter approximating the degree of lens distortion. This parameter may be calculated by the system or may be input by the user.
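A single-parameter radial correction can be sketched as below. The r²-proportional model is a common lens-distortion approximation and is an assumption here; the patent states only that the adjustment is uniform, radial, and governed by one parameter.

```python
def adjust_radial(x, y, cx, cy, k):
    """Shift a pixel along its radius from the optical center (cx, cy)
    by a factor that grows with the squared radius (step 820). Positive
    k pushes pixels outward in this convention; k = 0 leaves the image
    unchanged."""
    dx, dy = x - cx, y - cy
    scale = 1.0 + k * (dx * dx + dy * dy)
    return cx + dx * scale, cy + dy * scale
```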
Yet another aspect of the received image that may be adjusted in the distortion removal 240, 350 step is distortion caused by the document not being perfectly flat. For example, if the imaged document is a page in a book, the page may have a curvature that creates distortion when captured photographically. This distortion may likewise be corrected in the distortion removal steps 240, 350. Other distortions may also be corrected, and the description herein of particular types of distortion is not intended to limit the types of distortion that may be reduced or removed.
In step 365, a thresholding process is performed on the image generated in step 360. Thresholding 365 reduces the color depth of the image, with the potential advantage of reducing distortion from a flash that may be used when capturing the image. In one embodiment, the thresholding process 365 reduces twenty-four-bit color images to one-bit black-and-white images. Reducing the image to black and white both diminishes the effects produced by the camera's flash and reduces the amount of information the system 300 must process. The thresholding 365 can be performed in a number of ways. One embodiment may utilize presently known dithering techniques. Examples of dithering techniques can be found in existing imaging software, such as the SNOWBOUND IMAGE LIBRARY, introduced by LASERFICHE. However, one drawback of dithering techniques is that they generate noise in the image. Another embodiment of the thresholding 365 process involves selecting a global threshold for the image. In such a technique, a threshold is selected; pixels having a density greater than the threshold are considered white, while the remaining pixels are considered black. The threshold may be selected in a number of ways. In one embodiment, a single threshold is selected and applied to all received images. A drawback of this technique is that variations in lighting conditions among received images are not taken into account. In another embodiment, the threshold is calculated based on an analysis of the received image, e.g., of its histogram. One such embodiment assumes that the received image contains two peaks in its density histogram, corresponding to the foreground and background of the received document image. This embodiment may not be ideal for images that do not satisfy this assumption. Yet another embodiment of the thresholding 365 selects a separate threshold for each pixel in the received image.
This embodiment has the advantage of being responsive to changing conditions within the document, such as changes in illumination or background contrast. One embodiment of this technique is adaptive thresholding, in which previously seen pixel values are taken into account when determining the threshold for each new pixel. One way to do this is to maintain a weighted average that is updated as each successive pixel of the received image is analyzed. One potential drawback of this embodiment is that noise may be generated if the received image includes a colored document.
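An adaptive threshold over one pixel row might be sketched as follows. The exponentially weighted running average, the decay constant, and the fraction-of-average comparison echo well-known adaptive schemes and are illustrative assumptions, since the patent describes only "a weighted average" of previous pixels.

```python
def adaptive_threshold_row(row, ratio=0.85, window=8):
    """One embodiment of per-pixel thresholding (step 365): keep a
    running weighted average of the pixels seen so far and mark a pixel
    black (0) when it falls below `ratio` times that average, else
    white (1). The average is seeded from the first few pixels so the
    start of the row is not misclassified."""
    seed = row[:window] or [0]
    avg = sum(seed) / len(seed)
    out = []
    for p in row:
        avg += (p - avg) / window          # exponentially weighted average
        out.append(0 if p < avg * ratio else 1)
    return out
```

Because the average tracks local conditions, a dark run of text on a bright background is binarized correctly even if overall illumination drifts across the page.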
In step 370, a "lines of text" step is performed. In this step 370, the system determines the lines of text in the received document image. FIG. 9 depicts one embodiment of the "lines of text" step 370. In one embodiment, the system assumes that the pixels corresponding to text in the received document image have a lower density than the background pixels of the received document image. In this embodiment, the sum of the densities of all pixels within each row of the received document image is calculated 910. These sums are then used to identify local peaks and valleys in pixel density 920. The peaks and valleys are then analyzed to determine the lines of text in the document. For example, if the received document image has black lines of text on a white background, the rows of pixels that are completely white will have the highest total density, while the rows containing black text will have substantially lower total density. These differences in density can then be evaluated so that the lines of text can be determined. In a preferred embodiment, the "lines of text" step 370 is performed both horizontally and vertically on the received document image.
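The row-profile analysis of FIG. 9 can be sketched for a one-bit image as follows; using half the maximum row sum as the valley cutoff is an illustrative choice, as the patent leaves the peak/valley analysis unspecified.

```python
def text_line_rows(bitmap):
    """Steps 910-920 on a one-bit image (0 = black text, 1 = white
    background): sum each row, then report the (start, end) ranges of
    consecutive rows whose sums dip below half of the maximum row sum.
    Those valleys in the density profile are the candidate text lines."""
    sums = [sum(row) for row in bitmap]          # step 910: row densities
    cutoff = max(sums) / 2
    lines, start = [], None
    for r, s in enumerate(sums):                 # step 920: find the valleys
        if s < cutoff and start is None:
            start = r
        elif s >= cutoff and start is not None:
            lines.append((start, r - 1))
            start = None
    if start is not None:
        lines.append((start, len(sums) - 1))
    return lines
```

Running the same profile over columns instead of rows gives the vertical "lines of text" used by the orientation comparison in step 375.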
Another embodiment of the "lines of text" step 370 performs a search for lines of text similar to the search performed in step 335. In one such embodiment, the text of the captured document image is identified and formed into lines. This may be accomplished by identifying connected components in the captured document image and finding the nearest neighbors of those components. Connected components generally refer to black or darker pixels that are adjacent to each other. These adjacent pixels are then connected into rows. This process is similar to the process described in steps 710, 720 and 730 of FIG. 7.
Step 375 determines whether the captured document image should be in landscape or portrait format. In one embodiment, this is accomplished by comparing the results of the "lines of text" step 370 in the vertical direction with its results in the horizontal direction. In one embodiment, the direction yielding the greater number of lines is determined to define the orientation of the received document image. For example, in a received document image whose height is greater than its width, if the "lines of text" step 370 in the vertical direction produces more lines than in the horizontal direction, the received document image is determined to have a landscape orientation. As another example, if the "lines of text" step 370 in the horizontal direction produces more lines than in the vertical direction for the same received document image, the received document image is determined to have a portrait orientation.
Step 380 determines the upright orientation of the document; that is, in this step 380 it is determined whether the document is upright or inverted. FIG. 10 depicts one embodiment of determining whether the received document image is properly oriented upright 380. In one embodiment, each line of text is analyzed. Fewer lines of text may be analyzed, but this may produce less reliable results. In one embodiment, each line of text is divided into three sections 1010: an ascending section, a middle section and a descending section. English-language characters have certain inherent statistical features that may be used in some embodiments to determine the upright orientation of the received document image. For example, the English alphabet has only five characters that descend below the bottom boundary of a line of text (i.e., g, j, p, q and y), while having many more characters that ascend above its top boundary (e.g., b, d, f, h, i, j, k, l). In one embodiment, this characteristic of English-language characters is exploited by counting the pixels contained in the ascending and descending sections 1020 and comparing those pixel counts 1030, 1040. For example, if a received document image with English-language characters has more pixels in its ascending sections than in its descending sections, it is likely upright and need not be rotated, whereas if the same document has more pixels in its descending sections than in its ascending sections, the document likely needs to be rotated 180 degrees 1050.
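The section-counting test of FIG. 10 can be sketched as follows for one-bit text lines; splitting each line into equal thirds is an illustrative simplification of the ascending/middle/descending division.

```python
def section_pixel_counts(line_bitmap):
    """Steps 1010/1020: split one text line (a list of 0/1 pixel rows,
    0 = black) into ascending, middle and descending thirds and count
    the black pixels in the top and bottom sections."""
    third = max(len(line_bitmap) // 3, 1)
    count = lambda rows: sum(row.count(0) for row in rows)
    return count(line_bitmap[:third]), count(line_bitmap[-third:])

def needs_180_rotation(line_bitmaps):
    """Steps 1030-1050: English text has far more ascenders than
    descenders, so if the descending sections hold more black pixels
    than the ascending sections overall, the page is likely inverted."""
    top = sum(section_pixel_counts(l)[0] for l in line_bitmaps)
    bottom = sum(section_pixel_counts(l)[1] for l in line_bitmaps)
    return bottom > top
```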
In other embodiments, other characteristics of English-language characters may also be considered. For example, the horizontal positions of character pixels may be considered. In addition, non-statistical methods, such as optical character recognition ("OCR"), may also be used to determine the upright orientation of the document. Another embodiment may utilize a neural network approach. Furthermore, similar inherent features may be exploited for non-English documents. For example, Spanish-language characters have inherent characteristics similar to those of English. As another example, Arabic-language characters contain more descending characters, and embodiments can adjust for such features accordingly.
FIG. 12 depicts another embodiment of determining whether the received document image is properly oriented upright 380. In one embodiment, the letters in each line of text are found using connected components. Each component is classified by height into two categories, small and large 1210. The center of the line of text is determined 1220. In one embodiment, the height of the small letters is used to determine the center 1220 of the line of text; this may improve the estimate of the center when the line is distorted, e.g., warped across the page. The large letters are then compared to the center of the line of text and classified as ascending or descending based on their positions relative to this center 1230. The total numbers of ascending and descending letters are calculated. In a typical English-language document, the large characters predominantly ascend toward the top of the page. Thus, in one embodiment, if the number of ascending large characters is greater than the number of descending large characters, the document need not be rotated 1250 before output. However, if the number of descending large characters is greater than the number of ascending large characters, the document is rotated 1260 prior to output.
Then, according to the determinations of steps 375 and 380, the image is rotated in step 385. The new document image is then output 390.
As discussed above, the imaged document may be captured with a film camera or a digital camera. As an alternative to these freely positioned devices, a stationary camera system may be used to capture the imaged document. FIG. 11 depicts an embodiment of a stationary camera system for capturing an image of a document. In this embodiment, the document 1110 is placed on the base 1120 of the system. In a preferred embodiment, the base 1120 of the system is a predetermined color, which may have the advantage of facilitating the segmentation process discussed above. Extending from the base 1120 is a stand 1130 on which a camera 1140 and a flash 1150 can be mounted. The camera and flash may be permanently mounted in the stand 1130 or may be removable or adjustable. The flash may be placed anywhere on the base 1120 or stand 1130. In another embodiment, there is no flash on the base 1120 or stand 1130. In yet another embodiment, the flash is separate from the base 1120 and stand 1130. The stationary system is then connected to a computer 1160 to perform the above-described processing of the received document image. In another embodiment, a computer may be embedded in the device itself. In yet another embodiment, the captured document image may simply be stored in the digital camera 1140 or another memory source and later transferred to a computer for processing. Such a stationary camera system may also be placed in an office as part of a user's workstation.
There are several advantages to using a stationary camera system as opposed to a freely positioned camera. For example, when a stationary camera system is utilized, the amount of perspective distortion may be reduced because the document is likely to be centered under, and perpendicular to, the camera lens. Another advantage may be that the system can better adjust for lens distortion, because the lens used and its distance from the document are known, so these parameters do not have to be calculated or estimated. Yet another potential advantage is reduced distortion from the camera flash: in a preferred embodiment, the flash 1150 of the stationary system is positioned so as to reduce glare and other distortions created by the camera flash.
The methods for processing a captured image described herein are applicable to any type of processing application including, but not limited to, computer-based applications for processing captured images. The methods described herein may be implemented in hardware circuitry, in computer software, or in a combination of hardware circuitry and computer software, and are not limited to a specific hardware or software implementation.
FIG. 13 is a block diagram that illustrates a computer system 1300 upon which an embodiment of the invention may be implemented. Computer system 1300 includes a bus 1345 or other communication mechanism for communicating information, and a processor 1335 coupled with bus 1345 for processing information. Computer system 1300 also includes a main memory 1320, such as a Random Access Memory (RAM) or other dynamic storage device, coupled to bus 1345 for storing information and instructions to be executed by processor 1335. Main memory 1320 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 1335. Computer system 1300 further includes a Read Only Memory (ROM) 1325 or other static storage device coupled to bus 1345 for storing static information and instructions for processor 1335. A storage device 1330, such as a magnetic disk or optical disk, is provided and coupled to bus 1345 for storing information and instructions.
Computer system 1300 may be coupled via bus 1345 to a display 1305, such as a Cathode Ray Tube (CRT), for displaying information to a computer user. An input device 1310, including alphanumeric and other keys, is coupled to bus 1345 for communicating information and command selections to processor 1335. Another type of user input device is cursor control 1315, such as a mouse, a trackball, or cursor direction keys for communicating direction information and command selections to processor 1335 and for controlling cursor movement on display 1305. This input device typically has two degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), allowing the device to specify positions in a plane.
The methods described herein involve the use of computer system 1300 to process captured images. According to one embodiment, the captured image is processed by computer system 1300 in response to processor 1335 executing one or more sequences of one or more instructions contained in main memory 1320. Such instructions may be read into main memory 1320 from another computer-readable medium, such as storage device 1330. Execution of the sequences of instructions contained in main memory 1320 causes processor 1335 to perform the process steps described herein. One or more processors in a multi-processing arrangement may also be employed to execute the sequences of instructions contained in main memory 1320. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions to implement the embodiments described herein. As such, embodiments described herein are not limited to any specific combination of hardware circuitry and software.
The term "computer-readable medium" as used herein refers to any medium that participates in providing instructions to processor 1335 for execution. Such a medium may take many forms, including, but not limited to, non-volatile media, volatile media, and transmission media. Non-volatile media include optical or magnetic disks, such as storage device 1330. Volatile media include dynamic memory, such as main memory 1320. Transmission media include coaxial cables, copper wire and fiber optics, including the wires that comprise bus 1345. Transmission media can also take the form of acoustic or light waves, such as those generated during radio-wave and infrared data communications.
Common forms of computer-readable media include a floppy disk, a flexible disk, a hard disk, magnetic tape, or any other magnetic medium, a CD-ROM, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave as described hereinafter, or any other medium from which a computer can read.
Various forms of computer readable media may be involved in carrying one or more sequences of one or more instructions to processor 1335 for execution. For example, the instructions may initially be carried on a magnetic disk of a remote computer. The remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem. A modem local to computer system 1300 can receive the data on the telephone line and use an infra-red transmitter to convert the data to an infra-red signal. An infrared detector coupled to bus 1345 can receive the data carried in the infrared signal and place the data on bus 1345. Bus 1345 carries data to main memory 1320, and processor 1335 retrieves instructions from main memory 1320 and executes the instructions. The instructions received by main memory 1320 may optionally be stored on storage device 1330 either before or after execution by processor 1335.
Computer system 1300 also includes a communication interface 1340 coupled to bus 1345. Communication interface 1340 provides a two-way data communication coupling to a network link 1375, which in turn is coupled to local network 1355. For example, communication interface 1340 may be an Integrated Services Digital Network (ISDN) card or a modem to provide a data communication connection to a corresponding type of telephone line. As another example, communication interface 1340 may be a Local Area Network (LAN) card to provide a data communication connection to a compatible LAN. Wireless links may also be implemented. In any such implementation, communication interface 1340 sends and receives electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.
Network link 1375 typically provides data communication through one or more networks to other data devices. For example, network link 1375 may provide a connection through local network 1355 to a host computer 1350 or to data equipment operated by an Internet Service Provider (ISP) 1365. ISP 1365 in turn provides data communication services through the worldwide packet data communication network commonly referred to as the "Internet" 1360. Local network 1355 and Internet 1360 both use electrical, electromagnetic or optical signals that carry digital data streams. The signals through the various networks and the signals on network link 1375, which carry the digital data to and from computer system 1300 through communication interface 1340, are exemplary forms of carrier waves transporting the information.
Computer system 1300 can send messages and receive data, including program code, through the network(s), network link 1375 and communication interface 1340. In the case of the Internet, a server 1370 might transmit a requested code for an application program through Internet 1360, ISP 1365, local network 1355 and communication interface 1340. One such downloaded application is used to process captured images as described herein, in accordance with the present invention.
The received code may be executed by processor 1335 as it is received, and/or stored in storage device 1330, or other non-volatile storage for later execution. As such, computer system 1300 may obtain application program code in the form of a carrier wave.

Claims (10)

1. A method for processing a captured image, the captured image comprising an imaged document, the method comprising:
detecting graphical information in the captured image relating to an edge of the imaged document;
isolating the imaged document from a background of the captured image based on graphical information related to edges of the imaged document;
calculating a deviation of the imaged document from a non-distorted perspective of the imaged document;
resampling pixels of the imaged document based on the calculated deviations;
detecting graphical information in the captured image relating to an orientation of the imaged document;
rotating the imaged document based on graphical information related to the orientation of the imaged document.
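The resampling step recited in claim 1 can be illustrated with a nearest-neighbour inverse warp: for each pixel of the corrected output, the calculated deviation supplies the coordinate to sample in the distorted capture. The sketch below is an assumed, minimal illustration — the function names and the nearest-neighbour choice are not taken from the patent, and the mapping is passed in as an opaque function standing in for the calculated deviations.

```python
import numpy as np

def resample(image, mapping, out_shape):
    """Nearest-neighbour inverse warp: for each output pixel (y, x),
    `mapping` returns the source coordinate in the distorted capture."""
    out = np.zeros(out_shape + image.shape[2:], dtype=image.dtype)
    h, w = image.shape[:2]
    for y in range(out_shape[0]):
        for x in range(out_shape[1]):
            sy, sx = mapping(y, x)
            # Clamp and round to the nearest source pixel.
            sy = min(max(int(round(sy)), 0), h - 1)
            sx = min(max(int(round(sx)), 0), w - 1)
            out[y, x] = image[sy, sx]
    return out

# An identity mapping leaves the image unchanged.
img = np.arange(12, dtype=np.uint8).reshape(3, 4)
same = resample(img, lambda y, x: (y, x), (3, 4))
```

A production implementation would typically use bilinear or bicubic interpolation rather than nearest-neighbour sampling, but the inverse-mapping structure is the same.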
2. A method for processing a captured image comprising an imaged document, the method comprising:
detecting graphical information in a captured image relating to a transition between the imaged document and a remainder of the captured image;
selecting one or more lines from the graphical information corresponding to edges of the imaged document;
isolating the imaged document from a background of the captured image based on one or more lines corresponding to edges of the imaged document.
3. A method for processing a captured image comprising an imaged document, the method comprising:
detecting graphical information in a captured image relating to a transition between the imaged document and a remainder of the captured image;
selecting one or more lines from the graphical information corresponding to edges of the imaged document;
calculating a corner of the imaged document based on an intersection of one or more lines corresponding to an edge of the imaged document;
isolating the imaged document from a background of the captured image based on one or more lines corresponding to edges of the imaged document;
calculating a deviation between coordinates of the corner of the imaged document and coordinates of a corner of a non-distorted perspective of the imaged document;
based on the calculated deviations, mapping coordinates of pixels of the imaged document to coordinates corresponding to a non-distorted perspective of the imaged document.
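The corner and deviation calculations of claim 3 can be sketched with standard geometry: each corner is the intersection of two edge lines, and the mapping to the non-distorted perspective is a projective transform fitted to the four corner correspondences. The following is a hypothetical illustration — the direct-linear-transform (DLT) fit shown here is one common way to compute such a mapping, not necessarily the one used by the patent.

```python
import numpy as np

def line_intersection(l1, l2):
    """Corner of the document: intersection of two edge lines,
    each given in the form a*x + b*y = c as a tuple (a, b, c)."""
    A = np.array([l1[:2], l2[:2]], dtype=float)
    c = np.array([l1[2], l2[2]], dtype=float)
    return np.linalg.solve(A, c)

def homography(src, dst):
    """DLT least-squares fit of the projective transform taking the four
    src corners (distorted capture) to the four dst corners (the
    non-distorted rectangle)."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    return vt[-1].reshape(3, 3)  # null-space vector = transform, up to scale

def apply_h(H, pt):
    """Map a point through the homography (homogeneous coordinates)."""
    v = H @ np.array([pt[0], pt[1], 1.0])
    return v[:2] / v[2]

# Two edge lines, x = 0 and y = 0, meet at the corner (0, 0).
corner = line_intersection((1, 0, 0), (0, 1, 0))
```

Once the homography is known, every pixel coordinate of the imaged document can be mapped to its non-distorted position, which is exactly the final mapping step of claim 3.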
4. The method of claim 3, further comprising the steps of:
converting the non-distorted imaged document to a bi-color representation of the imaged document;
calculating a pixel density of the bi-color representation along a vertical axis of the non-distorted imaged document;
calculating a pixel density of the bi-color representation along a horizontal axis of the non-distorted imaged document;
identifying contrast in pixel concentration along the vertical and horizontal axes of the non-distorted imaged document;
identifying lines of text of the imaged document based on the contrast in pixel concentration;
determining a format of the non-distorted imaged document based on a direction of the text lines of the non-distorted imaged document relative to a size of the edges of the imaged document;
rotating the non-distorted imaged document according to the determination of the format of the non-distorted imaged document.
5. The method of claim 3, further comprising the steps of:
converting the non-distorted imaged document to a bi-color representation of the imaged document;
calculating a pixel density of the bi-color representation along a vertical axis of the non-distorted imaged document;
calculating a pixel density of the bi-color representation along a horizontal axis of the non-distorted imaged document;
identifying contrast in pixel concentration along the vertical and horizontal axes of the non-distorted imaged document;
identifying lines of text of the imaged document based on the contrast in pixel concentration;
determining a format of the non-distorted imaged document based on a direction of the text lines of the non-distorted imaged document relative to a size of the edges of the imaged document;
rotating the non-distorted imaged document according to the determination of the format of the non-distorted imaged document;
dividing a line of text into three portions along a longitudinal axis of the line of text;
determining a direction of the line of text based on a comparison of pixel concentrations of the portions of the line of text;
rotating the non-distorted imaged document based on the determined direction.
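The pixel-density and text-line steps of claims 4 and 5 amount to projection profiles: summing dark pixels of the bi-color image along each axis and looking for the alternating dense/blank bands that lines of text produce. The sketch below is a simplified, assumed illustration of the format decision only (a variance comparison of the two profiles), not the patent's full three-portion direction test.

```python
import numpy as np

def text_line_axis(binary):
    """Guess whether text lines run horizontally or vertically by comparing
    the variance of the ink-density profiles along each axis; text lines
    produce alternating dense/blank bands, hence a high-variance profile."""
    row_profile = binary.sum(axis=1)  # pixel density along the vertical axis
    col_profile = binary.sum(axis=0)  # pixel density along the horizontal axis
    return "horizontal" if row_profile.var() >= col_profile.var() else "vertical"

# Synthetic bi-color page: two horizontal "text lines" of ink (1 = dark pixel).
page = np.zeros((8, 8), dtype=int)
page[1] = 1
page[4] = 1
if text_line_axis(page) == "vertical":
    page = np.rot90(page)  # re-orient so the lines read horizontally
```

The three-portion comparison recited in claim 5 would then decide whether the horizontally-running lines are right-side up or upside down before the final rotation.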
6. A system for processing a captured image, the captured image including an imaged document, the system comprising:
means for detecting an edge of the imaged document;
means for reducing distortion of the imaged document;
means for rotating the captured image to a correct orientation.
7. An apparatus for capturing an image including a document, comprising:
a base in which the document is placed;
a cradle extending vertically from the base, wherein the cradle mounts a digital camera to capture the image;
a system coupled to the digital camera for storing and processing the image.
8. The apparatus of claim 7, further comprising:
a computer-readable medium carrying one or more sequences of one or more instructions which, when executed by one or more processors, cause the one or more processors to perform the computer-implemented steps of:
detecting graphical information in the captured image relating to an edge of the imaged document;
isolating the imaged document from a background of the captured image based on graphical information related to edges of the imaged document;
calculating a deviation of the imaged document from a non-distorted perspective of the imaged document;
resampling pixels of the imaged document based on the calculated deviations;
detecting graphical information in the captured image relating to an orientation of the imaged document;
rotating the imaged document based on graphical information related to the orientation of the imaged document.
9. A computer-readable medium for processing an image, the computer-readable medium carrying one or more sequences of one or more instructions which, when executed by one or more processors, cause the one or more processors to perform the computer-implemented steps of:
detecting graphical information in the captured image relating to an edge of the imaged document;
isolating the imaged document from a background of the captured image based on graphical information related to edges of the imaged document;
calculating a deviation of the imaged document from a non-distorted perspective of the imaged document;
resampling pixels of the imaged document based on the calculated deviations;
detecting graphical information in the captured image relating to an orientation of the imaged document;
rotating the imaged document based on graphical information related to the orientation of the imaged document.
10. An apparatus for processing a captured image, the captured image comprising an imaged document, the apparatus comprising:
one or more processors; and
a memory communicatively coupled to the one or more processors, the memory including one or more sequences of one or more instructions which, when executed by the one or more processors, cause the one or more processors to perform the steps of:
detecting graphical information in the captured image relating to an edge of the imaged document;
isolating the imaged document from a background of the captured image based on graphical information related to edges of the imaged document;
calculating a deviation of the imaged document from a non-distorted perspective of the imaged document;
resampling pixels of the imaged document based on the calculated deviations;
detecting graphical information in the captured image relating to an orientation of the imaged document;
rotating the imaged document based on graphical information related to the orientation of the imaged document.
HK08103751.5A 2004-08-26 2005-07-29 Photographic document imaging system HK1109670A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US10/928,761 2004-08-26

Publications (1)

Publication Number Publication Date
HK1109670A true HK1109670A (en) 2008-06-13

Similar Documents

Publication Publication Date Title
US7593595B2 (en) Photographic document imaging system
US9805281B2 (en) Model-based dewarping method and apparatus
CN111353961B (en) Document curved surface correction method and device
US10455163B2 (en) Image processing apparatus that generates a combined image, control method, and storage medium
CN100338618C (en) An Automatic Correction Method for Tilted Images
CN1198238C (en) Image processor and method for producing binary image by multi-stage image
CN1207673C (en) Half-tone dot eliminating method and its system
AU2020273367A1 (en) Photographic document imaging system
HK1109670A (en) Photographic document imaging system
AU2011253975A1 (en) Photographic document imaging system
CN118587715A (en) A method for cropping SynthText data to make it suitable for pen-based text detection