WO 00/60532 PCTtUSOO/08494
TITLE: METHOD AND APPARATUS FOR RESTORATION OF LOW RESOLUTION IMAGES INVENTORS: Paul THOUIN and Chein-I CHANG
This application is entitled to the benefit of the filing date of U.S. Provisional Application Serial No. 60/127,316, filed April 1, 1999.
FIELD AND BACKGROUND OF THE INVENTION
Field of the Invention
The present invention relates in general to image processing, and more particularly to creating a higher resolution image from a lower resolution image.
Background Information
The invention described and claimed herein comprises a novel method and apparatus for improving the definition of a low- resolution image by taking advantage of characteristics of its bimodal distribution so as to produce an expanded image with improved definition. This can not only reconstruct image information deteriorated by low-resolution devices, but also can refine original low spatial resolution images so as to
significantly improve- image quality. An image is improved by iteratively solving a nonlinear optimization problem using a preferred Bimodal-Smooth-Average scoring method. Among the applications for the technique are improved optical character recognition, restoration of binary images and video frames.
As used herein, the terms "low" and "high" are comparative only: they refer to the comparison between two images and are not meant to imply any absolute degree of resolution.
As described in greater detail below, image acquisition consists of converting a continuous image into discrete values obtained from a group of sensor elements. While generally cheaper to produce and transmit, a low resolution imaging system generally produces a less accurate representation of the continuous image than would a high resolution imaging system. Common methods of expanding low resolution images to high resolution images by interpolation typically smooth over important details in the process. Linear interpolation tends to smooth image data at transition regions, resulting in blurry images. Cubic spline expansion allows for sharp transitions, but tends to produce a ringing effect at discontinuities.
Many applications (including OCR, video surveillance and video compression algorithms) would benefit from improved expansion techniques .
Processing of document images has traditionally ϋgen "•prβS fS.eα <jwwg",*e«ea binary format due to the savings in disk storage and computer processing. As computer performance and storage capacity have dramatically increased recently, grayscale scanning and processing of document images have become more prevalent. Text images traditionally have been obtained from documents. However due to the current increase in digital video a significant amount of text is found in this type of imagery as well.
Document images are typically acquired by a scanner, whereas video frames are most often captured by a digital camera. The image acquisition process in both cases consists of converting a continuous image into discrete values obtained from a group of sensor elements. Each sensor element produces a value which is a function of the amount of light incident on the device. For 8-bit grayscale quantization, the allowable range
of values for each sensor are integers from 0 (black) to 255 (white). The sensors are typically arranged in a non-overlapping grid of square elements, smaller elements result
in higher resolution imagery.
Text image resolution expansion has become increasingly important in a number of areas of image processing. Optical Character Recognition (OCR) of document images continues to be of great importance as we attempt to become a paperless society. Restoring text from video surveillance imagery is often crucial to law enforcement agencies. Digital video compression algorithms can also benefit from successful text resolution expansion techniques. Common methods of interpolation, which were not designed specifically for text images, typically smooth over the important details and produce inadequate expansion. This paper proposes a new nonlinear restoration technique for text images, which creates smooth foreground and background regions while preserving sharp edge transitions.
There has been significant research in both the text enhancement and resolution expansion fields. Some text enhancement efforts focus on fixing broken or touching characters [1] [2]. Other methods [3] [4] use degraded samples to model the image and then solve for the restored image using using a recursive technique. To remove the effects of noise, [5] uses a morphological filter and [6] uses pixel patterns to improve OCR results. A variety of methods have been proposed in order to improve contrast within text images including quadratic filters [7], soft morphological filters [8], non-linear mapping [9], and using a multi-resolution pyramid approach and fuzzy edge detectors [10]. Numerous resolution expansion methods have been published in the literature as well [ll]-[17]. Linear interpolation tends to smooth the image data at transition regions
and results in a high-resolution image that appears blurry. Cubic spline expansion allows for sharp transitions, but tends to produce a ringing effect at these discontinuities.
A number of research efforts investigated combining text enhancement with resolu¬
tion expansion in order to improve low-resolution text images. Shannon interpolation is performed with text separation from the image background in [18] to improve the
OCR accuracy of digital video. Deblurring is combined with linear interpolation in [19]
to enhance document images obtained from digital cameras. Both of these methods essentially perform the text enhancement and resolution expansion as two separate steps.
The proposed method computes resolution expansion using an algorithm specifically
designed to enhance text images to perform both of these steps simultaneously.
The goal of resolution expansion is to create an expanded image with improved defi¬
nition from observed low-resolution imagery. Acquisition of this low-resolution imagery
can be modeled by averaging a block of pixels within a high-resolution image. Resolution
expansion is an ill-posed inverse problem. For a given low-resolution image, a virtually
infinite set of expanded images can be generated by the observed data.
SUMMARY OF THE INVENTION
The foregoing problems are overcome, and other advantages are provided by a method and apparatus for enhancing an image by iteratively solving a nonlinear optimization problem using a preferred Bimodal-Smooth-Average (BSA) scoring method, in accordance with the invention, which minimizes a BSA score, defined as the weighted sum of separate bimodal, smoothness and average measures. This can not only reconstruct image information deteriorated by low-resolution devices, but also can refine original low spatial resolution so as to significantly improve image quality.
It is 'an object o'f the invention to restore a high-resolution image given a low-resolution image.
It is an object of the invention to provide a method and apparatus for improving the definition of a low-resolution image so as to produce an image with improved definition.
It is a further object of the invention to provide a method and apparatus for reconstructing image information deteriorated by low-resolution devices.
It is a further object of the invention to provide a method and apparatus for refining original low spatial resolution so as to significantly improve image quality.
It is a further object of the invention to automatically enhance text within document or video images so as to improve the accuracy of subsequent optical character recognition.
It is a further object of the invention to provide a method and apparatus for improving restoration of video images.
A principal feature of the invention is the iterative solution of a nonlinear optimization problem using a preferred Bimodal- Smooth-Average scoring method.
Among the advantages of the invention are automated enhancement of images.
These and other objects, features and advantages which will be apparent from the discussion which follows are achieved, in accordance with the invention, by providing a novel method and apparatus for improving image quality by iteratively solving a nonlinear optimization problem, using a Bimodal-Smooth-Average scoring method.
The various features of novelty which characterize the invention are pointed out with particularity in the claims annexed to and forming a part of this disclosure. For a better understanding of the invention, its advantages and objects, reference is made to the accompanying drawings and . descriptive matter in which a
preferred embodiment of the invention is illustrated.
BRIEF DESCRIPTION OF THE DRAWINGS
The foregoing and still other objects of this invention will become apparent, along with various advantages and features of novelty residing in the present embodiments, from study of the following drawings, in which:
Figure 1 is a schematic of a system suitable for carrying out the invention.
Figure 2 is a block diagram of the text resolution expansion system of the invention.
Figure 3 is a flow chart of the BSA process of the invention.
Figure 4 is an illustration of the results of a BSA restoration experiment, showing the progressive improvement of resolution with iterations.
Figures 5 illustrates the performance of the BSA process as compared to prior art techniques of linear interpolation and cubic spline in grayscale document restoration and grayscale video frame restoration, respectively.
Figures 6 and 7 illustrate the performance of optical character recognition following cubic spline (Figure 6) versus BSA (Figure 7) image enhancement .
Figure 8 compares OCR accuracy of spline vs. BSA for gray-level images and binary document images.
Figure 9 illustrates experimental results of binary document restoration.
Figures 10 and 11 illustrate experimental results of restoration of video frames.
DESCRIPTION OF THE PREFERRED EMBODIMENT
The invention is a novel method and apparatus which improves the definition of a low-resolution image by taking advantage of characteristics of its bimodal distribution so as to produce an expanded image with improved definition. This can not only reconstruct image information deteriorated by low-resolution devices, but also can refine original low spatial resolution so as to significantly improve image quality.
Referring to Figure 1, the invention may be illustrated in the context of an image acquisition system (1) providing image data (10) which serves as an input to a general purpose computer (100) programmed to carry out the following processes.
Image acquisition system (1) comprises a plurality of image sensors (2) which generate image data (10) concerning the image. This image data (10) serves as input to computer (100) via input means (101) . Computer (100) includes processing means (102) , programmed to execute the steps set forth herein so as to generate a high resolution image, which may be displayed or otherwise used via output means (103) .
Referring to Figure 2, a text resolution expansion system receives low-resolution image (10) , preprocesses the image in preprocessing module (201) and outputs the result to bimodal
estimation module (202); the bimodal estimation module (202) processes the data and outputs the result to resolution expansion module (203) for pixel replication and output to BSA restoration module (204) where the image is partitioned into overlapping blocks prior to BSA restoration; the BSA restoration module then outputs the result to a post-processing module (205) which provides an expanded resolution image (206) for display or other use.
The operation of the BSA restoration module (203) is illustrated in greater detail in Figure 3. An initial image (301) is provided and an initial BSA score is computed (302) ; an image update is then computed (303) so as to produce an updated image (304) . A new BSA score is computed for the updated image (305) and compared to the prior BSA score (306) ; if the new score does not represent an improvement over the prior score by a predetermined amount, the restoration process is complete; otherwise, another image update is computed (303) and the process continues iteratively until the improvement is below the predetermined amount.
The image acquisition process consists of converting a continuous image into discrete values obtained from a group of sensor elements. Each sensor element produces a value which is a function of the amount of light incident on the device. For 8-bit grayscale quantization, the allowable range of values for each sensor are integers from 0 (black) to 255 (white). The sensors are typically arranged in a non-overlapping grid of square
elements, smaller elements result in higher resolution imagery. high-resolution imaging system φ the number of sensors is adequate to represent
This low-resolution acquisition
results in significant blockiness and is insufficient to accurately represent this image. Each sensor element effectively averages the image within its section of the grid, resulting in an increased amount of gray pixels. Low-resolution imaging can therefore be thought
of as block-averaging high-resolution images.
The problem 1 is to restore the high-resolution image HIqTi qc
given only the low-resolution image /r,c, where r and c are thSΑtaBef dWC-ws an'tf columns in the low-resolution image and q is the resolution expansion factor. The image acquisition process of obtaining LIrfi from i-7,r,.c is 6iven bv
9 »=?-r t=q-c
The value of LIr>c is the average of the high-resolution pixels within the q x q neigh¬ borhood. Eq. (1) represents a typical image restoration problem where we are required to restore the i-7,r,,. based on the observed LITfi via the. relationship described by this equation. Since there are a great number of high-resolution images which may satisfy the constraint of the observed low-resolution image given by Eq. (1), image restoration is generally an ill-posed inverse problem.
The system presented iτi^lβ^ f Λt^ m ^ aΛmβmβ β^βΛHβKf WttβH
fcblock
five modules, the Preprocessing Module, the Bimodal Estimation Module, the Resolution Expansion Module, the BSA Restoration Module, and the Postprocessing Module. A low-resolution image is initially input to the Preprocessing Module where binary and color images are converted to grayscale images. The image histogram is computed by the Bimodal Estimation Module to estimate the means of the black and white pixel distributions. The initial image expansion is performed by the Resolution Expansion Module which uses pixel replication to create a high-resolution image from the low-resolution original. The image is divided into overlapping blocks of pixels which are restored independently by the BSA algorithm in the BSA Restoration Module. When a binary expanded image is desired, the Postprocessing Module performs a grayscale-to-binary conversion. The next section details the image expansion system for grayscale images, followed by a section on binary image expansion.
3 Grayscale Image Resolution Expansion
The image resolution expansion process for grayscale images, which includes both video frames and grayscale document images, is described in this section. It should be noted that our system depicted in Fig. ψ processes grayscale and color images in a very-similar fashion-." "Hwuπl * difference ''lui l-l«ll!U«yΛPrW"e'ssing
images are initially converted to grayscale images using standard techniques. These formerly color images are processed throughout the system as grayscale and the resulting
high-resolution expanded images are grayscale as well. Since the goal of our system is
to improve automated OCR results, this loss of color information is not a concern.
Once the system has a grayscale image, its histogram is then computed by the Bimodal Estimation Module. The distribution of a text image is typically bimodal with a large
white peak corresponding to the background and a smaller dark peak corresponding
to the text pixels. The peak of the white distribution μw and the peak of the black distribution μB are estimated from the calculated histogram. These values are required
later by the system. Pixel replication, where the value of each high-resolution pixel is
equal to its corresponding low-resolution pixel, is performed in the Resolution Expansion Module to form the initial expanded image. The BSA Restoration Module partitions
the image into overlapping blocks and restores the blocks independently. Finally, the
Postprocessing Module converts the high-resolution grayscale image to a binary image if a binary expanded image is desirable. In what follows, the BSA Restoration Module
will be described in detail.
A block diagram of the BSA Restoration process is shown in Fig 3. W, where L rep¬
resents the image at iteration i, BSAi is the score at iteration i, and δi is the change
to the image at iteration i. The restoration process can be summarized as follows. The
initial score BSAQ is computed from the original image Jo- At each iteration, the image
update δi and new score BSAi are computed. The iterations continue to minimize the
BSA score until a convergence is reached. At this stage, a restored image is produced.
The scoring function used by the BSA Restoration process is designed to measure how
well a group of pixels within an image represent the desired properties of a text image.
This function, referred to as the BSA scoring function, is expressed as the weighted sum
of a Bimodal score B, a Smoothness score S, and an Average score A. More precisely,
the BSA score is defined by
BSA(x) = XlB(x) + λ2S(x) + X3A(x) (2)
where λ\, λ2, and λ3 are Lagrange multipliers and x is a block of pixels to be restored.
Throughout this paper, a block x is defined both as a group of 4 x 4 low-resolution
pixels or as the 4<? x 4q high-resolution pixels that are derived from them. The 4 x 4
size was specifically chosen because it contains enough pixels to adequately measure text
characteristics but is not too large to be computationally burdensome. The goal of the
restoration process is to iteratively solve for the block of pixels x that minimizes the
BSA{x) score given by Eq. (2). Each of the three scores will be bifejπy discussed in the
following subsections followed by a description of an iterative minimization procedure.
.4. The Bimodal Score
The typical distribution of a text image contains two peaks, a large one at μu,/..te-
which normally represent the page's background, and a secondary peak at μblack rep¬
resenting the foreground text JHMflHHfl-fltt From the histogram of the given low-resolution text image, estimates of the means for the black and white distributions
are calculated. These means are used to compute the bimodal score B(x), which mea¬
sures how far an image block x is from bimodal. The bimodal score used in this paper
is defined by
B(x) ~ ∑(xr,c ~ μblack)2{Xτ,c ~ whU Ϋ (3) r,c where r and c are the row and column indices within the block being evaluated.
The bimodal function for μ^ ck = 50 and μwhite = 255 is plotted in Fig. 4. When a
pixel value within x is close to either μ6.αcj. or μWhit , its contribution to B(x) is minimal.
The bimodal minimum score of B(x) = 0 means that the image is perfectly bimodal.
the value of every pixel is equal to either μWhite or μbiack- Solving for the block of pixels
j that minimizes B(x) produces a strongly bimodal image, which is one of the desired
properties of this proposed text restoration technique.
To minimize B(x), both the first and second derivatives of the bimodal score will be
computed. The score given by Eq. (3) is differentiable, the partial derivative of B(x)
with respect to pixel xr)C is
dB(x) Λ 3 _. /
-QX~ — = c ~ {μwhite + μwocfc)z;,c (4)
+2(μ
ω/
li
te + 4μ
tu/
lit
eμ6Jαcfc + βblack)
xr,c ~
+ βblack)
and the second partial derivative for the bimodal score is
lβ(χ) dχ = 12ιJ,e - 12(μωWte + μbi .)a:r,c + 2(μ£,Wte + 4μtt,ωteμMβcfc + μ^) (5)
Partial derivatives of the bimodal score B(x) are independent of neighboring pixels. The estimated means of the bimodal distribution, μbi k and μWhite, are known a priori and
their appropriate constants can be pre-computed.
B. The Smoothness Score
With the exception of edges, text images tend to be very smooth in both the fore¬
ground and background regions which results in neighbors with similar values. A smooth¬
ness score, which is computed for each block of pixels, is introduced to measure this feature. For this proposed algorithm, a simple statistic using only the four nearest
neighbors of each pixel is used. Other more sophisticated smoothness measures could
be implemented as well. The smoothness score S(x) used by this technique is given by
S{X) = ∑[(Zr-l,c - Zr,c)2 + (xr,c-i - Xr,cf + (Zr,c+1 ~ χτ,c)2 + { τ+l,c ~ r,c?} (6) r,c
where r and c are the row and column indices within the block being evaluated. The
minimum value of S(x) = 0 occurs when all pixels have identical values.
The first and second partial derivatives are calculated in a straightforward manner.
The first partial derivative of this smoothing score with respect to pixel xr,c is
dS(x)
= -2(_Cr_lιC + XrtC-i + Ir,c+ι + Xr+i.c) + 8xr dx (7) r,c
The second partial derivatives are nonzero only when
d2S(x) = g d*S(x) _ . d*S(x) = ,
^r,c δxr,caa; r±ι,c dxrtCdxr c±1
and are equal to zero for all other combinations. The first partial derivative of the
smoothness score S(x) is a function of the four neighboring pixels. The second partial derivatives are non-zero only with respect to their corresponding neighbors.
C. The Average Constraint Score
It is reasonable to require that the average of a group of high-resolution pixels is
close to the original value of the low-resolution pixel from which they were derived.
For each block of low-resolution pixels, an average score A(x) is used to measure how
well the restored high-resolution pixels meet the average constraint imposed by their corresponding low-resolution pixels. Fig. 5 shows four low-resolution pixels whose values
are μi, μ2, μ3, and μ . The q x q group of high-resolution pixels that are being restored
from pixel μi are represented by {x , 1 < (r, c) < q}. The average score for this 2 x 2
block is expressed by
where i is the index for the low-resolution pixels, μi is the value of each low-resolution
pixel, and x^c are the restored high-resolution pixels corresponding to pixel μj. The
initial high-resolution image formed by using pixel replication always has an average
score of zero because it satisfies the constraint.
The first partial derivative for the group of high-resolution pixels corresponding to pixel μi is equal to
-Wέέ*-^ (ιo)
The second partial derivatives are simply,
^- = Jr V(r,cr..eι) € ««> (11) dxiLcOXrύci Q
Both the first and second partial derivatives of the average score A(x) are non-zero only
with respect to pixels within the q x q group of high-resolution pixels corresponding to
the low-resolution pixel from which they were expanded.
IV. SOLVING FOR THE RESTORED IMAGE
The goal of the restoration algorithm is to solve for the image block that minimizes
the scoring function BSA(x) introduced in Eq. 2. Throughout this paper, a block x is defined both as a group of 4 x 4 low-resolution pixels and as the 4q x 4q high-resolution
pixels that are derived from them. The 4 4 size was specifically chosen because it contains enough pixels to adequately measure text characteristics but is not too large
to be computationally burdensome. The goal of resolution enhancement is to create a
restored image with improved resolution. This is illustrated in Fig. 6 where a 4 x 4
block of low-resolution pixels is expanded by a factor of q = 4 to create a 16 x 16 block
of high-resolution pixels.
Pixel replication, where every value within a q x q neighborhood is identical to the corresponding low-resolution pixel, is used for the initial expansion. Each 4ς x 4q block of high-resolution pixels is restored independently using iterative optimization techniques
described in this section to solve for the block which minimizes the BSA score. At each
iteration, the first and second partial derivatives of the BSA scoring function are used to determine the image update. To avoid block boundary discontinuities only the center
3q x 2>q pixels, which are highlighted in Fig. 6, are updated. The entire image is therefore
divided into blocks that overlap by one quarter, or q x 4<j pixels, and can be restored independently. This iterative minimization of the BSA score continues until convergence
is reached resulting in the restored image.
Initially, each 4q x 4ς block of pixels x is converted to a (4g)2-long vector x using
raster scanning,
x[q(r - 1) + c] = x(r, c) for 1 < r, c < q (12)
A small distance away from x the BSA function can be represented by its second order
Taylor series approximation [21],
BSA(x + δ) « BSA(x) + [VBSA(x)]δ + (13)
and the change in BSA is given by
where δ is the small change to the image vector x, VBSA(x) is the gradient, and H is
the Hessian matrix. The (4g)2 x (4q)2 Hessian given below by Eq. (15) is the symmetric
matrix of mixed partial second derivatives, which shows how a change in two variables affects the BSA function.
Since the Hessian matrix is symmetric,
d2BSA d2BSA
(16) dxidxj dxjdxi
only half of the matrix needs to be computed. To maximize the function BSA(x), the
variables in the Hessian matrix are first made independent. To do this the Hessian is
diagonalized using a similarity transform. Each eigenvector of the Hessian matrix is
placed in a separate column to form a unitary eigenmatrix E . That is, the product
of the eigenmatrix with its transpose is equal to the identity matrix EET = I. When the Hessian matrix is pre-multiplied by the transposed eigenmatrix and post-multiplied
by the eigenmatrix, the resulting matrix ETHE is diagonal. Because the Hessian is
real and symmetric, it is always diagonalizable. As an example, shown in Fig. 7 is the Hessian matrix for an 8 x 8 block of pixels. Because of the neighborhood dependence
of the scoring functions, their contributions to the Hessian are near the main diagonal.
The similarity transform results in the diagonalized Hessian, ETHE, which is shown at
right.
The Taylor series approximation to the change in the scoring function BSA can now
be expressed in terms of the (4q)2 x (4q)2 Hessian matrix H, its (4q)2 x (4ς)2 eigenmatrix
E, the 1 x (4<j)2 gradient of the scoring function VBS-4(x), and the (4ς)2 x 1 small change
— * in the image vector δ,
With the following substitutions,
VBS-4'(x) = [VBSA{x)}E (18)
δ< = Eτδ (19)
H' = ETHE (20)
Eq. (17) can be simplified to
ABSA = [VBSA'{x))[δ') + (21)
The functional minimum is achieved by stepping in the direction
if' = H'- VBSA'(x) (22)
in the transformed domain, which is simply δ = Eτδ' in the pixel domain. For each
iteration, the image update δ1 is determined. The iterations continue until convergence
is reached, resulting in a desired restored image. 4l»
An example of this iterative image restoration process is shown in Fig.m The original
4 4 block of pixels is expanded by a factor of q = 4 using pixel replication to produce
a 16 x 16 high-resolution image shown in Fig-JBjJ. As the iterative restoration process
proceeds in Figs. Λ Λi the image becomes more bimodal and smooth resulting in a λ
greatly improved image. The majority of gray pixels that occur between characters are replaced with either black or white values, resulting in a strongly bimodal distribution. The resulting image is also smooth in both the foreground and background regions while
maintaing the constraint that the average of each 4 4 block of high-resolution pixels is
close to the original value of each corresponding low-resolution pixel. The minimization
more bimodal and smooth and these two scores are reduced.
The average score increases as the restoration proceeds, but this score is still
significantly smaller than the bimodal and smoothness scores. Minimization of the
BSA score produces a restored image that is the optimal combination of these bimodal, smoothness, and average measures.
V. EXPERIMENTAL RESULTS
The proposed BSA restoration algorithm was compared to several common expansion
methods, including pixel replication, linear interpolation, and cubic spline expansion. In
linear interpolation, a linear fit is calculated between all pixels within each column, and
then repeated for all pixels within each row. These images naturally tend to be smooth,
without sharp discontinuities, producing blurry results. Cubic spline expansion [12]
approximates the given discrete low-resolution pixels as a smooth continuous curve ob¬
tained from the weighted sum of cubic spline basis functions and resamples the curve to
obtain the high resolution image. This method allows for sharp edges but often over¬
shoots at these discontinuities, producing a ringing effect. The BSA text restoration
technique creates smooth foreground and background regions and permits sharp edges at transition regions, while maintaining the low-resolution average constraint. Images
restored with this technique are shown to be both qualitatively and quantitatively supe¬
rior to other common resolution expansion methods. Two experiments were conducted to numerically compare the restoration methods. The first experiment involved creat¬
ing low-resolution images from high-resolution originals, expanding the low-resolution
imagery, and then measuring the distance to the originals. The second experiment involved scanning low-resolution document images, expanding the images with the various
techniques, and using OCR accuracy to measure the success of the restoration.
To qualitatively illustrate the differences between these resolution expansion techniques, Fig. w shows resulting images obtained from linear interpolation, cubic spline
expansion, and the proposed BSA-based restoration technique. The word "applications"
from an image scanned at 100 dpi using 8-bit grayscale quantization is shown in Fig.
i ft a) where significant blockiness is apparent. Linear interpolation, by a factor of four
was used to create the image in
which is very blurry and lacks good contrast.
Fig. (c) depicts the resulting image from cubic spline expansion which has better con¬
trast but is still not sharp at the edges. The image obtained using BSA restoration in
Fig. tf'Cd) has excellent contrast and sharp edges and is supeήoc to the images obtained using other interpolation methods for this example.
These results clearly demonstrate several advantages of this technique ma was de¬ signed specifically for text images. The BSA-restored image in Fig. 11(e) is strongly
bimodal and has both smooth- foreground and background regions.
As a final experiment, we used OCR accuracy to numerically compare our algorithm against existing resolution expansion methods. A set of 122 full-page journal documents from the University of Washington English Document Image Database I CD-ROM were
Fig. _-- (d) has excellent contrast and sharp edges and is superior to the images obtained using other interpolation methods for this example.
The first experiment to quantitatively measure image restoration success involved
creating low-resolution images by block-averaging images as described by Eq. (1). Restored images are then compared with the original to determine the success of restoration
numerically. The mean squared error (MSE) was used to compare the various methods of image resolution expansion. The definition of mean squared
is
^ R c MSE = — - ∑ ∑(originalr c - restoredr c)2 (23)
RC r=l c=l where R and C are the number of rows and columns in the images.
These results clearly demonstrate several advantages of this technique that was de¬ signed specifically for text images. The BSA-restored image in Fig. 11(e) is strongly
used. These binary documents were initially printed by a laser printer wTtn '300~i-.δ"ts per
inch (dpi) resolution. Each page was then scanned using 8-bit grayscale quantization at
75 dpi to create low-resolution original images. These 75 dpi resolution pages were then
expanded using various resolution expansion methods by a factor of four to create 300 dpi
images which were processed by Caere's OmniPage Pro 7.0 commercial OCR package,
the world's best-selling desktop OCR software. For all cases within these experiments, both spline interpolation and BSA restoration produced superior results to linear interpolation, therefore the linear interpolation results will not be discussed further. The
resulting text files were compared to the ground truth provided by the University of Washington CD-ROM using the OCR Accuracy Report Version 5.3 software developed
at UNLV-ISRI. There were a total of 339,575 characters in these 122 images. Cubic
spline interpolation resulted in 36,959 character errors and the BSA-based method had 30,668 character errors for an overall improvement of 17.0% for this set of images.
spline interpolation anϋ BSA expansion. The results are sorted based on the spline OCR results which are shown as a solid line. For each image, the BSA result is plotted as an "X" if the OCR accuracy is worse than spline and plotted as an "O" where the accuracy has been improved. The BSA restoration resulted in improved OCR accuracy for 72% of the images in this test set. Even in the cases where cubic spline resolution expansion
improved OCR more than the BSA algorithm, the images produced by the BSA were
typically more visually appealing. 4flκ9flBBVHHHHHHIB^BHIBBHBS8fc
For the images in this experiment, the expansions produced by the BSA algorithm'produce superior OCR accuracy results compared to other existing methods.
Binary Image Resolution Expansion
Document images have traditionally been scanned using binary thresholding due to
the inherent binary nature of text images. Binary image processing therefore still retains
great significance in the document research community. The BSA restoration method described in the previous section is well suited to process grayscale images but is not
capable of restoring binary images. In order for our system to restore binary text images,
each original binary image is first convolved with a spatial mask in the Preprocessing a. eO
Module shown in Fig. φ to create a grayscale image. The Bimodal Estimation, Reso- is lution Expansion, and BSA Restoration Modules used for binary images are identical to those used for grayscale images in our system. Once the high-resolution grayscale
image has been created, a global threshold TbinaTy is used in the Postprocessing Module
to convert the grayscale image back to a binary image. This threshold is computed to
be halfway between the white and black bimodal peaks
rp μwhite ~τ~ μbiack
This section will detail the conversion from a binary image to a grayscale image using a
spatial convolution mask in the Preprocessing Module.
4.1 Binary to Grayscale Conversion
The first step of the binary restoration process is to convert the original low-resolution
binary image into a low-resolution grayscale image. In our system, 8-bit grayscale quan-
tization is used so that the allowable range of values for each grayscale pixel are integers
from 0 (black) to 255 (white). Convolution is performed to create gray values from a group of neighboring binary pixels. The discrete convolution of two images /I)V and gXtV,
denoted by /X)1, * gX V, is defined by
(10) r=0 c=0
where R is the number of rows and C is the number of columns within each image.
The expression in Eq. (10) is a mathematical representation of a spatial convolution
mask. Because convolution involves flipping pXιV, it must be symmetric about its center.
Throughout this paper the 3 3 mask gXιV defined by
wc ws wc
9x,y wM + 4ws + 4KJC s wM s (11)
Wc Ws c
is used to perform this convolution, where WM is the relative weight for the middle pixel,
ws represents the four side pixels, and wc corresponds to the weight for each of the four
corner pixels. The values of the mask are equal to zero for all other possible locations.
Spatial convolution can be thought of as a weighted averaging over a neighborhood of
pixels. A constraint is enforced on these weights, based on the distance from the center
of the mask,
wM > ws > c > 0 (12)
so pixels closer to the center are more heavily weighted. The value of the grayscale pixels xgτay are easily computed from the original binary pixel xr,c along with its eight
neighbors
r-""Qy [ M T, + WS {Xr-l,c + χr+l,c + χr,c-l + χr,c+l ) + (13) τ'c WM + ^ s + 4wc
Wc (Xτ-l,c-l + Zr-l,c+l + ^r+l.c-l + r+l,c+\))}
4.2 Constrained Binary to Grayscale Conversion
In our system, we impose a further constraint on the computed grayscale pixels. If
the binary pixel is black, then the grayscale pixel should be dark as well. Specifically,
if x = 0 then 0 < x9 Tay < 127 and similarly if x = 255 then 128 < x9ray < 255.
However, this constraint is not always the case by performing the straightforward 3 x 3 convolutions as shown in Figs 8(b-c) where the tail pixel of the character "e" which is black in Fig 8(a) has a very light value of 198 in Fig 8(b) and 144 in Fig 8(c)
In order to compute each grayscale pixel while enforcing this constraint, a parameter Λ is introduced to measure the effect of the surrounding eight pixel neighbors on pixel
x[r c)
Δ = 4»,,- «,e lωs(lr-l,e + Sr + l.e + ϊr,c-l + Ir,e+l) + (1 ) tUc(Xr-l,c- l + Ir-l,c+l + χτ + \ c-1 + χτ+\ c+l )j
The range of this measure is 0 < Δ < 255. The computed grayscale pixel xsτay is then determined based on the original value of the binary image x in the following manner
if ι = 0 then xsrαv(r, c) = L- J (15)
x = 255 then ιsrαy(r, c) = 128 + [- J
w here [x\ is the integer floor rounding function defined by the largest tegei that is
less than or equal to i To quantitatively compare our system with other expansion methods, resulting images were compared with cubic spline expansion to measure OCR performance Shown in Fig ^H( 1S the resulting 300 dpi image obtained using cubic spline expansion on a paragraph from one of the University of Washington document images. The text file created by OCR is shown in Fig. *ψ(b) where mistakes have been highlighted For this example, the OCR results for the cubic spline image had 7 areas where mistakes were made. The same image paragraph was processed by our system resulting in the image (a) which has better contrast than its corresponding cubic spline image in Fig OCR results are improved as well, only 4 mistake areas are highlighted in Fig
PAGE 33 INTENTIONALLY OMITTED
Experiment - University of Washington Binary Documents
To measure the success of the system in 'restoring binary document images, a second experiment was .conducted using a total of 48 document pages from the University of
Washington database. These images were scanned at 100 dpi using binary quantization
and then expanded by a factor of three using the binary BSA restoration technique described in Section 4 to create 300 dpi images. The binary enhancement process for
the sample word "representative" from this experiment is shown in Fig. 9f to visually
illustrate the improvements. The original binary image in Fig. 9 ^(a) was convolved with a spatial mask of weights ws = 1 and wc = 0.707 to produce a grayscale image in
Fig. ^ (b). This low-resolution grayscale image was enhanced by the system to produce
a high-resolution grayscale image in Fig. flp(c) which was converted to a binary image to produce a restored image in Fig. fc(d). The resulting high-resolution image was
noticably less blocky than the original.
To quantitatively measure the success of the system in enhancing binary document
images, OCR accuracy was compared before and after resolution expansion. The ex¬
panded images produced by our system were compared with 300 dpi images created by
replication. βBHHBBB^||00VBHBH->IHVβ fl'^^'^'vBM
There were a total of over 140,000 characters in this dataset. The overall
character accuracy for "the original images was 82.0% and the overall character accuracy for the restored images was 89.1% which was a 39.5% reduction in errors, a significant improvement over pixel replication. £** ^ • )
5.3 System Performance
The previous experimental results demonstrate the advantages of our text restoration system compared to standard methods of interpolation such as linear interpolation and cubic spline expansion. The system was tested using images from the widely available University of Washington document database and evaluated using the commercial Caere OmniPage Pro OCR software package to measure character accuracy. For the document and video images in these experiments, the images produced by our restoration system resulted in a higher OCR accuracy compared to other standard methods.
As stated earlier flHHHHB< ^e βoa' °^ this resolution expansion system is to improve the OCR accuracy- Real-time image processing is not a requirement. Our system takes approximately 10 minutes to expand a single full-page 100 dpi document image by a factor of 3 on a 250 MHz workstation- Cubic spline interpolation is much faster than our system and takes about 10 seconds for a single document page. Video frames are typically smaller than low-resolution document images and are therefore processed more quickly by the system. A 320 x 240 video frame takes approximately 2 minutes for our system to expand by a factor of three. Cubic spline expansion of a video frame only takes about 2 seconds. If the speed concern becomes an issue in the future, our approach has a potential for massive parallelization because the images can
be divided into blocks of pixels which can be restored independently.
ur system was shown to be capable of enhancing grayscale documents and video images as well as binary document images. The success of the system was demonstrated by experiments using images from a standard document im¬
age database and a commercial OCR package. Restoration of grayscale images was performed by optimizing bimodal, smoothness, and average (BSA) scores that measure desired properties of text images. These scores were combined to form a single scoring
function which produced images that were strongly bimodal and smooth, while satisfy¬
ing the average constraint score. When the original image was binary, it was initially
converted to a grayscale image using a spatial convolution mask and then processed as
grayscale images. These resultant images restored using our system were shown to be superior to images expanded using existing linear interpolation and cubic spline expansion
techniques.
Optical character recognition accuracy, was also used to quantitatively measure the success of various resolution expansion methods. Both binary and grayscale text images
restored with our system were shown experimentally to improve OCR accuracy compared
to linear interpolation and cubic spline expansion. Despite that our system was designed
to enhance text images, it can also be used to accurately expand checks, tables, charts,
graphs, line drawings, and other types of documents as well due to their similarities.
References
[1] P. Stubberud, J. Kanai, and V. Kalluri, "Adaptive image restoration of text images
that contain touching or broken characters," ICDAR95 Proceedings International
Conference on Document Analysis Recognition, pp. 778-781, 1995.
[2] A. P. Whichello and H. Yan, "Linking broken character borders with variable sized
masks to improve recognition," Pattern Recognition, Vol. 29, No. 8, pp. 1429-1435,
1996.
[3] M. Y. Jaisimha, E. A. Riskin, R. Ladner, and S. Werner, "Model-based restoration
of document images for OCR," Proceedings of the SPIE, Vol. 2660, pp. 297-308,
1996.
[4] MN. Yoon, S.W. Lee, and J.S. Kim, "Faxed image restoration using Kalman filter¬
ing," ICDAR95 Proceedings International Conference on Document Analysis Recog¬
nition, pp. 677-681, 1995.
[δ] J. Liang, R. M. Haralick, and I. T. Phillips, "Document image restoration using
binary morphological filters," Proceedings of the SPIE, Vol. 2660, pp. 274-285, 1996.
[6] T. Akiyama, N. Miyamoto, M. Oguro, and K. Ogura, "Faxed document image restoration method based on local pixel patterns," Proceedings of the SPIE, Vol.
3305, pp. 253-262, 1998.
[7] G. Ramponi and P. Fontanot, "Enhancing document images with a quadratic filter,"
Signal Processing, Vol. 33, pp..23-34, 1993.
[8] L. Koskinen, H. Huttunen, and J. T. Astola, "Text enhancement method based on
soft morphological filters," Proceedings of the SPIE, Vol. 2181, pp. 243-253, 1994.
[9] Y.C. Shin, R. Sridhar, V. Demjanenko, P. W. Palumbo, and J. J. Hull, "Contrast
enhancement of mail piece images," Proceedings of the SPIE, Vol. 1661, pp. 27-37,
1992.
[10] F. Sattar and D. B. H. Tay, "On the multiresolution enhancement of document
images using fuzzy logic approach," ICPR98 Proceedings International Conference
on Pattern Recognition, pp. 939-942, 1998.
[11] T. C. Chen and R. J. P. de Figueiredo, "Image decimation and interpolation tech¬
niques based on frequency domain analysis," IEEE Transactions on Communica¬
tions, Vol. 32, No. 4, 1984.
[12] H. S. Hou and H. C. Andrews, "Cubic splines for image interpolation and digital
filtering," IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. 26,
No. 6, December 1978.
[13] N. B. Karayiannis and A. N. Venetsanopoulos, "Image interpolation based on vari- ational principles," Signal Processing, Vol. 25, pp. 259-288, 1991.
[14] R. G. Keys, "Cubic convolution interpolation for digital image processing," IEEE
Transactions on Acoustics, Speech, and Signal Processing, Vol. 29, No. 6, pp. 1153-
1160, 1981.
[15] A. D. Kulkarni and K. Sivaraman, "Interpolation of digital imagery using hyper-
space approximation," Signal Processing, Vol. 7, pp. 65-73, 1987.
[16] V. S. Nalwa, "Edge-detector resolution improvement by image interpolation," IEEE
Transactions on Pattern Analysis and Machine Intelligence, Vol. 9, No. 3, pp. 446-
451, 1987.
[17] R. R. Schultz and R. L. Stevenson, "A Bayesian approach to image expansion for
improved definition," IEEE Transactions on Image Processing, Vol. 3, No. 3, pp.
233-242, 1994.
[18] H. Li, O. E. Kia, and D. S. Doermann, "Text enhancement in digital video," Pro¬
ceedings of the SPIE, Vol. 3651, pp. 2-9, 1999.
[19] M. J. Taylor and C. R. Dance, "Enhancement of document images from cameras,"
Proceedings of the SPIE, Vol. 3305, pp. 230-241, 1998.
[20] R. W. Schafer, R. M. Mersereau, and M. A. Richards, "Constrained iterative
restoration algorithms," Proceedings of the IEEE, Vol. 69, No. 4, April 1981, pp.
432-450.
[21] J. Skilling and R. K. Bryan, "Maximum entropy image reconstruction: general algorithm," Mon. Not. Royal Astronomy Society, Vol. 211, 1984, pp. 111-124.
[22] P. D. Thouin and C. -I Chang, "A Method for Restoration of Low-Resolution Text
Images," Symposium on Document Image Understanding, pp. 143-148, 1999.
References
[1] Caere Corporation, "OmniPage Pro 9.0 Software Package," (1998).
[2] Xerox Corporation, "TextBridge Pro 9.0 Software Package," (1999).
[3] R. G. Keys, "Cubic convolution interpolation for digital image processing," IEEE
Transactions on Acoustics, Speech, and Signal Processing 29(6), pp. 1153-1160
(1981).
[4] T. C. Chen and R. J. P. de Figueiredo, "Image decimation and interpolation techniques based on frequency domain analysis," IEEE Transactions on Communica¬
tions 32(4), pp. 479-484 (1984).
[5] H. Derin and H. Elliott, "Modeling and segmentation of noisy and textured images
using Gibbs random fields," IEEE Transactions on Pattern Analysis and Machine
Intelligence 9(1), pp. 39-55 (1987).
[6] S. Geman and D. Geman, "Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images," IEEE Transactions on Pattern Analysis and Machine Intelligence 6(6), pp. 721-741 (1984).
[7] A. D. Kulkarni and K. Sivaraman, "Interpolation of digital imagery using hyper- space approximation," Signal Processing 7, pp. 65-73 (1987).
[8] V. S. Nalwa, "Edge-detector resolution improvement by image interpolation,"
IEEE Transactions on Pattern Analysis and Machine Intelligence 9(3), pp. 446-
451 (1987).
[9] H. S. Hou and H. C. Andrews, "Cubic splines for image interpolation and digital
filtering," IEEE Transactions on Acoustics, Speech, and Signal Processing 26(6),
pp. 508-517 (1978).
[10] N. B. Karayiannis and A. N. Venetsanopoulos, "Image interpolation based on vari-
ational principles," Signal Processing 25, pp. 259-288 (1991).
[11] R. R. Schultz and R. L. Stevenson, "A Bayesian approach to image expansion for
improved definition," IEEE Transactions on Image Processing 3(3), pp. 233-242
(1994).
[12] M. Shridhar, M. Ahmadi, and M. El-Gabali, "Restoration of noisy images modeled
by Markov random fields with Gibbs distributions," IEEE Transactions on Circuits
and Systems 36(6), pp. 884-890 (1989).
[13] P. D. Thouin and C. -I Chang, "A Method for Restoration of Low-Resolution Text Images," Symposium on Document Image Understanding, pp. 143-148 (1999).
[14] J. Skilling and R. K. Bryan, "Maximum entropy image reconstruction: general
algorithm," Mon. Not. Royal Astronomy Society 211, pp. 111-124 (1984).
[15] R. W. Schafer, R. M. Mersereau, and M. A. Richards, "Constrained iterative
restoration algorithms," Proceedings of the IEEE 69(4), pp. 432-450 (1981).
US Provisional Application Serial No. 60/127,316, filed April 1, 1999 (including its attachments) and the following attached documents are incorporated herein by reference as if fully set forth herein:
1. "A Method for Restoration of Low-Resolution Bimodal Images" (authored by CHEIN-I CHANG and PAUL THOUIN, unpublished, but scheduled for presentation to 1999 Symposium on Documentation Image Understanding Technology at Annapolis, MD April 14-16, 1999)
2. "Constrained nonlinear restoration of JPEG compressed low- resolution text from grayscale images using a Gibbs-Markov random field prior", authored by CHEIN-I CHANG and PAUL THOUIN, and published in IS&T/SPIE 10th Annual Symposium, Electronic Imaging '98: Science & Technology (presented at San Jose, California January 24-30, 1998); and
3. "A Gibbs-Markov Random Field Approach to Restoration of JPEG- compressed Text Images" (unpublished, authored by CHEIN-I CHANG and PAUL THOUIN) .
While a specific embodiment of the invention has been shown and described in detail to illustrate the application of the principles of the invention, it will be understood that the invention may be embodied otherwise without departing from such
principles and that various modifications, alternate constructions, and equivalents will occur to those skilled in the art given the benefit of this disclosure. Thus, the invention is not limited to the specific embodiment described herein, but is defined by the appended claims.