1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586
|
------------------------------------------------------------------------------
unpaper 0.3 - written by Jens Gulden 2005-2007
Licensed under the GNU General Public License (GPL), see file LICENSE.
This software comes with no warranty.
------------------------------------------------------------------------------
Overview
------------------------------------------------------------------------------
unpaper is a post-processing tool for scanned sheets of paper, especially for
book pages that have been scanned from previously created photocopies.
The main purpose is to make scanned book pages better readable on screen
after conversion to PDF. Additionally, unpaper might be useful to enhance
the quality of scanned pages before performing optical character recognition
(OCR).
unpaper tries to clean scanned images by removing dark edges that appeared
through scanning or copying on areas outside the actual page content (e.g.
dark areas between the left-hand-side and the right-hand-side of a double-
sided book-page scan).
The program also tries to detect disaligned centering and rotation of pages
and will automatically straighten each page by rotating it to the correct
angle. This process is called "deskewing".
Note that the automatic processing will sometimes fail. It is always a good
idea to manually control the results of unpaper and adjust the parameter
settings according to the requirements of the input. Each processing step can
also be disabled individually for each sheet.
Input and output files can be in either .pbm , .pgm or .ppm format, thus
generally in .pnm format, as also used by the Linux scanning tools scanimage
and scanadf.
Conversion to PDF can e.g. be achieved with the Linux tools pgm2tiff, tiffcp
and tiff2pdf.
Installation
------------------------------------------------------------------------------
On i386-Linux systems, the pre-compiled binary included in the distribution
archive can be used. Execute as root:
cp bin/unpaper /usr/local/bin/
Alternative ways to install are running
./make.sh
./make.sh install
Or compile manually and copy the resulting executable to a directory in the
search path.
Compilation
------------------------------------------------------------------------------
The easiest way to manually compile unpaper is running:
gcc -lm -o unpaper unpaper.c
You can also perform this step by invoking the script make. In this case,
optional compilation parameters set via the gobal environment variable $CFLAGS
will also take effect.
unpaper has been tested to be compilable on several hardware platforms and
operating systems, including Linux/Intel, Macintosh/G4, Macintosh/68k,
Solaris/Sparc and Windows/Intel. The following list gives hint on how to
optimally compile unpaper on different systems. This may or may not work
in other environments:
Linux/Intel
The binary executable included in the distribution archive has been compiled
using gcc version 4.1:
gcc -D TIMESTAMP="<yyyy-MM-dd HH:mm:ss>" -lm -O3 -funroll-all-loops -fomit-frame-pointer -ftree-vectorize -o unpaper unpaper.c
Compilation hints for other environments have been provided by users. Note
that the following compiler invocations might work only with specific versions
of the compilers or specific environments:
Macintosh
gcc -O4 -fast -mcpu=7450 -pipe -fomit-frame-pointer -ftree-vectorize -o unpaper unpaper.c
(People with G5 should simply omit the -mcpu=7450 option. Thanks to Olaf Marzocchi.)
In case you use cc as compiler, it may be necessary to increase the stack size
by adding -Wl,-stack_size -Wl,1000000 (tested on Macintosh/68k).
Windows
gcc.exe unpaper.c -o unpaper.exe -I "C:\programme\Dev-Cpp\include" -L "C:\programme\Dev-Cpp\lib"
(Thanks to Uwe Dierolf.)
Usage
------------------------------------------------------------------------------
Usage: unpaper [options] <input-file(s)> <output-file(s)>
Filenames may contain a formatting placeholder starting with '%' to insert a
page counter for multi-page processing. E.g.: 'scan%03d.pbm' to process files
scan001.pbm, scan002.pbm, scan003.pbm etc.
-l --layout single Set default layout options for a sheet:
|double 'single': One page per sheet.
|none 'double': Two pages per sheet, landscape
orientation (one page on the left
half, one page on the right half).
'none': No auto-layout, mask-scan-points
may individually be specified.
Using 'single' or 'double' automatically
sets corresponding --mask-scan-points.
The default is 'single'.
-start --start-sheet <sheet> Number of first sheet to process in multi-
sheet mode. (default: 1)
-end --end-sheet <sheet> Number of last sheet to process in multi-
sheet mode. -1 indicates processing until
no more input file with the corresponding
page number is available (default: -1)
-# --sheet Optionally specifies which sheets to
<sheet>{,<sheet>[-<sheet>]} process in the range between start-sheet
and end sheet.
-x --exclude Excludes sheets from processing in the
<sheet>{,<sheet>[-<sheet>]} range between start-sheet and end-sheet.
--pre-rotate -90|90 Rotates the whole image clockwise (90) or
or anti-clockwise (-90) before any other
processing.
--post-rotate -90|90 Rotates the whole image clockwise (90) or
or anti-clockwise (-90) after any other
processing.
-M --pre-mirror Mirror the image, after possible pre-
[v[ertical]][,][h[orizontal]] rotation. Either 'v' (for vertical
mirroring), 'h' (for horizontal mirroring)
or 'v,h' (for both) can be specified.
--post-mirror Mirror the image, after any other
[v[ertical]][,][h[orizontal]] processing except possible post-
rotation.
--pre-shift <h>,<v> Shift the image before further processing.
Values for 'h' (horizontal shift) and 'v'
(vertical shift) can either be positive
or negative.
--post-shift <h>,<v> Shift the image after other processing.
Values for 'h' (horizontal shift) and 'v'
(vertical shift) can either be positive
or negative.
--pre-wipe Manually wipe out an area before further
<left>,<top>,<right>,<bottom> processing. Any pixel in a wiped area
will be set to white. Multiple areas to
be wiped may be specified by multiple
occurrences of this options.
--post-wipe Manually wipe out an area after
<left>,<top>,<right>,<bottom> processing. Any pixel in a wiped area
will be set to white. Multiple areas to
be wiped may be specified by multiple
occurrences of this options.
--pre-border Clear the border-area of the sheet before
<left>,<top>,<right>,<bottom> further processing. Any pixel in the
border area will be set to white.
--post-border Clear the border-area after processing.
<left>,<top>,<right>,<bottom> Any pixel in the border area will be set
to white.
--pre-mask <x1>,<y1>,<x2>,<y2> Specify masks to apply before any other
processing. Any pixel outside a mask
will be set to white, unless another mask
includes this pixel.
Only pixels inside a mask will remain.
Multiple masks may be specified. No
deskewing will be applied to the masks
specified by --pre-mask.
-s --size <width>,<height> Change the sheet size before other pro-
| <size-name> cessing is applied. Content on the sheet
gets zoomed to fit to the appropriate
size, but the aspect ratio is preserved.
Instead, if the sheet's aspect ratio
changes, the zoomed content gets centered
on the sheet. Size-name can also be a
standard name as 'a4', 'letter', etc.
Possible size names are:
a5
a4
a3
letter
legal.
All size names can also be applied in
rotated landscape orientation, use
'a4-landscape', 'letter-landscape' etc.
--post-size <width>,<height>|<name> Change the sheet size preserving the
content's aspect ratio after other
processing steps are applied.
--stretch <width>,<height>|<name> Change the sheet size before other
processing is applied. Content on the
sheet gets stretched to the specified
size, possibly changing the aspect ratio.
--post-stretch <width>,<height> Change the sheet size after other
|<name> processing is applied. Content on the
sheet gets stretched to the specified
size, possibly changing the aspect ratio.
-z --zoom <factor> Change the sheet size according to the
given factor before other processing is
done.
--post-zoom <factor> Change the sheet size according to the
given factor after processing is done.
-bn --blackfilter-scan-direction Directions in which to search for solidly
[v[ertical]][,][h[orizontal]] black areas. Either 'v' (for vertical
scanning), 'h' (for horizontal scanning)
of 'v,h' (for both) can be specified.
(default: 'v,h')
-bs --blackfilter-scan-size Width of virtual bar used for mask
<size>|<h-size>,<v-size> detection. Two values may be specified
to individually set horizontal and
vertical size. (default: 20,20)
-bd --blackfilter-scan-depth Size of virtual bar used for black area
<depth>|<h-depth,v-depth> detection. (default: 500,500)
-bp --blackfilter-scan-step Steps to move virtual bar for black area
<step>|<h-step,v-step> detection. (default: 5,5)
-bt --blackfilter-scan-threshold <t> Ratio of dark pixels above which a black
area gets detected. (default: 0.95).
-bx --blackfilter-scan-exclude Area on which the blackfilter should not
<left>,<top>,<right>,<bottom> operate. This can be useful to prevent
the blackfilter from working on inner
page content. May be specified multiple
times to set more than one area.
-bi --blackfilter-intensity <i> Intensity with which to delete black
areas. Larger values will leave less
noise-pixels around former black areas,
but may delete page content. (default:
20)
-ni --noisefilter-intensity <n> Intensity with which to delete individual
pixels or tiny clusters of pixels. Any
cluster which only contains n dark pixels
together will be deleted. (default: 4)
-ls --blurfilter-size Size of blurfilter area to search for
<size>|<h-size>,<v-size> 'lonely' clusters of pixels.
(default: 100,100)
-lp --blurfilter-step Size of 'blurring' steps in each
<step>|<h-step>,<v-step> direction. (default: 50,50)
-li --blurfilter-intensity <ratio> Relative intensity with which to delete
tiny clusters of pixels. Any blurred area
which contains at most the ratio of dark
pixels will be cleared. (default: 0.01)
-gs --grayfilter-size Size of grayfilter mask to search for
<size>|<h-size>,<v-size> 'gray-only' areas of pixels.
(default: 50,50)
-gp --grayfilter-step Size of steps moving the grayfilter mask
<step>|<h-step>,<v-step> in each direction. (default: 20,20)
-gt --grayfilter-threshold <ratio> Relative intensity of grayness which is
accepted before clearing the grayfilter
mask in cases where no black pixel is
found in the mask. (default: 0.5)
-p --mask-scan-point <x>,<y> Manually set starting point for mask-
detection. Multiple --mask-scan-point
options may be specified to detect
multiple masks.
-m --mask <x1>,<y1>,<x2>,<y2> Manually add a mask, in addition to masks
automatically detected around the --mask-
scan-point coordinates (unless --no-mask-
scan is specified).
Any pixel outside a mask will be
set to white, unless another mask
covers this pixel.
-mn --mask-scan-direction Directions in which to search for mask
[v[ertical]][,][h[orizontal]] borders, starting from --mask-scan-point
coordinates. Either 'v' (for vertical
scanning), 'h' (for horizontal scanning)
of 'v,h' (for both) can be specified.
(default: 'h' ('v' may cut text-
paragraphs on single-page sheets))
-ms --mask-scan-size <size>|<h,v> Width of the virtual bar used for mask
detection. Two values may be specified
to individually set horizontal and
vertical size. (default: 50,50)
-md --mask-scan-depth <dep>|<h,v> Height of the virtual bar used for mask
detection. (default: -1,-1, using the
total width or height of the sheet)
-mp --mask-scan-step <step>|<h,v> Steps to move the virtual bar for mask
detection. (default: 5,5)
-mt --mask-scan-threshold <t>|<h,v> Ratio of dark pixels below which an edge
gets detected, relative to max. blackness
when counting from the start coordinate
heading towards one edge. (default: 0.1)
-mm --mask-scan-minimum <w>,<h> Minimum allowed size of an auto-detected
mask. Masks detected below this size will
be ignored and set to the size specified
by mask-scan-maximum. (default: 100,100)
-mM --mask-scan-maximum <w>,<h> Maximum allowed size of an auto-detected
mask. Masks detected above this size will
be shrunk to the maximum value, each
direction individually. (default:
sheet size, or page size derived from
--layout option.
-mc --mask-color <color> Color value with which to wipe out pixels
not covered by any mask. Maybe useful for
testing in order to visualize the effect
of masking. (Note that an RGB-value is
expected: R*65536 + G*256 + B.)
-dn --deskew-scan-direction Edges from which to scan for rotation.
[left],[top],[right],[bottom] Each edge of a mask can be used to detect
the mask's rotation. If multiple edges
are specified, the average value will be
used, unless the statistical deviation
exceeds --deskew-scan-deviation. Use
'left' for scanning from the left edge,
'top' for scanning from the top edge,
'right' for scanning from the right edge,
'bottom' for scanning from the bottom.
Multiple directions can be separated by
commas. (default: 'left,right')
-ds --deskew-scan-size <pixels> Size of virtual line for rotation
detection. (default: 1500)
-dd --deskew-scan-depth <ratio> Amount of dark pixels to accumulate until
scanning is stopped, relative to scan-bar
size. (default: 0.5)
-dr --deskew-scan-range <degrees> Range in which to search for rotation,
from -degrees to +degrees rotation.
(default: 5.0)
-dp --deskew-scan-step <degrees> Steps between single rotation-angle
detections.
Lower numbers lead to better results but
slow down processing. (default: 0.1)
-dv --deskew-scan-deviation <dev> Maximum statistical deviation allowed
among the results from detected edges.
No rotation if exceeded. (default: 1.0)
-W --wipe Manually wipe out an area. Any pixel in
<left>,<top>,<right>,<bottom> a wiped area will be set to white.
Multiple --wipe areas may be specified.
This is applied after deskewing and
before automatic border-scan.
-mw --middle-wipe If --layout is set to 'double', this
<size>|<left>,<right> may specify the size of a middle area to
wipe out between the two pages on the
sheet. This may be useful if the
blackfilter fails to remove some black
areas (e.g. resulting from photo-copying
in the middle between two pages).
-B --border Manually add a border. Any pixel in the
<left>,<top>,<right>,<bottom> border area will be set to white. This is
applied after deskewing and before
automatic border-scan.
-Bn --border-scan-direction Directions in which to search for outer
[v[ertical]][,][h[orizontal]] border. Either 'v' (for vertical
scanning), 'h' (for horizontal scanning)
of 'v,h' (for both) can be specified.
(default: 'v')
-Bs --border-scan-size <size>|<h,v> Width of virtual bar used for border
detection. Two values may be specified
to individually set horizontal and
vertical size. (default: 5,5)
-Bp --border-scan-step <step>|<h,v> Steps to move virtual bar for border
detection. (default: 5,5)
-Bt --border-scan-threshold <t> Absolute number of dark pixels covered by
the border-scan mask above which a border
is detected. (default: 5)
-Ba --border-align Direction where to shift the detected
[left],[top],[right],[bottom] border-area. Use --border-margin to
specify horizontal and vertical distances
to be kept from the sheet-edge.
(default: none)
-Bm --border-margin Distance to keep from the sheet edge when
<vertical>,<horizontal> aligning a border area. May use
measurement suffices such as cm, in.
-w --white-threshold <threshold> Brightness ratio above which a pixel is
considered white.
(default: 0.9)
-b --black-threshold <threshold> Brightness ratio below which a pixel is
considered black (non-gray). This is used
by the gray-filter. This value is also
used when converting a grayscale image to
black-and-white mode (default: 0.33)
-ip --input-pages 1|2 If '2' is specified, read two input
images instead of one and internally
combine them to a doubled-layout sheet
before further processing.
Before internally combining, --pre-
rotation is optionally applied
individually to both input images as the
very first processing steps.
-op --output-pages 1|2 If '2' is specified, write two output
images instead of one, as a result of
splitting a doubled-layout sheet after
processing. After splitting the sheet,
--post-rotation is optionally applied
individually to both output images as the
very last processing step.
-S --sheet-size <width>,<height> Force a fix sheet size. Usually, the
| <size-name> sheet size is determined by the input
image size (if input-pages=1), or by the
double size of the first page in a
two-page input set (if input-pages=2).
If the input image is smaller than the
size specified here, it will appear
centered and surrounded with a white
border on the sheet. If the input image is
bigger, it will be centered and the edges
will be cropped. This option may also be
helpful to get regular sized output
images if the input image sizes differ.
Standard size-names like 'a4-landscape',
'letter', etc. may be used (see --size).
(default: as in input file)
--sheet-background black|white Sets a color with which the sheet is
filled before any image is loaded and
placed onto it. This can be useful when
the sheet size and the image size differ.
--no-blackfilter Disables black area scan. Individual sheet
<sheet>{,<sheet>[-<sheet>]} indices can be specified.
--no-noisefilter Disables the noisefilter. Individual sheet
<sheet>{,<sheet>[-<sheet>]} indices can be specified.
--no-blurfilter Disables the blurfilter. Individual sheet
<sheet>{,<sheet>[-<sheet>]} indices can be specified.
--no-grayfilter Disables the grayfilter. Individual sheet
<sheet>{,<sheet>[-<sheet>]} indices can be specified.
--no-mask-scan Disables mask-detection. Masks explicitly
<sheet>{,<sheet>[-<sheet>]} set by --mask will still have effect. In-
dividual sheet indices can be specified.
--no-mask-center Disables auto-centering of each mask.
<sheet>{,<sheet>[-<sheet>]} Auto-centering is performed by default
if the --layout option has been set. In-
dividual sheet indices can be specified.
--no-deskew Disables deskewing. Individual sheet
<sheet>{,<sheet>[-<sheet>]} indices can be specified.
--no-wipe Disables explicit wipe-areas.
<sheet>{,<sheet>[-<sheet>]} This means the effect of parameter
--wipe can be disabled individually per
sheet.
--no-border Disables explicitly set borders.
<sheet>{,<sheet>[-<sheet>]} This means the effect of parameter
--border can be disabled individually per
sheet.
--no-border-scan Disables border-scanning from the
<sheet>{,<sheet>[-<sheet>]} edges of the sheet. Individual sheet
indices can be specified.
--no-border-align Disables aligning of the area detected by
<sheet>{,<sheet>[-<sheet>]} border-scanning (see --border-align). In-
dividual sheet indices can be specified.
-n --no-processing Do not perform any processing on a sheet
<sheet>{,<sheet>[-<sheet>]} except pre/post rotating and mirroring,
and file-depth conversions on saving.
This option has the same effect as setting
all --no-xxx options together. Individual
sheet indices can be specified.
--no-qpixels Disable qpixel-mode for deskewing (do not
internally use a 4x bigger image when
rotating).
--no-multi-pages Disable multi-page processing even if the
input filename contains a '%' (usually
indicating the start of a placeholder for
the page counter).
--dpi <dpi> Dots per inch used for conversion of
measured size values, like e.g.'21cm,
27.9cm'. Mind that this parameter should
occur before specifying any size value
with measurement suffix. (default: 300)
-t --type pbm|pgm Output file type. (default: as input)
-d --depth <bits> Output pixel depth. (default: as input)
-T --test-only Do not write any output. May be useful in
combination with --verbose to get informa-
tion about the input.
-in --input-file-sequence Sequence of input filename patterns which
<file-patterns> is repeatedly traversed while resolving
input filenames. Specifying a single
entry is equivalent to the first filename
argument after the options-list.
-out --output-file-sequence Sequence of output filename patterns
<file-patterns> which is repeatedly traversed while
resolving output filenames. Specifying a
single entry is equivalent to the second
filename argument after the options-list.
-si --start-input <nr> Set the first page number to substitute
for '%d' in input filenames. Every time
the input file sequence is repeated, this
number gets increased by 1. (default:
(startsheet-1)*inputpages+1)
-so --start-output <nr> Set the first page number to substitute
for '%d' in output filenames. Every time
the output file sequence is repeated,
this number gets increased by 1.
(default: (startsheet-1)*outputpages+1)
--insert-blank <nr>{,<nr>[-<nr>]} Use blank input instead of an input file
from the input file sequence at the
specified index-positions. The input file
sequence will be interrupted temporarily
and will continue with the next input
file afterwards. This can be useful to
insert blank content into a sequence of
input images.
--replace-blank <nr>{,<nr>[-<nr>]} Like --insert-blank, but the input images
at the specified index positions get
replaced with blank content and thus will
be ignored.
--overwrite Allow overwriting existing files.
Otherwise the program terminates with an
error if an output-file to be written
already exists.
-q --quiet Quiet mode, no output at all.
-v --verbose Verbose output, more info messages.
-vv Even more verbose output, show parameter
settings before processing.
--time Output processing time consumed.
-V --version Output version and build information.
|