
CN111079756A - Method and equipment for extracting and reconstructing table in document image - Google Patents

Method and equipment for extracting and reconstructing table in document image

Info

Publication number
CN111079756A
CN111079756A
Authority
CN
China
Prior art keywords
image
lines
extracting
document image
processor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811220913.9A
Other languages
Chinese (zh)
Other versions
CN111079756B (en)
Inventor
宋文东
苏辉
蒋海青
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Ezviz Network Co Ltd
Original Assignee
Hangzhou Ezviz Network Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Ezviz Network Co Ltd filed Critical Hangzhou Ezviz Network Co Ltd
Priority to CN201811220913.9A priority Critical patent/CN111079756B/en
Publication of CN111079756A publication Critical patent/CN111079756A/en
Application granted granted Critical
Publication of CN111079756B publication Critical patent/CN111079756B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Character Input (AREA)

Abstract

The application discloses a method for extracting and reconstructing tables in document images, comprising the following steps: preprocessing an input document image; locating and extracting the table from the preprocessed image; segmenting the extracted table into table cells; performing character recognition in each segmented cell; and reconstructing the table from the character recognition results and the row and column coordinates of each cell. Corresponding to the method, the application also discloses a device for table extraction and reconstruction in document images, a non-volatile computer-readable storage medium, and an electronic device. The disclosed technical scheme improves the speed and accuracy of table reconstruction in document images, reduces computational complexity, and is simple, fast, and efficient.

Description

Method and equipment for extracting and reconstructing table in document image
Technical Field
The application relates to the technical field of image recognition, and in particular to a method and device for extracting and reconstructing tables in document images.
Background
Table localization and extraction in images has a wide range of practical applications. Current approaches include methods based on run-length encoding, machine learning, deep learning, and the like.
The prior patent application No. 201710813108.6 proposes a table recognition method based on deep learning: the table is first cut horizontally into several parts; a deep learning model judges whether each part contains a straight line; the horizontal strips containing lines are then cut vertically to obtain blocks; the table is assembled from the blocks; and finally the text in the table is located and recognized.
The method provided in the above-cited patent application mainly suffers from the following drawbacks:
(1) The initial horizontal and vertical cuts have no principled basis; blind cutting leads to a large number of cuts, and accuracy cannot be ensured.
(2) Deep learning requires a large number of samples to obtain a high-accuracy model, and large sample sets are often difficult to obtain.
(3) Deep learning is computationally expensive, places high demands on the computer's processing power, and is slow.
(4) The cited application cannot handle tables at an oblique angle, only tables oriented horizontally or vertically.
(5) The cited application can only handle closed tables.
Disclosure of Invention
The application provides a simple, fast, and efficient method and device for extracting and reconstructing tables in document images, which improve the speed and accuracy of table reconstruction in document images and reduce computational complexity.
The application discloses a method for table extraction and reconstruction in a document image, comprising the following steps:
preprocessing an input document image;
locating and extracting the table from the preprocessed image;
segmenting the extracted table into table cells;
performing character recognition in each cut table cell;
and performing table reconstruction according to the character recognition results and the row and column coordinates of each table cell.
Preferably, preprocessing the input document image includes:
whitening, thresholding, and angle rotation of the input document image.
Preferably, performing the angle rotation on the input document image includes:
rotating the document image over a set angle range, computing the mean of each row for each candidate angle, forming a vector from all the row means, and then computing the variance of that vector, wherein the angle corresponding to the maximum variance is the angle by which to rotate.
Preferably, performing table location and extraction on the preprocessed image includes:
extracting all horizontal lines in the table to obtain a horizontal-line binary image img1;
extracting all vertical lines in the table to obtain a vertical-line binary image img2;
and performing an OR operation on the horizontal-line binary image img1 and the vertical-line binary image img2 to obtain an image img3 containing the table.
Preferably, extracting all horizontal lines in the table includes: according to the size of the document image, performing an erosion operation on the image with a structuring element, then a dilation operation to enhance and connect the lines, and finally extracting all horizontal lines in the table;
extracting all vertical lines in the table includes: according to the size of the document image, performing an erosion operation on the image with the structuring element, then a dilation operation to enhance and connect the lines, and finally extracting all vertical lines in the table.
Preferably, segmenting the extracted table into table cells includes:
projecting the image img3 horizontally to obtain a pixel histogram, finding the maximum value maxh in the histogram, treating positions whose peak exceeds 0.5 × maxh as horizontal-line positions, and, after all horizontal lines are located, cutting out each row of the table between two adjacent lines;
and projecting each cut row vertically to obtain a pixel histogram, finding the maximum value maxh in the histogram, treating positions whose peak exceeds 0.5 × maxh as vertical-line positions, locating all the vertical lines, cutting out each column of the table between two adjacent lines to obtain the individual table cells, and recording the row and column numbers.
Preferably, performing character recognition in each cut table cell includes:
performing character recognition on each cut table cell separately, using a recurrent neural network (LSTM) to extract overall features of the characters in each cell, and then feeding the features to the LSTM for prediction and recognition.
The application also provides a device for table extraction and reconstruction in a document image, comprising a processor configured to:
preprocess an input document image;
locate and extract the table from the preprocessed image;
segment the extracted table into table cells;
perform character recognition in each cut table cell;
and perform table reconstruction according to the character recognition results and the row and column coordinates of each table cell.
Preferably, the device further comprises an image acquisition module, configured to acquire a receipt image and send it to the processor.
Preferably, the processor is specifically configured to: whiten, threshold, and rotate the input document image.
Preferably, the processor is specifically configured to:
and based on a set angle range, rotating the document image, solving the mean value of each line, forming a vector by the mean values of all the lines, and then solving the variance of the vector, wherein the angle corresponding to the maximum variance is the angle needing to be rotated.
Preferably, the processor is specifically configured to:
extract all horizontal lines in the table to obtain a horizontal-line binary image img1;
extract all vertical lines in the table to obtain a vertical-line binary image img2;
and perform an OR operation on the horizontal-line binary image img1 and the vertical-line binary image img2 to obtain an image img3 containing the table.
Preferably, the processor is specifically configured to:
according to the size of the document image, perform an erosion operation on the image with a structuring element, then a dilation operation to enhance and connect the lines, and finally extract all horizontal lines in the table;
according to the size of the document image, perform an erosion operation on the image with the structuring element, then a dilation operation to enhance and connect the lines, and finally extract all vertical lines in the table.
Preferably, the processor is specifically configured to:
project the image img3 horizontally to obtain a pixel histogram, find the maximum value maxh in the histogram, treat positions whose peak exceeds 0.5 × maxh as horizontal-line positions, and, after all horizontal lines are located, cut out each row of the table between two adjacent lines;
and project each cut row vertically to obtain a pixel histogram, find the maximum value maxh in the histogram, treat positions whose peak exceeds 0.5 × maxh as vertical-line positions, locate all the vertical lines, cut out each column of the table between two adjacent lines to obtain the individual table cells, and record the row and column numbers.
Preferably, the processor is specifically configured to:
and respectively carrying out character recognition on each cut table unit, carrying out integral feature extraction on characters in each table unit by adopting a recurrent neural network (LSTM), and then sending the characters to LSTM prediction recognition.
The present application further provides a non-transitory computer readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the steps of the method of table extraction and reconstruction in document images as previously described.
The present application also provides an electronic device comprising the non-volatile computer-readable storage medium described above and a processor having access to that storage medium.
According to the above technical scheme, the table extraction and reconstruction scheme for document images provided by the application features high computation speed, wide applicability, and strong robustness; it provides an effective preprocessing method for optical character recognition (OCR) and has high practical value. In particular, the innovations of the application lie in the following areas:
(1) The table picture is processed directly, and the rotation angle is corrected automatically.
(2) Noise generally occupies only a small area in the image and is erased directly by the morphological kernel (structuring element), so a separate denoising step is unnecessary.
(3) The scheme uses image-processing techniques, which are fast, robust, and easy to implement.
(4) The character localization step is omitted; the table segmentation results are fed directly to the subsequent recognition module.
(5) A single projection suffices to locate a given row, column, or cell of the table.
(6) Character recognition is based on an LSTM, which is fast and accurate.
(7) The scheme handles not only closed tables but various kinds of tables.
Drawings
FIG. 1 is a flow chart of a method for extracting and reconstructing a form in a document image according to the present application;
FIG. 2 is an original document image and an angle corrected document image;
FIG. 3 is a diagram showing the extraction results of horizontal and vertical lines of the table of the present application;
FIG. 4 shows the table extraction result of the present application;
FIG. 5 is a diagram illustrating a row in a table positioned by a row projection of the present application;
FIG. 6 is a table reconstructed according to coordinate information according to the present application;
FIG. 7 is a schematic structural diagram of a form extraction and reconstruction device in a document image according to the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is further described in detail below by referring to the accompanying drawings and examples.
The table extraction and reconstruction method provided by the application is fast, efficient, and more general. Its flow chart is shown in fig. 1, and it specifically comprises the following steps:
1) Input a document image.
2) Preprocess the input document image; the preprocessing steps comprise:
a) Whiten the document image.
Because adjacent pixels in an image are strongly correlated, much of the information is redundant; the purpose of whitening is to remove this redundancy from the input data.
b) Threshold the whitened image.
The general purpose of image thresholding is to separate the target region from the background in a grayscale image; however, a single fixed threshold rarely achieves the desired segmentation. In practice, the threshold for a pixel can be determined from its neighborhood, so that each pixel's threshold varies with its surrounding blocks. In a grayscale image, regions of sharp gray-value change are usually object contours, so the image contour is obtained by dividing the image into small blocks and computing a threshold per block rather than using a single fixed threshold. On balance, the scheme adopts adaptive thresholding.
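As a hedged illustration of this per-neighborhood approach, the sketch below binarizes a grayscale image against its local mean, computed with an integral image. The function names and the block/offset parameters are illustrative, not from the application; an OpenCV user would typically reach for cv2.adaptiveThreshold instead.

```python
import numpy as np

def local_mean(gray, block):
    """Mean of the (block x block) neighborhood of every pixel,
    computed via an integral image; block is assumed odd."""
    pad = block // 2
    p = np.pad(gray.astype(np.float64), pad, mode="edge")
    ii = np.zeros((p.shape[0] + 1, p.shape[1] + 1))
    ii[1:, 1:] = p.cumsum(0).cumsum(1)
    h, w = gray.shape
    s = (ii[block:block + h, block:block + w]
         - ii[:h, block:block + w]
         - ii[block:block + h, :w]
         + ii[:h, :w])
    return s / (block * block)

def adaptive_threshold(gray, block=15, c=5):
    """Mark a pixel as foreground (1) when it is darker than its
    local mean by more than c -- dark text on a light background."""
    return (gray < local_mean(gray, block) - c).astype(np.uint8)
```

Because the threshold follows the local mean, uneven illumination that would defeat a single global threshold no longer matters.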
c) Rotate the thresholded image.
In practice, the image data to be processed is usually at an angle rather than horizontal, so rotation correction is performed. Based on a set angle range, the scheme rotates the thresholded image, computes the mean of each row, forms a vector from all the row means, and then computes the variance of that vector. For example, sweeping ±5 degrees in 1-degree steps yields one variance value per candidate angle, and the angle corresponding to the maximum variance is the angle by which to rotate. A correctly oriented page alternates between dark text rows and blank rows, which maximizes this variance. This concludes the image preprocessing stage. The left and right halves of fig. 2 show the original document image and the angle-corrected document image, respectively.
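This variance criterion can be sketched as follows. The function names are illustrative; the rotation helper is assumed to come from scipy.ndimage, and the ±5°/1° search range is just the example above.

```python
import numpy as np

def deskew_score(img):
    """Variance of the per-row means: highest when text/table lines
    run exactly horizontally (dark rows alternate with blank rows)."""
    return img.mean(axis=1).var()

def best_rotation_angle(img, max_angle=5, step=1.0):
    """Try each candidate angle in [-max_angle, +max_angle] and return
    the one whose rotated image maximizes the row-mean variance."""
    from scipy.ndimage import rotate  # assumed available
    angles = np.arange(-max_angle, max_angle + step, step)
    scores = [deskew_score(rotate(img, a, reshape=False, order=0))
              for a in angles]
    return float(angles[int(np.argmax(scores))])
```

A horizontally striped page scores high; a uniformly gray page scores zero, so the search lands on the angle that restores horizontal alignment.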
3) Locate and extract the table from the preprocessed image, specifically:
a) Extract all horizontal lines in the table.
First, the image is eroded with a structuring element (kernel) chosen according to the picture size. In mathematical morphology, let A be a target region on the (x, y) plane and S a structuring element of specified size and shape, with S(x, y) denoting the region covered by S when located at coordinates (x, y). The erosion of A by S can then be expressed as:
A ⊖ S = { (x, y) | S(x, y) ⊆ A }
Erosion can be understood as sliding the structuring element S over the image: if the region covered by S is entirely contained in A, the position point is kept, and the set of all such points is the erosion of A by S. After erosion, only straight lines of at least a certain length remain in the image; other content that does not satisfy the erosion condition is erased.
Because the lines remaining after erosion may be too thin or discontinuous, a dilation operation is used to strengthen and connect them: the same-sized structuring element (kernel) is applied to the eroded lines. The dilation result can be expressed as:
A ⊕ S = { (x, y) | S(x, y) ∩ A ≠ ∅ }
That is, the structuring element S is slid over A, and every position where S overlaps A is recorded; the set of all positions where S intersects A is the dilation of A by S.
After the above erosion and dilation, an image containing only the horizontal lines is obtained; it is inverted bitwise and denoted img1, as shown in the left half of fig. 3.
b) All vertical lines in the table are extracted.
Similarly to a), an erosion operation first removes interference from irrelevant content, a dilation operation then enhances and connects the lines, and finally all vertical lines of the table are extracted. The image containing only vertical lines is inverted bitwise and denoted img2, as shown in the right half of fig. 3.
As described in a) and b), the processing steps for extracting horizontal and vertical lines are the same; the only difference is the structuring element S. When extracting horizontal lines, S may be set to (1, 100), (1, 85), or similar values, i.e. a horizontal line segment; when extracting vertical lines, S may be set to (100, 1), (90, 1), or similar values, i.e. a vertical line segment.
c) Extract the table. An OR operation on the horizontal-line and vertical-line binary images img1 and img2 leaves only the pixels on the table lines set to 1 (all others 0), giving an image containing only the table, denoted img3, as shown in fig. 4.
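The erode-then-dilate (opening) pipeline of steps a)–c) can be sketched as follows on a boolean image. The line-kernel length k and all function names are illustrative; a real implementation would more likely use cv2.erode/cv2.dilate with the (1, 100)-style kernels described above.

```python
import numpy as np

def _line_op(b, k, axis, combine):
    """Apply a 1-D morphological op with a length-k line kernel along
    `axis` by combining k shifted, zero-padded copies of the image."""
    pad = [(0, 0), (0, 0)]
    pad[axis] = (k // 2, k // 2)
    p = np.pad(b, pad, constant_values=False)
    out = None
    for s in range(k):
        sl = [slice(None), slice(None)]
        sl[axis] = slice(s, s + b.shape[axis])
        v = p[tuple(sl)]
        out = v.copy() if out is None else combine(out, v)
    return out

def line_open(b, k, axis):
    """Opening (erosion then dilation) with a line kernel: keeps only
    runs of at least k foreground pixels along `axis`."""
    eroded = _line_op(b, k, axis, np.logical_and)
    return _line_op(eroded, k, axis, np.logical_or)

def extract_table(binary, k=11):
    """img1 = horizontal rules, img2 = vertical rules, img3 = img1 OR img2."""
    img1 = line_open(binary, k, axis=1)  # horizontal line kernel
    img2 = line_open(binary, k, axis=0)  # vertical line kernel
    return img1 | img2
```

Isolated noise pixels and text strokes shorter than k are erased by the erosion and never come back, which is exactly why the scheme can skip a separate denoising step.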
4) Segment the extracted table into table cells, specifically:
a) Locate each row in the table.
Project img3 horizontally to obtain a pixel histogram; the pixel counts on the table's horizontal lines are clearly higher than elsewhere. Find the maximum value maxh in the histogram and treat every position whose peak exceeds 0.5 × maxh as the position of a horizontal line. Once all lines are located, each row of the table can be cut out between two adjacent lines. Fig. 5 illustrates locating a row of the table by row projection.
b) Locate each column in the table.
Project each row vertically to obtain a projection histogram, cut out each column within the row to obtain the individual table cells, and record the row and column numbers.
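The projection step in a) and b) reduces to finding histogram positions above half the maximum; a hedged sketch follows (the function name and the merging of adjacent above-threshold positions into one line are illustrative choices):

```python
import numpy as np

def line_positions(table_img, axis, ratio=0.5):
    """Project a binary table image along `axis` and return the centers
    of the runs whose pixel count exceeds ratio * max -- axis=1 gives
    the rows of horizontal lines, axis=0 the columns of vertical lines."""
    hist = table_img.sum(axis=axis)
    hits = np.flatnonzero(hist > ratio * hist.max())
    runs, cur = [], [int(hits[0])]
    for p in hits[1:]:
        if p == cur[-1] + 1:
            cur.append(int(p))
        else:
            runs.append(cur)
            cur = [int(p)]
    runs.append(cur)
    return [sum(r) // len(r) for r in runs]
```

Each row (or column) of the table is then cut between two consecutive returned positions, as the description states.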
5) Perform character recognition in each cut table cell, specifically:
Send each cut table cell to the recognition module for character recognition. The module may use a recurrent neural network (LSTM): overall features of the characters in each cell are extracted and then fed to the LSTM for prediction and recognition, which is fast and accurate. After recognition, proceed to the next step.
6) Table reconstruction, specifically:
Reconstruct the table from the character recognition results output by the recognition module and the row and column coordinates of each table cell. Fig. 6 shows a table reconstructed from the coordinate information.
Corresponding to the above method, the present application further provides a device for table extraction and reconstruction in document images, whose structure is shown in fig. 7, comprising a processor configured to:
preprocess an input document image;
locate and extract the table from the preprocessed image;
segment the extracted table into table cells;
perform character recognition in each cut table cell;
and perform table reconstruction according to the character recognition results and the row and column coordinates of each table cell.
Preferably, the device further comprises an image acquisition module, configured to acquire a receipt image and send it to the processor.
Preferably, the processor is specifically configured to: whiten, threshold, and rotate the input document image.
Preferably, the processor is specifically configured to:
and based on a set angle range, rotating the document image, solving the mean value of each line, forming a vector by the mean values of all the lines, and then solving the variance of the vector, wherein the angle corresponding to the maximum variance is the angle needing to be rotated.
Preferably, the processor is specifically configured to:
extract all horizontal lines in the table to obtain a horizontal-line binary image img1;
extract all vertical lines in the table to obtain a vertical-line binary image img2;
and perform an OR operation on the horizontal-line binary image img1 and the vertical-line binary image img2 to obtain an image img3 containing the table.
Preferably, the processor is specifically configured to:
according to the size of the document image, perform an erosion operation on the image with a structuring element, then a dilation operation to enhance and connect the lines, and finally extract all horizontal lines in the table;
according to the size of the document image, perform an erosion operation on the image with the structuring element, then a dilation operation to enhance and connect the lines, and finally extract all vertical lines in the table.
Preferably, the processor is specifically configured to:
project the image img3 horizontally to obtain a pixel histogram, find the maximum value maxh in the histogram, treat positions whose peak exceeds 0.5 × maxh as horizontal-line positions, and, after all horizontal lines are located, cut out each row of the table between two adjacent lines;
and project each cut row vertically to obtain a pixel histogram, find the maximum value maxh in the histogram, treat positions whose peak exceeds 0.5 × maxh as vertical-line positions, locate all the vertical lines, cut out each column of the table between two adjacent lines to obtain the individual table cells, and record the row and column numbers.
Preferably, the processor is specifically configured to:
and respectively carrying out character recognition on each cut table unit, carrying out integral feature extraction on characters in each table unit by adopting a recurrent neural network (LSTM), and then sending the characters to LSTM prediction recognition.
Further, the present application provides a non-transitory computer readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the steps of the method of extracting and reconstructing a table in a document image as previously described.
Further, the present application provides an electronic device comprising the non-volatile computer-readable storage medium described above and a processor having access to that storage medium.
The above description is only exemplary of the present application and is not intended to limit it; any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall be included in its scope of protection.

Claims (17)

1. A method for extracting and reconstructing a table in a document image, characterized by comprising:
preprocessing an input document image;
locating and extracting the table from the preprocessed image;
segmenting the extracted table into table cells;
performing character recognition in each cut table cell;
and performing table reconstruction according to the character recognition results and the row and column coordinates of each table cell.
2. The method of claim 1, wherein preprocessing the input document image comprises:
whitening, thresholding, and angle rotation of the input document image.
3. The method of claim 2, wherein angularly rotating the input document image comprises:
rotating the document image over a set angle range, computing the mean of each row for each candidate angle, forming a vector from all the row means, and then computing the variance of that vector, wherein the angle corresponding to the maximum variance is the angle by which to rotate.
4. The method of any one of claims 1 to 3, wherein performing table location and extraction on the preprocessed image comprises:
extracting all horizontal lines in the table to obtain a horizontal-line binary image img1;
extracting all vertical lines in the table to obtain a vertical-line binary image img2;
and performing an OR operation on the horizontal-line binary image img1 and the vertical-line binary image img2 to obtain an image img3 containing the table.
5. The method of claim 4, wherein:
extracting all horizontal lines in the table comprises: according to the size of the document image, performing an erosion operation on the image with a structuring element, then a dilation operation to enhance and connect the lines, and finally extracting all horizontal lines in the table;
extracting all vertical lines in the table comprises: according to the size of the document image, performing an erosion operation on the image with the structuring element, then a dilation operation to enhance and connect the lines, and finally extracting all vertical lines in the table.
6. The method of claim 4, wherein segmenting the extracted table into table cells comprises:
projecting the image img3 horizontally to obtain a pixel histogram, finding the maximum value maxh in the histogram, treating positions whose peak exceeds 0.5 × maxh as horizontal-line positions, and, after all horizontal lines are located, cutting out each row of the table between two adjacent lines;
and projecting each cut row vertically to obtain a pixel histogram, finding the maximum value maxh in the histogram, treating positions whose peak exceeds 0.5 × maxh as vertical-line positions, locating all the vertical lines, cutting out each column of the table between two adjacent lines to obtain the individual table cells, and recording the row and column numbers.
7. The method of claim 6, wherein performing character recognition in each cut table cell comprises:
performing character recognition on each cut table cell separately, using a recurrent neural network (LSTM) to extract overall features of the characters in each cell, and then feeding the features to the LSTM for prediction and recognition.
8. An apparatus for extracting and reconstructing a table in a document image, comprising a processor configured to:
preprocess an input document image;
locate and extract the table from the preprocessed image;
segment the extracted table into table cells;
perform character recognition in each cut table cell;
and perform table reconstruction according to the character recognition results and the row and column coordinates of each table cell.
9. The apparatus of claim 8, further comprising an image acquisition module, configured to acquire a receipt image and send it to the processor.
10. The apparatus of claim 8, wherein the processor is specifically configured to:
whiten, threshold, and rotate the input document image.
11. The device of claim 10, wherein the processor is specifically configured to:
rotate the document image over a set angle range, compute the mean of each row for each candidate angle, form a vector from all the row means, and then compute the variance of that vector, wherein the angle corresponding to the maximum variance is the angle by which to rotate.
12. The apparatus of any of claims 8 to 11, wherein the processor is specifically configured to:
extract all horizontal lines in the table to obtain a horizontal-line binary image img1;
extract all vertical lines in the table to obtain a vertical-line binary image img2;
and perform an OR operation on the horizontal-line binary image img1 and the vertical-line binary image img2 to obtain an image img3 containing the table.
13. The apparatus of claim 12, wherein the processor is further configured to:
according to the size of the document image, apply an erosion operation to the image with a structuring element and then a dilation operation to enhance and connect the lines, and finally extract all horizontal lines in the table; and
according to the size of the document image, apply an erosion operation to the image with a structuring element and then a dilation operation to enhance and connect the lines, and finally extract all vertical lines in the table.
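Claim 13's erosion-then-dilation with line-shaped structuring elements sized from the image, followed by the OR of claim 12, can be sketched in pure NumPy. The `scale` divisor used to size the elements and the sliding-window implementation are assumptions; a production version would typically use OpenCV's `erode`/`dilate`:

```python
import numpy as np

def erode_1d(binary, length, axis):
    """Binary erosion with a 1-pixel-thick line element: a pixel survives
    only if `length` consecutive pixels along `axis` are set."""
    win = np.lib.stride_tricks.sliding_window_view(binary, length, axis=axis)
    core = win.min(axis=-1)
    out = np.zeros_like(binary)
    sl = [slice(None)] * binary.ndim
    sl[axis] = slice(length // 2, length // 2 + core.shape[axis])
    out[tuple(sl)] = core        # centre the result on each pixel
    return out

def dilate_1d(binary, length, axis):
    """Binary dilation with the same line element (max over the window)."""
    pad = [(0, 0)] * binary.ndim
    pad[axis] = (length // 2, length - 1 - length // 2)
    win = np.lib.stride_tricks.sliding_window_view(np.pad(binary, pad), length, axis=axis)
    return win.max(axis=-1)

def extract_table_lines(binary, scale=15):
    """Open with a long horizontal element to keep only horizontal lines
    (img1), a long vertical element for vertical lines (img2), then OR
    them into the table mask (img3), as in claims 12-13."""
    h_len = max(3, binary.shape[1] // scale)
    v_len = max(3, binary.shape[0] // scale)
    img1 = dilate_1d(erode_1d(binary, h_len, axis=1), h_len, axis=1)
    img2 = dilate_1d(erode_1d(binary, v_len, axis=0), v_len, axis=0)
    return img1 | img2
```

Because the structuring elements are long thin lines, character strokes and noise blobs shorter than the element are erased by the erosion, while genuine ruling lines survive and are reconnected by the dilation.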
14. The apparatus of claim 12, wherein the processor is further configured to:
horizontally project the image img3 to obtain a pixel histogram, find the maximum value maxh in the pixel histogram, take each position where a peak exceeds 0.5 × maxh as a horizontal-line position, and, after all horizontal lines are located, cut out each row of the table between every two adjacent lines; and
vertically project each cut-out table row to obtain a pixel histogram, find the maximum value maxh in the pixel histogram, take each position where a peak exceeds 0.5 × maxh as a vertical-line position, and, after all vertical lines are located, cut each column of the table between every two adjacent lines to obtain the individual table cells, recording the row number and column number of each.
15. The apparatus of claim 14, wherein the processor is further configured to:
perform character recognition on each cut-out table cell separately: a recurrent neural network (LSTM) extracts features from the characters in each cell as a whole, and the extracted features are then fed to the LSTM for prediction and recognition.
16. A non-transitory computer-readable storage medium storing instructions which, when executed by a processor, cause the processor to perform the steps of the method for extracting and reconstructing a table in a document image according to any one of claims 1 to 7.
17. An electronic device comprising the non-transitory computer-readable storage medium of claim 16 and a processor having access to the non-transitory computer-readable storage medium.
CN201811220913.9A 2018-10-19 2018-10-19 Form extraction and reconstruction method and equipment in receipt image Active CN111079756B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811220913.9A CN111079756B (en) 2018-10-19 2018-10-19 Form extraction and reconstruction method and equipment in receipt image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811220913.9A CN111079756B (en) 2018-10-19 2018-10-19 Form extraction and reconstruction method and equipment in receipt image

Publications (2)

Publication Number Publication Date
CN111079756A true CN111079756A (en) 2020-04-28
CN111079756B CN111079756B (en) 2023-09-19

Family

ID=70309209

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811220913.9A Active CN111079756B (en) 2018-10-19 2018-10-19 Form extraction and reconstruction method and equipment in receipt image

Country Status (1)

Country Link
CN (1) CN111079756B (en)


Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH06150056A (en) * 1992-11-09 1994-05-31 Matsushita Electric Ind Co Ltd Table recognition device
US20030123727A1 (en) * 1998-09-11 2003-07-03 Tomotoshi Kanatsu Table recognition method and apparatus, and storage medium
JP2012141670A (en) * 2010-12-28 2012-07-26 Fujitsu Frontech Ltd Apparatus, method and program for recognizing form
CN103093227A (en) * 2013-01-14 2013-05-08 西南大学 Method and device for extracting features of forms
CN103258198A (en) * 2013-04-26 2013-08-21 四川大学 Extraction method for characters in form document image
CN105424598A (en) * 2014-11-29 2016-03-23 巫立斌 Motor vehicle exhaust detecting method based on image recognition
CN105426856A (en) * 2015-11-25 2016-03-23 成都数联铭品科技有限公司 Image table character identification method
CN105512611A (en) * 2015-11-25 2016-04-20 成都数联铭品科技有限公司 Detection and identification method for form image
CN106156761A (en) * 2016-08-10 2016-11-23 北京交通大学 The image form detection of facing moving terminal shooting and recognition methods
CN106446881A (en) * 2016-07-29 2017-02-22 北京交通大学 Method for extracting lab test result from medical lab sheet image
CN106503711A (en) * 2016-11-16 2017-03-15 广西大学 A kind of character recognition method
CN107563293A (en) * 2017-08-03 2018-01-09 广州智慧城市发展研究院 A kind of new finger vena preprocess method and system
CN107622233A (en) * 2017-09-11 2018-01-23 畅捷通信息技术股份有限公司 A kind of Table recognition method, identifying system and computer installation
CN107862303A (en) * 2017-11-30 2018-03-30 平安科技(深圳)有限公司 Information identifying method, electronic installation and the readable storage medium storing program for executing of form class diagram picture
CN108446264A (en) * 2018-03-26 2018-08-24 阿博茨德(北京)科技有限公司 Table vector analysis method and device in PDF document
CN108470164A (en) * 2018-03-20 2018-08-31 上海眼控科技股份有限公司 A kind of digital recognition system and method for financial statement


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Zhou Yingzi; Wang Zhengyong; Qing Linbo; He Xiaohai: "Complex scene behavior recognition algorithm based on a local block model" *
Wang Xu; Ping Xijian; Zhou Lin; Wang Huipeng: "Table image recognition based on projection features and structural features" *
Shao Zhugui; Yang Huiying: "Research on preprocessing technology for automatic table recognition" *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021237909A1 (en) * 2020-05-29 2021-12-02 深圳壹账通智能科技有限公司 Table restoration method and apparatus, device, and storage medium
CN114005126A (en) * 2021-11-26 2022-02-01 成都数联云算科技有限公司 Table reconstruction method, apparatus, computer equipment and readable storage medium
CN114005126B (en) * 2021-11-26 2025-05-27 成都数联云算科技有限公司 Table reconstruction method, device, computer equipment and readable storage medium
CN115578740A (en) * 2022-09-07 2023-01-06 招银云创信息技术有限公司 Data processing method, computer equipment and computer storage medium
CN116524525A (en) * 2023-04-12 2023-08-01 中国工商银行股份有限公司 Form processing method, device, system and storage medium

Also Published As

Publication number Publication date
CN111079756B (en) 2023-09-19

Similar Documents

Publication Publication Date Title
US9275030B1 (en) Horizontal and vertical line detection and removal for document images
CN108171104B (en) Character detection method and device
US9495343B2 (en) Horizontal and vertical line detection and removal for document images
CN111079756B (en) Form extraction and reconstruction method and equipment in receipt image
CN114463767B (en) Letter of credit identification method, device, computer equipment and storage medium
CN106503711A (en) A kind of character recognition method
CN110473222A (en) Image element extraction method and device
CN112580594A (en) Document identification method and device, computer equipment and storage medium
CN108345816A (en) A kind of Quick Response Code extracting method and system in the case where uneven illumination is even
CN109741273A (en) A kind of mobile phone photograph low-quality images automatically process and methods of marking
Lu et al. A shadow removal method for tesseract text recognition
CN111476800A (en) Character region detection method and device based on morphological operation
CN110910841A (en) System and method for reducing ghost image of electrophoretic electronic paper
CN110991440B (en) Pixel-driven mobile phone operation interface text detection method
CN114092941A (en) A serial number identification method, device, electronic device and readable storage medium
CN114648751B (en) A method, apparatus, terminal, and storage medium for processing video subtitles.
CN111797838A (en) Blind denoising system, method and device for picture documents
CN114267035A (en) Document image processing method and system, electronic device and readable medium
CN103955925A (en) Improved probability Hough transform curve detection method based on partitioning minimum samples fixing
US12062246B2 (en) Extracting text from an image
CN117541546A (en) Method and device for determining image cropping effect, storage medium and electronic equipment
CN110599511A (en) Image filtering method and device for reserving image edge and storage medium
CN117475496A (en) Face-in-frame detection method, device, electronic equipment and storage medium in video
Shi et al. Image enhancement for degraded binary document images
CN117152779A (en) Digital information extraction method, system, equipment and media from paper medical records based on image edge detection and Res-CRNN

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant