
CN111814778B - Text line region positioning method, layout analysis method and character recognition method - Google Patents


Info

Publication number
CN111814778B
Authority
CN
China
Prior art keywords
area
image
text line
line
region
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010640573.6A
Other languages
Chinese (zh)
Other versions
CN111814778A (en)
Inventor
张岩
刘丽辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sinosecu Technology Co ltd
Original Assignee
Beijing Sinosecu Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sinosecu Technology Co ltd
Priority to CN202010640573.6A
Publication of CN111814778A
Application granted
Publication of CN111814778B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformations in the plane of the image
    • G06T 3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T 3/4038 Image mosaicing, e.g. composing plane images from plane sub-images
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/56 Extraction of image or video features relating to colour
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Character Input (AREA)

Abstract

The invention belongs to the technical field of optical character recognition (OCR) within digital image recognition, and relates in particular to a text line region positioning method, a layout analysis method based on text line region positioning, a character recognition method based on layout analysis, a device, and a storage medium. The text line region positioning method comprises the following steps: acquiring a grayscale image of an image to be recognized; obtaining a positioning image from the grayscale image; and identifying the positive-color text line regions and/or negative-color text line regions in the positioning image. Both the layout analysis method based on text line region positioning and the character recognition method based on layout analysis apply this positioning method to obtain the positive-color and/or negative-color text line regions. According to the invention, a result image containing only positive-color text line regions or only negative-color text line regions is obtained by splicing according to the text line regions, so that all character information is acquired in a single recognition pass, improving character recognition efficiency.

Description

Text line region positioning method, layout analysis method and character recognition method
Technical Field
The invention belongs to the technical field of optical character recognition in digital image recognition, and particularly relates to a text line region positioning method, a layout analysis method based on text line region positioning, a character recognition method based on layout analysis, a device and a storage medium.
Background
OCR (Optical Character Recognition) refers to the process in which an electronic device (such as a scanner or digital camera) examines characters printed on paper, determines their shapes by detecting patterns of dark and light, and translates those shapes into computer text through a character recognition method; that is, it is the technology of optically converting the characters of a paper document into a black-and-white dot-matrix image file, and then converting the characters in the image into a text format through recognition software so that they can be further edited and processed by word processing software.
In the field of digital image recognition, a region with a light background and dark characters is generally defined as a positive-color region, while a region with a dark background and light characters is defined as a negative-color (inverse) region. Existing OCR technology generally recognizes images with light backgrounds and dark characters. Therefore, with conventional OCR, an image containing only positive-color regions can be recognized directly, while an image containing only negative-color regions can be recognized after inverse-color processing. For an image with a complex layout containing both positive-color and negative-color regions, the processing is cumbersome: the image is first recognized directly to identify the positive-color regions, then inverted and recognized again to identify the negative-color regions, and finally the two recognition results are merged to obtain the final result.
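For an 8-bit grayscale image, the inverse-color processing mentioned above is a single per-pixel subtraction. A minimal NumPy sketch, using an illustrative toy patch:

```python
import numpy as np

def invert_gray(gray: np.ndarray) -> np.ndarray:
    """Inverse-color processing: a dark-background/light-character (negative)
    region becomes a light-background/dark-character (positive) region."""
    return (255 - gray).astype(np.uint8)

# Toy "negative-color" patch: dark background (20), light strokes (230).
negative_patch = np.array([[20, 230, 20],
                           [20, 230, 20]], dtype=np.uint8)
positive_patch = invert_gray(negative_patch)  # background 235, strokes 25
```

After inversion the strokes are darker than the background, which is what standard OCR expects.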
Disclosure of Invention
When existing OCR technology performs character recognition on an image to be recognized that has a complex layout, two recognition passes are required to obtain all the character information, which leads to long recognition times and low recognition efficiency.
To solve this technical problem, the present application aims to provide a text line region positioning method, a layout analysis method based on text line region positioning, a character recognition method based on layout analysis, a device, and a storage medium, so that all character information in a complex-layout image to be recognized can be obtained with only one OCR pass, greatly improving the efficiency of character recognition for such images.
In one aspect of the present invention, there is provided a text line area locating method, the method comprising the steps of:
Acquiring a gray level image of an image to be identified;
Obtaining a positioning image according to the gray level image;
And identifying a positive text line area and/or a negative text line area in the positioning image.
In another aspect of the present invention, a layout analysis method based on text line region positioning is provided, the method comprising: applying the text line region positioning method described above to obtain the positive-color text line regions and negative-color text line regions in the positioning image; and determining the positive-color regions and negative-color regions in the grayscale image of the image to be recognized.
In yet another aspect of the present invention, a method for character recognition based on layout analysis is provided, comprising the steps of:
applying the text line region positioning method described above to obtain the positive-color text line regions and/or negative-color text line regions in the positioning image, and obtaining a positive-color image and a negative-color image from the grayscale image used in that method;
Acquiring, according to the positive-color and/or negative-color text line regions in the positioning image, at least one of: the positive-color text line regions of the positive-color image, the negative-color text line regions of the positive-color image, the positive-color text line regions of the negative-color image, and the negative-color text line regions of the negative-color image;
Obtaining a result image by splicing the positive-color image and the negative-color image according to at least one of those four kinds of regions, such that the text line regions in the result image are either all positive-color or all negative-color text line regions;
And carrying out character recognition on the result image to obtain a recognition result.
In still another aspect of the present invention, there is provided a character recognition method based on layout analysis, the method comprising the steps of:
Acquiring a gray level image of an image to be identified;
Obtaining a hollow binary image, an inverse hollow binary image and an inverse gray image according to the gray image;
Identifying a sixth text line region in the hollow binary image and a seventh text line region in the inverse hollow binary image;
Splicing the grayscale image and the inverse grayscale image according to the sixth text line region and the seventh text line region to obtain a third result image, wherein the text line regions in the third result image are all positive-color text line regions;
Carrying out character recognition on the third result image to obtain a recognition result.
The invention also provides a character recognition device comprising a processor and a memory, the memory storing computer instructions and the processor being configured to execute them; when the computer instructions are executed by the processor, the device implements the steps of the method described above.
The invention also provides a computer storage medium having stored thereon a computer program which, when executed by a processor, implements the method steps as described above.
Compared with existing OCR technology, which must perform two recognition passes to obtain all the character information of a complex-layout image to be recognized, the text line region positioning method, the layout analysis method based on text line region positioning, the character recognition method based on layout analysis, the device, and the storage medium of the present invention splice the positive-color image and the negative-color image, according to the positive-color and/or negative-color text line regions in the positioning image, into a result image containing only positive-color text line regions or only negative-color text line regions. All character information of the image to be recognized can therefore be obtained by recognizing the result image once, which greatly shortens recognition time and improves the efficiency of character recognition for complex-layout images.
Drawings
The accompanying drawings are included to provide a further understanding of the application, and are incorporated in and constitute a part of this specification.
Fig. 1 shows an exemplary flow chart of the text line area locating method of the present invention.
FIG. 2 illustrates an exemplary flow chart of one embodiment of a text line area locating method of the present invention.
Fig. 3 shows an exemplary flow chart of another embodiment of the text line area locating method of the present invention.
Fig. 4 shows an exemplary flow chart of the layout analysis-based character recognition method of the present invention.
Fig. 5 shows an exemplary flow chart of a preferred embodiment of the layout analysis-based character recognition method of the present invention.
Fig. 6 shows an exemplary diagram of the result image in one embodiment of the layout analysis-based character recognition method of the present invention.
Fig. 7 (a) shows an example diagram of a grayscale image.
Fig. 7 (b) shows an example diagram of an inverse gray scale image.
Fig. 7 (c) shows an example diagram of a binary image.
Fig. 7 (d) shows an example diagram of the inverse binary image.
Fig. 7 (e) shows an example diagram of a hollow binary image.
Fig. 7 (f) shows an example diagram of the inverse hollow binary image.
Detailed Description
The present application will be described in further detail below with reference to embodiments and the accompanying drawings, in order to make its objects, technical solutions and advantages more apparent. The exemplary embodiments of the present application and their descriptions are used to explain the present application, not to limit it. It should be noted that, to avoid obscuring the present application with unnecessary detail, only structures and/or processing steps closely related to the solution of the present application are shown in the drawings, while details of little relevance are omitted. It should be emphasized that the term "comprises/comprising", when used herein, specifies the presence of stated features, elements, steps or components, but does not preclude the presence or addition of one or more other features, elements, steps or components. It should also be noted that the embodiments of the present application, and the features within them, may be combined with each other where no conflict arises.
Fig. 1 is an exemplary flow chart of the text line region positioning method of the present application. The basic idea of the method is to introduce the concepts of a positive-color text line region and a negative-color text line region, and to determine, according to their characteristics, which of the text line regions of the positioning image are positive-color and/or negative-color. A "positive-color text line region" is a text line region lying in a positive-color region; a "negative-color text line region" is a text line region lying in a negative-color region. For an image to be recognized whose layout contains both positive-color and negative-color regions, the present application locates the positive-color and negative-color text line regions simultaneously in a single pass. Whereas the prior art, when positioning a text line region, only acquires its coordinate position within the whole image, the present application additionally judges whether the text line region is positive-color or negative-color and adds this information to the positioning information, so that the region can be processed more accurately later. For example, in OCR recognition of a text line region, an appropriate recognition means can be chosen depending on whether the region is positive-color or negative-color. It will be clear to a person skilled in the art that the text line region positioning method of the present application is equally applicable to images to be recognized that contain only positive-color or only negative-color text line regions; the result of the recognition is then all positive-color or all negative-color text line regions.
The text line region positioning method provided by the application comprises the following steps:
Step S101: acquiring a gray level image of an image to be identified;
step S102: obtaining a positioning image according to the gray level image;
Step S103: and identifying a positive text line area and/or a negative text line area in the positioning image.
The image to be recognized in step S101 may be an electronic image of a flat printed matter such as a paper photograph, a business card, a bank card, an identity card, a book, a leaflet, or a passport, or may be an electronic image obtained by photographing a three-dimensional object. The grayscale image in step S101 may be obtained by graying a color image to be recognized, or may be a grayscale image to be recognized that is input directly. The ways of obtaining the grayscale image described here are merely exemplary and not limiting; a person skilled in the art may choose other ways according to actual needs, which will not be repeated here.
In step S103, for an image to be recognized that includes only a positive color region, only a positive color text line region of the positioning image may be recognized; for an image to be identified that contains only a reverse-colored region, only a reverse-colored text line region of the localization image may be identified; for an image to be identified with a complex layout situation with both a positive color region and a negative color region, it is necessary to identify both a positive text line region in the positioning image and a negative text line region in the positioning image.
In the field of digital image recognition, a region with a light background and dark characters is generally defined as a positive-color region, while a region with a dark background and light characters is defined as a negative-color region. In a positive-color region, the character color of a text line region is darker than its background, so for a positive-color text line region the in-line region (the character strokes) appears dark and the out-of-line region (the surrounding background) appears light; the gray value of the in-line region is therefore smaller than that of the out-of-line region. Conversely, in a negative-color region the character color is lighter than the background, so the in-line region appears light and the out-of-line region appears dark. In view of this, whether a text line region is positive-color or negative-color can be determined simply by comparing the gray values of its in-line and out-of-line regions. This gray-value comparison is simple, yet quickly and accurately establishes whether a text line region is positive-color or negative-color.
Based on this, in a specific embodiment of the text line region positioning method provided by the present application, as shown in fig. 2, the text line region in the positioning image is a first text line region, the in-line region of the first text line region is a first in-line region, and the out-of-line region of the first in-line region is a first out-of-line region. The method specifically comprises the following steps:
Step S111: acquiring a gray level image of an image to be identified;
Step S112: obtaining a positioning image according to the gray level image;
Step S1131: identifying a first text line area of the positioning image;
Step S1132: selecting a first line inner area and a first line outer area from the positioning image;
Step S1133: determining that the first text line region is a positive-color text line region when the gray value of the first in-line region is smaller than the gray value of the first out-of-line region; and/or,
determining that the first text line region is a negative-color text line region when the gray value of the first in-line region is larger than the gray value of the first out-of-line region.
The first text line region in step S1131 may be obtained by a prior-art connected-domain method. In the above embodiment, step S1131 acquires the coordinate position of the first text line region within the whole image, and steps S1132 and S1133 determine whether the first text line region is positive-color or negative-color by comparing the gray value of the first in-line region with that of the first out-of-line region. The "positive-color"/"negative-color" information is thereby added to the positioning information of the first text line region, so that subsequent information processing of the text line region, in particular OCR recognition, can be performed more accurately.
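Steps S1131 to S1133 can be sketched as follows. This is illustrative NumPy code: the box format, the outer-band margin, and the use of mean gray values are assumptions, not details taken from the patent.

```python
import numpy as np

def classify_text_line(gray: np.ndarray, box: tuple, margin: int = 2) -> str:
    """Compare the mean gray value inside a text line box (the in-line
    region) with the mean of a thin band around it (the out-of-line
    region): darker inside means positive-color, lighter means
    negative-color."""
    x0, y0, x1, y1 = box                        # (left, top, right, bottom)
    inner = gray[y0:y1, x0:x1].astype(np.float64)
    oy0, oy1 = max(y0 - margin, 0), min(y1 + margin, gray.shape[0])
    ox0, ox1 = max(x0 - margin, 0), min(x1 + margin, gray.shape[1])
    outer = gray[oy0:oy1, ox0:ox1].astype(np.float64)
    band_area = outer.size - inner.size
    if band_area == 0:                          # box fills the image: no band
        return "unknown"
    band_mean = (outer.sum() - inner.sum()) / band_area
    return "positive" if inner.mean() < band_mean else "negative"
```

On a light page (gray 200) with dark strokes (gray 50) inside the box this returns "positive"; on the inverted patch it returns "negative".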
In particular, when both the positive-color text line regions and the negative-color text line regions of the positioning image need to be identified, the two determinations in step S1133 (the case where the gray value of the first in-line region is smaller than that of the first out-of-line region, and the case where it is larger) may be performed sequentially or in parallel.
In this technical solution, the positioning image may be any one of a positioning grayscale image (see fig. 7 (a)), a positioning inverse grayscale image (see fig. 7 (b)), a positioning binary image (see fig. 7 (c)), and a positioning inverse binary image (see fig. 7 (d)). Applying the gray-value comparison above to any of these images reveals whether the first text line region is a positive-color or a negative-color text line region.
Because the gray level image and the inverse gray level image can keep the information of the outline, the color shade and the like of each character in the image to be recognized to a large extent, when the positioning image is the positioning gray level image or the positioning inverse gray level image, the positive text line area and/or the inverse text line area in the positioning gray level image or the positioning inverse gray level image can be recognized according to more image detail information and character information. The positioning gray-scale image may be the gray-scale image acquired in step S111; the positioning inverse gray level image is an image obtained by performing at least inverse color processing on the gray level image.
Because the gray value of the pixel point in the binary image is only 0 or 255, the gray value difference between the character and the corresponding background is larger, and therefore, when the positioning image is a positioning binary image or a positioning inverse binary image, the positive text line area and the inverse text line area can be positioned more accurately. The positioning binary image is an image obtained by at least binarizing the gray level image, and the positioning inverse binary image is an image obtained by at least performing inverse color processing and binarizing on the gray level image. In this embodiment, the inverse color processing and the binarization processing are both in the prior art, and are not repeated here.
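The binarization and inverse-color-plus-binarization steps just described are standard operations. A minimal NumPy sketch; the global-mean threshold is an illustrative assumption, and any standard method (e.g. Otsu's) could be substituted:

```python
import numpy as np

def binarize(gray, thresh=None):
    """Map every pixel to 0 or 255. The global mean is used as the default
    threshold purely for illustration."""
    if thresh is None:
        thresh = gray.mean()
    return np.where(gray > thresh, 255, 0).astype(np.uint8)

def inverse_binarize(gray, thresh=None):
    """Inverse-color processing followed by binarization, yielding an
    inverse binary image."""
    return binarize(255 - gray, thresh)
```

In the binary image every pixel is 0 or 255, so the gray difference between characters and background is maximal, which is what makes the positioning of positive-color and negative-color text line regions more reliable.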
To position text line regions more accurately still, the inventors propose the hollow binary image (see fig. 7 (e)) and the inverse hollow binary image (see fig. 7 (f)). In the hollow binary image and the inverse hollow binary image, a positive-color region is displayed with a white background and black solid characters (see the sample text line in fig. 7 (e) and the first six rows in fig. 7 (f)), while a negative-color region is displayed with a white background and black hollow characters (see the first six rows in fig. 7 (e) and the sample text line in fig. 7 (f)); a positive-color text line region thus appears as black solid characters on a white background, and a negative-color text line region as black hollow characters on a white background.
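A hollow binary image of the kind just described could, for instance, be produced by boundary extraction on an ordinary binary image. The section does not specify the operator, so the following rule (keep a black pixel only if it touches the background) is an assumption:

```python
import numpy as np

def hollow(binary: np.ndarray) -> np.ndarray:
    """Turn solid black characters (0) on a white background (255) into
    outline-only ("hollow") characters: a black pixel is kept only if at
    least one of its 4-neighbours is white."""
    black = binary == 0
    padded = np.pad(black, 1, constant_values=False)
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1] &   # up, down neighbours
                padded[1:-1, :-2] & padded[1:-1, 2:])    # left, right neighbours
    out = np.full(binary.shape, 255, dtype=np.uint8)
    out[black & ~interior] = 0                           # keep the outline only
    return out
```

Applied to a solid 3x3 black block on a 5x5 white field, the centre pixel becomes white while the surrounding ring of black pixels remains, i.e. the block becomes hollow.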
Accordingly, as shown in fig. 3, in another embodiment of the text line area positioning method provided by the present application, the positioning image includes a positioning hollow binary image and a positioning anti-hollow binary image. The method specifically comprises the following steps:
Step S121: acquiring a gray level image of an image to be identified;
step S122: obtaining a positioning hollow binary image and a positioning anti-hollow binary image according to the gray level image;
Step S123: and identifying a positive color text line area and/or a negative color text line area in the positioning hollow binary image.
Because the hollow binary image and the inverse hollow binary image preserve the outline information of every character to the maximum extent, text line regions can be identified in them more accurately; moreover, since every character outline has the same color, all text line regions in the image can be found by connected-domain screening with a threshold. Therefore, by using the positioning hollow binary image and the positioning inverse hollow binary image, text line regions can be identified accurately, simply and quickly, greatly improving both the accuracy and the efficiency of text line region positioning. This in turn improves the accuracy and efficiency of subsequent information processing of the text line regions, in particular OCR recognition of characters.
In a possible implementation, the text line region in the positioning hollow binary image is a second text line region, its in-line region is a second in-line region, its out-of-line region is a second out-of-line region, and the absolute value of the difference between the gray value of the second in-line region and that of the second out-of-line region is a second gray difference;
the text line region in the positioning inverse hollow binary image is a third text line region, its in-line region is a third in-line region, its out-of-line region is a third out-of-line region, and the absolute value of the difference between the gray value of the third in-line region and that of the third out-of-line region is a third gray difference;
step S123, identifying the positive-color and/or negative-color text line regions in the positioning hollow binary image and the positioning inverse hollow binary image, then specifically comprises:
step S1231: identifying a second text line region in the locating hollow binary image and a third text line region in the locating inverse hollow binary image;
step S12321: when the second text line area and at least one third text line area have overlapping areas, acquiring a second line inner area and a second line outer area of the second text line area, and a third line inner area and a third line outer area of a third text line area corresponding to the second text line area; obtaining a second gray level difference value of the second text line area and a third gray level difference value of a third text line area corresponding to the second text line area through calculation;
Or alternatively
When an overlapping area exists between the third text line area and at least one second text line area, a third line inner area and a third line outer area of the third text line area, and a second line inner area and a second line outer area of a second text line area corresponding to the third text line area are acquired; obtaining a third gray level difference value of the third text line area and a second gray level difference value of a second text line area corresponding to the third text line area through calculation;
Step S12322: when the second gray level difference value is larger than the third gray level difference value, determining that the second text line area is a positive text line area and/or the third text line area is a negative text line area;
and/or,
And when the second gray level difference value is smaller than the third gray level difference value, determining that the second text line area is an inverse color text line area and/or the third text line area is a positive color text line area.
In the art, a text line region refers to a block of the image that is typically stored in terms of the lower left and upper right coordinates of the block. In this embodiment, the overlapping area between two text line areas refers to the intersection between the position coordinates of the two text line areas. For example, "there is an overlap region of the second text line region and the third text line region" means that there is an intersection between the coordinates of the second text line region and the coordinates of the third text line region. The area of the second text line area and the area of the third text line area may be equal or unequal, but the second text line area and the third text line area with the overlapping area may both cover the text line area corresponding to the overlapping area in the gray level image. The second text line region in the locating hollow binary image and the third text line region in the locating inverse hollow binary image can be obtained by a method for obtaining a connected domain in the prior art.
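The coordinate-intersection test described in this paragraph can be sketched as a one-line rectangle check. Representing the stored corner pair as (left, top, right, bottom) in image coordinates is an assumption:

```python
def boxes_overlap(a, b):
    """True iff two text line boxes share area. Boxes are given as
    (left, top, right, bottom); touching edges do not count as overlap."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    # The intersection is non-empty exactly when each box starts before
    # the other one ends, on both axes.
    return ax0 < bx1 and bx0 < ax1 and ay0 < by1 and by0 < ay1
```

For example, (0, 0, 4, 4) and (2, 2, 6, 6) overlap, while (0, 0, 2, 2) and (3, 3, 5, 5) do not.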
As stated above, the positive-color text line regions in both the positioning hollow binary image and the positioning inverse hollow binary image are displayed as black solid characters on a white background, and the negative-color text line regions as black hollow characters on a white background. That is, the in-line region of a positive-color text line region consists of black solid characters on white and its out-of-line region is white, so the gray difference of a positive-color text line region (the absolute value of the difference between the gray value of the in-line region and that of the out-of-line region) is the absolute difference, over the same area, between the gray value of a region of black solid characters on white and that of a plain white region. Similarly, the in-line region of a negative-color text line region consists of black hollow characters on white and its out-of-line region is white, so its gray difference is the absolute difference, over the same area, between the gray value of a region of black hollow characters on white and that of a plain white region. Since solid characters contain far more black pixels than hollow, outline-only characters, the former difference is the larger of the two.
Therefore, when there is an overlapping area between a text line area of the positioning hollow binary image and the text line area at the corresponding position of the positioning inverse hollow binary image, the gray level difference of the positive text line area is larger than that of the reverse text line area; that is, the text line area corresponding to the larger of the second gray level difference and the third gray level difference is the positive text line area, and the one corresponding to the smaller value is the reverse text line area. Thus, in the present embodiment, whether the second text line area and the third text line area are the positive text line area or the reverse text line area of the positioning image is confirmed by comparing the second gray level difference of the second text line area with the third gray level difference of the third text line area. The invention can accurately and rapidly locate the positive text line area and/or the reverse text line area of the positioning image by this simple and reliable method.
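The comparison described above can be sketched numerically; the mean-based gray values over NumPy arrays and the function names below are assumptions for illustration only:

```python
import numpy as np

def gray_difference(in_line_region, out_line_region):
    """Absolute difference between the mean gray value of an in-line
    region and that of its out-of-line region (2-D uint8 arrays)."""
    return abs(float(np.mean(in_line_region)) - float(np.mean(out_line_region)))

def classify_pair(second_diff, third_diff):
    # The larger gray difference marks the positive (solid-character)
    # text line region; the smaller marks the reverse-color one.
    return ("positive", "reverse") if second_diff > third_diff else ("reverse", "positive")

# Solid black characters on white give a large difference ...
solid = np.array([[0, 255], [0, 255]], dtype=np.uint8)     # half black
white = np.full((2, 2), 255, dtype=np.uint8)
# ... while hollow characters leave most pixels white.
hollow = np.array([[255, 0], [255, 255]], dtype=np.uint8)  # quarter black

second_diff = gray_difference(solid, white)   # 127.5
third_diff = gray_difference(hollow, white)   # 63.75
print(classify_pair(second_diff, third_diff)) # ('positive', 'reverse')
```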
In this embodiment, there may also be a second text line area for which no corresponding third text line area can be found, that is, the second text line area has no overlapping area with any third text line area, which means that the content of that text line area in the gray level image can only be located in the positioning hollow binary image. Since such an area cannot be classified as positive or reverse simply by the gray difference comparison above, step S123 of identifying the positive text line area and/or the reverse text line area in the positioning hollow binary image and the positioning inverse hollow binary image further includes the following steps:
Step S12331: when the second text line area has no overlapping area with any third text line area, acquiring a fourth text line area corresponding to the second text line area in the gray level image;

Step S12332: acquiring a fourth in-line area and a fourth out-of-line area; the in-line area of the fourth text line area is the fourth in-line area, and the out-of-line area of the fourth in-line area is the fourth out-of-line area;

Step S12333: determining that the second text line area is a positive text line area when the gray value of the fourth in-line area is smaller than the gray value of the fourth out-of-line area; and/or,

determining that the second text line area is a reverse text line area when the gray value of the fourth in-line area is larger than the gray value of the fourth out-of-line area.
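Steps S12331 to S12333 amount to comparing mean gray values inside and outside the line; a minimal sketch under the assumption that the regions are available as NumPy arrays:

```python
import numpy as np

def classify_unmatched_region(fourth_in_line, fourth_out_line):
    """Sketch of steps S12331-S12333: when a second text line area has no
    overlapping third area, compare gray values in the gray level image
    (mean-based comparison is an assumption for illustration)."""
    in_mean = float(np.mean(fourth_in_line))
    out_mean = float(np.mean(fourth_out_line))
    if in_mean < out_mean:
        return "positive"     # dark characters on a lighter background
    if in_mean > out_mean:
        return "reverse"      # light characters on a darker background
    return "undetermined"     # equal means: not covered by these steps

dark_chars = np.array([[20, 30], [25, 35]], dtype=np.uint8)
light_bg = np.full((2, 2), 240, dtype=np.uint8)
print(classify_unmatched_region(dark_chars, light_bg))  # 'positive'
print(classify_unmatched_region(light_bg, dark_chars))  # 'reverse'
```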
In this embodiment, a gray value comparison method that is simple to calculate and highly accurate is still adopted. Since the second text line area in the positioning hollow binary image and the fourth text line area in the gray level image agree in being "positive color" or "reverse color", this embodiment determines whether the second text line area is positive or reverse by judging whether the fourth text line area is positive or reverse. When the gray value of the fourth in-line area is smaller than that of the fourth out-of-line area, the fourth text line area is positive, and the corresponding second text line area is a positive text line area of the positioning hollow binary image; when the gray value of the fourth in-line area is larger than that of the fourth out-of-line area, the fourth text line area is reverse, and the corresponding second text line area is a reverse text line area of the positioning hollow binary image. The application thus determines whether the second text line area is a positive or a reverse text line area by a simple judgment, improving the accuracy of the determination result.
Similarly, a third text line area may have no overlapping area with any second text line area, so step S123 may further include the following steps:
Step S12341: when the third text line area has no overlapping area with any second text line area, acquiring a fifth text line area corresponding to the third text line area in the gray level image;

Step S12342: acquiring a fifth in-line area and a fifth out-of-line area; the in-line area of the fifth text line area is the fifth in-line area, and the out-of-line area of the fifth in-line area is the fifth out-of-line area;

Step S12343: determining that the third text line area is a positive text line area when the gray value of the fifth in-line area is smaller than the gray value of the fifth out-of-line area; and/or determining that the third text line area is a reverse text line area when the gray value of the fifth in-line area is larger than the gray value of the fifth out-of-line area.
In this embodiment, the principle of determining the positive and negative colors of the third text line area by the positive and negative colors of the fifth text line area is the same as that of determining the positive and negative colors of the second text line area by the positive and negative colors of the fourth text line area, and will not be described in detail herein.
The following briefly describes how the positioning hollow binary image and the positioning inverse hollow binary image are obtained, and how the in-line area and the out-of-line area of each text line area are selected.
In another embodiment of the above text line area positioning method provided by the present application, in step S122, the positioning hollow binary image and the positioning inverse hollow binary image are obtained from the gray level image by performing at least background-removing binarization processing on the gray level image. Specifically, at least binarization processing and edge detection processing are performed on the gray level image to obtain the positioning hollow binary image; and at least inverse color processing, binarization processing and edge detection processing are performed on the gray level image to obtain the positioning inverse hollow binary image.
In a specific embodiment, step S122, performing at least background removal binarization processing on the gray-scale image to obtain the positioning hollow binary image and the positioning inverse hollow binary image, specifically includes the following steps:
Step S1221, performing at least binarization processing on the gray level image, acquiring the connected domains in the processed image, setting the gray value of each pixel point on the edge of each connected domain whose gray value is 255 to 0, and setting the gray value of each pixel point of the background corresponding to the connected domains to 255, thereby obtaining the positioning hollow binary image;

and

Step S1222, performing at least inverse color processing and binarization processing on the gray level image, acquiring the connected domains in the processed image, setting the gray value of each pixel point on the edge of each connected domain whose gray value is 255 to 0, and setting the gray value of each pixel point of the background corresponding to the connected domains to 255, thereby obtaining the positioning inverse hollow binary image.
In this embodiment, the binarization processing, the edge detection processing and the inverse color processing are all conventional techniques. For example, the gray level image may be subjected to global gray threshold binarization processing and Canny edge detection processing to obtain the positioning hollow binary image, and subjected to inverse color processing, global gray threshold binarization processing and Canny edge detection processing to obtain the positioning inverse hollow binary image. Alternatively, the positioning hollow binary image and the positioning inverse hollow binary image can be obtained simultaneously by performing local threshold binarization processing on the gray level image.
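Step S1221 can be approximated without a full Canny pass: binarize, then keep, for each connected domain, only the pixels that touch the background. A minimal NumPy sketch (the global threshold of 128 and the 4-neighbour edge test are assumptions, not the patent's exact procedure):

```python
import numpy as np

def hollow_binary(gray, threshold=128):
    """Sketch of step S1221: characters darker than the threshold form the
    connected domains; their edge pixels become 0 (black) and everything
    else, background and domain interiors alike, becomes 255 (white)."""
    fg = gray < threshold
    padded = np.pad(fg, 1, constant_values=False)
    # An interior pixel has all four 4-neighbours inside the domain.
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1] &
                padded[1:-1, :-2] & padded[1:-1, 2:])
    edge = fg & ~interior
    out = np.full(gray.shape, 255, dtype=np.uint8)
    out[edge] = 0
    return out

gray = np.full((5, 5), 255, dtype=np.uint8)
gray[1:4, 1:4] = 0                 # a 3x3 solid character stroke
hollow = hollow_binary(gray)
print(hollow[1, 1], hollow[2, 2])  # 0 255 -> black outline, hollow centre
# The positioning inverse hollow binary image is obtained the same way
# after inverse color processing of the gray image:
inv_hollow = hollow_binary(255 - gray)
```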
In addition, in the above embodiments of the present application, each text line area includes a first text line area, a second text line area, a third text line area, a fourth text line area, and a fifth text line area. The intra-line area of each text line area may be at least one unit character area in the text line area. The unit character area is an area corresponding to any one character in the text line area. In order to facilitate the acquisition of the out-of-line area of each text line area by the following method, preferably, the in-line area is the first unit character area or the last unit character area of the text line area.
Meanwhile, an out-of-line area may be acquired for each in-line area: one of the left, right, upper and lower areas adjacent to the in-line area is selected as the out-of-line area in the positioning image or the gray level image. Naturally, the out-of-line areas of the in-line areas of the first, second and third text line areas are selected in the positioning image corresponding to each text line area, while the out-of-line areas of the in-line areas of the fourth and fifth text line areas are selected in the gray level image.
In order to make the number of pixel points of the out-of-line area equal to that of the in-line area, an out-of-line area with the same area as the in-line area is selected. Acquiring the out-of-line area specifically includes:
When the area of the left area is the same as that of the in-line area and the area of the right area is different from that of the in-line area, selecting the left area as the out-of-line area;

when the area of the right area is the same as that of the in-line area and the area of the left area is different from that of the in-line area, selecting the right area as the out-of-line area;

when the areas of both the left area and the right area are the same as that of the in-line area, selecting the left area or the right area as the out-of-line area;

and when the areas of both the left area and the right area are different from that of the in-line area, selecting the upper area or the lower area as the out-of-line area.
The selecting of the upper area or the lower area as the out-of-line area specifically includes:

when the area of the upper area is the same as that of the in-line area and the area of the lower area is different from that of the in-line area, selecting the upper area as the out-of-line area;

when the area of the lower area is the same as that of the in-line area and the area of the upper area is different from that of the in-line area, selecting the lower area as the out-of-line area;

and when the areas of both the upper area and the lower area are the same as that of the in-line area, selecting the upper area or the lower area as the out-of-line area.
Preferably, the outer regions are selected to have the same shape as the inner regions, i.e. the outer regions have the same height and width as the inner regions.
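The selection rules above reduce to a fixed preference order over the four neighbouring areas; a compact sketch (representing each candidate purely by its area in pixels is an assumption for illustration):

```python
def select_out_of_line(in_area, left, right, upper, lower):
    """Pick the out-of-line area: a side area whose area equals the
    in-line area is preferred, with upper/lower as the fallback.  When
    both sides (or both upper and lower) qualify, either may be chosen;
    this sketch simply takes the first match."""
    for name, area in (("left", left), ("right", right),
                       ("upper", upper), ("lower", lower)):
        if area == in_area:
            return name
    return None  # no adjacent area matches; not covered by the rules above

print(select_out_of_line(100, 100, 80, 100, 100))  # 'left'
print(select_out_of_line(100, 60, 70, 100, 100))   # 'upper'
```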
The text line region positioning method provided by the application can accurately determine the coordinate information and the image information of the positive text line region and/or the negative text line region, and process the image to be identified by using the determined coordinate information and the determined image information, so that the processing result is more accurate. Even if the image to be identified with complex layout is processed, the positive color text line area and the negative color text line area can be rapidly and accurately identified.
Based on the text line region positioning method provided by the application, the application also provides a layout analysis method based on text line region positioning: the above text line region positioning method is applied to obtain the positive text line areas and the reverse text line areas in the positioning image, and the positive color areas and the reverse color areas in the gray level image of the image to be identified are determined accordingly. Because the background gray values of a positive color area and a reverse color area differ greatly, once the positive and reverse text line areas are located, whether the image to be identified contains only positive color areas, only reverse color areas, or is a complex layout containing both can be confirmed by judging whether the background gray value of the gray level image has an abrupt change around those areas. In this process, a threshold can be set for the abrupt change, and an abrupt change is confirmed whenever the gray difference between two adjacent pixels is greater than the threshold. When layout analysis has to be completed for a large number of images to be identified, the layout analysis method based on text line region positioning completes the analysis rapidly and accurately, effectively improving working efficiency while reducing the corresponding labor cost and the error rate introduced by human participation.
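The abrupt-change test described above can be sketched as a thresholded difference of adjacent pixels; the threshold value of 96 below is an arbitrary assumption:

```python
import numpy as np

def has_abrupt_change(row, threshold=96):
    """Layout-analysis sketch: report whether a run of background gray
    values contains an abrupt change, i.e. two adjacent pixels whose
    gray difference exceeds the chosen threshold."""
    diffs = np.abs(np.diff(row.astype(np.int16)))  # avoid uint8 wraparound
    return bool(np.any(diffs > threshold))

uniform_bg = np.array([250, 248, 251, 249], dtype=np.uint8)
mixed_bg = np.array([250, 249, 20, 22], dtype=np.uint8)  # light-to-dark jump
print(has_abrupt_change(uniform_bg))  # False: single-color layout
print(has_abrupt_change(mixed_bg))    # True: positive and reverse areas coexist
```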
Fig. 4 is an exemplary flow chart of a layout analysis-based character recognition method of the present application. The basic idea of the method is as follows: the above text line region positioning method is applied to obtain, from the gray level image of the image to be recognized, a positioning image together with its positive and/or reverse text line areas; the corresponding positive and/or reverse text line areas are then found in the positive color image and/or reverse color image obtained from the gray level image, and a result image is spliced in which all text line areas are positive text line areas, or all are reverse text line areas. Character recognition therefore needs to be performed on the result image only once to obtain all character information in the image to be recognized. Compared with the prior art, in which two recognition passes are needed to obtain all character information of an image to be recognized with a complex layout, the present application obtains all of it with a single OCR pass, which ensures recognition accuracy while greatly shortening the recognition time and effectively improving the efficiency of character recognition on images with complex layouts.
The character recognition method based on layout analysis comprises the following steps:
Step S201: applying the above text line region positioning method to obtain a positive text line area and/or a reverse text line area in a positioning image, and obtaining a positive color image and a reverse color image according to the gray level image in the above text line region positioning method;
step S202: acquiring at least one of a positive text line area of the positive image, a negative text line area of the positive image, a positive text line area of the negative image and a negative text line area of the negative image according to the positive text line area and/or the negative text line area in the positioning image;
Step S203: obtaining a result image by splicing the positive color image and the reverse color image according to at least one of the positive color text line area of the positive color image, the reverse color text line area of the positive color image, the positive color text line area of the reverse color image and the reverse color text line area of the reverse color image; the text line areas in the result image are positive text line areas or reverse text line areas;
step S204: and carrying out character recognition on the result image to obtain a recognition result.
In step S201, the order of acquiring the positive text line area and/or the reverse text line area in the positioning image and of obtaining the positive color image and the reverse color image is not limited; the two processes may be performed in parallel or sequentially in any order, as those skilled in the art may select as required.
In the application, the text line area locating method in any embodiment can be used for obtaining the positive text line area and the negative text line area in the locating image. Therefore, according to various embodiments of the above-described text line area locating method, the positive text line area of the locating image may be one of a positive text line area of the locating gray scale image, a positive text line area of the locating inverse gray scale image, a positive text line area of the locating binary image, a positive text line area of the locating inverse binary image, a positive text line area of the locating hollow binary image, and a positive text line area of the locating inverse hollow binary image; the reverse text line region of the positioning image may be one of a reverse text line region of the positioning gray image, a reverse text line region of the positioning reverse gray image, a reverse text line region of the positioning binary image, a reverse text line region of the positioning reverse binary image, a reverse text line region of the positioning hollow binary image, and a reverse text line region of the positioning reverse hollow binary image. 
For step S202, when the positioning image is one of the positioning gray image, the positioning binary image and the positioning hollow binary image, the positive text line area of the positive color image and the reverse text line area of the reverse color image are the areas corresponding to the positive text line area of the positioning image, and likewise the reverse text line area of the positive color image and the positive text line area of the reverse color image are the areas corresponding to the reverse text line area of the positioning image; when the positioning image is one of the positioning inverse gray image, the positioning inverse binary image and the positioning inverse hollow binary image, the positive text line area of the positive color image and the reverse text line area of the reverse color image are the areas corresponding to the reverse text line area of the positioning image, and likewise the reverse text line area of the positive color image and the positive text line area of the reverse color image are the areas corresponding to the positive text line area of the positioning image.
According to the result obtained in step S202, in a specific embodiment, step S203 for stitching to obtain a result image specifically includes:
According to the positive text line area of the positive color image and/or the positive text line area of the reverse color image: intercepting the positive text line areas of both images and splicing them together, or intercepting the positive text line area of the positive color image and splicing it to the corresponding position of the reverse color image, or intercepting the positive text line area of the reverse color image and splicing it to the corresponding position of the positive color image, so as to obtain a first result image in which all text line areas are positive text line areas;

or

according to the reverse text line area of the positive color image and/or the reverse text line area of the reverse color image: intercepting the reverse text line areas of both images and splicing them together, or intercepting the reverse text line area of the positive color image and splicing it to the corresponding position of the reverse color image, or intercepting the reverse text line area of the reverse color image and splicing it to the corresponding position of the positive color image, so as to obtain a second result image in which all text line areas are reverse text line areas.
It can be seen that, to obtain a first result image consisting entirely of positive text line areas, the positive text line areas of the positive color image and of the reverse color image may be intercepted and spliced together onto the corresponding positions of a first blank image with a white background; alternatively, the intercepted positive text line area of the positive color image may be spliced to the corresponding position of the reverse color image, or the intercepted positive text line area of the reverse color image may be spliced to the corresponding position of the positive color image. Similarly, to obtain a second result image consisting entirely of reverse text line areas, the reverse text line areas of the positive color image and of the reverse color image may be intercepted and spliced together onto the corresponding positions of a second blank image with a black background, or the intercepted reverse text line area of one image may be spliced to the corresponding position of the other. It will be appreciated by those skilled in the art that the first blank image and the second blank image are each the same size as the image to be identified.
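The cut-and-splice operation of step S203 is plain array slicing; a minimal sketch (region tuples as row/column slices, and the tiny stand-in images, are assumptions):

```python
import numpy as np

def stitch_first_result(positive_img, negative_img, positive_regions_in_negative):
    """Sketch of step S203: copy each positive text line region found in
    the reverse color image onto the positive color image, so that every
    text line region in the result is a positive text line region.
    Regions are assumed to be (row0, col0, row1, col1) slices."""
    result = positive_img.copy()  # leave the source image intact
    for r0, c0, r1, c1 in positive_regions_in_negative:
        result[r0:r1, c0:c1] = negative_img[r0:r1, c0:c1]
    return result

pos = np.full((4, 4), 255, dtype=np.uint8)  # stand-in gray image
neg = 255 - pos                             # its inverse gray image
out = stitch_first_result(pos, neg, [(0, 0, 2, 2)])
print(out[0, 0], out[3, 3])  # 0 255
```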
In the layout analysis-based character recognition method provided by the application, the positive color image is at least one of the gray image (fig. 7 (a)), the binary image (fig. 7 (c)) and the hollow binary image (fig. 7 (e)), and the reverse color image is at least one of the inverse gray image (fig. 7 (b)), the inverse binary image (fig. 7 (d)) and the inverse hollow binary image (fig. 7 (f)). The binary image, the hollow binary image, the inverse gray image, the inverse binary image and the inverse hollow binary image are all obtained from the gray image. During splicing, the positive color image and reverse color image to be used can be selected according to the actual use scene. For example, two or more positive color images may be spliced with the reverse color image, or two or more reverse color images may be selected and spliced with the positive color image. When one positive color image and one reverse color image are selected for interception and splicing: if more detail information of the image to be identified is to be preserved, the gray image can be selected as the positive color image and the inverse gray image as the reverse color image; if a larger difference between the characters and the background in the result image is desired, the binary image can be selected as the positive color image and the inverse binary image as the reverse color image, or the hollow binary image as the positive color image and the inverse hollow binary image as the reverse color image.
By converting the image to be recognized into a result image having only positive text line areas or only reverse text line areas before recognition, the layout analysis-based character recognition method provided by the application supports all types of images to be recognized: those with a dark background, those with a light background, and those combining both. Meanwhile, since all character information of the image to be recognized can be obtained by recognizing the result image only once, the application has the characteristics of an accurate recognition result and a high recognition rate. The method thus achieves rapid and accurate recognition of all characters in a complex layout, and effectively solves the problems of low accuracy and low efficiency of existing character recognition methods on complex layouts.
As shown in fig. 5, a preferred embodiment of a layout analysis-based character recognition method is shown. Its basic idea is that, according to each text line area in the image to be recognized, the gray image and the inverse gray image are spliced to obtain a third result image (see fig. 6) in which all text line areas are positive text line areas, and character recognition is then performed on the third result image. Compared with the layout analysis-based character recognition method above, this embodiment does not first mark each text line area of the image to be recognized as positive or reverse, but directly finds the text line areas that satisfy the positive-color condition, thereby saving the time of marking and re-searching. For an image to be identified with a complex layout containing both positive and reverse color areas, or with irregularly shaped reverse and/or positive color areas, this embodiment is free of any constraint on the number and shape of those areas; it obtains all character information of the image to be identified through a single character recognition pass, ensures the accuracy of the character recognition result, greatly shortens the image processing time before recognition, and improves character recognition efficiency.
The character recognition method based on layout analysis comprises the following steps:
Step S301: acquiring a gray level image of an image to be identified;
Step S302: obtaining a hollow binary image, an inverse hollow binary image and an inverse gray image according to the gray image;
Step S303: identifying a sixth text line region in the open binary image and a seventh text line region in the reverse open binary image;
Step S304: splicing the gray level image and the inverse gray level image according to the sixth text line area and the seventh text line area to obtain a third result image; the text line areas in the third result image are positive text line areas;
and step S305, carrying out character recognition on the third result image to obtain a recognition result.
In this embodiment, the hollow binary image, the inverse hollow binary image and the inverse gray image may be obtained from the gray image by the same methods used to obtain the positioning hollow binary image, the positioning inverse hollow binary image and the positioning inverse gray image in the above text line region positioning method.
In this embodiment, the in-line region of the sixth text line region is a sixth in-line region, the out-of-line region of the sixth in-line region is a sixth out-of-line region, and an absolute value of a difference between the gray value of the sixth in-line region and the gray value of the sixth out-of-line region is a sixth gray difference;
An in-line region of the seventh text line region is a seventh in-line region, an out-of-line region of the seventh in-line region is a seventh out-of-line region, and the absolute value of the difference between the gray value of the seventh in-line region and the gray value of the seventh out-of-line region is a seventh gray difference value;
In the step S304, the gray-scale image and the inverse gray-scale image are spliced according to the sixth text line area and the seventh text line area to obtain a third result image, which specifically includes:
When there is an overlap region of the sixth text line area with at least one of the seventh text line areas,
Step S30411: acquiring a sixth line inner area and a sixth line outer area of the sixth text line area, and a seventh line inner area and a seventh line outer area of a seventh text line area corresponding to the sixth text line area; obtaining a sixth gray level difference value of the sixth text line area and a seventh gray level difference value of a seventh text line area corresponding to the sixth text line area through calculation;
Step S30412: when the sixth gray level difference value is smaller than the seventh gray level difference value, intercepting the region corresponding to the seventh text line region in the inverse gray image and splicing it to the corresponding position of the gray image, so as to obtain the third result image;
when there is no overlap region between the sixth text line area and each of the seventh text line areas or between the seventh text line area and each of the sixth text line areas,
Step S30421, acquiring an eighth text line area corresponding to the sixth text line area or the seventh text line area in the grayscale image;
Step S30422, acquiring an eighth-row inner region and an eighth-row outer region; the line inner area of the eighth text line area is the eighth line inner area, and the line outer area of the eighth line inner area is the eighth line outer area;
and S30423, when the gray value of the region in the eighth line is larger than the gray value of the region outside the eighth line, intercepting the region corresponding to the region in the eighth text line in the inverse gray image, correspondingly splicing the region to the gray image, and obtaining the third result image.
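The branch structure of step S304 can be summarized in one decision function; a sketch (the keyword-argument interface and the numeric examples are illustrative assumptions):

```python
def needs_inverse_patch(sixth_diff=None, seventh_diff=None,
                        eighth_in=None, eighth_out=None):
    """Decision sketch for step S304: return True when the region of the
    inverse gray image should be spliced onto the gray image.  With an
    overlap, the two hollow-image gray differences are compared
    (S30411-S30412); without one, the in-line/out-of-line gray values of
    the eighth text line region decide (S30421-S30423)."""
    if sixth_diff is not None and seventh_diff is not None:
        return sixth_diff < seventh_diff          # overlap case
    return float(eighth_in) > float(eighth_out)   # no-overlap case

print(needs_inverse_patch(sixth_diff=40, seventh_diff=150))  # True: paste
print(needs_inverse_patch(eighth_in=30, eighth_out=220))     # False: already positive
```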
In this embodiment, the color of each character and its background in the hollow binary image is the same as in the positioning hollow binary image of the above text line region positioning method, and likewise the inverse hollow binary image shares the features of the positioning inverse hollow binary image. The larger gray difference value (the absolute value of the difference between the gray value of the in-line region and that of the out-of-line region) in the hollow binary image and the inverse hollow binary image is the absolute value of the difference, under the same area, between the gray value of the black solid character region and that of the white background region; the smaller gray difference value is the absolute value of the difference, under the same area, between the gray value of the black hollow character region and that of the white background region.
Therefore, when there is an overlapping region between the sixth text line region and the seventh text line region, if the sixth gray scale difference is greater than the seventh gray scale difference, the region corresponding to the sixth text line region in the gray scale image is "positive color", so that it is not necessary to process the region in the gray scale image, and the text line region in the obtained third result image is still the positive color text line region; if the sixth gray level difference value is smaller than the seventh gray level difference value, the region corresponding to the sixth text line region in the gray level image is "inverse color", and the region corresponding to the seventh text line region in the inverse gray level image is "positive color", so that the region corresponding to the seventh text line region in the inverse gray level image needs to be spliced to the gray level image, and the text line region of the region in the obtained third result image is the positive color text line region.
When the sixth text line region has no overlap with any seventh text line region, or the seventh text line region has no overlap with any sixth text line region, the eighth text line region corresponding to it in the gray image is judged directly as "positive color" or "inverse color": when the gray value of the eighth in-line region is greater than the gray value of the eighth out-of-line region, the eighth text line region in the gray image is inverse color, so the corresponding region in the inverse gray image is positive color; that region is then cut from the inverse gray image and stitched onto the gray image, so the text line region at that position in the resulting third result image is a positive color text line region.
In summary, depending on whether the sixth text line region and the seventh text line region overlap, different comparison modes are adopted for the overlapping and non-overlapping cases; the corresponding regions of the gray image and the inverse gray image are then cut out and stitched according to the comparison results. This compare-and-stitch scheme guarantees that every text line region of the image to be identified has a corresponding region in the third result image, while directly producing a third result image containing only positive color text line regions. Character recognition is then performed on the third result image, so all character information of the image to be identified can be recognized in a single pass, which makes the recognition result accurate and the recognition time short.
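The compare-and-stitch procedure above can be sketched roughly as follows. The bounding-box region format, the `overlaps` test, and the externally supplied `gray_diff` helper are illustrative assumptions; the embodiment does not prescribe these data structures.

```python
import numpy as np

def overlaps(a, b):
    """Axis-aligned overlap test for (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def stitch_positive(gray, inv_gray, sixth_lines, seventh_lines, gray_diff):
    """Build the third result image: start from the gray image and paste in
    regions of the inverse gray image wherever the text is inverse color.

    `sixth_lines` / `seventh_lines` are (x, y, w, h) boxes detected in the
    hollow and inverse hollow binary images; `gray_diff(box)` returns the
    gray difference of that line's in-line vs. out-of-line region.
    """
    result = gray.copy()
    for seventh in seventh_lines:
        hits = [s for s in sixth_lines if overlaps(s, seventh)]
        if hits:
            # Overlapping case: paste only when the seventh difference wins,
            # i.e. the gray image is inverse color there.
            if all(gray_diff(s) < gray_diff(seventh) for s in hits):
                x, y, w, h = seventh
                result[y:y + h, x:x + w] = inv_gray[y:y + h, x:x + w]
        else:
            # Non-overlapping case: judge the region of the gray image
            # directly (in-line darker than out-of-line means positive color).
            x, y, w, h = seventh
            in_mean = gray[y:y + h, x:x + w].mean()
            border = gray[y:y + h, max(x - w, 0):x]
            if border.size and in_mean > border.mean():
                result[y:y + h, x:x + w] = inv_gray[y:y + h, x:x + w]
    return result
```

In the non-overlapping branch the left neighbour is used as the out-of-line region purely for brevity; the embodiment's fuller left/right/upper/lower selection rule is given below.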
Each text line region in this embodiment includes the sixth text line region, the seventh text line region, and the eighth text line region. The in-line region of each text line region is the first unit character region or the last unit character region of that text line region. The out-of-line region is one of the left, right, upper, and lower regions adjacent to the in-line region.
Preferably, selecting the out-of-line region from the left, right, upper, and lower regions adjacent to the in-line region specifically comprises:
when the area of the left region is the same as that of the in-line region and the area of the right region is not, selecting the left region as the out-of-line region;
when the area of the right region is the same as that of the in-line region and the area of the left region is not, selecting the right region as the out-of-line region;
when the areas of both the left and right regions are the same as that of the in-line region, selecting either the left or the right region as the out-of-line region;
and when the areas of both the left and right regions differ from that of the in-line region, selecting the upper or the lower region as the out-of-line region.
Selecting the upper or the lower region as the out-of-line region specifically comprises:
when the area of the upper region is the same as that of the in-line region and the area of the lower region is not, selecting the upper region as the out-of-line region;
when the area of the lower region is the same as that of the in-line region and the area of the upper region is not, selecting the lower region as the out-of-line region;
and when the areas of both the upper and lower regions are the same as that of the in-line region, selecting either the upper or the lower region as the out-of-line region.
Further, the out-of-line region has the same height and width as the in-line region.
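The selection rule above can be sketched as follows, interpreting "the area of the left region is the same as that of the in-line region" as "a region of the same width and height fits entirely inside the image on that side"; that interpretation, and the left-right-upper-lower preference order, are assumptions of this sketch.

```python
def pick_out_of_line(in_line, img_w, img_h):
    """Pick the out-of-line region adjacent to an in-line box (x, y, w, h).

    A candidate is usable when a region of the same width and height fits
    entirely inside the image; left/right are preferred over upper/lower.
    """
    x, y, w, h = in_line
    left = (x - w, y, w, h)
    right = (x + w, y, w, h)
    upper = (x, y - h, w, h)
    lower = (x, y + h, w, h)

    def fits(box):
        bx, by, bw, bh = box
        return bx >= 0 and by >= 0 and bx + bw <= img_w and by + bh <= img_h

    for candidate in (left, right, upper, lower):
        if fits(candidate):
            return candidate
    raise ValueError("in-line region has no adjacent region inside the image")

# An in-line box at the left edge of a 100x100 image: the left candidate
# falls outside the image, so the right neighbour is chosen.
print(pick_out_of_line((0, 0, 10, 10), 100, 100))  # (10, 0, 10, 10)
```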
According to the character recognition method based on layout analysis of this embodiment, the positive color text line regions of the gray image and of the inverse gray image are merged into one image to obtain the third result image, and character recognition is then performed on the third result image; a single recognition pass over the result image yields all character information of the image to be identified, which greatly shortens the recognition service time and improves the efficiency of character recognition on images with complex layouts.
Accordingly, the present invention also discloses a character recognition device, which may include a processor and a memory, the memory storing computer instructions and the processor being configured to execute the computer instructions stored in the memory; when the computer instructions are executed by the processor, the device implements the steps of the method described above.
The invention also relates to a storage medium having stored thereon computer program code which, when executed by a processor, implements the method steps described in any of the above. The storage medium may be a tangible storage medium such as an optical disc, a USB drive, a floppy disk, or a hard disk.
Those skilled in the art will appreciate that the various illustrative components, systems, and methods described in connection with the embodiments disclosed herein can be implemented in hardware, software, or a combination of both. Whether a particular implementation is hardware or software depends on the specific application of the solution and its design constraints. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention. When implemented in hardware, the implementation may be, for example, an electronic circuit, an application-specific integrated circuit (ASIC), suitable firmware, a plug-in, a function card, or the like. When implemented in software, the elements of the invention are the programs or code segments used to perform the required tasks. The programs or code segments may be stored in a machine-readable medium or transmitted over a transmission medium or communication link by a data signal carried in a carrier wave. A "machine-readable medium" may include any medium that can store or transfer information. Examples of machine-readable media include electronic circuitry, semiconductor memory devices, ROM, flash memory, erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber-optic media, radio frequency (RF) links, and the like. The code segments may be downloaded via computer networks such as the Internet or an intranet.
It should also be noted that the exemplary embodiments mentioned in this disclosure describe some methods or systems based on a series of steps or devices. The present invention is not limited to the order of the above-described steps, that is, the steps may be performed in the order mentioned in the embodiments, or may be performed in a different order from the order in the embodiments, or several steps may be performed simultaneously.
In this disclosure, features that are described and/or illustrated with respect to one embodiment may be used in the same way or in a similar way in one or more other embodiments and/or in combination with or instead of the features of the other embodiments.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, and various modifications and variations can be made to the embodiments of the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (22)

1. A text line area locating method, comprising the steps of:
Acquiring a gray level image of an image to be identified;
Obtaining a positioning image according to the gray level image;
identifying a positive text line area and/or an inverse text line area in the positioning image;
The text line area in the positioning image is a first text line area, the line inner area of the first text line area is a first line inner area, and the line outer area of the first line inner area is a first line outer area; identifying a positive text line area and/or an inverse text line area in the positioning image comprises:
identifying a first text line area of the positioning image;
Selecting a first line inner area and a first line outer area from the positioning image;
determining that the first text line area is a positive text line area under the condition that the gray value of the first line inner area is smaller than the gray value of the first line outer area; and/or determining that the first text line area is an inverse text line area when the gray value of the first line inner area is larger than the gray value of the first line outer area;
Or, the positioning image comprises a positioning hollow binary image and a positioning inverse hollow binary image; the text line area in the positioning hollow binary image is a second text line area, the line inner area of the second text line area is a second line inner area, the line outer area of the second line inner area is a second line outer area, and the absolute value of the difference between the gray value of the second line inner area and the gray value of the second line outer area is a second gray difference; the text line area in the positioning inverse hollow binary image is a third text line area, the line inner area of the third text line area is a third line inner area, the line outer area of the third line inner area is a third line outer area, and the absolute value of the difference between the gray value of the third line inner area and the gray value of the third line outer area is a third gray difference; identifying a positive text line area and/or an inverse text line area in the positioning image comprises:
identifying a second text line region in the positioning hollow binary image and a third text line region in the positioning inverse hollow binary image;
When the second text line area and at least one third text line area have an overlapping area, acquiring a second gray level difference value of the second text line area and a third gray level difference value of the third text line area corresponding to the second text line area; or when the third text line area and at least one second text line area have an overlapping area, acquiring a third gray level difference value of the third text line area and a second gray level difference value of the second text line area corresponding to the third text line area;
When the second gray level difference value is larger than the third gray level difference value, determining that the second text line area is a positive text line area and/or the third text line area is a negative text line area;
and/or,
And when the second gray level difference value is smaller than the third gray level difference value, determining that the second text line area is an inverse color text line area and/or the third text line area is a positive color text line area.
2. The method of claim 1, wherein the positioning image is one of a positioning gray scale image, a positioning inverse gray scale image, a positioning binary image, and a positioning inverse binary image; the positioning gray level image is the gray level image, at least the gray level image is subjected to inverse color processing to obtain the positioning inverse gray level image, at least the gray level image is subjected to binarization processing to obtain the positioning binary image, and at least the gray level image is subjected to inverse color processing and binarization processing to obtain the positioning inverse binary image.
3. The method of claim 1, wherein identifying the forward text line area and/or the reverse text line area in the localization image further comprises the steps of:
when the second text line area and each third text line area do not have an overlapping area, a fourth text line area corresponding to the second text line area is acquired in the gray level image;
Acquiring a fourth inner row region and a fourth outer row region; the line inner area of the fourth text line area is a fourth line inner area, and the line outer area of the fourth line inner area is the fourth line outer area;
Determining that the second text line area is a positive text line area when the gray value of the fourth line inner area is smaller than the gray value of the fourth line outer area; and/or,
Determining that the second text line area is an inverse text line area under the condition that the gray level value of the fourth line inner area is larger than the gray level value of the fourth line outer area;
and/or,
When the third text line area and each second text line area do not have an overlapping area, a fifth text line area corresponding to the third text line area is acquired in the gray level image;
Acquiring a fifth line inner region and a fifth line outer region; the line inner area of the fifth text line area is a fifth line inner area, and the line outer area of the fifth line inner area is a fifth line outer area;
determining that the third text line area is a positive text line area when the gray value of the fifth line inner area is smaller than the gray value of the fifth line outer area; and/or,
And determining that the third text line area is an inverse text line area under the condition that the gray level value of the area in the fifth line is larger than the gray level value of the area outside the fifth line.
4. A method according to any one of claims 1 to 3, characterized in that the positioning hollow binary image and the positioning inverse hollow binary image are obtained from the gray image, specifically comprising the step of:
performing at least background-removing binarization processing on the gray image to obtain the positioning hollow binary image and the positioning inverse hollow binary image.
5. The method of claim 4, wherein the gray image is subjected to at least binarization processing and edge detection processing to obtain the positioning hollow binary image; and the gray image is subjected to at least inverse color processing, binarization processing, and edge detection processing to obtain the positioning inverse hollow binary image.
6. The method according to claim 4, wherein performing at least background-removing binarization processing on the gray image to obtain the positioning hollow binary image and the positioning inverse hollow binary image specifically comprises the steps of:
performing at least binarization processing on the gray image to obtain connected domains in the processed image, setting the gray value of each pixel at the edge of a connected domain whose gray value is 255 to 0, and setting the gray value of each pixel of the background corresponding to that connected domain to 255, to obtain the positioning hollow binary image;
and
performing at least inverse color processing and binarization processing on the gray image to obtain connected domains in the processed image, setting the gray value of each pixel at the edge of a connected domain whose gray value is 255 to 0, and setting the gray value of each pixel of the background corresponding to that connected domain to 255, to obtain the positioning inverse hollow binary image.
7. The method of any one of claims 1, 2, 3, 5 and 6, wherein the in-line region of each text line region is at least one unit character region in the text line region.
8. The method of claim 7, wherein acquiring the out-of-line regions of each in-line region comprises:
And selecting one of a left side area, a right side area, an upper area and a lower area adjacent to the in-line area from the positioning image or the gray level image as the out-line area.
9. The method of claim 8, wherein the out-of-line region has the same area as the in-line region; selecting one of the left, right, upper, and lower regions adjacent to the in-line region in the positioning image or the gray level image as the out-of-line region specifically comprises:
when the area of the left region is the same as that of the in-line region and the area of the right region is not, selecting the left region as the out-of-line region;
when the area of the right region is the same as that of the in-line region and the area of the left region is not, selecting the right region as the out-of-line region;
when the areas of both the left and right regions are the same as that of the in-line region, selecting the left region or the right region as the out-of-line region;
and when the areas of both the left and right regions differ from that of the in-line region, selecting the upper region or the lower region as the out-of-line region.
10. The method of claim 9, wherein the out-of-line region has the same height and width as the in-line region.
11. The method of any of claims 8 to 10, wherein the in-line region is the first unit character region or the last unit character region of the text line region.
12. A layout analysis method based on text line area positioning, characterized in that the text line area positioning method according to any one of claims 1 to 11 is applied to obtain the positive text line areas and inverse text line areas in a positioning image, thereby determining the positive color regions and inverse color regions in the gray level image of the image to be identified.
13. The character recognition method based on layout analysis is characterized by comprising the following steps:
applying the text line area positioning method according to any one of claims 1 to 11 to obtain a positive text line area and/or an inverse text line area in a positioning image; and obtaining a positive color image and an inverse color image from the gray level image in the text line area positioning method according to any one of claims 1 to 11;
acquiring at least one of the positive color text line area of the positive color image, the inverse color text line area of the positive color image, the positive color text line area of the inverse color image, and the inverse color text line area of the inverse color image, according to the positive text line area and/or the inverse text line area in the positioning image;
obtaining a result image by stitching the positive color image and the inverse color image according to at least one of the positive color text line area of the positive color image, the inverse color text line area of the positive color image, the positive color text line area of the inverse color image, and the inverse color text line area of the inverse color image; the text line areas in the result image are all positive color text line areas or all inverse color text line areas;
And carrying out character recognition on the result image to obtain a recognition result.
14. The layout analysis-based character recognition method according to claim 13, wherein obtaining a result image by stitching the positive color image and the inverse color image according to at least one of the positive color text line area of the positive color image, the inverse color text line area of the positive color image, the positive color text line area of the inverse color image, and the inverse color text line area of the inverse color image, the text line areas in the result image being all positive color text line areas or all inverse color text line areas, specifically comprises:
according to the positive color text line area of the positive color image and/or the positive color text line area of the inverse color image, cutting out the positive color text line area of the positive color image and the positive color text line area of the inverse color image and stitching them, or cutting out the positive color text line area of the positive color image and stitching it correspondingly onto the inverse color image, or cutting out the positive color text line area of the inverse color image and stitching it correspondingly onto the positive color image, to obtain a first result image, wherein the text line areas in the first result image are all positive color text line areas;
or
according to the inverse color text line area of the positive color image and/or the inverse color text line area of the inverse color image, cutting out the inverse color text line area of the positive color image and the inverse color text line area of the inverse color image and stitching them, or cutting out the inverse color text line area of the positive color image and stitching it correspondingly onto the inverse color image, or cutting out the inverse color text line area of the inverse color image and stitching it correspondingly onto the positive color image, to obtain a second result image, wherein the text line areas in the second result image are all inverse color text line areas.
15. The layout analysis-based character recognition method according to claim 13 or 14, wherein the positive color image is at least one of the gray image, a binary image, and a hollow binary image, and the inverse color image is at least one of an inverse gray image, an inverse binary image, and an inverse hollow binary image; the binary image, the hollow binary image, the inverse gray image, the inverse binary image, and the inverse hollow binary image are all obtained from the gray image.
16. The layout analysis-based character recognition method according to claim 15, wherein when the positive color image is the gray image, the inverse color image is the inverse gray image; when the positive color image is the binary image, the inverse color image is the inverse binary image; and when the positive color image is the hollow binary image, the inverse color image is the inverse hollow binary image.
17. A character recognition method based on layout analysis is characterized by comprising the following steps:
Acquiring a gray level image of an image to be identified;
Obtaining a hollow binary image, an inverse hollow binary image and an inverse gray image according to the gray image;
identifying a sixth text line region in the hollow binary image and a seventh text line region in the inverse hollow binary image;
Splicing the gray level image and the inverse gray level image according to the sixth text line area and the seventh text line area to obtain a third result image, wherein the text line areas in the third result image are positive text line areas;
performing character recognition on the third result image to obtain a recognition result;
The line inner area of the sixth text line area is a sixth line inner area, the line outer area of the sixth line inner area is a sixth line outer area, and the absolute value of the difference between the gray value of the sixth line inner area and the gray value of the sixth line outer area is a sixth gray difference;
The line inner area of the seventh text line area is a seventh line inner area, the line outer area of the seventh line inner area is a seventh line outer area, and the absolute value of the difference between the gray value of the seventh line inner area and the gray value of the seventh line outer area is a seventh gray difference;
And according to the sixth text line area and the seventh text line area, splicing the gray level image and the inverse gray level image to obtain a third result image, wherein the method specifically comprises the following steps:
When there is an overlap region of the sixth text line area with at least one of the seventh text line areas,
Acquiring a sixth gray level difference value of the sixth text line area and a seventh gray level difference value of a seventh text line area corresponding to the sixth text line area;
when the sixth gray level difference value is smaller than the seventh gray level difference value, intercepting a region corresponding to the seventh text line region in the inverse gray level image, and splicing the region to the gray level image to obtain the third result image;
when there is no overlap region between the sixth text line area and each of the seventh text line areas and there is no overlap region between the seventh text line area and each of the sixth text line areas,
Acquiring an eighth text line area corresponding to the sixth text line area or the seventh text line area in the gray scale image;
acquiring an eighth line inner area and an eighth line outer area; the line inner area of the eighth text line area is the eighth line inner area, and the line outer area of the eighth line inner area is the eighth line outer area; and when the gray value of the eighth line inner area is greater than the gray value of the eighth line outer area, cutting out the region corresponding to the eighth text line area in the inverse gray image and correspondingly stitching it onto the gray image to obtain the third result image.
18. The layout analysis-based character recognition method according to claim 17, wherein the line inner area of each text line area is the first unit character area or the last unit character area of that text line area; the line outer area has the same height and width as the line inner area; and the line outer area is one of the left, right, upper, and lower areas adjacent to the line inner area.
19. The layout analysis-based character recognition method according to claim 18, wherein the line outer area is one of the left, right, upper, and lower areas adjacent to the line inner area, specifically:
when the area of the left region is the same as that of the line inner area and the area of the right region is not, selecting the left region as the line outer area;
when the area of the right region is the same as that of the line inner area and the area of the left region is not, selecting the right region as the line outer area;
when the areas of both the left and right regions are the same as that of the line inner area, selecting the left region or the right region as the line outer area;
and when the areas of both the left and right regions differ from that of the line inner area, selecting the upper region or the lower region as the line outer area.
20. The method of claim 19, wherein the line outer area has the same height and width as the line inner area.
21. A character recognition device comprising a processor and a memory, wherein the memory stores computer instructions and the processor is configured to execute the computer instructions stored in the memory; when the computer instructions are executed by the processor, the device implements the steps of the method according to any one of claims 1 to 20.
22. A computer storage medium, having stored thereon a computer program which, when executed by a processor, implements the method steps of any of claims 1 to 20.
Publications (2)

Application CN202010640573.6A, filed 2020-07-06
CN111814778A, published 2020-10-23
CN111814778B, granted 2024-10-22
