Disclosure of Invention
The invention aims to provide a method for removing the frame of an electrical drawing that accurately extracts the effective area of the drawing, thereby improving the accuracy and efficiency of electrical drawing digitization.
To solve the above technical problem, the technical scheme adopted by the invention is an electrical drawing frame removing method based on direction search, comprising the following steps:
S1, determining the type of the input drawing, namely whether it is an electronic drawing or a scanned paper drawing;
S2, if the input drawing is an electronic drawing, entering the electronic drawing processing flow: first determining whether the drawing has an outer frame; if so, proceeding to step S3; if not, returning the original drawing directly and ending the flow;
S3, removing the central axis of the frame so as to separate the frame from the drawing content;
S4, extracting connected domains from the drawing with the central axis of the frame removed, sorting them from large to small, using Hough line detection to find the connected domain where the frame is located, searching that connected domain from inside to outside to find the inner contour boundary, and cutting out the effective area;
S5, if the input drawing is a scanned paper drawing, first determining whether it has an outer frame; if so, performing a dilation operation on the scanned paper drawing to close the gaps in the frame;
S6, extracting connected domains from the scanned paper drawing and traversing them from large to small; on each connected domain image, searching from outside to inside with the direction search method to obtain the line projection, and calculating the area ratio of the outer frame to its circumscribed rectangle, so as to determine the connected domain where the frame is located; then searching the found connected domain from inside to outside for the region of interest (ROI), and cutting the ROI out of the binary image to obtain the effective area.
The beneficial effects of the above technical scheme are as follows. After the type of the electrical drawing is determined, the drawing is preprocessed: a direction search from outside to inside removes the redundant white background outside the frame, and cases where drawing content adheres to the frame are handled. Connected domains are then extracted from the image and examined in descending order of pixel count. The connected domain where the frame is located is identified and searched outward from its centroid with the direction search method to find the coordinates of the inflection points of the frame's inner contour, and the image is cropped at those coordinates using the opencv library. The frame of the electrical drawing is thereby removed quickly, and the accuracy and efficiency of data extraction and analysis are improved. Because direction search locates the specific coordinates of the frame, the method can handle unclosed frames, partially missing frames, inclined drawings and similar problems, and it applies to both electronic drawings and scanned paper drawings. Preprocessing the drawing before searching for the frame resolves the adhesion between the central axis of the frame and the drawing content in electronic drawings, which improves the robustness of frame removal.
Detailed Description
The embodiments of the present invention are described clearly and fully below with reference to the accompanying drawings. The embodiments described are evidently only some, and not all, of the embodiments of the invention. All other embodiments that a person skilled in the art can obtain from these embodiments without inventive effort fall within the scope of protection of the invention.
Numerous specific details are set forth in the following description to provide a thorough understanding of the present invention. The invention may nevertheless be practiced in ways other than those described here, and persons skilled in the art will readily appreciate that it is not limited to the specific embodiments disclosed below.
FIG. 1 is a flow chart of the invention. Electronic drawings and scanned paper drawings differ considerably: scanned paper drawings suffer from scanning inclination, creases, burrs, insufficient scanning definition and the like. The invention therefore adopts two technical routes. The drawing type can be selected manually or determined by a classification model, and once the type is selected the drawing is processed according to the corresponding flow.
Accordingly, as shown in FIG. 1, an embodiment of the invention discloses an electrical drawing frame removing method based on direction search, which comprises the following steps.
Step S1, determining the type of the input drawing. An electronic drawing is relatively regular and has little noise interference, whereas a scanned paper drawing may have creases, inclination and similar problems, so the two types of drawing correspond to different processing flows.
Step S2, if the input drawing is an electronic drawing, the electronic drawing processing flow is entered. It is first determined whether the drawing (shown in FIG. 2) has an outer frame: if it does, step S3 is entered; if it does not, the original drawing is returned directly and the flow ends. FIG. 2 is an electronic drawing with sensitive information removed, in which the central axis of the frame at the bottom of the drawing adheres to the drawing content. The specific method for determining whether a frame is present is as follows:
(1) Expand the image boundary by adding a white border around the image, so that contours close to the edge do not affect detection.
(2) Convert the original image into a grayscale image and binarize it.
(3) Extract all closed outer contours in the binary image based on the Canny edge detection algorithm and calculate their areas. Because a closed outer contour may be an irregular polygon, the area of the planar polygon is calculated from the coordinates of its vertices, using the following formula:
$$A = \frac{1}{2}\left|\sum_{i=1}^{n}\left(x_i y_{i+1} - x_{i+1} y_i\right)\right|, \qquad (x_{n+1}, y_{n+1}) = (x_1, y_1)$$
where $A$ represents the area of the polygon, $n$ represents the number of vertices of the polygon, and $(x_i, y_i)$ represents the two-dimensional coordinates of the $i$-th vertex of the polygon, the vertices being ordered clockwise. The contour with the largest area and the coordinates of its circumscribed rectangle are saved.
(4) Scan the rows and columns of the binarized image from outside to inside in the four directions (up, down, left and right) to find the black pixel boundary that meets the requirements, and define the minimum circumscribed rectangle containing those pixels:
$$x_{column} = \sum_{i=1}^{h} 1\left[I(i, j)\ \text{is black}\right], \qquad y_{row} = \sum_{j=1}^{w} 1\left[I(i, j)\ \text{is black}\right]$$
where $x_{column}$ and $y_{row}$ denote the number of black pixels in a column vector and in a row vector respectively, $I(i, j)$ denotes the binarized pixel in row $i$ and column $j$, $h$ is the height of the image (total number of rows), $w$ is the width of the image (total number of columns), and $1[\text{condition}]$ is the indicator function, equal to 1 if the condition is satisfied and 0 otherwise.
(5) Compare the area of the circumscribed rectangle of the largest contour with the area of the black pixel boundary rectangle, and determine from their ratio whether the input drawing has a frame.
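The frame-presence check in items (1) to (5) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the binarized image is assumed to be a list of rows with 0 for black and 255 for white, the contour extraction itself (Canny edge detection and contour tracing) is not reproduced, and the names `polygon_area`, `black_pixel_bbox` and `has_frame`, together with the parameters `min_count` and `ratio_threshold` and their default values, are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[int, int, int, int]  # (top, bottom, left, right), inclusive
BLACK = 0  # assumed value of an ink pixel in the binarized image


def polygon_area(vertices: Sequence[Point]) -> float:
    """Area of a simple polygon from its ordered vertices (shoelace formula)."""
    n = len(vertices)
    total = 0.0
    for i in range(n):
        x_i, y_i = vertices[i]
        x_next, y_next = vertices[(i + 1) % n]  # wrap around to the first vertex
        total += x_i * y_next - x_next * y_i
    return abs(total) / 2.0


def black_pixel_bbox(binary: List[List[int]], min_count: int = 1) -> Optional[Box]:
    """Outside-in row and column scan for the black pixel boundary.

    A row or column qualifies when it holds at least `min_count` black pixels
    (an assumed criterion). Returns None when nothing qualifies.
    """
    h, w = len(binary), len(binary[0])
    row_counts = [sum(1 for v in row if v == BLACK) for row in binary]
    col_counts = [sum(1 for r in range(h) if binary[r][c] == BLACK) for c in range(w)]
    rows = [r for r in range(h) if row_counts[r] >= min_count]
    cols = [c for c in range(w) if col_counts[c] >= min_count]
    if not rows or not cols:
        return None
    return rows[0], rows[-1], cols[0], cols[-1]


def has_frame(contour_rect_area: float, bbox: Optional[Box],
              ratio_threshold: float = 0.9) -> bool:
    """Compare the largest contour's circumscribed rectangle with the black pixel box.

    `ratio_threshold` is an assumed value; the source does not state one.
    """
    if bbox is None:
        return False
    top, bottom, left, right = bbox
    bbox_area = (bottom - top + 1) * (right - left + 1)
    return contour_rect_area / bbox_area >= ratio_threshold
```

The intent is that a drawing with a frame has a largest contour whose circumscribed rectangle covers nearly the whole black pixel box, while a frameless drawing yields a much smaller ratio.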
Step S3, removing the central axis of the frame. Owing to the morphological characteristics of the frame, its interior is generally uneven, as shown in FIG. 3. The central axis of the frame protrudes inward and adheres to the drawing content; because the central axis of the frame at the bottom of the drawing adheres to the drawing content, extracting the connected domain directly would extract the adhered drawing content together with the frame, so the frame must be separated from the drawing content. FIG. 3 is an electronic drawing from which the central axis of the frame has been removed by the separation algorithm of the invention. The specific method is as follows:
(1) Step S2 has ensured that the drawing lies within the image. The boundary coordinates of the frame are searched for from outside to inside, the approximate coordinates of the central axis are calculated from them, and a range rectangle is drawn according to these coordinates. The number of black pixels in each row or each column of the range rectangle is counted, by row or by column as appropriate. Because the adhesion region and the non-adhesion region differ in pixel count within the range rectangle, the rows (or columns) are divided into two classes, adhesion and non-adhesion. Here k = 2 is set, that is, 2 centroids are used, and the distance d between each data point and each centroid is calculated:
$$d(x, c) = \sqrt{\sum_{i=1}^{n}\left(x_i - c_i\right)^2}$$
where $x_i$ and $c_i$ are the $i$-th feature values of the data point and of the centroid, and $n$ is the number of features. In the present invention $n = 1$, because there is only one feature, the number of black pixels per row or column. Each data point is assigned to the cluster of its closest centroid. After all data points have been assigned to clusters, the centroid of each cluster is recalculated. Assuming that the $k$-th cluster contains $n_k$ data points $x_i$ and has centroid $c_k = (c_{k1})$, the updated centroid $c_{k1}$ is:
$$c_{k1} = \frac{1}{n_k}\sum_{i=1}^{n_k} x_{i1}$$
Through the above steps, the range rectangle is divided into two classes (adhesion and non-adhesion), and the adhesion region is set to the background color of the drawing; the result is shown in FIG. 4. FIG. 4 is a picture of the connected domain where the frame is located, in which the position of the frame corresponds to its position in the original picture. To remove the frame, its coordinates must be obtained accurately: first the connected domain where the frame is located is found among all the extracted connected domains, and then the coordinates of the frame are found on that connected domain.
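The two-centroid clustering in step S3 reads like the usual k-means procedure with k = 2 on a single feature, and the sketch below assumes that reading. It is illustrative only: the function name `kmeans_1d_two_clusters`, the `max_iter` limit, and the choice to initialise the centroids at the minimum and maximum value are assumptions, and the source does not say which of the two clusters corresponds to adhesion, so the sketch only returns the split.

```python
from typing import List, Tuple


def kmeans_1d_two_clusters(values: List[float],
                           max_iter: int = 100) -> Tuple[List[int], List[float]]:
    """Cluster scalar values (black pixel counts per row or column) into 2 groups.

    Returns (labels, centroids), where labels[i] is 0 or 1.
    """
    if not values:
        return [], []
    # Assumed initialisation: the smallest and the largest observed value.
    centroids = [float(min(values)), float(max(values))]
    labels = [0] * len(values)
    for _ in range(max_iter):
        # Assignment step: with one feature the Euclidean distance is |x - c|.
        new_labels = [
            0 if abs(v - centroids[0]) <= abs(v - centroids[1]) else 1
            for v in values
        ]
        # Update step: each centroid becomes the mean of its cluster members.
        new_centroids = []
        for k in (0, 1):
            members = [v for v, lab in zip(values, new_labels) if lab == k]
            new_centroids.append(sum(members) / len(members) if members else centroids[k])
        if new_labels == labels and new_centroids == centroids:
            break  # assignments and centroids are stable
        labels, centroids = new_labels, new_centroids
    return labels, centroids


# Example input: black pixel counts for six rows of a range rectangle.
labels, centroids = kmeans_1d_two_clusters([2, 2, 3, 9, 10, 2])
```

Rows whose label matches the adhesion cluster would then be painted with the background color, as the text describes.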
Step S4, extracting connected domains from the drawing with the central axis of the frame removed, sorting them from large to small, and using Hough line detection to find the connected domain where the frame is located. FIG. 5 shows the result of using Hough line detection to find, among all the extracted connected domain images, the connected domain where the frame is located. Electronic drawings are relatively neat and free of inclination, burrs, creases, poor definition and similar problems, and the frame has certain morphological characteristics; if the detected straight lines satisfy those characteristics, the connected domain is considered to contain the frame. The connected domain is then searched from inside to outside to find the inner contour boundary, and the effective area is cut out. The method is as follows:
(1) Pixels that have the same pixel value and are 8-connected to one another are grouped into one set. A pixel p(x, y) is 8-connected to any of its 8 surrounding pixels p(x-1, y), p(x+1, y), p(x, y-1), p(x, y+1), p(x-1, y-1), p(x-1, y+1), p(x+1, y-1), p(x+1, y+1) that has the same value. The image is scanned from left to right and from top to bottom, starting from its top-left corner. When an unlabeled foreground pixel is encountered, it is assigned a new label (an integer), which marks the start of a new connected domain. A depth-first search (DFS) is then used to find all other foreground pixels connected to this pixel: the current pixel is pushed onto a stack; the pixel at the top of the stack is then popped, and each of its 8-connected neighbors is checked to see whether it is an unlabeled foreground pixel. If so, the neighbor is given the label of the current connected domain and pushed onto the stack. This process is repeated until the stack is empty, at which point one connected domain has been labeled.
(2) The redundant white background of the electronic drawing has already been removed in step S2. Because electronic drawings are relatively regular, the invention applies Hough line detection to the extracted connected domains in order from large to small. When a connected domain is found to contain horizontal and vertical lines that reach a certain threshold multiple of the length or width of the whole image, it is considered to be the connected domain where the frame is located, as shown in FIG. 5. After the connected domain where the frame is located has been determined, white pixels are searched for from the midpoint of the connected domain toward its periphery; when a white pixel is found, the current coordinates are recorded, and the drawing is cropped according to these coordinates to obtain the final result for the electronic drawing. FIG. 6 shows this final result; compared with the original drawing in FIG. 2, the frame has been removed.
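The stack-based labeling in item (1) and the midpoint-outward search in item (2) can be sketched as below. This is a simplified illustration under several assumptions: the image is a list of rows in which frame and content pixels have the value 1 and background pixels the value 0, the Hough line test that selects the frame's connected domain is not reproduced, and the names `label_components`, `component_mask` and `inner_bounds` are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int]  # (row, column)
NEIGHBOURS_8 = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def label_components(binary: List[List[int]], foreground: int = 1) -> List[List[Pixel]]:
    """8-connected components found with an explicit DFS stack, largest first."""
    h, w = len(binary), len(binary[0])
    labels = [[0] * w for _ in range(h)]
    components: List[List[Pixel]] = []
    for r in range(h):              # top to bottom
        for c in range(w):          # left to right
            if binary[r][c] != foreground or labels[r][c]:
                continue
            label = len(components) + 1   # new integer label
            labels[r][c] = label
            stack = [(r, c)]
            pixels: List[Pixel] = []
            while stack:
                y, x = stack.pop()
                pixels.append((y, x))
                for dy, dx in NEIGHBOURS_8:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < h and 0 <= nx < w
                            and binary[ny][nx] == foreground and not labels[ny][nx]):
                        labels[ny][nx] = label
                        stack.append((ny, nx))
            components.append(pixels)
    components.sort(key=len, reverse=True)  # from large to small
    return components


def component_mask(pixels: List[Pixel], h: int, w: int) -> List[List[int]]:
    """Draw one component on an empty image of the original size."""
    mask = [[0] * w for _ in range(h)]
    for y, x in pixels:
        mask[y][x] = 1
    return mask


def inner_bounds(mask: List[List[int]]) -> Tuple[int, int, int, int]:
    """Walk from the image midpoint in four directions until a frame pixel is met.

    Returns the inclusive (top, bottom, left, right) of the area inside the frame.
    Assumes the midpoint itself lies inside the frame and is not a frame pixel.
    """
    h, w = len(mask), len(mask[0])
    cy, cx = h // 2, w // 2
    top = next((r for r in range(cy, -1, -1) if mask[r][cx]), -1)
    bottom = next((r for r in range(cy, h) if mask[r][cx]), h)
    left = next((c for c in range(cx, -1, -1) if mask[cy][c]), -1)
    right = next((c for c in range(cx, w) if mask[cy][c]), w)
    return top + 1, bottom - 1, left + 1, right - 1
```

Cropping the original image to the returned bounds, for example by slicing rows `top` to `bottom` and columns `left` to `right`, then yields the effective area. Searching only along the two midlines is a simplification of the direction search described in the text.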
Step S5, if a scanned paper drawing is input, as shown in FIG. 7, a dilation operation is performed on it to close the gaps in the frame, because a scanned drawing may be damaged by creases, scanning conditions and the like; the remaining steps follow the same sequence as for the electronic drawing. In terms of flow, once the frame has been detected and the scanned paper drawing has been dilated, the stages are the same as for the electronic drawing: extracting the connected domains, finding the connected domain image where the frame is located, locating the frame coordinates on that image, and cutting the effective area out of the original image. The methods used within these stages, however, are different. FIG. 7 is an original scanned paper drawing with sensitive information removed; compared with an electronic drawing it exhibits inclination, frame breakage, burrs and creases, so it must go through the scanned paper drawing frame removal flow.
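The gap-closing operation of step S5 is read here as morphological dilation, and a minimal sketch of it follows. The 3 x 3 neighbourhood, the `iterations` parameter, the convention that frame pixels are 1 and background pixels are 0, and the function name `dilate` are all assumptions made for illustration; the source only states that the operation closes gaps in the frame.

```python
from typing import List


def dilate(binary: List[List[int]], foreground: int = 1,
           iterations: int = 1) -> List[List[int]]:
    """Binary dilation with an assumed 3 x 3 square neighbourhood.

    A background pixel becomes foreground when any of its 8 neighbours
    is foreground. The input image is left unchanged.
    """
    h, w = len(binary), len(binary[0])
    current = [row[:] for row in binary]
    for _ in range(iterations):
        result = [row[:] for row in current]
        for r in range(h):
            for c in range(w):
                if current[r][c] == foreground:
                    continue
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = r + dr, c + dc
                        if (0 <= nr < h and 0 <= nc < w
                                and current[nr][nc] == foreground):
                            result[r][c] = foreground
        current = result
    return current


# A horizontal frame edge with a one-pixel break in the middle.
broken = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
closed = dilate(broken)
```

One pass closes a one-pixel break but also thickens the line by one pixel on each side, so wider breaks need more iterations at the cost of a thicker frame.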
Step S6, extracting the connected domains of the scanned paper drawing and sorting them from large to small. Because a scanned paper drawing may be inclined, the method used for electronic drawings to find the connected domain where the frame is located is not applicable. Instead, each connected domain is searched from outside to inside with the direction search method. When a white pixel is found, the current coordinates are recorded and a projection range rectangle is established whose extent is given by the picture width (y axis) and height (x axis); the white pixels within this range are projected onto the x axis or the y axis, and the projection length is calculated. If the projection length reaches a certain proportion of the length or width of the whole drawing, the connected domain is considered to be the one where the frame is located. The effective area is then searched for from inside to outside in the same way as for the electronic drawing. FIG. 8 shows the connected domain where the frame is located, as found by the scanned paper drawing frame removal flow. This connected domain is searched outward from the centroid with the direction search method to find the coordinates of each inflection point of the frame's inner contour, and the image is then cropped at these coordinates using the opencv library. FIG. 9 shows the scanned paper drawing after the frame has been removed.
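One possible reading of the outside-in direction search and projection test in step S6 is sketched below. It is not the patented procedure: the connected domain image is assumed to be a list of rows with frame pixels equal to 1, the projection range rectangle is modelled as a band of `band` rows or columns starting at the first pixel hit, and the names `side_projection_ratios` and `is_frame_component` together with the parameters `band`, `min_ratio` and `min_sides` are hypothetical.

```python
from typing import Dict, List


def _projection_ratio(mask: List[List[int]], band: int, foreground: int) -> float:
    """Search from the top edge inward, then project a band onto the horizontal axis.

    The band starts at the first row that contains a foreground pixel. The
    result is the share of columns covered by at least one foreground pixel.
    """
    h, w = len(mask), len(mask[0])
    first = next((r for r in range(h) if foreground in mask[r]), None)
    if first is None:
        return 0.0
    rows = mask[first:first + band]
    covered = sum(1 for c in range(w) if any(row[c] == foreground for row in rows))
    return covered / w


def side_projection_ratios(mask: List[List[int]], band: int = 3,
                           foreground: int = 1) -> Dict[str, float]:
    """Projection ratio seen from each of the four sides of the image."""
    flipped = mask[::-1]                             # bottom edge becomes the top
    transposed = [list(col) for col in zip(*mask)]   # left edge becomes the top
    return {
        "top": _projection_ratio(mask, band, foreground),
        "bottom": _projection_ratio(flipped, band, foreground),
        "left": _projection_ratio(transposed, band, foreground),
        "right": _projection_ratio(transposed[::-1], band, foreground),
    }


def is_frame_component(mask: List[List[int]], min_ratio: float = 0.6,
                       min_sides: int = 3, band: int = 3) -> bool:
    """Treat the component as the frame when enough sides project far enough.

    `min_ratio` and `min_sides` are assumed thresholds; requiring fewer than
    four sides tolerates a partly missing frame.
    """
    ratios = side_projection_ratios(mask, band)
    return sum(1 for value in ratios.values() if value >= min_ratio) >= min_sides
```

Using a band rather than a single row or column is what lets a slightly inclined or broken edge still project to most of its true length.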
The method can remove the frames of both electronic drawings and scanned paper drawings, and can also handle unclosed frames, partially missing frames, inclined drawings, and adhesion between part of the drawing content and the frame, so that the effective area of the electrical drawing is accurately extracted and the accuracy and efficiency of electrical drawing digitization are improved.