CN114004984B - A method and system for comparing drawings of high-voltage cable accessories process library - Google Patents
- Publication number
- CN114004984B, CN202111204535.7A
- Authority
- CN
- China
- Prior art keywords
- pixel
- compared
- voltage cable
- template
- similarity
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Image Analysis (AREA)
- Character Input (AREA)
Abstract
The present invention relates to a drawing comparison method and system based on a high-voltage cable accessory process library. The method comprises: scanning process drawings of high-voltage cables to construct process drawing templates; scanning to obtain a drawing file to be compared, performing drawing similarity detection, and obtaining the process drawing template with the highest similarity; segmenting a text area and a view area from the drawing file to be compared; locating the text area, performing text detection and recognition on it, and comparing the result with the text information of the process drawing template to determine whether there is a difference; locating the annotated dimensions in the view area, performing character detection and recognition on them, and comparing the result with the annotated dimension information of the process drawing template to determine whether there is a difference; and, if a difference is found, performing data tracing and marking the position of the difference. Compared with the prior art, the invention does not require a large data set for training, can pinpoint the points of difference in a drawing, and improves the accuracy of the comparison.
Description
Technical Field
The present invention relates to the technical field of high-voltage cable accessory comparison, and in particular to a method and system for comparing drawings against a high-voltage cable accessory process library.
Background Art
At present, the process drawings of high-voltage cable accessories are critically important documents for a cable company, yet they are reviewed and stored almost entirely on paper, which makes storage and management costly and review inefficient. There are many manufacturers of high-voltage cable accessories and many accessory types, each with its own characteristics and limitations. Some manufacturers also lack a long track record, so certain design aspects of the accessories they produce are not backed by accumulated long-term operating data, and the process design in their drawings changes frequently.
Traditional process review for high-voltage cable accessories manages paper drawings. Image recognition aimed specifically at high-voltage cable accessory drawings has received relatively little study, even though image comparison, recognition, and segmentation are among the most studied topics in image processing and image recognition remains a research focus.
Drawing comparison identifies the differences between two drawings. It is an important way to improve the efficiency of drawing review and to reduce the cost of reviewing high-voltage cable accessory drawings.
One existing approach to drawing comparison is AI recognition, in which a neural network performs the comparison. AI recognition requires a large data set of drawings, and the size of the data set determines its accuracy; the accuracy of a network model trained on a small data set is difficult to guarantee. In addition, the resolution of the scanned drawings provided by manufacturers differs from that of the drawings in the process library, so every image in the data set has to be preprocessed before training, which is cumbersome and time-consuming. In most cases, therefore, the available process drawings of high-voltage cable accessories are not sufficient to support an AI recognition approach.
Summary of the Invention
The purpose of the present invention is to overcome the above defects of the prior art by providing a drawing comparison method and system for a high-voltage cable accessory process library that helps standardize the drawing review process, improve review efficiency, ensure review quality, and support the storage and management of digitized process drawings.
The purpose of the present invention can be achieved by the following technical solutions:
A drawing comparison method based on a high-voltage cable accessory process library comprises the following steps:
scanning the process drawings of high-voltage cables, preprocessing them, using them as process drawing templates, and attaching label information to each process drawing;
scanning and preprocessing the process drawing of the high-voltage cable to be compared to obtain the drawing file to be compared;
performing drawing similarity detection between the drawing file to be compared and each process drawing template to obtain the process drawing template with the highest similarity;
segmenting a text area and a view area from the drawing file to be compared, locating the text area, performing text detection and recognition on it, and comparing the text recognition result with the text information of the process drawing template with the highest similarity to determine whether there is a difference;
locating the annotated dimensions in the view area, performing character detection and recognition on the located dimensions, and comparing the dimension recognition result with the annotated dimension information of the process drawing template with the highest similarity to determine whether there is a difference;
if a difference is found, performing data tracing and marking the position of the difference in the drawing file to be compared; otherwise, outputting the result that the drawing file to be compared is similar to the process drawing template.
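The final comparison and marking steps can be illustrated with a minimal sketch. It assumes that recognition has already produced, for both the scanned drawing and the best-matching template, a list of keyed items with bounding boxes; the `Item` structure, its field names, and the function names are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """One recognised text string or dimension label (hypothetical structure)."""
    key: str     # e.g. a title-block field name or a dimension identifier
    value: str   # the recognised characters
    box: tuple   # (x, y, width, height) in the scanned image, kept for tracing


def find_differences(scanned, template):
    """Return the scanned items whose value differs from, or is absent in, the template."""
    expected = {item.key: item.value for item in template}
    return [item for item in scanned if expected.get(item.key) != item.value]


def compare_drawing(scanned, template):
    """Return ("similar", []) or ("different", [boxes to mark in the scanned drawing])."""
    diffs = find_differences(scanned, template)
    if not diffs:
        return "similar", []
    # Data tracing: each differing value leads back to its location in the scan.
    return "different", [item.box for item in diffs]
```

Keeping the bounding box alongside each recognised value is what makes tracing possible in this sketch: a mismatch in the data can be mapped straight back to a position in the drawing.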
Furthermore, a text line detection algorithm is used to perform the text detection.
Furthermore, a target detection algorithm based on a deep convolutional neural network is used to segment the text area and the view area from the drawing file to be compared.
Furthermore, a partial binary tree classification method is used to compare the dimension positioning result and dimension recognition result of the view area with the dimension positioning information and dimension information of the view area of the process drawing template with the highest similarity, to determine whether there is a difference.
Furthermore, the drawing similarity detection compares the drawing file to be compared with a process drawing template through an active window: the similarity of the active window at each position is calculated by comparing the pixel values inside the window at the same position in both images, the window is traversed over the whole drawing to obtain the overall similarity between the drawing file and the template, and the process drawing template with the highest similarity is selected as the most similar one.
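A minimal sketch of this active-window (sliding-window) comparison follows. The source does not say how pixel values are compared or how window scores are combined, so the sketch assumes equally sized greyscale images stored as nested lists, non-overlapping windows, a per-window score equal to the fraction of matching pixels, and the mean of the window scores as the overall similarity. All function names are hypothetical.

```python
def window_similarity(a, b, top, left, size, tol=0):
    """Fraction of pixels in one size x size window that agree within `tol`."""
    matches = 0
    for r in range(top, top + size):
        for c in range(left, left + size):
            if abs(a[r][c] - b[r][c]) <= tol:
                matches += 1
    return matches / (size * size)


def drawing_similarity(a, b, size=2, tol=0):
    """Mean window similarity over all non-overlapping windows at matching positions.

    Assumes both images have the same shape and are at least `size` pixels
    wide and high.
    """
    rows, cols = len(a), len(a[0])
    scores = [
        window_similarity(a, b, top, left, size, tol)
        for top in range(0, rows - size + 1, size)
        for left in range(0, cols - size + 1, size)
    ]
    return sum(scores) / len(scores)


def most_similar_template(scan, templates, size=2, tol=0):
    """Return (name, score) for the template most similar to `scan`."""
    return max(
        ((name, drawing_similarity(scan, img, size, tol)) for name, img in templates.items()),
        key=lambda pair: pair[1],
    )
```

The equal-size assumption is why the scan and the templates would need to be brought to a common resolution during preprocessing before this step.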
Furthermore, the drawing similarity detection further includes:
performing feature recognition on the drawing file to be compared to obtain a feature map of the drawing file;
performing character recognition on the feature map of the drawing file to enhance the character patterns of the drawing file to be compared.
Furthermore, the LBP algorithm is used to perform feature recognition on the drawing file to be compared.
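For reference, the basic local binary pattern (LBP) operator can be sketched as below. This is the standard fixed 3x3 variant, not the adaptive-window variant the patent goes on to describe; the bit ordering and the function name are choices made for this illustration.

```python
def lbp_3x3(img):
    """Return the basic LBP code for every interior pixel of a 2-D grey-level list.

    Each of the 8 neighbours contributes one bit: 1 if the neighbour is at
    least as bright as the centre pixel, 0 otherwise.
    """
    # Neighbour offsets (row, column), clockwise from the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    rows, cols = len(img), len(img[0])
    codes = [[0] * (cols - 2) for _ in range(rows - 2)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            centre = img[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                if img[r + dr][c + dc] >= centre:
                    code |= 1 << bit
            codes[r - 1][c - 1] = code
    return codes
```

The resulting code map is the kind of feature map that later steps operate on; border pixels are left out because they lack a full neighbourhood.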
Furthermore, the window size of the LBP algorithm is determined by the following steps:
setting and initializing a parameter k;
setting the active window of each pixel according to the parameter k, and setting the formula for the average pixel intensity of each pixel's active window;
for each pixel, calculating the average pixel intensity difference between two non-overlapping windows of that pixel in the horizontal direction and in the vertical direction;
for each pixel, obtaining the value of k that maximizes either the horizontal or the vertical average pixel intensity difference between the two non-overlapping windows;
determining the size of the active window from the k value found for each pixel, and using it as the window size of the LBP algorithm.
Furthermore, in the expression for the average pixel intensity value, x is the horizontal coordinate of the pixel, y is its vertical coordinate, $A_k(x,y)$ is the average pixel intensity of pixel $(x,y)$ within one active window, and $g(i,j)$ is the value of the pixel at coordinates $(i,j)$ in the image.
Furthermore, the average pixel intensity difference $E_{k,h}(x,y)$ between two non-overlapping windows in the horizontal direction is:
$$E_{k,h}(x,y) = \left|A_k(x+2^{k-1},y) - A_k(x-2^{k-1},y)\right|$$
The average pixel intensity difference $E_{k,v}(x,y)$ between two non-overlapping windows in the vertical direction is:
$$E_{k,v}(x,y) = \left|A_k(x,y+2^{k-1}) - A_k(x,y-2^{k-1})\right|$$
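The window-size search can be sketched as follows. The expression for $A_k$ is not reproduced in this text, so the sketch assumes the form used in Tamura's coarseness measure, which the two difference formulas match: $A_k(x,y)$ is the mean of $g(i,j)$ over the $2^k \times 2^k$ window around $(x,y)$. Skipping pixels that fall outside the image, the upper bound `k_max`, breaking ties in favour of the smallest k, and the function names are further assumptions.

```python
def window_mean(img, x, y, k):
    """A_k(x, y): mean grey level of the 2^k x 2^k window around (x, y).

    x is the column index and y the row index. Pixels outside the image are
    skipped (an assumption; the source does not describe border handling).
    """
    half = 2 ** (k - 1)
    total, count = 0, 0
    for j in range(y - half, y + half):
        for i in range(x - half, x + half):
            if 0 <= j < len(img) and 0 <= i < len(img[0]):
                total += img[j][i]
                count += 1
    return total / count if count else 0.0


def best_k(img, x, y, k_max=3):
    """Return the k in 1..k_max that maximises max(E_k,h, E_k,v) at pixel (x, y)."""
    best, best_e = 1, -1.0
    for k in range(1, k_max + 1):
        half = 2 ** (k - 1)
        e_h = abs(window_mean(img, x + half, y, k) - window_mean(img, x - half, y, k))
        e_v = abs(window_mean(img, x, y + half, k) - window_mean(img, x, y - half, k))
        e = max(e_h, e_v)
        if e > best_e:  # strict comparison keeps the smallest k on ties
            best, best_e = k, e
    return best
```

Under these assumptions, the LBP window side length for a pixel would be `2 ** best_k(img, x, y)`, so regions with fine detail get small windows and coarse regions get larger ones.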
Furthermore, an optical character recognition (OCR) method is used to perform character recognition on the feature map of the drawing file and to enhance the character patterns. The OCR method preprocesses the feature map, loads it into a pre-trained character recognition engine for character recognition, and finally applies image enhancement to the character recognition result.
Furthermore, the character recognition engine is the open-source OCR engine Tesseract.
Furthermore, the preprocessing includes binarization, image sharpening, and denoising.
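The three preprocessing operations can be illustrated with a small dependency-free sketch. The source names the operations but not the filters, parameters, or order, so the 3x3 median filter, the 4-neighbour Laplacian sharpening kernel, the fixed threshold of 128, and the denoise, sharpen, binarize order are all assumptions made for illustration.

```python
def median_denoise(img):
    """3x3 median filter; border pixels are copied unchanged."""
    rows, cols = len(img), len(img[0])
    out = [row[:] for row in img]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            window = sorted(img[r + dr][c + dc] for dr in (-1, 0, 1) for dc in (-1, 0, 1))
            out[r][c] = window[4]  # middle of the 9 sorted values
    return out


def sharpen(img):
    """4-neighbour Laplacian sharpening, clipped to 0..255; borders unchanged."""
    rows, cols = len(img), len(img[0])
    out = [row[:] for row in img]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            v = 5 * img[r][c] - img[r - 1][c] - img[r + 1][c] - img[r][c - 1] - img[r][c + 1]
            out[r][c] = max(0, min(255, v))
    return out


def binarize(img, threshold=128):
    """Map every pixel to 0 (ink) or 255 (paper) using a fixed threshold."""
    return [[255 if v >= threshold else 0 for v in row] for row in img]


def preprocess(img, threshold=128):
    """Denoise, sharpen, then binarize (the order is an assumption)."""
    return binarize(sharpen(median_denoise(img)), threshold)
```

In practice an image-processing library would replace these loops; the sketch only shows what each named operation does to the pixel grid before the image is handed to the OCR engine.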
Furthermore, the scanning includes using a scanner to scan the paper drawing and recognize it as an electronic drawing.
Furthermore, the drawing comparison method also includes establishing a high-voltage cable accessory process library for storing the process drawing templates and their label information.
The present invention also provides a drawing comparison system based on a high-voltage cable accessory process library, comprising:
a high-voltage cable accessory process library module, configured to scan the process drawings of high-voltage cables, preprocess them, use them as process drawing templates, attach label information to each process drawing, and establish a high-voltage cable accessory process library for storing the process drawing templates and their label information;
a scanning module for the drawing file to be compared, configured to scan and preprocess the process drawing of the high-voltage cable to be compared to obtain the drawing file to be compared;
a similar drawing detection module, configured to perform drawing similarity detection between the drawing file to be compared and each process drawing template to obtain the process drawing template with the highest similarity;
a drawing comparison module, configured to segment a text area and a view area from the drawing file to be compared, locate the text area, perform text detection and recognition on it, and compare the text recognition result with the text information of the process drawing template with the highest similarity to determine whether there is a difference;
to locate the annotated dimensions in the view area, perform character detection and recognition on the located dimensions, and compare the dimension recognition result with the annotated dimension information of the process drawing template with the highest similarity to determine whether there is a difference;
and, if a difference is found, to perform data tracing and mark the position of the difference in the drawing file to be compared, or otherwise to output the result that the drawing file to be compared is similar to the process drawing template.
Furthermore, a text line detection algorithm is used to perform the text detection;
a target detection algorithm based on a deep convolutional neural network is used to segment the text area and the view area from the drawing file to be compared;
a partial binary tree classification method is used to compare the dimension positioning result and dimension recognition result of the view area with the dimension positioning information and dimension information of the view area of the process drawing template with the highest similarity, to determine whether there is a difference.
Furthermore, the drawing similarity detection compares the drawing file to be compared with a process drawing template through an active window: the similarity of the active window at each position is calculated by comparing the pixel values inside the window at the same position in both images, the window is traversed over the whole drawing to obtain the overall similarity between the drawing file and the template, and the process drawing template with the highest similarity is selected as the most similar one.
Furthermore, the drawing similarity detection further includes:
performing feature recognition on the drawing file to be compared to obtain a feature map of the drawing file;
performing character recognition on the feature map of the drawing file and enhancing the character patterns to obtain a character-enhanced map of the drawing file to be compared; comparing the character-enhanced map with the corresponding process drawing template, judging their consistency, and outputting the drawing comparison result for the drawing file to be compared.
Furthermore, the LBP algorithm is used to perform feature recognition on the drawing file to be compared;
the window size of the LBP algorithm is determined by the following steps:
setting and initializing a parameter k;
setting the active window of each pixel according to the parameter k, and setting the formula for the average pixel intensity of each pixel's active window;
for each pixel, calculating the average pixel intensity difference between two non-overlapping windows of that pixel in the horizontal direction and in the vertical direction;
for each pixel, obtaining the value of k that maximizes either the horizontal or the vertical average pixel intensity difference between the two non-overlapping windows;
determining the size of the active window from the k value found for each pixel, and using it as the window size of the LBP algorithm.
In the expression for the average pixel intensity value, x is the horizontal coordinate of the pixel, y is its vertical coordinate, $A_k(x,y)$ is the average pixel intensity of pixel $(x,y)$ within one active window, and $g(i,j)$ is the value of the pixel at coordinates $(i,j)$ in the image;
the average pixel intensity difference $E_{k,h}(x,y)$ between two non-overlapping windows in the horizontal direction is:
$$E_{k,h}(x,y) = \left|A_k(x+2^{k-1},y) - A_k(x-2^{k-1},y)\right|$$
the average pixel intensity difference $E_{k,v}(x,y)$ between two non-overlapping windows in the vertical direction is:
$$E_{k,v}(x,y) = \left|A_k(x,y+2^{k-1}) - A_k(x,y-2^{k-1})\right|$$
Furthermore, an optical character recognition (OCR) method is used to perform character recognition on the feature map of the drawing file and to enhance the character patterns. The OCR method preprocesses the feature map, loads it into a pre-trained character recognition engine for character recognition, and finally applies image enhancement to the character recognition result;
the character recognition engine is the open-source OCR engine Tesseract.
Furthermore, the preprocessing includes binarization, image sharpening, and denoising.
Furthermore, the scanning includes using a scanner to scan the paper drawing and recognize it as an electronic drawing.
Compared with the prior art, the present invention has the following advantages:
(1) After the process drawing templates are constructed, the invention first calculates the similarity between the drawing file to be compared and each template and ranks the results to obtain the process drawing template with the highest similarity. The drawing is then segmented into a text area and a view area. The text area is located and its text is detected and recognized; the annotated dimensions in the view area are located and their characters are detected and recognized. The recognized dimension information is compared, as data, with the information in the database to judge the consistency of the drawings. If a difference appears, the data is traced back to locate the point of difference in the drawing.
This scheme does not require a large data set for training. It finds the most similar process drawing template by similarity judgment and then detects and recognizes the text area and the view area separately, so it can pinpoint the points of difference and improve the accuracy of the comparison.
(2) When applying the LBP algorithm to feature recognition of high-voltage cable accessory drawings, the inventors found that the existing LBP algorithm has limitations on complex drawings of this kind. The invention therefore proposes an adaptive-threshold scheme for selecting the active window: it finds the window size that maximizes the average pixel intensity difference between non-overlapping windows in the horizontal and vertical directions. Combined with the LBP algorithm, this reduces the error of LBP in extracting primitive features.
(3) The invention builds a high-voltage cable accessory process library that stores the process drawing templates, and provides standardized technical protocols together with a review and verification process for process drawings.
(4) Compared with traditional drawings, digitized process drawings can use image denoising and enhancement to improve clarity and display quality, making text and patterns clearer and presenting the design drawings better to staff.
(5) Compared with traditional manual review and management, a digitized process library is more convenient, secure, and reliable to manage. It reduces the work of manually leafing through paper documents, compiling statistics, and managing stored materials, which improves the efficiency of managing and reviewing process drawings and lowers costs. Over long-term use, the costs of production operation and maintenance and of manual management fall substantially, creating better economic returns for the company.
(6) The method enables dynamic supervision of high-voltage cable accessories throughout the whole process, keeps the process library standardized in real time, and substantially reduces the risk costs and safety hazards caused by non-standard process drawings.
(7) Comprehensive review of all high-voltage cable accessories greatly improves work efficiency. The review dimensions are comprehensive and systematic and are backed by technical support and logical design, making the review process simple, efficient, and reliable.
Brief Description of the Drawings
FIG. 1 is a schematic flow chart of the drawing comparison process of a drawing comparison method based on a high-voltage cable accessory process library provided in an embodiment of the present invention;
FIG. 2 is a schematic flow chart of comparing dimension recognition results provided in an embodiment of the present invention;
FIG. 3 shows a recognition result for a drawing file to be compared provided in this embodiment.
Detailed Description
To make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, not all, of the embodiments of the present invention. The components of the embodiments described and shown in the drawings can generally be arranged and designed in a variety of configurations.
Therefore, the following detailed description of the embodiments provided in the accompanying drawings is not intended to limit the scope of the claimed invention, but merely represents selected embodiments. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
Note that similar reference numerals and letters denote similar items in the following drawings, so once an item is defined in one drawing it need not be defined or explained again in subsequent drawings.
In the description of the present invention, terms such as "center", "up", "down", "left", "right", "vertical", "horizontal", "inside", and "outside" indicate orientations or positional relationships based on those shown in the drawings, or on how the product of the invention is usually placed in use. They are used only to simplify the description and do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation, so they should not be understood as limiting the invention.
The terms "first" and "second" are used for description only and should not be understood as indicating or implying relative importance or the number of the technical features referred to. A feature qualified by "first" or "second" may therefore explicitly or implicitly include one or more of that feature. In the description of this application, "a plurality of" means two or more, unless otherwise specifically defined.
In addition, terms such as "horizontal" and "vertical" do not require a component to be perfectly horizontal or vertical; it may be slightly inclined. "Horizontal" means only that the direction is closer to horizontal than "vertical" is, not that the structure must be exactly horizontal.
Embodiment 1
As shown in FIG. 1, this embodiment provides a drawing comparison method based on a high-voltage cable accessory process library, comprising the following steps:
scanning the process drawings of high-voltage cables, preprocessing them, using them as process drawing templates, and attaching label information to each process drawing;
scanning and preprocessing the process drawing of the high-voltage cable to be compared to obtain the drawing file to be compared;
performing drawing similarity detection between the drawing file to be compared and each process drawing template to obtain the process drawing template with the highest similarity;
segmenting a text area and a view area from the drawing file to be compared, locating the text area, performing text detection and recognition on it, and comparing the text recognition result with the text information of the process drawing template with the highest similarity to determine whether there is a difference;
locating the annotated dimensions in the view area, performing character detection and recognition on the located dimensions, and comparing the dimension recognition result with the annotated dimension information of the process drawing template with the highest similarity to determine whether there is a difference;
if a difference is found, performing data tracing and marking the position of the difference in the drawing file to be compared; otherwise, outputting the result that the drawing file to be compared is similar to the process drawing template.
The drawing comparison process, the drawing similarity detection process and the other steps are described separately below.
1. Drawing comparison process
This embodiment uses a text line detection algorithm for text detection;
a segmentation deep learning method is used to segment the text area and the view area from the drawing file to be compared;
a partial binary tree classification method is used to compare the dimension localization results and dimension recognition results of the view area with the dimension localization information and dimension information of the view area of the process drawing template with the highest similarity, to determine whether there is a difference.
HALCON machine vision software is used. HALCON is a comprehensive, standard machine vision algorithm package developed by MVTec (Germany), with a widely used integrated development environment for machine vision. HALCON supports Windows, Linux and Mac OS X, and the whole function library can be accessed from common programming languages such as C, C++, C#, Visual Basic and Delphi.
The principles of the segmentation deep learning method and the partial binary tree classification method are described below.
Segmentation deep learning method: an object detection algorithm based on a deep convolutional neural network is used to segment the drawing, dividing the drawing view into multiple sub-views. The detector is regression-based, which makes detection much faster than with detectors built on other approaches.
The detection algorithm treats object detection as a regression problem: the network takes the raw pixels as input and directly outputs the bounding-box regression and the class of each object. For feature extraction, the algorithm is trained on the whole image, which speeds it up and also helps it separate objects from background. For prediction, detection is end to end: the whole image is divided into S*S grid cells, and several prior boxes, preset by the network, are placed at the center of each cell. The network predicts whether each box contains an object and, if so, its class. The center of the predicted box is then obtained and the height of the prior box is calculated, which gives the position of the whole predicted box. Once the predictions are available, they are sorted by score and filtered by non-maximum suppression: for each class, the boxes whose scores exceed a set threshold are taken, non-maximum suppression is applied using the box positions and scores, and the remaining boxes form the final result.
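The score filtering and non-maximum suppression step described above can be sketched as follows. This is a minimal, dependency-free illustration and not the patent's implementation: the box format `(x1, y1, x2, y2)`, the function names, and the thresholds `score_thresh` and `iou_thresh` are all assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def nms_per_class(detections, score_thresh=0.5, iou_thresh=0.45):
    """Filter detections class by class.

    detections: list of (box, score, label) tuples (hypothetical format).
    Returns the kept detections, grouped by label in sorted label order.
    """
    kept = []
    for label in sorted({d[2] for d in detections}):
        # Keep only boxes of this class above the score threshold,
        # sorted from highest to lowest score.
        candidates = sorted(
            (d for d in detections if d[2] == label and d[1] > score_thresh),
            key=lambda d: d[1],
            reverse=True,
        )
        while candidates:
            best = candidates.pop(0)
            kept.append(best)
            # Drop every remaining box that overlaps the kept box too much.
            candidates = [d for d in candidates
                          if iou(best[0], d[0]) <= iou_thresh]
    return kept
```

With two heavily overlapping "text" boxes and one "view" box, only the higher-scoring "text" box and the "view" box survive.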
Partial binary tree classification method: this method is derived from the idea of a partial binary tree. It takes the characteristic parameters of the input array samples, such as array length, and derives a similarity measure between the features of the arrays. Multiple classifiers are then built according to the magnitude of the similarity measure, which makes it possible to compare and classify the data.
The specific implementation process is as follows:
S1: preprocess the drawing;
S2: perform drawing similarity detection to obtain the process drawing template with the highest similarity;
S3: perform image segmentation to separate the regions and generate new images, namely a text area image and a view area image;
S4: locate the dimension annotations in the view area image;
S5: recognize the dimension annotations using the Text segmentation and recognition method;
S6: compare the recognized dimension array with the corresponding dimension array of the process drawing template, as shown in FIG. 2. Specifically, the partial binary tree classification method first checks whether the number of data items is the same; if not, there is a difference. Otherwise it checks whether the data values are the same; if not, there is a difference;
S7: the position coordinates of each dimension are obtained in advance during dimension localization and recognition; if there is a difference, the coordinates are returned and a conspicuous rectangular box is generated to mark the difference;
S8: if both the number and the values of the data are the same, a data difference comparison is performed and any differences are marked, as shown in FIG. 3, where the box in the upper right corner is the marked difference area.
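The two-stage check in S6 and the coordinate return in S7 can be sketched as below. The data layout (a list of `(value, box)` pairs in matching order), the function name and the status strings are assumptions made for illustration; the patent does not specify them. What to mark on a count mismatch is also an assumption: here all candidate boxes are returned for review.

```python
def compare_dimensions(template_dims, candidate_dims):
    """Compare two dimension arrays, count first and values second.

    Each argument is a list of (value, box) pairs, where value is the
    recognized dimension text and box is (x, y, width, height).
    Both lists are assumed to be in the same order (hypothetical).
    Returns (status, boxes_to_mark).
    """
    # First branch: the number of dimensions must agree.
    if len(template_dims) != len(candidate_dims):
        return "count_mismatch", [box for _, box in candidate_dims]

    # Second branch: compare values pairwise and collect the positions
    # of the candidate dimensions that differ from the template.
    differing = [
        cand_box
        for (tmpl_value, _), (cand_value, cand_box)
        in zip(template_dims, candidate_dims)
        if tmpl_value != cand_value
    ]
    if differing:
        return "value_mismatch", differing
    return "match", []
```

The returned boxes are the coordinates a caller would use to draw the rectangular marks described in S7.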
2. Drawing similarity detection process
Drawing similarity detection compares the drawing file to be compared with a process drawing template through an active (sliding) window. The similarity of the window at a given position is calculated by comparing the pixel values inside the windows at the same position in both images. The window is moved across the whole drawing to obtain the overall similarity between the drawing file to be compared and the process drawing template, and the template with the highest similarity is selected as the most similar process drawing template.
Preferably, for the most similar process drawing template, the window positions whose similarity is below a similarity threshold are selected, and a marker box is added at those positions in the drawing file to be compared. The marker boxes alert the reviewers, who can then work efficiently from the comparison report.
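A minimal sketch of the window-based comparison and of flagging low-similarity windows is given below. The patent does not state the similarity measure, so several choices here are assumptions: images are equally sized lists of pixel rows, window similarity is the fraction of identical pixels, overall similarity is the mean over windows, and the names `window_similarities`, `best_template`, `win` and `sim_thresh` are hypothetical.

```python
def window_similarities(candidate, template, win=32):
    """Return {(top, left): similarity} for non-overlapping windows."""
    if (len(candidate) != len(template)
            or len(candidate[0]) != len(template[0])):
        raise ValueError("images must have the same size")
    height, width = len(candidate), len(candidate[0])
    sims = {}
    for top in range(0, height, win):
        for left in range(0, width, win):
            total = same = 0
            for r in range(top, min(top + win, height)):
                for c in range(left, min(left + win, width)):
                    total += 1
                    same += candidate[r][c] == template[r][c]
            sims[(top, left)] = same / total
    return sims


def best_template(candidate, templates, win=32, sim_thresh=0.9):
    """Pick the most similar template and flag low-similarity windows.

    templates: dict mapping a template name to its image.
    Returns (name, overall_similarity, flagged_window_positions).
    """
    if not templates:
        raise ValueError("at least one template is required")
    best = None
    for name, template in templates.items():
        sims = window_similarities(candidate, template, win)
        score = sum(sims.values()) / len(sims)
        if best is None or score > best[1]:
            best = (name, score, sims)
    name, score, sims = best
    flagged = sorted(pos for pos, s in sims.items() if s < sim_thresh)
    return name, score, flagged
```

The flagged positions are where marker boxes would be drawn on the drawing file to be compared.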
Preferably, the drawing similarity detection further includes:
performing feature recognition on the drawing file to be compared to obtain a feature map of the drawing file;
performing character recognition on the feature map, enhancing the character patterns, and obtaining a character-enhanced map for use in drawing comparison.
Specifically, this embodiment uses image recognition technology, OCR character recognition technology and a consistency detection algorithm. High-definition scanning equipment, using photoelectric and digital processing technology, scans process drawings such as high-voltage cable drawings and technical agreements into the process library. Feature extraction and character recognition are performed on the electronic process drawings or process documents, which are then compared with the process drawing templates fixed in the standard process library. Finally, the system evaluates the comparison result according to the algorithm and generates a comparison report for subsequent review. In addition, modifications made to high-voltage cable accessory process drawings or documents from different manufacturers are analyzed, and the system flags the deleted, added and modified parts of a drawing. This simplifies the reviewers' work of comparing and checking drawings, reduces errors and omissions, and improves efficiency.
The additional steps of drawing similarity detection are described in detail below.
2.1. Feature recognition based on image recognition technology
The image recognition technology in the digital process library of high-voltage cable accessories mainly involves feature extraction and edge detection algorithms such as the LBP operator and the HOG operator. In this embodiment, the image recognition pipeline comprises image preprocessing (denoising and enhancement), image restoration (reconstructing and recovering the image), image coding and compression, image segmentation (dividing the image into regions with different features) and the final recognition.
The LBP operator, proposed by Ojala et al. in 1996, is a classic feature descriptor that is widely used in image analysis. It captures rich detail while compressing redundant information. However, when the radius of the LBP operator is too large, it becomes more sensitive to noise. The histogram of oriented gradients (HOG) was proposed by Dalal and Triggs at INRIA in France. The HOG algorithm computes gradients on a grayscale, normalized image, accumulates the gradient statistics, and divides the image into small cells, forming a HOG feature that is specific to each drawing and can be used for subsequent drawing comparison.
Because process drawings of high-voltage cables are inherently complex, methods based on contrast, color or density distribution all have limitations. To obtain better feature extraction and classification results, the existing LBP algorithm was studied in depth. To address the shortcomings of the original algorithm, the global and local mean differences of pixel gray levels are used to determine the adaptive threshold, which makes the algorithm strongly adaptive for drawing recognition. The result is an LBP algorithm with an adaptive threshold.
The adaptive LBP algorithm used in this embodiment combines a window size with the basic LBP algorithm, so that features are analyzed adaptively. The window size is determined by the average intensity differences in the horizontal and vertical directions.
Let the image be $g(x, y)$. Calculate the average pixel intensity in an active window of size $(2 \cdot 2^{k}) \times (2 \cdot 2^{k})$:
where $x$ is the horizontal coordinate of the pixel, $y$ is its vertical coordinate, $A_k(x, y)$ is the average pixel intensity of the active window at pixel $(x, y)$, and $g(i, j)$ is the value of the pixel at coordinates $(i, j)$ in the image;
for each pixel, calculate the difference in average pixel intensity between non-overlapping windows in the horizontal and vertical directions:
where $E_{k,h}(x, y)$ is the difference in average pixel intensity between two non-overlapping windows in the horizontal direction, and $E_{k,v}(x, y)$ is the corresponding difference between two non-overlapping windows in the vertical direction;
for each pixel, the value of $k$ that maximizes $E_{k,h}(x, y)$ or $E_{k,v}(x, y)$, whichever direction it occurs in, sets the best size: $S_{\mathrm{best}}(x, y) = (2 \cdot 2^{k}) \times (2 \cdot 2^{k})$;
It follows that $S_{\mathrm{best}}(x, y)$ is the approximate size of the feature primitive at the pixel with coordinates $(x, y)$. Combining this size with the LBP algorithm reduces the error of LBP in extracting primitive features.
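The formulas for the window mean and the intensity differences are not reproduced in this text, so the following sketch rests on assumptions rather than on the patent's exact definitions. It assumes that the window has half-width 2^k around its center, that the two non-overlapping windows are centered 2^k to either side of the pixel, that windows are clipped at the image border, and that ties are resolved in favor of the smaller k. The function names and the `k_max` parameter are hypothetical.

```python
def window_mean(img, x, y, k):
    """Mean of img over the (2*2**k) x (2*2**k) window centered at (x, y).

    img is a list of rows, indexed img[y][x]. The window is clipped at
    the image border; None is returned if nothing of it lies inside.
    """
    half = 2 ** k
    height, width = len(img), len(img[0])
    rows = range(max(0, y - half), min(height, y + half))
    cols = range(max(0, x - half), min(width, x + half))
    values = [img[j][i] for j in rows for i in cols]
    return sum(values) / len(values) if values else None


def _abs_diff(a, b):
    # A window that fell entirely outside the image contributes no difference.
    return 0.0 if a is None or b is None else abs(a - b)


def best_window_side(img, x, y, k_max=3):
    """Side length 2*2**k of the window whose k maximizes max(E_h, E_v)."""
    best_k, best_e = 0, -1.0
    for k in range(k_max + 1):
        half = 2 ** k
        # Assumed E_{k,h}: windows immediately right and left of (x, y).
        e_h = _abs_diff(window_mean(img, x + half, y, k),
                        window_mean(img, x - half, y, k))
        # Assumed E_{k,v}: windows immediately below and above (x, y).
        e_v = _abs_diff(window_mean(img, x, y + half, k),
                        window_mean(img, x, y - half, k))
        e = max(e_h, e_v)
        if e > best_e:
            best_k, best_e = k, e
    return 2 * 2 ** best_k
```

On an image with a single vertical edge, a pixel right on the edge gets the smallest window, while a pixel two columns away needs a larger window before the edge affects the difference.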
2.2. OCR character recognition
Recognizing the characters in an image is a complete pipeline comprising image analysis, preprocessing, character recognition and recognition correction, and every step affects the accuracy of the final result. For example, the clearer the image to be recognized (that is, the better the preprocessing), the better the recognition tends to be. Character recognition is the most important step in this pipeline. At present, the most widely used and most mature character recognition technology is optical character recognition (OCR). OCR targets printed characters: the text of a paper document is converted optically into a black-and-white bitmap image file, and recognition software then converts the text in the image into a text format that word processing software can edit further.
The OCR recognition process is divided into four main parts:
1) Image preprocessing. This module normalizes the size of the sample images and applies segmentation, grayscale conversion and binarization to prepare them for character recognition.
2) Training the character library. Tesseract is trained specifically on the characters in the sample images to improve recognition accuracy.
3) Character recognition. The open-source OCR engine Tesseract is used to recognize the characters in the image. To recognize the characters in one image, the system only needs to call the image_to_string method of the pytesseract library, as follows:
text = pytesseract.image_to_string(img, lang=LANG, config='--psm 7 --oem 3')
Here, text is the character content returned by recognition; LANG is the self-trained character library or a language pack shipped with Tesseract; img is the preprocessed image. In the config string, --psm 7 tells Tesseract to treat the image as a single line of text, and --oem 3 selects the default OCR engine mode.
4) Recognition correction. Image characters that were rejected or misrecognized are corrected. For grayscale images, the gray levels can be adjusted, that is, the contrast can be enhanced. For one of the grayscale images tested, recognition was rejected before enhancement and correct after enhancement.
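The text does not say which grayscale adjustment is used in the recognition correction step. As one possible illustration, the sketch below applies linear min-max contrast stretching to a grayscale image stored as a list of rows; the function name and the output range are assumptions.

```python
def stretch_contrast(gray, out_min=0, out_max=255):
    """Linearly rescale pixel values so they span [out_min, out_max]."""
    lo = min(min(row) for row in gray)
    hi = max(max(row) for row in gray)
    if hi == lo:
        # A flat image has no contrast to stretch; return a copy unchanged.
        return [row[:] for row in gray]
    scale = (out_max - out_min) / (hi - lo)
    return [[round(out_min + (p - lo) * scale) for p in row] for row in gray]
```

An image that was rejected on the first pass could be stretched this way and submitted to the OCR engine again.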
3. Other steps
3.1. Preprocessing and scanning process
Preprocessing includes binarization, image sharpening and denoising.
Scanning means using a scanner to scan the drawing and convert it into an electronic drawing.
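The three preprocessing operations can be sketched with plain Python lists as below. The order (denoise, then sharpen, then binarize), the 3x3 median filter, the sharpening kernel and the fixed threshold of 128 are all assumptions chosen for illustration; the patent only names the three operations.

```python
def _px(img, r, c):
    """Pixel lookup with coordinates clamped to the image border."""
    r = min(max(r, 0), len(img) - 1)
    c = min(max(c, 0), len(img[0]) - 1)
    return img[r][c]


def median_denoise(img):
    """3x3 median filter, which removes isolated specks."""
    return [
        [sorted(_px(img, r + dr, c + dc)
                for dr in (-1, 0, 1) for dc in (-1, 0, 1))[4]
         for c in range(len(img[0]))]
        for r in range(len(img))
    ]


def sharpen(img):
    """Sharpen with the kernel [[0,-1,0],[-1,5,-1],[0,-1,0]], clamped to 0..255."""
    out = []
    for r in range(len(img)):
        row = []
        for c in range(len(img[0])):
            v = (5 * _px(img, r, c)
                 - _px(img, r - 1, c) - _px(img, r + 1, c)
                 - _px(img, r, c - 1) - _px(img, r, c + 1))
            row.append(min(255, max(0, v)))
        out.append(row)
    return out


def binarize(img, thresh=128):
    """Map each pixel to 255 (paper) or 0 (ink) using a fixed threshold."""
    return [[255 if p >= thresh else 0 for p in row] for row in img]


def preprocess(img, thresh=128):
    """Denoise, sharpen, then binarize a grayscale image (list of rows)."""
    return binarize(sharpen(median_denoise(img)), thresh)
```

A single dark speck on a light page is removed, while a clean black-to-white edge passes through unchanged.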
3.2. High-voltage cable accessory process library
The high-voltage cable accessory process library module uses a scanner, together with image denoising and enhancement, to convert all existing standard cable technical agreements, process drawings and related documents into high-definition electronic process drawings. These are entered into the high-voltage cable accessory process library for storage and fixed as process library templates. Each process drawing carries important information such as its label, type, time and version.
This module also supports fast query and retrieval using key information in the process library, such as number, name and time.
The day-to-day production work of a cable company involves a large amount of data and drawings. With the retrieved information, staff can find the relevant process plan more quickly, which keeps the work running smoothly, improves efficiency and keeps process work on schedule. Image recognition technology is applied to retrieve and store high-voltage cable accessory process drawings effectively, enabling fully networked, information-based management. All archive resources are shared and a common database is established, so that drawing resources are managed comprehensively, stored uniformly and used in a standardized way, which highlights the value of the digital process library for high-voltage cables.
3.3. Comparison report
After the digital process drawings have been compared, the report presents the result of comparing the drawing under examination with the standard drawing and uses marker boxes to alert the reviewers, who can then work efficiently from the comparison report. The drawing comparison review report enables dynamic supervision of the whole process and helps improve both the drawing review service and design quality. Review information can be shared dynamically with the safety supervision department, avoiding the safety hazards caused by invalid construction drawings.
Example 2
This embodiment provides a drawing comparison system based on a high-voltage cable accessory process library. To control the standard processes of high-voltage cable accessories and to ensure the safe and stable operation of cables in service, control starts at the source, when the technical agreement is signed. A standard high-voltage cable accessory process library is established, and the simulated installation of the first-of-its-kind cable accessory process is used to qualify and fix the process. For processes covered by later technical agreements, the drawings are compared for compliance and consistency.
This embodiment mainly studies how a digital process library system for high-voltage cable accessories can find changes and differences in drawings through image recognition, text recognition and drawing comparison, improving work efficiency to some extent and bringing better economic and social benefits.
Specifically, the drawing comparison system based on the high-voltage cable accessory process library includes:
a high-voltage cable accessory process library module, used to scan the process drawings of high-voltage cables, preprocess them, use them as process drawing templates and attach label information to each process drawing, and to establish a high-voltage cable accessory process library that stores the process drawing templates and their label information;
a scanning module for drawing files to be compared, used to scan and preprocess the process drawing of the high-voltage cable to be compared and obtain the drawing file to be compared;
a similar drawing detection module, used to perform drawing similarity detection between the drawing file to be compared and each process drawing template and obtain the process drawing template with the highest similarity;
a drawing comparison module, used to segment a text area and a view area from the drawing file to be compared, perform localization, text detection and recognition on the text area, and compare the text recognition result of the text area with the text information of the process drawing template with the highest similarity to determine whether there is a difference;
to locate the dimension annotations in the view area, perform character detection and recognition on the located dimension annotations, and compare the dimension recognition result of the view area with the dimension annotation information of the process drawing template with the highest similarity to determine whether there is a difference;
and, if a difference is found, to trace the data back to its source and mark the position of the difference in the drawing file to be compared, or otherwise to output the result that the drawing file to be compared is similar to the process drawing template.
In the high-voltage cable accessory process library module, preprocessing includes binarization, image sharpening and denoising. Scanning means using a scanner to scan the drawing and convert it into an electronic drawing.
The drawing comparison process, the drawing similarity detection process and the other steps are described separately below.
1. Drawing comparison process
This embodiment uses a text line detection algorithm for said text detection;
a segmentation deep learning method is used to segment the text area and the view area from said drawing file to be compared;
a partial binary tree classification method is used to compare the dimension localization results and dimension recognition results of said view area with the dimension localization information and dimension information of the view area of said process drawing template with the highest similarity, to determine whether there is a difference.
HALCON machine vision software is used. HALCON is a comprehensive, standard machine vision algorithm package developed by MVTec (Germany), with a widely used integrated development environment for machine vision. HALCON supports Windows, Linux and Mac OS X, and the whole function library can be accessed from common programming languages such as C, C++, C#, Visual Basic and Delphi.
The principles of the segmentation deep learning method and the partial binary tree classification method are described below.
Segmentation deep learning method: an object detection algorithm based on a deep convolutional neural network is used to segment the drawing, dividing the drawing view into multiple sub-views. The detector is regression-based, which makes detection much faster than with detectors built on other approaches.
The detection algorithm treats object detection as a regression problem: the network takes the raw pixels as input and directly outputs the bounding-box regression and the class of each object. For feature extraction, the algorithm is trained on the whole image, which speeds it up and also helps it separate objects from background. For prediction, detection is end to end: the whole image is divided into S*S grid cells, and several prior boxes, preset by the network, are placed at the center of each cell. The network predicts whether each box contains an object and, if so, its class. The center of the predicted box is then obtained and the height of the prior box is calculated, which gives the position of the whole predicted box. Once the predictions are available, they are sorted by score and filtered by non-maximum suppression: for each class, the boxes whose scores exceed a set threshold are taken, non-maximum suppression is applied using the box positions and scores, and the remaining boxes form the final result.
Partial binary tree classification method: this method is derived from the idea of a partial binary tree. It takes the characteristic parameters of the input array samples, such as array length, and derives a similarity measure between the features of the arrays. Multiple classifiers are then built according to the magnitude of the similarity measure, which makes it possible to compare and classify the data.
The specific implementation process is as follows:
S1: preprocess the drawing;
S2: perform drawing similarity detection to obtain the process drawing template with the highest similarity;
S3: perform image segmentation to separate the regions and generate new images, namely a text area image and a view area image;
S4: locate the dimension annotations in the view area image;
S5: recognize the dimension annotations using the Text segmentation and recognition method;
S6: compare the recognized dimension array with the corresponding dimension array of the process drawing template. Specifically, the partial binary tree classification method first checks whether the number of data items is the same; if not, there is a difference. Otherwise it checks whether the data values are the same; if not, there is a difference;
S7: the position coordinates of each dimension are obtained in advance during dimension localization and recognition; if there is a difference, the coordinates are returned and a conspicuous rectangular box is generated to mark the difference;
S8: if both the number and the values of the data are the same, a data difference comparison is performed and any differences are marked.
2. Drawing similarity detection process
Drawing similarity detection compares said drawing file to be compared with a process drawing template through an active (sliding) window. The similarity of the window at a given position is calculated by comparing the pixel values inside the windows at the same position in both images. The window is moved across the whole drawing to obtain the overall similarity between the drawing file to be compared and the process drawing template, and the template with the highest similarity is selected as the most similar process drawing template.
Preferably, for said most similar process drawing template, the window positions whose similarity is below a similarity threshold are selected, and a marker box is added at those positions in the drawing file to be compared. The marker boxes alert the reviewers, who can then work efficiently from the comparison report.
Preferably, the drawing similarity detection further includes:
performing feature recognition on the drawing file to be compared to obtain a feature map of the drawing file;
performing character recognition on the feature map, enhancing the character patterns, and obtaining a character-enhanced map for use in drawing comparison.
Specifically, this embodiment uses image recognition technology, OCR character recognition technology and a consistency detection algorithm. High-definition scanning equipment, using photoelectric and digital processing technology, scans process drawings such as high-voltage cable drawings and technical agreements into the process library. Feature extraction and character recognition are performed on the electronic process drawings or process documents, which are then compared with the process drawing templates fixed in the standard process library. Finally, the system evaluates the comparison result according to the algorithm and generates a comparison report for subsequent review. In addition, modifications made to high-voltage cable accessory process drawings or documents from different manufacturers are analyzed, and the system flags the deleted, added and modified parts of a drawing. This simplifies the reviewers' work of comparing and checking drawings, reduces errors and omissions, and improves efficiency.
The additional steps of drawing similarity detection are described in detail below.
2.1、根据图像识别技术的特征识别2.1. Feature recognition based on image recognition technology
在高压电缆附件数字化工艺库中的图像识别技术中,主要涉及到LBP算子和HOG算子等特征抽取及边缘检测算法。在本实施例中整个图像识别部分的流程包含图像预处理(图像降噪、图像增强)、图像复原(重建图像,恢复图像)、图像编码与压缩、图像分割(划分不同特征的区域)以及最终的识别。The image recognition technology in the digital process library of high-voltage cable accessories mainly involves feature extraction and edge detection algorithms such as LBP operator and HOG operator. In this embodiment, the entire image recognition process includes image preprocessing (image denoising, image enhancement), image restoration (reconstruction of image, restoration of image), image coding and compression, image segmentation (dividing regions with different features) and final recognition.
其中,LBP算子由ojala等人于96年提出,是一种特征描述的经典算子,广泛应用于图像分析领域,该算子不仅能捕获丰富的细节信息,而且能压缩冗余信息。但是当这种LBP算子的半径太大时,噪声的敏感度就会加强。梯度直方图HOG由法国研究人员Dalal提出。HOG算法的主要目的是将灰度化,归一化的图像进行梯度计算,统计图像的梯度信息,将图像划分成小的细胞单元形成每张图纸的独有的HOG特征,从而实现后续图纸的比对。Among them, the LBP operator was proposed by Ojala et al. in 1996. It is a classic operator for feature description and is widely used in the field of image analysis. This operator can not only capture rich detail information, but also compress redundant information. However, when the radius of this LBP operator is too large, the sensitivity to noise will be enhanced. The gradient histogram HOG was proposed by French researcher Dalal. The main purpose of the HOG algorithm is to calculate the gradient of the grayscale and normalized image, count the gradient information of the image, divide the image into small cell units to form the unique HOG features of each drawing, so as to realize the comparison of subsequent drawings.
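For reference, the basic 3x3 LBP operator mentioned above can be sketched in plain Python as follows. This shows only the classic fixed-neighbourhood operator, not the adaptive variant of this embodiment; the function names and the bit ordering (clockwise from the top-left neighbour) are illustrative choices.

```python
def lbp_code(image, x, y):
    """8-neighbour LBP code of the pixel at row y, column x (interior pixels only)."""
    center = image[y][x]
    # Neighbours visited clockwise, starting at the top-left corner: (dy, dx).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dy, dx) in enumerate(offsets):
        # A neighbour at least as bright as the centre contributes a 1 bit.
        if image[y + dy][x + dx] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for y in range(1, len(image) - 1):
        for x in range(1, len(image[0]) - 1):
            hist[lbp_code(image, x, y)] += 1
    return hist


patch = [
    [10, 20, 30],
    [40, 50, 60],
    [70, 80, 90],
]
print(lbp_code(patch, 1, 1))  # bits 3..6 are set: 8 + 16 + 32 + 64 = 120
```

Two drawings can then be compared through a distance between their LBP histograms; the choice of distance is left open here.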
Because high-voltage cable process drawings are inherently complex, methods based on contrast, color, or density distribution all have limitations. To obtain better feature extraction and classification results, the existing LBP algorithm was studied in depth. To address the shortcomings of the original algorithm, the global and local mean differences in pixel gray level are used to set the adaptive threshold, which makes the operator strongly adaptive to drawing recognition; the result is an LBP algorithm with an adaptive threshold.
The adaptive LBP algorithm used in this embodiment combines a window size with the basic LBP algorithm so that features are analyzed adaptively. The window size is determined by the average intensity differences in the horizontal and vertical directions.
Let the image be $g(x, y)$. Compute the average pixel intensity in an active window of size $(2\cdot 2^k)\times(2\cdot 2^k)$:
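The formula itself is not present in this text. One form consistent with the stated window size, assuming the window is centred on $(x, y)$ and the summation limits follow the usual coarseness-style averaging (both are assumptions, not taken from the source), would be:

```latex
A_k(x, y) = \frac{1}{(2\cdot 2^k)^2}
  \sum_{i = x - 2^k}^{x + 2^k - 1} \;
  \sum_{j = y - 2^k}^{y + 2^k - 1} g(i, j)
```

The window then contains $(2\cdot 2^k)^2$ pixels, which is the normalizing factor.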
where $x$ and $y$ are the horizontal and vertical coordinates of the pixel, $A_k(x, y)$ is the average pixel intensity of the active window at pixel $(x, y)$, and $g(i, j)$ is the value of the image pixel at coordinates $(i, j)$.
For each pixel, compute the difference in average pixel intensity between non-overlapping windows in the horizontal and vertical directions:
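The formula is likewise missing from this text. Assuming $A_k$ averages a window of side $2\cdot 2^k$ centred on its argument, two adjacent non-overlapping windows lie at offsets of $\pm 2^k$ from the pixel, which suggests the following form (an assumed reconstruction, not the source's own notation):

```latex
E_{k,h}(x, y) = \left| A_k(x + 2^k,\, y) - A_k(x - 2^k,\, y) \right|
\qquad
E_{k,v}(x, y) = \left| A_k(x,\, y + 2^k) - A_k(x,\, y - 2^k) \right|
```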
where $E_{k,h}(x, y)$ is the difference in average pixel intensity between two non-overlapping windows in the horizontal direction, and $E_{k,v}(x, y)$ is the corresponding difference in the vertical direction.
For each pixel, the value of $k$ that maximizes $E_{k,h}(x, y)$ or $E_{k,v}(x, y)$, in either direction, sets the best size: $S_{\mathrm{best}}(x, y) = (2\cdot 2^k)\times(2\cdot 2^k)$.
Thus $S_{\mathrm{best}}(x, y)$ is the approximate size of the feature primitive at the pixel $(x, y)$. Combining this size with the LBP algorithm reduces the error of LBP in primitive feature extraction.
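The window selection described above can be sketched in plain Python as follows. This is a minimal illustration under several assumptions that are not in the source: windows are centred on their reference pixel, windows that do not fit inside the image are skipped, ties go to the larger window, and max_k bounds the search. The names window_mean and best_window_size are hypothetical.

```python
def window_mean(image, cx, cy, k):
    """Mean of the (2*2**k) x (2*2**k) window centred at column cx, row cy.

    Returns None when the window does not fit inside the image.
    """
    half = 2 ** k
    height, width = len(image), len(image[0])
    if cx - half < 0 or cy - half < 0 or cx + half > width or cy + half > height:
        return None
    total = 0
    for row in range(cy - half, cy + half):
        for col in range(cx - half, cx + half):
            total += image[row][col]
    return total / (2 * half) ** 2


def best_window_size(image, x, y, max_k=3):
    """Side length 2*2**k of the window with the largest mean difference at (x, y)."""
    best_k, best_e = None, -1.0
    for k in range(max_k + 1):
        half = 2 ** k
        pairs = [
            ((x + half, y), (x - half, y)),  # horizontal, non-overlapping
            ((x, y + half), (x, y - half)),  # vertical, non-overlapping
        ]
        for (ax, ay), (bx, by) in pairs:
            a = window_mean(image, ax, ay, k)
            b = window_mean(image, bx, by, k)
            if a is None or b is None:
                continue
            e = abs(a - b)
            if e >= best_e:  # assumption: ties go to the larger window
                best_k, best_e = k, e
    return None if best_k is None else 2 * 2 ** best_k


# Synthetic 16x16 test image with vertical stripes 4 pixels wide.
stripes = [[255 if (col // 4) % 2 == 0 else 0 for col in range(16)] for _ in range(16)]
print(best_window_size(stripes, 8, 8))  # 4, matching the stripe width
```

The returned side length could then set the neighbourhood radius used when computing the LBP code at that pixel.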
2.2. OCR character recognition
Character recognition on an image is a complete pipeline that includes image analysis, preprocessing, character recognition, and recognition correction; each step affects the accuracy of the final result. For example, the clearer the image to be recognized (that is, the better the preprocessing), the better the recognition tends to be. Character recognition is the most important stage of this pipeline. At present, the most widely used and most mature character recognition technology is optical character recognition (OCR). OCR targets printed characters: the text of a paper document is converted optically into a black-and-white bitmap image file, and recognition software then converts the text in the image into a text format that word processing software can edit further.
The OCR recognition process consists of four parts:
1) Image preprocessing. This module normalizes the size of the sample images and applies segmentation, grayscale conversion, and binarization in preparation for character recognition.
2) Training the character library. Tesseract is trained specifically on the characters in the sample images to improve recognition accuracy.
3) Character recognition. The open-source OCR engine Tesseract performs character recognition on the image. In the system, recognizing the characters in one image only requires calling the image_to_string method of the pytesseract library, as follows:
text = pytesseract.image_to_string(img, lang=LANG, config='--psm 7 --oem 3')
Here, text is the character content returned after recognition, LANG is either the self-trained character library or a language pack shipped with Tesseract, and img is the preprocessed image.
4) Recognition correction. Image characters that were rejected or misrecognized are corrected. For grayscale images, the gray levels can be adjusted, that is, the contrast can be enhanced. In an experiment on one grayscale image, recognition was rejected before enhancement and correct after enhancement.
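The gray-level adjustment in step 4 could take many forms; the source does not say which one was used. A minimal sketch, assuming a simple linear min-max contrast stretch on a grayscale image stored as a list of rows, is shown below. The function name stretch_contrast is hypothetical.

```python
def stretch_contrast(gray):
    """Linearly rescale pixel values so the darkest becomes 0 and the brightest 255."""
    flat = [value for row in gray for value in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A flat image has no contrast to stretch; return an unchanged copy.
        return [row[:] for row in gray]
    scale = 255 / (hi - lo)
    return [[round((value - lo) * scale) for value in row] for row in gray]


# A low-contrast patch whose values span only 100..130.
faded = [[100, 110], [120, 130]]
print(stretch_contrast(faded))  # [[0, 85], [170, 255]]
```

The enhanced image would then be passed back to the recognition call in step 3 for a second attempt.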
3. Other steps
3.1. Preprocessing and scanning
Preprocessing includes binarization, image sharpening, and denoising.
Scanning includes digitizing the drawings with a scanner to obtain electronic versions.
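Two of the preprocessing operations named above, denoising and binarization, can be sketched in plain Python as follows. The source does not specify which filters or thresholding rule are used; a 3x3 median filter and Otsu's threshold are assumed here as common choices, sharpening is omitted, and all function names are hypothetical.

```python
def median_filter_3x3(gray):
    """Replace each interior pixel by the median of its 3x3 neighbourhood."""
    out = [row[:] for row in gray]
    for y in range(1, len(gray) - 1):
        for x in range(1, len(gray[0]) - 1):
            window = sorted(gray[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1))
            out[y][x] = window[4]  # middle of the 9 sorted values
    return out


def otsu_threshold(gray):
    """Threshold (0..255) that maximizes the between-class variance."""
    hist = [0] * 256
    for row in gray:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        w_fg = total - w_bg
        if w_bg == 0 or w_fg == 0:
            continue  # one class is empty, so the variance is undefined
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(gray, threshold):
    """Map pixels above the threshold to 255 and all others to 0."""
    return [[255 if value > threshold else 0 for value in row] for row in gray]


speck = [[0, 0, 0], [0, 255, 0], [0, 0, 0]]      # one isolated bright pixel
page = [[20, 20, 200], [200, 20, 200]]           # dark ink on a light background
print(median_filter_3x3(speck)[1][1])            # 0: the speck is removed
print(binarize(page, otsu_threshold(page)))      # [[0, 0, 255], [255, 0, 255]]
```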
3.2. High-voltage cable accessories process library
The high-voltage cable accessories process library module uses a scanner, together with image denoising and enhancement, to convert all existing standard cable technical agreements, process drawings, and related documents into high-definition electronic process drawings. These are entered into the high-voltage cable accessories process library for storage and fixed as process library templates. Each process drawing carries key information such as its label, type, time, and version.
This module also supports fast query and retrieval using key information from the process library, such as number, name, and time.
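A process library with these fields could be stored and queried as sketched below. The source does not give a schema or name a database product; SQLite via Python's standard sqlite3 module is assumed here, and the table name, column names, and sample records are all hypothetical.

```python
import sqlite3


def create_library(conn):
    """Create the (hypothetical) process drawing table if it does not exist."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS process_drawing (
            number      TEXT NOT NULL,
            name        TEXT NOT NULL,
            label       TEXT,
            doc_type    TEXT,
            recorded_at TEXT,
            version     TEXT NOT NULL,
            image_path  TEXT,
            PRIMARY KEY (number, version)
        )
        """
    )


def add_drawing(conn, number, name, label, doc_type, recorded_at, version, image_path):
    conn.execute(
        "INSERT INTO process_drawing VALUES (?, ?, ?, ?, ?, ?, ?)",
        (number, name, label, doc_type, recorded_at, version, image_path),
    )


def find_drawings(conn, number=None, name_contains=None, since=None):
    """Look up drawings by number, by part of the name, or by earliest ISO date."""
    clauses, params = [], []
    if number is not None:
        clauses.append("number = ?")
        params.append(number)
    if name_contains is not None:
        clauses.append("name LIKE ?")
        params.append(f"%{name_contains}%")
    if since is not None:
        clauses.append("recorded_at >= ?")
        params.append(since)
    sql = "SELECT number, name, version FROM process_drawing"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY number, version"
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
create_library(conn)
# Made-up sample records for illustration only.
add_drawing(conn, "HV-001", "110 kV joint assembly", "joint", "process drawing",
            "2021-03-01", "A", "scans/hv-001-a.png")
add_drawing(conn, "HV-002", "Terminal technical agreement", "terminal", "technical agreement",
            "2021-06-15", "B", "scans/hv-002-b.png")
print(find_drawings(conn, name_contains="joint"))
```

Keying the table on number and version keeps earlier versions of a template available for comparison.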
The daily production work of a cable company involves a large volume of data and drawings. With the retrieved information, staff can find the process plan more quickly, which keeps work running smoothly, improves efficiency, and keeps process work on schedule. Applying image recognition to retrieve and store high-voltage cable accessory process drawings enables fully networked, information-based management: all archive resources are shared, a common database is established, and the drawing resources are managed comprehensively, stored uniformly, and applied in a standardized way, which highlights the practical value of the digital process library for high-voltage cables.
3.3. Comparison report
After the digital process drawings are compared, the report presents the comparison results between the drawing under review and the standard drawing and uses marking boxes to point them out to reviewers, who can then review efficiently with the help of the report. The drawing comparison report supports dynamic supervision of the whole process and promotes joint improvement of drawing review services and design quality. Review information can be shared dynamically with the safety supervision department to avoid the safety hazards caused by invalid construction drawings.
The hardware structure of the above system can take the following forms:
1. Integrated structure
The digital process library system for high-voltage cable accessories includes:
one or more processors;
a memory; and
one or more programs stored in the memory, the one or more programs including instructions for executing the data processing steps of each of the above modules.
2. Modular structure
The digital process library system for high-voltage cable accessories includes:
a high-voltage cable accessories process library module, a scanning module for drawing files to be compared, and a drawing comparison module;
each module is a separate hardware structure, that is, each module includes a memory and a processor;
the memory stores a computer program, and the processor calls the computer program to execute the data processing steps corresponding to that module.
By building a drawing comparison system based on the high-voltage cable accessories process library, this embodiment develops a feature processing method for digitized high-voltage cable drawings, extracts and recognizes the features of process drawings, uses SQL database technology to build a library of historical high-voltage cable process drawings and technical agreements, and provides a standardized workflow for reviewing and verifying technical agreements and process drawings.
Compared with the traditional review process for high-voltage cable accessory process documents, this system has the following advantages:
1. Compared with traditional drawings, digital process drawings can use image denoising and enhancement to improve clarity and display quality, so that lettering and graphics are clearer and the design drawings are presented better to staff.
2. Compared with traditional manual review and management, managing a digital process library is more convenient, secure, and reliable. It reduces the manual work of leafing through paper documents, compiling statistics, and storing and managing documents, which improves the efficiency of managing and reviewing process drawings and lowers cost. Over long-term use, the costs of production operation and maintenance and of manual management fall substantially, creating better economic returns for the company.
3. The scheme enables dynamic supervision of high-voltage cable accessories across the whole process, keeps the process library standardized in real time, and greatly reduces the risk costs and safety hazards caused by non-standard process drawings.
4. The digital process library for high-voltage cable accessories has a degree of self-learning capability. As more process drawings and technical agreements are entered, the process database grows and the system's functions continue to improve. The approach can then be extended to the storage, review, and management of other documents in the high-voltage cable field, and its application prospects are good.
5. Comprehensive review of all high-voltage cable accessories greatly improves work efficiency. The review dimensions are comprehensive and well founded, backed by technical support and logical design, which makes the review process concise, efficient, and reliable.
6. When the LBP algorithm was applied to feature recognition on high-voltage cable accessory drawings, the existing LBP algorithm was found to have limitations on such complex drawings;
an active window selection scheme with an adaptive threshold is therefore proposed, which finds the window size that maximizes the average pixel intensity difference between non-overlapping windows in the horizontal and vertical directions. Combined with the LBP algorithm, this reduces the error of LBP in primitive feature extraction.
The preferred specific embodiments of the present invention have been described in detail above. It should be understood that a person of ordinary skill in the art can make many modifications and variations based on the concept of the present invention without creative effort. Therefore, any technical solution that a person skilled in the art can obtain through logical analysis, reasoning, or limited experimentation on the basis of the prior art and in accordance with the concept of the present invention shall fall within the scope of protection defined by the claims.
Claims (15)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111204535.7A CN114004984B (en) | 2021-10-15 | 2021-10-15 | A method and system for comparing drawings of high-voltage cable accessories process library |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111204535.7A CN114004984B (en) | 2021-10-15 | 2021-10-15 | A method and system for comparing drawings of high-voltage cable accessories process library |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114004984A CN114004984A (en) | 2022-02-01 |
CN114004984B true CN114004984B (en) | 2024-08-27 |
Family
ID=79923084
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111204535.7A Active CN114004984B (en) | 2021-10-15 | 2021-10-15 | A method and system for comparing drawings of high-voltage cable accessories process library |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114004984B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115100657B (en) * | 2022-02-22 | 2025-07-22 | 云南电网有限责任公司保山供电局 | Character and strip width line identification method for electrical CAD drawing scanned image |
CN114694165B (en) * | 2022-06-01 | 2023-05-09 | 济南大学 | A method for intelligent identification and redrawing of PID drawings |
CN115171111B (en) * | 2022-08-08 | 2023-11-17 | 江苏满锐精密工具有限公司 | Drawing annotation recognition method for metal cutting tool |
TWI839304B (en) * | 2023-09-15 | 2024-04-11 | 中國信託商業銀行股份有限公司 | File comparison method and system |
CN119152538A (en) * | 2024-11-15 | 2024-12-17 | 广东电网有限责任公司 | Power grid drawing difference identification method, device, equipment and storage medium |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104361336A (en) * | 2014-11-26 | 2015-02-18 | 河海大学 | Character recognition method for underwater video images |
CN113221752A (en) * | 2021-05-13 | 2021-08-06 | 北京惠朗时代科技有限公司 | Multi-template matching-based multi-scale character accurate identification method |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8995774B1 (en) * | 2013-09-19 | 2015-03-31 | IDChecker, Inc. | Automated document recognition, identification, and data extraction |
Also Published As
Publication number | Publication date |
---|---|
CN114004984A (en) | 2022-02-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114004984B (en) | A method and system for comparing drawings of high-voltage cable accessories process library | |
US6996295B2 (en) | Automatic document reading system for technical drawings | |
CN111626146B (en) | Merging cell table segmentation recognition method based on template matching | |
CN111401372A (en) | Method for extracting and identifying image-text information of scanned document | |
CN114663904B (en) | A PDF document layout detection method, device, equipment and medium | |
CN107247950A (en) | A kind of ID Card Image text recognition method based on machine learning | |
US6597808B1 (en) | User drawn circled region extraction from scanned documents | |
CN109446345A (en) | Nuclear power file verification processing method and system | |
CN115424282A (en) | A method and system for recognizing unstructured text tables | |
CN106169080A (en) | A kind of combustion gas index automatic identifying method based on image | |
US20230048495A1 (en) | Method and platform of generating document, electronic device and storage medium | |
CN113673528B (en) | Text processing method, text processing device, electronic equipment and readable storage medium | |
CN117437647B (en) | Oracle bone text detection method based on deep learning and computer vision | |
CN116740723A (en) | A PDF document recognition method based on the open source Paddle framework | |
WO2021218183A1 (en) | Certificate edge detection method and apparatus, and device and medium | |
CN113112567A (en) | Method and device for generating editable flow chart, electronic equipment and storage medium | |
CN113780116A (en) | Invoice classification method, apparatus, computer equipment and storage medium | |
CN116893162A (en) | Rare anti-nuclear antibody karyotype detection method based on YOLO and attention neural network | |
CN110502605B (en) | LCC cost collection system of power assets based on artificial intelligence technology | |
CN112232390A (en) | Method and system for identifying high-pixel large image | |
CN115810197A (en) | Multi-mode electric power form recognition method and device | |
CN113538291B (en) | Card image inclination correction method, device, computer equipment and storage medium | |
CN114565749A (en) | Method and system for identifying key content of visa document of power construction site | |
Bhatt et al. | Text Extraction & Recognition from Visiting Cards | |
CN112507999B (en) | Non-invasive user interface input item identification method based on visual characteristics |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||