CN104376300A - Identification method used for intelligent matching of incomplete Chinese characters on basis of grid characteristics - Google Patents
- Publication number
- CN104376300A (application CN201410607290.6A)
- Authority
- CN
- China
- Prior art keywords
- incomplete
- chinese character
- submatrix
- grid
- word
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/24—Aligning, centring, orientation detection or correction of the image
- G06V10/245—Aligning, centring, orientation detection or correction of the image by locating a pattern; Special marks for positioning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/14—Image acquisition
- G06V30/148—Segmentation of character regions
- G06V30/153—Segmentation of character regions using recognition of characters or words
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/28—Character recognition specially adapted to the type of the alphabet, e.g. Latin alphabet
- G06V30/287—Character recognition specially adapted to the type of the alphabet, e.g. Latin alphabet of Kanji, Hiragana or Katakana characters
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Character Discrimination (AREA)
Abstract
Description
Technical field
The invention relates to a method for recognizing incomplete Chinese characters by intelligent matching based on grid features.
Background
Shredded-paper reconstruction plays an important role in fields such as the recovery of judicial evidence, the restoration of historical documents, and the acquisition of military intelligence. It must also be taken into account when handling private information.
As shown in Figures 1 and 2, current shredded-paper reconstruction mainly relies on a splicing algorithm: Chinese characters are stored pixel by pixel in matrix form, and the fragments are reassembled according to the paper margins and how well the characters match. Although this approach is sound and easy to implement, the recognition and matching are done by machine, and errors occur when splicing both rows and columns. As a result, some Chinese characters in the reconstructed image cannot be recognized.
Summary of the invention
The purpose of the invention is to overcome the deficiencies of the prior art by providing a method for recognizing incomplete Chinese characters by intelligent matching based on grid features. It addresses the problem that, although shredded-paper reconstruction is recognized and matched by machine, errors in splicing rows and columns leave incomplete Chinese characters that cannot be recognized.
The purpose of the invention is achieved through the following technical solution: a method for recognizing incomplete Chinese characters by intelligent matching based on grid features, comprising the following steps:
S1: Convert the reconstructed shredded-paper image into a 0-1 matrix;
S2: Following the image-position locating rules, locate the image positions of the Chinese characters by looping row by row and column by column with a sub-matrix the size of a complete character (the size is taken from the average character size in the image);
S3: Divide each incomplete Chinese character obtained in step S2 into sub-matrices by grid partitioning, and extract features;
S4: Recognize the character by intelligently matching the features of each grid sub-matrix of the partitioned incomplete character against the standard lexicon.
In step S1, the reconstructed image is converted using MATLAB.
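Step S1 can be illustrated with a short sketch. The patent performs the conversion in MATLAB and does not give the threshold or the polarity of the 0-1 matrix, so the Python function below, its name `to_binary_matrix`, the threshold of 128, and the choice of 1 for ink are all assumptions made for illustration.

```python
def to_binary_matrix(gray, threshold=128):
    """Map a grayscale image (rows of 0-255 ints) to a 0-1 matrix.

    Assumption: dark pixels (ink) become 1 and light pixels (paper) become 0.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]


# A tiny hypothetical 3x3 grayscale patch.
gray = [
    [255, 250, 30],
    [240, 20, 10],
    [255, 255, 200],
]
binary = to_binary_matrix(gray)
for row in binary:
    print(row)
```

A fixed global threshold is the simplest choice; scans with uneven lighting would likely need an adaptive threshold instead.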
The image-position locating rules in step S2 are:
(1) If the complete-character-sized sub-matrix contains a region whose width/length equals the size of one character, an incomplete character is determined and its position is recorded;
(2) If the complete-character-sized sub-matrix contains a region whose width/length is greater than the size of one character, one incomplete character is determined and its position is recorded; the loop is then run again from the two opposite directions (left/right or up/down) to determine a further incomplete character, whose position is also recorded;
(3) If the complete-character-sized sub-matrix contains a region whose width/length is less than the size of one character, one incomplete character is determined and its position is recorded.
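The three rules can be read as a comparison between a measured extent and the full character size. The sketch below is one possible reading, not the patent's own implementation: the function name `apply_positioning_rule`, the record layout, and in particular the placement of the second candidate under rule (2) are assumptions, and only the horizontal direction is shown.

```python
def apply_positioning_rule(extent, char_size, position):
    """Return the candidate records implied by rules (1)-(3).

    extent:    width in pixels of the inked region found by the window.
    char_size: side length of a complete character, in pixels.
    position:  (row, col) of the region's top-left corner.
    """
    if extent == char_size:
        return [{"rule": 1, "position": position}]
    if extent > char_size:
        # Rule (2): record one candidate, then rescan from the opposite
        # direction and record a second one. Placing the second candidate
        # flush with the far edge is an assumption of this sketch.
        row, col = position
        return [
            {"rule": 2, "position": position},
            {"rule": 2, "position": (row, col + extent - char_size)},
        ]
    return [{"rule": 3, "position": position}]


print(apply_positioning_rule(20, 20, (0, 0)))
print(apply_positioning_rule(30, 20, (5, 0)))
print(apply_positioning_rule(12, 20, (0, 40)))
```

The vertical case would follow the same pattern with the row coordinate shifted instead of the column.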
Step S3 comprises the following sub-steps:
S31: Divide the incomplete Chinese character into multiple sub-matrices according to its size;
S32: Apply wavelet analysis to each sub-matrix to extract the parameter matrices of these sub-matrix images, and take these parameter matrices together as the feature of the incomplete character.
The method further comprises a sub-step of building the standard lexicon: each complete Chinese character, at each font size, is decomposed by grid to obtain the sub-rectangles of its standard features and their parameter matrices, which determine the feature values of the complete character.
The sub-matrices are of size 2*2.
The parameter matrices comprise three matrices: the vertical attribute, the horizontal attribute, and the diagonal attribute.
The font sizes are 8 sizes between size 10 and size 22.
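Sub-steps S31 and S32 can be sketched as a grid split followed by a single-level 2D wavelet transform, whose three detail bands correspond to the horizontal, vertical, and diagonal parameter matrices. The patent does not name the wavelet, so the Haar wavelet used here is an assumption, as is the reading of "2*2" as a 2 x 2 grid of blocks. The function names are hypothetical, and which band is called horizontal or vertical follows one common convention.

```python
def split_grid(matrix, rows=2, cols=2):
    """Split a matrix into rows x cols equally sized blocks (row-major order).

    Leftover rows or columns that do not divide evenly are dropped.
    """
    h, w = len(matrix) // rows, len(matrix[0]) // cols
    return [
        [r[j * w:(j + 1) * w] for r in matrix[i * h:(i + 1) * h]]
        for i in range(rows)
        for j in range(cols)
    ]


def haar_details(block):
    """Single-level 2D Haar transform of one block.

    Returns the (horizontal, vertical, diagonal) detail matrices.
    """
    horiz, vert, diag = [], [], []
    for i in range(0, len(block) - 1, 2):
        h_row, v_row, d_row = [], [], []
        for j in range(0, len(block[0]) - 1, 2):
            a, b = block[i][j], block[i][j + 1]
            c, d = block[i + 1][j], block[i + 1][j + 1]
            h_row.append((a + b - c - d) / 2)  # difference between rows
            v_row.append((a - b + c - d) / 2)  # difference between columns
            d_row.append((a - b - c + d) / 2)  # diagonal difference
        horiz.append(h_row)
        vert.append(v_row)
        diag.append(d_row)
    return horiz, vert, diag


def extract_features(char_matrix):
    """Feature of a character: the three detail matrices of every grid block."""
    return [haar_details(block) for block in split_grid(char_matrix)]


# Hypothetical 4x4 binary character image.
char_matrix = [
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 1],
    [1, 0, 1, 0],
]
features = extract_features(char_matrix)
for block_features in features:
    print(block_features)
```

Because each grid block is transformed on its own, damage confined to one part of the character only disturbs the features of the blocks it touches.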
Step S4 comprises the following sub-steps:
S41: Compare the grid sub-matrices obtained in step S3 with the standard feature matrices of each complete Chinese character in the standard lexicon;
S42: If the similarity exceeds a given ratio, the incomplete character is judged to be that complete character in the lexicon.
The ratio in step S42 is 50%.
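The matching in S41 and S42 can be sketched as follows. The patent does not define how similarity is computed, so this sketch assumes it is the fraction of grid cells whose features agree with the lexicon entry. Returning the best-scoring character above the threshold is also an assumption, and the lexicon contents are toy values rather than real wavelet features.

```python
def cell_matches(a, b, tol=1e-9):
    """True when two flat feature lists agree element-wise within tol."""
    return len(a) == len(b) and all(abs(x - y) <= tol for x, y in zip(a, b))


def similarity(candidate, standard):
    """Fraction of grid cells whose features match the lexicon entry."""
    hits = sum(cell_matches(c, s) for c, s in zip(candidate, standard))
    return hits / len(standard)


def recognize(candidate, lexicon, threshold=0.5):
    """Return the best lexicon character whose similarity exceeds threshold.

    Returns None when no entry scores above the threshold.
    """
    best_char, best_score = None, threshold
    for char, standard in lexicon.items():
        score = similarity(candidate, standard)
        if score > best_score:
            best_char, best_score = char, score
    return best_char


# Toy lexicon: one flat feature list per grid cell (hypothetical values).
lexicon = {
    "口": [[1, 0], [0, 1], [1, 1], [0, 0]],
    "日": [[1, 0], [1, 1], [1, 1], [1, 0]],
}
# Candidate whose last grid cell is damaged.
candidate = [[1, 0], [0, 1], [1, 1], [9, 9]]
print(recognize(candidate, lexicon))
```

With the 50% threshold, a character missing one of its four grid cells can still be accepted, while an entry that agrees on exactly half of the cells is rejected because the comparison is strict.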
The beneficial effects of the invention are as follows. The invention first converts the reconstructed shredded-paper image into a 0-1 matrix. Then, following the image-position locating rules, it loops row by row and column by column with a sub-matrix the size of a complete character to locate the image positions of the Chinese characters, judges whether each located region may be an incomplete character, and saves it if so. Finally, it extracts the feature vectors of the characters with a wavelet function and matches them against the characters in the lexicon. The invention thus provides a method for recognizing incomplete Chinese characters and addresses the problem that, although shredded-paper reconstruction is recognized and matched by machine, errors in splicing rows and columns leave incomplete characters that cannot be recognized.
Description of drawings
Figure 1 is a sample of business correspondence;
Figure 2 shows the reconstruction result for the shredded sample;
Figure 3 is a flow chart of the method of the invention.
Detailed description
The technical solution of the invention is described in further detail below with reference to the drawings. As shown in Figure 3, a method for recognizing incomplete Chinese characters by intelligent matching based on grid features comprises the following steps:
S1: Convert the reconstructed shredded-paper image into a 0-1 matrix;
S2: Locate the image positions of the Chinese characters by looping row by row and column by column with a sub-matrix the size of a complete character (the size is taken from the average character size in the image);
S3: Divide each incomplete Chinese character obtained in step S2 into sub-matrices by grid partitioning, and extract features;
S4: Recognize the character by intelligently matching the features of each grid sub-matrix of the partitioned incomplete character against the standard lexicon.
In step S1, the reconstructed image is converted using MATLAB.
The rules for locating the image positions of the Chinese characters in step S2 comprise the following sub-steps:
S21: If the complete-character-sized sub-matrix contains a region whose width/length equals the size of one character, an incomplete character is determined and its position is recorded;
S22: If the complete-character-sized sub-matrix contains a region whose width/length is greater than the size of one character, one incomplete character is determined and its position is recorded; the loop is then run again from the two opposite directions (left/right or up/down) to determine a further incomplete character, whose position is also recorded;
S23: If the complete-character-sized sub-matrix contains a region whose width/length is less than the size of one character, one incomplete character is determined and its position is recorded.
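The row-by-row, column-by-column loop of step S2 can be sketched as a window scan over the 0-1 matrix that measures the inked extent inside each window. The patent does not state the step size or how the extent is measured, so the non-overlapping step, the bounding-box measurement, and the name `scan_windows` are assumptions.

```python
def scan_windows(binary, char_size):
    """Slide a char_size x char_size window over a 0-1 matrix.

    Returns a list of (row, col, width, height) tuples, one for every window
    that contains ink, where width and height describe the bounding box of
    the ink inside that window. Stepping by a full window is an assumption.
    """
    found = []
    for top in range(0, len(binary), char_size):
        for left in range(0, len(binary[0]), char_size):
            window = [r[left:left + char_size]
                      for r in binary[top:top + char_size]]
            ink_rows = [i for i, r in enumerate(window) if any(r)]
            ink_cols = [j for j in range(len(window[0]))
                        if any(r[j] for r in window)]
            if ink_rows:
                width = ink_cols[-1] - ink_cols[0] + 1
                height = ink_rows[-1] - ink_rows[0] + 1
                found.append((top, left, width, height))
    return found


# Hypothetical 4x4 page with a character size of 2 pixels.
page = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
]
for record in scan_windows(page, 2):
    print(record)
```

Each recorded width and height can then be compared with the character size to decide which of the locating rules applies to that position.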
Step S3 comprises the following sub-steps:
S31: Divide the incomplete Chinese character into multiple sub-matrices according to its size;
S32: Apply wavelet analysis to each sub-matrix to extract the parameter matrices of these sub-matrix images, and take these parameter matrices together as the feature of the incomplete character.
The method further comprises a sub-step of building the standard lexicon: each complete Chinese character, at each font size, is decomposed by grid to obtain the sub-rectangles of its standard features and their parameter matrices, which determine the feature values of the complete character.
The sub-matrices are of size 2*2.
The parameter matrices comprise three matrices: the vertical attribute, the horizontal attribute, and the diagonal attribute.
The font sizes are 8 sizes between size 10 and size 22.
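Building the standard lexicon amounts to running the same feature extraction over every complete character at every font size. The sketch below shows only that loop structure. The renderer and feature extractor are passed in as hypothetical callables with toy stand-ins, and the two sizes used in the example are the endpoints of the stated range, since the patent does not list the eight sizes.

```python
def build_lexicon(characters, font_sizes, render_glyph, extract_features):
    """Map (character, font size) to the features of the rendered glyph.

    render_glyph(char, size) -> 0-1 matrix and extract_features(matrix) ->
    features are supplied by the caller; both are hypothetical interfaces.
    """
    lexicon = {}
    for char in characters:
        for size in font_sizes:
            lexicon[(char, size)] = extract_features(render_glyph(char, size))
    return lexicon


def toy_render(char, size):
    """Stand-in renderer: a solid size x size square, not a real glyph."""
    return [[1] * size for _ in range(size)]


def toy_features(matrix):
    """Stand-in feature extractor: the ink count of each row."""
    return [sum(row) for row in matrix]


lexicon = build_lexicon(["口"], [10, 22], toy_render, toy_features)
print(sorted(lexicon))
```

Keying the lexicon by font size lets the matcher compare a located character only against entries of the same nominal size.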
Step S4 comprises the following sub-steps:
S41: Compare the grid sub-matrices obtained in step S3 with the standard feature matrices of each complete Chinese character in the standard lexicon;
S42: If the similarity exceeds a given ratio, the incomplete character is judged to be that complete character in the lexicon.
The ratio in step S42 is 50%.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201410607290.6A CN104376300B (en) | 2014-11-03 | 2014-11-03 | Identification method used for intelligent matching of incomplete Chinese characters on basis of grid characteristics |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201410607290.6A CN104376300B (en) | 2014-11-03 | 2014-11-03 | Identification method used for intelligent matching of incomplete Chinese characters on basis of grid characteristics |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN104376300A (en) | 2015-02-25 |
| CN104376300B (en) | 2018-01-30 |
Family
ID=52555198
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201410607290.6A Expired - Fee Related CN104376300B (en) | Identification method used for intelligent matching of incomplete Chinese characters on basis of grid characteristics | 2014-11-03 | 2014-11-03 |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN104376300B (en) |
Cited By (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN105069766A (en) * | 2015-07-24 | 2015-11-18 | 北京航空航天大学 | Inscription restoration method based on contour feature description of Chinese character image |
| CN109447058A (en) * | 2018-09-10 | 2019-03-08 | 昆明理工大学 | A kind of incomplete Chinese characters recognition method based on the partitioning of matrix |
| CN110019418A (en) * | 2018-01-02 | 2019-07-16 | 中国移动通信有限公司研究院 | Object factory method and device, mark system, electronic equipment and storage medium |
| CN116029939A (en) * | 2023-02-24 | 2023-04-28 | 汤毅超 | An Image Restoration Method Based on Image Detection and Region Extraction |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20030086617A1 (en) * | 2001-10-25 | 2003-05-08 | Jer-Chuan Huang | Triangle automatic matching method |
| CN101286202A (en) * | 2008-05-23 | 2008-10-15 | 中南民族大学 | Multi-font multi- letter size print form charater recognition method based on 'Yi' character set |
| CN102750556A (en) * | 2012-06-01 | 2012-10-24 | 山东大学 | Off-line handwritten form Chinese character recognition method |
Non-Patent Citations (2)
| Title |
|---|
| 何耘娴: "印刷体文档图像的中文字符识别", 《中国优秀硕士学位论文全文数据库 信息科技辑》 * |
| 甘恒: "基于笔划密度特征的二叉树SVM脱机手写体汉字识别方法研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》 * |
Cited By (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN105069766A (en) * | 2015-07-24 | 2015-11-18 | 北京航空航天大学 | Inscription restoration method based on contour feature description of Chinese character image |
| CN105069766B (en) * | 2015-07-24 | 2017-12-08 | 北京航空航天大学 | A kind of an inscription on a tablet restorative procedure based on the description of Chinese character image contour feature |
| CN110019418A (en) * | 2018-01-02 | 2019-07-16 | 中国移动通信有限公司研究院 | Object factory method and device, mark system, electronic equipment and storage medium |
| CN110019418B (en) * | 2018-01-02 | 2021-09-14 | 中国移动通信有限公司研究院 | Object description method and device, identification system, electronic equipment and storage medium |
| CN109447058A (en) * | 2018-09-10 | 2019-03-08 | 昆明理工大学 | A kind of incomplete Chinese characters recognition method based on the partitioning of matrix |
| CN109447058B (en) * | 2018-09-10 | 2022-04-12 | 昆明理工大学 | A Method for Recognition of Incomplete Chinese Characters Based on Matrix Blocking |
| CN116029939A (en) * | 2023-02-24 | 2023-04-28 | 汤毅超 | An Image Restoration Method Based on Image Detection and Region Extraction |
Also Published As
| Publication number | Publication date |
|---|---|
| CN104376300B (en) | 2018-01-30 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | C06 | Publication | |
| | PB01 | Publication | |
| | C10 | Entry into substantive examination | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |
| | CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20180130; Termination date: 20181103 |