CN113780154A - Image recognition system of standardized document - Google Patents

Info

Publication number
CN113780154A
Authority
CN
China
Prior art keywords
unit
image
document
template
standardized
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111048036.3A
Other languages
Chinese (zh)
Inventor
耿峰
彭明齐
周振泉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Songxin Intelligent Technology Co ltd
Original Assignee
Shanghai Songxin Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Songxin Intelligent Technology Co ltd
Priority to CN202111048036.3A
Publication of CN113780154A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/186 Templates

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Character Input (AREA)

Abstract

The present application relates to an image recognition system for standardized documents, in the technical field of image recognition. The system comprises a control unit, an image input unit, an image recognition unit, an information extraction unit, an image replacement unit and an output unit. The image input unit constructs a template of a standardized document; the control unit stores the standardized document template constructed by the image input unit; the image recognition unit recognizes the text in an image; the information extraction unit compares the text recognized by the image recognition unit with the standardized document templates in the control unit; once the information extraction unit finds the corresponding standardized document template in the control unit, the image replacement unit enters the recognized text into that template; and the output unit converts the image into a document for output. The application removes the need to manually construct rules for extracting key information, which would have to be rebuilt for each different type of document, and thereby achieves higher generality and a higher matching success rate.

Figure 202111048036

Description

Image recognition system of standardized document
Technical Field
The application relates to the technical field of image recognition, in particular to an image recognition system of a standardized document.
Background
Image recognition refers to technology in which a computer processes, analyzes and understands images in order to recognize targets and objects in various patterns.
In the related art, image recognition and key information extraction for documents mainly rely on keyword matching and on manually constructed rules for extracting the key information.
With respect to the related art, the inventors consider that extracting key information with manually constructed rules requires the rules to be rebuilt for each different type of document, so generality is poor, and that the matching rules easily fail when a keyword is recognized incorrectly.
Disclosure of Invention
To solve the problem that methods which extract key information by manually constructed rules generalize poorly, the application provides an image recognition system for standardized documents.
The image recognition system for standardized documents provided by the application adopts the following technical solution:
An image recognition system for a standardized document comprises a control unit, an image input unit, an image recognition unit, an information extraction unit, an image replacement unit and an output unit. The image input unit constructs a template of a standardized document; the control unit stores the standardized document template constructed by the image input unit; the image recognition unit recognizes the characters in an image; the information extraction unit compares the characters recognized by the image recognition unit with the standardized document templates in the control unit; after the information extraction unit finds the corresponding standardized document template in the control unit, the image replacement unit enters the recognized characters into the standardized document template stored in the control unit; and the output unit converts the image into a document and outputs it.
Optionally, an image analysis unit is connected between the image input unit and the control unit. The image analysis unit analyzes the image entered through the image input unit, and when that image is clear, the control unit stores it as a template of a standardized document.
Optionally, an offset correction unit is connected between the image recognition unit and the information extraction unit, and is configured to perform offset correction on the characters in the image recognized by the image recognition unit.
Optionally, a standard value a is set in the information extraction unit. The information extraction unit compares the characters in the image recognized by the image recognition unit with the standardized document templates in the control unit. When the coincidence rate between the recognized characters and a standardized document template is greater than the standard value a, the image replacement unit enters the recognized characters into that template; when the coincidence rate is less than the standard value a, the image recognition unit prompts the user to recognize the image again.
Optionally, a layout determining unit is connected between the image replacement unit and the output unit. The layout determining unit checks the document to be output. When it detects that the position of the text on the document has shifted, it sends the document back to the offset correction unit; after the offset correction unit corrects the position of the text, the information extraction unit extracts the text on the document again, the image replacement unit enters it into the standardized document template again, and the document is output through the output unit.
Optionally, a Bluetooth receiving unit is connected to the image analysis unit. The Bluetooth receiving unit receives images taken with the user's mobile phone, and the image analysis unit analyzes each image after the Bluetooth receiving unit receives it.
In summary, the application provides at least the following beneficial technical effect:
A standardized document is a document with a fixed format, such as an identity card, a passport, a household register or various forms: the format is fixed and only the content differs. In the application, the user only needs to take a blank standardized document and enter it through the image input unit, and the control unit stores the standardized document template that the image input unit constructs. When a standardized document needs to be output, the image recognition unit recognizes the image, the information extraction unit extracts the characters in the image and compares them with the standardized document templates in the control unit, the image replacement unit enters the extracted characters into the standardized document template stored in the control unit, and finally the output unit outputs the document. This removes the need to manually construct rules for extracting key information, which would have to be rebuilt for each different type of document, so generality and the matching success rate are both higher.
Drawings
FIG. 1 is a flow chart of an image recognition system for a standardized document according to an embodiment.
Reference numerals: 1. a control unit; 2. an image input unit; 3. an image recognition unit; 4. an information extraction unit; 5. an image replacement unit; 6. an output unit; 7. an image analysis unit; 8. an offset correction unit; 9. a layout determining unit; 10. a Bluetooth receiving unit; 11. an identifier.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the drawings of the embodiments of the present application. It should be apparent that the described embodiments are only some of the embodiments of the present application, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the described embodiments of the application without any inventive step, are within the scope of protection of the application.
Unless defined otherwise, technical or scientific terms used herein shall have the ordinary meaning as understood by one of ordinary skill in the art to which this application belongs. The use of the terms "a" or "an" and the like in the description and in the claims of the present application do not denote a limitation of quantity, but rather denote the presence of at least one.
In the description of the present specification and claims, the terms "upper", "lower", "horizontal", and the like indicate orientations or positional relationships based on those shown in the drawings, and are only for convenience of describing the present application and simplifying the description, but do not indicate or imply that the referred device or unit must have a specific direction, be constructed in a specific orientation, and be operated, and thus, should not be construed as limiting the present application.
The present application is described in further detail below with reference to FIG. 1.
The embodiment of the application discloses an image recognition system for a standardized document.
Referring to FIG. 1, an image recognition system for standardized documents comprises a control unit 1, an image input unit 2, an image recognition unit 3, an information extraction unit 4, an image replacement unit 5 and an output unit 6. Before use, the user takes a number of blank standardized documents and constructs standardized document templates through the image input unit 2; the templates are then stored in the control unit 1. In use, the image recognition unit 3 recognizes the characters in an image, and the information extraction unit 4 compares the recognized characters with the standardized document templates in the control unit 1. After the information extraction unit 4 finds the corresponding template in the control unit 1, the image replacement unit 5 enters the recognized characters into that template, and finally the output unit 6 outputs the document into which the image has been converted. This removes the need to manually construct rules for extracting key information, which would have to be rebuilt for each different type of document, so generality and the matching success rate are both higher.
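As a rough illustration of the last step in this flow, the sketch below fills a stored template with recognized text. The function name `fill_template`, the representation of a template as a list of label strings, and the rule that a field's value is the word immediately following its label are all assumptions made for this example; the application does not specify how the image replacement unit 5 pairs recognized characters with template fields.

```python
def fill_template(template_labels, recognized_words):
    """Pair each template label with the word that follows it in the recognized text.

    Assumption for illustration: the recognized text is a flat sequence in
    reading order, and a field's value directly follows its label.
    """
    filled = {label: "" for label in template_labels}
    for i, word in enumerate(recognized_words[:-1]):
        next_word = recognized_words[i + 1]
        # A label followed by another label means the field was left empty.
        if word in filled and next_word not in filled:
            filled[word] = next_word
    return filled


labels = ["Name", "Sex", "Address"]
words = ["Name", "Zhang San", "Sex", "Address", "Shanghai"]
result = fill_template(labels, words)
print(result)
```

Here the "Sex" field stays empty because the next recognized word is another label rather than a value.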
When a user enters a blank standardized template, shaking during shooting may produce an unclear image, so that the image replacement unit 5 later cannot find the corresponding template in the control unit 1 and the matching success rate of the image recognition system falls. As an improvement, an image analysis unit 7 is connected between the image input unit 2 and the control unit 1. The image analysis unit 7 analyzes the image entered through the image input unit 2. When the image is clear, the control unit 1 stores it as a template of a standardized document; when the image analysis unit 7 finds the image unclear, it notifies the user to enter the standardized template again.
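The application does not say how the image analysis unit 7 decides whether an image is clear. One common heuristic, shown here purely as an assumption, is the variance of a Laplacian filter response: blurred images have weak edges and therefore a low variance. The names `laplacian_variance` and `is_clear` and the threshold of 100.0 are hypothetical.

```python
from statistics import pvariance


def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian over the interior pixels.

    `gray` is a 2D list of grayscale values and must be at least 3x3.
    """
    height, width = len(gray), len(gray[0])
    responses = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            lap = (gray[y - 1][x] + gray[y + 1][x]
                   + gray[y][x - 1] + gray[y][x + 1]
                   - 4 * gray[y][x])
            responses.append(lap)
    return pvariance(responses)


def is_clear(gray, threshold=100.0):
    """Treat the image as clear when edge energy reaches the (assumed) threshold."""
    return laplacian_variance(gray) >= threshold


# A checkerboard has strong edges; a uniform image has none.
sharp = [[255 if (x + y) % 2 else 0 for x in range(6)] for y in range(6)]
flat = [[128] * 6 for _ in range(6)]
print(is_clear(sharp), is_clear(flat))
```

In a real system the threshold would have to be tuned on sample captures of the documents in question.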
Existing image input units 2 are generally bulky, so the user can enter standardized document templates in one place only, which reduces the adaptability of the image recognition system. As an improvement, a Bluetooth receiving unit 10 is connected to the image analysis unit 7. The Bluetooth receiving unit 10 receives images taken with the user's mobile phone, and the image analysis unit 7 analyzes each image after the Bluetooth receiving unit 10 receives it. Standardized templates can thus be entered through the image input unit 2 and the Bluetooth receiving unit 10 at the same time: on the one hand, the efficiency of template entry is improved; on the other hand, the user is no longer restricted to entering templates in one place.
In use, although the efficiency of template entry is improved, the security of the Bluetooth receiving unit 10 is low, and a malicious party may connect to the Bluetooth receiving unit 10 and cause incorrect standardized templates to be stored in the control unit 1. As an improvement, an identifier 11 is connected to the Bluetooth receiving unit 10. Before a device can enter a standardized template, it must be identified by the identifier 11; only a device that passes identification may enter a template. When the identifier 11 identifies an unknown device, the Bluetooth receiving unit 10 rejects the document sent by that device, which greatly improves the security of template entry. In addition, after the identifier 11 identifies that an unknown device has sent a document to the Bluetooth receiving unit 10, the unknown device is located so that the user can exclude the unauthorized device.
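The gatekeeping behavior of the identifier 11 can be sketched as an allowlist check. This is a minimal sketch under stated assumptions: devices are identified by an address string, the trusted set is configured in advance, and "locating" an unknown device is reduced to recording its address. The class name `Identifier` and its methods are hypothetical and do not correspond to any Bluetooth library.

```python
class Identifier:
    """Admits documents only from devices on a preconfigured allowlist."""

    def __init__(self, trusted_addresses):
        # Normalise case so "aa:bb" and "AA:BB" refer to the same device.
        self.trusted = {address.upper() for address in trusted_addresses}
        self.rejected_log = []

    def admit(self, address, document):
        """Return the document if the sender is trusted, otherwise None."""
        address = address.upper()
        if address in self.trusted:
            return document
        # Stand-in for locating the unknown device: remember who tried.
        self.rejected_log.append(address)
        return None


identifier = Identifier(["AA:BB:CC:DD:EE:01"])
accepted = identifier.admit("aa:bb:cc:dd:ee:01", "blank_form.png")
rejected = identifier.admit("12:34:56:78:9A:BC", "forged_form.png")
print(accepted, rejected, identifier.rejected_log)
```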
After the standardized templates have been entered, the image recognition unit 3 recognizes the image and the characters on it, and the information extraction unit 4 then compares the recognized characters with the standardized document templates stored in the control unit 1. In use, the designers found that the characters in some images are offset. The image recognition unit 3 works from the main features of an image: each character has its own features, for example the letter A has a pointed tip, P has a closed loop, and the center of Y has an acute angle. Image recognition therefore has to discard redundant input information and extract the key information, and then assemble the information obtained in stages into a complete image. When the image is offset, the success rate of the comparison between the information extraction unit 4 and the standardized templates stored in the control unit 1 drops sharply. As an improvement, an offset correction unit 8 is connected between the image recognition unit 3 and the information extraction unit 4 to perform offset correction on the characters in the image recognized by the image recognition unit 3. After correction, the characters can be compared more reliably with the standardized templates in the control unit 1, which greatly increases the success rate of the comparison.
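The application leaves the correction method of the offset correction unit 8 open. One simple interpretation, assumed here only for illustration, is to remove a uniform translation by shifting every recognized text item so that the top-left corner of the text area lands at the origin. The function name `correct_offset` and the `(text, x, y)` tuple layout are hypothetical; rotation or skew would need a different technique.

```python
def correct_offset(items):
    """Shift (text, x, y) items so the smallest x and y both become 0.

    This removes a uniform translation only; it does not handle rotation.
    """
    if not items:
        return []
    min_x = min(x for _, x, _ in items)
    min_y = min(y for _, _, y in items)
    return [(text, x - min_x, y - min_y) for text, x, y in items]


# Two scans of the same form, the second shifted by (7, 3) pixels.
scan_a = [("Name", 10, 20), ("Sex", 10, 40)]
scan_b = [("Name", 17, 23), ("Sex", 17, 43)]
print(correct_offset(scan_a) == correct_offset(scan_b))
```

After normalization the two scans yield identical positions, so a later comparison against template positions no longer depends on where the page sat in the frame.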
The control unit 1 stores a plurality of standardized templates, and some of them differ only slightly in format, so the template finally output by the output unit 6 may not be the one the user needs, which reduces the user's efficiency. As an improvement, the user can set a standard value a in the information extraction unit 4. The information extraction unit 4 compares the characters in the image recognized by the image recognition unit 3 with the standardized document templates in the control unit 1. When the coincidence rate between the recognized characters and a standardized document template is greater than the standard value a, the image replacement unit 5 enters the recognized characters into that template. When the coincidence rate with every standardized document template in the control unit 1 is less than the standard value a, the image recognition unit 3 prompts the user to recognize the image again. When the coincidence rate with only some of the standardized document templates is greater than the standard value a, the image replacement unit 5 enters the recognized characters into the template whose coincidence rate is greater than the standard value a. This greatly improves the success rate with which the image recognition unit 3 matches an image to a standardized document template entered in the control unit 1.
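The threshold rule above can be sketched as follows. The application does not define how the coincidence rate is computed, so this example assumes it is the fraction of a template's fixed labels that appear in the recognized text. The names `coincidence_rate` and `select_template`, the default a = 0.8, and the choice of the highest-scoring template when several exceed a are all assumptions.

```python
def coincidence_rate(recognized_words, template_labels):
    """Fraction of the template's labels found in the recognized text."""
    labels = set(template_labels)
    if not labels:
        return 0.0
    return len(labels & set(recognized_words)) / len(labels)


def select_template(recognized_words, templates, a=0.8):
    """Return the name of the best template whose rate exceeds a, else None.

    None corresponds to prompting the user to recognize the image again.
    """
    best_name, best_rate = None, 0.0
    for name, labels in templates.items():
        rate = coincidence_rate(recognized_words, labels)
        if rate > a and rate > best_rate:
            best_name, best_rate = name, rate
    return best_name


templates = {
    "id_card": ["Name", "Sex", "Date of Birth", "Address", "ID Number"],
    "passport": ["Name", "Nationality", "Passport No", "Date of Birth", "Sex"],
}
words = ["Name", "Zhang San", "Sex", "M", "Date of Birth", "1990-01-01",
         "Address", "Shanghai", "ID Number", "310000"]
print(select_template(words, templates))
```

With these inputs the identity card template matches all five labels while the passport template matches three of five, so only the former exceeds the assumed threshold.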
In practice, users often find that the characters on the document output by the output unit 6 are shifted somewhat from their specified positions. For forms that are not very important, a small shift is acceptable, but for identity cards and passports, a shift in the position of the data on the document can completely change the meaning conveyed. To prevent such shifts in the document output by the output unit 6, as an improvement a layout determining unit 9 is connected between the image replacement unit 5 and the output unit 6. The layout determining unit 9 checks the document to be output. When it detects that the position of the characters on the document has shifted, it sends the document back to the offset correction unit 8. After the offset correction unit 8 corrects the position of the characters, the information extraction unit 4 extracts the characters on the document again, the image replacement unit 5 enters them again into the standardized document template stored in the control unit 1, and the layout determining unit 9 checks the document to be output again. The document is output through the output unit 6 once it passes the check.
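The check-and-retry loop around the layout determining unit 9 might look like the sketch below. It assumes that field positions are `(x, y)` pairs keyed by field name, that both dictionaries contain the same fields, and that the correction step is supplied as a callable. The tolerance of 2 units and the cap of 3 rounds are assumptions added so the loop always terminates; the application itself simply repeats until the check passes.

```python
def position_deviation(placed, expected):
    """Largest per-axis distance between placed and expected field positions."""
    return max(
        max(abs(placed[field][0] - expected[field][0]),
            abs(placed[field][1] - expected[field][1]))
        for field in expected
    )


def finalize_layout(placed, expected, correct, tolerance=2, max_rounds=3):
    """Return a layout that passes the check, or None if correction keeps failing."""
    for _ in range(max_rounds + 1):
        if position_deviation(placed, expected) <= tolerance:
            return placed  # check passed: ready for the output unit
        placed = correct(placed)  # send back for offset correction
    return None


expected = {"name": (100, 50), "number": (100, 80)}
shifted = {"name": (105, 50), "number": (105, 80)}


def shift_left(layout):
    """Hypothetical correction: move every field 5 units to the left."""
    return {field: (x - 5, y) for field, (x, y) in layout.items()}


print(finalize_layout(shifted, expected, shift_left))
```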
The implementation principle of the image recognition system for a standardized document in the embodiment of the application is as follows. Before use, images are entered through the image input unit 2 and the Bluetooth receiving unit 10, while the identifier 11 on the Bluetooth receiving unit 10 identifies the device sending the standardized template. The image analysis unit 7 then analyzes the images, and when it finds an image to be clear, the control unit 1 stores the standardized template. In use, the image recognition unit 3 recognizes the image and the characters on it, and the offset correction unit 8 performs offset correction on the recognized characters. After correction, the information extraction unit 4 compares the recognized characters with the standardized document templates stored in the control unit 1, and the image replacement unit 5 enters the recognized characters into a standardized document template whose coincidence rate is greater than the standard value a. The layout determining unit 9 checks the document to be output. When it detects that the position of the characters on the document has shifted, it sends the document back to the offset correction unit 8; after the offset correction unit 8 corrects the position of the characters, the information extraction unit 4 extracts the characters on the document again, the image replacement unit 5 enters them again into the standardized document template stored in the control unit 1, and the layout determining unit 9 checks the document to be output again. The document is output through the output unit 6 once it passes the check.
The above are preferred embodiments of the application and do not limit its scope of protection; all equivalent changes made according to the structure, shape and principle of the application shall fall within the scope of protection of the application.

Claims (6)

1. An image recognition system for a standardized document, characterized in that the system comprises a control unit (1), an image input unit (2), an image recognition unit (3), an information extraction unit (4), an image replacement unit (5) and an output unit (6);
the image input unit (2) constructs a template of a standardized document;
the control unit (1) stores the standardized document template constructed by the image input unit (2);
the image recognition unit (3) recognizes characters in an image;
the information extraction unit (4) compares the characters recognized by the image recognition unit (3) with the standardized document templates in the control unit (1);
after the information extraction unit (4) finds the corresponding standardized document template in the control unit (1), the image replacement unit (5) enters the recognized characters into the standardized document template stored in the control unit (1);
the output unit (6) converts the image into a document and outputs the document.
2. The image recognition system for a standardized document according to claim 1, characterized in that: an image analysis unit (7) is connected between the image input unit (2) and the control unit (1), the image analysis unit (7) analyzes the image entered through the image input unit (2), and when the image entered through the image input unit (2) is a clear image, the control unit (1) stores the image as a template of a standardized document.
3. The image recognition system for a standardized document according to claim 1, characterized in that: an offset correction unit (8) is connected between the image recognition unit (3) and the information extraction unit (4), and the offset correction unit (8) is used for performing offset correction on the characters in the image recognized by the image recognition unit (3).
4. The image recognition system for a standardized document according to claim 1, characterized in that: a standard value a is set in the information extraction unit (4); the information extraction unit (4) compares the characters in the image recognized by the image recognition unit (3) with the standardized document templates in the control unit (1); when the coincidence rate between the characters in the image recognized by the image recognition unit (3) and a standardized document template in the control unit (1) is greater than the standard value a, the image replacement unit (5) enters the recognized characters into that standardized document template; and when the coincidence rate is less than the standard value a, the image recognition unit (3) prompts the user to recognize the image again.
5. The image recognition system for a standardized document according to claim 1, characterized in that: a layout determining unit (9) is connected between the image replacement unit (5) and the output unit (6); the layout determining unit (9) checks the document to be output; when the layout determining unit (9) detects that the position of the text on the document has shifted, it sends the document back to the offset correction unit (8); after the offset correction unit (8) corrects the position of the text on the document, the information extraction unit (4) extracts the characters on the document, the image replacement unit (5) enters them into the standardized document template again, and finally the document is output through the output unit (6).
6. The image recognition system for a standardized document according to claim 1, characterized in that: a Bluetooth receiving unit (10) is connected to the image analysis unit (7), the Bluetooth receiving unit (10) receives images taken with the user's mobile phone, and the image analysis unit (7) analyzes each image after the Bluetooth receiving unit (10) receives it.
CN202111048036.3A 2021-09-08 2021-09-08 Image recognition system of standardized document Pending CN113780154A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111048036.3A CN113780154A (en) 2021-09-08 2021-09-08 Image recognition system of standardized document

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111048036.3A CN113780154A (en) 2021-09-08 2021-09-08 Image recognition system of standardized document

Publications (1)

Publication Number Publication Date
CN113780154A (en) 2021-12-10

Family

ID=78841956

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111048036.3A Pending CN113780154A (en) 2021-09-08 2021-09-08 Image recognition system of standardized document

Country Status (1)

Country Link
CN (1) CN113780154A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150078671A1 (en) * 2013-09-19 2015-03-19 IDChecker, Inc. Automated document recognition, identification, and data extraction
CN107708098A (en) * 2017-10-12 2018-02-16 重庆云停智连科技有限公司 A kind of personal identification method based on Bluetooth communication
WO2019161615A1 (en) * 2018-02-23 2019-08-29 平安科技(深圳)有限公司 Bill entry method, system, optical character recognition server and storage medium
CN111353492A (en) * 2020-03-12 2020-06-30 上海合合信息科技发展有限公司 Image identification and information extraction method and device for standardized document

Similar Documents

Publication Publication Date Title
US7444007B2 (en) Iris-based biometric identification
US8326041B2 (en) Machine character recognition verification
CN103577818B (en) A kind of method and apparatus of pictograph identification
WO2019174131A1 (en) Identity authentication method, server, and computer readable storage medium
CN111353492A (en) Image identification and information extraction method and device for standardized document
CN110728272A (en) Method for inputting certificate information based on OCR and related device
Baig et al. Fingerprint-iris fusion based identification system using a single hamming distance matcher
CN110222168B (en) Data processing method and related device
CN115620312A (en) Cross-modal character handwriting verification method, system, equipment and storage medium
CN111753923A (en) Face-based smart album clustering method, system, device and storage medium
Bukhari et al. High performance layout analysis of Arabic and Urdu document images
TW202018577A (en) Human recognition method based on data fusion
CN112766255A (en) Optical character recognition method, device, equipment and storage medium
CN111104852B (en) A Face Recognition Technology Based on Heuristic Gaussian Cloud Transform
CN107885989A (en) Signing messages acquisition method, signature verification method and electric signing system
CN111428710A (en) A document classification collaborative robot and an image text recognition method based thereon
KR102201930B1 (en) Device and method for generating document automatically information recorded in the image file
CN113780154A (en) Image recognition system of standardized document
JP2020095374A (en) Character recognition system, character recognition device, program and character recognition method
Joseph Advanced digital image processing technique based optical character recognition of scanned document
CN117009460B (en) Auxiliary information quick collection method for dictionary pen
CN108830217B (en) Automatic signature distinguishing method based on fuzzy mean hash learning
CN114078254B (en) A Robot-Based Intelligent Data Acquisition System
KR20090111202A (en) Hangul Recognition Method and Device Using Horizontal Line, Vertical Line, Diagonal Line, Number of Circles and Characteristic Values
CN116311276A (en) Document image correction method, device, electronic equipment and readable medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination