Disclosure of Invention
To solve the problem that methods which extract key information through manually constructed rules generalize poorly, the present application provides an image recognition system for standardized documents.
The image recognition system for the standardized document adopts the following technical scheme:
An image recognition system for a standardized document comprises a control unit, an image input unit, an image recognition unit, an information extraction unit, an image replacement unit and an output unit. The image input unit constructs a template of a standardized document; the control unit stores the template constructed by the image input unit; the image recognition unit recognizes characters in an image; the information extraction unit compares the characters recognized by the image recognition unit with the templates of standardized documents stored in the control unit; after the information extraction unit has matched the corresponding template in the control unit, the image replacement unit enters the recognized characters into that template; and the output unit outputs the document into which the image has been converted.
Optionally, an image analysis unit is connected between the image input unit and the control unit. The image analysis unit analyzes the image entered through the image input unit, and when that image is a clear image, the control unit stores it as a template of a standardized document.
Optionally, an offset correction unit is connected between the image recognition unit and the information extraction unit, and the offset correction unit is configured to perform offset correction on the text in the image recognized by the image recognition unit.
Optionally, a standard value a is set in the information extraction unit. The information extraction unit compares the characters in the image recognized by the image recognition unit with the template of the standardized document in the control unit. When the coincidence rate between the recognized characters and the template is greater than the standard value a, the image replacement unit enters the recognized characters into the template of the standardized document; when the coincidence rate is less than the standard value a, the image recognition unit prompts the user to recognize the image again.
Optionally, a layout determining unit is connected between the image replacement unit and the output unit. The layout determining unit checks the document to be output. When it detects that the position of the text on the document is offset, it sends the document back to the offset correction unit; after the offset correction unit has corrected the position of the text, the information extraction unit extracts the text on the document again, the image replacement unit enters it into the template of the standardized document again, and the document is output through the output unit.
Optionally, a Bluetooth receiving unit is connected to the image analysis unit. The Bluetooth receiving unit receives an image shot with the user's mobile phone, and the image analysis unit analyzes the image after the Bluetooth receiving unit has received it.
In summary, the image recognition system for a standardized document of the present application has at least the following beneficial technical effect:
A standardized document is a document with a fixed format, such as an identity card, a passport, a household register or any of various forms: the format is fixed and only the content differs. In the present application, the user only needs to take a blank standardized document and enter it through the image input unit, and the control unit stores the template of the standardized document constructed by the image input unit. When a standardized document needs to be output, the image recognition unit recognizes the image; the information extraction unit then extracts the characters in the image and compares them with the templates of standardized documents in the control unit; the image replacement unit then enters the characters extracted by the information extraction unit into the matching template stored in the control unit; and finally the output unit outputs the document. This dispenses with the approach of extracting key information through manually constructed rules, which requires the rules to be rebuilt for each different type of document, so the system is more general and its matching success rate is higher.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the drawings of the embodiments of the present application. It should be apparent that the described embodiments are only some of the embodiments of the present application, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the described embodiments of the application without any inventive step, are within the scope of protection of the application.
Unless defined otherwise, technical or scientific terms used herein shall have the ordinary meaning as understood by one of ordinary skill in the art to which this application belongs. The use of the terms "a" or "an" and the like in the description and in the claims of the present application does not denote a limitation of quantity, but rather denotes the presence of at least one.
In the description of the present specification and claims, the terms "upper", "lower", "horizontal", and the like indicate orientations or positional relationships based on those shown in the drawings, and are used only for convenience of describing the present application and simplifying the description. They do not indicate or imply that the referred device or unit must have a specific direction or be constructed and operated in a specific orientation, and thus should not be construed as limiting the present application.
The present application is described in further detail below with reference to fig. 1.
The embodiment of the application discloses an image recognition system of a standardized document.
Referring to fig. 1, an image recognition system for standardized documents comprises a control unit 1, an image input unit 2, an image recognition unit 3, an information extraction unit 4, an image replacement unit 5 and an output unit 6. Before use, the user takes a number of blank standardized documents and constructs templates of the standardized documents from them through the image input unit 2; the templates are then stored in the control unit 1. In use, the image recognition unit 3 recognizes the characters in an image, and the information extraction unit 4 compares the characters recognized by the image recognition unit 3 with the templates of the standardized documents in the control unit 1. After the information extraction unit 4 has matched the corresponding template in the control unit 1, the image replacement unit 5 enters the recognized characters into that template, and finally the output unit 6 outputs the document into which the image has been converted. This dispenses with the approach of extracting key information through manually constructed rules, which requires the rules to be rebuilt for each different type of document, so the system is more general and its matching success rate is higher.
When a user enters a blank standardized template, shaking during shooting may produce an unclear image, so that the image replacement unit 5 later cannot find the corresponding template in the control unit 1 and the matching success rate of the image recognition system falls. As an improvement, an image analysis unit 7 is connected between the image input unit 2 and the control unit 1. The image analysis unit 7 analyzes the image entered through the image input unit 2. When the image is a clear image, the control unit 1 stores it as a template of a standardized document; when the image analysis unit 7 finds that the image is unclear, it notifies the user to enter the standardized template again.
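The source does not state how the image analysis unit 7 decides whether an image is clear. As one possible illustration only, the sketch below uses a common sharpness heuristic, the variance of a Laplacian filter response, on a grayscale image given as a nested list of pixel values. The function names and the `min_variance` threshold are hypothetical and are not taken from the application.

```python
from statistics import pvariance


def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian over the interior pixels.

    `gray` is a list of rows of grayscale values (assumed representation).
    Blurred images have weak edges, so the variance tends to be low.
    """
    h, w = len(gray), len(gray[0])
    if h < 3 or w < 3:
        raise ValueError("image must be at least 3x3 pixels")
    responses = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = (gray[y - 1][x] + gray[y + 1][x]
                   + gray[y][x - 1] + gray[y][x + 1]
                   - 4 * gray[y][x])
            responses.append(lap)
    return pvariance(responses)


def is_clear(gray, min_variance=100.0):
    """Return True if the image passes the (hypothetical) sharpness threshold."""
    return laplacian_variance(gray) >= min_variance
```

In such a scheme, only an image for which `is_clear` returns True would be stored as a template; otherwise the user would be asked to enter the template again. A suitable threshold would have to be tuned for the camera and document type.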
Existing image input units 2 are generally bulky, so the user can enter standardized document templates only at one location, which reduces the adaptability of the image recognition system. As an improvement, a Bluetooth receiving unit 10 is connected to the image analysis unit 7. The Bluetooth receiving unit 10 receives images shot with the user's mobile phone, and the image analysis unit 7 analyzes each image after the Bluetooth receiving unit 10 has received it. Entering standardized templates through both the image input unit 2 and the Bluetooth receiving unit 10 on the one hand improves the efficiency of template entry, and on the other hand frees the user from having to enter templates at a single location.
In use, although the efficiency of standardized template entry is improved, the security of the Bluetooth receiving unit 10 is low, and unauthorized parties may connect to the Bluetooth receiving unit 10 and supply the control unit 1 with wrong standardized templates. As an improvement, an identifier 11 is connected to the Bluetooth receiving unit 10. Before a device can enter a standardized template, it must be identified by the identifier 11, and only a device that has passed this identification may enter a template. When the identifier 11 identifies an unknown device, the Bluetooth receiving unit 10 rejects the document sent by that device, which greatly improves the security of standardized template entry. Meanwhile, after the identifier 11 identifies that an unknown device has sent a document to the Bluetooth receiving unit 10, the unknown device is located so that the user can exclude it.
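The source does not describe how the identifier 11 recognizes a device. A minimal sketch, assuming the identifier keeps a list of trusted device addresses and records every rejected sender so that the user can later exclude it, might look as follows. The class name, method names and address format are hypothetical illustrations.

```python
from dataclasses import dataclass, field


@dataclass
class Identifier:
    """Hypothetical stand-in for identifier 11: an address allowlist."""
    trusted: set = field(default_factory=set)
    rejected_log: list = field(default_factory=list)

    def register(self, address):
        """Mark a device address as trusted (addresses compared case-insensitively)."""
        self.trusted.add(address.upper())

    def accept(self, address, payload):
        """Return the payload if the sender is trusted, otherwise reject it.

        Rejected addresses are recorded so the user can review and exclude them.
        """
        addr = address.upper()
        if addr in self.trusted:
            return payload
        self.rejected_log.append(addr)
        return None
```

Under this assumption, a template image from an unregistered phone never reaches the image analysis unit 7, and `rejected_log` gives the user a record of which devices attempted to send one.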
After the standardized templates have been entered, the image recognition unit 3 recognizes the characters in an image, and the information extraction unit 4 then compares the characters recognized by the image recognition unit 3 with the templates of standardized documents stored in the control unit 1. In use, designers found that the characters in some images are offset. Recognition by the image recognition unit 3 is based on the main features of the image. Each character image has its own features: for example, the letter A has a pointed apex, P has a loop, and the center of Y has an acute angle. Image recognition therefore has to discard redundant input information and extract the key information, and then assemble the information obtained in stages into a complete image. When the image is offset, the success rate of the comparison between the information extraction unit 4 and the standardized templates stored in the control unit 1 is greatly reduced. As an improvement, an offset correction unit 8 is connected between the image recognition unit 3 and the information extraction unit 4. The offset correction unit 8 performs offset correction on the characters in the image recognized by the image recognition unit 3; once corrected, the characters can be compared more reliably with the standardized templates in the control unit 1, which greatly increases the success rate of the comparison.
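The source does not specify the correction method used by the offset correction unit 8. As a sketch under stated assumptions, the example below handles only a pure translation: it assumes recognition yields `(text, x, y)` tuples and that the template records the expected position of its fixed labels. It estimates the shift as the median displacement of the labels it finds and subtracts that shift from every recognized item. Rotation and scaling are not handled, and all names are hypothetical.

```python
from statistics import median


def estimate_offset(recognized, template_anchors):
    """Median (dx, dy) between recognized fixed labels and their template positions.

    recognized: list of (text, x, y) tuples (assumed OCR output shape).
    template_anchors: dict mapping fixed label text to its expected (x, y).
    """
    dxs, dys = [], []
    for text, x, y in recognized:
        if text in template_anchors:
            tx, ty = template_anchors[text]
            dxs.append(x - tx)
            dys.append(y - ty)
    if not dxs:
        return (0, 0)  # no known label found, so no correction can be estimated
    return (median(dxs), median(dys))


def correct_offset(recognized, template_anchors):
    """Shift every recognized item back by the estimated offset."""
    dx, dy = estimate_offset(recognized, template_anchors)
    return [(text, x - dx, y - dy) for text, x, y in recognized]
```

The median is used rather than the mean so that a single misrecognized label does not distort the estimated shift.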
The control unit 1 stores a plurality of standardized templates, and some of them differ only slightly in format, so the document finally output by the output unit 6 may be based on a template other than the one the user needs, which reduces the user's working efficiency. As an improvement, the user can set a standard value a in the information extraction unit 4. The information extraction unit 4 compares the characters in the image recognized by the image recognition unit 3 with the templates of standardized documents in the control unit 1. When the coincidence rate between the recognized characters and a standardized document template is greater than the standard value a, the image replacement unit 5 enters the recognized characters into that template. When the coincidence rate with every standardized document template in the control unit 1 is less than the standard value a, the image recognition unit 3 prompts the user to recognize the image again. When the coincidence rate is greater than the standard value a for only some of the templates in the control unit 1, the image replacement unit 5 enters the recognized characters into a template whose coincidence rate is greater than the standard value a. This greatly improves the success rate with which an image recognized by the image recognition unit 3 is matched to a standardized document template stored in the control unit 1.
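The source does not define how the coincidence rate is calculated. The sketch below assumes one simple definition, the fraction of a template's fixed labels that appear among the recognized words, and compares it against the standard value a. Where several templates exceed a, it picks the one with the highest rate, which is an additional assumption. The function names and the default value of `a` are hypothetical.

```python
def coincidence_rate(recognized_words, template_labels):
    """Fraction of the template's fixed labels found in the recognized words."""
    labels = set(template_labels)
    if not labels:
        return 0.0
    return len(labels & set(recognized_words)) / len(labels)


def select_template(recognized_words, templates, a=0.8):
    """Return the name of the best template whose rate exceeds `a`, else None.

    templates: dict mapping template name to its list of fixed labels.
    None means no template qualified, i.e. the user should recognize again.
    """
    best_name, best_rate = None, 0.0
    for name, labels in templates.items():
        rate = coincidence_rate(recognized_words, labels)
        if rate > a and rate > best_rate:
            best_name, best_rate = name, rate
    return best_name
```

For example, with an identity-card template and a passport template that share only the label "Name", recognized words containing all four identity-card labels would select the identity-card template, while words containing only "Name" would select none and trigger the prompt to recognize the image again.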
In practice, users often find that the position of the characters on the document output by the output unit 6 deviates somewhat from the specified position. In forms of minor importance some deviation is acceptable, but when identity cards and passports are produced, a deviation in the position of the data on the document can completely change the meaning conveyed. To prevent such deviation in the document output by the output unit 6, as an improvement, a layout determining unit 9 is connected between the image replacement unit 5 and the output unit 6. The layout determining unit 9 checks the document to be output. When it detects that the position of the characters on the document is offset, it sends the document back to the offset correction unit 8. After the offset correction unit 8 has corrected the position of the characters, the information extraction unit 4 extracts the characters on the document again, and the image replacement unit 5 enters them again into the template of the standardized document stored in the control unit 1. The layout determining unit 9 then checks the document to be output again, and the document is output through the output unit 6 once it passes the check.
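The check-and-retry loop described above can be sketched as follows. The source gives neither a tolerance nor a limit on the number of correction rounds, so `tolerance` and `max_rounds` are hypothetical parameters, and snapping an offending field to its expected position is only a stand-in for the real path through the offset correction unit 8, the information extraction unit 4 and the image replacement unit 5.

```python
def layout_deviations(placed, expected, tolerance=2.0):
    """Names of fields placed farther than `tolerance` from their expected position.

    placed / expected: dicts mapping field name to (x, y).
    """
    bad = []
    for name, (x, y) in placed.items():
        ex, ey = expected[name]
        if abs(x - ex) > tolerance or abs(y - ey) > tolerance:
            bad.append(name)
    return bad


def check_and_correct(placed, expected, tolerance=2.0, max_rounds=3):
    """Re-check the layout after each correction round; return (layout, passed)."""
    for _ in range(max_rounds):
        bad = layout_deviations(placed, expected, tolerance)
        if not bad:
            return placed, True
        # Stand-in for correction, re-extraction and re-entry of the offending fields.
        placed = {name: (expected[name] if name in bad else pos)
                  for name, pos in placed.items()}
    return placed, not layout_deviations(placed, expected, tolerance)
```

Bounding the number of rounds keeps the loop from running forever if correction never brings a field within tolerance; in that case the second return value is False and the document would not be released to the output unit 6.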
The implementation principle of the image recognition system for a standardized document in the embodiment of the application is as follows. Before use, images are entered through the image input unit 2 and the Bluetooth receiving unit 10, while the identifier 11 on the Bluetooth receiving unit 10 identifies the device sending the standardized template. The image analysis unit 7 then analyzes each image, and when it finds that the image is clear, the control unit 1 stores the standardized template. In use, the image recognition unit 3 recognizes the characters in an image, and the offset correction unit 8 performs offset correction on the recognized characters. After correction, the information extraction unit 4 compares the characters recognized by the image recognition unit 3 with the templates of standardized documents stored in the control unit 1, and the image replacement unit 5 enters the recognized characters into a template whose coincidence rate is greater than the standard value a. The layout determining unit 9 checks the document to be output. When it detects that the position of the characters on the document is offset, it sends the document back to the offset correction unit 8. After the offset correction unit 8 has corrected the position of the characters, the information extraction unit 4 extracts the characters on the document again, and the image replacement unit 5 enters them again into the template of the standardized document stored in the control unit 1. The layout determining unit 9 then checks the document to be output again, and the document is output through the output unit 6 once it passes the check.
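The order of operations in the use phase can be summarized in a short orchestration sketch. Each stage is passed in as a callable so that the sketch makes no claim about how any unit is implemented; the function name, the status strings and the `max_rounds` limit are hypothetical.

```python
def run_pipeline(image, recognize, correct, match, fill, layout_ok, max_rounds=3):
    """Chain the use-phase stages in the order described in the embodiment.

    recognize: image -> recognized items        (image recognition unit 3)
    correct:   items -> offset-corrected items  (offset correction unit 8)
    match:     items -> template or None        (information extraction unit 4)
    fill:      (template, items) -> document    (image replacement unit 5)
    layout_ok: document -> bool                 (layout determining unit 9)
    Returns a (status, document) pair.
    """
    items = correct(recognize(image))
    template = match(items)
    if template is None:
        return ("retry", None)  # no template exceeded the standard value a
    document = fill(template, items)
    for _ in range(max_rounds):
        if layout_ok(document):
            return ("ok", document)  # released to output unit 6
        items = correct(items)       # send back for offset correction
        document = fill(template, items)
    return ("layout_failed", None)
```

Injecting the stages as parameters also makes each one replaceable, for example swapping in a different matching rule without touching the overall flow.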
The above embodiments are preferred embodiments of the present application and do not limit its scope of protection; therefore, all equivalent changes made according to the structure, shape and principle of the present application shall fall within the scope of protection of the present application.