Disclosure of Invention
In view of the shortcomings of the prior art, the invention aims to provide a method, a device, equipment and a storage medium for registering and identifying a logistics order invoice, which aim to solve the technical problems of low efficiency, high error rate and the like in the traditional invoice input mode in the prior art.
In order to achieve the above purpose, the invention adopts the following technical scheme:
A first aspect of the invention provides a logistics order invoice registration and identification method, which comprises the following steps: obtaining an unregistered invoice picture, and preprocessing the unregistered invoice picture to obtain a preprocessed picture; constructing a character recognition model, and processing the preprocessed picture by adopting the character recognition model to extract character information in the preprocessed picture; and establishing a verification rule, verifying the character information according to the verification rule, marking character information that fails the verification, and associating character information that passes the verification with a logistics order.
Optionally, in a first implementation manner of the first aspect of the present invention, the obtaining an unregistered invoice picture, preprocessing the unregistered invoice picture to obtain a preprocessed picture, specifically includes obtaining the unregistered invoice picture, denoising the unregistered invoice picture by using a filtering algorithm to obtain a first picture, processing the first picture by using a histogram equalization method to enhance a contrast ratio of the first picture to obtain a second picture, and processing the second picture by using a thresholding method to convert the second picture into a black-white binary image to obtain the preprocessed picture.
Optionally, in a second implementation manner of the first aspect of the present invention, the constructing a character recognition model and processing the preprocessed picture by using the character recognition model to extract character information in the preprocessed picture specifically includes: obtaining an invoice sample image, and labeling characters in the invoice sample image to obtain training data; taking a convolutional neural network model as a basic model, and training the basic model by using a deep learning framework and the training data to obtain a preliminary model; and verifying the accuracy of the preliminary model, and continuously optimizing the preliminary model according to the verification result to obtain the character recognition model.
Optionally, in a third implementation manner of the first aspect of the present invention, the establishing a verification rule, verifying the character information according to the verification rule, marking character information that fails the verification, and associating the character information that passes the verification with the logistics order specifically includes: obtaining a format specification and a business rule of an invoice, and establishing the verification rule according to the format specification and the business rule of the invoice; verifying the character information according to the verification rule and a pre-built enterprise database to obtain a verification result; if the verification result is that the character information does not conform to the verification rule or cannot be matched with the enterprise database, marking the character information as abnormal; and if the verification result is that the character information conforms to the verification rule and can be matched with the enterprise database, regarding the verification as passed and associating the character information that passes the verification with the logistics order.
Optionally, in a fourth implementation manner of the first aspect of the present invention, the checking the character information according to the checking rule and the pre-built enterprise database to obtain a checking result specifically includes obtaining different types of paper invoice images, building a definition judgment model according to the different types of paper invoice images, judging whether the invoice is a paper invoice according to the character information, if the invoice is a paper invoice, judging an unregistered invoice picture by using the definition judgment model to obtain a judging result, if the judging result is not passed, marking the corresponding unregistered invoice picture, and if the judging result is passed, checking the character information according to the checking rule and the pre-built enterprise database to obtain the checking result.
Optionally, in a fifth implementation manner of the first aspect of the present invention, the obtaining different types of paper invoice images and constructing a definition judgment model according to the different types of paper invoice images specifically includes obtaining different types of paper invoice images, marking whether the definition of the paper invoice images meets requirements, forming a training set according to the paper invoice images and marking results thereof, training the basic model by using a machine learning algorithm as a basic model and using the training set to obtain a definition judgment model, formulating an automatic optimization iteration strategy, automatically updating the training set according to the automatic optimization iteration strategy, and optimizing and iterating the definition judgment model by using the updated training set.
Optionally, in a sixth implementation manner of the first aspect of the present invention, if the judgment result is passing, the character information is checked according to the check rule and a pre-built enterprise database to obtain a check result, and specifically includes, if the judgment result is passing, obtaining key information of an enterprise and related companies, where the key information includes a tax payer identification number and an enterprise name, building an enterprise database with the key information, checking the character information according to the check rule and the enterprise database, where the character information includes tax payer information, an invoicing date, an amount and a tax rate, matching the tax payer information in the character information with the enterprise database to obtain a first check result, and checking the invoicing date, the amount or the tax rate in the character information according to the check rule to obtain a second check result.
A second aspect of the invention provides a logistics order invoice registration recognition device, which comprises a preprocessing module, an extraction module and a verification module, wherein the preprocessing module is used for acquiring an unregistered invoice picture and preprocessing the unregistered invoice picture to obtain a preprocessed picture; the extraction module is used for constructing a character recognition model and processing the preprocessed picture by adopting the character recognition model to extract character information in the preprocessed picture; and the verification module is used for establishing a verification rule, verifying the character information according to the verification rule, marking character information that fails the verification, and associating character information that passes the verification with a logistics order.
Optionally, in a first implementation manner of the second aspect of the present invention, the preprocessing module includes a denoising unit, an adjusting unit, and a converting unit, wherein the denoising unit is used for obtaining an unregistered invoice picture, denoising the unregistered invoice picture by adopting a filtering algorithm to obtain a first picture, the adjusting unit is used for processing the first picture by adopting a histogram equalization method to enhance the contrast of the first picture to obtain a second picture, and the converting unit is used for processing the second picture by adopting a threshold method to convert the second picture into a black-white binary image to obtain a preprocessed picture.
Optionally, in a second implementation manner of the second aspect of the present invention, the extraction module includes an obtaining unit, a training unit, and an optimizing unit, where the obtaining unit is configured to obtain an invoice sample image and label characters in the invoice sample image to obtain training data; the training unit is configured to take a convolutional neural network model as a basic model and train the basic model by using a deep learning framework and the training data to obtain a preliminary model; and the optimizing unit is configured to verify the accuracy of the preliminary model and continuously optimize the preliminary model according to the verification result to obtain a character recognition model.
Optionally, in a third implementation manner of the second aspect of the present invention, the verification module includes: a creation sub-module configured to obtain a format specification and a business rule of an invoice, and create a verification rule according to the format specification and the business rule of the invoice; a verification sub-module configured to verify the character information according to the verification rule and a pre-built enterprise database to obtain a verification result; a marking sub-module configured to mark the character information as abnormal if the verification result is that the character information does not conform to the verification rule or cannot be matched with the enterprise database; and an association sub-module configured to regard the verification as passed if the verification result is that the character information conforms to the verification rule and can be matched with the enterprise database, and to associate the character information that passes the verification with a logistics order.
Optionally, in a fourth implementation manner of the second aspect of the present invention, the verification sub-module includes a construction unit, a judgment unit, a marking unit, and a verification unit, wherein the construction unit is used for acquiring different types of paper invoice images and constructing a definition judgment model according to the different types of paper invoice images; the judgment unit is used for judging whether the invoice is a paper invoice according to the character information and, if the invoice is a paper invoice, judging the unregistered invoice picture by adopting the definition judgment model to obtain a judgment result; the marking unit is used for marking the corresponding unregistered invoice picture if the judgment result is a failure; and the verification unit is used for verifying the character information according to the verification rule and a pre-built enterprise database to obtain a verification result if the judgment result is a pass.
Optionally, in a fifth implementation manner of the second aspect of the present invention, the construction unit includes: an obtaining subunit configured to obtain different types of paper invoice images, label whether the definition of the paper invoice images meets the requirement, form a training set from the paper invoice images and their labeling results, and, taking a machine learning algorithm as a basic model, train the basic model with the training set to obtain a definition judgment model; and an optimizing subunit configured to formulate an automatic optimization iteration strategy, automatically update the training set according to the automatic optimization iteration strategy, and optimize and iterate the definition judgment model with the updated training set.
Optionally, in a sixth implementation manner of the second aspect of the present invention, the verification unit includes: a construction subunit configured to obtain, if the judgment result is a pass, key information of an enterprise and its related companies, where the key information includes a taxpayer identification number and an enterprise name, and to construct an enterprise database with the key information; a first verification subunit configured to verify the character information according to the verification rule and the enterprise database, where the character information includes taxpayer information, an invoicing date, an amount, and a tax rate, and to match the taxpayer information in the character information with the enterprise database to obtain a first verification result; and a second verification subunit configured to verify the invoicing date, the amount, or the tax rate in the character information according to the verification rule to obtain a second verification result.
A third aspect of the present invention provides logistics order invoice registration recognition equipment comprising a memory having computer readable instructions stored therein and at least one processor, the at least one processor invoking the computer readable instructions in the memory to perform the steps of the logistics order invoice registration and identification method as described above.
A fourth aspect of the present invention provides a computer readable storage medium having stored thereon computer readable instructions which when executed by a processor implement the steps of the method for registering and identifying a logistics order invoice as described above.
The invention has the following beneficial effects. In the logistics order invoice registration and identification method, firstly, the unregistered invoice picture is preprocessed to obtain the preprocessed picture, which improves picture quality and makes the results of the subsequent model more accurate; then, the preprocessed picture is processed by the character recognition model to extract the character information in it, which requires no manual operation, is more efficient and has a low error rate; finally, a verification rule is established, the character information is verified according to the verification rule, character information that fails the verification is marked, and character information that passes the verification is associated with the logistics order, so that abnormal invoices are automatically checked and flagged and successfully verified invoices are automatically associated, effectively improving working efficiency and the timeliness of information collection.
Detailed Description
The invention provides a method, a device, equipment and a storage medium for registering and identifying a logistics order invoice. The method comprises: firstly, obtaining an unregistered invoice picture and preprocessing it to obtain a preprocessed picture; then constructing a character recognition model and processing the preprocessed picture with it to extract the character information in the preprocessed picture; and finally establishing a verification rule, verifying the character information according to the verification rule, marking character information that fails the verification, and associating character information that passes the verification with a logistics order. By automatically extracting and verifying the information in the invoice, the invention effectively improves invoice registration efficiency. Further, to improve the accuracy of character recognition, the unregistered invoice picture is preprocessed and the character information is extracted by a character recognition model, which helps avoid errors in character information extraction and allows the registration of logistics order invoices to proceed quickly and efficiently.
The terms "first," "second," "third," "fourth" and the like in the description and in the claims and in the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments described herein may be implemented in other sequences than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus.
For ease of understanding, the following description will describe a specific flow of an embodiment of the present invention, and it should be noted that, in the present invention, the steps related to obtaining personal information of a user are all performed under the condition that authorization of the user is obtained.
Referring to FIG. 1, a first embodiment of the method for registering and identifying a logistics order invoice according to an embodiment of the present invention includes:
S101, acquiring an unregistered invoice picture, and preprocessing the unregistered invoice picture to acquire a preprocessed picture;
Specifically, each outlet of the logistics enterprise can upload invoice pictures to the system directly from its place of business, so that the system can acquire unregistered invoice pictures. Unregistered invoice pictures taken by different personnel or with different equipment vary widely in quality, so they need to be preprocessed to help the subsequent character recognition model extract character information accurately. For example, denoising, contrast enhancement, brightness adjustment, conversion into black-and-white pictures and the like can be adopted. Several preprocessing modes can be combined to improve the preprocessing effect.
S102, constructing a character recognition model, and processing the preprocessed picture by adopting the character recognition model to extract character information in the preprocessed picture;
The character recognition model can automatically process a large number of pictures, extract character information in the pictures, remarkably improve working efficiency and reduce the requirement of manual intervention. The trained character recognition model generally has high recognition accuracy, particularly when processing clear, canonical character pictures. This helps to reduce errors that may occur in manual identification. The character recognition model can also process character pictures in various complex scenes, such as blurring, rotation, warping, and the like. This enables a more flexible model in practical applications.
The character information extracted from the preprocessed picture comprises various key information on the invoice, such as the enterprise name, taxpayer identification number, amount, tax amount, invoiced items and the like. The extracted character information may be further used for verification analysis, such as checking whether the invoice is valid or counterfeit.
S103, establishing a verification rule, verifying the character information according to the verification rule, marking character information which does not pass the verification, and associating the character information which passes the verification with the logistics order.
The invoice has strict requirements on the format of each item, such as the font format, the written format of dates, the invoiced items, the tax amount and the like. By establishing the verification rules, the system can verify each item of content acquired from the invoice picture one by one against those rules, so that the invoice content can be confirmed to be accurate and to meet the requirements.
When the invoice content is found to be unsatisfactory, the system marks the abnormal character content and specifically outputs the abnormal character content to the interface for display to the user. And if the character information accords with the verification rule, correlating the character information with the logistics order to finish the registration and identification of the invoice of the logistics order.
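By way of a non-limiting illustration, the verify, mark and associate logic of step S103 can be sketched as follows. The names `RegistrationResult`, `register_invoice` and `amount_is_positive`, and the field key `"amount"`, are hypothetical and introduced only for this sketch; the actual rules would be those established from the invoice format specification and business rules.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Fields = Dict[str, str]
# A rule returns an error message when the fields violate it, or None when they pass.
Rule = Callable[[Fields], Optional[str]]


@dataclass
class RegistrationResult:
    fields: Fields
    errors: List[str] = field(default_factory=list)
    order_id: Optional[str] = None  # set only when every rule passes

    @property
    def passed(self) -> bool:
        return not self.errors


def register_invoice(fields: Fields, rules: List[Rule], order_id: str) -> RegistrationResult:
    """Verify extracted fields, mark failures, and associate passes with the order."""
    result = RegistrationResult(fields=fields)
    for rule in rules:
        message = rule(fields)
        if message is not None:
            result.errors.append(message)  # marked so it can be shown to the user
    if result.passed:
        result.order_id = order_id  # association with the logistics order
    return result


def amount_is_positive(fields: Fields) -> Optional[str]:
    """Hypothetical example rule: the amount must parse as a positive number."""
    try:
        ok = float(fields.get("amount", "")) > 0
    except ValueError:
        ok = False
    return None if ok else "amount: not a positive number"
```

In this sketch a failed invoice is never associated with an order; it only carries the list of marked problems for display.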
By this method, invoice registration efficiency can be improved. In a specific implementation, the outlet can upload the invoice attachment directly, the corresponding invoice information is filled in automatically through a back-end interface, and the month to which the invoice is attributed is selected, which shortens the time needed for invoice registration.
In addition, the invention can also improve registration accuracy and validity: during verification, the authenticity of the invoice can be checked and the invoice information fed back through a third-party interface, which avoids errors caused by manually entering invoice information. The automatic invoice registration and identification method reduces the workload of manual entry and improves invoice registration efficiency. Meanwhile, image recognition technology and a data checking algorithm are adopted to ensure the accuracy and integrity of invoice information. Accurately registered invoice information can provide reliable data support for the tax declaration and financial management of logistics enterprises, and helps enterprises comply with tax regulations and reduce tax risks.
Referring to FIG. 2, a second embodiment of the method for registering and identifying a logistics order invoice according to an embodiment of the present invention includes:
S201, obtaining an unregistered invoice picture, and denoising the unregistered invoice picture by adopting a filtering algorithm to obtain a first picture;
Specifically, during denoising, algorithms such as median filtering and Gaussian filtering can be adopted to remove noise points in the invoice image. Median filtering is a nonlinear filtering method that replaces the gray value of each pixel with the median of the gray values of the pixels in its neighborhood, which effectively removes salt-and-pepper noise and the like. Gaussian filtering is a linear filtering method that removes noise by taking a weighted average over the image, and it is effective at removing Gaussian noise.
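As a minimal sketch of the median filtering described above, written in plain Python so that it has no external dependencies, a grayscale picture can be held as a list of rows. The function name `median_filter`, the `radius` parameter and the border handling (using only in-image neighbors) are assumptions of this sketch; a production system would more likely call an image library.

```python
from statistics import median
from typing import List

Image = List[List[int]]  # grayscale picture as rows of 0-255 gray values


def median_filter(image: Image, radius: int = 1) -> Image:
    """Replace each pixel with the median of its (2*radius+1)^2 neighborhood.

    Pixels near the border use only the neighbors that fall inside the image.
    """
    height, width = len(image), len(image[0])
    output = [row[:] for row in image]
    for y in range(height):
        for x in range(width):
            window = [
                image[ny][nx]
                for ny in range(max(0, y - radius), min(height, y + radius + 1))
                for nx in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            output[y][x] = int(median(window))
    return output
```

A single bright "salt" pixel surrounded by uniform background is replaced by the background value, because the median ignores an isolated outlier in the window.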
S202, processing the first picture by adopting a histogram equalization method to enhance the contrast of the first picture and obtain a second picture;
Histogram equalization is a method of enhancing image contrast by adjusting the gray-level histogram of an image. It remaps gray levels so that they are spread more evenly over the available range, which widens the difference between dark and bright areas and thereby improves the readability of the image.
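The remapping can be illustrated with the standard cumulative-histogram formulation of histogram equalization. This is a plain-Python sketch under the assumption of an 8-bit grayscale picture stored as a list of rows; the function name `equalize_histogram` is hypothetical.

```python
from typing import List

Image = List[List[int]]  # grayscale picture as rows of 0-255 gray values


def equalize_histogram(image: Image, levels: int = 256) -> Image:
    """Remap gray levels through the normalized cumulative histogram."""
    pixels = [value for row in image for value in row]
    histogram = [0] * levels
    for value in pixels:
        histogram[value] += 1

    # Cumulative distribution: number of pixels at or below each gray level.
    cdf, running = [], 0
    for count in histogram:
        running += count
        cdf.append(running)

    total = len(pixels)
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:
        return [row[:] for row in image]  # constant picture, nothing to stretch

    # Lookup table mapping the darkest used level to 0 and the brightest to levels-1.
    lookup = [
        max(0, round((c - cdf_min) * (levels - 1) / (total - cdf_min)))
        for c in cdf
    ]
    return [[lookup[value] for value in row] for row in image]
```

For example, a low-contrast picture whose gray values occupy only 100 to 130 is stretched to cover the full 0 to 255 range.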
S203, processing the second picture by adopting a threshold method to convert the second picture into a black-white binary image, and obtaining a preprocessed picture.
The second picture is converted into a black-and-white binary image for subsequent character recognition. The binarization may be performed using a fixed threshold method or an adaptive threshold method. The fixed threshold method sets pixels whose gray value is greater than a fixed threshold to white and pixels at or below the threshold to black. The adaptive threshold method determines the threshold automatically according to the local characteristics of the image, and works well on invoice images with uneven illumination.
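The difference between the two binarization methods can be sketched as follows. The adaptive variant shown here compares each pixel with the mean of its local window minus an offset, which is one common choice and an assumption of this sketch; the names `fixed_threshold`, `adaptive_threshold`, `radius` and `offset` are likewise hypothetical.

```python
from typing import List

Image = List[List[int]]  # grayscale picture as rows of 0-255 gray values


def fixed_threshold(image: Image, threshold: int = 128) -> Image:
    """White (255) where the gray value exceeds the threshold, black (0) elsewhere."""
    return [[255 if value > threshold else 0 for value in row] for row in image]


def adaptive_threshold(image: Image, radius: int = 1, offset: float = 10) -> Image:
    """Compare each pixel with the mean of its local window minus an offset."""
    height, width = len(image), len(image[0])
    output = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [
                image[ny][nx]
                for ny in range(max(0, y - radius), min(height, y + radius + 1))
                for nx in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            local_mean = sum(window) / len(window)
            row.append(255 if image[y][x] > local_mean - offset else 0)
        output.append(row)
    return output
```

On a picture whose background brightens from left to right, a single fixed threshold turns the dark side of the background black and misses dark strokes on the bright side, whereas the local comparison keeps the background white and the strokes black on both sides.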
Referring to FIG. 3, a third embodiment of the method for registering and identifying a logistics order invoice according to an embodiment of the present invention includes:
S301, acquiring an invoice sample image, and labeling characters in the invoice sample image to obtain training data, wherein manual labeling or automatic labeling can be adopted, and the accuracy and completeness of the labels are ensured.
S302, taking a convolutional neural network (CNN) model as a basic model, and training the basic model by using a deep learning framework and the training data to obtain a preliminary model;
The deep learning framework may be, for example, TensorFlow, PyTorch or the like. During training, hyperparameters of the model, such as the learning rate, batch size and network structure, need to be adjusted repeatedly to improve the accuracy of character recognition. Meanwhile, techniques such as cross-validation and early stopping can be adopted to prevent the model from overfitting.
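The early stopping technique mentioned above can be illustrated independently of any particular framework. The sketch below assumes that a validation loss is available after each training epoch; the function name `early_stopping_epoch` and the parameters `patience` and `min_delta` are hypothetical names for this illustration.

```python
from typing import Iterable, Tuple


def early_stopping_epoch(
    val_losses: Iterable[float], patience: int = 3, min_delta: float = 0.0
) -> Tuple[int, float]:
    """Return (best_epoch, best_loss), stopping after `patience` epochs without improvement.

    `val_losses` may be a generator that trains one epoch per item, so that
    breaking out of the loop actually ends training.
    """
    best_epoch, best_loss = -1, float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_epoch, best_loss = epoch, loss  # improvement: remember this checkpoint
        elif epoch - best_epoch >= patience:
            break  # validation loss has stalled, further training risks overfitting
    return best_epoch, best_loss
```

The model weights saved at `best_epoch` would then be taken as the preliminary model, rather than the weights from the final epoch.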
S303, verifying the accuracy of the preliminary model, and continuously optimizing the preliminary model according to a verification result to obtain a character recognition model.
After training is completed, the character recognition model can be deployed into an invoice registration recognition system, so that automatic character recognition of an invoice image is realized. In practical application, the model can be further optimized and adjusted according to the characteristics and the identification effect of the invoice.
Further, post-processing can be performed on the recognized characters to remove some erroneous recognition results. For example, the recognized characters can be checked and corrected by dictionary inquiry, grammar analysis and other methods, so that the accuracy of the recognition result is ensured.
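A minimal sketch of such post-processing is given below. It assumes two hypothetical helpers: `fix_numeric_field`, which undoes common letter-for-digit confusions in fields known to be purely numeric (such as amounts), and `snap_to_dictionary`, which replaces a recognized string with the closest entry of a known vocabulary using the standard library's `difflib`. The confusion table and the cutoff value are illustrative assumptions.

```python
import difflib
from typing import Sequence

# Letters that OCR commonly confuses with digits; apply only to purely numeric fields.
CONFUSIONS = str.maketrans({"O": "0", "o": "0", "I": "1", "l": "1", "S": "5", "B": "8"})


def fix_numeric_field(text: str) -> str:
    """Replace look-alike letters with digits in a field that should be numeric."""
    return text.translate(CONFUSIONS)


def snap_to_dictionary(text: str, vocabulary: Sequence[str], cutoff: float = 0.8) -> str:
    """Return the closest vocabulary entry, or the text unchanged if none is close enough."""
    matches = difflib.get_close_matches(text, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else text
```

For instance, `fix_numeric_field("1O2.5O")` yields `"102.50"`, and a company name with one misread character is snapped back to the matching dictionary entry. Fields that legitimately mix letters and digits should not be passed through the numeric fix.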
In addition to constructing the character recognition model in the above manner, a well-performing optical character recognition (OCR) engine, such as Tesseract OCR, Baidu OCR, etc., may be selected. Through extensive training and optimization, these OCR engines are capable of accurately recognizing characters in a variety of fonts and languages.
When the method is applied specifically, parameters can be adjusted and optimized according to the characteristics of the invoice. For example, for a particular font, font size, color, etc. on the invoice, the recognition parameters of the OCR engine may be adjusted to improve the accuracy of character recognition. Meanwhile, the invoice image can be segmented, characters in different areas are respectively identified, and the identification accuracy and efficiency are improved.
Referring to FIG. 4, a fourth embodiment of the method for registering and identifying a logistics order invoice according to an embodiment of the present invention includes:
S401, acquiring format specifications and business rules of an invoice, and establishing a verification rule according to the format specifications and the business rules of the invoice;
The invoice has a specific format specification covering, for example, the invoice name, invoice code and number, the copies and their uses, the customer name, the bank of deposit and account number, the commodity name or business item, the unit of measurement, quantity, unit price, amount in words and in figures, tax rate (levy rate), tax amount, issuer, invoicing date, and the name (seal) of the invoicing unit (individual).
In addition, the invoice must be issued in full and in a single pass, following the prescribed time limit, sequence and columns, and must bear the special invoice seal. The text is written in Chinese, as are the amount in words and in figures and the invoicing date. Columns such as the purchasing unit name, the goods name or service item, the specification, the unit, the quantity and the unit price must be filled in as prescribed. All items must be filled in completely and the writing must be legible. All copies should be produced by carbon copy or printed in a single pass, and invoices should be filled in following their number sequence.
By establishing the verification rule according to the format specification and the business rule of the invoice, the system can verify according to the related specified requirements, and the verification result is ensured to be correct.
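As a non-limiting illustration of how format specifications and business rules might be turned into executable checks, the sketch below validates three assumed rules: the invoicing date is a real calendar date, the invoice number has a fixed number of digits, and the tax amount equals the amount multiplied by the tax rate to within one cent. The field names, the `YYYY-MM-DD` date format and the eight-digit number length are assumptions of this sketch, not requirements stated in this description.

```python
import re
from datetime import datetime
from decimal import Decimal, InvalidOperation
from typing import Dict, List


def check_invoice_fields(fields: Dict[str, str]) -> List[str]:
    """Return the names of the fields that violate the (assumed) rules."""
    errors = []

    # Format rule: the invoicing date must be a real calendar date (assumed YYYY-MM-DD).
    try:
        datetime.strptime(fields.get("date", ""), "%Y-%m-%d")
    except ValueError:
        errors.append("date")

    # Format rule: the invoice number consists of exactly 8 digits (assumed length).
    if not re.fullmatch(r"\d{8}", fields.get("number", "")):
        errors.append("number")

    # Business rule: tax must equal amount x rate to within one cent.
    try:
        amount = Decimal(fields["amount"])
        rate = Decimal(fields["tax_rate"])
        tax = Decimal(fields["tax"])
        if amount <= 0 or abs(amount * rate - tax) > Decimal("0.01"):
            errors.append("tax")
    except (KeyError, InvalidOperation):
        errors.append("tax")  # missing or non-numeric monetary field

    return errors
```

`Decimal` is used instead of floating point so that monetary comparisons are exact. An empty result means every rule passed; otherwise the returned names identify the content to be marked.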
S402, checking the character information according to the checking rule and a pre-constructed enterprise database to obtain a checking result;
S403, if the verification result is that the character information does not conform to the verification rule or cannot be matched with the enterprise database, marking the character information as abnormal;
When an abnormality is found, the corresponding content is marked so that the user is promptly reminded to handle it manually. If manual inspection finds that the problem lies in the unregistered invoice picture itself, the user is required to provide a new invoice. If the problem lies in the recognition by the model, verification can be completed through a manual channel.
S404, if the verification result is that the character information conforms to the verification rule and can be matched with the enterprise database, regarding the verification as passed, and associating the character information that passes the verification with the logistics order.
After association, the invoice information may be stored using a database or in the form of an electronic document. During registration, the integrity and accuracy of invoice information are ensured, and the information of invoice registration time, registration personnel and the like is recorded at the same time so as to facilitate subsequent inquiry and management. Invoice information may be categorized and archived for ease of query and management. For example, invoice information may be stored in different folders or database tables, classified by invoice type, date of invoicing, tax payer identification number, etc. Meanwhile, an index can be established, so that specific invoice information can be conveniently and rapidly inquired.
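The storage, classification and indexing described above can be sketched with the standard library's `sqlite3` module. The table name, column names and index are hypothetical choices for this illustration; an actual deployment would use whatever database the registration system already relies on.

```python
import sqlite3
from datetime import datetime, timezone
from typing import List


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Create the invoice table and an index for taxpayer/date lookups."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS invoice (
               invoice_number TEXT PRIMARY KEY,
               invoice_type   TEXT NOT NULL,
               invoice_date   TEXT NOT NULL,
               taxpayer_id    TEXT NOT NULL,
               order_id       TEXT NOT NULL,
               registered_by  TEXT NOT NULL,
               registered_at  TEXT NOT NULL
           )"""
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_invoice_taxpayer "
        "ON invoice (taxpayer_id, invoice_date)"
    )
    return conn


def register(conn, number, invoice_type, invoice_date, taxpayer_id, order_id, registered_by):
    """Store one verified invoice together with who registered it and when."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO invoice VALUES (?, ?, ?, ?, ?, ?, ?)",
            (number, invoice_type, invoice_date, taxpayer_id, order_id,
             registered_by, datetime.now(timezone.utc).isoformat()),
        )


def invoices_for_taxpayer(conn, taxpayer_id: str) -> List[str]:
    """Return invoice numbers for one taxpayer, ordered by invoicing date."""
    rows = conn.execute(
        "SELECT invoice_number FROM invoice WHERE taxpayer_id = ? ORDER BY invoice_date",
        (taxpayer_id,),
    )
    return [row[0] for row in rows]
```

Making the invoice number the primary key also rejects an attempt to register the same invoice twice, which raises `sqlite3.IntegrityError` in this sketch.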
Referring to FIG. 5, a fifth embodiment of the method for registering and identifying a logistics order invoice according to an embodiment of the present invention includes:
S501, acquiring paper invoice images of different types, and constructing a definition judgment model according to the paper invoice images of different types;
Compared with electronic invoices, paper invoices are prone to printing problems or problems caused by improper storage. For example, if stains on a paper invoice make key information unreadable, the system cannot recognize that information and the invoice has to be provided again. As another example, paper invoices are printed by a printer, and the print may be unclear. An electronic invoice is a directly generated electronic file and therefore does not suffer from blurred or unclear content.
S502, judging whether the invoice is a paper invoice according to the character information, and if the invoice is the paper invoice, judging unregistered invoice pictures by adopting a definition judgment model to obtain a judgment result;
the character information comprises all character contents on the invoice, wherein the character information comprises an invoice head, the invoice head accurately represents whether the invoice is an electronic invoice, and whether the invoice is a paper invoice can be judged by judging whether the character information contains information of the electronic invoice;
s503, if the judgment result is that the invoice does not pass, marking the corresponding unregistered invoice picture;
s504, if the judgment result is that the character information passes, checking the character information according to the checking rule and a pre-constructed enterprise database to obtain a checking result.
In this embodiment, if the definition judgment model judges that the definition (sharpness) of the paper invoice picture is too low, the picture is marked and a reviewer is prompted to check why the picture is unsatisfactory. If the problem lies in how the picture was photographed, the user is asked to provide the picture again; if the problem lies in the paper invoice itself, the user is asked to provide a new invoice.
Because verification does not need to check every piece of character information, passing verification alone cannot guarantee that the paper invoice is free of other problems. Screening the paper invoice with the definition judgment model before the character information is verified rules out such problems in advance, which helps ensure that paper invoices accepted afterwards meet financial requirements.
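The flow of steps S502 to S504 can be illustrated with a minimal sketch. It is not the implementation of this embodiment: the marker string `ELECTRONIC_MARKER`, the `GateResult` type, and the two callables `definition_ok` and `verify` are hypothetical names introduced only for illustration, and the sketch assumes that an electronic invoice proceeds directly to verification.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

# Hypothetical header text; the actual wording on an invoice header is an assumption.
ELECTRONIC_MARKER = "electronic invoice"


@dataclass
class GateResult:
    is_paper: bool
    marked: bool                   # True when the picture is flagged for manual review
    verification: Optional[bool]   # None when verification was skipped


def is_paper_invoice(character_info: str) -> bool:
    """S502, first half: treat the invoice as paper unless the text says it is electronic."""
    return ELECTRONIC_MARKER not in character_info.lower()


def process_invoice(
    character_info: str,
    picture: Any,
    definition_ok: Callable[[Any], bool],   # stand-in for the definition judgment model
    verify: Callable[[str], bool],          # stand-in for rule and database verification
) -> GateResult:
    paper = is_paper_invoice(character_info)
    # S502, second half, and S503: only paper invoices are screened; a fail marks the picture.
    if paper and not definition_ok(picture):
        return GateResult(is_paper=True, marked=True, verification=None)
    # S504: the judgment passed (or, by assumption, the invoice is electronic), so verify.
    return GateResult(is_paper=paper, marked=False, verification=verify(character_info))
```

Passing the model and the verifier as callables keeps the gating order visible without committing to any particular model or rule set.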
Referring to fig. 6, a sixth embodiment of a method for registering and identifying a logistic order invoice according to an embodiment of the present invention includes:
S601, acquiring paper invoice images of different types, labeling whether the definition of each paper invoice image meets the requirements, and forming a training set from the paper invoice images and their labels;
Specifically, to form a more complete training set, paper invoice images with different defects need to be collected, such as invoices with stains that make part of the key information unrecognizable, damaged invoices, invoices with unclear printing, and invoices with incomplete content. By manually labeling in advance which types of defect, and what degree of defect, are acceptable, and then forming the corresponding training set, the resulting definition judgment model achieves higher judgment accuracy.
S602, taking a machine learning algorithm as a basic model, and training the basic model with the training set to obtain the definition judgment model;
S603, formulating an automatic optimization iteration strategy, automatically updating the training set according to the automatic optimization iteration strategy, and using the updated training set to optimize and iterate the definition judgment model.
Specifically, the strategy may provide that each new batch of paper invoice images acquired through the system is added to the training set, enlarging the training data and thereby improving the accuracy of the definition judgment model.
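One possible form of the update strategy in S603 is sketched below. The embodiment does not specify the machine learning algorithm, so a one-dimensional threshold classifier over a hypothetical per-image sharpness score stands in for the basic model; `train_threshold`, `DefinitionModel`, and `batch_size` are illustrative names, not part of the described method.

```python
from typing import List, Tuple

# (hypothetical sharpness score, label: True if the definition is acceptable)
Sample = Tuple[float, bool]


def train_threshold(samples: List[Sample]) -> float:
    """Stand-in basic model: choose the score threshold with the fewest training errors."""
    candidates = sorted({score for score, _ in samples})

    def errors(threshold: float) -> int:
        return sum((score >= threshold) != label for score, label in samples)

    return min(candidates, key=errors)


class DefinitionModel:
    def __init__(self, training_set: List[Sample], batch_size: int = 2) -> None:
        self.training_set = list(training_set)
        self.pending: List[Sample] = []
        self.batch_size = batch_size
        self.threshold = train_threshold(self.training_set)   # S602

    def predict(self, score: float) -> bool:
        return score >= self.threshold

    def add_labeled(self, sample: Sample) -> None:
        """S603: once a full batch has arrived, enlarge the training set and retrain."""
        self.pending.append(sample)
        if len(self.pending) >= self.batch_size:
            self.training_set.extend(self.pending)
            self.pending.clear()
            self.threshold = train_threshold(self.training_set)
```

Retraining only when a whole batch has accumulated is one design choice among several; a time-based or accuracy-triggered schedule would fit the same description.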
Referring to fig. 7, a seventh embodiment of a method for registering and identifying a logistic order invoice according to an embodiment of the present invention includes:
S701, if the judgment result is a pass, acquiring key information of the enterprise and its related companies, wherein the key information comprises a taxpayer identification number and an enterprise name, and constructing an enterprise database from the key information;
To reduce the enterprise's system development cost, all of its subsidiaries or associated companies may use the same invoice registration and identification system, with the relevant information of all subsidiaries or associated companies stored in the enterprise database. As long as the enterprise name and the taxpayer identification number on an invoice meet the requirements, they can be matched against and found in the enterprise database.
S702, verifying the character information according to the verification rule and the enterprise database, wherein the character information comprises taxpayer information, billing date, amount and tax rate, and matching the taxpayer information in the character information with the enterprise database to obtain a first verification result;
Specifically, the character information may further include key information such as the name of the goods or taxable services, unit, quantity, unit price and tax amount. The taxpayer information comprises information such as the enterprise name and the taxpayer identification number; if any one of these items cannot be matched correctly, the taxpayer information is regarded as failing verification.
S703, verifying the billing date, the amount or the tax rate in the character information according to the verification rule to obtain a second verification result.
For example, verification fails if the taxpayer information in the character information cannot be matched with the enterprise database; if the billing date does not conform to the date format; if the amount is not positive; or if the tax rate is inconsistent with the name of the goods or taxable services.
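The checks in S702 and S703 can be sketched as follows. This is a minimal illustration under stated assumptions: the dictionary keys, the `%Y-%m-%d` date format, the sample taxpayer identification number, and the `ENTERPRISE_DB` and `EXPECTED_RATE` tables are all hypothetical and are not taken from the embodiment.

```python
from datetime import datetime
from decimal import Decimal, InvalidOperation
from typing import Dict, Tuple

# Hypothetical enterprise database: taxpayer identification number -> enterprise name.
ENTERPRISE_DB: Dict[str, str] = {"91310000MA1K00000X": "Example Logistics Co., Ltd."}
# Hypothetical table of the tax rate expected for each goods or service name.
EXPECTED_RATE: Dict[str, Decimal] = {"transport service": Decimal("0.09")}


def check_taxpayer(info: dict, db: Dict[str, str]) -> bool:
    """First verification: both the identification number and the name must match."""
    name = db.get(info.get("taxpayer_id", ""))
    return name is not None and name == info.get("enterprise_name")


def check_date(value) -> bool:
    try:
        datetime.strptime(value, "%Y-%m-%d")   # assumed date format
        return True
    except (TypeError, ValueError):
        return False


def check_amount(value) -> bool:
    try:
        return Decimal(str(value)) > 0          # the amount must be positive
    except InvalidOperation:
        return False


def check_rate(info: dict, expected: Dict[str, Decimal]) -> bool:
    rate = expected.get(info.get("item_name", ""))
    try:
        return rate is not None and Decimal(str(info.get("tax_rate"))) == rate
    except InvalidOperation:
        return False


def verify(info: dict, db=ENTERPRISE_DB, expected=EXPECTED_RATE) -> Tuple[bool, bool]:
    """Return (first verification result, second verification result)."""
    first = check_taxpayer(info, db)
    second = all([check_date(info.get("billing_date")),
                  check_amount(info.get("amount")),
                  check_rate(info, expected)])
    return first, second
```

Amounts and rates are parsed with `Decimal` rather than `float` so that values such as 0.09 compare exactly.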
The method for registering and identifying a logistics order invoice in the embodiments of the present invention has been described above; the corresponding device is described below. Referring to Fig. 8, one embodiment of the device for registering and identifying a logistics order invoice includes:
a preprocessing module 10, configured to acquire an unregistered invoice picture and preprocess the unregistered invoice picture to obtain a preprocessed picture;
an extracting module 20, configured to construct a character recognition model, and process the preprocessed picture by using the character recognition model to extract character information in the preprocessed picture;
a verification module 30, configured to establish a verification rule, verify the character information according to the verification rule, mark character information that fails the verification, and associate the character information that passes the verification with the logistics order.
Referring to Fig. 9, another embodiment of the device for registering and identifying a logistics order invoice according to an embodiment of the present invention includes:
a preprocessing module 10, configured to acquire an unregistered invoice picture and preprocess the unregistered invoice picture to obtain a preprocessed picture;
an extracting module 20, configured to construct a character recognition model, and process the preprocessed picture by using the character recognition model to extract character information in the preprocessed picture;
a verification module 30, configured to establish a verification rule, verify the character information according to the verification rule, mark character information that fails the verification, and associate the character information that passes the verification with the logistics order;
In this embodiment, the preprocessing module 10 includes:
a denoising unit 11, configured to acquire the unregistered invoice picture, and denoise the unregistered invoice picture by using a filtering algorithm to obtain a first picture;
an adjusting unit 12, configured to process the first picture by using a histogram equalization method, so as to enhance the contrast of the first picture and obtain a second picture;
a conversion unit 13, configured to process the second picture by using a thresholding method, so as to convert the second picture into a black-and-white binary image and obtain the preprocessed picture;
In this embodiment, the extracting module 20 includes:
an acquiring unit 21, configured to acquire an invoice sample image, and label characters in the invoice sample image to obtain training data;
a training unit 22, configured to take a convolutional neural network model as a basic model, and train the basic model with a deep learning framework and the training data to obtain a preliminary model;
an optimizing unit 23, configured to verify the accuracy of the preliminary model, and continuously optimize the preliminary model according to the verification result to obtain the character recognition model;
In this embodiment, the verification module 30 includes:
an establishing sub-module 31, configured to acquire the format specification and the business rule of the invoice, and establish the verification rule according to the format specification and the business rule of the invoice;
a verification sub-module 32, configured to verify the character information according to the verification rule and a pre-constructed enterprise database to obtain a verification result;
a marking sub-module 33, configured to mark the character information as abnormal if the verification result indicates that the character information does not conform to the verification rule or cannot be matched with the enterprise database;
an association sub-module 34, configured to, if the verification result indicates that the character information conforms to the verification rule and can be matched with the enterprise database, regard the verification as passed and associate the character information that passes the verification with the logistics order;
In this embodiment, the verification sub-module 32 includes:
a construction unit 321, configured to acquire paper invoice images of different types, and construct a definition judgment model according to the paper invoice images of different types;
a judging unit 322, configured to judge whether the invoice is a paper invoice according to the character information, and if the invoice is a paper invoice, evaluate the unregistered invoice picture with the definition judgment model to obtain a judgment result;
a marking unit 323, configured to mark the corresponding unregistered invoice picture if the judgment result is a fail;
a verification unit 324, configured to verify the character information according to the verification rule and the pre-constructed enterprise database if the judgment result is a pass, so as to obtain a verification result;
In this embodiment, the construction unit 321 includes:
an acquiring subunit 3211, configured to acquire paper invoice images of different types, label whether the definition of each paper invoice image meets the requirements, and form a training set from the paper invoice images and their labels;
a training subunit 3212, configured to take a machine learning algorithm as a basic model, and train the basic model with the training set to obtain the definition judgment model;
an optimizing subunit 3213, configured to formulate an automatic optimization iteration strategy, automatically update the training set according to the automatic optimization iteration strategy, and use the updated training set to optimize and iterate the definition judgment model;
In this embodiment, the verification unit 324 includes:
a construction subunit 3241, configured to acquire key information of the enterprise and its related companies if the judgment result is a pass, where the key information includes a taxpayer identification number and an enterprise name, and construct the enterprise database from the key information;
a first verification subunit 3242, configured to verify the character information according to the verification rule and the enterprise database, where the character information includes taxpayer information, billing date, amount and tax rate, and match the taxpayer information in the character information with the enterprise database to obtain a first verification result;
a second verification subunit 3243, configured to verify the billing date, the amount or the tax rate in the character information according to the verification rule to obtain a second verification result.
With the logistics order invoice registration and identification device described above, the unregistered invoice picture is preprocessed automatically, which improves the quality of the picture. A character recognition model is then constructed and applied to the preprocessed picture, so that the character information in the picture is extracted accurately and efficiently without manual processing, and the invoice content can be registered quickly. Finally, a verification rule is established and the character information is verified against it: character information that fails verification is marked, and character information that passes verification is associated with the logistics order, so that the invoice information is stored.
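The preprocessing chain performed by units 11 to 13 (denoising, histogram equalization, thresholding) can be sketched in plain Python on a small grayscale image. The description names only "a filtering algorithm" and "a thresholding method", so the 3x3 median filter and the fixed threshold of 128 used here are assumptions, and all function names are illustrative.

```python
from typing import List

Image = List[List[int]]  # grayscale pixels in the range 0..255


def median_filter(img: Image) -> Image:
    """Denoise with a 3x3 median filter; border pixels use only the neighbours that exist."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = sorted(
                img[j][i]
                for j in range(max(0, y - 1), min(h, y + 2))
                for i in range(max(0, x - 1), min(w, x + 2))
            )
            out[y][x] = window[len(window) // 2]
    return out


def equalize_histogram(img: Image) -> Image:
    """Enhance contrast by remapping grey levels through the cumulative histogram."""
    hist = [0] * 256
    for row in img:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    cdf, running = [0] * 256, 0
    for level in range(256):
        running += hist[level]
        cdf[level] = running
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:  # single grey level: nothing to equalize
        return [row[:] for row in img]
    scale = 255 / (total - cdf_min)
    return [[round((cdf[p] - cdf_min) * scale) for p in row] for row in img]


def binarize(img: Image, threshold: int = 128) -> Image:
    """Convert to a black-and-white image using a fixed threshold (an assumption)."""
    return [[255 if p >= threshold else 0 for p in row] for row in img]


def preprocess(img: Image, threshold: int = 128) -> Image:
    first = median_filter(img)           # first picture
    second = equalize_histogram(first)   # second picture
    return binarize(second, threshold)   # preprocessed picture
```

In practice an image library would normally replace these loops; the pure-Python form is used here only to keep each step explicit.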
The logistics order invoice registration and identification apparatus in the embodiment of the present invention has been described in detail above from the point of view of modular functional entities; the logistics order invoice registration and identification device in the embodiment of the present invention is described in detail below from the point of view of hardware processing.
Fig. 10 is a schematic structural diagram of a logistics order invoice registration and identification device according to an embodiment of the present invention. The device 900 may vary considerably depending on configuration or performance, and may include one or more central processing units (CPUs) 910 (e.g., one or more processors), a memory 920, and one or more storage media 930 (e.g., one or more mass storage devices) storing application programs 933 or data 932. The memory 920 and the storage medium 930 may be transitory or persistent storage. The program stored on the storage medium 930 may include one or more modules (not shown), each of which may include a series of instruction operations for the logistics order invoice registration and identification device 900. Further, the processor 910 may be configured to communicate with the storage medium 930 and to execute, on the logistics order invoice registration and identification device 900, the series of instruction operations in the storage medium 930, so as to implement the steps of the logistics order invoice registration and identification method provided by the method embodiments described above.
The logistics order invoice registration and identification device 900 may also include one or more power supplies 940, one or more wired or wireless network interfaces 950, one or more input/output interfaces 960, and/or one or more operating systems 931, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, and the like. It will be appreciated by those skilled in the art that the configuration illustrated in Fig. 10 does not limit the logistics order invoice registration and identification device, which may include more or fewer components than illustrated, may combine certain components, or may use a different arrangement of components.
The present invention also provides a computer readable storage medium, which may be a non-volatile or a volatile computer readable storage medium, having stored therein instructions that, when executed on a computer, cause the computer to perform the steps of the method for registering and identifying a logistics order invoice.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, specific working procedures of the apparatus or device described above may refer to corresponding procedures in the foregoing method embodiments, which are not described herein again.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention, in essence, or in whole or in part, may be embodied in the form of a software product stored in a storage medium, including instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the method according to the embodiments of the present invention. The storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
It will be understood that those skilled in the art may make equivalent substitutions or modifications in light of the present invention and its spirit, and all such modifications and substitutions are intended to be included within the scope of the present invention as defined in the following claims.