Disclosure of Invention
The embodiments of the present application aim to provide an information input method and device for invoice files, a computer device, and a storage medium, so as to solve the technical problems that the existing input processing of electronic invoices is not managed intelligently and that the accuracy of the input invoice information cannot be ensured.
To solve the above technical problems, an embodiment of the present application provides an information input method for invoice files, which adopts the following technical solution:
acquiring an uploaded invoice file;
acquiring a target file type corresponding to the invoice file, wherein the target file type is a first preset file type, a second preset file type, or a third preset file type;
acquiring a target invoice analysis strategy corresponding to the target file type;
performing invoice analysis processing on the invoice file based on the target invoice analysis strategy to obtain corresponding invoice information;
filling the invoice information into a preset invoice entity class to obtain an invoice entity class object, and constructing a corresponding invoice entity class list based on the invoice entity class object;
and performing information input processing on the invoice entity class list.
Further, the step of performing invoice analysis processing on the invoice file based on the target invoice analysis strategy to obtain corresponding invoice information specifically includes:
if the target file type is the first preset file type, calling a first analysis component corresponding to the first preset file type;
searching the invoice file, based on the first analysis component, for a coordinate range corresponding to a target keyword;
acquiring a service type of the invoice file, and judging whether a coordinate offset corresponding to the service type exists in a preset cache database;
if no coordinate offset corresponding to the service type exists in the cache database, extracting first invoice content within the coordinate range;
preprocessing the first invoice content to obtain corresponding second invoice content;
and searching the second invoice content, based on a preset regular expression, for the invoice information corresponding to the invoice file.
Further, after the step of obtaining the service type of the invoice file and judging whether the coordinate offset corresponding to the service type exists in the preset cache database, the method further includes:
if a coordinate offset corresponding to the service type exists in the cache database, acquiring the coordinate offset from the cache database;
adjusting the coordinate range based on the coordinate offset to obtain a corresponding target coordinate range;
extracting third invoice content within the target coordinate range;
preprocessing the third invoice content to obtain corresponding fourth invoice content;
and searching the fourth invoice content, based on the regular expression, for the invoice information corresponding to the invoice file.
Further, the step of performing invoice analysis processing on the invoice file based on the target invoice analysis strategy to obtain corresponding invoice information specifically includes:
if the target file type is the second preset file type, calling a second analysis component corresponding to the second preset file type;
decompressing the invoice file based on the second analysis component to obtain a corresponding decompressed file;
acquiring an invoice information label from the decompressed file, and acquiring an invoice information identifier corresponding to the invoice information label;
performing invoice information search processing on the decompressed file based on the invoice information identifier to obtain first invoice information corresponding to the invoice information identifier;
and taking the first invoice information as the invoice information corresponding to the invoice file.
Further, the step of performing invoice analysis processing on the invoice file based on the target invoice analysis strategy to obtain corresponding invoice information specifically includes:
if the target file type is the third preset file type, calling a third analysis component corresponding to the third preset file type;
reading the invoice file based on the third analysis component, and parsing it to obtain the labels in the invoice file;
extracting second invoice information corresponding to the labels based on a preset label path;
and taking the second invoice information as the invoice information corresponding to the invoice file.
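The label-path extraction for the third preset file type can be illustrated with a minimal sketch. The XML layout, the label names, and the paths in `TAG_PATHS` are hypothetical and chosen only for illustration; the application does not specify the invoice schema or the third analysis component, so Python's standard `xml.etree.ElementTree` is used here as a stand-in.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML invoice; a real invoice schema will differ.
SAMPLE_XML = """<Invoice>
  <Header>
    <InvoiceNo>12345678</InvoiceNo>
    <Buyer>Example Buyer Ltd.</Buyer>
  </Header>
  <Summary>
    <TotalAmount>106.00</TotalAmount>
  </Summary>
</Invoice>"""

# Assumed "preset label paths": field name -> path relative to the root element.
TAG_PATHS = {
    "invoice_no": "Header/InvoiceNo",
    "buyer": "Header/Buyer",
    "total_amount": "Summary/TotalAmount",
}


def extract_by_tag_paths(xml_text, tag_paths):
    """Parse the XML text and read the text of each element named by a path."""
    root = ET.fromstring(xml_text)
    info = {}
    for field, path in tag_paths.items():
        value = root.findtext(path)  # None when the path matches no element
        info[field] = value.strip() if value is not None else None
    return info


xml_info = extract_by_tag_paths(SAMPLE_XML, TAG_PATHS)
```

Keeping the paths in a table rather than in code means a new field only requires a new table entry.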
Further, the step of obtaining the target file type corresponding to the invoice file specifically includes:
acquiring a specified byte code of the invoice file;
calling a preset identification rule;
and analyzing the specified byte code based on the identification rule to obtain the target file type of the invoice file.
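One plausible reading of the identification rule is a check of the leading bytes of the file against known signatures. The sketch below assumes that reading; the application does not state which bytes are inspected. The signatures themselves are standard: PDF files begin with `%PDF-`, OFD files are ZIP containers beginning with `PK\x03\x04`, and XML text begins with `<` after an optional byte order mark. The function name `detect_file_type` is hypothetical.

```python
PDF_MAGIC = b"%PDF-"
ZIP_MAGIC = b"PK\x03\x04"
UTF8_BOM = b"\xef\xbb\xbf"


def detect_file_type(head: bytes) -> str:
    """Classify an invoice file from its first bytes (assumed identification rule)."""
    if head.startswith(PDF_MAGIC):
        return "PDF"
    if head.startswith(ZIP_MAGIC):
        # The ZIP signature alone cannot tell OFD from other ZIP-based formats;
        # a stricter rule would also inspect the archive contents.
        return "OFD"
    text = head[len(UTF8_BOM):] if head.startswith(UTF8_BOM) else head
    if text.lstrip().startswith(b"<"):
        return "XML"
    raise ValueError("unsupported invoice file type")
```

In use, only the first few bytes of the uploaded temporary file need to be read, for example `detect_file_type(open(path, "rb").read(16))`.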
Further, the step of performing information input processing on the invoice entity class list specifically includes:
calling a preset target database;
acquiring a preset data transmission mode;
and storing the invoice entity class list into the target database based on the data transmission mode.
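The storage step can be sketched as a batch insert. The application does not name the target database or the data transmission mode, so the example below makes two assumptions: an in-memory SQLite database stands in for the target database, and a single batched transaction stands in for the transmission mode. The table and column names are hypothetical.

```python
import sqlite3


def store_invoices(conn, invoices):
    """Write a list of invoice records in one transaction and return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS invoice ("
        "invoice_no TEXT PRIMARY KEY, buyer TEXT, seller TEXT, amount TEXT)"
    )
    with conn:  # commits on success, rolls back if any insert fails
        conn.executemany(
            "INSERT INTO invoice (invoice_no, buyer, seller, amount) "
            "VALUES (:invoice_no, :buyer, :seller, :amount)",
            invoices,
        )
    return conn.execute("SELECT COUNT(*) FROM invoice").fetchone()[0]


conn = sqlite3.connect(":memory:")
stored = store_invoices(conn, [
    {"invoice_no": "12345678", "buyer": "Buyer A", "seller": "Seller A", "amount": "106.00"},
    {"invoice_no": "87654321", "buyer": "Buyer B", "seller": "Seller B", "amount": "53.00"},
])
```

Wrapping the batch in one transaction means a failed record leaves the database unchanged instead of storing a partial list.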
To solve the above technical problems, an embodiment of the present application further provides an information input device for invoice files, which adopts the following technical solution:
a first acquisition module, configured to acquire an uploaded invoice file;
a second acquisition module, configured to acquire a target file type corresponding to the invoice file, wherein the target file type is a first preset file type, a second preset file type, or a third preset file type;
a third acquisition module, configured to acquire a target invoice analysis strategy corresponding to the target file type;
an analysis module, configured to perform invoice analysis processing on the invoice file based on the target invoice analysis strategy to obtain corresponding invoice information;
a processing module, configured to fill the invoice information into a preset invoice entity class to obtain an invoice entity class object, and to construct a corresponding invoice entity class list based on the invoice entity class object;
and an input module, configured to perform information input processing on the invoice entity class list.
To solve the above technical problems, an embodiment of the present application further provides a computer device, which adopts the following technical solution:
acquiring an uploaded invoice file;
acquiring a target file type corresponding to the invoice file, wherein the target file type is a first preset file type, a second preset file type, or a third preset file type;
acquiring a target invoice analysis strategy corresponding to the target file type;
performing invoice analysis processing on the invoice file based on the target invoice analysis strategy to obtain corresponding invoice information;
filling the invoice information into a preset invoice entity class to obtain an invoice entity class object, and constructing a corresponding invoice entity class list based on the invoice entity class object;
and performing information input processing on the invoice entity class list.
To solve the above technical problems, an embodiment of the present application further provides a computer-readable storage medium, which adopts the following technical solution:
acquiring an uploaded invoice file;
acquiring a target file type corresponding to the invoice file, wherein the target file type is a first preset file type, a second preset file type, or a third preset file type;
acquiring a target invoice analysis strategy corresponding to the target file type;
performing invoice analysis processing on the invoice file based on the target invoice analysis strategy to obtain corresponding invoice information;
filling the invoice information into a preset invoice entity class to obtain an invoice entity class object, and constructing a corresponding invoice entity class list based on the invoice entity class object;
and performing information input processing on the invoice entity class list.
Compared with the prior art, the embodiment of the application has the following main beneficial effects:
The method first acquires an uploaded invoice file, then acquires the target file type corresponding to the invoice file and the target invoice analysis strategy corresponding to that file type. It then performs invoice analysis processing on the invoice file based on the target invoice analysis strategy to obtain the corresponding invoice information, fills the invoice information into a preset invoice entity class to obtain an invoice entity class object, constructs a corresponding invoice entity class list based on the invoice entity class object, and finally performs information input processing on the invoice entity class list. Because the invoice analysis strategy is selected according to the file type, the invoice information of invoice files of different file types can be parsed automatically. This realizes intelligent management of the information input of invoice files of all the different file types, effectively improves the efficiency of extracting and inputting invoice information, and ensures the accuracy of the obtained invoice information.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terms used in the description are for the purpose of describing particular embodiments only and are not intended to limit the application. The terms "comprising" and "having", and any variations thereof, in the description, the claims, and the above description of the drawings are intended to cover non-exclusive inclusion. The terms "first", "second", and the like in the description, the claims, and the above-described figures are used to distinguish between different objects and not necessarily to describe a sequential or chronological order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
In order to make the person skilled in the art better understand the solution of the present application, the technical solution of the embodiment of the present application will be clearly and completely described below with reference to the accompanying drawings.
As shown in fig. 1, the system architecture 100 may include a terminal device 101, a network 102, and a server 103, where the terminal device 101 may be a notebook 1011, a tablet 1012, or a cell phone 1013. Network 102 is the medium used to provide communication links between terminal device 101 and server 103. Network 102 may include various connection types such as wired, wireless communication links, or fiber optic cables, among others.
A user may interact with the server 103 via the network 102 using the terminal device 101 to receive or send messages or the like. Various communication client applications, such as a web browser application, a shopping class application, a search class application, an instant messaging tool, a mailbox client, social platform software, and the like, may be installed on the terminal device 101.
The terminal device 101 may be any of various electronic devices that have a display screen and support web browsing. In addition to the notebook 1011, the tablet 1012, or the mobile phone 1013, the terminal device 101 may be an electronic book reader, an MP3 (MPEG-1 Audio Layer III) player, an MP4 (MPEG-4 Part 14) player, a laptop computer, a desktop computer, and the like.
The server 103 may be a server providing various services, such as a background server providing support for pages displayed on the terminal device 101.
It should be noted that, the method for inputting information of an invoice file provided by the embodiment of the present application is generally executed by a server/terminal device, and correspondingly, the device for inputting information of an invoice file is generally disposed in the server/terminal device.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow chart of one embodiment of a method of information entry of an invoice file in accordance with the present application is shown. The order of the steps in the flowchart may be changed and some steps may be omitted according to various needs. The method for inputting the information of the invoice file provided by the embodiment of the application can be applied to any scene needing to input the information of the invoice file, and can be applied to products in the scenes, such as the information input of the invoice file in the field of financial insurance. The information input method of the invoice file comprises the following steps:
Step S201, acquiring the uploaded invoice file.
In this embodiment, the electronic device on which the information input method of the invoice file runs (for example, the server/terminal device shown in fig. 1) may acquire the uploaded invoice file through a wired or wireless connection. It should be noted that the wireless connection may include, but is not limited to, a 3G/4G/5G connection, a WiFi connection, a Bluetooth connection, a WiMAX connection, a Zigbee connection, a UWB (ultra wideband) connection, and other wireless connections now known or developed in the future. The execution subject of the present application may be an invoice processing system, referred to below simply as the system. The user uploads the invoice files to be processed through the system interface, and the system receives each invoice file and stores it as a temporary file. More than one invoice file may be uploaded.
In the business scenario of the financial field, the invoice file may be a banking invoice (such as a deposit or withdrawal certificate, a transfer certificate, or a loan contract), an insurance business invoice (such as a policy or a premium invoice), a securities business invoice (such as a trade confirmation or a commission invoice), and the like. In the business scenario of the medical field, the invoice file may be a clinic charge bill, a hospitalization invoice, a medicine sales invoice, and the like.
Step S202, obtaining a target file type corresponding to the invoice file, wherein the target file type comprises a first preset file type, a second preset file type or a third preset file type.
In this embodiment, the target file types may specifically include a first preset file type (PDF file type), a second preset file type (OFD file type), and a third preset file type (XML file type). The above specific implementation process of obtaining the target file type corresponding to the invoice file will be described in further detail in the following specific embodiments, which will not be described herein.
Step S203, a target invoice analysis strategy corresponding to the target file type is obtained.
In this embodiment, an invoice analysis strategy is constructed in advance for each file type, so that an invoice file of a given file type is handled by the strategy built for that type. The invoice analysis logic of each specific invoice analysis strategy is described in further detail in the subsequent specific embodiments and is not elaborated here.
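The correspondence between file types and pre-constructed analysis strategies can be sketched as a lookup table of callables. This is only an illustration of the dispatch idea: the three parser functions are hypothetical placeholders, and the type names "PDF", "OFD", and "XML" follow the file types named in this embodiment.

```python
from typing import Callable, Dict


def parse_pdf(data: bytes) -> dict:
    return {"source": "pdf"}  # placeholder for the PDF analysis logic


def parse_ofd(data: bytes) -> dict:
    return {"source": "ofd"}  # placeholder for the OFD analysis logic


def parse_xml(data: bytes) -> dict:
    return {"source": "xml"}  # placeholder for the XML analysis logic


# Pre-constructed mapping: target file type -> invoice analysis strategy.
PARSING_STRATEGIES: Dict[str, Callable[[bytes], dict]] = {
    "PDF": parse_pdf,
    "OFD": parse_ofd,
    "XML": parse_xml,
}


def get_parsing_strategy(file_type: str) -> Callable[[bytes], dict]:
    """Return the strategy registered for the file type, or fail loudly."""
    try:
        return PARSING_STRATEGIES[file_type]
    except KeyError:
        raise ValueError(f"no parsing strategy for file type {file_type!r}") from None
```

With this layout, supporting a further file type means registering one more entry; the calling code does not change.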
Step S204, performing invoice analysis processing on the invoice file based on the target invoice analysis strategy to obtain corresponding invoice information.
In this embodiment, the specific implementation of performing invoice analysis processing on the invoice file based on the target invoice analysis strategy to obtain the corresponding invoice information is described in further detail in the subsequent specific embodiments and is not elaborated here.
Step S205, the invoice information is filled into a preset invoice entity class to obtain an invoice entity class object, and a corresponding invoice entity class list is constructed based on the invoice entity class object.
In this embodiment, an invoice entity class is predefined and includes all the relevant information fields of an invoice (such as the invoice number, invoice buyer, invoice seller, invoice amount, tax-free amount, etc.). For each successfully parsed invoice file, the system fills the invoice information obtained by parsing into the invoice entity class to obtain a corresponding invoice entity class object, and stores that object in a list to obtain the invoice entity class list. The invoice entity class list may then be returned to a subsequent processing module or to a business staff interface of the system for further operations (such as auditing, storage, printing, etc.).
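A minimal sketch of the entity class and the list construction follows. The field names mirror the fields listed in this embodiment; the class name `InvoiceEntity`, the use of strings for all values, and the convention that a failed parse is represented by `None` are assumptions made for illustration.

```python
from dataclasses import dataclass, fields
from typing import Dict, List, Optional


@dataclass
class InvoiceEntity:
    invoice_no: Optional[str] = None
    buyer: Optional[str] = None
    seller: Optional[str] = None
    amount: Optional[str] = None
    tax_free_amount: Optional[str] = None


def build_invoice_list(parsed: List[Optional[Dict[str, str]]]) -> List[InvoiceEntity]:
    """Fill one entity object per successfully parsed file and collect them."""
    known = {f.name for f in fields(InvoiceEntity)}
    invoices = []
    for info in parsed:
        if not info:  # assumed convention: None or {} marks a failed parse
            continue
        # Ignore keys the entity class does not define.
        invoices.append(InvoiceEntity(**{k: v for k, v in info.items() if k in known}))
    return invoices


invoice_list = build_invoice_list([
    {"invoice_no": "12345678", "amount": "106.00", "unknown_field": "x"},
    None,
    {"invoice_no": "87654321", "buyer": "Example Buyer Ltd."},
])
```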
Step S206, performing information input processing on the invoice entity class list.
In this embodiment, the specific implementation of performing information input processing on the invoice entity class list is described in further detail in the subsequent specific embodiments and is not elaborated here.
In this embodiment, an uploaded invoice file is acquired first, followed by the target file type corresponding to the invoice file and the target invoice analysis strategy corresponding to that file type. Invoice analysis processing is then performed on the invoice file based on the target invoice analysis strategy to obtain the corresponding invoice information, the invoice information is filled into a preset invoice entity class to obtain an invoice entity class object, a corresponding invoice entity class list is constructed based on the invoice entity class object, and finally information input processing is performed on the invoice entity class list. Because the invoice analysis strategy is selected according to the file type, the invoice information of invoice files of different file types can be parsed automatically. This realizes intelligent management of the information input of invoice files of all the different file types, effectively improves the efficiency of extracting and inputting invoice information, and ensures the accuracy of the obtained invoice information.
In some alternative implementations, step S204 includes the steps of:
If the target file type is the first preset file type, a first analysis component corresponding to the first preset file type is called.
In this embodiment, the first preset file type is the PDF file type. The first analysis component is a pre-constructed analysis module for parsing the invoice information of invoices of the PDF file type; specifically, the pdfbox package may be used.
Based on the first analysis component, a coordinate range corresponding to the target keyword is searched out from the invoice file.
In this embodiment, the target keywords may include all the specific Chinese keywords for the invoice information that the business needs to extract, including at least the invoice verification code, the invoice number, and so on. Specifically, the first analysis component is used to find the target keywords contained in the invoice file, and the coordinate range of the invoice information is then determined from the positions of those keywords.
The service type of the invoice file is acquired, and it is judged whether a coordinate offset corresponding to the service type exists in a preset cache database.
In this embodiment, the service type refers to the kind of invoice that the invoice file represents, such as a value-added tax invoice, a general invoice, or a special invoice. Invoices of different service types may have different formats and layouts and may therefore require different coordinate offsets. The cache database may specifically be a Redis database. For special cases in which the system cannot accurately parse the invoice information in a PDF file (for example, minor layout differences between the invoice files of different provinces and cities), a coordinate offset is stored in the Redis database in advance, so that the coordinate range of the invoice information can be shifted automatically to fit the invoice being parsed.
If no coordinate offset corresponding to the service type exists in the cache database, the first invoice content within the coordinate range is extracted.
In this embodiment, if it is detected that the cache database does not contain a coordinate offset corresponding to the service type, the first invoice content within the coordinate range is extracted directly. Specifically, the coordinates of the target keywords and of the boundaries corresponding to the coordinate range define a set of rectangular frames, and the invoice content inside each rectangle is parsed to obtain the first invoice content.
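The rectangle-based extraction can be sketched independently of any PDF library. The example assumes the analysis component has already produced a list of text fragments with page coordinates; the `Word` and `Rect` types, the coordinate convention, and the sample values are hypothetical.

```python
from typing import List, NamedTuple


class Word(NamedTuple):
    text: str
    x: float  # horizontal position of the fragment on the page
    y: float  # vertical position of the fragment on the page


class Rect(NamedTuple):
    x0: float
    y0: float
    x1: float
    y1: float


def text_in_rect(words: List[Word], rect: Rect) -> str:
    """Join, in reading order, the fragments whose position lies inside the rectangle."""
    inside = [
        w for w in words
        if rect.x0 <= w.x <= rect.x1 and rect.y0 <= w.y <= rect.y1
    ]
    inside.sort(key=lambda w: (w.y, w.x))  # top to bottom, then left to right
    return " ".join(w.text for w in inside)


sample_words = [
    Word("12345678", 130, 100),
    Word("Invoice", 50, 100),
    Word("No", 90, 100),
    Word("Total", 50, 300),
]
first_content = text_in_rect(sample_words, Rect(40, 90, 200, 110))
```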
The first invoice content is preprocessed to obtain corresponding second invoice content.
In this embodiment, the preprocessing may include noise removal, contrast adjustment, and the like; preprocessing the first invoice content improves the data accuracy of the resulting second invoice content.
Invoice information corresponding to the invoice file is searched out from the second invoice content based on a preset regular expression.
In this embodiment, if the parsed content (the second invoice content) contains redundant data, the required characters can be extracted by matching with a regular expression. Specifically, invoice information such as the invoice number, invoice buyer, invoice seller, invoice amount, and tax-free amount can be searched out from the second invoice content using the regular expression.
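The regular-expression step can be illustrated as follows. The patterns and the English field labels are hypothetical; the application does not disclose its preset expressions, and real invoices would typically be matched against Chinese keywords.

```python
import re

# Assumed patterns: each captures the value that follows a field label.
PATTERNS = {
    "invoice_no": re.compile(r"Invoice\s*No\.?\s*[:：]?\s*(\d{8,20})"),
    "amount": re.compile(r"Amount\s*[:：]?\s*[¥￥]?\s*(\d+(?:\.\d{1,2})?)"),
}


def extract_fields(text: str) -> dict:
    """Return the first match of each pattern, or None when a field is absent."""
    result = {}
    for field, pattern in PATTERNS.items():
        match = pattern.search(text)
        result[field] = match.group(1) if match else None
    return result


sample_text = "## Invoice No: 12345678 ## Amount: ¥106.00 (page 1)"
regex_info = extract_fields(sample_text)
```

Capturing only the value group discards the surrounding redundant characters, which is the purpose described in this step.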
In this embodiment, when the target file type is detected to be the first preset file type, the first analysis component corresponding to the first preset file type is called automatically to search the invoice file for the coordinate range corresponding to the target keyword. It is then judged whether a coordinate offset corresponding to the service type of the invoice file exists in the preset cache database. When no coordinate offset corresponding to the service type exists in the cache database, the first invoice content within the coordinate range is extracted and preprocessed to obtain the corresponding second invoice content, and the invoice information corresponding to the invoice file is then searched out from the second invoice content quickly and accurately using the preset regular expression. This effectively improves the efficiency of extracting invoice information and ensures the accuracy of the obtained invoice information.
In some optional implementations of this embodiment, after the step of acquiring the service type of the invoice file and judging whether a coordinate offset corresponding to the service type exists in the preset cache database, step S204 further includes the following steps:
If a coordinate offset corresponding to the service type exists in the cache database, the coordinate offset is acquired from the cache database.
In this embodiment, if it is detected that the cache database contains a coordinate offset corresponding to the service type of the invoice file, it is determined that invoice files of this service type are a special case in which the invoice information might otherwise not be parsed accurately, and the coordinate offset is acquired from the cache database. The coordinate offset is stored in the cache database in advance so that the offset can be applied automatically to adapt the parsing to different invoice files.
The coordinate range is adjusted based on the coordinate offset to obtain a corresponding target coordinate range.
In this embodiment, the coordinate range is adjusted according to the acquired coordinate offset, so that the adjusted target coordinate range fits the parsing of the invoice information of invoice files of this service type.
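The cache lookup and the adjustment of the coordinate range can be sketched together. Two assumptions are made here: a plain dictionary stands in for the Redis cache database, and the coordinate offset is a simple translation `(dx, dy)` applied to all four edges of the rectangle. The service type key and the offset value are hypothetical.

```python
from typing import Dict, Tuple

Rect = Tuple[float, float, float, float]  # (x0, y0, x1, y1)
Offset = Tuple[float, float]              # (dx, dy), assumed to be a translation

# Stand-in for the cache database: service type -> stored coordinate offset.
OFFSET_CACHE: Dict[str, Offset] = {"vat_special": (0.0, -12.5)}


def resolve_rect(rect: Rect, service_type: str, cache: Dict[str, Offset]) -> Rect:
    """Shift the rectangle by the cached offset, or return it unchanged if none exists."""
    offset = cache.get(service_type)
    if offset is None:
        return rect  # no special case recorded for this service type
    dx, dy = offset
    x0, y0, x1, y1 = rect
    return (x0 + dx, y0 + dy, x1 + dx, y1 + dy)


target_rect = resolve_rect((100.0, 200.0, 300.0, 220.0), "vat_special", OFFSET_CACHE)
```

Storing the offset as data rather than code lets a layout difference be corrected by writing a new cache entry, without redeploying the parser.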
The third invoice content within the target coordinate range is extracted.
In this embodiment, the coordinates of the target keywords and of the boundaries corresponding to the target coordinate range define a set of rectangular frames, and the invoice content inside each rectangle is parsed to obtain the third invoice content.
The third invoice content is preprocessed to obtain corresponding fourth invoice content.
In this embodiment, the preprocessing may include noise removal, contrast adjustment, and the like; preprocessing the third invoice content improves the data accuracy of the resulting fourth invoice content.
Invoice information corresponding to the invoice file is searched out from the fourth invoice content based on the regular expression.
In this embodiment, invoice information such as the invoice number, invoice buyer, invoice seller, invoice amount, and tax-free amount can be searched out from the fourth invoice content using the regular expression.
In this embodiment, when a coordinate offset corresponding to the service type is detected in the cache database, the coordinate offset is acquired from the cache database and the coordinate range is adjusted based on it to obtain the corresponding target coordinate range. The third invoice content within the target coordinate range is then extracted and preprocessed to obtain the corresponding fourth invoice content, and the invoice information corresponding to the invoice file is searched out from the fourth invoice content quickly and accurately using the preset regular expression. This effectively improves the efficiency of extracting invoice information and ensures the accuracy of the obtained invoice information.
In some alternative implementations, step S204 includes the steps of:
If the target file type is the second preset file type, a second analysis component corresponding to the second preset file type is called.
In this embodiment, the second preset file type is the OFD (Open Fixed-layout Document) file type. The second analysis component is a pre-constructed analysis module for parsing the invoice information of invoices of the OFD file type; specifically, an ofdreader package may be used.
The invoice file is decompressed based on the second analysis component to obtain a corresponding decompressed file.
In this embodiment, the second analysis component may be used to decompress the invoice file into a temporary directory. The temporary directory contains the corresponding decompressed file, which includes the invoice information labels, the id data (invoice information identifiers) corresponding to the invoice information, and the values (i.e., the invoice information) corresponding to the id data. The id data of the invoice information corresponds to the invoice information labels. An invoice information label may be the English label of an item of invoice information.
An invoice information label is acquired from the decompressed file, and the invoice information identifier corresponding to the invoice information label is acquired.
In this embodiment, the invoice information label is acquired from the decompressed file, and the corresponding id data, that is, the invoice information identifier corresponding to the invoice information label, is then found in the decompressed file according to the label.
Performing invoice information searching processing on the decompressed file based on the invoice information identifier to obtain first invoice information corresponding to the invoice information identifier;
In this embodiment, after the invoice information identifier is obtained, the corresponding content of the invoice information identifier (id data) is found from the decompressed file by the invoice information identifier, so that the corresponding first invoice information can be obtained.
For example, the corresponding id data can be obtained according to the English tag of the invoice number (InvoiceNo), and the specific invoice number content can then be resolved according to the id data.
And taking the first invoice information as invoice information corresponding to the invoice file.
In this embodiment, after the extraction processing of the invoice information of the invoice file is completed, the decompressed file may be further deleted, so as to avoid the memory occupation of the useless file on the system, and further improve the usability of the system.
In this embodiment, if the target file type is detected to be the second preset file type, the second analysis component corresponding to the second preset file type is called automatically, and the invoice file is decompressed based on the second analysis component to obtain the corresponding decompressed file. An invoice information tag is then obtained from the decompressed file, together with the invoice information identifier corresponding to that tag. Invoice information searching processing is subsequently performed on the decompressed file based on the invoice information identifier to obtain the first invoice information corresponding to the identifier, and the first invoice information is finally taken as the invoice information corresponding to the invoice file. In this way, the invoice information of the invoice file can be extracted automatically and accurately, the extraction efficiency of the invoice information is effectively improved, and the accuracy of the obtained invoice information is ensured.
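As a non-limiting illustration, the decompress, tag-to-identifier, and identifier-to-value sequence described above might be sketched as follows using only the Python standard library in place of the ofdreader package. The single member file `fields.xml` and its `Tags`/`Values` layout are hypothetical and chosen only to keep the example short; the layout of a real OFD package is more involved and is not reproduced here.

```python
import io
import os
import tempfile
import zipfile
import xml.etree.ElementTree as ET


def read_ofd_field(data: bytes, tag: str, member: str = "fields.xml"):
    """Unpack the archive, resolve tag -> id -> value, then clean up."""
    # TemporaryDirectory deletes the unpacked files when the block exits,
    # mirroring the deletion of the decompressed file after extraction.
    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            archive.extractall(tmp)
        root = ET.parse(os.path.join(tmp, member)).getroot()
        # Step 1: the tag entry carries the identifier of the field.
        tag_node = root.find("./Tags/" + tag)
        if tag_node is None:
            return None
        field_id = tag_node.get("id")
        # Step 2: the identifier selects the element holding the value.
        for node in root.iter("Value"):
            if node.get("id") == field_id:
                return node.text
    return None


def make_sample() -> bytes:
    """Build a tiny archive in the hypothetical layout, for demonstration."""
    xml = (
        "<Invoice><Tags><InvoiceNo id='7'/></Tags>"
        "<Values><Value id='7'>12345678</Value></Values></Invoice>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr("fields.xml", xml)
    return buf.getvalue()
```

For example, `read_ofd_field(make_sample(), "InvoiceNo")` resolves the InvoiceNo tag to identifier 7 and then returns the value stored under that identifier.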
In some alternative implementations, step S204 includes the steps of:
And if the target file type is a third preset file type, calling a third analysis component corresponding to the third preset file type.
In this embodiment, the third preset file type is an XML file type. The third parsing component is a pre-built parsing module for parsing invoice information corresponding to an invoice belonging to an XML file type, and may specifically use a dom4j package.
And reading the invoice file based on the third analysis component, and analyzing to obtain the label in the invoice file.
In this embodiment, the third analysis component may be used to read the invoice file and parse out the fixed tags (i.e., the labels) in the invoice file.
And extracting second invoice information corresponding to the label based on a preset label path.
In this embodiment, the value corresponding to a tag may be obtained directly according to a predefined tag path (e.g., /Invoice/InvoiceNo), so as to obtain the corresponding second invoice information, such as the invoice number, invoice code, purchaser name, purchaser tax information, seller name, and seller tax information.
And taking the second invoice information as invoice information corresponding to the invoice file.
In this embodiment, if the target file type is detected to be the third preset file type, the third analysis component corresponding to the third preset file type is called automatically, the invoice file is read based on the third analysis component, and the labels in the invoice file are obtained by parsing. The second invoice information corresponding to the labels is then extracted based on the preset label path and taken as the invoice information corresponding to the invoice file. In this way, the invoice information of the invoice file can be extracted automatically and accurately, the extraction efficiency of the invoice information is effectively improved, and the accuracy of the obtained invoice information is ensured.
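As a non-limiting illustration, extraction by preset label path might be sketched as follows. The Python standard library module `xml.etree.ElementTree` is used here as a stand-in for the dom4j package, and every path other than /Invoice/InvoiceNo is a hypothetical example.

```python
import xml.etree.ElementTree as ET

# Hypothetical label paths; only /Invoice/InvoiceNo appears in the text above.
TAG_PATHS = {
    "invoice_no": "/Invoice/InvoiceNo",
    "invoice_code": "/Invoice/InvoiceCode",
}


def extract_fields(xml_text: str, paths: dict = TAG_PATHS) -> dict:
    """Return the text found at each preset label path, or None if absent."""
    root = ET.fromstring(xml_text)
    result = {}
    for name, path in paths.items():
        parts = path.strip("/").split("/")
        # The first path segment must name the root element itself.
        if parts[0] != root.tag:
            result[name] = None
            continue
        relative = "/".join(parts[1:])
        node = root.find(relative) if relative else root
        result[name] = node.text if node is not None else None
    return result
```

A label that is missing from the file yields None rather than raising an error, so that a later data verification step can decide how to treat the incomplete record.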
In some alternative implementations of the present embodiment, step S202 includes the steps of:
And acquiring the specified byte code of the invoice file.
In this embodiment, the specified byte code refers to the prefix of the invoice file, also called the file signature. A file signature is a series of specific bytes at the beginning of a file that is used to identify the file type.
And calling a preset identification rule.
In this embodiment, the identification rule includes the following: (1) For the PDF file type (the first preset file type), the file signature is 25 50 44 46 (i.e., %PDF), which is the first four bytes of a PDF file; this fixed signature allows PDF files to be identified quickly. (2) The OFD (Open Fixed-layout Document) file type (the second preset file type) is an electronic document format that is typically based on ZIP compression, so the signature of an OFD file is similar to that of a ZIP file, and the file also contains additional identification information. (3) For the XML file type (the third preset file type), there is no fixed file signature because XML is a plain-text format; however, if the XML file is packaged in some way (e.g., ZIP-compressed), it may be identified by checking for the ZIP file signature (typically 50 4B 03 04).
And analyzing and processing the specified byte code based on the identification rule to obtain the target file type of the invoice file.
In this embodiment, the analysis processing for the specified byte code may be performed according to the identification steps included in the above-described identification rule, so that the target file type of the invoice file is determined according to the obtained analysis content.
In this embodiment, the specified byte code of the invoice file is acquired, the preset identification rule is called, and the specified byte code is analyzed and processed based on the identification rule to obtain the target file type of the invoice file. Because the file type is determined from the file signature according to the identification rule, the target file type of the invoice file can be obtained efficiently, and the accuracy of the obtained target file type is effectively ensured.
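As a non-limiting illustration, the identification rule might be sketched as follows. The returned type names are hypothetical, and the check for a root entry named OFD.xml is an assumption about the "additional identification information" mentioned above; the embodiment does not say how an OFD package is told apart from other ZIP archives.

```python
import io
import zipfile

PDF_SIGNATURE = b"%PDF"          # bytes 25 50 44 46
ZIP_SIGNATURE = b"PK\x03\x04"    # bytes 50 4B 03 04


def detect_type(data: bytes) -> str:
    """Classify an uploaded file by its leading bytes."""
    if data.startswith(PDF_SIGNATURE):
        return "pdf"
    if data.startswith(ZIP_SIGNATURE):
        try:
            with zipfile.ZipFile(io.BytesIO(data)) as archive:
                names = archive.namelist()
        except zipfile.BadZipFile:
            return "unknown"
        # Assumption: an OFD package carries a root entry named OFD.xml;
        # any other ZIP archive is reported as a plain "zip" container.
        return "ofd" if "OFD.xml" in names else "zip"
    # XML has no fixed signature; skip an optional UTF-8 byte order mark
    # and leading whitespace, then look for the opening angle bracket.
    if data.lstrip(b"\xef\xbb\xbf \t\r\n").startswith(b"<"):
        return "xml"
    return "unknown"
```

A file reported as "zip" would need to be unpacked and inspected further before it could be treated as a packaged XML invoice.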
In some alternative implementations of the present embodiment, step S206 includes the steps of:
And calling a preset target database.
In this embodiment, the target database is a database in the system for storing invoice information of the extracted invoice file.
And acquiring a preset data transmission mode.
In this embodiment, the choice of data transmission mode is not specifically limited and may be made according to actual service requirements; for example, a data transmission mode based on a wired or wireless communication technology (such as Wi-Fi, Bluetooth, LoRa, etc.) may be adopted.
And storing the invoice entity class list into the target database based on the data transmission mode.
In this embodiment, before the invoice entity class list is input into the target database of the system, data verification is performed on the invoice entity class list to ensure the accuracy and integrity of the data it contains. Specifically, the verification may include checking whether all required fields of the invoice entity class list are filled, verifying whether the formats of fields such as the invoice number and invoice code meet the regulations, verifying the reasonableness of numerical fields such as the amount and the tax, and performing additional business logic verification, such as checking the consistency of the purchaser and seller information.
After the invoice entity class list passes the data verification, it is stored into the target database using the data transmission mode. Specifically, the target database involves one or more database tables, each of which corresponds to one aspect of the invoice (e.g., invoice header, invoice items, etc.). When the invoice entity class list is stored, the system executes a database insert operation that writes the invoice information in the list into the corresponding tables. In addition, the system supports transaction processing, so the whole input process can be wrapped in a single transaction to ensure the atomicity and consistency of the data.
In this embodiment, the preset target database is called, the preset data transmission mode is acquired, and the invoice entity class list is stored into the target database based on the data transmission mode. In this way, the system input processing of the invoice information contained in the invoice entity class list can be completed efficiently and accurately for use in subsequent business operations, and the integrity and security of the invoice information are effectively ensured.
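As a non-limiting illustration, the data verification and transactional storage described above might be sketched as follows. SQLite is used here purely as a stand-in for the target database, and the `Invoice` fields, the `invoice_header` table, and the specific validation thresholds are all hypothetical.

```python
import re
import sqlite3
from dataclasses import dataclass
from typing import List


@dataclass
class Invoice:
    """Hypothetical invoice entity class with a few header fields."""
    number: str
    code: str
    amount: float
    tax: float


SCHEMA = (
    "CREATE TABLE invoice_header ("
    "number TEXT PRIMARY KEY, code TEXT, amount REAL, tax REAL)"
)


def validate(inv: Invoice) -> List[str]:
    """Return a list of verification errors; an empty list means valid."""
    errors = []
    if not inv.number or not inv.code:
        errors.append("required field missing")
    if inv.number and not re.fullmatch(r"\d{8,20}", inv.number):
        errors.append("invoice number format")
    if inv.amount < 0 or inv.tax < 0 or inv.tax > inv.amount:
        errors.append("amount or tax out of range")
    return errors


def store(conn: sqlite3.Connection, invoices: List[Invoice]) -> int:
    """Verify the whole list, then insert it inside one transaction."""
    bad = [e for inv in invoices for e in validate(inv)]
    if bad:
        raise ValueError(bad)
    # Used as a context manager, the connection commits on success and
    # rolls back on an exception, so the batch is written atomically.
    with conn:
        conn.executemany(
            "INSERT INTO invoice_header (number, code, amount, tax) "
            "VALUES (?, ?, ?, ?)",
            [(i.number, i.code, i.amount, i.tax) for i in invoices],
        )
    return len(invoices)
```

In this sketch a duplicate invoice number anywhere in the list causes the whole batch to be rolled back, so the table never holds a partially written list.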
In some alternative implementations, user information is obtained only with the consent of the user and in compliance with the relevant laws and policies.
In addition, the third-party software tools or components appearing in the embodiments of the present application are presented by way of example only and do not represent actual use.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and the sequence numbers should not limit the implementation process of the embodiments of the present application in any way.
It should be emphasized that, to further ensure the privacy and security of the invoice information, the invoice information may also be stored in a blockchain node.
A blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated in association with one another using cryptographic methods, where each data block contains information about a batch of network transactions and is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. A blockchain may include a blockchain underlying platform, a platform product services layer, an application services layer, and the like.
The embodiments of the application can acquire and process the related data based on artificial intelligence technology. Artificial intelligence (AI) is the theory, method, technique, and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend, and expand human intelligence, sense the environment, acquire knowledge, and use knowledge to obtain optimal results.
Artificial intelligence infrastructure technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and other directions.
Those skilled in the art will appreciate that all or part of the methods described above may be implemented by computer readable instructions stored in a computer readable storage medium, and that the instructions, when executed, may comprise the steps of the method embodiments described above. The storage medium may be a nonvolatile storage medium such as a magnetic disk, an optical disk, or a read-only memory (ROM), or may be a random access memory (RAM).
It should be understood that, although the steps in the flowcharts of the figures are shown in order as indicated by the arrows, these steps are not necessarily performed in order as indicated by the arrows. The steps are not strictly limited in order and may be performed in other orders, unless explicitly stated herein. Moreover, at least some of the steps in the flowcharts of the figures may include a plurality of sub-steps or stages that are not necessarily performed at the same time, but may be performed at different times, the order of their execution not necessarily being sequential, but may be performed in turn or alternately with other steps or at least a portion of the other steps or stages.
With further reference to fig. 3, as an implementation of the method shown in fig. 2, the present application provides an embodiment of an information input apparatus for an invoice file, where the embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 2, and the apparatus may be specifically applied to various electronic devices.
As shown in fig. 3, the information input device 300 of an invoice file according to the present embodiment includes a first obtaining module 301, a second obtaining module 302, a third obtaining module 303, an analyzing module 304, a processing module 305, and an input module 306. Wherein:
a first obtaining module 301, configured to obtain an uploaded invoice file;
A second obtaining module 302, configured to obtain a target file type corresponding to the invoice file;
A third obtaining module 303, configured to obtain a target invoice resolution policy corresponding to the target file type;
The parsing module 304 is configured to perform invoice parsing processing on the invoice file based on the target invoice parsing policy, so as to obtain corresponding invoice information;
The processing module 305 is configured to fill the invoice information into a preset invoice entity class to obtain an invoice entity class object, and construct a corresponding invoice entity class list based on the invoice entity class object;
And the input module 306 is used for inputting information into the invoice entity class list.
In this embodiment, the operations performed by the modules or units respectively correspond to the steps of the method for inputting information of an invoice file in the foregoing embodiment one by one, which is not described herein again.
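As a non-limiting illustration, the cooperation of the modules listed above (type detection, strategy selection, parsing, and construction of the entity class list) might be sketched as follows. The `InvoiceEntity` class with a single field, the `detect` callable, and the `strategies` mapping are hypothetical stand-ins for the second obtaining module, the third obtaining module, and the parsing module.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class InvoiceEntity:
    """Hypothetical invoice entity class with a single field."""
    invoice_no: str


# A parsing strategy maps raw file bytes to a dict of invoice information.
Parser = Callable[[bytes], Dict[str, str]]


def build_entity_list(files: Iterable[bytes],
                      detect: Callable[[bytes], str],
                      strategies: Dict[str, Parser]) -> List[InvoiceEntity]:
    """Select a parsing strategy per file type and collect entity objects."""
    entities = []
    for data in files:
        file_type = detect(data)
        parser = strategies.get(file_type)
        if parser is None:
            raise ValueError("no parsing strategy for type %r" % file_type)
        # Fill the parsed invoice information into the entity class.
        entities.append(InvoiceEntity(**parser(data)))
    return entities
```

Keeping the strategies in a mapping keyed by file type means that support for a further file type can be added by registering one more parser, without changing the loop itself.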
In some alternative implementations of the present embodiment, the parsing module 304 includes:
The first calling sub-module is used for calling a first analysis component corresponding to a first preset file type if the target file type is the first preset file type;
The first searching sub-module is used for searching a coordinate range corresponding to the target keyword from the invoice file based on the first analyzing component;
the judging sub-module is used for acquiring the service type of the invoice file and judging whether the coordinate offset corresponding to the service type exists in a preset cache database;
The first extraction sub-module is used for extracting first invoice contents in the coordinate range if the coordinate offset corresponding to the service type does not exist in the cache database;
The first preprocessing sub-module is used for preprocessing the first invoice content to obtain corresponding second invoice content;
And the second searching sub-module is used for searching invoice information corresponding to the invoice file from the second invoice content based on a preset regular expression.
In this embodiment, the operations performed by the modules or units respectively correspond to the steps of the method for inputting information of an invoice file in the foregoing embodiment one by one, which is not described herein again.
In some optional implementations of this embodiment, the parsing module 304 further includes:
the first acquisition sub-module is used for acquiring the coordinate offset corresponding to the service type from the cache database if the coordinate offset exists in the cache database;
the adjustment sub-module is used for adjusting the coordinate range based on the coordinate offset to obtain a corresponding target coordinate range;
The second extraction submodule is used for extracting third invoice contents in the target coordinate range;
The second preprocessing sub-module is used for preprocessing the third invoice content to obtain a corresponding fourth invoice content;
And the third searching sub-module is used for searching invoice information corresponding to the invoice file from the fourth invoice content based on the regular expression.
In this embodiment, the operations performed by the modules or units respectively correspond to the steps of the method for inputting information of an invoice file in the foregoing embodiment one by one, which is not described herein again.
In some alternative implementations of the present embodiment, the parsing module 304 includes:
the second calling sub-module is used for calling a second analysis component corresponding to a second preset file type if the target file type is the second preset file type;
the decompression sub-module is used for decompressing the invoice file based on the second parsing component to obtain a corresponding decompressed file;
The second acquisition sub-module is used for acquiring an invoice information label from the decompressed file and acquiring an invoice information identifier corresponding to the invoice information label;
The fourth searching sub-module is used for searching the invoice information of the decompressed file based on the invoice information identifier so as to obtain first invoice information corresponding to the invoice information identifier;
And the first determination submodule is used for taking the first invoice information as invoice information corresponding to the invoice file.
In this embodiment, the operations performed by the modules or units respectively correspond to the steps of the method for inputting information of an invoice file in the foregoing embodiment one by one, which is not described herein again.
In some alternative implementations of the present embodiment, the parsing module 304 includes:
the third calling sub-module is used for calling a third analysis component corresponding to a third preset file type if the target file type is the third preset file type;
The parsing sub-module is used for reading the invoice file based on the third analysis component and parsing to obtain a label in the invoice file;
The third extraction sub-module is used for extracting second invoice information corresponding to the label based on a preset label path;
And the second determining submodule is used for taking the second invoice information as invoice information corresponding to the invoice file.
In this embodiment, the operations performed by the modules or units respectively correspond to the steps of the method for inputting information of an invoice file in the foregoing embodiment one by one, which is not described herein again.
In some optional implementations of this embodiment, the second obtaining module 302 includes:
The third acquisition sub-module is used for acquiring the specified byte code of the invoice file;
The fourth calling sub-module is used for calling a preset identification rule;
And the analysis sub-module is used for analyzing and processing the specified byte code based on the identification rule so as to obtain the target file type of the invoice file.
In this embodiment, the operations performed by the modules or units respectively correspond to the steps of the method for inputting information of an invoice file in the foregoing embodiment one by one, which is not described herein again.
In some alternative implementations of the present embodiment, the input module 306 includes:
a fifth calling sub-module for calling a preset target database;
A fourth obtaining sub-module, configured to obtain a preset data transmission mode;
And the storage sub-module is used for storing the invoice entity class list into the target database based on the data transmission mode.
In this embodiment, the operations performed by the modules or units respectively correspond to the steps of the method for inputting information of an invoice file in the foregoing embodiment one by one, which is not described herein again.
In order to solve the technical problems, the embodiment of the application also provides computer equipment. Referring specifically to fig. 4, fig. 4 is a basic structural block diagram of a computer device according to the present embodiment.
The computer device 4 comprises a memory 41, a processor 42, and a network interface 43 communicatively connected to each other via a system bus. It should be noted that only a computer device 4 having components 41-43 is shown in the figure, but it should be understood that not all of the illustrated components are required to be implemented and that more or fewer components may be implemented instead. It will be appreciated by those skilled in the art that the computer device herein is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The computer equipment can be a desktop computer, a notebook computer, a palm computer, a cloud server and other computing equipment. The computer equipment can perform man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch pad or voice control equipment and the like.
The memory 41 includes at least one type of readable storage medium, including flash memory, hard disk, multimedia card, card memory (e.g., SD or DX memory, etc.), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, etc. In some embodiments, the memory 41 may be an internal storage unit of the computer device 4, such as a hard disk or an internal memory of the computer device 4. In other embodiments, the memory 41 may also be an external storage device of the computer device 4, such as a plug-in hard disk, a smart media card (SMC), a Secure Digital (SD) card, a flash card, or the like provided on the computer device 4. Of course, the memory 41 may also comprise both an internal storage unit of the computer device 4 and an external storage device. In this embodiment, the memory 41 is generally used to store the operating system and various application software installed on the computer device 4, such as the computer readable instructions of the information input method of an invoice file. Further, the memory 41 may be used to temporarily store various types of data that have been output or are to be output.
The processor 42 may be a central processing unit (Central Processing Unit, CPU), controller, microcontroller, microprocessor, or other data processing chip in some embodiments. The processor 42 is typically used to control the overall operation of the computer device 4. In this embodiment, the processor 42 is configured to execute computer readable instructions stored in the memory 41 or process data, such as computer readable instructions for executing an information entry method of the invoice file.
The network interface 43 may comprise a wireless network interface or a wired network interface, which network interface 43 is typically used for establishing a communication connection between the computer device 4 and other electronic devices.
Compared with the prior art, the embodiment of the application has the following main beneficial effects:
In the embodiment of the application, an uploaded invoice file is first acquired, and the target file type corresponding to the invoice file is obtained. A target invoice analysis strategy corresponding to the target file type is then acquired, and invoice analysis processing is performed on the invoice file based on that strategy to obtain the corresponding invoice information. The invoice information is filled into a preset invoice entity class to obtain invoice entity class objects, a corresponding invoice entity class list is constructed based on those objects, and finally information input processing is performed on the invoice entity class list. Because the analysis of invoice files of different file types is completed intelligently based on the corresponding invoice analysis strategy, intelligent management of information input is achieved for invoice files of all the different file types, the extraction efficiency and input efficiency of the invoice information are effectively improved, and the accuracy of the obtained invoice information is ensured.
The present application also provides another embodiment, namely, a computer readable storage medium, where computer readable instructions are stored, where the computer readable instructions are executable by at least one processor, so that the at least one processor performs the steps of the method for entering information of an invoice file as described above.
Compared with the prior art, the embodiment of the application has the following main beneficial effects:
In the embodiment of the application, an uploaded invoice file is first acquired, and the target file type corresponding to the invoice file is obtained. A target invoice analysis strategy corresponding to the target file type is then acquired, and invoice analysis processing is performed on the invoice file based on that strategy to obtain the corresponding invoice information. The invoice information is filled into a preset invoice entity class to obtain invoice entity class objects, a corresponding invoice entity class list is constructed based on those objects, and finally information input processing is performed on the invoice entity class list. Because the analysis of invoice files of different file types is completed intelligently based on the corresponding invoice analysis strategy, intelligent management of information input is achieved for invoice files of all the different file types, the extraction efficiency and input efficiency of the invoice information are effectively improved, and the accuracy of the obtained invoice information is ensured.
From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by means of software plus a necessary general hardware platform, and may of course also be implemented by hardware, although in many cases the former is the preferred implementation. Based on such understanding, the technical solution of the present application, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the methods according to the embodiments of the present application.
It is apparent that the embodiments described above are only some, not all, of the embodiments of the present application. The drawings show preferred embodiments of the application but do not limit its patent scope. The application may be embodied in many different forms; these embodiments are provided so that the disclosure will be understood thoroughly and completely. Although the application has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that modifications may be made to the technical solutions described in the foregoing embodiments, or equivalents may be substituted for some of their technical features. All equivalent structures made using the content of the specification and drawings of the application, whether applied directly or indirectly in other related technical fields, likewise fall within the scope of protection of the application.