CN116402056A - Document information processing method and device and electronic equipment - Google Patents

Info

Publication number
CN116402056A
CN116402056A CN202211538157.0A CN202211538157A CN116402056A CN 116402056 A CN116402056 A CN 116402056A CN 202211538157 A CN202211538157 A CN 202211538157A CN 116402056 A CN116402056 A CN 116402056A
Authority
CN
China
Prior art keywords
target
information
sentence information
text
document
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211538157.0A
Other languages
Chinese (zh)
Inventor
翁兆琦
裴凯洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202211538157.0A
Publication of CN116402056A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/186 Templates

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a document information processing method and device and electronic equipment, and relates to the technical field of artificial intelligence. The method comprises: acquiring text information of a target document, wherein the text information characterizes approval information generated in a credit business approval process; extracting, based on a target matching model, key sentence information in the text information that corresponds to the requirement of a target object, to obtain target sentence information; classifying the target sentence information according to the context of at least one piece of short text sentence information to obtain probability values of a plurality of classification categories corresponding to the target sentence information; and determining, according to the probability values, the target category corresponding to the target sentence information from the plurality of classification categories. The invention solves the technical problem in the prior art of low recognition accuracy when a neural network model is used to perform semantic recognition on long text in document information.

Description

Document information processing method and device and electronic equipment
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a document information processing method and device and electronic equipment.
Background
Credit approval is an important part of the operation and management of financial institutions such as banks, and is the last checkpoint for comprehensively judging the credit standing of credit clients and controlling credit asset risk. The traditional credit approval process mainly comprises three stages: pre-credit investigation, in-credit examination, and post-credit management. The approval process of each credit business is recorded in a credit approval book.
In recent years, as data assets have grown, the number of credit approval books has also increased. With the rapid development of artificial intelligence technology, the related art has gradually adopted machine learning and deep learning to locate the information that users focus on, replacing the traditional practice of manually reading credit approval books to obtain semantic information.
However, semantic recognition by machine learning can identify the semantics of only part of the key information in a long text, and its effect depends on the degree of data processing and the quality of feature engineering. Semantic recognition by deep learning takes the whole long text as the input of a neural network; the resulting recognition effect is limited, and recognition accuracy is low.
In view of the above problems, no effective solution has been proposed at present.
Disclosure of Invention
The embodiments of the invention provide a document information processing method and device and electronic equipment, which at least solve the technical problem in the prior art of low recognition accuracy when a neural network model is used to perform semantic recognition on long text in document information.
According to an aspect of an embodiment of the present invention, there is provided a document information processing method including: acquiring text information of a target document, wherein the text information characterizes approval information generated in the credit business approval process; extracting key sentence information corresponding to the requirement of a target object in text information based on a target matching model to obtain target sentence information, wherein the target matching model is constructed according to a plurality of matching rules, the plurality of matching rules at least comprise a first matching rule and a second matching rule, and the target sentence information is composed of at least one short text sentence information; classifying the target sentence information according to the context of at least one short text sentence information to obtain probability values of a plurality of classification categories corresponding to the target sentence information; and determining a target category corresponding to the target sentence information from the plurality of classification categories according to the probability value, wherein the target category corresponds to a target approval process in the credit business approval process.
Further, the document information processing method further comprises: determining, from at least one parser, a target parser corresponding to the document type of the target document; extracting, based on the target parser, information from the text information according to the first matching rule to obtain first sentence information; and extracting, based on the target parser, information from the first sentence information according to the second matching rule to obtain the target sentence information, wherein the first matching rule and the second matching rule are rules of a parent-child hierarchical structure.
Further, the document information processing method further comprises: determining, according to the context of the at least one piece of short text sentence information, a plurality of word vectors corresponding to the target sentence information, wherein each word vector corresponds to one word in the target sentence information; averaging the plurality of word vectors to obtain a target vector corresponding to the target sentence information; and inputting the target vector into a target classifier for classification, and outputting the probability values.
Further, the document information processing method further comprises: determining the classification category with the maximum probability value among the plurality of classification categories as the target category.
Further, the document information processing method further comprises: constructing a first regular expression for a first text region of the target document to obtain the first matching rule, wherein the first text region is a text region that the target object focuses on; and constructing a second regular expression for a target region of the first text region to obtain the second matching rule, wherein the target region is a text region containing the target sentence information.
Further, the document information processing method further comprises: configuring the plurality of classification categories based on the requirement of the target object.
Further, the document information processing method further comprises: screening the target sentence information from the target document based on the target category.
According to another aspect of the embodiment of the present invention, there is also provided a processing apparatus for document information, including: the acquisition module is used for acquiring text information of the target document, wherein the text information represents approval information generated in the credit business approval process; the first processing module is used for extracting key sentence information corresponding to the requirement of a target object in text information based on a target matching model to obtain target sentence information, wherein the target matching model is constructed according to a plurality of matching rules, the plurality of matching rules at least comprise a first matching rule and a second matching rule, and the target sentence information is composed of at least one short text sentence information; the second processing module is used for classifying the target sentence information according to the context of at least one short text sentence information to obtain probability values of a plurality of classification categories corresponding to the target sentence information; and the determining module is used for determining a target category corresponding to the target statement information from the plurality of classification categories according to the probability value, wherein the target category corresponds to a target approval process in the credit business approval process.
According to another aspect of the embodiments of the present invention, there is also provided a computer-readable storage medium having a computer program stored therein, wherein the computer program is configured to execute the above-described document information processing method at run-time.
According to another aspect of an embodiment of the present invention, there is also provided an electronic device including one or more processors and a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the above-described document information processing method.
In the embodiment of the invention, target sentence information is extracted through a target matching model and then classified. Text information of the target document is first acquired; key sentence information in the text information that corresponds to the requirement of the target object is then extracted based on the target matching model to obtain the target sentence information; the target sentence information is then classified according to the context of at least one piece of short text sentence information to obtain probability values of a plurality of classification categories corresponding to the target sentence information; and the target category corresponding to the target sentence information is finally determined from the plurality of classification categories according to the probability values. The target category corresponds to a target approval process in the credit business approval process, the text information characterizes approval information generated in the credit business approval process, the target matching model is constructed according to a plurality of matching rules, the plurality of matching rules at least comprise a first matching rule and a second matching rule, and the target sentence information is composed of at least one piece of short text sentence information.
In this process, acquiring the text information of the target document provides a data basis for the subsequent extraction of key sentence information. The target matching model extracts the key sentence information that the target object focuses on, so the target sentence information is identified efficiently and extracted quickly. Classifying the target sentence information according to the context of at least one piece of short text sentence information is more accurate than the prior-art semantic recognition of long text, and significantly improves the semantic recognition effect. In addition, the workload of manually reading the target document is reduced, which saves labor and time costs and further improves the processing efficiency of the target document.
Therefore, the technical solution of the invention achieves the aim of improving the processing efficiency of the target document, realizes the technical effect of improving the semantic recognition accuracy for the key sentence information that the target object focuses on, and thereby solves the technical problem in the prior art of low recognition accuracy when a neural network model is used to perform semantic recognition on long text in document information.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiments of the invention and together with the description serve to explain the invention and do not constitute a limitation on the invention. In the drawings:
FIG. 1 is a flow chart of an alternative method of processing document information according to an embodiment of the invention;
FIG. 2 is a flow chart of semantic analysis of alternative document information according to an embodiment of the present invention;
FIG. 3 is a schematic illustration of an alternative credit approval book according to an embodiment of the invention;
FIG. 4 is a flow chart of an alternative key information extraction according to an embodiment of the invention;
FIG. 5 is a flow chart of an alternative critical information semantic identification according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of an alternative document information processing apparatus according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of an alternative electronic device according to an embodiment of the invention.
Detailed Description
To help those skilled in the art better understand the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the scope of protection of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be noted that, the related information (including, but not limited to, user equipment information, user personal information, etc.) and data (including, but not limited to, data for presentation, analyzed data, etc.) related to the present invention are information and data authorized by the user or sufficiently authorized by each party. For example, an interface is provided between the system and the relevant user or institution, before acquiring the relevant information, the system needs to send an acquisition request to the user or institution through the interface, and acquire the relevant information after receiving the consent information fed back by the user or institution.
Example 1
According to an embodiment of the present invention, a method embodiment of a document information processing method is provided. It should be noted that the steps shown in the flowcharts of the drawings may be performed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in a different order.
FIG. 1 is a flowchart of an alternative document information processing method according to an embodiment of the present invention, as shown in FIG. 1, the method comprising the steps of:
Step S101, obtaining text information of a target document, wherein the text information characterizes approval information generated in a credit business approval process.
In the above step, the text information of the target document may be acquired by an application system, a processor, an electronic device, or the like. Optionally, the target document may be a credit approval book of a financial institution, which is a document that records the series of processes a credit business goes through, such as pre-credit investigation, in-credit examination, and post-credit management. Optionally, FIG. 3 is a schematic diagram of an optional credit approval book according to an embodiment of the invention; as shown in FIG. 3, the approval comments, remarks, post-credit management requirements, and the like of a credit customer XXX are written in the credit approval book. Optionally, the text information may be the information recorded in the credit approval book.
Step S102, extracting key sentence information corresponding to the requirement of a target object in text information based on a target matching model to obtain target sentence information, wherein the target matching model is constructed according to a plurality of matching rules, the plurality of matching rules at least comprise a first matching rule and a second matching rule, and the target sentence information is composed of at least one short text sentence information.
In the above step, the target matching model may be a rule model constructed according to a plurality of matching rules, and the key sentence information corresponding to the requirement of the target object may be the sentence information that the user focuses on. For example, if an auditor pays particular attention to the sentence information in the precondition part of the credit approval book, the target sentence information may be the sentence information related to the precondition part. Optionally, the plurality of matching rules may be a rule template set T configured by the user according to the user's own points of attention. For example, the rule template set T is composed of rules T1, T2, T3, ..., Tn: if the user is more interested in the preconditions in the credit approval book, rule T1 may be configured; if the user is more interested in the post-credit management requirements in the credit approval book, rule T2 may be configured; and so on.
Specifically, a rule model is used to extract the key sentence information that the user focuses on in the credit approval book. The user configures rule templates according to the points of attention, and the rule model matches the whole credit approval book against the regular expressions corresponding to those rule templates. The matched result is the key sentence information that the user focuses on.
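The rule-template matching described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the dictionary `RULE_TEMPLATES`, the function `extract_key_sentences`, the section headings, and the regular expressions are all hypothetical, chosen only to show how a user-configured template might select one part of an approval book.

```python
import re

# Hypothetical rule template set T: one regular expression per user focus.
# The headings ("Preconditions:", ...) are invented for this sketch.
_NEXT_HEADING = r"(?=\n[A-Z][\w -]*:|\Z)"
RULE_TEMPLATES = {
    "preconditions": re.compile(
        r"Preconditions:(.*?)" + _NEXT_HEADING, re.S
    ),
    "post_credit_management": re.compile(
        r"Post-credit management requirements:(.*?)" + _NEXT_HEADING, re.S
    ),
}


def extract_key_sentences(text, focus):
    """Return the text matched by the rule template configured for `focus`."""
    pattern = RULE_TEMPLATES[focus]
    return [m.group(1).strip() for m in pattern.finditer(text)]


sample = (
    "Approval comments: Agreed.\n"
    "Preconditions: Complete the guarantee procedure before loan issue.\n"
    "Post-credit management requirements: Monitor the account monthly.\n"
)
print(extract_key_sentences(sample, "preconditions"))
```

A user who cares about a different part of the document would add another entry to the template set rather than change the matching code.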
Step S103, classifying the target sentence information according to the context of at least one short text sentence information to obtain probability values of a plurality of classification categories corresponding to the target sentence information.
In the above step, the classification of the target sentence information according to the context of at least one piece of short text sentence information may be implemented by a text classification algorithm, for example the fastText algorithm. The algorithm returns the probability values of the plurality of classification categories corresponding to the target sentence information, that is, the probabilities that the input short text belongs to the semantic tags of the different categories. Classifying short text sentence information into multiple semantic tags with the fastText algorithm is simpler and more efficient than the neural network algorithms adopted in the prior art, which can only perform semantic recognition on the long text in the document information. It also achieves higher recognition accuracy and requires no pre-training, which improves recognition efficiency.
Step S104, determining a target category corresponding to the target sentence information from a plurality of classification categories according to the probability value, wherein the target category corresponds to a target approval process in the credit business approval process.
In the above step, the plurality of classification categories may be configured based on the requirement of the target object, for example guarantee, procedure, government support, own funds, other, account supervision, trusted payment, and the like. The target category may be the category corresponding to the target sentence information, for example for target sentence information such as "guarantee procedure before loan issue".
Based on the scheme defined in steps S101 to S104, it can be seen that in the embodiment of the present invention, target sentence information is extracted through a target matching model and then classified. Text information of the target document is first acquired; key sentence information in the text information that corresponds to the requirement of the target object is then extracted based on the target matching model to obtain the target sentence information; the target sentence information is then classified according to the context of at least one piece of short text sentence information to obtain probability values of a plurality of classification categories corresponding to the target sentence information; and the target category corresponding to the target sentence information is finally determined from the plurality of classification categories according to the probability values. The target category corresponds to a target approval process in the credit business approval process, the text information characterizes approval information generated in the credit business approval process, the target matching model is constructed according to a plurality of matching rules, the plurality of matching rules at least comprise a first matching rule and a second matching rule, and the target sentence information is composed of at least one piece of short text sentence information.
It is easy to see that in the above process, acquiring the text information of the target document provides a data basis for the subsequent extraction of key sentence information. The target matching model extracts the key sentence information that the target object focuses on, so the target sentence information is identified efficiently and extracted quickly. Classifying the target sentence information according to the context of at least one piece of short text sentence information is more accurate than the prior-art semantic recognition of long text, and significantly improves the semantic recognition effect. In addition, the workload of manually reading the target document is reduced, which saves labor and time costs and further improves the processing efficiency of the target document.
Therefore, the technical solution of the invention achieves the aim of improving the processing efficiency of the target document, realizes the technical effect of improving the semantic recognition accuracy for the key sentence information that the target object focuses on, and thereby solves the technical problem in the prior art of low recognition accuracy when a neural network model is used to perform semantic recognition on long text in document information.
Optionally, FIG. 2 is a flowchart of semantic analysis of optional document information according to an embodiment of the present invention. As shown in FIG. 2, user focus extraction is first performed on the credit approval book for a user (for example, an auditor) to obtain focus sentence information R, and focus information semantic recognition is then performed on the focus sentence information R to obtain a recognition result. Specifically, the rule model locates the key sentence information in the credit approval book, and the fastText algorithm then classifies that key sentence information to identify the corresponding semantic tag.
In an alternative embodiment, the target matching model includes at least one parser. In the process of extracting, based on the target matching model, the key sentence information in the text information that corresponds to the requirement of the target object to obtain the target sentence information, the target parser corresponding to the document type of the target document is first determined from the at least one parser. Information is then extracted from the text information according to the first matching rule based on the target parser to obtain first sentence information, and information is then extracted from the first sentence information according to the second matching rule based on the target parser to obtain the target sentence information, wherein the first matching rule and the second matching rule are rules of a parent-child hierarchical structure.
Optionally, the at least one parser corresponds to different document types, and the document types at least include the .doc, .txt, .html, .pdf, and .jpg types; that is, the target matching model includes parsers corresponding to the .doc, .txt, .html, .pdf, and .jpg types, respectively. Optionally, the first matching rule may be the matching rule corresponding to parent layer extraction in the target matching model, and the second matching rule may be the matching rule corresponding to child layer extraction in the target matching model.
Optionally, FIG. 4 is a flowchart of an optional key information extraction according to an embodiment of the present invention. As shown in FIG. 4, the target matching model, that is, the rule model, includes at least one parser, parent layer extraction, and child layer extraction. The credit approval book C is first passed into the rule model as input; the corresponding parser is then selected according to the document type (.doc, .txt, and the like) of the credit approval book C, and the text information of the credit approval book C is read. For example, when the document type of the credit approval book C is the .doc type, the target parser is the parser for the .doc type.
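Selecting a parser by document type can be sketched as a lookup table keyed by file extension. The names `PARSERS`, `select_parser`, `parse_txt`, and `parse_unsupported` are hypothetical; only the plain-text parser is implemented here, because reading .doc, .html, .pdf, or .jpg documents would require third-party libraries or OCR that the text does not specify.

```python
from pathlib import Path


def parse_txt(path):
    """Read a plain-text credit approval book."""
    return Path(path).read_text(encoding="utf-8")


def parse_unsupported(path):
    """Placeholder for formats whose parsing library is not specified here."""
    raise NotImplementedError(f"no parser implemented for {Path(path).suffix}")


# Hypothetical registry: document type (file extension) -> parser function.
PARSERS = {
    ".txt": parse_txt,
    ".doc": parse_unsupported,
    ".html": parse_unsupported,
    ".pdf": parse_unsupported,
    ".jpg": parse_unsupported,
}


def select_parser(path):
    """Return the target parser for the document type of `path`."""
    suffix = Path(path).suffix.lower()
    try:
        return PARSERS[suffix]
    except KeyError:
        raise ValueError(f"unsupported document type: {suffix!r}") from None


def read_text_information(path):
    """Read the text information of a document with its target parser."""
    return select_parser(path)(path)
```

Keeping the registry separate from the selection logic means a new document type only needs a new dictionary entry.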
Further, according to the rules in the rule template set T configured by the user (for example, the first matching rule and the second matching rule), the target parser applies a regular matching algorithm to the credit approval book C to obtain key sentence information R, namely the target sentence information. Specifically, in parent layer extraction, a fuzzy rule template Ta that fits the format of a large number of credit approval books, namely the first matching rule, is configured first, and the credit approval book is matched against it by the regular matching algorithm to obtain text information that contains the user's points of attention, namely the first sentence information. Child layer extraction is then performed on the parent layer result, namely the first sentence information: the first sentence information is matched by the regular matching algorithm against a rule template Tb configured by the user, namely the second matching rule. The resulting match is the precise sentence information that the user is interested in, namely the target sentence information, and it contains no other semantic information.
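The parent-child extraction can be illustrated with two regular expressions applied in sequence. This is a sketch under assumptions: the patterns standing in for Ta and Tb, the document layout (a heading line ending in a colon followed by numbered items), and the function name `extract_target_sentences` are all invented for the example.

```python
import re

# Hypothetical parent rule Ta: loosely capture everything under the
# "Preconditions:" heading, up to the next heading line or end of text.
PARENT_RULE = re.compile(
    r"Preconditions:\n(.*?)(?=^\S[^\n]*:$|\Z)", re.S | re.M
)

# Hypothetical child rule Tb: pick out each numbered item in that region.
CHILD_RULE = re.compile(r"^\d+\.\s*(.+)$", re.M)


def extract_target_sentences(text):
    """Apply the parent rule, then the child rule to each parent match."""
    targets = []
    for parent_match in PARENT_RULE.finditer(text):
        first_sentence_info = parent_match.group(1)
        for child_match in CHILD_RULE.finditer(first_sentence_info):
            targets.append(child_match.group(1).strip())
    return targets


sample = (
    "Approval comments:\n"
    "Agreed in principle.\n"
    "Preconditions:\n"
    "1. Complete the guarantee procedure before loan issue.\n"
    "2. Own funds must be in place.\n"
    "Remarks:\n"
    "None.\n"
)
print(extract_target_sentences(sample))
```

The parent pattern is deliberately loose so that it tolerates layout differences between approval books, while the child pattern is strict so that only the sentences of interest survive.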
It should be noted that the user may choose whether to use the parent-child layer matching mode as required; that is, the user may also match the key sentence information in the credit approval book in a single pass of the regular matching algorithm, which is not limited herein. This embodiment describes the rules of the parent-child hierarchy as an example because they hit the user's focus in the credit approval book precisely.
In an alternative embodiment, in the process of classifying the target sentence information according to the context of at least one piece of short text sentence information to obtain probability values of a plurality of classification categories corresponding to the target sentence information, a plurality of word vectors corresponding to the target sentence information are first determined according to the context of the at least one piece of short text sentence information, wherein each word vector corresponds to one word in the target sentence information. The plurality of word vectors are then averaged to obtain a target vector corresponding to the target sentence information, and the target vector is then input into a target classifier for classification to output the probability values.
Optionally, the target sentence information may be a sentence formed by at least one short text sentence in the credit approval book. Each short text sentence is converted into a plurality of word vectors by the fastText algorithm, and the plurality of word vectors are then averaged to obtain the target vector corresponding to the target sentence information; that is, the target sentence information is represented by the target vector.
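The averaging step can be written out in a few lines. The embedding table `EMBEDDINGS`, its two-dimensional vectors, and the helper names are hypothetical toy values for illustration; a trained model would supply real vectors, and this sketch simply skips words that are missing from the table.

```python
# Hypothetical toy embedding table: word -> word vector.
EMBEDDINGS = {
    "guarantee": [1.0, 0.0],
    "procedure": [0.0, 1.0],
    "loan": [0.5, 0.5],
}


def average_vectors(word_vectors):
    """Element-wise mean of equally sized word vectors."""
    if not word_vectors:
        raise ValueError("at least one word vector is required")
    count = len(word_vectors)
    dim = len(word_vectors[0])
    return [sum(vec[i] for vec in word_vectors) / count for i in range(dim)]


def sentence_to_target_vector(words, embeddings=EMBEDDINGS):
    """Look up each known word and average the vectors into a target vector."""
    vectors = [embeddings[w] for w in words if w in embeddings]
    return average_vectors(vectors)


print(sentence_to_target_vector(["guarantee", "procedure"]))
```

Averaging yields a fixed-length target vector regardless of how many words the sentence contains, which is what lets a single classifier handle sentences of any length.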
Optionally, fig. 5 is a flowchart of an optional semantic recognition of key information according to an embodiment of the present invention, as shown in fig. 5, key sentence information R obtained in the foregoing process, that is, target sentence information, is input into a Fasttext algorithm, and a plurality of corresponding word vectors W1, W2, W3 … Wn are calculated according to N-gram features of at least one short text sentence information constituting the target sentence information by the Fasttext algorithm, and the plurality of word vectors are averaged to obtain a target vector corresponding to the target sentence information. Further, the target vector is input into a linear classifier of a Fasttext algorithm hidden layer, namely a target classifier, and classification processing is carried out by using a hierarchical softmax function, namely the probability that the target sentence information belongs to each classification category is calculated.
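The averaging and classification steps described above can be illustrated with a minimal sketch. It is not the FastText implementation: the word vectors, the classifier weights, and the two category names are hand-set, hypothetical values, and a plain softmax is swapped in for the hierarchical softmax named in the text, since both produce a probability for each category.

```python
import math

# Hypothetical 3-dimensional word vectors (W1..Wn); a trained model would learn these.
WORD_VECTORS = {
    "guarantee": [0.9, 0.1, 0.0],
    "procedures": [0.7, 0.2, 0.1],
    "account": [0.1, 0.9, 0.0],
    "supervision": [0.0, 0.8, 0.2],
}

# Hypothetical linear classifier: one weight row per classification category.
CATEGORIES = ["guarantee", "account supervision"]
WEIGHTS = [
    [2.0, -1.0, 0.0],
    [-1.0, 2.0, 0.0],
]


def average_vectors(vectors):
    """Average the word vectors element-wise to obtain the target vector."""
    count = len(vectors)
    return [sum(component) / count for component in zip(*vectors)]


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1 (plain, not hierarchical)."""
    peak = max(scores)
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def classify(words):
    """Return a {category: probability} mapping for a tokenised sentence."""
    vectors = [WORD_VECTORS[word] for word in words if word in WORD_VECTORS]
    if not vectors:
        raise ValueError("sentence contains no known words")
    target_vector = average_vectors(vectors)
    scores = [sum(w * x for w, x in zip(row, target_vector)) for row in WEIGHTS]
    return dict(zip(CATEGORIES, softmax(scores)))


probabilities = classify(["guarantee", "procedures"])
```

With these toy values the guarantee category receives the larger probability; real probabilities depend entirely on the trained vectors and weights.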
In an optional embodiment, when the target category corresponding to the target sentence information is determined from the plurality of classification categories according to the probability values, the classification category with the maximum probability value among the plurality of classification categories is determined as the target category.
In an optional embodiment, before classifying the target sentence information according to the context of at least one short text sentence information to obtain probability values of multiple classification categories corresponding to the target sentence information, the multiple classification categories are configured based on requirements of the target object.
For example, a plurality of classification categories such as guarantee, procedure, own funds, other, account supervision, and trusted payment are configured in the target classifier according to the user's requirements. The target vector is input into the target classifier for classification. If the output probability that the target vector belongs to the guarantee category is 0.9 and the output probability that it belongs to the account supervision category is 0.1, the target category corresponding to the target sentence information is determined to be the guarantee category.
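The selection of the target category in this example amounts to taking the category with the largest probability value. A minimal sketch follows; the function name is hypothetical, and the probability values are the 0.9 and 0.1 from the example above.

```python
def pick_target_category(probabilities):
    """Return the classification category with the maximum probability value."""
    if not probabilities:
        raise ValueError("no classification categories configured")
    return max(probabilities, key=probabilities.get)


# Probability values taken from the example in the text.
probabilities = {"guarantee": 0.9, "account supervision": 0.1}
target_category = pick_target_category(probabilities)
```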
In an alternative embodiment, before the key sentence information corresponding to the requirement of the target object is extracted from the text information based on the target matching model to obtain the target sentence information, a first regular expression is constructed for a first text area of the target document to obtain the first matching rule, where the first text area is a text area that the target object focuses on. A second regular expression is then constructed for a target area of the first text area to obtain the second matching rule, where the target area is a text area containing the target sentence information.
Optionally, the first text area is a text area that the target object focuses on. For example, if the user focuses on the preconditions part of the credit approval book, the first text area may be the text area corresponding to the preconditions part as shown in Fig. 3, and a first regular expression is constructed for this first text area of the target document to obtain the first matching rule.
Optionally, the target area is a text area containing the target sentence information. For example, if the target area is the first 3 lines of the text area corresponding to the preconditions part, a second regular expression is constructed for the target area to obtain the second matching rule.
Optionally, for a better understanding of the above embodiments, a specific example is described below. Suppose the content of a credit approval book is: "Approval opinion: ...xxx. (I) Preconditions: ...xxx. 1. Guarantee procedures before loan issuance. 2. Legal, sufficient in value, effective. 3. Own funds. (II) Post-loan management requirements: ...xxx." Assuming that the user is mainly concerned with the content of the preconditions part, the rule templates are configured as follows:
Parent-layer extraction rule template Ta:
T1: (;
T2: (;
T3: (.
Sub-layer extraction rule template Tb:
T1: (;
T2: (?<=1\.)[\s\S\n\r]*(?=2\.);
T3: (?<=2\.)[\s\S\n\r]*(?=3\.).
The rule model matches the text sentences in the credit approval book against each template, and extracts the corresponding sentence from the credit approval book whenever a sentence matches a template. Alternatively, the first regular expression may be T1: (: (.
Specifically, rule matching is performed on the content of the credit approval book by the rule model, namely the target matching model. The first matching rule, namely T3 in the parent-layer extraction rule template, extracts the first sentence information: "1. Guarantee procedures before loan issuance. 2. Legal, sufficient in value, effective. 3. Own funds." The second matching rule, namely T2 and T3 in the sub-layer extraction rule template, then extracts the target sentence information: "Guarantee procedures before loan issuance. Legal, sufficient in value, effective." That is, the key sentence information R, namely the target sentence information, output by the rule model is: "Guarantee procedures before loan issuance. Legal, sufficient in value, effective."
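The two-stage extraction in this example can be sketched as follows. The sub-layer expressions are T2 and T3 from template Tb as given above. The parent-layer expression and the English wording of the sample text are assumptions made only for illustration, because the parent-layer templates are not legible in the text.

```python
import re

# English rendering of the sample credit approval book (wording is an assumption).
SAMPLE = (
    "Approval opinion: xxx. (I) Preconditions: xxx. "
    "1. Guarantee procedures before loan issuance. "
    "2. Legal, sufficient in value, effective. "
    "3. Own funds. "
    "(II) Post-loan management requirements: xxx."
)

# Hypothetical parent-layer rule: from item "1." up to the "(II)" heading.
PARENT_RULE = re.compile(r"1\.[\s\S]*(?=\(II\))")

# Sub-layer rules T2 and T3 from template Tb.
CHILD_RULES = [
    re.compile(r"(?<=1\.)[\s\S\n\r]*(?=2\.)"),
    re.compile(r"(?<=2\.)[\s\S\n\r]*(?=3\.)"),
]


def extract(text):
    """Run parent-layer extraction, then sub-layer extraction on its result."""
    parent_match = PARENT_RULE.search(text)
    if parent_match is None:
        return []
    first_sentence_info = parent_match.group(0)
    target_sentences = []
    for rule in CHILD_RULES:
        child_match = rule.search(first_sentence_info)
        if child_match:
            target_sentences.append(child_match.group(0).strip())
    return target_sentences


target_sentences = extract(SAMPLE)
```

Because the sub-layer rules run only on the parent-layer result, a stray "1." or "2." elsewhere in the document cannot be matched by them.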
Further, the key sentence information R obtained in the above process, namely the target sentence information, is input into the FastText algorithm, and the output recognition result is as follows. For "Guarantee procedures before loan issuance.", the probability value of belonging to the guarantee category is 0.9 and the probability value of belonging to the account supervision category is 0.1. For "Legal, sufficient in value, effective.", the probability value of belonging to the guarantee category is 0.9 and the probability value of belonging to the account supervision category is 0.1.
Further, the target category corresponding to the target sentence information can be determined from the plurality of classification categories according to the probability values. That is, the target category corresponding to "Guarantee procedures before loan issuance." is the guarantee category, and the target category corresponding to "Legal, sufficient in value, effective." is also the guarantee category.
In the above process, the user (e.g., auditor) only needs to pay attention to the configuration of the rule templates, and does not need to pay additional attention to other problems, so that the workload of the user is greatly reduced. By using the rule model, accurate positioning of the user attention information is realized, so that the key statement information of the user attention can be extracted; by using the FastText algorithm, the key sentence information can be accurately classified, so that the recognition efficiency is improved, and the processing efficiency of credit approval books is further improved.
In an alternative embodiment, after determining a target category corresponding to the target sentence information from the plurality of classification categories according to the probability value, the target sentence information is screened from the target document based on the target category.
Optionally, after classifying the sentences in the target document, the user (for example, an auditor) does not need to check the whole credit approval book, and only needs to check the focused text of the user according to the classification type to carry out approval processing, so that the time cost is saved and the working efficiency of the user is improved.
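A minimal sketch of screening by target category is given below. The function name is hypothetical; the first two sentences and their guarantee label follow the example above, while the third sentence and its label are assumptions added so that the filter has something to exclude.

```python
def screen_by_category(classified_sentences, target_category):
    """Keep only the sentences whose target category matches the requested one."""
    return [
        sentence
        for sentence, category in classified_sentences
        if category == target_category
    ]


classified_sentences = [
    ("Guarantee procedures before loan issuance.", "guarantee"),
    ("Legal, sufficient in value, effective.", "guarantee"),
    ("Own funds.", "own funds"),  # hypothetical label for illustration
]

guarantee_sentences = screen_by_category(classified_sentences, "guarantee")
```

An auditor could then review only `guarantee_sentences` rather than the whole credit approval book.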
Therefore, the technical scheme of the invention achieves the aim of improving the processing efficiency of the target document, thereby realizing the technical effect of improving the semantic recognition accuracy of the key sentence information focused on the target object, and further solving the technical problem of low recognition accuracy in the prior art when the neural network model is adopted to carry out semantic recognition on long texts in the document information.
Example 2
According to an embodiment of the present invention, there is provided an embodiment of a processing apparatus for document information, wherein fig. 6 is a schematic diagram of an alternative processing apparatus for document information according to an embodiment of the present invention, as shown in fig. 6, the apparatus including: the acquiring module 601 is configured to acquire text information of a target document, where the text information characterizes approval information generated in a credit business approval process; the first processing module 602 is configured to extract, based on a target matching model, key sentence information corresponding to a requirement of a target object in text information, to obtain target sentence information, where the target matching model is constructed according to a plurality of matching rules, the plurality of matching rules at least includes a first matching rule and a second matching rule, and the target sentence information is formed by at least one short text sentence information; a second processing module 603, configured to perform classification processing on the target sentence information according to the context of at least one short text sentence information, so as to obtain probability values of multiple classification categories corresponding to the target sentence information; the determining module 604 is configured to determine, according to the probability value, a target category corresponding to the target sentence information from the plurality of classification categories, where the target category corresponds to a target approval process in the credit business approval process.
It should be noted that the acquiring module 601, the first processing module 602, the second processing module 603, and the determining module 604 correspond to steps S101 to S104 in the above embodiment. The four modules implement the same examples and application scenarios as the corresponding steps, but are not limited to what is disclosed in Embodiment 1 above.
Optionally, the first processing module includes: a first determining module, configured to determine a target parser corresponding to a document type of the target document from the at least one parser; a third processing module, configured to extract information from the text information according to the first matching rule based on the target parser, to obtain first sentence information; and a fourth processing module, configured to extract information from the first sentence information according to the second matching rule based on the target parser, to obtain the target sentence information, where the first matching rule and the second matching rule are rules of a parent-child hierarchical structure.
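The way the four modules cooperate can be sketched as a small pipeline. The class name, field names, and stub callables below are all hypothetical and stand in for the real acquisition, extraction, and classification logic; only the division into modules 601 to 604 comes from the text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class DocumentProcessor:
    acquire: Callable[[str], str]                # acquiring module 601
    extract: Callable[[str], List[str]]          # first processing module 602
    classify: Callable[[str], Dict[str, float]]  # second processing module 603

    def determine(self, probabilities: Dict[str, float]) -> str:
        """Determining module 604: pick the category with the maximum probability."""
        return max(probabilities, key=probabilities.get)

    def run(self, document_id: str) -> Dict[str, str]:
        """Map each extracted target sentence to its target category."""
        text = self.acquire(document_id)
        return {s: self.determine(self.classify(s)) for s in self.extract(text)}


# Stub callables for illustration only.
processor = DocumentProcessor(
    acquire=lambda document_id: "A. B.",
    extract=lambda text: [part.strip() + "." for part in text.split(".") if part.strip()],
    classify=lambda sentence: {"guarantee": 0.9, "account supervision": 0.1},
)
result = processor.run("doc-1")
```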
Optionally, the second processing module includes: the second determining module is used for determining a plurality of word vectors corresponding to the target sentence information according to the context of at least one short text sentence information, wherein each word vector corresponds to each word in the target sentence information; the computing module is used for carrying out average computation on the word vectors to obtain a target vector corresponding to the target sentence information; and inputting the target vector into a target classifier for classification processing, and outputting a probability value.
Optionally, the determining module includes: and the third determining module is used for determining the classification category corresponding to the maximum probability value in the plurality of classification categories as the target category.
Optionally, the document information processing apparatus further includes: the first configuration module is used for constructing a first regular expression for a first text region of the target document to obtain a first matching rule, wherein the first text region is a text region focused by the target object; the second configuration module is used for constructing a second regular expression for a target area of the first text area to obtain a second matching rule, wherein the target area is a text area containing target sentence information.
Optionally, the document information processing apparatus further includes: and the third configuration module is used for configuring a plurality of classification categories based on the requirements of the target object.
Optionally, the document information processing apparatus further includes: and the screening module is used for screening the target sentence information from the target document based on the target category.
Example 3
According to another aspect of the embodiments of the present invention, there is also provided a computer-readable storage medium in which a computer program is stored, wherein the computer program is configured to execute the above-described document information processing method at run-time.
Example 4
According to another aspect of an embodiment of the present invention, there is also provided an electronic device, wherein fig. 7 is a schematic diagram of an alternative electronic device according to an embodiment of the present invention, as shown in fig. 7, the electronic device including one or more processors; and a memory for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement a method for running the programs, wherein the programs are configured to perform the above-described document information processing method when run. The processor when executing the program implements the following steps: acquiring text information of a target document, wherein the text information characterizes approval information generated in the credit business approval process; extracting key sentence information corresponding to the requirement of a target object in text information based on a target matching model to obtain target sentence information, wherein the target matching model is constructed according to a plurality of matching rules, the plurality of matching rules at least comprise a first matching rule and a second matching rule, and the target sentence information is composed of at least one short text sentence information; classifying the target sentence information according to the context of at least one short text sentence information to obtain probability values of a plurality of classification categories corresponding to the target sentence information; and determining a target category corresponding to the target sentence information from the plurality of classification categories according to the probability value, wherein the target category corresponds to a target approval process in the credit business approval process.
Optionally, the processor further implements the following steps when executing the program: determining a target parser corresponding to a document type of the target document from the at least one parser; extracting information from the text information according to the first matching rule based on the target parser, to obtain first sentence information; and extracting information from the first sentence information according to the second matching rule based on the target parser, to obtain the target sentence information, where the first matching rule and the second matching rule are rules of a parent-child hierarchical structure.
Optionally, the processor further implements the following steps when executing the program: determining a plurality of word vectors corresponding to the target sentence information according to the context of at least one short text sentence information, wherein each word vector corresponds to each word in the target sentence information; carrying out average calculation on the word vectors to obtain a target vector corresponding to the target sentence information; and inputting the target vector into a target classifier for classification processing, and outputting a probability value.
Optionally, the processor further implements the following steps when executing the program: and determining the classification category corresponding to the maximum probability value in the plurality of classification categories as the target category.
Optionally, the processor further implements the following steps when executing the program: constructing a first regular expression for a first text region of a target document to obtain a first matching rule, wherein the first text region is a text region focused by a target object; and constructing a second regular expression for a target area of the first text area to obtain a second matching rule, wherein the target area is a text area containing target sentence information.
Optionally, the processor further implements the following steps when executing the program: a plurality of classification categories are configured based on the requirements of the target object.
Optionally, the processor further implements the following steps when executing the program: and screening the target sentence information from the target document based on the target category.
The device herein may be a server, a PC, a tablet (PAD), a mobile phone, or the like.
The serial numbers of the foregoing embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
In the foregoing embodiments of the present invention, each embodiment is described with its own emphasis. For parts not described in detail in one embodiment, reference may be made to the related descriptions of the other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed technical content may be implemented in other manners. The apparatus embodiments described above are merely exemplary. For example, the division of the units may be a division by logical function, and other divisions are possible in an actual implementation: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, units, or modules, and may be electrical or in other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention, in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or any other medium capable of storing program code.
The foregoing is merely a preferred embodiment of the present invention. It should be noted that those skilled in the art may make modifications and adaptations without departing from the principles of the present invention, and such modifications and adaptations are also intended to fall within the scope of protection of the present invention.

Claims (10)

1. A method of processing document information, comprising:
acquiring text information of a target document, wherein the text information characterizes approval information generated in a credit business approval process;
extracting key sentence information corresponding to the requirement of a target object in the text information based on a target matching model to obtain target sentence information, wherein the target matching model is constructed according to a plurality of matching rules, the plurality of matching rules at least comprise a first matching rule and a second matching rule, and the target sentence information is composed of at least one short text sentence information;
classifying the target sentence information according to the context of the at least one short text sentence information to obtain probability values of a plurality of classification categories corresponding to the target sentence information;
and determining a target category corresponding to the target sentence information from the plurality of classification categories according to the probability value, wherein the target category corresponds to a target approval process in the credit business approval process.
2. The method according to claim 1, wherein the target matching model includes at least one parser, and extracting key sentence information corresponding to a requirement of a target object from the text information based on the target matching model to obtain target sentence information includes:
determining a target parser corresponding to a document type of the target document from the at least one parser;
based on the target parser, extracting information from the text information according to the first matching rule to obtain first sentence information;
and based on the target parser, extracting information from the first sentence information according to the second matching rule to obtain the target sentence information, wherein the first matching rule and the second matching rule are rules of a parent-child hierarchical structure.
3. The method of claim 1, wherein classifying the target sentence information according to the context of the at least one short text sentence information to obtain probability values of a plurality of classification categories corresponding to the target sentence information, comprises:
determining a plurality of word vectors corresponding to the target sentence information according to the context of the at least one short text sentence information, wherein each word vector corresponds to each word in the target sentence information;
averaging the word vectors to obtain a target vector corresponding to the target sentence information;
and inputting the target vector into a target classifier for classification processing, and outputting the probability value.
4. The method of claim 3, wherein determining a target category corresponding to the target sentence information from the plurality of classification categories according to the probability value comprises:
and determining the classification category corresponding to the maximum probability value in the classification categories as the target category.
5. The method according to claim 1, wherein before extracting key sentence information corresponding to a requirement of a target object in the text information based on a target matching model, the method further comprises:
constructing a first regular expression for a first text region of the target document to obtain the first matching rule, wherein the first text region is a text region focused by the target object;
and constructing a second regular expression for a target area of the first text area to obtain the second matching rule, wherein the target area is a text area containing the target sentence information.
6. The method of claim 1, wherein before classifying the target sentence information according to the context of the at least one short text sentence information to obtain probability values of a plurality of classification categories corresponding to the target sentence information, the method further comprises:
the plurality of classification categories are configured based on the requirements of the target object.
7. The method of claim 1, wherein after determining a target category corresponding to the target sentence information from the plurality of classification categories according to the probability value, the method further comprises:
and screening the target sentence information from the target document based on the target category.
8. A document information processing apparatus, comprising:
the acquisition module is used for acquiring text information of the target document, wherein the text information represents approval information generated in the credit business approval process;
the first processing module is used for extracting key sentence information corresponding to the requirement of a target object in the text information based on a target matching model to obtain target sentence information, wherein the target matching model is constructed according to a plurality of matching rules, the plurality of matching rules at least comprise a first matching rule and a second matching rule, and the target sentence information is composed of at least one short text sentence information;
the second processing module is used for classifying the target sentence information according to the context of the at least one short text sentence information to obtain probability values of a plurality of classification categories corresponding to the target sentence information;
and the determining module is used for determining a target category corresponding to the target sentence information from the plurality of classification categories according to the probability value, wherein the target category corresponds to a target approval process in the credit business approval process.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored therein a computer program, wherein the computer program is arranged to execute the method of processing document information as claimed in any one of claims 1 to 7 at run-time.
10. An electronic device, the electronic device comprising one or more processors; a memory for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement a method for running a program, wherein the program is arranged to perform the method of processing document information as claimed in any one of claims 1 to 7 when run.
CN202211538157.0A 2022-12-02 2022-12-02 Document information processing method and device and electronic equipment Pending CN116402056A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211538157.0A CN116402056A (en) 2022-12-02 2022-12-02 Document information processing method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211538157.0A CN116402056A (en) 2022-12-02 2022-12-02 Document information processing method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN116402056A true CN116402056A (en) 2023-07-07

Family

ID=87009159

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211538157.0A Pending CN116402056A (en) 2022-12-02 2022-12-02 Document information processing method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN116402056A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118587718A (en) * 2024-08-06 2024-09-03 浙商银行股份有限公司 An automated post-loan management method and device based on deep learning


Similar Documents

Publication Publication Date Title
US20230005075A1 (en) Ai-augmented auditing platform including techniques for automated assessment of vouching evidence
CN110163478B (en) Risk examination method and device for contract clauses
CN110909226B (en) Financial document information processing method, device, electronic device and storage medium
CN112613501A (en) Information auditing classification model construction method and information auditing method
Liang et al. Analyzing credit risk among Chinese P2P-lending businesses by integrating text-related soft information
WO2025152763A1 (en) Intelligent document recognition method and apparatus, electronic device, and storage medium
US11899727B2 (en) Document digitization, transformation and validation
US11880394B2 (en) System and method for machine learning architecture for interdependence detection
Minhas et al. From spin to swindle: identifying falsification in financial text
CN118172785A (en) Document information extraction method, apparatus, device, storage medium, and program product
Toprak et al. Enhanced Named Entity Recognition algorithm for financial document verification: A. Toprak, M. Turan
CN117522485A (en) Advertisement recommendation method, device, equipment and computer readable storage medium
Limam et al. Information extraction from multi-layout invoice images using FATURA dataset
CN116402056A (en) Document information processing method and device and electronic equipment
CN116189215A (en) Automatic auditing method and device, electronic equipment and storage medium
Owda et al. Financial discussion boards irregularities detection system (fdbs-ids) using information extraction
US8768941B2 (en) Document data processing device
Sim et al. Detecting Voice Phishing with Precision: Fine-Tuning Small Language Models
CN118520853A (en) Analysis report generation method and device, storage medium and electronic equipment
CN114580398B (en) Text information extraction model generation method, text information extraction method and device
KR20230169538A (en) Apparatus and method for analysis of transaction brief data using corpus for machine learning based on financial mydata and computer program for the same
Aydogdu et al. Using long short‐term memory neural networks to analyze SEC 13D filings: A recipe for human and machine interaction
US12437151B2 (en) Information processing device, information processing system, and information processing method
US12541545B1 (en) Generation of benchmarking datasets for summarization
US20250238607A1 (en) Document generation rules

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination