CN112270224B - Insurance liability analysis method, device and computer-readable storage medium - Google Patents
- Publication number
- CN112270224B (application CN202011101652.6A)
- Authority
- CN
- China
- Prior art keywords
- insurance
- policy
- target
- standard
- information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
- G06V30/412—Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/20—Ensemble learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/08—Insurance
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
- G06V30/413—Classification of content, e.g. text, photographs or tables
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Business, Economics & Management (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Multimedia (AREA)
- Accounting & Taxation (AREA)
- Finance (AREA)
- Software Systems (AREA)
- Economics (AREA)
- Marketing (AREA)
- Strategic Management (AREA)
- Technology Law (AREA)
- General Business, Economics & Management (AREA)
- Medical Informatics (AREA)
- Development Economics (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)
Abstract
The invention discloses an insurance responsibility analysis method. The method comprises: obtaining a policy image to be identified and an insurance clause document to be analyzed; processing the policy image to obtain a target standard policy field and its target policy field value; processing the insurance clause document to obtain a target labeling tag and its target information, wherein the target information comprises an insurance responsibility analysis formula; obtaining, according to the insurance responsibility analysis formula, a target parameter value from the target standard policy field, its target policy field value, the target labeling tag and its target information; and substituting the target parameter value into the insurance responsibility analysis formula to obtain a responsibility analysis result. The invention also discloses an insurance responsibility analysis device and a computer-readable storage medium. The invention can overcome the limitation of existing insurance responsibility analysis methods.
Description
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a method and apparatus for resolving insurance responsibility, and a computer readable storage medium.
Background
During insurance promotion, sales personnel usually prepare an insurance proposal based on the client's current situation and present it to the client, so that the client can understand the insurance scheme, the insurance benefits and related information. The core of the insurance proposal is the insurance responsibility analysis part, which presents the insurance responsibilities, including the conditions under which each responsibility pays out and the manner of payment.
Currently, insurance responsibilities are mainly analyzed by means of conventional lookup techniques. Insurance benefit values under different input parameters are calculated in advance to generate a benefit data table. After the customer inputs the relevant parameters, the corresponding insurance benefits are retrieved from the benefit data table. This approach is limited: only insurance benefits recorded in advance can be obtained, and a benefit that was not recorded in advance cannot be obtained by analysis.
Disclosure of Invention
The main purpose of the invention is to provide an insurance responsibility analysis method, an insurance responsibility analysis device and a computer-readable storage medium that overcome the limitation of existing insurance responsibility analysis methods.
In order to achieve the above object, the present invention provides an insurance responsibility analysis method, including:
acquiring a policy image to be identified and an insurance clause document to be analyzed;
processing the policy image to be identified to obtain a target standard policy field and its target policy field value;
processing the insurance clause document to be analyzed to obtain a target labeling label and its target information, wherein the target information comprises an insurance responsibility analysis formula;
and acquiring, according to the insurance responsibility analysis formula, a target parameter value from the target standard policy field and its target policy field value and from the target labeling label and its target information, and substituting the target parameter value into the insurance responsibility analysis formula to obtain a responsibility analysis result.
Optionally, the step of processing the policy image to be identified to obtain the target standard policy field and the target policy field value thereof includes:
identifying the policy image to be identified to obtain initial policy information, wherein the initial policy information comprises identification characters and their position information;
matching, according to a preset mapping relation between non-standard policy fields and standard policy fields, initial policy fields and their corresponding standard policy fields from the identification characters;
and acquiring field value characteristics corresponding to the standard policy fields, and matching the identification characters other than the initial policy fields according to the field value characteristics and the position information to obtain target policy field values corresponding to the standard policy fields.
Optionally, before the step of obtaining the initial policy field and the corresponding standard policy field by matching from the identification characters according to the mapping relationship between the preset non-standard policy field and the standard policy field, the method further includes:
acquiring a policy sample image together with its corresponding first product type and insurance clause information, and identifying the policy sample image to obtain policy sample field information;
acquiring a first target policy field corresponding to the first product type, and determining, according to the insurance clause information, a first non-standard policy field and its first standard policy field for each first product type;
obtaining a second standard policy field according to the first target policy field and the first standard policy field, and carrying out statistical analysis on the policy sample field information according to the first product type and the second standard policy field to obtain a statistical analysis result;
and constructing the preset mapping relation between non-standard policy fields and standard policy fields according to the statistical analysis result, the first non-standard policy field and the first standard policy field, wherein the preset mapping relation comprises, for each first product type, a sub-mapping relation between non-standard policy fields and standard policy fields.
Optionally, the step of obtaining the initial policy field and the corresponding standard policy field by matching from the identification characters according to the mapping relation between the preset non-standard policy field and the standard policy field includes:
matching, according to the preset mapping relation between non-standard policy fields and standard policy fields, a product field and its corresponding standard product field from the identification characters;
determining, from the preset mapping relation, a target sub-mapping relation corresponding to the standard product field;
and matching, according to the target sub-mapping relation and specific policy field value characteristics obtained from the insurance clause information, remaining policy fields and their corresponding standard remaining policy fields from the identification characters, wherein the standard policy fields comprise the standard product field and the standard remaining policy fields.
Optionally, the step of processing the insurance clause document to be analyzed to obtain the target labeling label and its target information includes:
inputting the insurance clause document to be analyzed into a pre-trained position labeling model to obtain a position labeling result;
extracting target insurance clause content from the insurance clause document to be analyzed according to the position labeling result;
labeling the target insurance clause content by using a pre-trained label labeling model to obtain a label labeling result, wherein the label labeling result comprises target labeling labels and their corresponding original information, and the original information comprises responsibility information to be analyzed;
and structuring the original information to obtain the target information corresponding to each target labeling label, wherein the target information comprises an insurance responsibility analysis formula obtained by structuring the responsibility information to be analyzed.
Optionally, before the step of labeling the target insurance clause content by using the pre-trained label labeling model to obtain a label labeling result, the method further includes:
acquiring a first training sample set, wherein the first training sample set comprises insurance clause content samples together with their real labeling labels and real information, the real labeling labels and real information being obtained by labeling based on a preset corpus;
training a preset label labeling model with the first training sample set to obtain a trained label labeling model;
wherein the preset label labeling model comprises an information extraction layer and a classification layer;
and the step of training the preset label labeling model with the first training sample set to obtain a trained label labeling model comprises:
inputting the insurance clause content samples into the information extraction layer to obtain characteristic information corresponding to each insurance clause content sample;
converting the characteristic information into a characteristic vector, inputting the characteristic vector into the classification layer to obtain a predicted labeling label, and determining corresponding predicted information according to the predicted labeling label and the characteristic information;
calculating a loss value according to the predicted labeling label, the predicted information, and the real labeling label and real information of the insurance clause content sample;
and updating the parameters of the preset label labeling model by a gradient descent algorithm according to the loss value, and iterating the training over the first training sample set to obtain the trained label labeling model.
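The training steps above can be illustrated with a minimal sketch. The text does not specify the model architecture, so everything here is an assumption made for illustration: keyword counts stand in for the information extraction layer, a linear softmax layer stands in for the classification layer, the loss covers only the predicted label (not the predicted information), and the sample sentences, `vocabulary` and `labels` are hypothetical.

```python
import math

# Hypothetical training samples: (insurance clause content, real labeling label).
samples = [
    ("the waiting period is 90 days", "waiting period"),
    ("we pay the basic insured amount on death", "death benefit"),
    ("waiting period of 180 days applies", "waiting period"),
    ("death benefit equals the insured amount", "death benefit"),
]
vocabulary = ["waiting", "period", "death", "amount"]  # assumed keyword list
labels = ["waiting period", "death benefit"]


def extract_features(text, vocabulary):
    """Stand-in for the information extraction layer: keyword counts as a vector."""
    words = text.lower().split()
    return [float(words.count(term)) for term in vocabulary]


def scores_for(x, weights, bias):
    """Linear classification layer: one score per label."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train(samples, vocabulary, labels, epochs=200, lr=0.5):
    """Per-sample gradient descent on the cross-entropy loss of the label prediction."""
    weights = [[0.0] * len(vocabulary) for _ in labels]
    bias = [0.0] * len(labels)
    for _ in range(epochs):
        for text, true_label in samples:
            x = extract_features(text, vocabulary)
            probs = softmax(scores_for(x, weights, bias))
            for k, label in enumerate(labels):
                # Gradient of cross-entropy with respect to score k.
                grad = probs[k] - (1.0 if label == true_label else 0.0)
                bias[k] -= lr * grad
                for j, v in enumerate(x):
                    weights[k][j] -= lr * grad * v
    return weights, bias


def predict(text, vocabulary, labels, weights, bias):
    scores = scores_for(extract_features(text, vocabulary), weights, bias)
    return labels[scores.index(max(scores))]


weights, bias = train(samples, vocabulary, labels)
```

A trained model is then queried with `predict(text, vocabulary, labels, weights, bias)`. A production model would presumably replace the keyword counts with a learned text encoder.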
Optionally, the insurance responsibility analysis method further includes:
acquiring insurance clause sample documents, and classifying them by product name;
performing cluster analysis on the content of each part of the insurance clauses in the classified insurance clause sample documents to obtain a clustering result;
labeling the insurance clause contents of each class according to the clustering result to obtain preset labels, and carrying out statistical analysis on the values corresponding to the preset labels in those contents to obtain value characteristics;
and constructing the preset corpus according to the preset labels and the value characteristics.
Optionally, the insurance responsibility analysis method further includes:
acquiring the product name of the policy image to be identified from the target standard policy field and the target policy field value, and determining a target display method according to the product name;
and displaying the responsibility analysis result based on the target display method.
In addition, in order to achieve the above object, the present invention also provides an insurance responsibility analysis device, which includes a memory, a processor, and an insurance responsibility analysis program stored on the memory and executable on the processor, wherein the insurance responsibility analysis program, when executed by the processor, implements the steps of the insurance responsibility analysis method as described above.
In addition, in order to achieve the above object, the present invention also provides a computer-readable storage medium having stored thereon an insurance responsibility analysis program which, when executed by a processor, implements the steps of the insurance responsibility analysis method as described above.
The invention provides an insurance responsibility analysis method, an insurance responsibility analysis device and a computer-readable storage medium. A policy image to be identified and an insurance clause document to be analyzed are obtained. The policy image is processed to obtain a target standard policy field and its target policy field value, and the insurance clause document is processed to obtain a target labeling tag and its target information, wherein the target information comprises an insurance responsibility analysis formula. According to that formula, a target parameter value is obtained from the target standard policy field and its target policy field value and from the target labeling tag and its target information, and is substituted into the formula to obtain a responsibility analysis result. The embodiments of the invention thus identify policy information of all types automatically and structure the insurance clause document automatically, and then calculate the responsibility analysis result from the extracted fields, tags and formula. In this way, insurance responsibilities of various types can be analyzed automatically, which overcomes the limitation of existing insurance responsibility analysis methods.
Drawings
FIG. 1 is a schematic diagram of a terminal structure of a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a flowchart of a first embodiment of an insurance responsibility analysis method according to the present invention;
FIG. 3 is a flowchart of a second embodiment of an insurance responsibility analysis method according to the present invention;
FIG. 4 is a flowchart of a fifth embodiment of an insurance responsibility analysis method according to the present invention.
The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Referring to fig. 1, fig. 1 is a schematic diagram of a terminal structure of a hardware operating environment according to an embodiment of the present invention.
The terminal of the embodiment of the invention may be a PC (personal computer), or terminal equipment such as a tablet computer, a portable computer or a server.
As shown in fig. 1, the terminal may include a processor 1001, such as a CPU (Central Processing Unit), a communication bus 1002, a user interface 1003, a network interface 1004 and a memory 1005. The communication bus 1002 enables communication between these components. The user interface 1003 may include a display and an input unit such as a keyboard, and may optionally further include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wireless-Fidelity (Wi-Fi) interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory, such as a disk memory. The memory 1005 may also optionally be a storage device separate from the processor 1001.
It will be appreciated by those skilled in the art that the terminal structure shown in fig. 1 is not limiting of the terminal and may include more or fewer components than shown, or may combine certain components, or a different arrangement of components.
As shown in fig. 1, an operating system, a network communication module, and an insurance responsibility resolution program may be included in the memory 1005, which is one type of computer storage medium.
In the terminal shown in fig. 1, the network interface 1004 is mainly used for connecting to a background server and performing data communication with the background server, the user interface 1003 is mainly used for connecting to a client and performing data communication with the client, and the processor 1001 may be used for calling an insurance responsibility analysis program stored in the memory 1005 and performing the respective steps of the following insurance responsibility analysis method.
Based on the hardware structure, various embodiments of the insurance responsibility analysis method are provided.
The invention provides an insurance responsibility analysis method.
Referring to fig. 2, fig. 2 is a flowchart illustrating a first embodiment of an insurance responsibility analysis method according to the present invention.
In this embodiment, the insurance responsibility analysis method includes:
Step S10, acquiring a policy image to be identified and an insurance clause document to be analyzed;
The terminal of the embodiment of the invention may be a PC, or terminal equipment such as a tablet computer, a portable computer or a server. In this embodiment, a server is used as the example.
In this embodiment, insurance data to be analyzed is first obtained, where the insurance data to be analyzed includes an insurance policy image to be identified and an insurance clause document to be analyzed.
To facilitate subsequent processing, the format of the insurance clause document to be analyzed may be checked against a preset document format, which is optionally a txt document or a doc document. If the document is not in the preset format, it is converted, and the converted document is used as the insurance clause document to be analyzed. For example, if the document is a pdf document, it may be converted into a txt document or a doc document. If the document is a picture, it is recognized by OCR (Optical Character Recognition) to obtain the text in the document and its position information, and the text is then written into a txt document or a doc document according to the position information. Subsequent processing is performed on the converted document.
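The format check described above can be sketched as a small dispatcher. This is a minimal sketch, assuming the format is inferred from the file extension; the function name `choose_conversion`, the returned labels and the list of picture extensions are hypothetical and are not taken from the text.

```python
from pathlib import Path

# Preset document formats named in the text (txt or doc).
PRESET_FORMATS = {".txt", ".doc"}
# Assumed set of extensions treated as pictures.
IMAGE_FORMATS = {".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff"}


def choose_conversion(filename: str) -> str:
    """Return which processing path an insurance clause document should take."""
    suffix = Path(filename).suffix.lower()
    if suffix in PRESET_FORMATS:
        return "none"          # already in a preset format
    if suffix == ".pdf":
        return "pdf_to_text"   # convert the pdf into a txt or doc document
    if suffix in IMAGE_FORMATS:
        return "ocr"           # recognize text and positions, then write txt or doc
    return "unsupported"
```

Each returned label would be mapped to an actual converter or OCR engine, which this sketch deliberately leaves out.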
Step S20, processing the to-be-identified policy image to obtain a target standard policy field and a target policy field value;
Then, the policy image to be identified is processed to obtain a target standard policy field and its target policy field value. Specifically, the policy image is first identified to obtain initial policy information, which includes identification characters and their position information. Next, initial policy fields and their corresponding standard policy fields are matched from the identification characters according to a preset mapping relation between non-standard policy fields and standard policy fields. Finally, field value characteristics corresponding to the standard policy fields are obtained, and the identification characters other than the initial policy fields are matched according to the field value characteristics and the position information to obtain the target policy field values corresponding to the standard policy fields. For a specific implementation, reference may be made to the second embodiment described below.
Step S30, processing the insurance clause document to be analyzed to obtain a target labeling label and target information thereof, wherein the target information comprises an insurance responsibility analysis formula;
The insurance clause document to be analyzed is processed to obtain a target labeling label and its target information, wherein the target information comprises an insurance responsibility analysis formula. Specifically, the insurance clause document is input into a pre-trained position labeling model to obtain a position labeling result, and the target insurance clause content is extracted from the document according to that result. The target insurance clause content is then labeled by a pre-trained label labeling model to obtain a label labeling result, which includes target labeling labels and their corresponding original information; the original information includes the responsibility information to be analyzed. Finally, the original information is structured to obtain the target information corresponding to each target labeling label. The target information includes the insurance responsibility analysis formula, which is obtained by structuring the responsibility information to be analyzed. For a specific implementation, reference may be made to the fifth embodiment described below.
Insurance responsibility analysis formulas fall into two broad categories: standard analysis formulas for basic logical operations and standard analysis formulas for special types of insurance responsibility. The former may include, but are not limited to, multiple formulas and MAX and MIN (maximum and minimum) formulas; the latter may include standard analysis formulas for reimbursement-type medical responsibility, standard analysis formulas for cash-flow-type annuities, and the like.
It should be noted that steps S20 and S30 need not be executed in any particular order.
Step S40, acquiring, according to the insurance responsibility analysis formula, a target parameter value from the target standard policy field and its target policy field value and from the target labeling tag and its target information, and substituting the target parameter value into the insurance responsibility analysis formula to obtain a responsibility analysis result.
And finally, according to the insurance responsibility analysis formula, acquiring a target parameter value from the target standard insurance policy field and the target insurance policy field value thereof and the target labeling label and the target information thereof, and substituting the target parameter value into the insurance responsibility analysis formula to obtain a responsibility analysis result.
For example, if the insurance responsibility analysis formula is that the basic insurance amount is 1% 1+3% in the payment period (policy annual number-3), the target parameter values to be obtained include the basic insurance amount, the payment period and the policy annual number, and these values are then substituted into the formula to calculate the responsibility analysis result.
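Step S40 can be illustrated with a minimal sketch, assuming the insurance responsibility analysis formula is stored as an arithmetic expression over named parameters. The function names, parameter names and the formula used in the usage note are hypothetical stand-ins chosen for illustration; they are not the formula of the example above.

```python
import ast
import operator

# Arithmetic operators and functions the sketch accepts in a formula.
_BIN_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
            ast.Mult: operator.mul, ast.Div: operator.truediv}
_FUNCS = {"MAX": max, "MIN": min}


def required_parameters(formula: str) -> set:
    """Names in the formula that must be filled from the policy or clause data."""
    tree = ast.parse(formula, mode="eval")
    return {node.id for node in ast.walk(tree)
            if isinstance(node, ast.Name) and node.id not in _FUNCS}


def evaluate_formula(formula: str, values: dict):
    """Substitute the target parameter values into the formula and compute it."""
    def visit(node):
        if isinstance(node, ast.Expression):
            return visit(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name):
            return values[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](visit(node.left), visit(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -visit(node.operand)
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in _FUNCS and not node.keywords):
            return _FUNCS[node.func.id](*(visit(arg) for arg in node.args))
        raise ValueError("unsupported formula element: " + ast.dump(node))

    return visit(ast.parse(formula, mode="eval"))
```

For a made-up formula such as `basic_amount * MIN(policy_year, payment_period)`, `required_parameters` lists the three names to look up in the extracted policy fields and clause information, and `evaluate_formula` computes the result once a dictionary of their values is supplied. Walking the syntax tree instead of calling `eval` keeps the sketch limited to arithmetic and the MAX and MIN operations mentioned in the text.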
In the insurance responsibility analysis method of this embodiment, a policy image to be identified and an insurance clause document to be analyzed are obtained. The policy image is processed to obtain a target standard policy field and its target policy field value, and the insurance clause document is processed to obtain a target labeling label and its target information, wherein the target information comprises an insurance responsibility analysis formula. According to that formula, a target parameter value is obtained from the target standard policy field and its target policy field value and from the target labeling label and its target information, and is substituted into the formula to obtain a responsibility analysis result. The embodiment thus identifies policy information of all types automatically and structures the insurance clause document automatically, and then calculates the responsibility analysis result from the extracted fields, labels and formula. In this way, insurance responsibilities of various types can be analyzed automatically, which overcomes the limitation of existing insurance responsibility analysis methods.
Further, based on the first embodiment, a second embodiment of the insurance responsibility analysis method of the present invention is provided. Referring to fig. 3, fig. 3 is a flowchart illustrating a second embodiment of an insurance responsibility analysis method according to the present invention.
In this embodiment, the step S20 includes:
Step S21, identifying the to-be-identified policy image to obtain initial policy information, wherein the initial policy information comprises identification characters and position information thereof;
This embodiment provides a processing procedure for the policy image to be identified, as follows:
First, the policy image to be identified is identified to obtain initial policy information, which includes identification characters and their position information. The identification characters include initial policy fields and initial policy field values, and the position information is the position of each identification character in the image, which may be expressed as coordinates. OCR (Optical Character Recognition) technology may be used for the image recognition; specific recognition methods can be found in the prior art and are not described here.
Step S22, according to the mapping relation between the preset non-standard policy fields and the standard policy fields, the initial policy fields and the standard policy fields corresponding to the initial policy fields are obtained by matching from the identification characters;
Then, initial policy fields and their corresponding standard policy fields are matched from the identification characters according to the preset mapping relation between non-standard policy fields and standard policy fields. For the construction of this mapping relation, reference may be made to the third embodiment described below, which is not repeated here.
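Step S22 can be sketched as a dictionary lookup over the recognized text items. This is a minimal sketch under stated assumptions: the mapping entries, the function name `match_initial_fields` and the representation of recognized items as `(text, (x, y))` pairs are all hypothetical.

```python
# Hypothetical mapping from non-standard field names to standard policy fields.
FIELD_MAPPING = {
    "policy no.": "policy number",
    "contract number": "policy number",
    "insured person": "insured",
    "name of insured": "insured",
}


def match_initial_fields(recognized, mapping=FIELD_MAPPING):
    """Split recognized items into matched policy fields and remaining items.

    `recognized` is a list of (text, (x, y)) pairs; the remaining items are
    the candidates for field values in the next step.
    """
    standard_names = set(mapping.values())
    matched, remaining = [], []
    for text, position in recognized:
        # Normalize: drop surrounding spaces and a trailing ASCII or full-width colon.
        key = text.strip().rstrip(":：").strip().lower()
        if key in mapping:
            matched.append((text, mapping[key], position))
        elif key in standard_names:
            matched.append((text, key, position))
        else:
            remaining.append((text, position))
    return matched, remaining
```

Exact lookup after normalization is the simplest choice; a real system would likely need fuzzy matching to tolerate recognition errors.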
Step S23, obtaining the field value characteristics corresponding to the standard policy fields, and matching the identification characters except the initial policy fields according to the field value characteristics and the position information to obtain the target policy field values corresponding to the standard policy fields.
Finally, the field value characteristics corresponding to the standard policy fields are obtained. The field value characteristics are the characteristics of the values corresponding to a standard policy field and can comprise one or more kinds of characteristics. For example, for the standard policy field "policy number", the field value is characterized as a combination of digits and letters that is typically more than 5 characters long; for the standard policy field "identification card number", the field value is characterized as an 18-character number.
And matching the identification characters except the initial policy field according to the field value characteristics and the position information to obtain the target policy field value corresponding to the standard policy field.
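One possible way to represent field value characteristics is as regular expressions, sketched below for the two examples given above. The exact patterns (at least one letter and one digit, more than 5 characters; exactly 18 digits) are an assumed reading of those examples, and `FIELD_VALUE_FEATURES` and `has_feature` are hypothetical names.

```python
import re

# Assumed encoding of the field value characteristics as regular expressions.
FIELD_VALUE_FEATURES = {
    # Letters and digits combined, more than 5 characters in total.
    "policy number": re.compile(r"(?=.*[A-Za-z])(?=.*\d)[A-Za-z0-9]{6,}"),
    # An 18-character number.
    "identification card number": re.compile(r"\d{18}"),
}


def has_feature(standard_field, candidate):
    """Check whether a candidate value fits the feature of a standard field."""
    pattern = FIELD_VALUE_FEATURES.get(standard_field)
    return bool(pattern and pattern.fullmatch(candidate))


print(has_feature("policy number", "AB1234567"))
print(has_feature("policy number", "1234567"))
```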
Specifically, the step of matching the identification characters except the initial policy field according to the field value characteristics and the position information to obtain the target policy field value corresponding to the standard policy field includes:
Step a231, marking the identification characters except the initial policy field as an initial policy field value, and acquiring the first position information of the initial policy field and the second position information of the initial policy field value from the position information;
Step a232, calculating to obtain the relative distance between each initial policy field and each initial policy field value according to the first position information and the second position information;
step a233, screening a suspected field value corresponding to the initial policy field from the initial policy field values according to the relative distance and the preset range;
Step a234, matching the suspected field value according to the field value feature to obtain a target policy field value corresponding to the standard policy field.
In this embodiment, the matching process of the target policy field value corresponding to the standard policy field is as follows:
for convenience of explanation, the identification characters except the initial policy field are recorded as initial policy field values, and first position information of the initial policy field and second position information of the initial policy field values are obtained from the position information.
Then, the relative distance between each initial policy field and each initial policy field value is calculated according to the first position information and the second position information. The calculation method of the relative distance may include, but is not limited to, the following. 1) Directly calculating the distance between the first position information and the second position information. For example, when both are expressed as coordinates, with an initial policy field at $(x_1, y_1)$ and an initial policy field value at $(x_2, y_2)$, the distance is $d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$. 2) Determining the relative distance from both distance and direction. For example, the distance between the first position information and the second position information is calculated first, and a distance score is determined from it according to a preset mapping relation between distance and score; meanwhile, the direction is determined according to the first position information and the second position information, and a direction score is determined according to a preset mapping relation between direction and score; the sum of the distance score and the direction score is then used as the relative distance.
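The two calculation methods can be sketched as follows. The Euclidean distance follows the coordinate example directly; the score tables in `distance_score` and `direction_score` (thresholds of 50, 150 and 10 pixels, and a preference for values to the right of or below a field) are purely hypothetical, since the embodiment only states that such preset mappings exist.

```python
import math


def euclidean_distance(field_pos, value_pos):
    """Method 1: straight-line distance between two (x, y) coordinates."""
    return math.hypot(value_pos[0] - field_pos[0], value_pos[1] - field_pos[1])


def distance_score(distance):
    """Hypothetical preset mapping from distance (pixels) to a score."""
    if distance <= 50:
        return 0
    if distance <= 150:
        return 1
    return 2


def direction_score(field_pos, value_pos):
    """Hypothetical preset mapping from direction to a score."""
    dx = value_pos[0] - field_pos[0]
    dy = value_pos[1] - field_pos[1]
    if dx >= 0 and abs(dy) <= 10:
        return 0  # value on the same row, to the right of the field
    if dy > 0 and abs(dx) <= 10:
        return 1  # value directly below (image y grows downward)
    return 3


def relative_distance(field_pos, value_pos):
    """Method 2: sum of the distance score and the direction score."""
    d = euclidean_distance(field_pos, value_pos)
    return distance_score(d) + direction_score(field_pos, value_pos)


print(euclidean_distance((0, 0), (3, 4)))
print(relative_distance((0, 0), (40, 0)))
```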
And screening the suspected field value corresponding to the initial policy field from the initial policy field value according to the relative distance and the preset range. Namely, the initial policy field value with the relative distance to the initial policy field within the preset range is used as the suspected field value, so that the subsequent matching range is reduced, and the matching efficiency is improved.
And finally, matching the suspected field value according to the field value characteristics to obtain a target policy field value corresponding to the standard policy field. When matching is carried out, a value corresponding to the field value characteristic of the standard policy field matched currently can be screened out from the suspected field value to be used as a target policy field value.
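Steps a233 and a234 can be illustrated with the sketch below, which first keeps only the values within a preset range of a field and then picks the first one that fits the field value characteristic. It assumes the plain Euclidean distance as the relative distance and a regular expression as the characteristic; the function names, the 150-pixel range and the sample coordinates are hypothetical.

```python
import math
import re


def screen_suspected_values(field_pos, values, max_distance):
    """Keep the texts of values whose distance to the field is within range.

    values is a list of (text, (x, y)) tuples.
    """
    return [
        text for text, pos in values
        if math.dist(field_pos, pos) <= max_distance
    ]


def match_target_value(suspected, feature):
    """Return the first suspected value that fully matches the feature."""
    for text in suspected:
        if feature.fullmatch(text):
            return text
    return None


values = [
    ("AB1234567", (220, 52)),
    ("20 years", (220, 120)),
    ("ZX9876543", (600, 400)),
]
suspected = screen_suspected_values((100, 50), values, max_distance=150)
print(suspected)
print(match_target_value(suspected, re.compile(r"[A-Za-z0-9]{6,}")))
```

Screening by distance first means the distant "ZX9876543" is never tested against the characteristic, even though it would also fit.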
After the matching is completed, it is detected whether any standard policy field has not been matched with a target policy field value (recorded as an unmatched standard policy field). If so, the identification characters other than the already matched target policy field values and the standard policy fields are matched according to the field value characteristics of the unmatched standard policy field, so as to enlarge the matching range and match further.
In addition, since different standard policy fields may share the same policy field value, when a standard policy field has not been matched with a target policy field value (recorded as an unmatched standard policy field), it can be detected whether there is another standard policy field with the same policy field value; if so, that policy field value is used directly for the unmatched standard policy field. For example, if the "insurance period" field value of an insurance product is "same as main insurance", the insurance period field value of the main insurance product in the policy is copied to the insurance period of that insurance product.
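The "insurance period" example can be sketched as a simple copy from the main insurance product. The dictionary layout, the marker string and the function name `resolve_same_as_main` are assumptions made for illustration.

```python
def resolve_same_as_main(main_product, rider, marker="same as main insurance"):
    """Copy field values from the main product where the rider only refers to it."""
    resolved = dict(rider)
    for field, value in rider.items():
        if value == marker and field in main_product:
            resolved[field] = main_product[field]
    return resolved


main_product = {"insurance period": "20 years", "pay period": "10 years"}
rider = {"insurance period": "same as main insurance", "pay period": "5 years"}
print(resolve_same_as_main(main_product, rider))
```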
Further, after the target policy field value is obtained, it can be detected whether the target policy field value meets the requirements of the standardized format; if not, the target policy field value is standardized to obtain the standard policy field value. The standardization may comprise: detecting whether preset redundant characters exist in the target policy field value and, if so, deleting them; and/or detecting whether abbreviated characters exist in the target policy field value and, if so, replacing them with the corresponding full Chinese names; and/or detecting whether the target policy field value conforms to the output format and, if not, converting its format; and/or detecting whether a more recent expression exists for the target policy field value and, if so, replacing the target policy field value with that latest expression. In this way, standardized output of the standard policy field values can be realized, which facilitates the subsequent insurance responsibility analysis processing.
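A minimal sketch of three of these checks (redundant characters, abbreviations and output format) is given below. The redundant character set, the abbreviation table and the choice of a `YYYY-MM-DD` date as the output format are all assumptions for the example; the embodiment does not fix any of them.

```python
import re

REDUNDANT_CHARS = "*#"              # hypothetical preset redundant characters
ABBREVIATIONS = {"RMB": "人民币"}    # hypothetical abbreviation -> full Chinese name
# Assumed input date styles: 2020.1.5, 2020/1/5, 2020年1月5日
DATE_PATTERN = re.compile(r"(\d{4})[./年](\d{1,2})[./月](\d{1,2})日?")


def standardize_value(value):
    """Apply redundant-character removal, abbreviation expansion and date conversion."""
    for ch in REDUNDANT_CHARS:
        value = value.replace(ch, "")
    value = value.strip()
    for short, full in ABBREVIATIONS.items():
        value = value.replace(short, full)
    match = DATE_PATTERN.fullmatch(value)
    if match:
        year, month, day = match.groups()
        value = f"{year}-{int(month):02d}-{int(day):02d}"
    return value


print(standardize_value("*2020.1.5#"))
print(standardize_value("2020年10月15日"))
```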
In the embodiment of the invention, all characters and position information thereof in the policy images are firstly identified, then the standard policy fields are obtained by matching according to the mapping relation between the preset non-standard policy fields and the standard policy fields, and then the target policy field values corresponding to the standard policy fields are obtained by matching according to the field value characteristics and the position information of the standard policy fields.
Further, based on the above second embodiment, a third embodiment of the insurance responsibility analysis method of the present invention is provided.
In this embodiment, before the step S22, the insurance responsibility analysis method further includes:
Step A, acquiring a policy sample image and corresponding first product type and insurance clause information thereof, and identifying the policy sample image to obtain policy sample field information;
In this embodiment, a policy sample image and a corresponding product type thereof (for distinguishing from a product type of a subsequent policy image to be identified, which is denoted as a first product type) and insurance clause information are obtained, where the policy sample image may be a policy image of a selected different product type, so as to be used for counting expressions of different policy fields of policies of different companies and different product types. The insurance clause information is information of an insurance clause part corresponding to the insurance policy, is obtained through structural processing, and can comprise clause labels and label values thereof.
And then identifying the policy sample image to obtain the policy sample field information. The policy sample field information is a policy field in the policy sample image corresponding to the standard policy field. In the process of image recognition, OCR technology can be adopted for recognition, and specific recognition methods can refer to the prior art, and are not described herein.
Step B, a first target policy field corresponding to the first product type is obtained, and a first non-standard policy field and a first standard policy field corresponding to each first product type are determined according to the insurance clause information;
And then, acquiring a target policy field corresponding to the first product type (for distinguishing the target policy field from the target policy field of the subsequent policy image to be identified, and recording the target policy field as a first target policy field), wherein the first target policy field refers to a policy field existing in a policy corresponding to each first product type, and is also a key field required to be analyzed for the policy corresponding to the first product type, so as to be used for entering or checking the policy by a user.
Meanwhile, the first non-standard policy fields and the first standard policy fields corresponding to each first product type are determined according to the insurance clause information. That is, the standard policy fields already covered by the insurance clauses, together with their expressions, are determined from the insurance clause information, because insurance clauses often prescribe a specific expression for some standard policy fields. For example, in some insurance products a standard policy field is expressed as "pay period" in the insurance clauses, and the policy will then also use "pay period". Subsequent matching can therefore be performed more quickly and accurately based on the expressions of the standard policy fields used in the insurance clauses, which improves the efficiency and accuracy of information standardization.
Step C, obtaining a second standard policy field according to the first target policy field and the first standard policy field, and carrying out statistical analysis on the policy sample field information according to the first product type and the second standard policy field to obtain a statistical analysis result;
and then, obtaining a second standard policy field according to the first target policy field and the first standard policy field, wherein the second standard policy field is a policy field except the first standard policy field in the first target policy field.
Further, statistical analysis is carried out on the policy sample field information according to the first product type and the second standard policy fields to obtain a statistical analysis result. In the statistical analysis, the policy fields that correspond to each second standard policy field are counted, per first product type, from the policy sample field information; the statistical analysis result accordingly comprises the non-standard policy fields corresponding to each second standard policy field.
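The counting described above might look like the following sketch. It assumes that each recognized sample field has already been associated with a product type and a second standard policy field, which is an assumption about the sample annotation; `count_field_expressions` and the sample data are hypothetical.

```python
from collections import Counter, defaultdict


def count_field_expressions(samples):
    """Count observed labels per (product type, standard field).

    samples is an iterable of (product_type, standard_field, observed_label).
    """
    stats = defaultdict(Counter)
    for product_type, standard_field, label in samples:
        stats[(product_type, standard_field)][label] += 1
    return stats


samples = [
    ("annuity", "pay period", "payment period"),
    ("annuity", "pay period", "premium term"),
    ("annuity", "pay period", "payment period"),
    ("critical illness", "policy number", "policy no."),
]
stats = count_field_expressions(samples)
print(stats[("annuity", "pay period")].most_common())
```

The keys of each counter are the non-standard expressions observed for that standard field, and the counts indicate how common each expression is.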
And D, constructing a mapping relation between the preset non-standard policy fields and the standard policy fields according to the statistical analysis result, the first non-standard policy fields and the first standard policy fields, wherein the mapping relation between the preset non-standard policy fields and the standard policy fields comprises sub-mapping relations of the non-standard policy fields and the standard policy fields corresponding to the first product types.
And finally, constructing a mapping relation between the preset non-standard policy field and the standard policy field according to the statistical analysis result, the first non-standard policy field and the first standard policy field. The non-standard policy fields and standard policy fields may be in many-to-one, or one-to-one, form.
The statistical analysis result comprises the second standard policy fields and the second non-standard policy fields corresponding to them. When the mapping relation is constructed, for each first product type a first sub-mapping relation is constructed based on the first non-standard policy fields and the first standard policy fields, and a second sub-mapping relation is constructed based on the second non-standard policy fields and the second standard policy fields; together they form the mapping relation between the preset non-standard policy fields and the standard policy fields. The mapping relation thus comprises a plurality of sub-mapping relations, one per product type, each mapping non-standard policy fields to standard policy fields, where the standard policy fields in each sub-mapping relation are the first target policy fields of that first product type. Each sub-mapping relation is further divided into a sub-mapping relation between the first non-standard policy fields and the first standard policy fields, obtained from the insurance clause information, and a sub-mapping relation between the second non-standard policy fields and the second standard policy fields, obtained by statistics on the policy sample field information.
Further, the statistical analysis result includes a second standard policy field and a corresponding non-standard policy field, and step C may further include:
step C1, obtaining synonyms of the second standard policy field and the second non-standard policy field;
And C2, constructing a mapping relation between the preset non-standard policy field and the standard policy field according to the statistical analysis result, the synonym, the first non-standard policy field and the first standard policy field thereof.
Further, in order to expand the matching range and improve the matching efficiency and the accuracy of the matching result, the synonyms of the second standard policy field and the second non-standard policy field can be obtained, and then the mapping relation between the preset non-standard policy field and the standard policy field is constructed according to the statistical analysis result and the synonyms. That is, the synonym is added to the second non-standard policy field.
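The construction of the mapping relation, including the synonym expansion, can be sketched as below. The nested dictionary layout (product type, then a clause-derived part and a statistics-derived part) is one assumed representation of the sub-mapping relations; the key names `from_clauses` and `from_statistics` and the function name are hypothetical.

```python
def build_field_mapping(clause_pairs, statistics_pairs, synonyms=None):
    """Build {product_type: {"from_clauses": {...}, "from_statistics": {...}}}.

    clause_pairs and statistics_pairs map product type to
    {non_standard_field: standard_field}; synonyms maps a term to its synonyms.
    """
    synonyms = synonyms or {}
    mapping = {}
    sources = (("from_clauses", clause_pairs), ("from_statistics", statistics_pairs))
    for source_name, pairs_by_product in sources:
        for product_type, pairs in pairs_by_product.items():
            sub = mapping.setdefault(
                product_type, {"from_clauses": {}, "from_statistics": {}}
            )
            sub[source_name].update(pairs)
    # Add synonyms of the second standard and non-standard fields
    # as additional non-standard fields.
    for sub in mapping.values():
        stats_map = sub["from_statistics"]
        for non_standard, standard in list(stats_map.items()):
            for term in (non_standard, standard):
                for synonym in synonyms.get(term, []):
                    stats_map.setdefault(synonym, standard)
    return mapping


mapping = build_field_mapping(
    {"annuity": {"pay period": "pay period"}},
    {"annuity": {"policy no.": "policy number"}},
    synonyms={"policy number": ["contract number"]},
)
print(mapping["annuity"])
```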
In this embodiment, by constructing a non-standard policy field corresponding to a preset standard policy field, the subsequent identification and matching of the policy field can be facilitated.
Further, based on the above third embodiment, a fourth embodiment of the insurance responsibility analysis method of the present invention is proposed.
In this embodiment, the step S22 includes:
step a221, according to the mapping relation between the preset non-standard policy field and the standard policy field, matching the product field and the corresponding standard product field from the identification characters;
In this embodiment, the mapping relation between the preset non-standard policy fields and the standard policy fields is constructed in combination with the insurance clause information. That is, the mapping relation comprises sub-mapping relations between the non-standard policy fields and the standard policy fields of different product types, and each sub-mapping relation is further divided into a sub-mapping relation between the first non-standard policy fields and the first standard policy fields, obtained from the insurance clause information, and a sub-mapping relation between the second non-standard policy fields and the second standard policy fields, obtained by statistics on the policy sample field information. In this case, the product field and its corresponding standard product field can first be obtained by matching from the identification characters according to the mapping relation between the preset non-standard policy fields and the standard policy fields. The product field is the field corresponding to the product type; it can be obtained by matching target fields such as the product name, the product type or the policy name, after which the corresponding standard product field, namely the standardized product type name, is determined.
Step a222, determining and obtaining a target sub-mapping relation corresponding to the standard product field from the mapping relation between the preset non-standard policy field and the standard policy field;
And then, determining and obtaining a target sub-mapping relation corresponding to the standard product field from the mapping relation between the preset non-standard policy field and the standard policy field. Namely, determining the sub-mapping relation corresponding to the product type obtained by matching, and marking the sub-mapping relation as a target sub-mapping relation.
And a step a223 of matching the residual policy field and the corresponding standard residual policy field from the identification character according to the target sub-mapping relation and the specific policy field value characteristic obtained based on the insurance clause information, wherein the standard policy field comprises the standard product field and the standard residual policy field.
As an implementation manner, the remaining policy fields and the corresponding standard remaining policy fields thereof can be obtained by matching from the identification characters according to the target sub-mapping relationship, wherein the standard policy fields comprise standard product fields and standard remaining policy fields, and the standard remaining policy fields are other policy fields except the standard product fields and included in the policy.
When matching is performed, a part of the remaining policy fields and the standard remaining policy fields thereof can be determined and obtained according to the sub-mapping relation between the first non-standard policy fields and the first standard policy fields obtained by the insurance clause information in the target sub-mapping relation, and then another part of the remaining policy fields and the standard remaining policy fields thereof can be determined and obtained according to the sub-mapping relation between the second non-standard policy fields and the second standard policy fields obtained by the policy sample field information statistics in the target sub-mapping relation.
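The first implementation (match the product field, select the target sub-mapping relation, then match the remaining fields, clause-derived part first) can be sketched as follows. The product aliases, the sub-mapping contents and the names `PRODUCT_ALIASES`, `SUB_MAPPINGS` and `match_fields_by_product` are hypothetical.

```python
# Hypothetical product names mapped to standardized product types.
PRODUCT_ALIASES = {
    "annuity insurance": "annuity",
    "critical illness insurance": "critical illness",
}

# Hypothetical sub-mapping relations per product type.
SUB_MAPPINGS = {
    "annuity": {
        "from_clauses": {"pay period": "pay period"},
        "from_statistics": {"payment period": "pay period"},
    },
    "critical illness": {
        "from_clauses": {},
        "from_statistics": {"policy no.": "policy number"},
    },
}


def match_fields_by_product(tokens):
    """Return (product_type, [(remaining_field, standard_remaining_field), ...])."""
    product_type = next(
        (PRODUCT_ALIASES[t.lower()] for t in tokens if t.lower() in PRODUCT_ALIASES),
        None,
    )
    if product_type is None:
        return None, []
    target = SUB_MAPPINGS[product_type]  # target sub-mapping relation
    matches = []
    # Clause-derived part first, then the statistics-derived part.
    for part in ("from_clauses", "from_statistics"):
        for token in tokens:
            standard = target[part].get(token.lower())
            if standard is not None:
                matches.append((token, standard))
    return product_type, matches


print(match_fields_by_product(
    ["Annuity Insurance", "Payment Period", "Pay period", "20 years"]
))
```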
As another implementation, field identification may also be performed in combination with specific policy field value characteristics obtained from the insurance clause information. Specifically, according to the target sub-mapping relation and the specific policy field value characteristics obtained based on the insurance clause information, the remaining policy fields and their corresponding standard remaining policy fields are obtained by matching from the identification characters, wherein the standard policy fields comprise the standard product field and the standard remaining policy fields. The specific policy field value characteristics are the field value characteristics of certain specific policy fields, obtained by analyzing the insurance clause information. During identification, the field values of the corresponding fields can be identified first according to the specific policy field value characteristics; then, according to the position information, the identification characters located within a certain range of those field values are identified as a part of the remaining policy fields, and the standard remaining policy fields corresponding to this part are determined according to the target sub-mapping relation; the rest of the remaining policy fields and their standard remaining policy fields are then matched according to the target sub-mapping relation. The specific identification process can refer to the previous embodiments.
By the mode, the matching efficiency of the policy fields can be improved, and the accuracy rate of the matching of the policy fields can also be improved.
Further, based on the first embodiment described above, a fifth embodiment of the insurance responsibility analysis method of the present invention is proposed. Referring to fig. 4, fig. 4 is a flowchart illustrating a fifth embodiment of an insurance responsibility analysis method according to the present invention.
In this embodiment, the step S30 includes:
S31, inputting the insurance clause document to be analyzed into a pre-trained position labeling model to obtain a position labeling result;
the embodiment provides a label marking of insurance clauses and an information standardization processing method thereof, which concretely comprises the following steps:
Firstly, inputting an insurance clause document to be analyzed into a pre-trained position labeling model to obtain a position labeling result. The position labeling model is used for labeling the positions of the insurance clause contents (such as the insurance clause basic information, the responsibility basic information and the like) of each part in the insurance clause document to be analyzed, and the parts can be subdivided according to actual requirements, so that the subsequent screenshot of the insurance clause contents of each part is facilitated, the processing range of the insurance clause is shortened, the processing efficiency can be improved, meanwhile, the interference of useless parts can be avoided, and the processing accuracy can be improved to a certain extent.
The training process of the position labeling model comprises the steps of firstly obtaining a second training sample set, wherein the second training sample set comprises an insurance clause sample document and labeling frames of the insurance clause contents of all parts. The insurance clause contents of each part can comprise, but are not limited to, insurance clause basic information, responsibility basic information and the like, and can be divided according to actual needs, and the marking frame is used for marking the corresponding position of each part of the insurance clause contents in the insurance clause sample document. And then training the preset position labeling model through a second training sample set to obtain a trained position labeling model. The preset position labeling model is optionally a multi-classification model, such as a neural network model, a random forest model, a logistic regression model and the like. Specific training procedures may be referred to the prior art. The position labeling model is trained in advance to be used for labeling the positions of the insurance clause contents of each part in the insurance clause sample document, so that the subsequent screenshot of the insurance clause contents of each part is facilitated, the processing range of the insurance clause is reduced, the processing efficiency can be improved, meanwhile, the interference of useless parts can be avoided, and the processing accuracy can be improved to a certain extent.
S32, intercepting the insurance clause document to be analyzed according to the position labeling result to obtain target insurance clause content;
Then, the insurance clause document to be analyzed is intercepted according to the position labeling result to obtain the target insurance clause content. That is, the parts containing the information to be structured are cut out so that label labeling and structuring can be performed conveniently. The important information of an insurance clause mainly includes the insurance clause basic information (such as file name, insurance company name, product name, design type, and the page number and position of the responsibility), the responsibility payment information (such as responsibility basic information, responsibility payment condition and responsibility payment formula), the limit information, and some other special information (such as dividend information).
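As a rough illustration of the interception step, the sketch below cuts labeled parts out of a document that is assumed to be available as a list of text lines, with the position labeling result given as half-open line ranges. The embodiment describes labeling frames rather than line ranges, so this representation, like the function name `intercept_clause_content`, is a simplifying assumption.

```python
def intercept_clause_content(lines, position_labels, wanted):
    """Cut the wanted parts out of the document.

    position_labels is a list of (part_name, start_line, end_line),
    with end_line exclusive.
    """
    return {
        name: "\n".join(lines[start:end])
        for name, start, end in position_labels
        if name in wanted
    }


lines = [
    "Product name: XX annuity insurance",
    "Design type: traditional",
    "Article 5: survival annuity",
    "The company pays the survival annuity annually.",
    "Appendix: contact information",
]
position_labels = [
    ("basic information", 0, 2),
    ("responsibility", 2, 4),
    ("appendix", 4, 5),
]
print(intercept_clause_content(lines, position_labels, {"basic information", "responsibility"}))
```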
Step S33, labeling the target insurance clause content by using a pre-trained label labeling model to obtain a label labeling result, wherein the label labeling result comprises a target labeling label and corresponding original information thereof, and the original information comprises responsibility information to be analyzed;
After capturing the target insurance clause content from the insurance clause document to be analyzed, labeling the target insurance clause content by utilizing a pre-trained label labeling model to obtain a label labeling result, wherein the label labeling result comprises a target labeling label and original information corresponding to the target labeling label, the original information is a value corresponding to the target labeling label, the original information comprises responsibility information to be analyzed, and the responsibility information to be analyzed comprises responsibility paying condition description and responsibility paying formula description. The training process of the label marking model can refer to the fifth embodiment described below.
As one of the label labeling modes, a label labeling model which can be used for identifying and labeling various labels can be trained in advance, and then, the content of the target insurance clause is input into the label labeling model, so that a label labeling result can be obtained.
As another label labeling mode, a plurality of label labeling models can be trained in advance, each for a different type of insurance clause content. The target insurance clause content of each type is then input into the corresponding label labeling model, according to the type to which it belongs, to obtain the label labeling results. Compared with the former mode, this mode allows the target insurance clause contents of the different types to be processed in parallel, which improves processing efficiency; and because a label labeling model is trained specifically for each type of insurance clause content, the processing is more targeted and the accuracy of the processing results can be improved.
Here, it should be noted that the target labeling labels may be classified into 17 major categories, such as insurance clause basic information, responsibility basic information, detailed analysis of responsibility, proportion coefficient, limit class labels, other labels, annuity class labels, universal or investment-linked labels, dividend product labels, disease class labels, medical fee class labels, medical allowance class labels, disability class labels, persistency bonus labels, endowment guarantee commission labels, etc., and each major category includes specific refinement labels. For example: the insurance clause basic information may include 5 refinement labels (file name, insurance company full name, product name, design type, and page number and position of the responsibility); the responsibility basic information may include 4 refinement labels (responsibility name, responsibility layer one, responsibility layer two, responsibility clause summary); the detailed analysis of responsibility may include 5 refinement labels (waiting period, responsibility statement, responsibility payment formula, description associated with responsibilities of the present contract, description associated with responsibilities of other contracts); the limit class labels may include 6 refinement labels (payment form, payment count limit, payment limit, same time interval, payment limit description, payment amount limit); the other labels may include 5 refinement labels (selectable liability core words, policy loan proportion, common word complex case, guarantee duration, no claim offer); the annuity class labels may include 5 refinement labels (initial receiving age, ending receiving age, annuity conversion option, annuity entry into universal account, guaranteed annuity payment); the universal or investment-linked labels may include 5 refinement labels (initial charge, risk premium, policy management fee, buy-sell spread, minimum guaranteed interest rate); the dividend product labels may include 3 refinement labels (dividend information location, dividend mode, dividend usage); the disease class labels may include 2 refinement labels (disease class, disease group); the medical class labels may include 1 refinement label (number of days of continued responsibility); the medical fee class labels may include 1 refinement label (claim-free amount); the medical allowance class labels may include 4 refinement labels, such as claim-free period, daily payment base and apply score; the disability class labels may include 2 refinement labels (repeated disability, change of disability state); and the persistency bonus labels may include 2 refinement labels (persistency bonus condition description, persistency bonus payment description). Of course, it will be appreciated that further sub-labels may be provided under each refinement label as appropriate. The labeling labels are set based on the identification processing and statistical analysis of prior insurance clause information, and can cover the various kinds of important information in insurance clauses.
For example (denoted as example 1), for the responsibility clause "From the fourth policy year of this contract, if the insured survives to the policy anniversary of this contract, the company pays the survival annuity annually as specified below: survival annuity = basic insurance amount × payment period (years) × 1% × [1 + 3% × (number of policy years this contract has reached − 3)]", the responsibility payment condition is described as "from the fourth policy year of this contract, if the insured survives to the policy anniversary of this contract", and the responsibility payment formula is described as "the company pays the survival annuity annually as specified below: survival annuity = basic insurance amount × payment period (years) × 1% × [1 + 3% × (number of policy years this contract has reached − 3)]".
For another example (denoted as example 2), for the responsibility clause "If the insured survives to the policy anniversary at the age of seventy-five, the company pays the maturity insurance benefit as specified below, and this contract terminates. Maturity insurance benefit = basic insurance amount × payment period (years)", the responsibility payment condition is described as "the insured survives to the policy anniversary at the age of seventy-five", and the responsibility payment formula is described as "the company pays the maturity insurance benefit as specified below, and this contract terminates. Maturity insurance benefit = basic insurance amount × payment period (years)".
Step S34, structuring the original information to obtain target information corresponding to each target labeling label, wherein the target information comprises an insurance responsibility analysis formula, and the insurance responsibility analysis formula is obtained by structuring the responsibility information to be analyzed.
Finally, the original information is structured, that is, represented in a uniform format, to obtain the target information corresponding to each target labeling label. The insurance clause document to be analyzed is thus output in the structured form of target labeling labels plus target information, which facilitates the subsequent analysis of insurance responsibility. The target information comprises the insurance responsibility analysis formula, which is obtained by structuring the responsibility information to be analyzed.
The structuring processing can be performed based on a pre-constructed corpus. The preset corpus specifies standardized expressions for the values corresponding to all labeling labels, including unified expression of units, unified expression of characters, standardized expression of words, and formula expression. For example, period units are unified as d, m and y, representing day, month and year respectively. Unified expression of characters may include consistent expression of Chinese and English characters. Formula symbols are unified as max, min, +, -, / and the like. Chinese and English terms are unified as the corresponding Chinese full names or English abbreviations, and words are given unified standardized expressions. Formula expression covers, on the one hand, expressing formulas with formula symbols and words and, on the other hand, expressing the various types of formulas. For instance, the text "the later of the fifth policy anniversary and the policy anniversary at age sixty-five" can be expressed by the formula "max(insurance age+5, 65)", and "from the first policy anniversary" can be expressed as "insurance age+1". As another example, the responsibility payment formula can be obtained by analyzing and converting sentences according to the unified formula symbols. If the responsibility payment text reads "the company pays the death benefit as the balance of the accidental injury insurance amount agreed in this contract after deducting the disability benefits already paid, and this contract terminates", the responsibility payment formula is converted into "accidental injury insurance amount - disability benefits paid".
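The unit unification and the formula expression described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the English unit words in `UNIT_MAP` and the function names are assumptions introduced here; only the targets d, m, y and the formula "max(insurance age+5, 65)" come from the text.

```python
import re

# Hypothetical lookup table: the text only fixes the targets d, m and y.
UNIT_MAP = {
    "day": "d", "days": "d",
    "month": "m", "months": "m",
    "year": "y", "years": "y",
}

# Longest unit words first so that "days" is tried before "day".
_UNIT_PATTERN = re.compile(
    r"(\d+)\s*(" + "|".join(sorted(UNIT_MAP, key=len, reverse=True)) + r")\b",
    flags=re.IGNORECASE,
)


def normalize_period_units(text: str) -> str:
    """Rewrite '<number> <unit word>' as '<number><d|m|y>'."""
    return _UNIT_PATTERN.sub(
        lambda match: match.group(1) + UNIT_MAP[match.group(2).lower()], text
    )


def later_anniversary_age(insurance_age: int) -> int:
    """Evaluate the standardized formula max(insurance age+5, 65)."""
    return max(insurance_age + 5, 65)
```

For instance, `normalize_period_units("waiting period of 90 days")` yields a string ending in `90d`, and `later_anniversary_age(62)` evaluates the "later of the two anniversaries" rule for an insured who took out the policy at age 62.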
For example, in Example 1 above, a preliminary insurance responsibility payment formula can be obtained from the descriptions of the responsibility payment condition and the responsibility payment formula: "basic insurance amount × payment period (years) × 1% × [1 + 3% × (number of policy years elapsed under this contract - 3)]". The terms are then standardized to obtain the standard insurance responsibility payment formula: "basic insurance amount × payment period × 1% × [1 + 3% × (policy years - 3)]".
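As a worked illustration of Example 1, the sketch below evaluates the standardized formula, read here as basic insurance amount × payment period × 1% × [1 + 3% × (policy years - 3)]. The function name, the parameter names, and the treatment of "from the fourth policy anniversary" as `policy_years >= 4` are assumptions made for this sketch.

```python
from decimal import Decimal


def pension_annuity(basic_amount: int, payment_years: int, policy_years: int) -> Decimal:
    """Annual pension annuity under the Example 1 formula (illustrative reading)."""
    # Assumed payment condition: nothing is payable before the fourth policy year.
    if policy_years < 4:
        return Decimal("0")
    # Escalation term: 1 + 3% x (policy years - 3).
    escalation = Decimal("1") + Decimal("0.03") * (policy_years - 3)
    return Decimal(basic_amount) * payment_years * Decimal("0.01") * escalation
```

With a basic insurance amount of 100000 and a 10-year payment period, the fourth policy year gives 100000 × 10 × 1% × 1.03, that is 10300.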
For example, in Example 2 above, a preliminary insurance responsibility payment formula can be obtained from the descriptions of the responsibility payment condition and the responsibility payment formula: "basic insurance amount × payment period (years)". The terms are then standardized to obtain the standard insurance responsibility payment formula: "basic insurance amount × payment period".
In this embodiment of the invention, unstructured insurance clause information is automatically structured by means of labels to obtain the corresponding target labeling labels and target information, so that insurance clauses are labeled and structured. Compared with manual processing, this can greatly improve processing efficiency and accuracy and facilitates the subsequent analysis of insurance responsibility.
Further, based on the fifth embodiment described above, a sixth embodiment of the insurance responsibility analysis method of the present invention is proposed.
In this embodiment, before the step S33, the insurance responsibility analysis method further includes:
Step E, obtaining a first training sample set, wherein the first training sample set comprises insurance clause content samples together with their real labeling labels and real information, and the real labeling labels and real information are obtained by labeling based on a preset corpus;
In this embodiment, a process for obtaining the label labeling model is provided, which is specifically as follows:
First, a first training sample set is obtained. The first training sample set comprises insurance clause content samples together with their real labeling labels and real information, which are obtained by labeling based on a preset corpus. A real labeling label is obtained by manually judging, against the preset labels in the preset corpus, which label the text in an insurance clause content sample belongs to, and then labeling it accordingly. The real information is the value corresponding to the real labeling label in the insurance clause content sample. For the construction of the preset corpus, refer to the seventh embodiment described below.
Step F, training a preset label labeling model with the first training sample set to obtain a trained label labeling model.
The preset label labeling model is then trained with the first training sample set to obtain the trained label labeling model. There may be one or more preset label labeling models. When there is one, the first training sample set is input directly into that model for iterative training. When there are several, the insurance clause content samples can be input by type into the respective preset label labeling models for iterative training, yielding label labeling models for labeling different types of insurance clause content.
Specifically, the preset label labeling model includes an information extraction layer and a classification layer, and step F includes:
Step F1, inputting the insurance clause content sample into the information extraction layer for information extraction to obtain characteristic information corresponding to each insurance clause content sample;
Step F2, converting the characteristic information into a characteristic vector, inputting the characteristic vector into the classification layer to obtain a prediction labeling label, and determining corresponding prediction information according to the prediction labeling label and the characteristic information;
Step F3, calculating to obtain a loss value according to the prediction labeling label, the prediction information, the real labeling label of the insurance clause content sample and the real information thereof;
Step F4, updating the parameters of the preset label labeling model through a gradient descent algorithm according to the loss value, and performing iterative training based on the first training sample set to obtain a trained label labeling model.
In this embodiment, the label labeling model includes an information extraction layer and a classification layer. The information extraction layer extracts feature information, and the classification layer classifies the feature information to determine the label corresponding to each word or each sentence. The training process of the label labeling model is as follows:
First, the insurance clause content samples are input into the information extraction layer for information extraction, obtaining feature information corresponding to each insurance clause content sample. The feature information may include, but is not limited to, part-of-speech information (subject-predicate information), entity information (entity names), position information (position within the whole sentence), word vectors, sentence vectors and the like. Each kind of feature can be extracted by a corresponding module, and the information extraction layer can be constructed according to actual needs.
Then, the feature information is converted into a feature vector and input into the classification layer to obtain a preliminary labeling label, recorded as the prediction labeling label. The corresponding prediction information, that is, the predicted value corresponding to the prediction labeling label, is determined according to the prediction labeling label and the feature information. The prediction information can be determined by matching the value features recorded in the preset corpus against the extracted feature information.
After the prediction labeling label and the corresponding prediction information are obtained, the loss value is calculated according to the prediction labeling label, the prediction information, and the real labeling label and real information of the insurance clause content sample. Specifically, a first loss value can be calculated from the prediction labeling label and the real labeling label, a second loss value can be calculated from the prediction information and the real information, and the two are summed to obtain the final loss value. The first loss value and the second loss value may be calculated using a loss function such as mean square error (MSE), mean absolute error (MAE), multi-class SVM loss (hinge loss), or cross-entropy loss.
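The summed loss can be sketched as below. The choice of cross-entropy for the label term and MSE for the information term is only one of the combinations the text permits, and encoding the prediction information as a numeric vector is an assumption of this sketch.

```python
import math


def cross_entropy(label_probs, true_index):
    """First loss: negative log-probability assigned to the real labeling label."""
    return -math.log(label_probs[true_index])


def mean_squared_error(predicted, real):
    """Second loss: MSE between predicted and real information (numeric encoding assumed)."""
    return sum((p - r) ** 2 for p, r in zip(predicted, real)) / len(real)


def total_loss(label_probs, true_index, predicted_info, real_info):
    """Final loss value = first loss value + second loss value."""
    first = cross_entropy(label_probs, true_index)
    second = mean_squared_error(predicted_info, real_info)
    return first + second
```

For a sample whose real label receives probability 0.5 and whose information vector is off by 2 in one of two components, the total is ln 2 + 2.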
Finally, the parameters of the preset label labeling model are updated through a gradient descent algorithm according to the loss value, and iterative training is performed based on the first training sample set. That is, the gradients of the nodes in each layer of the label labeling model are computed from the total loss, the weight parameters of each node are updated accordingly, and steps F1 to F3 are repeated until the network converges, meaning that the total loss has steadily decreased into a small range (below a preset threshold or at a minimum). The trained label labeling model is obtained at that point. Gradient descent can handle the optimization of large-scale sample data; for the specific algorithm, refer to the prior art, which is not repeated here.
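The iterate-until-threshold loop can be illustrated with a deliberately small stand-in model. The sketch below trains a single-layer logistic classifier with batch gradient descent and stops once the average loss falls below a preset threshold; the real label labeling model, its layers and its hyperparameters are not specified at this level in the text, so everything named here is hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, labels, learning_rate=0.5, threshold=0.1, max_epochs=5000):
    """Batch gradient descent on a logistic classifier; stops when loss < threshold."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    loss = float("inf")
    count = len(samples)
    for _ in range(max_epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        loss = 0.0
        for x, y in zip(samples, labels):
            # Forward pass (analogue of steps F1 and F2).
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            # Loss accumulation (analogue of step F3).
            loss -= y * math.log(p) + (1 - y) * math.log(1 - p)
            for i, xi in enumerate(x):
                grad_w[i] += (p - y) * xi
            grad_b += p - y
        loss /= count
        if loss < threshold:  # convergence criterion: loss below preset threshold
            break
        # Parameter update (analogue of step F4).
        weights = [w - learning_rate * g / count for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / count
    return weights, bias, loss
```

A usage example with four one-dimensional feature vectors: `train([[-1.5], [-0.5], [0.5], [1.5]], [0, 0, 1, 1])` returns the learned weight, bias and the final loss once it has dropped below the threshold.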
In this embodiment, a label labeling model is trained in advance to label the labels and values (original information) contained in various types of insurance clause content. The original information is then structured to obtain the target labeling labels and target information corresponding to the insurance clause document to be analyzed, so that insurance clauses are labeled and structured. Compared with manual processing, this can greatly improve processing efficiency and accuracy and facilitates the subsequent analysis of insurance responsibility.
Further, based on the sixth embodiment, a seventh embodiment of the insurance responsibility analysis method of the present invention is provided.
In this embodiment, before step E, the insurance responsibility analysis method further includes:
Step G, acquiring insurance clause sample documents, and classifying the insurance clause sample documents according to product name;
In this embodiment, a process for obtaining the preset corpus is provided, which is specifically as follows:
First, insurance clause sample documents are acquired and classified according to product name. When classifying, the product name of each insurance clause sample document can be acquired first, and the documents are then grouped by product name so that documents of the same product fall into one class.
Step H, performing cluster analysis on each part of the insurance clause content of the classified insurance clause sample documents to obtain a clustering result;
Then, cluster analysis is performed on each part of the insurance clause content of the classified insurance clause sample documents to obtain a clustering result. The cluster analysis may use a partition-based clustering algorithm (e.g., k-means), a hierarchy-based clustering algorithm (e.g., Chameleon, AGNES (AGglomerative NESting, a bottom-up agglomerative algorithm), or CURE (Clustering Using REpresentatives)), a density-based clustering algorithm, a grid-based clustering algorithm, a model-based clustering algorithm, a fuzzy clustering algorithm, and so on. To facilitate clustering, feature information such as word vectors or sentence vectors can first be extracted from each part of the insurance clause content of each product, and the cluster analysis is then performed on the extracted feature information to obtain the clustering result.
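As an illustration of the partition-based option, the following is a minimal k-means sketch over already-extracted feature vectors. Seeding the centroids with the first k points is a simplification chosen here for determinism; the text does not prescribe an initialization, a distance measure, or a vectorization method.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Plain k-means: returns (cluster index per point, centroids)."""
    centroids = [list(p) for p in points[:k]]  # simplification: first k points as seeds
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignments, centroids
```

In practice the points would be the word or sentence vectors of clause passages, and the resulting cluster indices would be handed to the labeling step that follows.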
Step I, labeling the various insurance clause contents according to the clustering result to obtain preset labels, and performing statistical analysis on the values corresponding to the preset labels in the various insurance clause contents to obtain value features;
After the clustering result is obtained, the various insurance clause contents are labeled according to the clustering result to obtain preset labels. Labeling methods include, but are not limited to: 1) manual labeling, in which the clustering result and the insurance clause contents are sent to a work terminal so that staff can set the corresponding labels according to the clustering result; 2) machine labeling, in which the clustering result is matched against preset labels of multiple types and the labels selected according to the matching result serve as the preset labels; and 3) a combination of manual labeling and machine labeling. While the labels are being assigned, statistical analysis is performed on the values corresponding to the preset labels in the various insurance clause contents to obtain the value features.
For example, the refinement labels for the upper limit of the payment period may include 8 kinds: the upper limit of payment days for a single payment period, the upper limit of payment days for a single policy, the upper limit of payment days for the contract period, the upper limit of payment days for each policy period, the upper limit of payment days across multiple insurance costs for a single policy, the upper limit of payment days across multiple insurance costs for the contract period, the upper limit of payment days for the same cause within the contract period, and the upper limit of payment days for the same cause within a single policy period. Their values are all characterized as a combination of digits and a letter, where the letter denotes year, month or day.
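A value feature of this kind ("digits plus a letter for year, month or day") lends itself to a simple pattern check. The sketch below assumes the letters are the abbreviations d, m and y; the pattern and the function name are illustrative, not taken from the patent.

```python
import re

# Assumed encoding of the value feature: one or more digits followed by d, m or y.
PERIOD_LIMIT_PATTERN = re.compile(r"\d+[dmy]")


def matches_period_limit_feature(value: str) -> bool:
    """Return True if the value looks like '180d', '6m' or '2y'."""
    return PERIOD_LIMIT_PATTERN.fullmatch(value) is not None
```

Such a check could be used when matching extracted text against the value features stored in the corpus, for example to reject a candidate value such as `"180"` that lacks a unit letter.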
As an implementation mode, label labeling can be directly carried out on various insurance clause contents according to the clustering result, and labels of all types of products are obtained and used as preset labels.
Further, the step of labeling various insurance clause contents according to the clustering result to obtain a preset label includes:
Step I1, labeling various insurance clause contents according to the clustering result to obtain an initial label;
Step I2, counting the initial labels according to the product types corresponding to the initial labels, and clustering and de-duplicating the initial labels according to the counting result to obtain the preset labels.
In another implementation, the various insurance clause contents are first labeled according to the clustering result to obtain initial labels, which are the set of labels for every part of the insurance clause content of every product type. The initial labels are then counted by their corresponding product types, that is, the labels of different product types are counted separately to obtain the labels of each type of product. Finally, the initial labels are clustered and de-duplicated according to the counting result so that labels with the same meaning but different names across product types are merged, avoiding multiple expressions for the same thing, and the final preset labels are obtained. Preset labels obtained in this way are more refined and free of duplicates and near-duplicates.
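The count-then-merge idea can be sketched as follows. Note that a hand-written synonym table stands in here for the clustering step that the text uses to detect labels with the same meaning; the table, the label names and the function name are all hypothetical.

```python
from collections import Counter


def build_preset_labels(initial_labels, synonyms):
    """Count labels per product type, then merge same-meaning labels.

    initial_labels: list of (product_type, label) pairs.
    synonyms: hypothetical map from a variant label name to its canonical name.
    Returns (per-product-type counts, sorted de-duplicated preset labels).
    """
    counts = {}
    for product_type, label in initial_labels:
        counts.setdefault(product_type, Counter())[label] += 1
    merged = {
        synonyms.get(label, label)
        for per_type in counts.values()
        for label in per_type
    }
    return counts, sorted(merged)
```

For example, if one product type uses "deductible" and another uses "claim-free amount" for the same concept, mapping the latter to the former leaves a single preset label.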
Step J, constructing the preset corpus according to the preset labels and the value features.
After the preset labels and the corresponding value features are obtained, the preset corpus is constructed from them.
In this embodiment, a corpus, that is, a multi-dimensional label index system of insurance clause features, is constructed by counting and refining a large number of insurance clauses. The labels and information of insurance clauses can then be labeled on this basis, so that insurance clauses are labeled and structured, which facilitates the subsequent analysis of insurance responsibility.
Further, based on the fifth to seventh embodiments described above, an eighth embodiment of the insurance responsibility analysis method of the present invention is proposed.
In this embodiment, before step S33, the method further includes:
Step K, detecting whether a table exists in the target insurance clause content;
In this embodiment, because tables are unfavorable for the subsequent label labeling processing, whether a table exists in the target insurance clause content can be detected before the target insurance clause content is input into the label labeling model.
Step L, if a table exists, acquiring the row and column information and the dimension of the table in the target insurance clause content;
Step M1, if the table is one-dimensional, connecting the row and column information according to a first preset expression to obtain the processed target insurance clause content;
Step M2, if the table is multidimensional, connecting the row and column information according to a second preset expression to obtain the processed target insurance clause content;
If a table exists in the target insurance clause content, the row and column information and the dimension of the table are acquired, where the row and column information is the information shown in each column of each row.
If the table is one-dimensional, the row and column information is connected according to the first preset expression to obtain the processed table content, and the processed table together with the other content serves as the processed target insurance clause content. Taking a table with 2 fields as an example, the first preset expression may be: table identifier field 1 | [dimension fields corresponding to field 1] = field 2 | [dimension fields corresponding to field 2], where "|" only serves as a separator. The table identifier can be determined according to the type of the table and marks the expression as a table; for example, an assignment-type table may be identified as table and a display-type table as gentable. Fields 1 and 2 are the table fields in the first row or first column, and the dimension fields corresponding to field 1 or 2 are the values in the row or column of field 1 or 2. In addition, to further facilitate the subsequent label labeling processing, each field can be standardized according to a preset mapping between non-standard fields and standardized fields. The corresponding first preset expression (again with 2 fields) may then be: table identifier field 1 | standardized field 1 | [dimension fields corresponding to field 1] = field 2 | standardized field 2 | [dimension fields corresponding to field 2].
For example, the table shown in Table 1 below can be converted into: gentable insurance year (see parade two) | insurance year | [1st, 2nd, 3rd, 4th and later] = percentage of the basic insurance amount | scaling factor 1 | [10%, 20%, 40%, 100%].
Table 1 one-dimensional table
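The one-dimensional conversion can be sketched as a small string builder. The exact spacing, the comma-separated value lists and the `standard_names` mapping are assumptions of this sketch; only the overall shape "identifier field | [values] = field | [values]" follows the description.

```python
def flatten_one_dimensional(table_id, field1, values1, field2, values2,
                            standard_names=None):
    """Join a two-field, one-dimensional table into a single expression.

    standard_names: optional, hypothetical map from a raw field name to its
    standardized name; when present, the standardized name follows the raw one.
    """
    standard_names = standard_names or {}

    def part(field, values):
        names = [field]
        if field in standard_names:
            names.append(standard_names[field])
        return " | ".join(names) + " | [" + ", ".join(values) + "]"

    return f"{table_id} {part(field1, values1)} = {part(field2, values2)}"
```

Called with the identifier `gentable`, a policy-year column and a percentage column, this produces an expression of the same shape as the Table 1 conversion.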
If the table is multidimensional, the row and column information is connected according to the second preset expression to obtain the processed table content, and the processed table together with the other content serves as the processed target insurance clause content.
Taking a table with 3 fields as an example, the second preset expression may be: table identifier field 1 | [dimension fields corresponding to field 1] # field 2 | [dimension fields corresponding to field 2] = field 3 | [values of field 3 under fields 1 and 2], where "#" only serves as a separator. The table identifier can be determined according to the type of the table and is used to mark the table; for example, an assignment-type table may be identified as table and a display-type table as gentable. Fields 1, 2 and 3 are the table fields in the first row and first column, the dimension fields corresponding to field 1 or 2 are the values in the row or column of field 1 or 2, and the values of field 3 are the values that field 3 takes under each combination of fields 1 and 2. In addition, to further facilitate the subsequent label labeling processing, each field can be standardized according to a preset mapping between non-standard fields and standardized fields. The corresponding second preset expression (again with 3 fields) may then be: table identifier field 1 | standardized field 1 | [dimension fields corresponding to field 1] # field 2 | standardized field 2 | [dimension fields corresponding to field 2] = field 3 | standardized field 3 | [values of field 3 under fields 1 and 2]. On this basis, the second preset expression can also express the dimension fields corresponding to each field in matrix form.
For example, the table shown in Table 2 below can be converted into: table insurance period | [10 years, 15 years] # payment period | [3 years, 5 years, 10 years] = annual survival benefit payment ratio | proportionality coefficient 1 | [9%, 14%, 18%, 10%, 15%, 25%]. It can also be expressed in matrix form as: insurance period × payment period = annual survival benefit payment ratio [9%, 14%, 18%, 10%, 15%, 25%].
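The multidimensional conversion can be sketched in the same way. Table 2 itself is not reproduced in the text, so the cell layout used in the usage note (rows by insurance period, flattened row by row) is an assumption, as are the function name and the paraphrased field names.

```python
def flatten_two_dimensional(table_id, row_field, row_values,
                            col_field, col_values, value_field, grid):
    """Join a two-dimensional table into 'f1 | [..] # f2 | [..] = f3 | [..]'.

    grid[i][j] is the value of value_field for row_values[i] and col_values[j];
    cells are emitted row by row (an assumed ordering).
    """
    if len(grid) != len(row_values) or any(len(r) != len(col_values) for r in grid):
        raise ValueError("grid shape does not match the dimension fields")
    cells = [cell for row in grid for cell in row]
    return (
        f"{table_id} {row_field} | [{', '.join(row_values)}] # "
        f"{col_field} | [{', '.join(col_values)}] = "
        f"{value_field} | [{', '.join(cells)}]"
    )
```

With rows `["10 years", "15 years"]`, columns `["3 years", "5 years", "10 years"]` and a grid of `[["9%", "14%", "18%"], ["10%", "15%", "25%"]]`, the function returns an expression whose value list reads 9%, 14%, 18%, 10%, 15%, 25%, matching the order given in the example.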
Step S33 includes:
Labeling the processed target insurance clause content with the pre-trained label labeling model to obtain a label labeling result.
After the processed target insurance clause content is obtained, it is labeled with the pre-trained label labeling model to obtain the label labeling result. For the specific implementation process, refer to the embodiments above, which are not repeated here.
In this embodiment, the table is subjected to dimension reduction and converted into the expression corresponding to its dimension, achieving a standardized expression of the table and facilitating subsequent processing.
Further, based on the above embodiments, a ninth embodiment of the insurance responsibility analysis method of the present invention is proposed.
In this embodiment, after the step S40, the insurance responsibility analysis method further includes:
Step N, obtaining the product name of the policy image to be identified from the target standard policy fields and their target policy field values, and determining a target display method according to the product name;
Step O, displaying the responsibility analysis result based on the target display method.
In this embodiment, after the insurance responsibility analysis result is obtained, the product name of the policy image to be identified is obtained from the target standard policy fields and their target policy field values, and the target display method is determined according to the product name. The target display method can be determined according to a preset mapping between product names and display methods, and the display methods may include, but are not limited to, hierarchical display, cash flow display and preset template display.
The responsibility analysis result is then displayed based on the target display method. For example, non-annuity products can be displayed in three layers: 1) basic responsibility display, in which payment phrases containing only the coverage responsibility data are generated; 2) display of the payment conditions and the payment responsibility analysis result, in which sentences containing the payment conditions, the coverage responsibility data and the payment limits are generated; and 3) retrieval from the database, in which the clause text of the responsibility is extracted. For annuity products, if the responsibility type is annuity, the responsibility analysis result is displayed as a cash flow. The cash flow starts on the policy effective date and ends on the policy termination date. The cash flow nodes and amounts can be filled in by querying the corresponding values from the target standard policy fields and their target policy field values, and from the target labeling labels and their target information.
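The display selection and the cash flow layout can be sketched as below. The product names in `DISPLAY_BY_PRODUCT`, the default display method, the yearly node spacing and the `amount_for_year` callback are all hypothetical; the text only states that a preset mapping exists and that the cash flow runs from the policy effective date to the policy termination date.

```python
from datetime import date

# Hypothetical preset mapping between product names and display methods.
DISPLAY_BY_PRODUCT = {
    "Example Annuity Plan": "cash_flow",
    "Example Accident Plan": "hierarchical",
}


def choose_display_method(product_name, default="preset_template"):
    return DISPLAY_BY_PRODUCT.get(product_name, default)


def anniversary(start, years):
    """Policy anniversary `years` after start; 29 February falls back to 28 February."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        return start.replace(year=start.year + years, day=28)


def annual_cash_flow(effective, termination, amount_for_year):
    """Yearly (date, amount) nodes from the effective date to the termination date."""
    nodes = []
    year = 0
    while anniversary(effective, year) <= termination:
        nodes.append((anniversary(effective, year), amount_for_year(year)))
        year += 1
    return nodes
```

In a fuller pipeline, `amount_for_year` would evaluate the standardized responsibility payment formula for each policy year using the values queried from the policy fields and the labeled target information.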
The present invention also provides a computer-readable storage medium having stored thereon an insurance liability analysis program which, when executed by a processor, implements the steps of the insurance liability analysis method according to any of the above embodiments.
The specific embodiments of the computer readable storage medium of the present invention are substantially the same as the embodiments of the insurance responsibility analysis method described above, and will not be described herein.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or system that comprises the element.
The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
From the above description of the embodiments, those skilled in the art will clearly understand that the methods of the above embodiments may be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the preferred implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium as described above (e.g., ROM/RAM, magnetic disk, optical disk) and comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to perform the methods of the embodiments of the present invention.
The foregoing describes only preferred embodiments of the present invention and does not limit the patent scope of the invention. Any equivalent structure or equivalent process transformation made using the contents of this specification, whether applied directly or indirectly in other related technical fields, likewise falls within the patent protection scope of the present invention.
Claims (9)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011101652.6A CN112270224B (en) | 2020-10-14 | 2020-10-14 | Insurance liability analysis method, device and computer-readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112270224A (en) | 2021-01-26 |
CN112270224B (en) | 2024-12-13 |
Family
ID=74338559
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011101652.6A Active CN112270224B (en) | 2020-10-14 | 2020-10-14 | Insurance liability analysis method, device and computer-readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112270224B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112906352A (en) * | 2021-03-06 | 2021-06-04 | 道和云科技(天津)有限公司 | Vehicle insurance electronic insurance policy text recognition and extraction method and system |
CN114330283A (en) * | 2021-11-09 | 2022-04-12 | 世纪保众(北京)网络科技有限公司 | Technical method for automatically filling guarantee responsibility abstract applied to insurance clause analysis |
CN116978041A (en) * | 2023-08-03 | 2023-10-31 | 中国银行股份有限公司 | Text positioning method, device, equipment and medium |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109035032A (en) * | 2018-06-11 | 2018-12-18 | 中国平安人寿保险股份有限公司 | Data structured processing method, device, computer equipment and storage medium |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101515272B (en) * | 2008-02-18 | 2012-10-24 | 株式会社理光 | Method and device for extracting webpage content |
CN107798299B (en) * | 2017-10-09 | 2020-02-07 | 平安科技(深圳)有限公司 | Bill information identification method, electronic device and readable storage medium |
KR102223226B1 (en) * | 2018-08-30 | 2021-03-05 | 주식회사 디레몬 | Apparatus and method for providing coverage information of insurance |
CN110287785A (en) * | 2019-05-20 | 2019-09-27 | 深圳壹账通智能科技有限公司 | Text structure information extracting method, server and storage medium |
Wu et al. | Automatic semantic knowledge extraction from electronic forms | |
CN112651725A (en) | Electronic invoice parsing method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||