CN117011877A

CN117011877A - Financial contract auditing method and device, electronic equipment and storage medium

Info

Publication number: CN117011877A
Application number: CN202311000116.0A
Authority: CN
Inventors: 苏沁宁; 詹乐
Original assignee: Ping An Bank Co Ltd
Current assignee: Ping An Bank Co Ltd
Priority date: 2023-08-09
Filing date: 2023-08-09
Publication date: 2023-11-07

Abstract

The application provides a financial contract review method and device, electronic equipment, and a storage medium. The method comprises: determining a pre-trained target segmentation model according to the template type corresponding to the financial contracts to be reviewed; inputting the to-be-signed financial contract and the signed financial contract, each in picture format, into the target segmentation model to output the text segmentation information corresponding to each contract; inputting the text segmentation screenshots of the to-be-signed financial contract and of the signed financial contract into a pre-trained optical character recognition model to output a text recognition result for each text segmentation screenshot; and performing text matching on the text recognition results of text segmentation screenshots that share the same text category between the to-be-signed financial contract and the signed financial contract, to determine whether the two contracts meet the contract signing standard.

Description

A method, device, electronic equipment and storage medium for auditing financial contracts

Technical field

This application relates to the field of data processing technology, and specifically to a financial contract review method, device, electronic equipment and storage medium.

Background

A contract is a record of the transactions that take place in every industry. Because contracts are used in so many scenarios, a large amount of contract text data is generated in the physical world. This volume of contract text lengthens contract review and, with it, the customer's waiting period. Most existing solutions in the industry run text recognition directly on the contract, then match text lines and compare them to confirm whether the contract has been modified.

However, when contract texts are compared in this way, lines are often skipped or misaligned, which skews the comparison results, slows the review, and increases the workload of manual secondary review. A contract review method that is more accurate and more efficient is therefore needed.

Summary of the invention

In view of this, the purpose of this application is to provide a financial contract review method, device, electronic equipment and storage medium to improve the efficiency and accuracy of contract review.

In a first aspect, this application provides a method for reviewing financial contracts. The method includes: determining a pre-trained target segmentation model according to the template type corresponding to the financial contracts to be reviewed, where the financial contracts to be reviewed include a to-be-signed financial contract drafted by one party to the contract and a signed financial contract returned by the other party; inputting the to-be-signed financial contract and the signed financial contract, each in picture format, into the target segmentation model to output the text segmentation information corresponding to each contract, where the text segmentation information includes at least text segmentation screenshots and the text category corresponding to each text segmentation screenshot; inputting the text segmentation screenshots of the to-be-signed financial contract and of the signed financial contract into a pre-trained optical character recognition model to output the text recognition result corresponding to each text segmentation screenshot; and performing text matching on the text recognition results of text segmentation screenshots that have the same text category in the to-be-signed financial contract and the signed financial contract, to determine whether the two contracts meet the contract signing standard.

Preferably, the text segmentation information further includes the vertex coordinates of each text segmentation screenshot. Before the step of performing text matching on the text recognition results of same-category text segmentation screenshots to determine whether the to-be-signed financial contract and the signed financial contract meet the contract signing standard, the method further includes: for each of the to-be-signed financial contract and the signed financial contract, sorting all text segmentation screenshots of that contract based on their vertex coordinates, and combining the text segmentation screenshot of the to-be-signed financial contract and the text segmentation screenshot of the signed financial contract that have the same serial number into one screenshot data group; for each screenshot data group, matching the text categories of the two text segmentation screenshots in the group; if the text categories match in every screenshot data group, performing the text matching step; and if there is a group whose text categories do not match, marking the two text segmentation screenshots in that group.

Preferably, the text recognition result includes multiple fields and the recognition box coordinates of each field. The step of performing text matching on the text recognition results of same-category text segmentation screenshots to determine whether the to-be-signed financial contract and the signed financial contract meet the contract signing standard specifically includes: for any text segmentation screenshot, arranging and concatenating the fields in its text recognition result according to their recognition box coordinates to obtain the text within that screenshot; matching, character by character, the texts of two text segmentation screenshots that have the same text category; and if the characters match in every group of same-category text segmentation screenshots, determining that the to-be-signed financial contract and the signed financial contract meet the contract signing standard, and otherwise determining that they do not.
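The field ordering and character-by-character matching described above can be sketched as follows. This is a minimal illustration and not the applicant's implementation: the `Field` class, the `line_tolerance` bucketing used to group fields into lines, and all function names are assumptions introduced here.

```python
from dataclasses import dataclass
from itertools import zip_longest


@dataclass
class Field:
    text: str
    x: float  # left edge of the recognition box (assumed representation)
    y: float  # top edge of the recognition box (assumed representation)


def join_fields(fields, line_tolerance=5.0):
    """Concatenate fields in reading order: top to bottom, then left to right.

    Fields whose y values fall in the same tolerance bucket are treated as
    one line; the bucketing rule is an assumption for this sketch.
    """
    ordered = sorted(fields, key=lambda f: (f.y // line_tolerance, f.x))
    return "".join(f.text for f in ordered)


def mismatched_positions(a, b):
    """Return the indices at which two texts differ, character by character."""
    return [i for i, (ca, cb) in enumerate(zip_longest(a, b)) if ca != cb]


def meets_signing_standard(pairs):
    """pairs: (unsigned_fields, signed_fields) for same-category screenshots."""
    return all(
        not mismatched_positions(join_fields(u), join_fields(s))
        for u, s in pairs
    )


draft = [Field("Party A: ", 0, 10), Field("Acme", 80, 10), Field("Amount: 100", 0, 30)]
signed = [Field("Amount: 180", 0, 31), Field("Acme", 80, 11), Field("Party A: ", 0, 12)]
diff = mismatched_positions(join_fields(draft), join_fields(signed))
```

Here `diff` holds the index of the altered digit in the amount, which is the kind of position that could later be mapped back to a recognition box for marking.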

Preferably, if the characters of the texts in a group of text segmentation screenshots with the same text category do not match completely, the two text segmentation screenshots are marked; and a financial contract review report is generated and output according to the marking status of all text segmentation screenshots.

Preferably, the target segmentation model includes a first feature extraction unit, a second feature extraction unit, a third feature extraction unit, a fourth feature extraction unit, a region generation network unit, a segmentation unit, a classification unit and a detection unit. The input of the first feature extraction unit serves as the input of the target segmentation model; the output of the first feature extraction unit is connected to the input of the second feature extraction unit; the output of the second feature extraction unit is connected to the input of the third feature extraction unit; the output of the third feature extraction unit is connected to the input of the region generation network unit; and the output of the region generation network unit is connected to the input of the fourth feature extraction unit and also to the input of the segmentation unit. The output of the segmentation unit serves as the first output of the target segmentation model and outputs the text segmentation screenshots. The output of the fourth feature extraction unit is connected to the input of the classification unit, whose output serves as the second output of the target segmentation model and outputs the text category corresponding to each text segmentation screenshot. The output of the fourth feature extraction unit is also connected to the input of the detection unit, whose output serves as the third output of the target segmentation model and outputs the vertex coordinates of each text segmentation screenshot.
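The wiring of the units described above can be traced with a small data-flow sketch. It only illustrates the connection order; every unit is a placeholder callable, and the names `build_segmentation_model` and `tag` are hypothetical helpers introduced here, not part of the application.

```python
def build_segmentation_model(f1, f2, f3, f4, rpn, seg_head, cls_head, det_head):
    """Compose the eight units in the order given in the description."""
    def forward(image):
        features = f3(f2(f1(image)))     # first -> second -> third feature extraction
        regions = rpn(features)          # region generation network unit
        region_features = f4(regions)    # fourth feature extraction unit
        return {
            "screenshots": seg_head(regions),          # first output
            "categories": cls_head(region_features),   # second output
            "vertices": det_head(region_features),     # third output
        }
    return forward


def tag(name):
    """Placeholder unit that records its name, so the data path is visible."""
    return lambda x: f"{x}>{name}"


model = build_segmentation_model(
    tag("f1"), tag("f2"), tag("f3"), tag("f4"),
    tag("rpn"), tag("seg"), tag("cls"), tag("det"),
)
out = model("img")
```

With these placeholders, each output string spells out the path the input took, which makes it easy to check that the segmentation branch leaves directly after the region generation unit while the classification and detection branches pass through the fourth feature extraction unit.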

Preferably, each of the first, second, third and fourth feature extraction units includes multiple attention sub-units connected in sequence. Each attention sub-unit includes multiple single-head attention blocks, and each single-head attention block includes multiple attention layers, pooling layers, direct connection layers and fully connected layers, where the number of attention layers differs between feature extraction units.

Preferably, the target segmentation model is trained in the following manner: for each training sample, the annotated training sample is input into the initial segmentation model to be trained, to determine a first loss value corresponding to the output of the region generation network unit, a second loss value corresponding to the output of the segmentation unit, a third loss value corresponding to the output of the classification unit, and a fourth loss value corresponding to the output of the detection unit; the sum of the first, second, third and fourth loss values is taken as the total loss value;

and the parameters of the initial segmentation model are adjusted based on the total loss value to generate the target segmentation model.
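Written as a formula, the total loss described above is the unweighted sum of the four component losses. The symbol names are notation chosen here for illustration; the application refers to them only as the first to fourth loss values.

```latex
L_{\text{total}} = L_{\text{rpn}} + L_{\text{seg}} + L_{\text{cls}} + L_{\text{det}}
```

Here $L_{\text{rpn}}$, $L_{\text{seg}}$, $L_{\text{cls}}$ and $L_{\text{det}}$ denote the losses on the outputs of the region generation network unit, the segmentation unit, the classification unit and the detection unit respectively.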

Preferably, for each screenshot data group, the text categories are matched in the following manner: determining whether the text categories of the two text segmentation screenshots in the screenshot data group are the same; if they are the same, determining that the screenshot data group matches; and if they are not the same, determining that the screenshot data group does not match.

Preferably, the marking status includes text mismatch and text category mismatch, and the financial contract review report is generated in the following manner: when there is a text category mismatch, the complete to-be-signed financial contract and the complete signed financial contract are added to the pages of the financial contract review report; when there is a text mismatch, the mismatched text segmentation screenshots of the to-be-signed financial contract and of the signed financial contract are each added to the pages of the financial contract review report.

Preferably, when a text category mismatch and a text mismatch exist at the same time, the complete to-be-signed financial contract and the complete signed financial contract are added to the pages of the financial contract review report, and a first screenshot frame is generated from the vertex coordinates of the text segmentation screenshots marked as text mismatches and overlaid on top of the to-be-signed financial contract and the signed financial contract, to generate the financial contract review report.
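The three reporting cases above can be sketched as a small decision function. This is an illustrative outline under assumed data structures: the `Mark` class, the page labels and the `frames` list are hypothetical stand-ins for the real page images and overlay frames.

```python
from dataclasses import dataclass


@dataclass
class Mark:
    kind: str             # "category_mismatch" or "text_mismatch"
    group: int            # index of the screenshot data group
    vertices: tuple = ()  # vertex coordinates of the flagged screenshot


def build_report(marks):
    """Decide what goes on the report pages from the collected marks."""
    kinds = {m.kind for m in marks}
    report = {"pages": [], "frames": []}
    if "category_mismatch" in kinds:
        # Category mismatch: include both complete contracts.
        report["pages"] = ["unsigned_contract_full", "signed_contract_full"]
        # If text mismatches also exist, frame those regions on top of the contracts.
        report["frames"] = [m.vertices for m in marks if m.kind == "text_mismatch"]
    else:
        # Text mismatch only: include just the mismatched screenshots from each contract.
        for m in marks:
            report["pages"] += [f"unsigned_crop_{m.group}", f"signed_crop_{m.group}"]
    return report


both = build_report([
    Mark("category_mismatch", 0),
    Mark("text_mismatch", 3, ((0, 0), (10, 5))),
])
```

In the combined case, `both` lists the two complete contracts as pages and carries one frame for the text-mismatch region.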

Preferably, for each group of text segmentation screenshots with the same text category, when mismatched characters exist between the two, the vertex coordinates of the corresponding character recognition boxes are determined and recorded.

Preferably, when there is a text mismatch, a second screenshot frame is generated from the vertex coordinates of the character recognition boxes and overlaid on top of the corresponding text segmentation screenshot, to generate the financial contract review report.

In a second aspect, this application provides a financial contract review device, which includes:

a pre-selection module, used to determine a pre-trained target segmentation model according to the template type corresponding to the financial contracts to be reviewed, where the financial contracts to be reviewed include a to-be-signed financial contract drafted by one party to the contract and a signed financial contract returned by the other party;

a segmentation module, used to input the to-be-signed financial contract and the signed financial contract, each in picture format, into the target segmentation model to output the text segmentation information corresponding to each contract, where the text segmentation information includes at least text segmentation screenshots and the text category corresponding to each text segmentation screenshot;

a recognition module, used to input the text segmentation screenshots of the to-be-signed financial contract and of the signed financial contract into a pre-trained optical character recognition model to output the text recognition result corresponding to each text segmentation screenshot;

a verification module, used to perform text matching on the text recognition results of text segmentation screenshots that have the same text category in the to-be-signed financial contract and the signed financial contract, to determine whether the two contracts meet the contract signing standard.

In a third aspect, this application also provides an electronic device, including a processor, a memory and a bus. The memory stores machine-readable instructions executable by the processor. When the electronic device is running, the processor and the memory communicate through the bus, and the machine-readable instructions, when executed by the processor, perform the steps of the financial contract review method described above.

In a fourth aspect, this application also provides a computer-readable storage medium storing a computer program that, when run by a processor, performs the steps of the financial contract review method described above.

This application provides a financial contract review method, device, electronic equipment and storage medium. The method includes: determining a pre-trained target segmentation model according to the template type corresponding to the financial contracts to be reviewed, where the financial contracts to be reviewed include a to-be-signed financial contract drafted by one party to the contract and a signed financial contract returned by the other party; inputting the to-be-signed financial contract and the signed financial contract, each in picture format, into the target segmentation model to output the text segmentation information corresponding to each contract, where the text segmentation information includes at least text segmentation screenshots and the text category corresponding to each text segmentation screenshot; inputting the text segmentation screenshots of the two contracts into a pre-trained optical character recognition model to output the text recognition result corresponding to each text segmentation screenshot; and performing text matching on the text recognition results of text segmentation screenshots that have the same text category in the two contracts, to determine whether the to-be-signed financial contract and the signed financial contract meet the contract signing standard. By first segmenting the contract into regions, then extracting text from each segmented region, and then matching text at the region level, the method reduces the probability of lines being mixed up during text recognition and improves the efficiency and accuracy of contract review.

To make the above objects, features and advantages of the present application more obvious and understandable, preferred embodiments are described in detail below with reference to the accompanying drawings.

Description of the drawings

To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in the embodiments are briefly introduced below. It should be understood that the following drawings show only some embodiments of the present application and should therefore not be regarded as limiting its scope. Those of ordinary skill in the art can obtain other related drawings from these drawings without creative effort.

Figure 1 is a flow chart of a financial contract review method provided by an embodiment of the present application;

Figure 2 is a flow chart of the steps for comparing text segmentation screenshots provided by an embodiment of the present application;

Figure 3 is a flow chart of the steps for comparing text recognition results provided by an embodiment of the present application;

Figure 4 is a schematic structural diagram of a financial contract review device provided by an embodiment of the present application;

Figure 5 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.

Detailed description

To make the purpose, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of this application, not all of them. The components of the embodiments of the present application generally described and illustrated in the figures herein may be arranged and designed in a variety of different configurations. Accordingly, the following detailed description of the embodiments provided in the drawings is not intended to limit the scope of the claimed application, but merely represents selected embodiments of the application. Every other embodiment obtained by those skilled in the art based on the embodiments of this application without creative effort falls within the scope of protection of this application.

First, the application scenarios to which this application applies are introduced. This application can be applied to reviewing contracts before and after signing in the course of a transaction.

A contract is a record of the transactions that take place in every industry. Because contracts are used in so many scenarios, a large amount of contract text data is generated in the physical world. This volume of contract text lengthens contract review and, with it, the customer's waiting period. Most existing solutions in the industry run text recognition directly on the contract, then match text lines and compare them to confirm whether the contract has been modified. However, when contract texts are compared in this way, lines are often skipped or misaligned, which skews the comparison results, slows the review, and increases the workload of manual secondary review. A contract review method that is more accurate and more efficient is therefore needed.

Based on this, the embodiments of this application provide a financial contract review method, device, electronic equipment and storage medium to improve the efficiency and accuracy of contract review.

Please refer to Figure 1, which is a flow chart of a financial contract review method provided by an embodiment of the present application. As shown in Figure 1, the financial contract review method provided by the embodiment of this application includes:

S101. Determine a pre-trained target segmentation model according to the template type corresponding to the financial contracts to be reviewed, where the financial contracts to be reviewed include a to-be-signed financial contract drafted by one party to the contract and a signed financial contract returned by the other party.

The financial contract here may be a payment agreement, transaction contract, insurance policy or the like generated during transactions and payments in financial fields such as wealth management, securities and banking. It may also be a transaction contract signed in another field. The to-be-signed financial contract is the contract that one party to the transaction drafts and passes to the other party for signature and confirmation. The signed financial contract is the contract that the other party, after receiving the to-be-signed financial contract, signs and returns to the first party. The contents of the to-be-signed financial contract and the signed financial contract should be completely consistent; to guard against the contract content having been modified, the two financial contracts need to be checked against each other. The financial contract may be in electronic or paper form, but it must be converted or scanned into picture form before processing.

It can be understood that, for different types of transactions, financial contracts usually follow fixed templates, and contracts of one type mostly share a similar format. A corresponding segmentation model can therefore be trained for financial contracts of each format. Accordingly, in step S101, before the financial contracts are reviewed, the corresponding model can be selected according to the template of the financial contract to ensure the accuracy of segmentation.
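One simple way to realise the model selection in step S101 is a lookup table keyed by template type. The sketch below is an assumption made for illustration: the template names, the model file names and the `select_segmentation_model` function are all hypothetical and do not come from the application.

```python
# Hypothetical registry: template type -> identifier of a trained segmentation model.
SEGMENTATION_MODELS = {
    "loan_agreement": "seg_loan_v1.pt",
    "insurance_policy": "seg_policy_v1.pt",
}


def select_segmentation_model(template_type, registry=SEGMENTATION_MODELS):
    """Return the model registered for a template type, or fail loudly."""
    try:
        return registry[template_type]
    except KeyError:
        raise ValueError(
            f"no segmentation model trained for template type {template_type!r}"
        ) from None


chosen = select_segmentation_model("loan_agreement")
```

Failing on an unknown template, rather than falling back to a default model, keeps a contract from being segmented by a model trained on a different layout.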

S102. Input the to-be-signed financial contract and the signed financial contract, each in picture format, into the target segmentation model to output the text segmentation information corresponding to each contract, where the text segmentation information includes at least text segmentation screenshots and the text category corresponding to each text segmentation screenshot.

The segmentation model here can segment the financial contract according to its section layout and text content. By way of example, the text categories may be title, Party A and Party B information, content column, list, terms, seal, and so on. The text categories can be determined according to the content of the financial contract.

Each text segmentation screenshot is the region cropped from the financial contract that corresponds to a text category.

S103. Input the text segmentation screenshots of the to-be-signed financial contract and of the signed financial contract into the pre-trained optical character recognition model to output the text recognition result corresponding to each text segmentation screenshot.

In step S103, an optical character recognition (OCR) model can be used to extract text from each text segmentation screenshot to obtain all the text in that screenshot.

S104. Perform text matching on the text recognition results of text segmentation screenshots that have the same text category in the to-be-signed financial contract and the signed financial contract, to determine whether the two contracts meet the contract signing standard.

Specifically, for each screenshot data group, the text categories are matched in the following manner: determine whether the text categories of the two text segmentation screenshots in the screenshot data group are the same. If they are the same, the screenshot data group matches; if they are not the same, the screenshot data group does not match.

By way of example, the first text recognition result, corresponding to the title extracted from the to-be-signed financial contract, and the second text recognition result, corresponding to the title extracted from the signed financial contract, are matched character by character to determine whether the titles of the two contracts are identical. In the same way, following the layout order of the financial contract, the text in the Party A and Party B information, content column, list, terms, seal and other regions is compared in turn. This completes the review of the financial contracts and determines whether the content has been changed between the to-be-signed financial contract and the signed financial contract. For the parts that do not match, the corresponding text segmentation screenshots can be collected into a single review report to facilitate secondary review, or they can be proofread manually.
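Steps S102 to S104 can be outlined as one review loop with the segmentation and OCR models passed in as callables. This is a rough sketch under stated assumptions: `review_contracts`, the `segment` and `ocr` interfaces and the returned dictionary are hypothetical, and the stand-in callables at the end only simulate real models.

```python
def review_contracts(unsigned_img, signed_img, segment, ocr):
    """Compare two contracts region by region.

    segment(image) -> list of (category, crop) in layout order (assumed interface)
    ocr(crop)      -> recognised text for one crop (assumed interface)
    """
    def recognise(image):
        texts = {}
        for category, crop in segment(image):
            texts.setdefault(category, []).append(ocr(crop))
        return texts

    unsigned, signed = recognise(unsigned_img), recognise(signed_img)
    # Walk the categories in the order they first appear in either contract.
    categories = list(dict.fromkeys([*unsigned, *signed]))
    flagged = [c for c in categories if unsigned.get(c) != signed.get(c)]
    return {"meets_standard": not flagged, "flagged_categories": flagged}


# Stand-ins for real models: each "image" is already a list of (category, text).
draft = [("title", "Loan Agreement"), ("terms", "Rate 5%")]
returned = [("title", "Loan Agreement"), ("terms", "Rate 6%")]
result = review_contracts(draft, returned, segment=lambda img: img, ocr=lambda crop: crop)
```

In this toy run only the terms region is flagged, so only that region would need to go into the review report for secondary review.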

本申请实施例提供的财资合同的审核方法，通过先将合同按区域进行分割，再对每个分割出的区域进行文字提取，进而在区域层面上进行文本的匹配，这样减小了文本识别过程中串行的概率，提高了合同审核的效率和准确性。The method for auditing financial contracts provided by the embodiment of this application first divides the contract into regions, then extracts text from each divided region, and then matches the text at the regional level, thus reducing the need for text recognition. The probability of serialization in the process improves the efficiency and accuracy of contract review.

Please refer to FIG. 2, which is a flow chart of the steps for comparing text segmentation screenshots provided by an embodiment of the present application. As shown in FIG. 2, the text segmentation information also includes the vertex coordinates of each text segmentation screenshot. Before the step of performing text matching on the text recognition results of text segmentation screenshots with the same text category between the financial contract to be signed and the signed financial contract, to determine whether the two contracts meet the contract signing standard, the text segmentation screenshots can also be compared as follows:

S201. For each of the financial contract to be signed and the signed financial contract, sort all text segmentation screenshots of that contract based on their vertex coordinates, and combine the text segmentation screenshot of the financial contract to be signed and the text segmentation screenshot of the signed financial contract that have the same sequence number into one screenshot data group.

It can be understood that the text segmentation screenshots are sorted according to their layout positions in the financial contract, for example in the order of title, Party A and Party B information, content column, list, terms and seal. For a given financial contract, each text category corresponds to at least one text segmentation screenshot. For example, the Party A and Party B information may fall within a single text segmentation screenshot, or the Party A information may fall within one text segmentation screenshot and the Party B information within another. In the latter case the two screenshots still have fixed layout positions, so their relative order is also fixed: the Party A information comes first, followed by the Party B information.

S202. For each screenshot data group, match the text categories of the two text segmentation screenshots in the group.

S203. If the text categories are matched successfully in every screenshot data group, perform the text matching step.

For example, given the sorted sequences of text segmentation screenshots of the financial contract to be signed and of the signed financial contract, the first text segmentation screenshot of the contract to be signed is matched by text category against the first text segmentation screenshot of the signed contract. If the two screenshots have the same text category, for example both are titles, the group is matched successfully. If all screenshot data groups are matched successfully, step S104 can be performed.

S204. If there is a group whose text categories do not match, mark the two text segmentation screenshots in that group.

The two text segmentation screenshots in a screenshot data group whose text categories do not match need to be marked. They can be marked as a text-category-mismatch screenshot data group to facilitate subsequent manual review.
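As a non-limiting illustration, steps S201 to S204 can be sketched as follows. The `Segment` type, the top-to-bottom then left-to-right sort key, and the assumption that both contracts yield the same number of screenshots are all hypothetical choices made for this sketch; the source only states that sorting is based on vertex coordinates.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    category: str              # text category output by the segmentation model
    top_left: Tuple[int, int]  # (x, y) of the top-left vertex, assumed layout


def sort_segments(segments: List[Segment]) -> List[Segment]:
    # Assumed reading order: top to bottom, then left to right.
    return sorted(segments, key=lambda s: (s.top_left[1], s.top_left[0]))


def pair_and_check(unsigned: List[Segment], signed: List[Segment]):
    # S201: screenshots with the same sequence number form one data group.
    groups = list(zip(sort_segments(unsigned), sort_segments(signed)))
    # S202/S204: collect the indices of groups whose categories differ.
    mismatched = [i for i, (a, b) in enumerate(groups) if a.category != b.category]
    return groups, mismatched
```

An empty `mismatched` list corresponds to S203 (proceed to text matching); otherwise the listed groups are the ones to mark for manual review.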

For example, the first text recognition result, corresponding to the title extracted from the financial contract to be signed, is matched character by character against the second text recognition result, corresponding to the title extracted from the signed financial contract, to determine whether the titles of the two contracts are identical. Likewise, the third text recognition result, corresponding to the Party A and Party B information extracted from the financial contract to be signed, is matched character by character against the fourth text recognition result, corresponding to the Party A and Party B information extracted from the signed financial contract, and so on.

FIG. 3 is a flow chart of the steps for comparing text recognition results provided by an embodiment of the present application. In one embodiment of the present application, the text recognition result includes multiple fields and the recognition box coordinates of each field. The step of performing text matching on the text recognition results of text segmentation screenshots with the same text category between the financial contract to be signed and the signed financial contract, to determine whether the two contracts meet the contract signing standard, specifically includes:

S301. For any text segmentation screenshot, arrange the fields in its text recognition result according to their recognition box coordinates and concatenate them to obtain the text within that screenshot.

The multiple fields recognized in any text segmentation screenshot must first be spliced together in the order in which they appear in the screenshot, to obtain a complete text paragraph.

S302. Match the texts of two text segmentation screenshots with the same text category character by character.

S303. If all characters match between the texts of every pair of text segmentation screenshots with the same text category, determine that the financial contract to be signed and the signed financial contract meet the contract signing standard; otherwise they do not meet it.

Here the character at each position must be compared. If all characters are identical, the contract signing standard is met, which indicates that the contract content was not modified between the financial contract to be signed and the signed financial contract.

S304. If the characters do not match completely between the texts of a pair of text segmentation screenshots with the same text category, mark those two text segmentation screenshots.
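Steps S301 to S304 can be illustrated with the following minimal sketch. The box layout `(x_min, y_min, x_max, y_max)`, the `line_tolerance` used to decide that two fields sit on the same line, and the function names are assumptions introduced for illustration; the source only states that fields are ordered by their recognition box coordinates and then compared character by character.

```python
from itertools import zip_longest
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), assumed layout
Field = Tuple[str, Box]          # one recognized field and its recognition box


def join_fields(fields: List[Field], line_tolerance: int = 10) -> str:
    """S301: order fields by box coordinates and concatenate them."""
    ordered = sorted(fields, key=lambda f: (f[1][1], f[1][0]))
    lines: List[Tuple[int, List[Tuple[int, str]]]] = []
    for text, box in ordered:
        if lines and abs(box[1] - lines[-1][0]) <= line_tolerance:
            lines[-1][1].append((box[0], text))       # same visual line
        else:
            lines.append((box[1], [(box[0], text)]))  # start a new line
    return "".join(t for _, items in lines for _, t in sorted(items))


def mismatch_positions(a: str, b: str) -> List[int]:
    """S302: positions at which the two texts differ (length gaps included)."""
    return [i for i, (x, y) in enumerate(zip_longest(a, b)) if x != y]


def audit(pairs: Dict[str, Tuple[List[Field], List[Field]]]):
    """S303/S304: pass only if every same-category pair matches exactly."""
    flagged: Dict[str, List[int]] = {}
    for category, (unsigned, signed) in pairs.items():
        positions = mismatch_positions(join_fields(unsigned), join_fields(signed))
        if positions:
            flagged[category] = positions  # mark this pair for the report
    return not flagged, flagged
```

Grouping fields into lines with a tolerance, rather than sorting strictly by the y coordinate, keeps two fields on the same line in left-to-right order even when their boxes differ by a few pixels vertically.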

Further, a financial contract audit report can be generated and output according to the marks on all text segmentation screenshots.

All marked text segmentation screenshots can be collected into a financial contract audit report. Staff can then clearly determine whether anything was modified between the two financial contracts, carry out a secondary review, and discover problems in time to avoid losses.

In one embodiment of the present application, the financial contract audit report can be generated in the following ways:

First, the marks include text mismatch and text category mismatch, and the financial contract audit report is generated as follows: when text categories do not match, the financial contract to be signed and the signed financial contract are each added in full to the page of the financial contract audit report; when text does not match, the mismatched text segmentation screenshots from the financial contract to be signed and from the signed financial contract are each added to the page of the financial contract audit report.

When a text category mismatch and a text mismatch exist at the same time, the financial contract to be signed and the signed financial contract are each added in full to the page of the financial contract audit report, and a first screenshot frame is generated from the vertex coordinates of each text segmentation screenshot marked as a text mismatch and overlaid on the financial contract to be signed and the signed financial contract, to generate the financial contract audit report.

In addition to the financial contracts or screenshots, the report should at least indicate the specific errors, for example which text category failed to match or which characters failed to match.
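The report schemes above amount to a small decision rule, sketched below. The function name `plan_report`, the string placeholders standing in for images and frames, and the dictionary layout are hypothetical; an actual report would embed the contract images and draw the frames from the recorded vertex coordinates.

```python
from typing import Dict, List


def plan_report(category_mismatch_groups: List[int],
                text_mismatch_categories: List[str]) -> Dict[str, List[str]]:
    """Decide what goes on the report page for the marks collected so far."""
    report: Dict[str, List[str]] = {"pages": [], "frames": [], "errors": []}
    if category_mismatch_groups:
        # Category mismatch: show both contracts in full.
        report["pages"] = ["unsigned_contract", "signed_contract"]
        # Both kinds of mismatch: overlay a first screenshot frame
        # for every segment marked as a text mismatch.
        report["frames"] = [f"first_frame:{c}" for c in text_mismatch_categories]
    elif text_mismatch_categories:
        # Text mismatch only: show just the mismatched segments of each contract.
        report["pages"] = [f"{side}:{c}" for c in text_mismatch_categories
                           for side in ("unsigned", "signed")]
    # The report also lists the specific errors.
    report["errors"] = (
        [f"text category mismatch in group {g}" for g in category_mismatch_groups]
        + [f"text mismatch in category '{c}'" for c in text_mismatch_categories]
    )
    return report
```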

Further, for each pair of text segmentation screenshots with the same text category, when mismatched characters exist between the two, the vertex coordinates of the corresponding character recognition box are determined and recorded.

When text does not match, a second screenshot frame is generated from the vertex coordinates of the character recognition box and overlaid on the corresponding text segmentation screenshot, to generate the financial contract audit report.

In this way, when an auditor receives the financial contract audit report for secondary verification, the points where matching failed are visible at a glance, which improves the overall efficiency of the audit.

In one embodiment of the present application, the segmentation model corresponding to a template type is constructed as follows:

First, the initial segmentation model is selected. Here the Swin backbone network is used as the prototype, on which the segmentation, detection and classification branches are built.

The target segmentation model includes a first feature extraction unit, a second feature extraction unit, a third feature extraction unit, a fourth feature extraction unit, a region generation network unit, a segmentation unit, a classification unit and a detection unit. The input of the first feature extraction unit serves as the input of the target segmentation model. The output of the first feature extraction unit is connected to the input of the second feature extraction unit, the output of the second feature extraction unit is connected to the input of the third feature extraction unit, and the output of the third feature extraction unit is connected to the input of the region generation network unit. The output of the region generation network unit is connected to the input of the fourth feature extraction unit and also to the input of the segmentation unit. The output of the segmentation unit serves as the first output of the target segmentation model and outputs the text segmentation screenshots. The output of the fourth feature extraction unit is connected to the input of the classification unit, whose output serves as the second output of the target segmentation model and outputs the text category of each text segmentation screenshot. The output of the fourth feature extraction unit is also connected to the input of the detection unit, whose output serves as the third output of the target segmentation model and outputs the vertex coordinates of each text segmentation screenshot.
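The connections described above can be summarized as a data-flow sketch. Each unit is represented by an arbitrary callable, and the dictionary keys (`feat1`, `rpn`, and so on) are hypothetical names chosen for this illustration; the sketch shows only the wiring, not the layers inside each unit.

```python
from typing import Any, Callable, Dict, Tuple


def forward(image: Any, units: Dict[str, Callable[[Any], Any]]) -> Tuple[Any, Any, Any]:
    """Route one contract image through the eight units in the described order."""
    x = units["feat1"](image)          # first feature extraction unit
    x = units["feat2"](x)              # second feature extraction unit
    x = units["feat3"](x)              # third feature extraction unit
    regions = units["rpn"](x)          # region generation network unit
    screenshots = units["segment"](regions)   # first output: segmentation screenshots
    region_feats = units["feat4"](regions)    # fourth feature extraction unit
    categories = units["classify"](region_feats)  # second output: text categories
    vertices = units["detect"](region_feats)      # third output: vertex coordinates
    return screenshots, categories, vertices
```

Note that the segmentation branch takes the region generation network output directly, while the classification and detection branches share the output of the fourth feature extraction unit.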

Each of the first, second, third and fourth feature extraction units includes multiple attention sub-units connected in sequence. Each attention sub-unit includes multiple single-head attention blocks, and each single-head attention block includes multiple attention layers, a pooling layer, a shortcut layer and a fully connected layer. The number of attention layers differs between feature extraction units.

The attention layer here consists of key, query and value matrix operations. Multiple single-head attention blocks together form an attention sub-unit with a multi-head attention mechanism. Across the first, second, third and fourth feature extraction units, the number of attention layers in each single-head attention block is set in the ratio 3:6:6:9.
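The source does not give the formula for the key, query and value operations. Assuming the standard scaled dot-product form, with input features $X$, learned projection matrices $W_Q$, $W_K$, $W_V$ and key dimension $d_k$ (all symbols introduced here for illustration), one attention layer would compute:

```latex
Q = X W_Q, \quad K = X W_K, \quad V = X W_V, \qquad
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
```

Swin-style backbones typically evaluate this within local windows and add a relative position bias inside the softmax; neither detail is stated in the source.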

Following Mask R-CNN, a region generation network (RPN, Region Proposal Network) unit is added between the third feature extraction unit and the fourth feature extraction unit. The region generation network unit here consists of a convolution layer, an activation layer and a pooling layer.

On top of the feature extraction backbone network, three parallel pipelines are set up to recognize and detect the contract content. The feature vector output by the third feature extraction unit passes through the region generation network unit (which outputs the coordinates of the detection boxes), and the segmentation unit then crops out the corresponding text segmentation screenshot. The segmentation unit here consists of ResNet-style blocks (each including a convolution layer, a pooling layer, an activation layer and a shortcut layer) and a deconvolution layer. Finally, a 1x1 convolution maps the features to an image-sized, k-dimensional screenshot.

The feature vector output by the fourth feature extraction unit is classified by the classification unit to obtain the text category of the corresponding text segmentation screenshot. The classification unit here is an MLP (Multi-Layer Perceptron).

The detection unit has a structure similar to that of the classification unit. It outputs the vertex coordinates of each text segmentation screenshot, which indicate the relative position of the screenshot within the corresponding financial contract picture.

In one embodiment of the present application, the target segmentation model is generated by training as follows:

Step 1: For each training sample, input the annotated training sample into the initial segmentation model to be trained, to determine a first loss value corresponding to the output of the region generation network unit, a second loss value corresponding to the output of the segmentation unit, a third loss value corresponding to the output of the classification unit, and a fourth loss value corresponding to the output of the detection unit;

Step 2: Take the sum of the first loss value, the second loss value, the third loss value and the fourth loss value as the total loss value;

Step 3: Adjust the parameters of the initial segmentation model based on the total loss value to generate the target segmentation model.

Next, following the training approach for a two-stage segmentation model with a transformer (attention mechanism) backbone network, the task definition, loss function design and iterative training are carried out:

Suitable training samples are determined. The training samples here are financial contracts with a uniform template format. Financial contracts differ from other contracts in that they have stable structural features. After analyzing the collected samples, a financial contract can be divided into title, Party A and Party B information, content column, list, terms, seal and so on.

The collected training samples are annotated on a cloud annotation platform with detection boxes, text segmentation regions and the corresponding text categories, and are divided into a training set, a validation set and a test set to form a complete data set.
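The division into training, validation and test sets might look like the sketch below. The 8:1:1 ratio, the fixed seed and the function name `split_dataset` are assumptions for illustration only; the source does not specify how the samples are divided.

```python
import random
from typing import List, Sequence, Tuple


def split_dataset(samples: Sequence[str],
                  ratios: Tuple[float, float, float] = (0.8, 0.1, 0.1),
                  seed: int = 0) -> Tuple[List[str], List[str], List[str]]:
    """Shuffle annotated samples reproducibly and cut them into three sets."""
    items = list(samples)
    random.Random(seed).shuffle(items)      # local RNG, global state untouched
    n_train = int(len(items) * ratios[0])
    n_val = int(len(items) * ratios[1])
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]          # remainder goes to the test set
    return train, val, test
```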

The loss function consists of four parts. An MSE Loss (first loss value) is computed between the RPN layer output and the ground-truth boxes; an MSE Loss (second loss value) is likewise computed between the detection layer output and the ground-truth boxes; a CE Loss (third loss value) is computed between the classification layer output and the ground-truth category; and a Focal Loss (fourth loss value) is computed pixel by pixel between the segmentation layer output and the ground-truth mask. Finally, the four losses are combined to form the total loss.
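A minimal, framework-free sketch of the four loss terms is given below. The focal loss parameters `gamma=2.0` and `alpha=0.25`, the binary per-pixel form of the focal loss, and the unweighted sum are assumptions made for this illustration; the source names only the loss types and states that they are combined into a total.

```python
import math
from typing import Sequence


def mse_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error, used for the RPN and detection box outputs."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def cross_entropy_loss(logits: Sequence[float], target_index: int) -> float:
    """Cross entropy over raw class scores, computed with a stable log-sum-exp."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[target_index]


def focal_loss(probs: Sequence[float], mask: Sequence[int],
               gamma: float = 2.0, alpha: float = 0.25, eps: float = 1e-7) -> float:
    """Binary focal loss averaged over pixels; probs are foreground probabilities."""
    total = 0.0
    for p, y in zip(probs, mask):
        p_t = p if y == 1 else 1.0 - p           # probability of the true label
        a_t = alpha if y == 1 else 1.0 - alpha   # class balancing weight
        p_t = min(max(p_t, eps), 1.0 - eps)      # avoid log(0)
        total += -a_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


def total_loss(rpn: float, detection: float, classification: float,
               segmentation: float) -> float:
    """Unweighted sum of the four terms (assumed; weights are not given)."""
    return rpn + detection + classification + segmentation
```

The `(1 - p_t) ** gamma` factor reduces the contribution of pixels that are already classified with high confidence, which suits masks in which background pixels far outnumber text pixels.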

The learning rate is set to 1e-3, batch_size is set to 64, and the resolution is set to 1024 (the short side is filled by padding). The Adam optimizer is used. The model is trained iteratively by gradient descent until the loss varies only within a small range, at which point the trained segmentation model is obtained.
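The stopping condition ("until the loss varies only within a small range") is not quantified in the source. One possible reading is sketched below, where `window` and `tolerance` are hypothetical values and `TRAINING_CONFIG` simply restates the stated hyperparameters.

```python
from typing import Sequence

# Hyperparameters as stated in the description.
TRAINING_CONFIG = {
    "learning_rate": 1e-3,
    "batch_size": 64,
    "resolution": 1024,   # short side padded up to this size
    "optimizer": "adam",
}


def has_converged(loss_history: Sequence[float],
                  window: int = 5, tolerance: float = 1e-4) -> bool:
    """True when the last `window` losses stay within `tolerance` of each other."""
    if len(loss_history) < window:
        return False
    recent = loss_history[-window:]
    return max(recent) - min(recent) <= tolerance
```

A training loop would append the total loss after each iteration or epoch and stop once `has_converged` returns true.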

Based on the same inventive concept, the embodiments of the present application also provide a financial contract auditing device corresponding to the financial contract auditing method. Since the device solves the problem on the same principle as the financial contract auditing method described above, the implementation of the device can refer to the implementation of the method, and repeated details are omitted.

Please refer to FIG. 4, which is a schematic structural diagram of a financial contract auditing device provided by an embodiment of the present application. As shown in FIG. 4, the auditing device 400 includes:

a pre-selection module 410, configured to determine a pre-trained target segmentation model according to the template type corresponding to the financial contract to be audited, where the financial contract to be audited includes a financial contract to be signed, drafted by one of the two contracting parties, and a signed financial contract returned by the other contracting party;

a segmentation module 420, configured to input the financial contract to be signed and the signed financial contract, each in picture format, into the target segmentation model, to output the text segmentation information corresponding to each of the two contracts, where the text segmentation information includes at least the text segmentation screenshots and the text category corresponding to each text segmentation screenshot;

a recognition module 430, configured to input the text segmentation screenshots of the financial contract to be signed and of the signed financial contract into a pre-trained optical character recognition model, to output the text recognition result corresponding to each text segmentation screenshot;

a checking module 440, configured to perform text matching on the text recognition results of text segmentation screenshots with the same text category between the financial contract to be signed and the signed financial contract, to determine whether the two contracts meet the contract signing standard.
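How the four modules cooperate can be sketched as a small class. The class name `AuditDevice`, the callable signatures, and the simplification that each text category appears in exactly one screenshot are assumptions for illustration; the segmentation and recognition models are passed in as stand-in callables.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Segment = Tuple[str, str]  # (text category, screenshot placeholder)
Model = Callable[[str], List[Segment]]


@dataclass
class AuditDevice:
    preselect: Callable[[str], Model]  # module 410: template type -> model
    recognize: Callable[[str], str]    # module 430: screenshot -> text

    def segment(self, model: Model, image: str) -> List[Segment]:
        """Module 420: run the selected segmentation model on one contract image."""
        return model(image)

    def check(self, unsigned: Dict[str, str], signed: Dict[str, str]) -> bool:
        """Module 440: same categories and identical text in each category."""
        return unsigned == signed

    def run(self, template_type: str, unsigned_image: str, signed_image: str) -> bool:
        model = self.preselect(template_type)
        texts = []
        for image in (unsigned_image, signed_image):
            segments = self.segment(model, image)
            texts.append({cat: self.recognize(shot) for cat, shot in segments})
        return self.check(texts[0], texts[1])
```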

In a preferred embodiment, the text segmentation information also includes the vertex coordinates of each text segmentation screenshot, and the device also includes a matching module (not shown in the figure) configured to: for each of the financial contract to be signed and the signed financial contract, sort all text segmentation screenshots of that contract based on their vertex coordinates, and combine the text segmentation screenshot of the financial contract to be signed and the text segmentation screenshot of the signed financial contract that have the same sequence number into one screenshot data group; for each screenshot data group, match the text categories of the two text segmentation screenshots in the group; if the text categories are matched successfully in every screenshot data group, perform the text matching step; and if there is a group whose text categories do not match, mark the two text segmentation screenshots in that group.

In a preferred embodiment, the text recognition result includes multiple fields and the recognition box coordinates of each field, and the checking module 440 is specifically configured to: for any text segmentation screenshot, arrange the fields in its text recognition result according to their recognition box coordinates and concatenate them to obtain the text within that screenshot; match the texts of two text segmentation screenshots with the same text category character by character; and, if all characters match between the texts of every pair of text segmentation screenshots with the same text category, determine that the financial contract to be signed and the signed financial contract meet the contract signing standard, and otherwise determine that they do not.

In a preferred embodiment, the checking module 440 is further configured to mark the two text segmentation screenshots if the characters do not match completely between the texts of a pair of text segmentation screenshots with the same text category, and to generate and output a financial contract audit report according to the marks on all text segmentation screenshots.

In a preferred embodiment, the target segmentation model includes a first feature extraction unit, a second feature extraction unit, a third feature extraction unit, a fourth feature extraction unit, a region generation network unit, a segmentation unit, a classification unit and a detection unit. The input of the first feature extraction unit serves as the input of the target segmentation model. The output of the first feature extraction unit is connected to the input of the second feature extraction unit, the output of the second feature extraction unit is connected to the input of the third feature extraction unit, and the output of the third feature extraction unit is connected to the input of the region generation network unit. The output of the region generation network unit is connected to the input of the fourth feature extraction unit and also to the input of the segmentation unit. The output of the segmentation unit serves as the first output of the target segmentation model and outputs the text segmentation screenshots. The output of the fourth feature extraction unit is connected to the input of the classification unit, whose output serves as the second output of the target segmentation model and outputs the text category of each text segmentation screenshot. The output of the fourth feature extraction unit is also connected to the input of the detection unit, whose output serves as the third output of the target segmentation model and outputs the vertex coordinates of each text segmentation screenshot.

In a preferred embodiment, each of the first, second, third and fourth feature extraction units includes multiple attention sub-units connected in sequence. Each attention sub-unit includes multiple single-head attention blocks, and each single-head attention block includes multiple attention layers, a pooling layer, a shortcut layer and a fully connected layer. The number of attention layers differs between feature extraction units.

In a preferred embodiment, the device also includes a training module (not shown in the figure), configured to generate the target segmentation model by training as follows: for each training sample, input the annotated training sample into the initial segmentation model to be trained, to determine a first loss value corresponding to the output of the region generation network unit, a second loss value corresponding to the output of the segmentation unit, a third loss value corresponding to the output of the classification unit, and a fourth loss value corresponding to the output of the detection unit; take the sum of the first, second, third and fourth loss values as the total loss value; and adjust the parameters of the initial segmentation model based on the total loss value to generate the target segmentation model.

In a preferred embodiment, for each screenshot data group, the matching module matches text categories as follows: determine whether the two text segmentation screenshots in the group have the same text category; if they do, the screenshot data group is matched successfully; if they do not, the screenshot data group is a mismatch.

In a preferred embodiment, the marks include text mismatch and text category mismatch, and the financial contract audit report is generated as follows: when text categories do not match, the financial contract to be signed and the signed financial contract are each added in full to the page of the financial contract audit report; when text does not match, the mismatched text segmentation screenshots from the financial contract to be signed and from the signed financial contract are each added to the page of the financial contract audit report.

In a preferred embodiment, when a text category mismatch and a text mismatch exist at the same time, the financial contract to be signed and the signed financial contract are each added in full to the page of the financial contract audit report, and a first screenshot frame is generated from the vertex coordinates of each text segmentation screenshot marked as a text mismatch and overlaid on the financial contract to be signed and the signed financial contract, to generate the financial contract audit report.

In a preferred embodiment, for each pair of text segmentation screenshots with the same text category, when mismatched characters exist between the two, the vertex coordinates of the corresponding character recognition box are determined and recorded.

In a preferred embodiment, when the text does not match, a second screenshot box is generated according to the vertex coordinates of the text recognition box and is added as an upper layer over the corresponding text segmentation screenshot, so as to generate the financial contract audit report.
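One possible way to locate the recognition boxes that contain mismatched text is sketched below. It uses Python's standard `difflib.SequenceMatcher` as a stand-in for the word-by-word comparison; this is a substituted technique, not the comparison the application itself defines. The field representation `(text, box)` and the function names are hypothetical, and the fields are assumed to be already arranged in reading order.

```python
from difflib import SequenceMatcher


def join_fields(fields):
    """Concatenate field texts and map every character to its field index."""
    text, owner = "", []
    for i, (field_text, _box) in enumerate(fields):
        text += field_text
        owner.extend([i] * len(field_text))
    return text, owner


def mismatched_boxes(pending_fields, signed_fields):
    """Return, per contract, the recognition boxes covering differing characters."""
    a_text, a_owner = join_fields(pending_fields)
    b_text, b_owner = join_fields(signed_fields)
    a_hit, b_hit = set(), set()
    matcher = SequenceMatcher(None, a_text, b_text, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        # A pure insertion or deletion touches characters on one side only.
        a_hit.update(a_owner[i1:i2])
        b_hit.update(b_owner[j1:j2])
    return ([pending_fields[i][1] for i in sorted(a_hit)],
            [signed_fields[j][1] for j in sorted(b_hit)])


pending = [("Amount: ", (0, 0, 80, 20)), ("1000", (80, 0, 120, 20))]
signed = [("Amount: ", (0, 0, 80, 20)), ("7000", (80, 0, 120, 20))]
print(mismatched_boxes(pending, signed))
```

The returned vertex coordinates could then be used to draw the second screenshot boxes over the corresponding text segmentation screenshots.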

Please refer to FIG. 5, which is a schematic structural diagram of an electronic device provided by an embodiment of the present application. As shown in FIG. 5, the electronic device 500 includes a processor 510, a memory 520, and a bus 530.

The memory 520 stores machine-readable instructions executable by the processor 510. When the electronic device 500 is running, the processor 510 and the memory 520 communicate through the bus 530. When executed by the processor 510, the machine-readable instructions perform the steps of the financial contract auditing method of the method embodiment shown in FIG. 1. For the specific implementation, refer to the method embodiment; details are not repeated here.

An embodiment of the present application also provides a computer-readable storage medium on which a computer program is stored. When run by a processor, the computer program performs the steps of the financial contract auditing method of the method embodiment shown in FIG. 1. For the specific implementation, refer to the method embodiment; details are not repeated here.

Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, devices, and units described above may be found in the corresponding processes of the foregoing method embodiments and are not repeated here.

In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in actual implementation; as another example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through communication interfaces, devices, or units, and may be electrical, mechanical, or of another form.

The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.

If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a processor-executable, non-volatile computer-readable storage medium. Based on this understanding, the technical solution of this application in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Finally, it should be noted that the embodiments described above are only specific implementations of this application, intended to illustrate its technical solutions rather than to limit them, and the protection scope of this application is not limited to them. Although this application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that anyone familiar with the technical field may, within the technical scope disclosed in this application, still modify the technical solutions recorded in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features. Such modifications, changes, or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application, and they shall all be covered by the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims

1. A method of auditing financial contracts, the method comprising:

determining a pre-trained target segmentation model according to the template type corresponding to the financial contract to be checked, wherein the financial contract to be checked comprises a financial contract to be signed, which is drawn up by one of the two contracting parties, and a signed financial contract returned by the other contracting party;

inputting the financial contract to be signed and the signed financial contract, each in image format, into the target segmentation model to output text segmentation information corresponding to each of the financial contract to be signed and the signed financial contract, wherein the text segmentation information at least comprises text segmentation screenshots and the text category corresponding to each text segmentation screenshot;

inputting respective text segmentation screenshots of the financial contract to be signed and the signed financial contract to a pre-trained optical character recognition model so as to output a text recognition result corresponding to each text segmentation screenshot;

and performing text matching on text recognition results corresponding to text segmentation screenshots with the same text category between the financial contract to be signed and the signed financial contract to determine whether the financial contract to be signed and the signed financial contract meet the contract signing standard.

2. The method of claim 1, wherein the text segmentation information further includes vertex coordinates of each text segmentation screenshot, and wherein, prior to the step of performing text matching on text recognition results corresponding to text segmentation screenshots having the same text category between the to-be-signed financial contract and the signed financial contract to determine whether the to-be-signed financial contract and the signed financial contract meet the contract signing standard, the method further comprises:

for each of the to-be-signed financial contract and the signed financial contract, sorting all text segmentation screenshots of that contract based on their vertex coordinates, and combining a text segmentation screenshot of the to-be-signed financial contract and a text segmentation screenshot of the signed financial contract that have the same serial number into a screenshot data group;

for each screenshot data group, matching the text categories of the two text segmentation screenshots in the screenshot data group;

if the text categories in every screenshot data group are successfully matched, executing the text matching step;

if any screenshot data group has mismatched text categories, marking the two text segmentation screenshots in that group.

3. The method according to claim 2, wherein the text recognition result includes a plurality of fields and recognition box coordinates of each field, and the step of performing text matching on text recognition results corresponding to text segmentation screenshots having the same text category between the to-be-signed financial contract and the signed financial contract to determine whether the to-be-signed financial contract and the signed financial contract meet the contract signing standard specifically comprises:

for any text segmentation screenshot, arranging and concatenating the fields in the text recognition result corresponding to the text segmentation screenshot according to the recognition box coordinates of the fields to obtain the text of the text segmentation screenshot;

performing word-by-word matching on the texts of the two text segmentation screenshots having the same text category;

if the words of the texts of the text segmentation screenshots in every group having the same text category all match, determining that the to-be-signed financial contract and the signed financial contract meet the contract signing standard; otherwise, determining that they do not meet the contract signing standard.

4. The method according to claim 3, wherein if the words of the texts of a group of text segmentation screenshots having the same text category do not match completely, the two text segmentation screenshots are marked;

and a financial contract audit report is generated and output according to the marking status of all the text segmentation screenshots.

5. The method of claim 1, wherein the target segmentation model comprises a first feature extraction unit, a second feature extraction unit, a third feature extraction unit, a fourth feature extraction unit, a region generation network unit, a segmentation unit, a classification unit, and a detection unit,

the input of the first feature extraction unit serves as the input of the target segmentation model; the output of the first feature extraction unit is connected to the input of the second feature extraction unit; the output of the second feature extraction unit is connected to the input of the third feature extraction unit; the output of the third feature extraction unit is connected to the input of the region generation network unit; the output of the region generation network unit is connected to the input of the fourth feature extraction unit and is further connected to the input of the segmentation unit; the output of the segmentation unit serves as the first output of the target segmentation model and outputs the text segmentation screenshots; the output of the fourth feature extraction unit is connected to the input of the classification unit; the output of the classification unit serves as the second output of the target segmentation model and outputs the text category corresponding to each text segmentation screenshot; the output of the fourth feature extraction unit is further connected to the input of the detection unit; and the output of the detection unit serves as the third output of the target segmentation model and outputs the vertex coordinates of each text segmentation screenshot.

6. The method of claim 5, wherein each of the first feature extraction unit, the second feature extraction unit, the third feature extraction unit, and the fourth feature extraction unit comprises a plurality of sequentially connected attention sub-units, each attention sub-unit comprises a plurality of single-head attention blocks, and each single-head attention block comprises a plurality of attention layers, a pooling layer, a direct connection layer, and a fully connected layer, wherein the number of attention layers differs between feature extraction units.

7. The method of claim 5, wherein the target segmentation model is generated by training as follows:

for each training sample, inputting the labeled training sample into an initial segmentation model to be trained, so as to determine a first loss value corresponding to the output of the region generation network unit, a second loss value corresponding to the output of the segmentation unit, a third loss value corresponding to the output of the classification unit, and a fourth loss value corresponding to the output of the detection unit;

taking the sum of the first loss value, the second loss value, the third loss value and the fourth loss value as a total loss value;

and adjusting the parameters of the initial segmentation model based on the total loss value to generate the target segmentation model.

8. The method of claim 2, wherein for each screenshot data group, the text categories are matched by: determining whether the text categories of the two text segmentation screenshots in the screenshot data group are the same;

if they are the same, determining that the screenshot data group is successfully matched;

if they are not the same, determining that the screenshot data group is not matched.

9. The method of claim 4, wherein the marking status includes text mismatch and text category mismatch, and wherein the financial contract audit report is generated by:

when the text categories do not match, adding the financial contract to be signed and the signed financial contract in full to a page of the financial contract audit report;

and when the text does not match, adding the mismatched text segmentation screenshots of the financial contract to be signed and of the signed financial contract, respectively, to a page of the financial contract audit report.

10. The method of claim 9, wherein when a text category mismatch and a text mismatch both exist, the to-be-signed financial contract and the signed financial contract are added in their entirety to a page of the financial contract audit report, and a first screenshot box is generated according to the vertex coordinates of the text segmentation screenshots marked as text mismatches and is added as an upper layer over the to-be-signed financial contract and the signed financial contract, so as to generate the financial contract audit report.

11. The method of claim 10, wherein for each group of text segmentation screenshots having the same text category, when mismatched text exists between the two screenshots, the vertex coordinates of the text recognition box are determined and recorded.

12. The method of claim 10, wherein when the text does not match, a second screenshot box is generated according to the vertex coordinates of the text recognition box and is added as an upper layer over the corresponding text segmentation screenshot, so as to generate the financial contract audit report.

13. An audit device for financial contracts, said device comprising:

the pre-selection module is used for determining a pre-trained target segmentation model according to the template type corresponding to the financial contract to be checked, wherein the financial contract to be checked comprises a financial contract to be signed drawn up by one of the two contracting parties and a signed financial contract returned by the other contracting party;

the segmentation module is used for inputting the financial contract to be signed and the signed financial contract into the target segmentation model in a picture format respectively so as to output text segmentation information corresponding to the financial contract to be signed and the signed financial contract respectively, wherein the text segmentation information at least comprises text segmentation screenshots and text categories corresponding to each text segmentation screenshot;

the recognition module is used for inputting the respective text segmentation screenshots of the financial contract to be signed and the signed financial contract into a pre-trained optical character recognition model so as to output a text recognition result corresponding to each text segmentation screenshot;

and the checking module is used for carrying out text matching on text recognition results corresponding to text segmentation screenshots with the same text category between the financial contract to be signed and the signed financial contract so as to determine whether the financial contract to be signed and the signed financial contract meet the contract signing standard.

14. An electronic device, comprising: a processor, a memory and a bus, said memory storing machine readable instructions executable by said processor, said processor and said memory communicating over the bus when the electronic device is running, said processor executing said machine readable instructions to perform the steps of the method of auditing a financial contract according to any of claims 1 to 12.

15. A computer readable storage medium, wherein a computer program is stored on the computer readable storage medium, which computer program, when being executed by a processor, performs the steps of the method of auditing financial contracts according to any one of claims 1 to 12.
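
For reference only, the training objective recited in claim 7 can be written compactly. The symbols below are illustrative notation and are not used in the application: $L_{\text{rgn}}$, $L_{\text{seg}}$, $L_{\text{cls}}$ and $L_{\text{det}}$ denote the first to fourth loss values, corresponding to the outputs of the region generation network unit, the segmentation unit, the classification unit and the detection unit, respectively. The claim specifies a plain sum, so no weighting coefficients are shown.

```latex
L_{\text{total}} = L_{\text{rgn}} + L_{\text{seg}} + L_{\text{cls}} + L_{\text{det}}
```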