CN119030789A

CN119030789A - A bill data integrity verification system and method

Info

Publication number: CN119030789A
Application number: CN202411471151.5A
Authority: CN
Inventors: 魏建华; 李方祥; 孙奕为; 张玉明; 连平
Original assignee: Shenzhen Shandong Digital Technology Group Co ltd
Current assignee: Shenzhen Shandong Digital Technology Group Co ltd
Priority date: 2024-10-22
Filing date: 2024-10-22
Publication date: 2024-11-26
Anticipated expiration: 2044-10-22
Also published as: CN119030789B

Abstract

The present invention relates to the field of data verification technology, and specifically to a bill data integrity verification system and method, the system comprising: a data hash processing module constructs a hash value of the data content based on the acquired bill data, calculates the time interval between each data item and its timestamp, associates the hash value and timestamp with a unique identifier, and generates a data integrity identifier. In the present invention, by inserting redundant data and monitoring errors in transmission in real time, it helps to improve the accuracy and stability of data transmission. The distribution and fluctuation range analysis of errors enables the system to dynamically adjust redundant data according to the actual transmission environment, reduces the probability of data loss, and ensures reliable transmission of data in complex environments. The verification mechanism is generated by a random number sequence to ensure continuous verification of the data stream, accurately detect error blocks in data transmission, and enhance the flexibility and efficiency of data verification.

Description

A bill data integrity verification system and method

技术领域Technical Field

本发明涉及数据验证技术领域，尤其涉及一种票据数据完整性验证系统及方法。The present invention relates to the technical field of data verification, and in particular to a bill data integrity verification system and method.

背景技术Background Art

数据验证技术领域使用校验码、哈希函数、数字签名或加密算法等多项技术检测数据在存储或传输过程中是否遭到篡改或损坏。此外，数据验证技术还涉及实施访问控制和使用安全协议来保护数据不受未授权访问和篡改的风险。在高风险和对数据准确性要求极高的行业中，如银行金融、医疗健康和政府机构，支持合规性审核、风险管理和数据治理。The field of data verification technology uses multiple technologies such as checksums, hash functions, digital signatures or encryption algorithms to detect whether data has been tampered with or damaged during storage or transmission. In addition, data verification technology also involves implementing access control and using security protocols to protect data from unauthorized access and tampering. In industries with high risks and extremely high requirements for data accuracy, such as banking and finance, healthcare, and government agencies, it supports compliance audits, risk management, and data governance.

其中，票据数据完整性验证系统是专门用于保护票据数据在处理和传输过程中维护其完整性的系统。系统通过实施数据验证协议来防止数据在电子票据交换、金融交易处理和存档过程中的非授权修改。通常包括使用数字签名技术确认数据未被篡改，以及应用加密措施来保护数据的私密性。Among them, the bill data integrity verification system is a system specifically used to protect the integrity of bill data during processing and transmission. The system implements data verification protocols to prevent unauthorized modification of data during electronic bill exchange, financial transaction processing and archiving. It usually includes the use of digital signature technology to confirm that the data has not been tampered with, and the application of encryption measures to protect the privacy of data.

现有技术在票据数据的完整性验证上，尽管通过数字签名和加密技术确保了数据未被篡改和隐私保护，但当传输过程中发生错误时，难以有效监测错误的分布特征，也难以根据实际传输情况动态调整数据冗余，导致在高干扰环境下数据丢失或损坏的风险增加。此外，现有系统容易在大数据量传输中造成效率低下的问题。在数据篡改检测方面，难以在篡改发生的早期进行有效拦截和响应，容易导致更大的安全风险。In the existing technology, although digital signatures and encryption technologies are used to ensure that data is not tampered with and privacy is protected in terms of integrity verification of bill data, it is difficult to effectively monitor the distribution characteristics of errors when errors occur during transmission, and it is also difficult to dynamically adjust data redundancy according to actual transmission conditions, resulting in an increased risk of data loss or damage in high-interference environments. In addition, the existing system is prone to inefficiency in the transmission of large amounts of data. In terms of data tampering detection, it is difficult to effectively intercept and respond to tampering in the early stages, which can easily lead to greater security risks.

发明内容Summary of the invention

本发明的目的是解决现有技术中存在的缺点，而提出的一种票据数据完整性验证系统及方法。The purpose of the present invention is to solve the shortcomings existing in the prior art and to propose a bill data integrity verification system and method.

为了实现上述目的，本发明采用了如下技术方案：一种票据数据完整性验证系统包括：In order to achieve the above-mentioned purpose, the present invention adopts the following technical scheme: A bill data integrity verification system comprises:

数据哈希处理模块基于获取的票据数据，构建数据内容的哈希值，计算每个数据项与其时间戳之间的时间间隔，关联哈希值和时间戳与唯一标识符，生成数据完整性标识；The data hash processing module constructs a hash value of the data content based on the acquired ticket data, calculates the time interval between each data item and its timestamp, associates the hash value and timestamp with a unique identifier, and generates a data integrity identifier;

预测纠错编码模块基于所述数据完整性标识，在票据信息中插入冗余数据，实时监测数据流中的传输错误，并计算每个字符在传输过程中丢失的概率值，分析错误分布的均匀程度与波动范围，调整冗余数据，生成动态纠错数据流；The predictive error correction coding module inserts redundant data into the bill information based on the data integrity identifier, monitors the transmission errors in the data stream in real time, calculates the probability value of each character being lost during the transmission process, analyzes the uniformity and fluctuation range of the error distribution, adjusts the redundant data, and generates a dynamic error correction data stream;

随机审计校验模块依据所述动态纠错数据流，生成随机数序列，重新验证数据块的校验标识，比对预期和当前校验结果，计算校验结果不一致的数据块与总数据块的比例，形成数据完整性分析结果；The random audit verification module generates a random number sequence based on the dynamic error correction data stream, re-verifies the verification mark of the data block, compares the expected and current verification results, calculates the ratio of the data blocks with inconsistent verification results to the total data blocks, and forms a data integrity analysis result;

篡改响应处理模块根据所述数据完整性分析结果，标记存在哈希值差异的数据块为疑似篡改，限制访问权限，更新哈希值和索引信息，生成安全处理结果。The tampering response processing module marks the data blocks with hash value differences as suspected tampering according to the data integrity analysis results, restricts access rights, updates the hash value and index information, and generates a security processing result.

作为本发明的进一步方案，所述数据完整性标识的获取步骤具体为：As a further solution of the present invention, the step of obtaining the data integrity identifier is specifically as follows:

基于获取的票据数据，通过对票据上的文字和数字进行解读，并对提取的数据内容进行归类整理，包括日期、金额、收款方多类关键信息及每个票据的唯一标识符，生成数据哈希值；Based on the acquired bill data, the text and numbers on the bill are interpreted and the extracted data content is classified and sorted, including the date, amount, multiple key information of the payee and the unique identifier of each bill, to generate a data hash value;

基于所述数据哈希值，对每个数据项与其时间戳进行计算，识别数据输入的时间模式，并进行数据输入的异常模式监测，获取数据输入模式分析结果；Based on the data hash value, each data item and its timestamp are calculated to identify the time pattern of data input, and abnormal pattern monitoring of data input is performed to obtain data input pattern analysis results;

基于所述数据输入模式分析结果，关联每个数据项的哈希值、时间戳与唯一标识符，并存入数据表，建立哈希表与数据表的双向索引关系，生成数据完整性标识。Based on the data input pattern analysis result, the hash value, timestamp and unique identifier of each data item are associated and stored in a data table, a bidirectional index relationship between the hash table and the data table is established, and a data integrity identifier is generated.

作为本发明的进一步方案，所述计算每个字符在传输过程中丢失的概率值的获取步骤具体为：As a further solution of the present invention, the step of obtaining the probability value of each character being lost during transmission is specifically as follows:

根据所述数据完整性标识，对传输过程中的字符频率和位置信息进行分析，确定数据中容易发生错误的位置，并插入冗余数据，生成传输错误的实时监测信息；According to the data integrity identifier, character frequency and position information during transmission are analyzed to determine the position in the data where errors are prone to occur, and redundant data is inserted to generate real-time monitoring information of transmission errors;

基于所述传输错误的实时监测信息，记录每个字符的传输错误次数、延迟次数及其出现频率，采用公式：Based on the real-time monitoring information of the transmission errors, the number of transmission errors, the number of delays and the frequency of occurrence of each character are recorded, using the formula:

， ,

计算第i个字符的丢失概率，得到丢失概率分析结果；Calculate the probability of missing the i-th character , get the loss probability analysis result;

其中，是第i个字符的传输错误次数，是第i个字符的传输延迟次数，是第i个字符的出现频率，表示字符i在整个传输过程中被传输的总次数，是权重系数，分别用于调整传输错误次数、延迟次数和字符出现频率在丢失概率中的影响比例；in, is the number of transmission errors of the i-th character, is the number of transmission delays for the ith character, is the frequency of occurrence of the i-th character, Indicates the total number of times character i is transmitted during the entire transmission process. are weight coefficients, which are used to adjust the influence ratio of the number of transmission errors, the number of delays and the frequency of character occurrence on the loss probability;

基于所述丢失概率分析结果，统计传输数据包中错误次数和字符出现位置，并结合丢失概率分析传输中错误的分布情况，生成传输错误信息分析结果。Based on the loss probability analysis result, the number of errors and the positions of characters in the transmission data packet are counted, and the distribution of errors in the transmission is analyzed in combination with the loss probability to generate a transmission error information analysis result.

作为本发明的进一步方案，所述分析错误分布的波动范围的获取步骤具体为：As a further solution of the present invention, the step of obtaining the fluctuation range of the analysis error distribution is specifically:

基于所述传输错误信息分析结果，对每个字符的传输特征进行分析，采用公式：Based on the transmission error information analysis result, the transmission characteristics of each character are analyzed using the formula:

， ,

计算错误分布的波动范围，生成波动范围信息；Calculate the range of the error distribution , generate fluctuation range information;

其中，表示字符在传输过程中从发送到接收端的时间延迟，表示由于传输错误导致字符被重新发送的次数，是全部字符的平均丢失概率，是字符总数，表示传输过程中出现的字符的总数量；in, Representing characters The time delay from the sender to the receiver during the transmission process, Indicates that the character The number of times it was resent. is the average probability of loss of all characters, is the total number of characters, which indicates the total number of characters that appear during the transmission;

基于所述波动范围信息，重新调整冗余数据的插入，动态监控数据流的变化，并进行调整，生成动态纠错数据流。Based on the fluctuation range information, the insertion of redundant data is readjusted, the change of the data stream is dynamically monitored, and adjustments are made to generate a dynamic error correction data stream.

作为本发明的进一步方案，所述比对预期和当前校验结果的获取步骤具体为：As a further solution of the present invention, the step of obtaining the comparison expected and current verification results is specifically as follows:

基于所述动态纠错数据流，生成随机数序列，确定待审计的数据块位置，得到待审计的数据块位置信息；Based on the dynamic error correction data stream, a random number sequence is generated, the position of the data block to be audited is determined, and the position information of the data block to be audited is obtained;

基于所述待审计的数据块位置信息，比对接收端和发送端的校验标识，对每个数据块的校验标识进行重新验证，得到校验标识验证结果；Based on the position information of the data block to be audited, the verification identifiers of the receiving end and the sending end are compared, and the verification identifier of each data block is re-verified to obtain a verification identifier verification result;

基于所述校验标识验证结果，采用公式：Based on the verification result of the verification mark, the formula is adopted:

， ,

计算数据块校验不一致比例，生成数据块校验的差异比例信息；Calculate the data block checksum inconsistency ratio , generate difference ratio information of data block verification;

其中，表示经过校验发现不一致的数据块个数，是审计频率，是传输延迟，是数据块大小，是总审计数据块数量。in, Indicates the number of inconsistent data blocks found after verification. is the audit frequency, is the transmission delay, is the data block size, is the total number of audit data blocks.

作为本发明的进一步方案，所述统计校验差异的出现次数的获取步骤具体为：As a further solution of the present invention, the step of obtaining the number of occurrences of the statistical verification difference is specifically:

基于所述数据块校验的差异比例信息，采用公式：Based on the difference ratio information of the data block verification, the formula is adopted:

， ,

计算总的校验差异次数，生成整合后的校验差异分析结果；Calculate the total number of checksum differences , generate integrated verification difference analysis results;

其中，表示整个审计过程中的数据块数量，是第个数据块的校验次数，是第个数据块的差异次数，表示第个数据块在校验过程中检测到的差异次数，表示和两个系列的数据块的数量；in, Indicates the number of data blocks in the entire audit process. It is The number of checksums for each data block, It is The number of differences between data blocks indicates the The number of differences detected during the verification process for each data block, express and The number of data blocks for both series;

基于所述整合后的校验差异分析结果，比较原始校验标识与传输后的校验标识，逐步分析每个数据块的差异情况，生成数据完整性分析结果。Based on the integrated verification difference analysis result, the original verification mark and the verification mark after transmission are compared, and the difference of each data block is analyzed step by step to generate a data integrity analysis result.

作为本发明的进一步方案，所述更新哈希值和索引信息的获取步骤具体为：As a further solution of the present invention, the step of obtaining the updated hash value and index information is specifically as follows:

基于所述数据完整性分析的结果，重新分配每个数据块的哈希值并与原始值对比，识别哈希值不一致的数据块，并将其标记为疑似篡改，生成疑似篡改的数据块列表；Based on the result of the data integrity analysis, the hash value of each data block is reallocated and compared with the original value, the data blocks with inconsistent hash values are identified, and the data blocks are marked as suspected tampering, and a list of suspected tampering data blocks is generated;

基于所述疑似篡改的数据块列表，采用公式：Based on the suspected tampered data block list, the formula is used:

， ,

计算数据篡改的量化严重程度，生成数据篡改的严重程度指标；Calculating the quantified severity of data tampering , generating a severity indicator of data tampering;

其中，用于衡量疑似篡改数据块的比例，是被标记为疑似篡改的数据块数量，是总的数据块数量；in, Used to measure the proportion of suspected tampered data blocks, is the number of data blocks marked as suspected of tampering, is the total number of data blocks;

基于所述数据篡改的严重程度指标，限制疑似篡改数据块的访问权限，记录每个数据块的唯一标识符和时间戳，并更新哈希值和索引信息，生成安全处理结果。Based on the severity index of the data tampering, the access rights of the suspected tampered data blocks are restricted, the unique identifier and timestamp of each data block are recorded, and the hash value and index information are updated to generate a security processing result.

一种票据数据完整性验证方法，所述票据数据完整性验证方法基于上述票据数据完整性验证系统执行，包括以下步骤：A method for verifying the integrity of bill data, which is performed based on the above-mentioned bill data integrity verification system, comprises the following steps:

S1：基于获取的票据数据，对每个数据项进行内容读取和时间戳关联，将每个数据项内容与时间戳的组合输入到哈希处理中，生成每个组合的唯一哈希值；S1: Based on the acquired ticket data, read the content and associate the timestamp for each data item, input the combination of the content and timestamp of each data item into the hash process, and generate a unique hash value for each combination;

S2：根据所述每个组合的唯一哈希值，将哈希值与对应的时间戳和唯一标识符关联，形成数据完整性标识；S2: according to the unique hash value of each combination, associating the hash value with the corresponding timestamp and unique identifier to form a data integrity identifier;

S3：基于所述数据完整性标识，在票据信息中添加冗余数据，同时实时监控数据流，记录并分析数据传输中的错误，以及每个字符的丢失概率，生成动态纠错数据流；S3: Based on the data integrity identifier, redundant data is added to the ticket information, and the data stream is monitored in real time, errors in data transmission and the probability of loss of each character are recorded and analyzed, and a dynamic error correction data stream is generated;

S4：基于所述动态纠错数据流，进行随机数序列的生成，用于选择需要重新验证的数据块，对选中的数据块执行校验标识的核对，比对校验结果的预期与当前表现，形成数据完整性分析结果；S4: Based on the dynamic error correction data stream, a random number sequence is generated to select a data block that needs to be re-verified, a verification mark is checked on the selected data block, and the expected and current performance of the verification result are compared to form a data integrity analysis result;

S5：基于所述数据完整性分析结果，识别并标记具有哈希值差异的数据块，对这些数据块执行访问限制，更新哈希值和索引信息，生成安全处理结果。S5: Based on the data integrity analysis result, identify and mark the data blocks with hash value differences, perform access restrictions on these data blocks, update the hash value and index information, and generate a security processing result.

与现有技术相比，本发明的优点和积极效果在于：Compared with the prior art, the advantages and positive effects of the present invention are:

本发明中，通过插入冗余数据并实时监测传输中的错误，有助于提高数据传输的准确性与稳定性。错误的分布与波动范围分析，使得系统能够根据实际传输环境动态调整冗余数据，降低了数据丢失的概率，确保数据在复杂环境下的可靠传输。通过随机数序列生成校验机制，确保对数据流的持续性验证，能精准发现数据传输中的错误块，增强了数据验证的灵活性和效率。而对疑似篡改数据块的访问限制和安全处理，进一步强化了数据在遭遇篡改时的响应能力，防止恶意篡改数据对系统和用户造成的损害。使票据数据在传输、存储和处理各环节中的安全性得到了全面提升。In the present invention, inserting redundant data and monitoring errors in transmission in real time helps to improve the accuracy and stability of data transmission. The analysis of the distribution and fluctuation range of errors enables the system to dynamically adjust redundant data according to the actual transmission environment, reducing the probability of data loss and ensuring reliable transmission of data in complex environments. The verification mechanism is generated by a random number sequence to ensure continuous verification of the data stream, accurately detect error blocks in data transmission, and enhance the flexibility and efficiency of data verification. The access restriction and security processing of suspected tampered data blocks further enhance the data's responsiveness when tampered with, preventing damage to the system and users caused by malicious tampering of data. The security of bill data in each link of transmission, storage and processing has been comprehensively improved.

附图说明BRIEF DESCRIPTION OF THE DRAWINGS

图1为本发明的系统流程图；Fig. 1 is a system flow chart of the present invention;

图2为本发明数据完整性标识的获取步骤流程图；FIG2 is a flow chart of steps for obtaining a data integrity identifier according to the present invention;

图3为本发明计算每个字符在传输过程中丢失的概率值的获取步骤流程图；FIG3 is a flow chart of the steps for calculating the probability value of each character being lost during transmission according to the present invention;

图4为本发明分析错误分布的波动范围的获取步骤流程图；FIG4 is a flow chart of steps for obtaining the fluctuation range of error distribution analysis according to the present invention;

图5为本发明比对预期和当前校验结果的获取步骤流程图；FIG5 is a flow chart of the steps of obtaining the comparison between the expected and current verification results of the present invention;

图6为本发明统计校验差异的出现次数的获取步骤流程图；FIG6 is a flow chart of the steps of obtaining the number of occurrences of statistical verification differences according to the present invention;

图7为本发明更新哈希值和索引信息的获取步骤流程图。FIG. 7 is a flow chart of the steps for updating the hash value and the index information of the present invention.

具体实施方式DETAILED DESCRIPTION

为了使本发明的目的、技术方案及优点更加清楚明白，以下结合附图及实施例，对本发明进行进一步详细说明。应当理解，此处所描述的具体实施例仅仅用以解释本发明，并不用于限定本发明。In order to make the purpose, technical solution and advantages of the present invention more clearly understood, the present invention is further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the present invention and are not intended to limit the present invention.

在本发明的描述中，需要理解的是，术语“长度”、“宽度”、“上”、“下”、“前”、“后”、“左”、“右”、“竖直”、“水平”、“顶”、“底”、“内”、“外”等指示的方位或位置关系为基于附图所示的方位或位置关系，仅是为了便于描述本发明和简化描述，而不是指示或暗示所指的装置或元件必须具有特定的方位、以特定的方位构造和操作，因此不能理解为对本发明的限制。此外，在本发明的描述中，“多个”的含义是两个或两个以上，除非另有明确具体的限定。In the description of the present invention, it should be understood that the terms "length", "width", "up", "down", "front", "back", "left", "right", "vertical", "horizontal", "top", "bottom", "inside", "outside" and the like indicate positions or positional relationships based on the positions or positional relationships shown in the drawings, and are only for the convenience of describing the present invention and simplifying the description, rather than indicating or implying that the device or element referred to must have a specific orientation, be constructed and operated in a specific orientation, and therefore cannot be understood as limiting the present invention. In addition, in the description of the present invention, "multiple" means two or more, unless otherwise clearly and specifically defined.

请参阅图1，本发明提供一种技术方案：一种票据数据完整性验证系统包括：Please refer to FIG1 , the present invention provides a technical solution: a bill data integrity verification system includes:

数据哈希处理模块基于获取的票据数据，提取数据内容和唯一标识符，构建数据内容的哈希值，计算每个数据项与其时间戳之间的时间间隔，以预防数据输入的异常模式，关联哈希值和时间戳与唯一标识符，将数据内容和用户签名存入数据表，生成数据完整性标识；The data hash processing module extracts the data content and unique identifier based on the acquired ticket data, constructs the hash value of the data content, calculates the time interval between each data item and its timestamp to prevent abnormal data input patterns, associates the hash value and timestamp with the unique identifier, stores the data content and user signature in the data table, and generates a data integrity identifier;

预测纠错编码模块基于数据完整性标识，调用传输过程中的字符频率与位置信息，在票据信息中插入冗余数据，实时监测数据流中的传输错误，并计算每个字符在传输过程中丢失的概率值，统计传输错误的次数和位置，分析错误分布的波动范围，根据错误分布特征调整冗余数据，生成动态纠错数据流；The predictive error correction coding module uses the character frequency and position information in the transmission process based on the data integrity identifier, inserts redundant data into the ticket information, monitors the transmission errors in the data stream in real time, calculates the probability value of each character being lost in the transmission process, counts the number and position of transmission errors, analyzes the fluctuation range of error distribution, adjusts redundant data according to the error distribution characteristics, and generates a dynamic error correction data stream;

随机审计校验模块依据动态纠错数据流，生成随机数序列确定待审计的数据块位置，重新验证数据块的校验标识，比对预期和当前校验结果，计算校验结果不一致的数据块与总数据块的比例，以辅助确定需要篡改的数据块位置，统计校验差异的出现次数，以及差异数据块的信息，形成数据完整性分析结果；The random audit verification module generates a random number sequence based on the dynamic error correction data stream to determine the location of the data block to be audited, re-verifies the verification mark of the data block, compares the expected and current verification results, calculates the ratio of data blocks with inconsistent verification results to the total data blocks, to assist in determining the location of the data block that needs to be tampered with, counts the number of occurrences of verification differences, and the information of the difference data blocks, and forms the data integrity analysis results;

篡改响应处理模块根据数据完整性分析结果，标记存在哈希值差异的数据块为疑似篡改，量化数据篡改的严重程度，限制访问权限，记录数据块的唯一标识符和时间戳，更新哈希值和索引信息，生成安全处理结果；The tampering response processing module marks the data blocks with hash value differences as suspected tampering based on the data integrity analysis results, quantifies the severity of the data tampering, restricts access rights, records the unique identifier and timestamp of the data block, updates the hash value and index information, and generates a security processing result;

数据完整性标识包括关联的数据哈希值、数据内容与用户签名的结合体、时间戳信息，以及与之相关联的唯一标识符，动态纠错数据流包括加入的冗余数据类型、根据错误分布自动调整的纠错参数，以及用于监控的错误统计信息，数据完整性分析结果包括经过重新计算的数据块哈希值、发现的哈希值不一致的数据块信息，以及这些数据块的位置和内容详情，安全处理结果包括被标记的疑似篡改数据块的识别信息、所采取的访问限制措施、更新的哈希值与索引信息。The data integrity identification includes the associated data hash value, the combination of data content and user signature, timestamp information, and the unique identifier associated with it. The dynamic error correction data stream includes the added redundant data type, error correction parameters automatically adjusted according to the error distribution, and error statistics used for monitoring. The data integrity analysis results include the recalculated data block hash value, the data block information with inconsistent hash values, and the location and content details of these data blocks. The security processing results include the identification information of the marked suspected tampered data blocks, the access restriction measures taken, and the updated hash value and index information.

请参阅图2，数据完整性标识的获取步骤具体为：Please refer to Figure 2, the steps for obtaining the data integrity identifier are as follows:

票据数据的获取过程涉及对各类票据扫描后形成的数字化信息的提取，这一过程通常包括OCR（光学字符识别）技术对票据上的文字和数字进行自动解读，再由数据提取模块进行归类整理，将提取的数据内容包括但不限于日期、金额、收款方等关键信息，以及每个票据的唯一标识符，如序列号或特定的二维码信息进行登记，这些唯一标识符由票据生成时的内部系统赋予，确保每张票据在后续处理中的可追溯性，提取的数据内容和唯一标识符被用于构建票据数据的哈希值，哈希值的计算采用特定的哈希函数，如SHA-256，确保任何微小的数据变化都会导致哈希值的显著变化，从而在数据验证阶段可以有效检测到数据的任何未授权修改。The process of acquiring bill data involves extracting digital information from various bills after scanning. This process usually includes OCR (optical character recognition) technology to automatically interpret the text and numbers on the bill, which is then classified and organized by the data extraction module. The extracted data content includes but is not limited to key information such as date, amount, payee, and unique identifiers for each bill, such as serial numbers or specific QR code information. These unique identifiers are assigned by the internal system when the bill is generated to ensure the traceability of each bill in subsequent processing. The extracted data content and unique identifiers are used to construct the hash value of the bill data. The hash value is calculated using a specific hash function, such as SHA-256, to ensure that any slight data change will result in a significant change in the hash value, so that any unauthorized modification of the data can be effectively detected during the data verification phase.

基于数据哈希值，对每个数据项与其时间戳进行计算，识别数据输入的时间模式，并进行数据输入的异常模式监测，获取数据输入模式分析结果；Based on the data hash value, each data item and its timestamp are calculated to identify the time pattern of data input, and abnormal pattern monitoring of data input is performed to obtain data input pattern analysis results;

关于对每个数据项与其时间戳进行计算，采用公式：，其中，是第i个数据项与前一项数据的时间间隔，是第i个数据项的时间戳，是第i-1个数据项的时间戳；For calculation of each data item and its timestamp, the formula is: ,in, is the time interval between the ith data item and the previous data item, is the timestamp of the ith data item, is the timestamp of the i-1th data item;

若和，则分钟。通常通过同步服务器时间或采用网络时间协议（NTP）来获取。通过对连续票据输入时间间隔的统计分析，可以发现输入时间的规律性或异常性。例如，如果某一时间段内的明显低于其他时间段，可能表示该时段有批量处理票据的行为。结果表明通过时间间隔的计算与分析，可以有效地监测并识别数据输入的初步异常模式。like and ,but Minutes. Usually obtained by synchronizing the server time or using the Network Time Protocol (NTP). By statistically analyzing the time intervals between consecutive ticket inputs, the regularity or abnormality of the input time can be found. For example, if It is significantly lower than other time periods, which may indicate that there is batch processing of bills during this period. The results show that the calculation and analysis of time intervals can effectively monitor and identify preliminary abnormal patterns of data input.

基于数据输入模式分析结果，关联每个数据项的哈希值、时间戳与唯一标识符，并存入数据表，建立哈希表与数据表的双向索引关系，生成数据完整性标识；Based on the data input pattern analysis results, the hash value, timestamp and unique identifier of each data item are associated and stored in the data table, a bidirectional index relationship between the hash table and the data table is established, and a data integrity identifier is generated;

关联哈希值和时间戳与唯一标识符，关联步骤包括将每个数据项的哈希值与其对应的时间戳和唯一标识符进行匹配存储，确保在数据表中每一项数据的完整性和追踪性，关联后的数据通过数据库进行管理，使用如MySQL或Oracle等数据库软件，这些数据随后被用于生成整体的数据完整性标识，此标识是通过特定的算法，如Merkle树结构来生成，每个节点代表单个数据项的哈希值，通过逐层聚合最终形成根哈希值，根哈希值作为数据完整性的标识，它结合了全部数据项的哈希值，任何数据的更改都会导致根哈希值的变化，从而在数据访问或交换时提供验证机制。Associate hash values and timestamps with unique identifiers. The association step includes matching and storing the hash value of each data item with its corresponding timestamp and unique identifier to ensure the integrity and traceability of each data item in the data table. The associated data is managed through a database using database software such as MySQL or Oracle. These data are then used to generate an overall data integrity identifier, which is generated through a specific algorithm such as a Merkle tree structure. Each node represents the hash value of a single data item, which is aggregated layer by layer to ultimately form a root hash value. The root hash value serves as an identifier of data integrity, and it combines the hash values of all data items. Any change in data will result in a change in the root hash value, thereby providing a verification mechanism when data is accessed or exchanged.

请参阅图3，计算每个字符在传输过程中丢失的概率值的获取步骤具体为：Please refer to FIG3 , the specific steps for calculating the probability value of each character being lost during transmission are:

根据数据完整性标识，对传输过程中的字符频率和位置信息进行分析，确定数据中容易发生错误的位置，并插入冗余数据，生成传输错误的实时监测信息；According to the data integrity mark, the character frequency and position information in the transmission process are analyzed to determine the position in the data where errors are prone to occur, and redundant data is inserted to generate real-time monitoring information of transmission errors;

首先需要对传输过程中的字符频率和位置信息进行分析，字符频率的获取过程是通过对传输的数据流进行逐字符扫描并统计字符出现次数的方式实现，通过分析票据信息中的各类字符，系统能够实时跟踪每个字符在传输过程中的出现频率和分布，同时，字符的位置信息则通过为每个字符打上唯一的序列号实现，序列号的生成是基于数据包传输的顺序，通过附加到每个字符后的方式记录字符的位置，通过插入冗余数据的操作，系统能够根据频率和位置信息自动识别传输中最容易发生错误的位置，并在这些位置插入额外的冗余数据，通过实时监测传输过程中的错误情况，系统对每个传输字符进行校验，确保在数据包到达接收端后能准确识别和处理未丢失的字符数据。First, it is necessary to analyze the frequency and position information of characters in the transmission process. The character frequency is obtained by scanning the transmitted data stream character by character and counting the number of times the characters appear. By analyzing the various characters in the bill information, the system can track the frequency and distribution of each character in the transmission process in real time. At the same time, the position information of the characters is achieved by marking each character with a unique serial number. The generation of the serial number is based on the order of data packet transmission, and the position of the character is recorded by appending it to each character. By inserting redundant data, the system can automatically identify the positions in the transmission that are most prone to errors based on the frequency and position information, and insert additional redundant data at these positions. By real-time monitoring of errors in the transmission process, the system checks each transmitted character to ensure that the character data that is not lost can be accurately identified and processed after the data packet arrives at the receiving end.

基于传输错误的实时监测信息，记录每个字符的传输错误次数、延迟次数及其出现频率，采用公式：Based on the real-time monitoring information of transmission errors, the number of transmission errors, delays and their frequency of occurrence for each character are recorded, using the formula:

， ,

其中，表示字符i在传输过程中丢失的可能性大小，是第i个字符的传输错误次数，通过接收端的校验机制获取，是第i个字符的传输延迟次数，通过实时监测传输延迟日志获取，是第i个字符的出现频率，为字符i在数据流中出现的次数除以总的字符数，表示字符i在整个传输过程中被传输的总次数，通过接收端的传输统计获得，是权重系数，分别用于调整传输错误次数、延迟次数和字符出现频率在丢失概率中的影响比例，权重设置依据实际数据传输过程中的分析，通过对历史数据的分析来确定其合理区间，以确保系统的灵活性和准确性。in, Indicates the probability that character i is lost during transmission. is the number of transmission errors of the i-th character, obtained through the verification mechanism at the receiving end, is the number of transmission delays of the i-th character, obtained by real-time monitoring of the transmission delay log. is the frequency of occurrence of the i-th character, which is the number of times character i appears in the data stream divided by the total number of characters, Indicates the total number of times character i is transmitted during the entire transmission process, obtained through the transmission statistics of the receiving end. Are weight coefficients, which are used to adjust the proportion of the number of transmission errors, number of delays and frequency of character occurrence in the loss probability. The weight setting is based on the analysis of the actual data transmission process. The reasonable range is determined by analyzing historical data to ensure the flexibility and accuracy of the system.

若字符A出现100次传输错误，延迟50次，且传输1000次，字符A的频率为，设置权重系数为、、，则字符A的丢失概率为：If character A has 100 transmission errors, is delayed 50 times, and is transmitted 1000 times, the frequency of character A is for , set the weight coefficient to , , , then the probability of losing character A is:

， ,

结果表明字符A的丢失概率为0.08，表明在传输1000次中约有8次可能发生丢失，最终计算的结果可以帮助识别出字符A的传输风险，并基于此概率调整系统中的冗余数据插入策略，以降低字符A丢失的可能性，保证票据数据的完整性。The results show that the loss probability of character A is 0.08, indicating that it may be lost about 8 times out of 1000 transmissions. The final calculated result can help identify the transmission risk of character A and adjust the redundant data insertion strategy in the system based on this probability to reduce the possibility of character A loss and ensure the integrity of the bill data.

基于丢失概率分析结果，统计传输数据包中错误次数和字符出现位置，并结合丢失概率分析传输中错误的分布情况，生成传输错误信息分析结果；Based on the loss probability analysis results, the number of errors and character positions in the transmitted data packets are counted, and the distribution of errors in the transmission is analyzed in combination with the loss probability to generate the transmission error information analysis results;

通过对数据包传输日志进行分析，系统首先记录每个数据包的唯一序列号以及其传输时间戳，统计传输错误的次数是通过校验接收端反馈的字符数据是否与原始数据匹配，若发现不一致的字符，系统则将其标记为错误字符，并记录该字符的出现位置和传输时段，通过该方式，系统能够定位每个字符错误的确切位置和次数，此外，系统还会分析传输过程中错误发生的频率和分布情况，并结合前述丢失概率的计算结果，进一步对传输错误的区域和时间段进行聚合统计，通过这些聚合分析，识别出传输中错误频繁出现的区域和时段，并调整冗余数据的插入策略，确保在未来的传输中降低这些区域和时段的错误率。By analyzing the data packet transmission log, the system first records the unique serial number of each data packet and its transmission timestamp. The number of transmission errors is counted by checking whether the character data fed back by the receiving end matches the original data. If inconsistent characters are found, the system will mark them as erroneous characters and record the occurrence position and transmission period of the character. In this way, the system can locate the exact position and number of each character error. In addition, the system will analyze the frequency and distribution of errors during the transmission process, and combined with the calculation results of the aforementioned loss probability, further aggregate statistics on the areas and time periods of transmission errors. Through these aggregate analyses, the areas and time periods where errors frequently occur during transmission are identified, and the insertion strategy of redundant data is adjusted to ensure that the error rate of these areas and time periods is reduced in future transmissions.

请参阅图4，分析错误分布的波动范围的获取步骤具体为：Please refer to FIG4 , the steps for obtaining the fluctuation range of the analysis error distribution are specifically as follows:

基于传输错误信息分析结果，对每个字符的传输特征进行分析，采用公式：Based on the results of the transmission error information analysis, the transmission characteristics of each character are analyzed using the formula:

， ,

其中，用于衡量各字符在传输过程中错误分布的不均匀性，高值意味着传输错误分布不均匀，低值表示错误分布均匀，表示字符在传输过程中从发送到接收端的时间延迟，通过网络延迟日志获取，延迟时间较长可能表明传输路径或网络存在问题，表示由于传输错误导致字符被重新发送的次数，通过接收端的反馈记录获取，重传次数较高通常意味着网络不稳定或存在较多传输错误，是全部字符的平均丢失概率，通过公式：计算，是字符总数，表示传输过程中出现的不同字符的总数量，由传输网络记录。in, It is used to measure the uneven distribution of errors of each character during transmission. A high value means that the transmission error distribution is uneven, and a low value means that the error distribution is uniform. Representing characters The time delay from sending to receiving end during the transmission process is obtained through the network delay log. A long delay time may indicate a problem with the transmission path or network. Indicates that the character The number of retransmissions is obtained through the feedback records of the receiving end. A high number of retransmissions usually means that the network is unstable or there are many transmission errors. is the average loss probability of all characters, through the formula: calculate, It is the total number of characters, which indicates the total number of different characters that appear during transmission, recorded by the transmission network.

若数据流中传输4种字符，其丢失概率为，，，，传输延迟为，，，，重传次数为，，，，则平均丢失概率为：If 4 types of characters are transmitted in the data stream, the probability of loss is , , , , the transmission delay is , , , , the number of retransmissions is , , , , then the average loss probability is:

， ,

带入公式：Plug into the formula:

， ,

结果表明错误分布的波动范围为13.6377。波动范围较大，表明某些字符在传输中存在较大错误、延迟或重传次数，而其他字符较为正常。此结果用于判断系统是否需要在特定区域调整冗余数据的插入策略。一般来说，如果波动范围超过10，系统会调整高误差区域的冗余数据分布，以减少传输错误。The results show that the fluctuation range of the error distribution is 13.6377. A large fluctuation range indicates that some characters have large errors, delays, or retransmission times in transmission, while other characters are relatively normal. This result is used to determine whether the system needs to adjust the insertion strategy of redundant data in a specific area. Generally speaking, if the fluctuation range exceeds 10, the system will adjust the redundant data distribution in the high error area to reduce transmission errors.

基于波动范围信息，重新调整冗余数据的插入，动态监控数据流的变化，并进行调整，生成动态纠错数据流；Based on the fluctuation range information, the insertion of redundant data is readjusted, the changes of data stream are dynamically monitored, and adjustments are made to generate dynamic error correction data stream;

通过分析每个字符在传输过程中的丢失概率、延迟和重传次数，这些数据通过接收端反馈的错误报告和延迟日志获取，系统在分析这些数据后，对每个字符的传输特征进行综合计算，特别是对于波动较大的字符，系统会重点考虑这些字符的传输路径、延迟时间和重传次数，通过分析字符的分布和传输行为，系统能够识别出哪些区域的字符传输存在异常波动，接着，通过在这些高波动区域中插入更多的冗余数据，系统确保传输中的数据完整性，包括动态监控和调整数据流，冗余数据的调整是基于每个字符的错误概率计算结果，将更多冗余数据分配给波动较大的字符，同时减少在传输稳定的区域中的冗余数据分配，系统通过这种调整策略生成动态纠错数据流，保证了数据在不同传输环境下的稳定性和准确性。By analyzing the loss probability, delay and number of retransmissions of each character during the transmission process, these data are obtained through error reports and delay logs fed back by the receiving end. After analyzing these data, the system comprehensively calculates the transmission characteristics of each character, especially for characters with large fluctuations. The system will focus on the transmission path, delay time and number of retransmissions of these characters. By analyzing the distribution and transmission behavior of the characters, the system can identify which areas have abnormal fluctuations in character transmission. Then, by inserting more redundant data in these high-fluctuation areas, the system ensures the integrity of the data in transmission, including dynamic monitoring and adjustment of data streams. The adjustment of redundant data is based on the error probability calculation results of each character, allocating more redundant data to characters with large fluctuations, and reducing the allocation of redundant data in areas with stable transmission. The system generates a dynamic error correction data stream through this adjustment strategy, ensuring the stability and accuracy of data in different transmission environments.

请参阅图5，比对预期和当前校验结果的获取步骤具体为：Please refer to FIG5 , the specific steps for obtaining the comparison between the expected and current verification results are as follows:

基于动态纠错数据流，生成随机数序列，确定待审计的数据块位置，得到待审计的数据块位置信息；Based on the dynamic error correction data stream, a random number sequence is generated to determine the location of the data block to be audited, and the location information of the data block to be audited is obtained;

系统的随机数生成方式依赖于数据流的传输顺序和时间戳信息，通过对这些信息的提取与处理，生成的数据序列能够对应到数据流中的特定数据块，确保审计过程中的随机性，同时避免重复审计已经检查过的数据块，具体操作包括提取数据流中的每一个数据块信息，利用生成的随机数序列将这些数据块映射到存储单元中的位置，构建审计数据块列表，确保审计过程的广泛覆盖性与数据块的随机分布，从而确定了待审计的数据块位置。The system's random number generation method relies on the transmission order and timestamp information of the data stream. By extracting and processing this information, the generated data sequence can correspond to a specific data block in the data stream, ensuring randomness during the audit process while avoiding repeated audits of data blocks that have already been checked. The specific operations include extracting information about each data block in the data stream, mapping these data blocks to locations in the storage unit using the generated random number sequence, and constructing an audit data block list to ensure extensive coverage of the audit process and random distribution of data blocks, thereby determining the location of the data blocks to be audited.

基于待审计的数据块位置信息，比对接收端和发送端的校验标识，对每个数据块的校验标识进行重新验证，得到校验标识验证结果；Based on the location information of the data block to be audited, the verification identifiers of the receiving end and the sending end are compared, and the verification identifier of each data block is re-verified to obtain the verification identifier verification result;

在确定待审计的数据块后，接下来对每个数据块的校验标识进行重新验证，具体过程包括对比接收端的校验标识与发送端原始记录的校验标识，系统通过提取传输日志中记录的数据块校验值，调用相应的校验算法如SHA-256重新计算数据块的哈希值，以确保传输过程中数据的完整性，系统针对每个数据块逐个计算校验标识，通过对比校验值来验证数据块是否在传输过程中遭到篡改或丢失，确保每个被审计的数据块校验标识的完整性与准确性，系统通过比对原始数据块的校验标识与当前数据块的校验标识，得到校验结果，确保重新验证过程中全部数据块的校验标识无误。After determining the data block to be audited, the checksum of each data block is re-verified. The specific process includes comparing the checksum of the receiving end with the checksum of the original record of the sending end. The system extracts the checksum value of the data block recorded in the transmission log and calls the corresponding checksum algorithm such as SHA-256 to recalculate the hash value of the data block to ensure the integrity of the data during transmission. The system calculates the checksum for each data block one by one, and verifies whether the data block has been tampered with or lost during transmission by comparing the checksum value, ensuring the integrity and accuracy of the checksum of each audited data block. The system obtains the checksum result by comparing the checksum of the original data block with the checksum of the current data block, ensuring that the checksums of all data blocks are correct during the re-verification process.

基于校验标识验证结果，采用公式：Based on the verification result of the verification mark, the formula is used:

， ,

其中，表示需要进一步审查的数据块比例，范围为0到1，用于衡量数据块的校验不一致情况，表示经过校验发现不一致的数据块个数，获取方式为通过比对数据块的哈希值，是审计频率，通过设定审计策略获取，频率较高时可以更早发现问题，反映审计的密集程度，是传输延迟，表示数据在传输过程中因网络或系统原因产生的延迟，延迟时间通过监控系统采集，延迟越大可能增加校验不一致的风险，是数据块大小，表示数据块的大小差异，较大的数据块更容易在传输过程中出现校验不一致，获取方式为数据块的实际大小，是总审计数据块数量，表示此次审计中选取的全部数据块的数量。in, Indicates the proportion of data blocks that need further review, ranging from 0 to 1, used to measure the inconsistency of the checksum of the data block. Indicates the number of inconsistent data blocks found after verification. The method of obtaining it is to compare the hash value of the data block. It is the audit frequency, which is obtained by setting the audit strategy. When the frequency is higher, problems can be found earlier, which reflects the intensity of the audit. It is the transmission delay, which means the delay caused by network or system reasons during data transmission. The delay time is collected by the monitoring system. The greater the delay, the greater the risk of verification inconsistency. It is the data block size, indicating the size difference of the data block. Larger data blocks are more likely to have inconsistent verification during transmission. The acquisition method is the actual size of the data block. It is the total number of audit data blocks, indicating the number of all data blocks selected in this audit.

若审计过程中选取了100个数据块，其中有5个数据块校验不一致，审计频率为，传输延迟为秒，数据块大小差异为，则计算公式为：If 100 data blocks are selected during the audit process, and 5 of them have inconsistent checksums, the audit frequency is , the transmission delay is seconds, the data block size difference is , the calculation formula is:

， ,

结果表明校验不一致比例为0.08，表明在100个数据块中，有8%的数据块校验结果不一致，系统需要进一步分析这些数据块，以确定传输过程中可能存在的错误或篡改情况。The results show that the verification inconsistency ratio is 0.08, indicating that among the 100 data blocks, 8% of the data blocks have inconsistent verification results. The system needs to further analyze these data blocks to determine possible errors or tampering during the transmission process.

请参阅图6，统计校验差异的出现次数的获取步骤具体为：Please refer to FIG6 , the specific steps of obtaining the number of occurrences of statistical verification differences are:

基于数据块校验的差异比例信息，采用公式：Based on the difference ratio information of data block verification, the formula is:

， ,

其中，表示系统在审计过程中统计到的所有校验不一致的总次数，这个结果能够帮助系统准确衡量数据块之间的差异，识别传输过程中可能存在的错误或篡改情况，表示整个审计过程中的数据块数量，系统在进行审计时，会从所有数据中随机生成审计数据块，提取并统计此数量，是第个数据块的校验次数，校验次数通过系统对每个数据块进行的多次校验记录得出，是第个数据块的差异次数，表示第个数据块在校验过程中检测到的差异次数，差异是通过比对预期校验标识和实际校验标识不一致的情况统计得出，表示和两个系列的数据块的数量。in, Indicates the total number of inconsistencies in all checks counted by the system during the audit process. This result can help the system accurately measure the differences between data blocks and identify possible errors or tampering during the transmission process. Indicates the number of data blocks in the entire audit process. When the system is auditing, it will randomly generate audit data blocks from all data, extract and count this number. It is The number of times the system verifies each data block is obtained by recording multiple verifications of each data block. It is The number of differences between data blocks indicates the The number of differences detected during the verification process for each data block. The differences are calculated by comparing the expected verification mark with the actual verification mark. express and The number of data blocks for both series.

设置参数，审计了个数据块，并且每个数据块的校验和差异次数为、、、，则计算总的校验差异次数如下：Setting parameters , audited data blocks, and the number of checksum differences for each data block is , , , , then calculate the total number of verification differences as follows:

， ,

结果表明系统中的总校验差异次数为14，表示在100个数据块中，共出现了14次校验差异。这些统计结果为系统提供了深入分析差异数据块的依据，从而进一步确认这些差异是否源自篡改或传输错误，并采取相应的补救措施。The results show that the total number of checksum differences in the system is 14, which means that there are 14 checksum differences in 100 data blocks. These statistical results provide the system with a basis for in-depth analysis of the difference data blocks, so as to further confirm whether these differences are caused by tampering or transmission errors, and take corresponding remedial measures.

基于整合后的校验差异分析结果，比较原始校验标识与传输后的校验标识，逐步分析每个数据块的差异情况，生成数据完整性分析结果；Based on the integrated verification difference analysis results, the original verification mark and the verification mark after transmission are compared, and the difference of each data block is gradually analyzed to generate the data integrity analysis results;

首先从传输日志中提取所有涉及校验差异的数据块，并记录每个数据块的相关信息，包括差异出现的时间、传输路径、数据块的大小和位置等，这些数据通过传输过程中生成的日志文件或实时监控系统获取，系统依次调用每个数据块的校验记录，比较差异数据块与原始数据块的校验标识，通过对比每个数据块的校验结果，系统可以识别出哪些数据块在传输中发生了不一致的情况，接着，对这些差异数据块进行进一步分析，通过统计差异的出现频率，结合传输路径和时间段，系统能够确定这些差异是否具有规律性或集中出现在某些传输条件下，此外，系统还会将每个差异数据块的相关信息进行整合和关联，并将统计结果与其他审计数据结合，确保数据传输的完整性，经过该操作，生成数据完整性分析结果，并对差异数据块做出进一步的处理建议。First, all data blocks involving verification differences are extracted from the transmission log, and the relevant information of each data block is recorded, including the time when the difference occurred, the transmission path, the size and location of the data block, etc. These data are obtained through the log file generated during the transmission process or the real-time monitoring system. The system calls the verification record of each data block in turn, and compares the verification mark of the difference data block with the original data block. By comparing the verification results of each data block, the system can identify which data blocks have inconsistent conditions during transmission. Then, these difference data blocks are further analyzed. By counting the frequency of occurrence of differences, combined with the transmission path and time period, the system can determine whether these differences are regular or appear in a concentrated manner under certain transmission conditions. In addition, the system will integrate and associate the relevant information of each difference data block, and combine the statistical results with other audit data to ensure the integrity of data transmission. After this operation, the data integrity analysis results are generated, and further processing suggestions are made for the difference data blocks.

请参阅图7，更新哈希值和索引信息的获取步骤具体为：Please refer to FIG. 7 , the specific steps for obtaining updated hash values and index information are as follows:

基于数据完整性分析的结果，重新分配每个数据块的哈希值并与原始值对比，识别哈希值不一致的数据块，并将其标记为疑似篡改，生成疑似篡改的数据块列表；Based on the results of data integrity analysis, the hash value of each data block is reallocated and compared with the original value, data blocks with inconsistent hash values are identified and marked as suspected tampering, and a list of suspected tampered data blocks is generated;

通过哈希算法重新生成每个数据块的哈希值，并与原始值进行比对，系统通过这一对比过程来识别哈希值不一致的数据块，这些数据块随后被标记为篡改嫌疑对象，哈希计算和比对过程在系统的审计日志中详细记录，以便于事后追踪和分析，通过这些步骤，能够成功标记所有疑似被篡改的数据块。The hash value of each data block is regenerated through the hash algorithm and compared with the original value. The system uses this comparison process to identify data blocks with inconsistent hash values. These data blocks are then marked as suspected tampering objects. The hash calculation and comparison process is recorded in detail in the system's audit log for subsequent tracking and analysis. Through these steps, all data blocks suspected of being tampered with can be successfully marked.

基于疑似篡改的数据块列表，采用公式：Based on the list of suspected tampered data blocks, the formula is used:

， ,

其中，用于衡量疑似篡改数据块的比例，是被标记为疑似篡改的数据块数量，是总的数据块数量。in, Used to measure the proportion of suspected tampered data blocks, is the number of data blocks marked as suspected of tampering, is the total number of data blocks.

若，，，则：like , , ,but:

， ,

结果表明数据篡改的严重程度为1%，表明在所有审查的数据块中，存在1%的数据块被检测为疑似篡改，是一个关键的度量，用于决策制定。通常，任何大于0.5%的值都应被视为引起关注的指标，因为可能表明系统的安全控制措施存在漏洞或已被绕过。The results show that the severity of data tampering is 1%, indicating that among all the data blocks reviewed, 1% of the data blocks are detected as suspected tampering, which is a key metric for decision making. Generally, any data tampering greater than 0.5% is detected as suspected tampering. Values above 0.001 should be considered an indicator of concern as they may indicate that the system's security controls are vulnerable or have been bypassed.

基于数据篡改的严重程度指标，限制疑似篡改数据块的访问权限，记录每个数据块的唯一标识符和时间戳，并更新哈希值和索引信息，生成安全处理结果；Based on the severity index of data tampering, restrict access rights to suspected tampered data blocks, record the unique identifier and timestamp of each data block, and update the hash value and index information to generate a security processing result;

在数据块被标记为疑似篡改后，系统进一步限制这些数据块的访问权限，采用设置访问控制列表（ACL）的方法禁止非授权访问，记录每个数据块的唯一标识符和时间戳，确保可以准确追踪到每个数据块的来源和修改时间，此外，系统还更新被标记数据块的哈希值和索引信息，增强了对数据完整性的保护，所有这些操作都记录在系统日志中，以供后续安全审核和风险评估使用。After a data block is marked as suspected of tampering, the system further restricts access rights to the data block by setting an access control list (ACL) to prohibit unauthorized access and records the unique identifier and timestamp of each data block to ensure that the source and modification time of each data block can be accurately tracked. In addition, the system also updates the hash value and index information of the marked data block to enhance the protection of data integrity. All these operations are recorded in the system log for subsequent security audits and risk assessments.

一种票据数据完整性验证方法，票据数据完整性验证方法基于上述票据数据完整性验证系统执行，包括以下步骤：A method for verifying the integrity of bill data, which is performed based on the above-mentioned bill data integrity verification system, comprises the following steps:

S2：根据每个组合的唯一哈希值，将哈希值与对应的时间戳和唯一标识符关联，形成数据完整性标识；S2: Based on the unique hash value of each combination, the hash value is associated with the corresponding timestamp and unique identifier to form a data integrity identifier;

S3：基于数据完整性标识，在票据信息中添加冗余数据，同时实时监控数据流，记录并分析数据传输中的错误，以及每个字符的丢失概率，生成动态纠错数据流；S3: Based on the data integrity identifier, redundant data is added to the ticket information, and the data stream is monitored in real time, recording and analyzing the errors in data transmission and the loss probability of each character, and generating a dynamic error correction data stream;

S4：基于动态纠错数据流，进行随机数序列的生成，用于选择需要重新验证的数据块，对选中的数据块执行校验标识的核对，比对校验结果的预期与当前表现，形成数据完整性分析结果；S4: Based on the dynamic error correction data stream, a random number sequence is generated to select the data block that needs to be re-verified, the verification mark is checked on the selected data block, and the expected and current performance of the verification result are compared to form a data integrity analysis result;

S5：基于数据完整性分析结果，识别并标记具有哈希值差异的数据块，对这些数据块执行访问限制，更新哈希值和索引信息，生成安全处理结果。S5: Based on the data integrity analysis results, identify and mark data blocks with hash value differences, perform access restrictions on these data blocks, update hash values and index information, and generate security processing results.

以上，仅是本发明的较佳实施例而已，并非对本发明作其他形式的限制，任何熟悉本专业的技术人员可能利用上述揭示的技术内容加以变更或改型为等同变化的等效实施例应用于其他领域，但是凡是未脱离本发明技术方案内容，依据本发明的技术实质对以上实施例所做的任何简单修改、等同变化与改型，仍属于本发明技术方案的保护范围。The above are only preferred embodiments of the present invention and are not intended to limit the present invention in other forms. Any technician familiar with the profession may use the technical contents disclosed above to change or modify them into equivalent embodiments with equivalent changes and apply them to other fields. However, any simple modification, equivalent change and modification made to the above embodiments based on the technical essence of the present invention without departing from the technical solution of the present invention still falls within the protection scope of the technical solution of the present invention.

Claims

1. A ticket data integrity verification system, the system comprising:

The data hash processing module constructs hash values of data contents based on the acquired bill data, calculates the time interval between each data item and the time stamp thereof, associates the hash values, the time stamp and the unique identifier, and generates a data integrity identifier;

The prediction error correction coding module inserts redundant data into bill information based on the data integrity mark, monitors transmission errors in the data stream in real time, calculates a probability value of each character lost in the transmission process, analyzes the uniformity degree and fluctuation range of error distribution, adjusts the redundant data and generates a dynamic error correction data stream;

The random audit verification module generates a random number sequence according to the dynamic error correction data stream, re-verifies the verification identification of the data block, compares the expected and current verification results, calculates the proportion of the data block with inconsistent verification results and the total data block, counts the occurrence times of verification differences and information of the difference data block, and forms a data integrity analysis result;

And marking the data blocks with hash value differences as suspected tampering by the tamper response processing module according to the data integrity analysis result, limiting access rights, updating the hash value and index information, and generating a security processing result.

2. The ticket data integrity verification system of claim 1, wherein the step of obtaining the data integrity identifier comprises:

based on the acquired bill data, generating a data hash value by reading characters and numbers on the bill and classifying and sorting the extracted data content, wherein the data hash value comprises a date, an amount of money, a plurality of key information of a payee and a unique identifier of each bill;

Based on the data hash value, calculating each data item and a time stamp thereof, identifying a time mode of data input, and monitoring an abnormal mode of data input to obtain a data input mode analysis result;

Based on the data input mode analysis result, associating the hash value, the time stamp and the unique identifier of each data item, storing the data items into a data table, establishing a bidirectional index relation between the hash table and the data table, and generating a data integrity identifier.

3. The ticket data integrity verification system of claim 2, wherein the step of calculating a probability value for each character lost during transmission comprises:

analyzing character frequency and position information in the transmission process according to the data integrity identification, determining the position where errors easily occur in the data, inserting redundant data, and generating real-time monitoring information of the transmission errors;

Based on the real-time monitoring information of the transmission errors, recording the transmission error times, delay times and occurrence frequency of each character, and adopting the formula:

，

Calculating the loss probability of the ith character Obtaining a loss probability analysis result;

Wherein, Is the number of transmission errors for the i-th character,Is the number of transmission delays of the ith character,Is the frequency of occurrence of the i-th character,Representing the total number of times character i is transmitted throughout the transmission,The weight coefficient is used for adjusting the influence proportion of the transmission error times, the delay times and the character occurrence frequency in the loss probability respectively;

Based on the loss probability analysis result, counting the number of errors and the occurrence positions of characters in the transmission data packet, and generating a transmission error information analysis result by combining the distribution situation of errors in the loss probability analysis transmission.

4. A ticket data integrity verification system as claimed in claim 3 wherein said step of obtaining a fluctuation range of said analysis error distribution is specifically:

based on the analysis result of the transmission error information, the transmission characteristics of each character are analyzed, and the formula is adopted:

，

Calculating fluctuation range of error distribution Generating fluctuation range information;

Wherein, Representing charactersThe time delay from the sending to the receiving end during transmission,Representing characters due to transmission errorsThe number of times that it is retransmitted,Is the average probability of loss of all the characters,Is the total number of characters, representing the total number of characters that occur during transmission;

And readjusting the insertion of redundant data based on the fluctuation range information, dynamically monitoring the change of the data stream, and adjusting to generate a dynamic error correction data stream.

5. The ticket data integrity verification system of claim 4 wherein said step of comparing expected and current verification results comprises the steps of:

generating a random number sequence based on the dynamic error correction data stream, and determining the position of the data block to be audited to obtain the position information of the data block to be audited;

Based on the position information of the data block to be audited, comparing the verification identifications of the receiving end and the sending end, and re-verifying the verification identifications of each data block to obtain a verification result of the verification identifications;

based on the verification result of the verification mark, adopting the formula:

，

Calculating data block verification inconsistency ratio Generating difference proportion information of data block verification;

Wherein, Indicating the number of data blocks found to be inconsistent by the verification,Is the frequency of the audit,Is the transmission delay of the packet,Is the size of the data block and,Is the total audit data block number.

6. The ticket data integrity verification system of claim 5, wherein the step of obtaining the number of occurrences of the statistical verification difference comprises:

based on the difference proportion information of the data block verification, adopting the formula:

，

calculating the total check difference times Generating an integrated verification difference analysis result;

Wherein, Representing the number of data blocks throughout the audit process,Is the firstThe number of checks of the individual data blocks,Is the firstThe number of differences of the data blocks indicates the firstThe number of differences detected by the data blocks during the verification process,Representation ofAndThe number of data blocks of the two series;

and based on the integrated check difference analysis result, comparing the original check mark with the transmitted check mark, and gradually analyzing the difference condition of each data block to generate a data integrity analysis result.

7. The ticket data integrity verification system of claim 6, wherein the step of obtaining updated hash value and index information comprises:

Based on the result of the data integrity analysis, reassigning the hash value of each data block and comparing the hash value with the original value, identifying the data block with inconsistent hash value, marking the data block as suspected tampering, and generating a suspected tampered data block list;

based on the suspected tampered data block list, adopting the formula:

，

calculating quantization severity of data tampering Generating a severity index of data tampering;

Wherein, For measuring the proportion of suspected tampered data blocks,Is the number of data blocks marked as suspected tampering,Is the total number of data blocks;

And limiting the access authority of suspected tampered data blocks based on the severity index of the data tampering, recording the unique identifier and the time stamp of each data block, updating the hash value and the index information, and generating a security processing result.

8. A ticket data integrity verification method, characterized in that the ticket data integrity verification system according to any one of claims 1-7 is executed, comprising the steps of:

Based on the acquired bill data, carrying out content reading and time stamp association on each data item, inputting the combination of the content of each data item and the time stamp into hash processing, and generating a unique hash value of each combination;

according to the unique hash value of each combination, associating the hash value with the corresponding time stamp and the unique identifier to form a data integrity identifier;

based on the data integrity mark, adding redundant data in bill information, simultaneously monitoring data flow in real time, recording and analyzing errors in data transmission and loss probability of each character, and generating dynamic error correction data flow;

based on the dynamic error correction data stream, generating a random number sequence, which is used for selecting a data block to be re-verified, checking a check mark on the selected data block, and comparing the expected and current performances of the check result to form a data integrity analysis result;

Based on the data integrity analysis result, identifying and marking data blocks with hash value differences, performing access restriction on the data blocks, updating hash values and index information, and generating a security processing result.