CN1949761B - A data synchronization method and its differential encoding method

Info

Publication number: CN1949761B
Application number: CN200510100339XA
Authority: CN
Inventors: 黄斌强
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Shenzhen Tencent Computer Systems Co Ltd
Priority date: 2005-10-13
Filing date: 2005-10-13
Publication date: 2010-09-15
Anticipated expiration: 2025-10-13
Also published as: CN1949761A

Abstract

The invention relates to a data synchronization method for synchronizing data between a data synchronization center and a synchronization terminal. The method comprises the steps of: the data synchronization center and the synchronization terminal negotiate to start differential encoding synchronization; the data synchronization center encodes the differences in the data file since the last synchronization; the data synchronization center sends the code string and a newly generated latest synchronization serial number to the synchronization terminal; after receiving the synchronization information sent by the data synchronization center, the synchronization terminal decodes the code string and updates the data file obtained in the last synchronization according to the decoding result; when the data synchronization is completed, the synchronization terminal saves the latest synchronization serial number. Once the data synchronization center and the synchronization terminal have started the synchronization process through negotiation, no further negotiation is required during synchronization, so the time overhead of negotiation is reduced; the synchronization terminal also does not need to perform checksum operations, which improves efficiency.

Description

A data synchronization method and its differential encoding method

Technical field

The invention relates to electronic data processing technology, and in particular to a data synchronization method and a differential encoding method thereof.

Background art

With the development of information technology, electronic data processing systems and networks are used ever more widely. In many Internet value-added services, many service and application servers need to be aware of users' status information in order to improve communication between users and enhance the user experience. Other fields, such as e-commerce search, also need to be aware of a user's online status, for example to provide search filtering.

To process user status information efficiently, the users' status data must be synchronized from the status data center (also called the data synchronization center) to each service server that accepts the user status data (also called the synchronization terminal).

It is therefore necessary to design and implement a data synchronization scheme that efficiently synchronizes this frequently updated status data to each service server without increasing the load on the service servers.

Several data synchronization methods exist in the prior art. At present, the more general and efficient schemes are typically based on a checksum algorithm applied to data blocks at the synchronization center and the synchronization terminal, as in the popular remote data synchronization tool Rsync; these are incremental synchronization methods.

A checksum-based data synchronization method works as follows. For each file in the data directory to be synchronized, the synchronization center generates a checksum (Checksum) for each fixed-size data block (Chunk) of the file; the checksum is data that uniquely identifies a data block in the file. The checksum is then sent to the synchronization terminal. On receiving it, the synchronization terminal first generates a checksum Checksum' for the corresponding data block of its own copy of the file and compares Checksum' with the Checksum sent by the synchronization center. If the checksums match, the data in the block is identical, and the terminal notifies the synchronization center that no synchronization is needed. If they do not match, the synchronization terminal sends a synchronization request to the synchronization center, and the synchronization center sends the data to the synchronization terminal. The checksums are compared, and the blocks synchronized, one block at a time until the entire file has been processed.

Please refer to FIG. 1, which is a flowchart of data synchronization using a prior-art data synchronization method.

The synchronization center generates a checksum Checksum1 for data block Chunk1(0, 1000) of the file and sends the message Check_update(filename, Chunk1, Checksum1, …) to the synchronization terminal. On receiving the message, the synchronization terminal uses the same algorithm to generate a checksum Checksum1' for data block Chunk1(0, 1000) of the file and then compares Checksum1 with Checksum1'. Because the two match, it sends the synchronization center a notification that no synchronization is needed, carrying the parameters (proto_version, chunkid, oper_type, …), where proto_version is the protocol version, chunkid is the data block identifier, and oper_type is the operation type (synchronize or no synchronization needed).

Next, the synchronization center generates a checksum Checksum2 for data block Chunk2(1000, 2000) of the file and sends the message Check_update(filename, Chunk2, Checksum2, …) to the synchronization terminal. On receiving the message, the synchronization terminal generates a checksum Checksum2' for data block Chunk2(1000, 2000) of the file and then compares Checksum2 with Checksum2'. Because the two do not match, it sends the synchronization center a synchronization request carrying the parameters (proto_version, chunkid, oper_type, …). The synchronization center sends the data synchronization message (filename, chunk2, data) to the synchronization terminal; the synchronization terminal receives the data, updates the corresponding chunk2, and then sends a synchronization success message to the synchronization center.

In Rsync, the checksums are generated with MD4: a list of checksums for the data blocks of the file is generated at the start, and the checksums are then checked one by one.
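
As a rough illustration of the block-by-block comparison described above, the following Python sketch computes per-chunk checksums on both sides and lists the chunks that would have to be resent. It is a simplified model under stated assumptions, not Rsync's actual implementation: SHA-256 stands in for MD4 (which Python's hashlib is not guaranteed to provide), the 1000-byte chunk size follows the Chunk1(0, 1000) example, and the function names are hypothetical.

```python
import hashlib
from typing import List

CHUNK_SIZE = 1000  # follows the Chunk1(0, 1000) example; an assumption


def chunk_checksums(data: bytes, chunk_size: int = CHUNK_SIZE) -> List[bytes]:
    """Return one checksum per fixed-size chunk (SHA-256 as a stand-in for MD4)."""
    return [
        hashlib.sha256(data[start:start + chunk_size]).digest()
        for start in range(0, len(data), chunk_size)
    ]


def chunks_to_resend(center: bytes, terminal: bytes,
                     chunk_size: int = CHUNK_SIZE) -> List[int]:
    """Return the ids of chunks whose checksums differ between the two copies."""
    center_sums = chunk_checksums(center, chunk_size)
    terminal_sums = chunk_checksums(terminal, chunk_size)
    return [
        chunk_id
        for chunk_id, checksum in enumerate(center_sums)
        if chunk_id >= len(terminal_sums) or checksum != terminal_sums[chunk_id]
    ]


# Flip a single bit inside chunk 1 of a 3000-byte file.
old_copy = bytes(3000)
new_copy = bytearray(old_copy)
new_copy[1500] ^= 0x01
print(chunks_to_resend(bytes(new_copy), old_copy))
```

In this model a one-bit change still marks the whole 1000-byte chunk for retransmission, and every chunk costs one checksum on each side.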

However, this prior art has drawbacks. First, both the data synchronization center and the synchronization terminal must compute a checksum and negotiate synchronization for every data block in the file. The number of negotiations depends on the block size: if the blocks are small, many negotiations are needed and the time they consume is large. Second, when the checksums at the two ends do not match, the synchronization center must send the entire data block, or the entire data block after compression, to the synchronization terminal even if only 1 bit in the block has changed, so the data compression coding rate is low.

It follows that if the data blocks are too small, the checksum computation and negotiation delay at both ends increase, and if the data blocks are too large, the data compression coding rate falls. It is therefore difficult to find a single optimal block size that suits all application scenarios.

Another method widely used in the prior art is full synchronization: after the synchronization center and the synchronization terminal negotiate, a data transmission channel is established and the entire data file to be synchronized is transmitted to the synchronization terminal, even if only a few bits of the data file have changed.

Full synchronization is a relatively simple mechanism. Only one negotiation is required, and there is no time overhead from repeated negotiation and checksum computation during synchronization. However, because the entire data file is always synchronized, the data compression coding rate is very low, which increases the server's network I/O and bandwidth consumption.

Summary of the invention

The technical problem solved by the present invention is to provide a data synchronization method and a differential encoding method thereof whose negotiation and checksum overhead is small and which can achieve a relatively high compression coding rate.

To this end, the technical solution of the present invention is to provide a data synchronization method for data synchronization between a data synchronization center and a synchronization terminal, the method comprising the steps of:

11) The data synchronization center and the synchronization terminal negotiate to start differential encoding synchronization; the data synchronization center reads the current data file and the data file as of the last synchronization several bytes at a time and XORs them; if the XOR result is zero, it keeps a running count of the data block offset; if the XOR result is not zero, it calculates the offset value and difference value of the current data block, compresses them, and appends them to the code string;

12) The data synchronization center sends the resulting code string and a newly generated latest synchronization serial number to the synchronization terminal;

13) After receiving the synchronization information sent by the data synchronization center, the synchronization terminal decodes the code string to obtain the offset value and difference value of the current data block, fetches the data block at that relative offset from the data file obtained in the last synchronization, XORs it with the difference value, and writes the result back to the data file obtained in the last synchronization, thereby obtaining the data file to be synchronized;

14) When the data synchronization is completed, the synchronization terminal saves the latest synchronization serial number.

Preferably, the negotiation in step 11) is as follows: the synchronization center and the synchronization terminal compare their latest synchronization serial numbers; if the serial numbers match, differential encoding synchronization is started; if they do not match, data is synchronized by full synchronization.

Preferably, step 11) is initiated periodically by the synchronization center, and the synchronization frequency is configurable.

Preferably, the number of bytes read in step 11) is configurable; the offset value in step 11) is the difference between the position of the current data block and the position of the previous data block in which a difference occurred; and the compression algorithm used in step 11) is the Vint compression algorithm, the zip compression algorithm, or a prefix compression algorithm for discrete binary strings.

Preferably, the decoding in step 13) comprises obtaining the offset value and difference value of the current data block; and the updating in step 13) comprises: obtaining the corresponding data block of the data file according to the offset value in the code string; XORing the difference value with that data block to obtain an updated data block; and writing the updated data block into the data file to obtain the synchronized data file.

In addition, the present invention provides a differential encoding method, the method comprising the steps of:

81) Reading the current data file and the data file as of the last synchronization several bytes at a time and XORing them;

82) If the XOR result is zero, keeping a running count of the data block offset;

83) If the XOR result is not zero, calculating the offset value and difference value of the current data block;

84) Combining the offset value and difference value of the current data block into an element pair, compressing the pair, and appending it to the code string.

Preferably, the number of bytes read in step 81) is configurable; the offset value in step 83) is the difference between the position of the current data block and the position of the previous data block in which a difference occurred; and the compression algorithm used in step 84) is the Vint compression algorithm, the zip compression algorithm, or a prefix compression algorithm for discrete binary strings.

The method further comprises the steps of:

101) Decoding the received code string to obtain the offset value and difference value of the current data block;

102) Obtaining the corresponding data block of the data file according to the offset value in the code string;

103) XORing the difference value with that data block to obtain an updated data block;

104) Writing the updated data block into the data file to obtain the synchronized data file.

Compared with the prior art, the present invention has the following beneficial effects. First, once the data synchronization center and the synchronization terminal have started the data synchronization process through negotiation, no further negotiation is required during synchronization, which reduces the time overhead of negotiation; the synchronization terminal also does not need to perform checksum operations, which improves efficiency. Second, because differential encoding is used to encode only the differences in the data file since the last synchronization, the data encoding rate is high, which reduces network I/O and bandwidth overhead.

In the preferred solution of the present invention, the data synchronization center and the synchronization terminal only need to negotiate the latest synchronization serial number once, so the time overhead of negotiation is small.

Furthermore, a compression algorithm is applied on top of the differential encoding, giving a high data compression coding rate and further reducing network I/O and bandwidth overhead.

Brief description of the drawings

FIG. 1 is a flowchart of data synchronization using a prior-art data synchronization method;

FIG. 2 is a flowchart of the data synchronization method of the present invention;

FIG. 3 is a schematic diagram of differential encoding in an embodiment of the data synchronization method of the present invention;

FIG. 4 is a schematic diagram of differential decoding in an embodiment of the data synchronization method of the present invention;

FIG. 5 is a flowchart of the operation of the differential encoder in the data synchronization method of the present invention;

FIG. 6 is a flowchart of the operation of the differential decoder in the data synchronization method of the present invention.

Detailed description

Please refer to FIG. 2, which is a flowchart of the data synchronization method of the present invention.

In step S211, the synchronization center and the synchronization terminal compare their latest synchronization serial numbers. If the serial numbers match, the method proceeds to step S221; if they do not match, it proceeds to step S231 and synchronizes the data by full synchronization.

In the present invention, this comparison of the latest synchronization serial numbers can be initiated periodically by the synchronization center, and the synchronization frequency is configurable.

For a given file, the synchronization center sends the message <file, last_sync_seq> to the synchronization terminal, where last_sync_seq is the latest synchronization serial number for the file. The synchronization terminal looks up the latest synchronization serial number it has stored locally for that file (the serial number of the last synchronization, whose data it holds) and compares the two. If they do not match, the synchronization terminal has not received the data most recently synchronized by the synchronization center, and it sends the synchronization center a request for full synchronization, <file, full_sync_type>. On receiving the request, the synchronization center sends the data file to be synchronized to the synchronization terminal over the established data channel, in the format <file, last_sync_seq, full_sync_type, code_string>.

In step S221, if the latest synchronization serial numbers match, compressed differential encoding synchronization is started between the synchronization terminal and the data synchronization center.

In step S222, the data synchronization center compresses and encodes the differences between the data file and its state at the last synchronization.

In step S223, the data synchronization center sends the compressed code string and a newly generated latest synchronization serial number to the synchronization terminal.

In step S224, after receiving the synchronization information sent by the data synchronization center, the synchronization terminal decodes the code string and updates the data file obtained in the last synchronization accordingly.

In step S225, when the data synchronization is completed, the synchronization terminal saves the latest synchronization serial number.

To aid further understanding, the present invention is described in detail below with reference to an embodiment.

First, the data synchronization center and the synchronization terminal compare their latest synchronization serial numbers. If the serial numbers do not match, data is synchronized by full synchronization. If they match, the synchronization terminal notifies the data synchronization center to perform compressed differential encoding synchronization; the notification may have the format <file, last_sync_seq, sync_type>.
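
The terminal's side of this negotiation can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the registry is modeled as a plain dictionary from file name to serial number, `handle_negotiation` is a hypothetical function name, and `diff_sync_type` is a hypothetical value for the sync_type field (only `full_sync_type` is named in the text).

```python
from typing import Dict, Tuple

FULL_SYNC_TYPE = "full_sync_type"  # field name taken from the message format in the text
DIFF_SYNC_TYPE = "diff_sync_type"  # hypothetical label for differential synchronization


def handle_negotiation(registry: Dict[str, int], file: str,
                       last_sync_seq: int) -> Tuple:
    """Answer the center's <file, last_sync_seq> message.

    registry maps each file to the serial number saved after the last
    completed synchronization (an assumed in-memory stand-in for the
    terminal's synchronization registry).
    """
    if registry.get(file) == last_sync_seq:
        # Serial numbers match: the terminal holds the center's previous
        # version, so a differential update can be applied to it.
        return (file, last_sync_seq, DIFF_SYNC_TYPE)
    # Mismatch or unknown file: request the whole file.
    return (file, FULL_SYNC_TYPE)


registry = {"status.dat": 41}
print(handle_negotiation(registry, "status.dat", 41))
print(handle_negotiation(registry, "status.dat", 42))
```

The single comparison is the only negotiation in the exchange; everything after it is one message from the center and one local update at the terminal.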

Second, on receiving the notification, the data synchronization center starts the differential compression encoder, compresses and encodes the differences between the data file and its state at the last synchronization, and sends the compressed code string together with the newly generated latest synchronization serial number, <last_sync_seq, code_string>, to the synchronization terminal. The purpose of compressing the differences is to reduce the number of bytes to be transmitted and thereby reduce network bandwidth overhead.

Please refer to FIG. 3, which is a schematic diagram of differential encoding in an embodiment of the data synchronization method of the present invention.

Suppose data file D is D1 at time t and D2 at the current time. Passing D1 and D2 through the differential encoder (DiffEncoder) 300 yields a code string that represents the differences between D1 and D2.

In this embodiment, the format of the compressed code string (Code String) used when the data synchronization center and the synchronization terminal synchronize is defined as:

Vint{(offset, diff_value)^<diff_count>}

Here the differences between data files D1 and D2 are represented by element pairs (offset, diff_value); diff_count is the number of difference element pairs, offset is the data block offset relative to the block described by the previous element pair, and diff_value is the difference value.

The difference value is obtained by XORing the corresponding data blocks of the current data file and the data file as of the last synchronization.

Of course, those skilled in the art will appreciate that the position of a data block can also be expressed directly as its byte position (that is, its offset from the start of the file).

A relative offset is preferred because it compresses better: with the compression algorithm described in this embodiment, a relative offset can generally be represented in a single byte, whereas an absolute byte position would require several bytes.

The Vint compression algorithm used in this embodiment is a simple and highly efficient integer compression algorithm. Its main idea is to represent every 7 bits of the value with one byte. Of course, those skilled in the art will understand that other compression algorithms can also be used, such as the zip compression algorithm or a prefix compression algorithm for discrete binary strings.

See Table 1 for examples of the compression.

Table 1

Value    First byte  Second byte  Third byte
0        00000000
1        00000001
2        00000010
…
127      01111111
128      10000000    00000001
129      10000001    00000001
130      10000010    00000001
…
16,383   11111111    01111111
16,384   10000000    10000000     00000001
16,385   10000001    10000000     00000001
…

Table 1 lists some values before and after compression and shows the characteristics of the encoding. Value is the value to be compressed; First byte, Second byte, and Third byte are the values of the successive bytes after compression.
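
The byte patterns in Table 1 are consistent with a variable-length integer encoding that emits the low-order 7 bits first and sets the high bit of a byte when more bytes follow. The Python sketch below implements that reading; the function names are hypothetical, and the byte order is inferred from the table rather than stated in the text.

```python
from typing import Tuple


def vint_encode(value: int) -> bytes:
    """Encode a non-negative integer, 7 bits per byte, low-order group first."""
    if value < 0:
        raise ValueError("Vint encodes non-negative integers only")
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)  # high bit set: more bytes follow
        value >>= 7
    out.append(value)  # last byte has the high bit clear
    return bytes(out)


def vint_decode(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one integer starting at pos; return (value, next position)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


# Reproduce a few rows of Table 1 as bit strings.
for n in (127, 128, 16_383, 16_384):
    print(n, " ".join(f"{b:08b}" for b in vint_encode(n)))
```

Under this encoding a small relative offset such as 3 occupies one byte, while an absolute position such as 100,003 occupies three, which is why relative offsets are preferred.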

Third, after receiving the synchronization information <last_sync_seq, code_string> sent by the data synchronization center, the synchronization terminal starts the differential compression decoder, decodes code_string into multiple (offset, diff_value) pairs, and updates the data file obtained in the last synchronization accordingly. When the data synchronization is completed, it records the latest synchronization serial number in the data synchronization registry.

Please refer to FIG. 4, which is a schematic diagram of differential decoding in an embodiment of the data synchronization method of the present invention.

The differential compression decoder (Diff_Decoder) 400 is started to decode code_string into multiple (offset, diff_value) pairs. For each (offset, diff_value) pair, the data block at the relative offset is fetched from the data file D1 obtained in the last synchronization, XORed with diff_value, and written back to data file D1, yielding the data file D2 to be synchronized.

Please refer to FIG. 5, which is a flowchart of the operation of the differential encoder in the data synchronization method of the present invention.

In step S510, when encoding, the differential encoder reads data files D1 and D2 several bytes at a time and XORs them (the number of bytes read at a time is an adjustable parameter of the differential encoder; the default is 32 bits).

In step S520, if the XOR result is 0, the encoder keeps a running count of the offset.

In step S530, if the XOR result is not 0, the encoder calculates the offset and diff_value and combines them into an element pair (offset, diff_value).

The offset and diff_value can be calculated as follows:

offset = position of the current data block - position of the previous data block in which a difference occurred;

If data_d1 and data_d2 denote the data of the corresponding data blocks (chunks) of data files D1 and D2 respectively, then diff_value = data_d1 ^ data_d2.

In step S540, the offset and diff_value are compressed with the compression algorithm and appended to the encoding queue.
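
Steps S510 to S540 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation, and it rests on several assumptions: both files have the same length, blocks are 4 bytes (the 32-bit default) and are interpreted as little-endian integers, offsets are counted in blocks starting from block 0, and the code string is simply the concatenation of Vint-encoded (offset, diff_value) pairs with no explicit diff_count field. The function names are hypothetical.

```python
WORD_BYTES = 4  # bytes read per step; the text gives 32 bits as the default


def vint(value: int) -> bytes:
    """Encode a non-negative integer, 7 bits per byte, low-order group first."""
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)  # high bit set: more bytes follow
        value >>= 7
    out.append(value)
    return bytes(out)


def diff_encode(d1: bytes, d2: bytes, word: int = WORD_BYTES) -> bytes:
    """Encode the differences between d1 (last synchronized) and d2 (current)."""
    if len(d1) != len(d2):
        raise ValueError("this sketch assumes files of equal length")
    code = bytearray()
    last_diff_index = 0  # block index of the previous differing block
    for index, start in enumerate(range(0, len(d1), word)):
        data_d1 = int.from_bytes(d1[start:start + word], "little")
        data_d2 = int.from_bytes(d2[start:start + word], "little")
        diff_value = data_d1 ^ data_d2
        if diff_value == 0:
            continue  # identical block: the gap simply grows (S520)
        code += vint(index - last_diff_index)  # relative offset (S530)
        code += vint(diff_value)               # compressed and appended (S540)
        last_diff_index = index
    return bytes(code)


# Two 16-byte files that differ in block 1 and block 3.
d1 = bytes(16)
d2 = bytearray(16)
d2[5] = 0x01
d2[12] = 0xFF
print(diff_encode(d1, bytes(d2)).hex())
```

Unchanged blocks contribute nothing to the output, so the size of the code string depends on the number of differing blocks rather than on the file size.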

Please refer to FIG. 6, which is a flowchart of the operation of the differential decoder in the data synchronization method of the present invention.

In step S610, when decoding, the differential decoder obtains the code string generated by the differential encoder.

In step S620, the corresponding data block of data file D1 is obtained according to the offset in the code string.

In step S630, diff_value is XORed with that data block to obtain the updated data block.

In step S640, the updated data block is written into data file D1, yielding the data file D2 to be synchronized.
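
Steps S610 to S640 can be sketched in Python as follows. This is a minimal illustration under assumptions that are not spelled out in the text: the code string is a plain concatenation of Vint-encoded (offset, diff_value) pairs, offsets are counted in 4-byte blocks relative to the previous differing block (starting from block 0), and blocks are interpreted as little-endian integers. The function names are hypothetical.

```python
from typing import Tuple

WORD_BYTES = 4  # must match the block size used by the encoder (assumed 32 bits)


def read_vint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode one Vint starting at pos; return (value, next position)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of this integer
            return value, pos
        shift += 7


def diff_decode(d1: bytes, code: bytes, word: int = WORD_BYTES) -> bytes:
    """Apply a code string to d1 (last synchronized file) and return d2."""
    data = bytearray(d1)
    pos = 0
    index = 0  # block index of the previous differing block
    while pos < len(code):
        offset, pos = read_vint(code, pos)      # S610: next pair from the code string
        diff_value, pos = read_vint(code, pos)
        index += offset                         # relative offset -> block index
        start = index * word
        block = data[start:start + word]        # S620: fetch the block from D1
        updated = int.from_bytes(block, "little") ^ diff_value  # S630: XOR
        data[start:start + len(block)] = updated.to_bytes(len(block), "little")  # S640
    return bytes(data)


# A hand-built code string: block 1 XOR 0x0100, then block 3 XOR 0xFF.
code = bytes([0x01, 0x80, 0x02, 0x02, 0xFF, 0x01])
print(diff_decode(bytes(16), code).hex())
```

Because XOR is its own inverse, the terminal needs no checksum or comparison step: applying each diff_value to the old block directly produces the new block.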

Those skilled in the art will understand that the differential encoder and decoder of the present invention encode and compress, or decode, while reading, which is relatively efficient.

Application example

Applying the technical solution of the present invention to synchronize data between a user status data center and the application servers (synchronization terminals) gives good results.

The 140M of user status data must be synchronized between the status synchronization center and the synchronization terminals every 5 s, so updates are frequent. Within each 5 s interval, roughly 25,000 users change status (online, offline, or invisible; each status is represented with 2 bits).

After processing by the differential encoder with compression described in the technical solution of the present invention, the encoded data is only 65K. Encoding takes on the order of milliseconds, and decoding and synchronization at the synchronization terminal is faster still, on the order of microseconds. The entire process, from the data synchronization center initiating the synchronization request to the synchronization terminal receiving the response and completing the synchronization, takes on the order of milliseconds, which is highly efficient. Because the amount of data to be synchronized is greatly reduced, CPU overhead at both ends and intranet bandwidth are saved.
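
The figures quoted above imply the following rough ratios. The short Python calculation below is only back-of-the-envelope arithmetic, and it assumes that "140M" and "65K" mean mebibytes and kibibytes, which the text does not state.

```python
# Figures quoted in the text; the units are assumptions (M = MiB, K = KiB).
full_file_bytes = 140 * 1024 * 1024
encoded_bytes = 65 * 1024
changed_users = 25_000

bytes_per_change = encoded_bytes / changed_users
reduction_factor = full_file_bytes / encoded_bytes

print(f"{bytes_per_change:.2f} encoded bytes per changed user")
print(f"roughly {reduction_factor:.0f} times less data than sending the full file")
```

On these assumptions each status change costs under 3 bytes on the wire, and a differential update is about three orders of magnitude smaller than a full synchronization.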

In summary, the present invention provides a method that enables a data synchronization center and synchronization terminals to synchronize data efficiently. The technical solution works well in application scenarios where the differences between the data file before and after synchronization are sparse and the synchronization center must synchronize with the synchronization terminals frequently and in near real time.

The method of the present invention improves the synchronization mechanism between the synchronization center and the synchronization terminal.
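The synchronization mechanism set out in the claims (check the latest synchronization serial number, synchronize differentially if it matches and in full otherwise, then save the new serial number) can be sketched as follows. This Python sketch is hypothetical throughout: `SyncTerminal`, `synchronize` and `xor_bytes` are illustrative names, and an uncompressed bytewise XOR stands in for the compressed code string.

```python
from dataclasses import dataclass

@dataclass
class SyncTerminal:
    data: bytes
    last_serial: int  # serial number saved when the previous synchronization finished

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def synchronize(previous: bytes, current: bytes, last_serial: int,
                new_serial: int, terminal: SyncTerminal) -> str:
    """Run one synchronization round and return the mode that was used."""
    if terminal.last_serial == last_serial:
        mode = "differential"
        diff = xor_bytes(previous, current)  # stand-in for the code string
        terminal.data = xor_bytes(terminal.data, diff)
    else:
        mode = "full"  # serial numbers disagree: send the whole file
        terminal.data = current
    terminal.last_serial = new_serial  # saved once the data is in place
    return mode

previous, current = b"\x01\x02\x03", b"\x01\x0f\x03"
in_step = SyncTerminal(data=previous, last_serial=7)
stale = SyncTerminal(data=b"\x00\x00\x00", last_serial=5)
mode_a = synchronize(previous, current, 7, 8, in_step)
mode_b = synchronize(previous, current, 7, 8, stale)
```

The single serial-number comparison is the only negotiation in the round; a terminal whose serial number is out of date falls back to a full transfer and is back in step for the next round.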

At the same time, the differential encoding/decoding method with compression characteristics encodes the differences in the data file since the last synchronization in a compressed form; the other side obtains this code string and, after decoding it, recovers the data to be synchronized.
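The compressed representation of the (offset value, difference value) pairs can be illustrated with a variable-length integer code. The claims name Vint compression as one option but this section does not specify its byte layout, so the Python sketch below assumes a common scheme: 7 payload bits per byte, least significant group first, with the high bit set when more bytes follow. The names `write_vint`, `read_vint`, `pack_pairs` and `unpack_pairs` are hypothetical.

```python
def write_vint(value: int) -> bytes:
    """Encode a non-negative integer using 7 bits per byte."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)

def read_vint(buf: bytes, pos: int):
    """Decode one integer starting at pos; return (value, next_pos)."""
    value = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return value, pos

def pack_pairs(pairs) -> bytes:
    """Build the code string from (offset, diff_value) pairs."""
    return b"".join(write_vint(o) + write_vint(d) for o, d in pairs)

def unpack_pairs(code: bytes):
    """Recover the (offset, diff_value) pairs from the code string."""
    pairs, pos = [], 0
    while pos < len(code):
        offset, pos = read_vint(code, pos)
        diff, pos = read_vint(code, pos)
        pairs.append((offset, diff))
    return pairs

pairs = [(1, 0xFF), (300, 0x03)]
code = pack_pairs(pairs)
decoded = unpack_pairs(code)
```

In this example the two pairs occupy 6 bytes, whereas storing each offset and difference value as a fixed 32-bit field would take 16. Small offsets and difference values with few significant bits are the cases in which such a code saves the most.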

The above is only a preferred embodiment of the present invention. It should be noted that those of ordinary skill in the art may make a number of improvements and refinements without departing from the principle of the present invention, and such improvements and refinements shall also be regarded as falling within the scope of protection of the present invention.

Claims

1. A data synchronization method for synchronizing data between a data synchronization center and a synchronization terminal, characterized in that it comprises the steps of:

11) The data synchronization center negotiates with the synchronization terminal to start differential encoding synchronization; the data synchronization center reads multiple bytes at a time from the current data file and from the data file as of the last synchronization and performs an XOR operation on them; if the XOR result is zero, the offset value of the data block is counted; if the XOR result is not zero, the offset value and difference value of the current data block are calculated, compressed and added to the code string;

12) The data synchronization center sends the encoded code string and the newly generated latest synchronization serial number to the synchronization terminal;

13) After receiving the synchronization information sent by the data synchronization center, the synchronization terminal decodes the aforementioned code string to obtain the offset value and difference value of the current data block, obtains the data block at the relative offset value from the data file obtained in the last synchronization, performs an XOR operation on that block and the difference value, and writes the result back to the data file obtained in the last synchronization to obtain the data file to be synchronized;

14) After the data synchronization is completed, the synchronization terminal saves the aforementioned latest synchronization serial number.

2. The data synchronization method according to claim 1, characterized in that the negotiation in step 11) is: the synchronization center and the synchronization terminal check their latest synchronization serial numbers against each other; if the serial numbers are consistent, differential encoding synchronization is started; if they are inconsistent, data synchronization is performed in full synchronization mode.

3. The data synchronization method according to claim 2, wherein step 11) is initiated periodically by the synchronization center, and the synchronization frequency is configurable.

4. The data synchronization method according to claim 1, wherein:

The number of bytes read in the step 11) is configurable;

The offset value in step 11) is the difference between the position of the current data block and the position of the data block at which a difference last occurred;

The compression algorithm used in the step 11) is Vint compression algorithm, zip compression algorithm, or prefix compression algorithm of discrete binary strings.

5. The data synchronization method according to claim 1, wherein:

The decoding in step 13) includes obtaining the offset value and the difference value of the current data block;

The setting of step 13) includes: obtaining the data block of the corresponding data file according to the offset value in the code string; performing an XOR operation on the difference value and the aforementioned data block to obtain an updated data block; and writing the updated data block into the data file to obtain the synchronized data file.

6. A differential encoding method, characterized in that, comprising steps:

81) Reading multiple bytes at a time from the current data file and from the data file as of the last synchronization, and performing an XOR operation on them;

82) If the XOR result is zero, counting the offset value of the data block;

83) If the XOR result is not zero, calculating the offset value and difference value of the current data block;

84) Combining the offset value and difference value of the current data block into an element pair, compressing the pair and adding it to the code string.

7. The differential encoding method according to claim 6, characterized in that:

The number of bytes read in step 81) is configurable;

The offset value in step 83) is the difference between the position of the current data block and the position of the data block at which a difference last occurred;

The compression algorithm adopted in the step 84) is a Vint compression algorithm, a zip compression algorithm, or a prefix compression algorithm of discrete binary strings.

8. The differential encoding method according to claim 6, characterized in that the method further comprises the steps of:

101) Decoding the received code string to obtain the offset value and difference value of the current data block;

102) Obtain the data block of the corresponding data file according to the offset value in the code string;

103) Execute an XOR operation on the difference value and the aforementioned data block to obtain an updated data block;

104) Write the updated data block into the data file to obtain the synchronized data file.