CN104834700A

CN104834700A - Method for incremental capture of mobile data based on trajectory change

Info

Publication number: CN104834700A
Application number: CN201510206648.9A
Authority: CN
Inventors: 徐小龙; 刘笑笑; 李涛; 徐佳; 李千目; 章韵
Original assignee: Nanjing Post and Telecommunication University
Current assignee: Nanjing Post and Telecommunication University
Priority date: 2015-04-27
Filing date: 2015-04-27
Publication date: 2015-08-12

Abstract

The invention discloses a trajectory-change-based method for incremental capture of mobile data. Triggers capture an operation log that records how the operation data table changes. A purification method then merges the operation log, so that both the net increment and a record of the entire change process are obtained.

The invention makes full use of two features of the mobile setting: the trigger mechanism that mobile databases already provide, and the low frequency of write operations on mobile nodes. Together these significantly improve the efficiency of incremental capture. A further purification step, applied when the increment is extracted, keeps the incremental data to a minimum. This greatly reduces synchronization traffic, relaxes the synchronization system's network bandwidth requirements, and lowers communication costs.

In addition, the synchronization system may terminate unexpectedly or time out severely in a way that threatens the eventual consistency of the data. In that case the change process recorded by the invention can serve as a valid basis for rolling back, which makes the synchronization system more robust.

Description

A Method of Incremental Capture of Mobile Data Based on Trajectory Change

Technical Field

The invention relates to incremental capture methods and belongs to the field where mobile computing and database technologies intersect. In particular, it relates to a trajectory-change-based method for incremental capture of mobile data.

Background Art

With the development and spread of mobile computing and wireless communication technologies, mobile applications are becoming increasingly complex and diverse. To meet mobile users' need for reliable processing of mobile business data, mobile computing systems must let users work with data offline and must support distributed storage with multiple data replicas. Distributed storage currently relies mainly on optimistic replication to keep business data available and to control its consistency. However, mobile networks have weak connectivity, high communication latency, and limited bandwidth. These characteristics make it difficult for data synchronization mechanisms in a mobile computing environment to achieve data consistency efficiently.

To reduce the volume of synchronized data, improve synchronization efficiency, and lower the bandwidth that synchronization demands from the mobile network, mobile computing systems usually adopt incremental capture: each synchronization exchanges only the data that has been modified. Typical incremental data capture methods are the snapshot method (Snapshot), the trigger method (Trigger), the log method (Log), and the timestamp method (Timestamp), each with its own strengths and weaknesses.

- Snapshot compares old and new snapshots of a data table to extract the increment. It is widely applicable but needs a lot of storage, and as the number of tuples grows the comparison algorithm easily becomes the system's performance bottleneck.
- Trigger captures incremental data with triggers and is efficient. However, it works only in data management systems that provide a trigger mechanism, and when a large amount of data in the data set is being operated on, the triggers noticeably degrade system performance.
- Log extracts the increment by analyzing the database's own operation log. It adds no system overhead and is efficient, but databases and management systems usually restrict access to their logs tightly, and log formats differ from one data management system to another. As a result, the log method is subject to many restrictions.
- Timestamp adds a timestamp field to each tuple and, at synchronization time, extracts every tuple whose timestamp is later than the last synchronization. It is simple to implement, but it requires changing the structure of the business data set and has difficulty capturing delete operations, which limits its applicability.

Most existing incremental data capture methods target stable distributed computing environments and do not suit mobile computing environments; porting them unchanged tends to result in low efficiency and high overhead.

Summary of the Invention

The purpose of the invention is to provide a method for incremental capture of mobile data that guarantees every mobile node reaches eventual consistency of its data. Under that guarantee, the method further reduces synchronization traffic, lowers the bandwidth that synchronization demands from the mobile network, and improves synchronization efficiency.

The technical solution adopted by the invention is as follows:

The trajectory-change-based method for incremental capture of mobile data uses triggers provided by a relational database to capture an operation log that records how the operation data table changes. It then uses a purification method to merge the operation log, so that both the net increment and a record of the entire change process are captured. The steps are as follows:

Step 1: Define three triggers on the operation data table, one each for add, modify, and delete, and set the firing condition of each. Then define an operation log table to store the captured data change trajectories. Besides all the fields of the operation data table, the operation log table has its own primary key and an "operation type" field;

Step 2: Any operation (add, modify, or delete) on the operation data table fires the corresponding predefined trigger. The trigger automatically writes the operated record and the operation type into the operation log table;

Step 3: Group all records in the operation log table by the ID of the operated record;

Step 4: Apply the purification algorithm to each group, compressing and merging (that is, purifying) it, to obtain the purification result for each operated record;

Step 5: Merge the purification results of all operated records to obtain the purification result of the whole operation log table. This completes the purification of the operation log table.

The triggers in step 1 work as follows. A relational database has three types of operation: add, modify, and delete. One trigger per operation type is defined on the operation data table, for example Add_trigger, Modify_trigger, and Delete_trigger, and each is given its firing condition. For instance, adding a record to the operation data table fires Add_trigger, which writes the new record's data and the add operation type into the operation log table for later purification.

The operation log table in step 1 must contain all the fields of the operation data table, its own primary key, and the "operation type" field. This lets it record the complete operation and the type of each operation.

The grouping in step 3 works as follows. The operation log table records the ID of each operated record, so its records are divided into groups by that ID.

Table 1 Operation types

No. | Basic operation type | Meaning      | No. | Composite operation type | Meaning
1   | A                    | Add          | 5   | A-M                      | Modify after add
2   | M                    | Modify       | 6   | A-D                      | Delete after add
3   | D                    | Delete       | 7   | M-M                      | Modify after modify
4   | N                    | No operation | 8   | M-D                      | Delete after modify

The purification and merging in steps 4 and 5 work as follows. In a relational database, any sequence of operations on the same record can be decomposed into the 8 basic and composite operation types listed in Table 1. The operations in each group are split into basic and composite operation types. Each group is then purified with whichever of formulas (1)-(5) matches the type of its operation sequence. Finally, formula (6) merges the purification results of all groups, which completes the purification of the operation log table.

$$f^{*}_{i}(L_i) = OP_A(l_1) \cup OP_M(l_2) \cup \cdots \cup OP_M(l_k) \cup \cdots \cup OP_M(l_n) = OP_A(l_n) \tag{1}$$

$$f^{*}_{i}(L_i) = OP_A(l_1) \cup OP_M(l_2) \cup \cdots \cup OP_M(l_k) \cup \cdots \cup OP_D(l_n) = \mathrm{NULL} \tag{2}$$

$$f^{*}_{i}(L_i) = OP_M(l_1) \cup OP_M(l_2) \cup \cdots \cup OP_M(l_k) \cup \cdots \cup OP_M(l_n) = OP_M(l_n) \tag{3}$$

$$f^{*}_{i}(L_i) = OP_M(l_1) \cup OP_M(l_2) \cup \cdots \cup OP_M(l_k) \cup \cdots \cup OP_D(l_n) = OP_D(l_n) \tag{4}$$

$$f^{*}_{i}(L_i) = OP_D(l_1) \tag{5}$$

$$f^{*}(L) = \bigcup_{k=1}^{n} f^{*}_{k}(L_k) \tag{6}$$

The symbols in formulas (1)-(6) are defined as follows:

- A, M, and D, as in Table 1, denote the add, modify, and delete operation types.
- OP denotes one operation on a record.
- The merge operator "∪" denotes the operation union of its left and right operands; NULL means that the operation union is empty.

Treat the operation log table and the operation data table as an operation log set and an operation data set, respectively. Let L_i (1 ≤ i ≤ n, i ∈ N) be the set of operations in the operation log set L that act on the i-th record of the operation data set, and let L_i = {l_1, l_2, …, l_k, …, l_n} (1 ≤ k ≤ n, k ∈ N). Then OP_Γ(l_k), Γ ∈ {A, M, D}, means that the k-th operation on the i-th record is of type Γ and its result is l_k, and f^*_i(L_i) denotes the optimal equivalent operation function of L_i.
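The choice among formulas (1)-(5) depends only on the shape of a record's operation sequence. The sketch below assumes each group has already been reduced to a string of the one-letter codes from Table 1, in log order. The names `FORMULAS` and `select_formula` are hypothetical. Sequences that none of the five formulas cover, such as a delete followed by an add, raise an error here, because the source does not say how to treat them.

```python
import re
from typing import Tuple

# One pattern per formula, written over the operation codes
# A (add), M (modify) and D (delete); the third item is the purified result.
FORMULAS = [
    (1, re.compile(r"AM*"), "A(l_n)"),   # add, then zero or more modifies
    (2, re.compile(r"AM*D"), "NULL"),    # add ... delete cancels out
    (3, re.compile(r"M+"), "M(l_n)"),    # modifies only: keep the last
    (4, re.compile(r"M+D"), "D(l_n)"),   # modifies, then delete
    (5, re.compile(r"D"), "D(l_1)"),     # a single delete
]


def select_formula(ops: str) -> Tuple[int, str]:
    """Return (formula number, purified result) for one record's operation codes."""
    for number, pattern, result in FORMULAS:
        if pattern.fullmatch(ops):
            return number, result
    raise ValueError(f"sequence {ops!r} is not covered by formulas (1)-(5)")


print(select_formula("AMM"))  # (1, 'A(l_n)')
print(select_formula("MMD"))  # (4, 'D(l_n)')
```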

The invention has the following beneficial effects:

1. Existing incremental data capture methods target stable distributed computing environments and do not suit mobile computing environments; porting them unchanged tends to result in low efficiency and high overhead. The invention captures the operation log with triggers and records how the operation data table changes. This makes full use of the trigger mechanism already present in mobile databases and of the low write frequency of mobile nodes in a mobile computing environment, which significantly improves the efficiency of incremental capture.

2. Because mobile networks have weak connectivity, high latency, and limited bandwidth, mobile data synchronization must transmit as little data as possible. The invention further purifies the captured incremental data so that the increment is kept to a minimum. This greatly reduces synchronization traffic, relaxes the synchronization system's bandwidth requirements, and lowers communication costs.

3. During mobile data synchronization, the synchronization server carries most of the synchronization work, and transmitting too much synchronization data greatly increases its load. The purification step keeps the increment to a minimum, so the incremental data that reaches the synchronization server is easier to process and the server's load is reduced.

4. The incremental capture method of the invention has a time complexity of O(n). It is simple and efficient, can greatly shorten synchronization response time, and improves synchronization efficiency.

5. While capturing the operation log with triggers, the invention also records how the operation data table changes. The synchronization system may terminate unexpectedly or time out severely in a way that threatens the eventual consistency of the data. In that case this operation log can serve as a valid basis for rolling back, which makes the synchronization system more robust.

Description of Drawings

Figure 1 is a diagram of the mobile data synchronization architecture.

Figure 2 is a diagram of incremental capture with triggers.

Figure 3 is a diagram of the purification process.

Detailed Description of the Embodiments

The technical problem the invention solves is incremental capture of mobile data in a mobile computing environment. Under mobile distributed storage, optimistic replication brings the business data on multiple mobile terminals to an eventually consistent state. However, the weak connectivity, high latency, and limited bandwidth of mobile networks make it difficult for data synchronization in a mobile computing environment to achieve data consistency efficiently.

Incremental synchronization is usually adopted to reduce synchronization traffic further. Existing incremental data capture methods, however, target stable distributed computing environments, do not suit mobile computing environments, and tend to be inefficient and costly when ported unchanged.

The invention proposes a trajectory-change-based method for incremental capture of mobile data that suits data synchronization in a mobile environment. It captures the operation log with triggers and merges it with a purification method to obtain net incremental data. The method makes full use of the trigger mechanism already present in mobile databases and of the low write frequency of mobile terminals. It also compresses the incremental data to a minimum, which reduces synchronization traffic as far as possible, lowers bandwidth requirements, and improves synchronization efficiency.

The technical solution of the invention is described in detail below with reference to the drawings and an example:

The trajectory-change-based method for incremental capture of mobile data proposed by the invention applies to mobile data synchronization. The mobile data synchronization architecture is shown in Figure 1. Mobile terminals (such as mobile phones, PADs, and tablets) usually have an embedded mobile database (such as SQLite or SQL Server CE).

While offline, a mobile terminal stores operation data in its local mobile database. Once it is online or can reach the synchronization server, it proceeds as follows:

1. It extracts the incremental data from the local mobile database.
2. It turns that data into net incremental data through net incremental processing.
3. It transmits the net incremental data over the Internet, through an access point (AP), to the synchronization server.
4. The synchronization server processes the data and applies it to the master database.

This achieves eventual consistency of the data between the mobile terminals and the synchronization server.

The mechanism of the method is shown in Figure 2. First, an operation data table and an operation log table are defined in the mobile business database of the mobile terminal, and triggers for the relevant operations are defined on the operation data table. When the user then works with the mobile terminal's business functions, the system automatically translates the user's actions into Data Manipulation Language (DML) statements on the operation data table. These statements fire the corresponding triggers, which write the relevant operation log entries into the operation log table. When data synchronization is initiated, the system extracts the incremental data from the operation log table, purifies it, and sends it to the synchronization server.

As shown in Figure 3, the system extracts the incremental data from the operation log table. The purification mechanism groups the operation log by the ID of the operated record, purifies each group separately, and then merges the purified groups to obtain the net incremental data of the operation log.
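The group, purify, and merge flow of Figure 3 can also be expressed as a single query inside the mobile database. The sketch below uses Python's standard `sqlite3` module and loads the five log rows that the embodiment lists as Table 3. The query, the `netType` column name, and the assumption that the log holds only the operations since the last synchronization are illustrative and not taken from the source.

For each LUID the query keeps the last log row. It relabels that row A when the group started with an add, and it drops groups that start with an add and end with a delete, which is what purification formulas (1)-(5) prescribe.

```python
import sqlite3

# In-memory copy of log_contact holding the rows listed in Table 3.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE log_contact (ID INTEGER PRIMARY KEY, LUID TEXT, name TEXT,"
    " number TEXT, email TEXT, birth TEXT, operaType TEXT)"
)
conn.executemany(
    "INSERT INTO log_contact VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "abcdwwwefgruila", "Zhang San", "15112345678", "zhansanyx.com", "2000/11/9", "A"),
        (2, "afkltyoskdffgtr", "Li Si", "18112345678", "lisinj.com", "1999/2/13", "A"),
        (3, "abcdwwwefgruila", "Zhang San", "18309876543", "zhansanyx.com", "2000/11/9", "M"),
        (4, "abcdwwwefgruila", "Zhang San", "15311176543", "zhansan0000yx.com", "2000/11/9", "M"),
        (5, "afkltyoskdffgtr", "Li Si", "18112345678", "lisinj.com", "1999/2/13", "D"),
    ],
)

# g: first and last log ID per operated record; head/tail: those two log rows.
NET_INCREMENT_SQL = """
SELECT tail.LUID, tail.name, tail.number, tail.email, tail.birth,
       CASE WHEN head.operaType = 'A' THEN 'A' ELSE tail.operaType END AS netType
FROM (SELECT LUID, MIN(ID) AS first_id, MAX(ID) AS last_id
      FROM log_contact GROUP BY LUID) AS g
JOIN log_contact AS head ON head.ID = g.first_id
JOIN log_contact AS tail ON tail.ID = g.last_id
WHERE NOT (head.operaType = 'A' AND tail.operaType = 'D')
ORDER BY tail.ID
"""

for row in conn.execute(NET_INCREMENT_SQL):
    print(row)
# ('abcdwwwefgruila', 'Zhang San', '15311176543', 'zhansan0000yx.com', '2000/11/9', 'A')
```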

To make the technical solution of the invention easier to understand, the following concepts are defined:

Definition 1 (Net incremental processing). Net incremental processing merges the series of operations applied to an operation data set into an equivalent, compressed series of operations. The final state produced by the compressed operations is identical to the final state produced by the original operations.

Net incremental processing reduces the actual size of the incremental data and the data transmission time. It also reduces the time the synchronization server spends loading and maintaining the data, making the whole synchronization process more stable and efficient.

To better express how operations change a relational database and how their results relate, the invention introduces the concept of a relational set. The relation of a table is represented by a relational set R whose elements are the records of the table. A table with n records is represented as R = {r_1, r_2, …, r_n}, n ∈ N^*, where r_1, r_2, …, r_n are the records.

Definition 2 (Operation function). If a relational set X becomes X' after a series of operations, we say that X becomes X' under the action of a function Ψ, that is, Ψ(X) = X', and call Ψ an operation function from X to X'. By this definition, the series of operations recorded in the operation log for a given record is an operation function of that record.

Definition 3 (Equivalent operation function). For a relational set A, let the operation function f_1 give A', that is, f_1(A) = A'. If another operation function f_2 also satisfies f_2(A) = A', then f_1 and f_2 are equivalent operation functions with respect to A, written f_1(A) ≈ f_2(A).

Definition 4 (Optimal equivalent operation function). More than one operation function can be equivalent to f_1. If f_2 has the fewest operation steps among all operation functions equivalent to f_1, then f_2 is the optimal equivalent operation function of f_1.

Net incremental processing of incremental data means finding, for a given f_1 with f_1(X) = X', the optimal equivalent operation function f^* such that:

$$f^{*}(X) = X' \tag{1}$$
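Definitions 2 to 4 can be checked mechanically. The sketch below models a relation as a dictionary keyed by record ID and an operation function as an ordered list of operations. The records `r0`, `r1`, `r2`, their field values, and the function `apply_ops` are hypothetical. The purified list has one operation per affected record and drives X to the same X' as the original six operations, so under Definition 3 the two are equivalent operation functions.

```python
from typing import Dict, List, Tuple

Op = Tuple[str, str, dict]  # (op_type, record_id, row_data); row_data is ignored for "D"


def apply_ops(x: Dict[str, dict], ops: List[Op]) -> Dict[str, dict]:
    """An operation function: apply ops in order to a copy of relation X, giving X'."""
    result = dict(x)
    for op_type, rid, data in ops:
        if op_type == "A":
            if rid in result:
                raise KeyError(f"add on existing record {rid}")
            result[rid] = data
        elif op_type == "M":
            if rid not in result:
                raise KeyError(f"modify on missing record {rid}")
            result[rid] = data
        elif op_type == "D":
            del result[rid]  # raises KeyError if the record is missing
        else:
            raise ValueError(f"unknown operation type {op_type!r}")
    return result


X = {"r0": {"number": "100"}}
f1 = [
    ("A", "r1", {"number": "151"}),
    ("M", "r1", {"number": "183"}),
    ("M", "r0", {"number": "101"}),
    ("M", "r1", {"number": "153"}),
    ("A", "r2", {"number": "181"}),
    ("D", "r2", {}),
]
f_star = [("A", "r1", {"number": "153"}), ("M", "r0", {"number": "101"})]

# Equivalent operation functions (Definition 3): same X' from the same X.
assert apply_ops(X, f1) == apply_ops(X, f_star)
print(len(f1), "->", len(f_star))  # 6 -> 2
```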

This embodiment uses a "synchronized address book" system as the application example. The system is installed on an Android smartphone and uses the SQLite3 mobile database. The incremental capture process is as follows:

Step 1: Define three triggers of different types on the operation data table, for add, modify, and delete, and set the firing condition of each. Then define an operation log table to store the captured data change trajectories. Besides all the fields of the operation data table, the log table has its own primary key and an "operation type" field. Specifically:

① Define a contact operation data table, contact, in the mobile database, with the fields LUID, name, number, email, and birth; LUID is the primary key;

② In the same mobile database as ①, define an operation log table, log_contact. It contains all the fields of contact, a primary key ID, and an "operation type" field, operaType, which takes one of three values: A (add), M (modify), or D (delete);

③ Define three triggers on contact to capture the data change trajectory: the add trigger Add_trigger, the modify trigger Modify_trigger, and the delete trigger Delete_trigger.
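Step 1 can be sketched with Python's standard `sqlite3` module. The source does not give the DDL, so the column types, the AFTER timing of the triggers, and the absence of extra firing conditions are assumptions. The delete trigger logs the OLD row, so the D entry carries the deleted record's values, as row 5 of Table 3 does. The function name `open_db` is hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE contact (
    LUID   TEXT PRIMARY KEY,
    name   TEXT,
    number TEXT,
    email  TEXT,
    birth  TEXT
);
CREATE TABLE log_contact (
    ID        INTEGER PRIMARY KEY AUTOINCREMENT,
    LUID      TEXT,
    name      TEXT,
    number    TEXT,
    email     TEXT,
    birth     TEXT,
    operaType TEXT CHECK (operaType IN ('A', 'M', 'D'))
);
CREATE TRIGGER Add_trigger AFTER INSERT ON contact
BEGIN
    INSERT INTO log_contact (LUID, name, number, email, birth, operaType)
    VALUES (NEW.LUID, NEW.name, NEW.number, NEW.email, NEW.birth, 'A');
END;
CREATE TRIGGER Modify_trigger AFTER UPDATE ON contact
BEGIN
    INSERT INTO log_contact (LUID, name, number, email, birth, operaType)
    VALUES (NEW.LUID, NEW.name, NEW.number, NEW.email, NEW.birth, 'M');
END;
CREATE TRIGGER Delete_trigger AFTER DELETE ON contact
BEGIN
    INSERT INTO log_contact (LUID, name, number, email, birth, operaType)
    VALUES (OLD.LUID, OLD.name, OLD.number, OLD.email, OLD.birth, 'D');
END;
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create the operation data table, the operation log table and the three triggers."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```

SQLite fires an UPDATE trigger for every updated row even when the new values equal the old ones. A `WHEN` clause on `Modify_trigger` is one possible place to express the firing conditions the source mentions.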

Step 2: Any operation (add, modify, or delete) on the operation data table fires the corresponding predefined trigger. The trigger automatically writes the operation record, that is, the record's change trajectory, together with the operation type into the operation log table. For example:

① Add two contacts to the contact table. The new contact information in the contact table is shown in Table 1:

Table 1 New contact information in the contact table

LUID            | name      | number      | email         | birth
abcdwwwefgruila | Zhang San | 15112345678 | zhansanyx.com | 2000/11/9
afkltyoskdffgtr | Li Si     | 18112345678 | lisinj.com    | 1999/2/13

② The add trigger Add_trigger fires and writes the operation log into log_contact, as shown in Table 2:

Table 2 Operation log recorded in log_contact

ID | LUID            | name      | number      | email         | birth     | operaType
1  | abcdwwwefgruila | Zhang San | 15112345678 | zhansanyx.com | 2000/11/9 | A
2  | afkltyoskdffgtr | Li Si     | 18112345678 | lisinj.com    | 1999/2/13 | A

③ Modify the contact information in three operations:

- The first modification changes Zhang San's number to "18309876543".
- The second changes Zhang San's number to "15311176543" and his email to zhansan0000yx.com.
- The third deletes Li Si's contact record.

The first and second operations each fire the modify trigger Modify_trigger, and the third fires the delete trigger Delete_trigger. Three new log records are therefore added to log_contact, as shown in Table 3:

Table 3 log_contact after the records are modified

ID | LUID            | name      | number      | email             | birth     | operaType
1  | abcdwwwefgruila | Zhang San | 15112345678 | zhansanyx.com     | 2000/11/9 | A
2  | afkltyoskdffgtr | Li Si     | 18112345678 | lisinj.com        | 1999/2/13 | A
3  | abcdwwwefgruila | Zhang San | 18309876543 | zhansanyx.com     | 2000/11/9 | M
4  | abcdwwwefgruila | Zhang San | 15311176543 | zhansan0000yx.com | 2000/11/9 | M
5  | afkltyoskdffgtr | Li Si     | 18112345678 | lisinj.com        | 1999/2/13 | D

Step 3: Group all records in the operation log table by the ID of the operated record.

A series of operations on a data set is, in essence, a set of operations on the individual records of the data set. Represent the operation log by the relational set L, and the set of operations on the i-th record of the operation data set by L_i (1 ≤ i ≤ n, i ∈ N). The relational set of the operation log table can then be written L = (L_1 ∪ L_2 ∪ … ∪ L_n). Purifying the relational set L therefore reduces to purifying the operation set L_i of each record, namely:

$$f^{*}(L) = f^{*}(L_1 \cup L_2 \cup \cdots \cup L_n) = f^{*}_{1}(L_1) \cup f^{*}_{2}(L_2) \cup \cdots \cup f^{*}_{n}(L_n) \tag{2}$$

When the increment is extracted, all data change trajectories for this synchronization are first read from the operation log table. These trajectories are themselves a series of records, made up of the operation sequences of the individual operated records in the operation data table. All records in the operation log table are therefore grouped by the ID of the operated record. By formula (2), purifying the operation log table amounts to purifying each group.

The log records in log_contact are divided into two groups by the key LUID, as shown in Table 4:

Table 4 Grouping of the records in log_contact
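The two groups follow directly from the rows of Table 3. The sketch below keeps only the columns needed for illustration; `LOG_CONTACT` and `group_by_luid` are hypothetical names. Sorting by the log ID keeps each group in the order the triggers wrote it, which the purification formulas rely on.

```python
from typing import Dict, List

# Rows of log_contact from Table 3: (ID, LUID, number, email, operaType).
LOG_CONTACT = [
    (1, "abcdwwwefgruila", "15112345678", "zhansanyx.com", "A"),
    (2, "afkltyoskdffgtr", "18112345678", "lisinj.com", "A"),
    (3, "abcdwwwefgruila", "18309876543", "zhansanyx.com", "M"),
    (4, "abcdwwwefgruila", "15311176543", "zhansan0000yx.com", "M"),
    (5, "afkltyoskdffgtr", "18112345678", "lisinj.com", "D"),
]


def group_by_luid(rows) -> Dict[str, List[tuple]]:
    """Step 3: one group per operated record, each group in log-ID order."""
    groups: Dict[str, List[tuple]] = {}
    for row in sorted(rows, key=lambda r: r[0]):
        groups.setdefault(row[1], []).append(row)
    return groups


groups = group_by_luid(LOG_CONTACT)
for luid, rows in groups.items():
    print(luid, [r[0] for r in rows], "".join(r[4] for r in rows))
# abcdwwwefgruila [1, 3, 4] AMM
# afkltyoskdffgtr [2, 5] AD
```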

Step 4: Apply the purification algorithm to each group, compressing and merging (that is, purifying) it, to obtain the purification result for each operated record.

In a relational database, any sequence of operations on the same record can be decomposed into the following 8 basic or composite operation types, as shown in Table 5:

Table 5 Operation types

For example, consider a record that undergoes the series of operations "add-modify-modify-delete". This series can be composed from the basic operation types "1-2-2-3" in the table, or from the composite operation types "5-7-8".
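The example can be reproduced with a small sketch. The codes come from the operation type table; the function names are hypothetical. Consecutive composite pairs share one operation, which is why three composite types (A-M, M-M, M-D) cover the four operations. A pair absent from the table, such as D followed by A, raises a `KeyError`.

```python
from typing import List

BASIC = {"A": 1, "M": 2, "D": 3, "N": 4}
COMPOSITE = {"AM": 5, "AD": 6, "MM": 7, "MD": 8}


def basic_codes(ops: str) -> List[int]:
    """Decompose a sequence into one basic operation type per operation."""
    return [BASIC[op] for op in ops]


def composite_codes(ops: str) -> List[int]:
    """Cover a sequence with its adjacent pairs; neighbouring pairs overlap by one operation."""
    return [COMPOSITE[ops[i:i + 2]] for i in range(len(ops) - 1)]


print(basic_codes("AMMD"))      # [1, 2, 2, 3]
print(composite_codes("AMMD"))  # [5, 7, 8]
```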

Hence the series of operations on each record can be split entirely into basic and composite operation types, which are then merged. The merge rules are shown in Table 6:

Table 6 Merge rules for operations on the same record

In Table 6, "/" means that the merge does not exist, and NULL means that the merge result is empty. The rules can be described as follows:

① $\mathrm{OP}_A \cup \mathrm{OP}_M = \mathrm{OP}_A$

② $\mathrm{OP}_A \cup \mathrm{OP}_D = \mathrm{NULL}$

③ $\mathrm{OP}_M \cup \mathrm{OP}_M = \mathrm{OP}_M$

④ $\mathrm{OP}_M \cup \mathrm{OP}_D = \mathrm{OP}_D$

⑤ $\mathrm{OP}_J \cup \mathrm{OP}_N = \mathrm{OP}_N \cup \mathrm{OP}_J = \mathrm{OP}_J,\ J \in \{A, M, D, N\}$
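A minimal sketch of the merge operator ∪ as a Python function follows. It assumes that operation types are the strings "A", "M", "D" and "N", that `None` stands for NULL, and that an undefined ("/") combination raises an error. The name `merge` is illustrative only. The left argument is the earlier operation, since the rules are not symmetric (add-then-delete is defined, delete-then-add is not).

```python
from typing import Dict, Optional, Tuple

# Operation types: "A" add, "M" modify, "D" delete, "N" no operation.
# None stands for NULL (the two operations cancel out).
_RULES: Dict[Tuple[str, str], Optional[str]] = {
    ("A", "M"): "A",   # rule 1: add then modify is still an add
    ("A", "D"): None,  # rule 2: add then delete leaves nothing
    ("M", "M"): "M",   # rule 3: successive modifies collapse into one
    ("M", "D"): "D",   # rule 4: modify then delete is a delete
}


def merge(first: str, second: str) -> Optional[str]:
    """Return first ∪ second, where `first` is the earlier operation."""
    if second == "N":  # rule 5: OP_J ∪ OP_N = OP_J
        return first
    if first == "N":   # rule 5: OP_N ∪ OP_J = OP_J
        return second
    if (first, second) not in _RULES:
        raise ValueError(f"{first} ∪ {second} is undefined ('/' in Table 6)")
    return _RULES[(first, second)]
```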

Here OP denotes one operation on a record, and the operator "∪" denotes the operation union (merge) of its two operands, with the earlier operation on the left.

For the operation set $L_i$ ($1 \le i \le n$, $i \in \mathbb{N}$) of the i-th record in the operation log relation set L, suppose $L_i = \{l_1, l_2, \dots, l_k, \dots, l_n\}$ ($1 \le k \le n$, $k \in \mathbb{N}$), and let $\mathrm{OP}_\Gamma(l_k)$, $\Gamma \in \{A, M, D\}$, denote that the k-th operation on the i-th record is of type Γ, with resulting log record $l_k$. The operation sequences of records then fall into five types, each purified by one of the following five formulas:

$$f^{*}_{i}(L_i) = \mathrm{OP}_A(l_1) \cup \mathrm{OP}_M(l_2) \cup \dots \cup \mathrm{OP}_M(l_k) \cup \dots \cup \mathrm{OP}_M(l_n) = \mathrm{OP}_A(l_n) \tag{3}$$

$$f^{*}_{i}(L_i) = \mathrm{OP}_A(l_1) \cup \mathrm{OP}_M(l_2) \cup \dots \cup \mathrm{OP}_M(l_k) \cup \dots \cup \mathrm{OP}_D(l_n) = \mathrm{NULL} \tag{4}$$

$$f^{*}_{i}(L_i) = \mathrm{OP}_M(l_1) \cup \mathrm{OP}_M(l_2) \cup \dots \cup \mathrm{OP}_M(l_k) \cup \dots \cup \mathrm{OP}_M(l_n) = \mathrm{OP}_M(l_n) \tag{5}$$

$$f^{*}_{i}(L_i) = \mathrm{OP}_M(l_1) \cup \mathrm{OP}_M(l_2) \cup \dots \cup \mathrm{OP}_M(l_k) \cup \dots \cup \mathrm{OP}_D(l_n) = \mathrm{OP}_D(l_n) \tag{6}$$

$$f^{*}_{i}(L_i) = \mathrm{OP}_D(l_1) \tag{7}$$

Depending on which type a record's operation sequence belongs to, the corresponding formula among (3)-(7) yields the purification result of that sequence.
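Formulas (3)-(7) can all be read as folding the ∪ operator over a record's operation types from left to right. The surviving operation carries the field values of the last log entry $l_n$. The sketch below assumes that log rows are dicts with an `operaType` field holding "A", "M" or "D", and that a group is in capture order. `purify_group` is a hypothetical name. Rule ⑤ is left out because a trigger-built log contains no N entries.

```python
from functools import reduce
from typing import Dict, List, Optional

Row = Dict[str, str]  # one operation-log row; "operaType" holds "A", "M" or "D"

# Type-level merge rules 1-4; a missing pair is undefined ("/").
_RULES = {("A", "M"): "A", ("A", "D"): None, ("M", "M"): "M", ("M", "D"): "D"}


def _merge(first: Optional[str], second: str) -> Optional[str]:
    """first ∪ second on operation types (earlier operation on the left)."""
    if (first, second) not in _RULES:
        raise ValueError(f"undefined merge {first} ∪ {second}")
    return _RULES[(first, second)]


def purify_group(group: List[Row]) -> Optional[Row]:
    """f*_i(L_i): collapse one record's chronological, non-empty log rows.

    The merged type comes from folding ∪ over the types; the field values
    come from the last row l_n. Returns None for NULL (formula (4)).
    """
    types = [row["operaType"] for row in group]
    merged = reduce(_merge, types[1:], types[0])
    if merged is None:
        return None
    return {**group[-1], "operaType": merged}
```

For a group holding a single delete, the fold has nothing to merge and returns that row unchanged, which matches formula (7).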

For the groups in Table 4, formula (3) is applied to group 1 and formula (4) to group 2. The purification results are as follows:

Table 7 Purification results of group 1 and group 2

| ID | LUID | name | number | email | birth | operaType | Group |
|---|---|---|---|---|---|---|---|
| 4 | abcdwwwefgruila | 张三 (Zhang San) | 15311176543 | zhansan0000yx.com | 2000/11/9 | A | 1 |

Step 5: Merge the purification results of all operated records to obtain the purification result of the entire operation log table. This completes the purification of the operation log table.

Accordingly, the optimal equivalent operation function for the whole operation log set L is:

$$f^{*}(L) = \bigcup_{k=1}^{n} f^{*}_{k}(L_k) \tag{8}$$

Once the operation log has been obtained, it can be purified with formula (8). Redundant records are removed and the net incremental synchronization data is obtained. This reduces the amount of data transmitted during synchronization, shortens the synchronization response time, and improves synchronization efficiency.
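Taken end to end, formula (8) amounts to three steps: group the log by record ID, purify every group, and keep the non-NULL results. The self-contained sketch below assumes that each log row is a dict with an `LUID` grouping key and an `operaType` field, with rows in capture order. The function names `purify` and `net_increment` and the sample data are hypothetical.

```python
from typing import Dict, List, Optional

Row = Dict[str, str]
# Type-level merge rules 1-4; a missing pair is undefined ("/").
RULES = {("A", "M"): "A", ("A", "D"): None, ("M", "M"): "M", ("M", "D"): "D"}


def purify(group: List[Row]) -> Optional[Row]:
    """f*_k(L_k) for one record: fold the types, keep the last row's fields."""
    merged: Optional[str] = group[0]["operaType"]
    for row in group[1:]:
        pair = (merged, row["operaType"])
        if pair not in RULES:
            raise ValueError(f"undefined merge {pair[0]} ∪ {pair[1]}")
        merged = RULES[pair]
    return None if merged is None else {**group[-1], "operaType": merged}


def net_increment(log: List[Row], key: str = "LUID") -> List[Row]:
    """f*(L): group by record ID, purify each group, drop NULL results."""
    groups: Dict[str, List[Row]] = {}
    for row in log:
        groups.setdefault(row[key], []).append(row)
    results = (purify(g) for g in groups.values())
    return [r for r in results if r is not None]
```

A record that is added and then deleted within the same capture window contributes nothing to the result. This mirrors how group 2 disappears in the example.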

Merging the purification results of group 1 and group 2 gives the net incremental synchronization data. As Table 7 shows, group 2 becomes empty after purification; that is, all of its operation log records are removed. The net increment of the operation log in the log_contact table of the "synchronized address book" system is therefore exactly the content of Table 7.

Claims

1. A mobile data incremental capture method based on trajectory change. The method uses triggers provided by a relational database to capture operation logs, records the operation change process of the operation data table, and uses a purification method to merge the operation logs. It thereby achieves both net incremental capture and a record of the entire operation change process. The steps are as follows:

Step 1: Define three triggers on the operation data table, for adding, modifying, and deleting, and set the trigger condition of each. Then define an operation log table to store the captured data change trajectory. In addition to all fields of the operation data table, the operation log table includes its own primary key and an "operation type" field;

Step 2: Any operation (add, modify, or delete) on the operation data table fires the corresponding predefined trigger, which automatically records the operated record and the operation type into the operation log table;

Step 3: Group all records in the operation log table by the ID of the operated record;

Step 4: Apply the purification algorithm to compress and merge (that is, purify) each group, obtaining the purification result of each operated record;

Step 5: Combine the purification results of all operated records to obtain the purification result of the entire operation log table. This completes the purification of the operation log table.

2. The mobile data incremental capture method based on trajectory change according to claim 1, wherein the triggers in step 1 are defined as follows. A relational database has three operation types: adding, modifying, and deleting. For these three operation types, triggers such as Add_trigger, Modify_trigger, and Delete_trigger are defined on the operation data table, and the trigger condition of each is set. For example, when a new record is added to the operation data table, the add trigger Add_trigger fires and records the newly added record data, together with the add operation type, into the operation log table for later purification. The operation log table in step 1 must contain all fields of the operation data table, its own primary key, and the "operation type" field, so that the complete operation process and the type of each operation can be recorded.
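To make the trigger mechanism concrete, here is a sketch using SQLite through Python's standard `sqlite3` module. The trigger names follow the claim. The data table name `contact`, the reduced column set (LUID, name, number) and the sample values are assumptions for illustration. Other relational databases use different trigger syntax.

```python
import sqlite3

# Hypothetical schema: "contact" is the operation data table and "log_contact"
# the operation log table. The log copies the data columns (a reduced set here)
# plus its own primary key ID and the operaType field.
SCHEMA = """
CREATE TABLE contact (
    LUID   TEXT PRIMARY KEY,
    name   TEXT,
    number TEXT
);
CREATE TABLE log_contact (
    ID        INTEGER PRIMARY KEY AUTOINCREMENT,
    LUID      TEXT,
    name      TEXT,
    number    TEXT,
    operaType TEXT NOT NULL
);
-- Add: log the new row with type A.
CREATE TRIGGER Add_trigger AFTER INSERT ON contact
BEGIN
    INSERT INTO log_contact (LUID, name, number, operaType)
    VALUES (NEW.LUID, NEW.name, NEW.number, 'A');
END;
-- Modify: log the row as it is after the update, with type M.
CREATE TRIGGER Modify_trigger AFTER UPDATE ON contact
BEGIN
    INSERT INTO log_contact (LUID, name, number, operaType)
    VALUES (NEW.LUID, NEW.name, NEW.number, 'M');
END;
-- Delete: log the row as it was before deletion, with type D.
CREATE TRIGGER Delete_trigger AFTER DELETE ON contact
BEGIN
    INSERT INTO log_contact (LUID, name, number, operaType)
    VALUES (OLD.LUID, OLD.name, OLD.number, 'D');
END;
"""


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with the tables and the three triggers."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


conn = open_db()
conn.execute("INSERT INTO contact VALUES ('r1', 'Zhang San', '100')")
conn.execute("UPDATE contact SET number = '200' WHERE LUID = 'r1'")
conn.execute("DELETE FROM contact WHERE LUID = 'r1'")
print(conn.execute("SELECT LUID, number, operaType FROM log_contact ORDER BY ID").fetchall())
# [('r1', '100', 'A'), ('r1', '200', 'M'), ('r1', '200', 'D')]
```

This sketch logs the NEW row for adds and modifies and the OLD row for deletes. That is one reasonable choice: each log entry then carries the record state that the purification step later keeps as $l_n$.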

3. The mobile data incremental capture method based on trajectory change according to claim 1, wherein the operation records in step 3 are grouped as follows: because the operation log table records the ID of each operated record, the records in the operation log table are divided into groups according to the IDs of the operated records.

4. The mobile data incremental capture method based on trajectory change according to claim 1, wherein the purification and merging operations in steps 4 and 5 are as follows: in a relational database, any sequence of operations on the same record can be decomposed into 8 basic and composite operation types, as shown in the following table:

| No. | Basic operation type | Meaning | No. | Composite operation type | Meaning |
|---|---|---|---|---|---|
| 1 | A | Add | 5 | A-M | Modify after add |
| 2 | M | Modify | 6 | A-D | Delete after add |
| 3 | D | Delete | 7 | M-M | Modify after modify |
| 4 | N | No operation | 8 | M-D | Delete after modify |

The series of operations represented by each group is split into basic and composite operation types. According to the type of operation sequence in each group, the corresponding purification formula among (1)-(5) is applied. Finally, formula (6) combines the purification results of all groups, which completes the purification of the operation log table.

$$f^{*}_{i}(L_i) = \mathrm{OP}_A(l_1) \cup \mathrm{OP}_M(l_2) \cup \dots \cup \mathrm{OP}_M(l_k) \cup \dots \cup \mathrm{OP}_M(l_n) = \mathrm{OP}_A(l_n) \tag{1}$$

$$f^{*}_{i}(L_i) = \mathrm{OP}_A(l_1) \cup \mathrm{OP}_M(l_2) \cup \dots \cup \mathrm{OP}_M(l_k) \cup \dots \cup \mathrm{OP}_D(l_n) = \mathrm{NULL} \tag{2}$$

$$f^{*}_{i}(L_i) = \mathrm{OP}_M(l_1) \cup \mathrm{OP}_M(l_2) \cup \dots \cup \mathrm{OP}_M(l_k) \cup \dots \cup \mathrm{OP}_M(l_n) = \mathrm{OP}_M(l_n) \tag{3}$$

$$f^{*}_{i}(L_i) = \mathrm{OP}_M(l_1) \cup \mathrm{OP}_M(l_2) \cup \dots \cup \mathrm{OP}_M(l_k) \cup \dots \cup \mathrm{OP}_D(l_n) = \mathrm{OP}_D(l_n) \tag{4}$$

$$f^{*}_{i}(L_i) = \mathrm{OP}_D(l_1) \tag{5}$$

$$f^{*}(L) = \bigcup_{k=1}^{n} f^{*}_{k}(L_k) \tag{6}$$

The symbols in formulas (1)-(6) have the following meanings. A, M, and D, as shown in Table 1, denote the operation types add, modify, and delete respectively. OP denotes one operation on a record. The merge operator "∪" denotes the operation union of its two operands, and NULL means that the union is empty. The operation log table and the operation data table are regarded as the operation log set and the operation data set respectively. For the operation set $L_i$ ($1 \le i \le n$, $i \in \mathbb{N}$) in the operation log set L of the i-th record of the operation data set, assume $L_i = \{l_1, l_2, \dots, l_k, \dots, l_n\}$ ($1 \le k \le n$, $k \in \mathbb{N}$). $\mathrm{OP}_\Gamma(l_k)$, $\Gamma \in \{A, M, D\}$, means that the k-th operation on the i-th record is of type Γ with result $l_k$. $f^{*}_{i}(L_i)$ denotes the optimal equivalent operation function of $L_i$.