CN101611402B

CN101611402B - System and method for optimizing changes of data sets

Info

Publication number: CN101611402B
Application number: CN2007800468787A
Authority: CN
Inventors: S·兰茨; L·詹森
Original assignee: OMX Technology AB
Current assignee: Nasdaq Technology AB
Priority date: 2006-12-20
Filing date: 2007-12-14
Publication date: 2012-12-26
Anticipated expiration: 2027-12-14
Also published as: EP2095272A1; JP2010514035A; CN101611402A; AU2007336337A1; JP5230645B2; US8117609B2; WO2008074751A1; US20080155527A1; HK1140030A1; TW200833010A; AU2007336337B2

Abstract

A system and method for generating an update data set to be sent to remote terminals. The update data set comprises operators describing differences between two data sets, so that a remote terminal is able to transform an old data set into a more recent data set. The system comprising a comparator for comparing data elements in the data sets, and a selector for selecting operators based on a change parameter stored in a memory.

Description

Systems and methods for optimizing changes in datasets

技术领域 technical field

本发明涉及在计算机环境中的数据传播，本发明尤其涉及提取数据集中的变化以用于在计算机网络中分发这些变化。The present invention relates to the dissemination of data in a computer environment, and in particular the invention relates to extracting changes in data sets for distributing these changes in a computer network.

背景技术 Background technique

现今，通过计算机网络发送数据是非常普遍的。由于技术进步，所发送的数据量迅速增长，使得与以前相比能够以更高的速度发送和处理更多数据。此外，新的应用也需要更多数据，因为这些新的应用变得更加复杂。计算机系统一其中数据传播技术是最重要的部分之一的例子是电子交易系统。Sending data over computer networks is very common these days. Due to technological advances, the amount of data being sent has grown rapidly, making it possible to send and process more data at higher speeds than before. In addition, new applications require more data, as these new applications become more complex. An example of a computer system where data dissemination technology is one of the most important parts is an electronic trading system.

证券、衍生物、商品和其它金融工具的电子交易导致必须分发给用户的大量数据，这些用户需要这些数据来作出交易决定、统计计算和其它评估。在高性能计算机系统中提取和发送所有这些信息的过程对处理器的要求非常高。CPU的时间量是稀有资源，不应该浪费在本可以避免的执行步骤上。Electronic trading of securities, derivatives, commodities, and other financial instruments results in vast amounts of data that must be distributed to users who need the data to make trading decisions, statistical calculations, and other evaluations. The process of extracting and sending all this information in a high-performance computer system is very demanding on the processor. The amount of CPU time is a scarce resource and should not be wasted on execution steps that could have been avoided.

此外，连接到这样的中央交易系统的用户典型地希望尽可能快地拥有信息。在这些情况下，通过例如更新硬件来提高中央系统的性能可能是不够的。Furthermore, users connecting to such a central transaction system typically want to have information as quickly as possible. In these cases, it may not be sufficient to increase the performance of the central system by, for example, updating hardware.

为了摆脱系统中的瓶颈或者其它潜在问题，必须使用额外的技术。In order to get rid of bottlenecks or other potential problems in the system, additional techniques must be used.

因此，一种这样的额外技术是使处理器的工作更有效。例如，当在用户终端上更新数据集时，存在不同的方法。Therefore, one such additional technique is to make the processor work more efficiently. For example, when updating a data set on a user terminal, there are different methods.

一种通常使用的并且是明显的解决方案是一直发送替换旧数据集的全新数据集。这在数据集中只有一部分数据已经发生变化时通常是低效的。因此更有效的方式可以是只发送已经变化的数据集部分。更进一步的改进是发送delta变化。A commonly used and obvious solution is to always send a brand new dataset replacing the old dataset. This is often inefficient when only a portion of the data in the dataset has changed. Therefore a more efficient way may be to send only the part of the dataset that has changed. A further improvement is to send delta changes.

另一种已知技术是发送描述两个数据集之间的差异的运算符。对第一数据集应用该运算符可以将第一数据集转变成第二数据集。Another known technique is to send an operator describing the difference between two datasets. Applying the operator to a first data set can transform the first data set into a second data set.

通过选择作用于数据集的良好的运算符集，优化是可能的。现今，可用于提取数据集差异的方法是不足的。因此，需要发展技术以用于以有效方式-例如通过更少的步骤-提取和选择运算符以便于减少处理器上的负载，并且用于减少在计算机系统中的诸如带宽的数据传播。Optimization is possible by choosing a good set of operators to operate on the dataset. Today, the methods available for extracting differences in datasets are insufficient. Therefore, there is a need to develop techniques for extracting and selecting operators in an efficient manner - eg by fewer steps - in order to reduce the load on the processor, and for reducing data propagation, such as bandwidth, in computer systems.

发明内容 Contents of the invention

因此，本发明是目标是提供用于产生待发送给远程终端的更新数据集的解决方案。Therefore, the present invention aims to provide a solution for generating an update data set to be sent to a remote terminal.

本发明的另一目标是提供从数据集中有效地提取数据的解决方案。Another object of the present invention is to provide a solution for efficiently extracting data from a data set.

本发明的另一目标是提供一种解决方案，该解决方案基于数据集之间的差异有效地提取和/或选择运算符。Another object of the present invention is to provide a solution that efficiently extracts and/or selects operators based on differences between datasets.

本发明的另一目标是提供产生待发送给远程终端的数据结构的解决方案。Another object of the invention is to provide a solution for generating a data structure to be sent to a remote terminal.

本发明的另一目标是提供使用更少处理器时间的解决方案。Another object of the invention is to provide a solution that uses less processor time.

根据本发明的第一方面，上述目标和其它目标通过一种用于产生待发送给远程终端的更新数据集的计算机系统实现，该更新数据集包括描述包含分类数据元的第一数据集与包含分类数据元的第二数据集之间的差异的运算符，该计算机系统包括：According to a first aspect of the present invention, the above objects and others are achieved by a computer system for generating an updated data set to be transmitted to a remote terminal, the updated data set comprising a first data set describing elements containing classification data and containing An operator for the difference between the second data set of categorical data elements, the computer system comprising:

-存储器，该存储器包括第一数据集和第二数据集，- a memory comprising the first data set and the second data set,

-比较器，该比较器可与存储器连接用于将第二数据集中的数据元与第一数据集中的数据元顺次比较，第一次比较的结果控制第一数据集中的哪个数据元与第二数据集中的哪个数据元将在第二次比较中进行比较，在每次比较之后更新存储器中的变化参数，并且在检测到第二数据集中的数据元与第一数据集中的数据元一致时启动选择器，- a comparator, which can be connected to the memory for sequentially comparing the data elements in the second data set with the data elements in the first data set, the result of the first comparison controlling which data element in the first data set is compared with the data elements in the first data set Which data elements in the two data sets are to be compared in the second comparison, after each comparison the change parameters in the memory are updated, and when it is detected that the data elements in the second data set are consistent with the data elements in the first data set launch picker,

一所述选择器，该选择器可与存储器和比较器连接，该选择器用于基于存储器中存储的变化参数来确定运算符，并且将确定的运算符存储在存储器中，a said selector, the selector is connectable to the memory and the comparator, the selector is used to determine the operator based on the variable parameters stored in the memory, and to store the determined operator in the memory,

由此产生待发送给远程终端的、包括描述第一数据集和第二数据集之间差异的运算符的更新数据集。This results in an updated data set to be sent to the remote terminal comprising an operator describing the difference between the first data set and the second data set.

该计算机系统所具有的优点是，使得例如交易系统的计算机系统可以通过使用更少的CPU时间来更有效地产生更新数据集。例如，其原因在于该计算机系统使得通过从头到尾将每个数据集只运行一遍就可以对第一数据集和第二数据集进行比较。The computer system has the advantage that it enables a computer system such as a transaction system to more efficiently generate update data sets by using less CPU time. For example, this is because the computer system allows a first data set and a second data set to be compared by running each data set through only one pass.

该计算机系统进一步包括与选择器相关联的通信器，用于产生和发送包括更新数据集的更新消息。该消息可以包括诸如列表、阵列、位棋盘(bitboard)、堆栈、堆(heap)、树或者集合等等的数据结构，所述数据结构包括待发送给远程终端的数据集。优选地，通过使用FIX标准来发送消息，然而可以使用本领域技术人员熟知的任何其它协议来发送该消息，所述协议例如是Omnet API、XTP、SSL或者任何其它标准或者专有的协议。The computer system further includes a communicator associated with the selector for generating and sending an update message including the updated data set. The message may include a data structure, such as a list, array, bitboard, stack, heap, tree, or set, etc., that includes the data set to be sent to the remote terminal. Preferably, the message is sent using the FIX standard, however any other protocol known to those skilled in the art may be used to send the message, such as Omnet API, XTP, SSL or any other standard or proprietary protocol.

优选地，第一数据集和第二数据集包括随时间变化的动态数据元。例如，第一数据集可以是在时间T1时交易系统中的命令薄，第二数据集可以是在时间T2时的命令薄。命令薄中的数据可能已经在时间T1和T2期间发生了变化，因为针对销售金融工具进行买入或者报价的新命令可能已经进入。信息传播系统可以按照特定的预定时间间隔或者基于命令薄中的活动而向远程终端传播信息。例如，如果命令薄没有接收到任何新命令，则向远程终端发送新的更新是没有意义的。然而，当新命令进入命令薄时，则必须向远程终端传播新信息。因此，当在这个时间点发生活动时，发送更新消息可能是必要的。可能在命令薄中发生变化的另一个例子是在状态转换期间，例如当开启或者关闭命令薄时。Preferably, the first data set and the second data set include dynamic data elements that change over time. For example, the first data set may be the order book in the trading system at time T1, and the second data set may be the order book at time T2. The data in the order book may have changed between times T1 and T2 because new orders to buy or quotes for the sale financial instrument may have entered. The information dissemination system may disseminate information to remote terminals at certain predetermined time intervals or based on activity in the command book. For example, if the command book has not received any new commands, it does not make sense to send a new update to the remote terminal. However, when new commands enter the command book, new information must be propagated to the remote terminal. Therefore, it may be necessary to send update messages when activity occurs at this point in time. Another example where changes may occur in the command book is during state transitions, such as when the command book is opened or closed.

因此，第二数据集是第一数据集的后续版本(later version)。这样，中央系统可以比较已经发生了什么变化，并且因为中央系统知道远程终端拥有什么数据集，因此该中央系统可以提取运算符以存储和发送到更新数据集中。此外，也可以将第一数据集和第二数据集之间的变化的特定部分发送给具体的远程终端，因为远程终端上的用户可能希望得到与数据集的不同部分有关的信息。Thus, the second data set is a later version of the first data set. This way, the central system can compare what has changed, and since the central system knows what data sets the remote terminals have, it can extract operators to store and send to the update data set. In addition, a specific portion of the variation between the first data set and the second data set may also be sent to a specific remote terminal, since a user at a remote terminal may wish to obtain information related to a different portion of the data set.

优选地，运算符是从包括下列运算符的组中确定的：添加运算符、删除运算符、替换运算符。通过基于变化参数确定这些运算符，可以产生更新数据集。如后面在本文献中所描述的那样，这些运算符可以结合起来。Preferably, the operator is determined from the group comprising the following operators: add operator, delete operator, replace operator. By determining these operators based on the changing parameters, an updated data set can be generated. These operators can be combined as described later in this document.

变化参数优选地包括与第一数据集相关的第一计数器和与第二数据集相关的第二计数器。这样，可以以更准确的方式监控和管理选择过程。因此，选择器可以基于与第一数据集相关的第一计数器和与第二数据集相关的第二计数器之间的关系来确定运算符。这加速了选择过程，因为所述关系优选地是从一组关系中所选择的关系，所述组包括：＞、＜、＝、≥、≤或者≠。基于计数器之间的关系，可以确定至少一个运算符：添加、删除、替换。The variation parameters preferably comprise a first counter associated with the first data set and a second counter associated with the second data set. In this way, the selection process can be monitored and managed in a more accurate manner. Accordingly, the selector may determine an operator based on a relationship between a first counter associated with a first data set and a second counter associated with a second data set. This speeds up the selection process, since the relation is preferably a relation selected from a set of relations comprising: >, <, =, ≥, ≤ or ≠. Based on the relationship between the counters, at least one operator can be determined: add, delete, replace.

变化参数可以进一步包括与第一数据集关联的第一位置参数和与第二数据集关联的第二位置参数，以用于对比较器在第一数据集和第二数据集中的逻辑位置进行追踪。这样，可以以更加准确的方式监控和管理对第一列表和第二列表进行顺序比较的结果。The variation parameters may further include a first position parameter associated with the first data set and a second position parameter associated with the second data set for tracking the logical position of the comparator in the first data set and the second data set . In this way, the result of the sequential comparison between the first list and the second list can be monitored and managed in a more accurate manner.

运算符可以包括delta变化。因此，如果只有一部分数据元已经变化，则该运算符可以描述已经变化的这部分数据元。然而，该运算符也可以描述应该例如删除或者替换或者添加整个数据元。Operators can include delta changes. Thus, if only a portion of the data elements has changed, this operator can describe the portion of the data elements that have changed. However, the operator may also describe that an entire data element should be deleted or replaced or added, for example.

每个数据元在正常情况下至少包括键(key)。然而，数据元也可以包括数据部分。一般地，数据部分可以是空的。然而在交易系统中，典型的键是命令薄中的价格。数据部分可以例如是针对那个价格的总容量。然而，如果键包括价格和时间，则发送整个键不总是必要的。数据元分类可以基于价格和时间，但是只有价格可以被发送给远程终端。Each data element normally includes at least a key (key). However, a data element may also include a data portion. Generally, the data section can be empty. In a trading system, however, the typical key is the price in the order book. The data portion may eg be the total capacity for that price. However, if the key includes price and time, it is not always necessary to send the entire key. Data element classification can be based on price and time, but only price can be sent to the remote terminal.

在本发明的第二方面中，上述目标和其它目标通过包括如上所述的计算机系统的电子交易系统来实现。In a second aspect of the invention, the above objects and others are achieved by an electronic transaction system comprising a computer system as described above.

因此在电子交易系统中，计算机系统可以是集成模块。该计算机系统也可以是作为信息提取系统来单独出售的独立模块。Therefore, in the electronic trading system, the computer system can be an integrated module. The computer system can also be an independent module sold separately as an information retrieval system.

在本发明的第三方面中，上述目标和其它目标通过用于产生待发送给远程终端的更新数据集的方法实现，该更新数据集包括描述包含分类数据元的第一数据集与包含分类数据元的第二数据集之间差异的运算符，所述方法包括下列步骤：In a third aspect of the invention, the above objects and others are achieved by a method for generating an updated data set to be sent to a remote terminal, the updated data set comprising a description of a first data set containing classified data elements and a first data set containing classified data An operator for the difference between a second dataset of elements, the method comprising the steps of:

-将第一数据集中的数据元与第二数据集中的数据元顺次地比较，第一次比较的结果控制第一数据集中的哪个数据元与第二数据集中的哪个数据元将在第二次比较中进行比较，- sequentially compare the data elements in the first data set with the data elements in the second data set, the result of the first comparison controls which data element in the first data set and which data element in the second data set will be in the second data set compared in the second comparison,

-在每次比较之后，将变化参数在存储器中进行更新，- after each comparison, the changed parameters are updated in memory,

-在检测到第二数据集中的数据元与第一数据集中的数据元一致时，启动选择过程，该选择过程基于存储在存储器中的变化参数来确定运算符，- upon detection that a data element in the second data set coincides with a data element in the first data set, initiating a selection process which determines an operator on the basis of varying parameters stored in memory,

-将确定的运算符存储在存储器中，从而产生待发送给远程终端的更新数据集，该更新数据集包括描述第一数据集与第二数据集之间差异的运算符。- Storing the determined operator in a memory, resulting in an updated data set to be sent to the remote terminal, the updated data set comprising the operator describing the difference between the first data set and the second data set.

该方法所具有的优点是，使得诸如交易系统的计算机系统可以通过使用更少的CPU时间而更有效地产生更新数据集，因为本发明例如使得通过从头到尾将每个数据集只运行一遍就可以对第一数据集和第二数据集进行比较。This method has the advantage that it enables a computer system such as a trading system to generate update data sets more efficiently by using less CPU time, since the invention, for example, allows each data set to be updated by running each data set only once from start to finish. A comparison can be made between the first data set and the second data set.

该方法可以进一步包括关联所确定的运算符来比较数据元的步骤。这样，系统可以跟踪运算符应该被用于哪个数据元。The method may further comprise the step of comparing the data elements in association with the determined operator. This way, the system can keep track of which data element the operator should be applied to.

该方法也可以包括这样的步骤：从包括下列运算符：添加运算符、删除运算符、替换运算符的组中确定至少一个运算符。通过基于变化参数来确定这些运算符可以产生更新数据集。如下面在本文献中所描述的那样，可以将这些运算符结合起来。The method may also include the step of determining at least one operator from the group consisting of the following operators: add operator, delete operator, replace operator. By determining these operators based on changing parameters an updated data set can be generated. These operators can be combined as described below in this document.

如上所述，变化参数可以包括与第一数据集相关的第一计数器和与第二数据集相关的第二计数器。因此，该方法可以进一步包括基于与第一数据集相关的第一计数器和与第二数据集相关的第二计数器之间的关系来确定运算符的步骤。As mentioned above, the variation parameters may include a first counter associated with the first data set and a second counter associated with the second data set. Accordingly, the method may further comprise the step of determining the operator based on the relationship between the first counter associated with the first data set and the second counter associated with the second data set.

优选地，每个数据元都包括键和数据部分，该方法可以进一步包括将第一数据集的数据元的键和数据部分中至少一个与第二数据集的数据元的键和数据部分中至少一个进行比较的步骤。Preferably, each data element comprises a key and a data portion, the method may further comprise combining at least one of the key and the data portion of the data elements of the first data set with at least one of the key and the data portion of the data elements of the second data set A step for comparison.

如早先提到的，第一数据集和第二数据集可以包括随时间变化的动态信息。As mentioned earlier, the first data set and the second data set may include dynamic information that changes over time.

在本发明的第四方面中，上述目标和其它目标通过根据任何一个前述方面和/或实施例的计算机程序产品来实现，该计算机程序产品存储在数据载体上。In a fourth aspect of the present invention, the above objects and others are achieved by a computer program product according to any one of the preceding aspects and/or embodiments, stored on a data carrier.

本发明的这些和其它方面从后面所述的实施例中变得明显并且参照后面的实施例进行解释。These and other aspects of the invention will be apparent from and explained with reference to the embodiments described hereinafter.

附图说明 Description of drawings

图1示出计算机网络的概貌，在该计算机网络中可以使用本发明。Figure 1 shows an overview of a computer network in which the invention can be used.

图2示出包括存储器、第一数据集、第二数据集、变化参数、比较器、选择器、更新数据集和接口的电子装置。Figure 2 shows an electronic device comprising a memory, a first data set, a second data set, a variation parameter, a comparator, a selector, an update data set and an interface.

图3示出包括数据元的数据集，以及包括键和数据的数据元。Figure 3 shows a data set including data elements, and data elements including keys and data.

图4示出用于迭代在文中所述的数据集的算法。Figure 4 shows the algorithm used to iterate the dataset described in the text.

图5示出数据集A和数据集B。Figure 5 shows dataset A and dataset B.

图6示出数据集A中的数据元与数据集B中的数据元的比较。Figure 6 shows a comparison of data elements in data set A with data elements in data set B.

图7示出数据集A中的数据元与数据集B中的数据元的比较。Figure 7 shows a comparison of data elements in data set A with data elements in data set B.

图8示出数据集A中的数据元与数据集B中的数据元的比较。FIG. 8 shows a comparison of data elements in data set A and data elements in data set B. FIG.

图9示出数据集A中的数据元与数据集B中的数据元的比较。FIG. 9 shows a comparison of data elements in data set A with data elements in data set B. FIG.

图10示出数据集A中的数据元与数据集B中的数据元的比较。FIG. 10 shows a comparison of data elements in data set A with data elements in data set B. FIG.

图11示出由选择方法选择的第一运算符。Fig. 11 shows the first operator selected by the selection method.

图12示出数据集A中的数据元与数据集B中的数据元的比较。FIG. 12 shows a comparison of data elements in data set A with data elements in data set B. FIG.

图13示出数据集A中的数据元与数据集B中的数据元的比较。FIG. 13 shows a comparison of data elements in data set A with data elements in data set B. FIG.

图14示出数据集A中的数据元与数据集B中的数据元的比较。FIG. 14 shows a comparison of data elements in data set A with data elements in data set B. FIG.

图15示出数据集A中的数据元与数据集B中的数据元的比较。FIG. 15 shows a comparison of data elements in data set A with data elements in data set B. FIG.

图16示出数据集A中的数据元与数据集B中的数据元的比较。FIG. 16 shows a comparison of data elements in data set A with data elements in data set B. FIG.

图17示出数据集A中的数据元与数据集B中的数据元的比较。FIG. 17 shows a comparison of data elements in data set A and data elements in data set B. FIG.

图18示出由选择方法选择的附加运算符。Fig. 18 shows additional operators selected by the selection method.

图19示出包括由选择方法选择的运算符的更新数据集。Fig. 19 shows an update data set including operators selected by a selection method.

图20示出使用运算符之前的数据集A和数据集B。FIG. 20 shows Dataset A and Dataset B before operators are used.

图21示出更新数据集中的一行。Figure 21 shows updating a row in a data set.

图22示出应用第一运算符之后的数据集A的变化。Fig. 22 shows the change of data set A after applying the first operator.

图23示出更新数据集中的一行。Figure 23 shows updating a row in a data set.

图24示出应用第二运算符之后的数据集A的变化。Fig. 24 shows the change of the data set A after applying the second operator.

图25示出更新数据集中的一行。Figure 25 shows updating a row in a data set.

图26示出应用第三运算符之后的数据集A的变化。Fig. 26 shows the change of the data set A after applying the third operator.

图27示出更新数据集中的一行。Figure 27 shows updating a row in a data set.

图28示出应用第四运算符之后的数据集A的变化。Fig. 28 shows the change of the data set A after applying the fourth operator.

图29示出将运算符应用于数据集A之后的结果。Figure 29 shows the result after applying the operator to dataset A.

图30示出选择方法B。FIG. 30 shows selection method B.

具体实施方式 Detailed ways

图1示出包括远程终端1、中央计算机2以及网关或者路由器3的计算机网络。在不同的装置之间，存在通过具有不同粗细的线4示出的连接。所述粗细表示带宽(数据速率)。粗线具有高的数据速率，而细线具有低的数据速率。图1中的3个前端计算机在屏幕上有字母，这表示该前端计算机是属于用户A、B和C的远程终端。通过包括线4和诸如路由器等等的网络装置的网络将更新数据集6从中央计算机2发送给远程终端1。更新数据集6在根据本发明的计算机系统5中产生。FIG. 1 shows a computer network comprising a remote terminal 1 , a central computer 2 and a gateway or router 3 . Between the different devices there are connections shown by lines 4 with different thicknesses. The thickness indicates bandwidth (data rate). Thick lines have a high data rate, while thin lines have a low data rate. The 3 front-end computers in Fig. 1 have letters on the screen, which means that the front-end computers are remote terminals belonging to users A, B and C. The update data set 6 is sent from the central computer 2 to the remote terminal 1 through a network comprising lines 4 and network devices such as routers or the like. The update data set 6 is generated in the computer system 5 according to the invention.

图2示出根据本发明的计算机系统5。计算机系统5包括存储器10、比较器14、选择器15和用于发送更新数据集7的接口16。存储器10包括第一数据集11和第二数据集12。此外，该存储器包括用于存储变化参数13的区域、用于存储更新数据集7的区域和用于存储运算符18的区域。Figure 2 shows a computer system 5 according to the invention. The computer system 5 comprises a memory 10 , a comparator 14 , a selector 15 and an interface 16 for sending the update data set 7 . The memory 10 comprises a first data set 11 and a second data set 12 . Furthermore, the memory comprises an area for storing variation parameters 13 , an area for storing update data sets 7 and an area for storing operators 18 .

图3示出数据集11和该数据集中的数据元17的放大部分。在该图中，数据元17既包括键又包括数据部分。然而，在本发明的另一实施例中，数据元17可以至少包括键部分。FIG. 3 shows a data set 11 and a magnified portion of a data element 17 in the data set. In this figure, a data element 17 includes both a key and a data portion. However, in another embodiment of the invention, the data element 17 may include at least a key portion.

在下面，将参照附图详细阐述本发明。将要解释可以如何决定/选择运算符的细节。In the following, the present invention will be explained in detail with reference to the accompanying drawings. Details of how an operator can be decided/selected will be explained.

下面是用于选择运算符的方法的例子，所属运算符以有效方式描述了两个数据集之间的差异。Below is an example of a method for selecting an operator that describes the difference between two data sets in an efficient manner.

数据集由一个或者几个数据元组成，其中每个数据元都具有键和可能的数据部分。键与分类算法一起为每个数据元给出在数据集中的逻辑位置。A dataset consists of one or several data elements, where each data element has a key and possibly a data part. The key, together with the classification algorithm, gives each data element its logical position in the data set.

该方法基于：The method is based on:

(1)对有效的运算符集进行设计。(1) Design an effective operator set.

(2)对有效算法进行设计以评估两个数据集的差异，并且使用运算符集来描述这些差异。为了达到该目的，该算法对两个数据集使用一次遍历(one-pass traversing)。(2) Design efficient algorithms to evaluate the differences between two data sets, and use sets of operators to describe these differences. To achieve this, the algorithm uses one-pass traversing over the two datasets.

该方法使用下述运算符集来描述两个数据集之间的变化：This method uses the following set of operators to describe changes between two datasets:

方法介绍method introduction

在两个数据集中(键和数据部分都)一致的数据元是未变化的。该算法将未变化的数据元考虑为屏障(barrier)。该方法使用屏障来确认什么时候可以执行对运算符的优化。该方法基于这样的观察，即可以对在检测到屏障之前使用的运算符进行优化。Data elements that are consistent in both data sets (both key and data parts) are unchanged. The algorithm considers unchanged data elements as barriers. This method uses barriers to determine when optimizations on operators can be performed. The method is based on the observation that operators used before barriers are detected can be optimized.

假设数据集A和B，见图5。从数据集A中，可以应用若干运算符来实现数据集B。下面描述的算法和方法提取这些将数据集A转化成数据集B的运算符。Assuming data sets A and B, see Figure 5. From dataset A, several operators can be applied to implement dataset B. The algorithms and methods described below extract these operators that transform dataset A into dataset B.

在遍历数据集期间，优选使用下面的计数器#Del和#Add。这些计数器在初始时被设置成0或者任何其它满足相同目标的对应值。During traversal of the dataset, the following counters #Del and #Add are preferably used. These counters are initially set to 0 or any other corresponding value that satisfies the same goal.

然后根据键的分类命令，优选地从开始遍历数据集A和数据集B。Dataset A and Dataset B are then traversed, preferably from the beginning, according to the sort command of the key.

数据集A中的实际逻辑位置在下面称为APos，该APos包括针对数据集A的位置参数。数据集B中的实际逻辑位置在下面称为BPos，该BPos包括针对数据集B的位置参数。APos和BPos在初始时被设置为数据集A和B中的第一逻辑位置。The actual logical position in dataset A is referred to below as APos, which includes the position parameter for dataset A. The actual logical position in dataset B is referred to below as BPos, which includes the location parameter for dataset B. APos and BPos are initially set as the first logical positions in datasets A and B.

初始化：initialization:

设置#Add＝0和#Del＝0。Set #Add=0 and #Del=0.

将APos和BPos设置成数据集A和B中的第一逻辑位置。Set APos and BPos to the first logical position in datasets A and B.

设置APosStart＝APos和BPosStart＝BPos。Set APosStart=APos and BPosStart=BPos.

算法：将数据集A和B迭代，并计算#Add和#Del：Algorithm: Iterate data sets A and B, and calculate #Add and #Del:

1.将在逻辑位置APos处的A集数据元与在逻辑位置BPos处的B集数据元进行比较。1. Compare the set A data element at logical location APos with the set B data element at logical location BPos.

a.如果没有A数据元和B数据元，使用下面的方法B选择运算符，然后继续进行步骤2(完成(Done))。a. If there is no A data element and B data element, use method B below to select the operator, and then proceed to step 2 (Done).

b.如果键是一致的，但是数据部分不同，则#Add加1，#Del加1，APos加1，BPos加1。继续进行步骤1并且比较下一数据元。b. If the keys are the same, but the data parts are different, add 1 to #Add, add 1 to #Del, add 1 to APos, and add 1 to BPos. Proceed to step 1 and compare the next data element.

c.如果A数据元的键(根据分类命令)比B数据元的键更差，或者没有A数据元，则#Add加1并且BPos加1。继续进行步骤1并且比较下一数据元。c. If the key of the A data element (according to the sort command) is worse than the key of the B data element, or there is no A data element, then increment #Add and BPos. Proceed to step 1 and compare the next data element.

d.如果A数据元的键(根据分类命令)比B数据元的键更好，或者没有B数据元，则#Del加1并且APos加1。继续进行步骤1并且比较下一数据元。d. If the A data element has a better key (according to the sort command) than the B data element key, or if there is no B data element, increment #Del and APos. Proceed to step 1 and compare the next data element.

e.如果键和数据部分都一致，则发现了屏障。如果(#Add≠0或者#Del≠0)，则使用下面的方法B来选择运算符。继续：e. If both the key and data parts agree, the barrier has been found. If (#Add≠0 or #Del≠0), use method B below to select an operator. continue:

设置#Add＝0和#Del＝0。Set #Add=0 and #Del=0.

将APos加1并且将BPos加1。Increment APos by 1 and increase BPos by 1.

继续进行步骤1并且比较下一数据元。Proceed to step 1 and compare the next data element.

2.完成(Done)2. Done

方法B。选择运算符：将#Del与#Add比较。Method B. Selection operator: compares #Del to #Add.

a.如果#Del＞#Add，则选择下列运算符：a. If #Del > #Add, select the following operators:

在逻辑位置APosStart处的删除(N＝#Del)运算符。Delete (N=#Del) operator at logical location APosStart.

接着是：followed by:

来自B集元素的#Add个添加运算器，从逻辑位置APosStart开始。#Add addition operators from the elements of the B set, starting at the logical position APosStart.

b.如果#Del＝#Add，则选择下列运算符：b. If #Del = #Add, select the following operators:

来自B集元素的#Add个替换运算器，从逻辑位置APosStart开始。#Add replacement operators from set B elements, starting at logical position APosStart.

c.如果#Del＜#Add，则选择下列运算符：c. If #Del<#Add, select the following operators:

来自B集元素的#Del个替换运算器，从逻辑位置APosStart开始。#Del replacement operators from set B elements, starting at logical position APosStart.

接着是：followed by:

来自B集元素的(#Add-#Del)个添加运算器，从APosStart+#Del开始。(#Add-#Del) addition operators from elements of set B, starting from APosStart+#Del.

例子example

下面是参照附图描述不同方法步骤的例子。Below is an example describing the steps of the different methods with reference to the figures.

根据在图5中示出的表格，想象两个数据集A和B。优选基于键来对数据元进行分类，导致在逻辑位置1处的最高值。数据集A表示旧的数据集，而数据集B表示新的数据集。任务是将数据集A转变成数据集B。According to the table shown in Fig. 5, imagine two data sets A and B. Data elements are preferably sorted based on key, resulting in the highest value at logical position 1. Dataset A represents the old dataset and Dataset B represents the new dataset. The task is to transform dataset A into dataset B.

该方法的执行：Execution of the method:

初始化：initialization:

设置#Add＝0和#Del＝0。Set #Add=0 and #Del=0.

将APos设置在数据集A中的第一逻辑位置，将BPos设置在数据集B中的第一逻辑位置，Set APos at the first logical position in dataset A, set BPos at the first logical position in dataset B,

给定Apos＝1和Bpos＝1。Apos=1 and Bpos=1 are given.

设置APosStart＝APos和BPosStart＝BPos，Set APosStart=APos and BPosStart=BPos,

给定AposStart＝1和BPosStart＝1。AposStart=1 and BPosStart=1 are given.

该方法和系统开始初始化计数器，该计数器优选在对数据集进行迭代期间使用。The method and system begin by initializing a counter, which is preferably used during iteration over a data set.

然后开始迭代步骤，其中将变化参数#Add和#Del迭代并增加。Then an iterative step begins where the variable parameters #Add and #Del are iterated and incremented.

步骤1.将A集的数据元与B集的数据元进行比较。因为Apos和Bpos都被设置为1，这意味着比较在列表中的第一数据元。然而，如果以另一方式对数据集中的数据元进行分类，则该方法可能首先开始比较其它数据元。Step 1. Compare the data elements of set A with the data elements of set B. Since both Apos and Bpos are set to 1, this means that the first element in the list is compared. However, if the data elements in the data set are classified in another way, the method may start comparing other data elements first.

该比较描绘在图6中。在这种情况下，99比101更差(见算法步骤c)(在数据集A中的数据元(Apos＝1)比在数据集B中的数据元(Bpos＝1)更差)，因此#Add加1，假定#Add＝1。BPos加1，假定BPos＝2。This comparison is depicted in FIG. 6 . In this case, 99 is worse than 101 (see algorithm step c) (elements in dataset A (Apos=1) are worse than elements in dataset B (Bpos=1), so #Add is incremented by 1, assuming that #Add=1. Add 1 to BPos, assuming that BPos=2.

继续算法中的步骤1并比较下一数据元。Continue with step 1 in the algorithm and compare the next data element.

步骤2.将A集的数据元与B集的数据元进行比较。现在Bpos计数器已经加1，因此数据集A中在(Apos＝1)处的数据元与数据集B中在(Bpos＝2)处的数据元进行比较。Step 2. Compare the data elements of set A with the data elements of set B. Now the Bpos counter has been incremented by 1, so the data element at (Apos=1) in dataset A is compared with the data element at (Bpos=2) in dataset B.

该比较在图7中示出。99比100更差(见算法步骤c)，因此#Add加1，假定#Add＝2。BPos加1，假定BPos＝3。This comparison is shown in FIG. 7 . 99 is worse than 100 (see algorithm step c), so #Add is incremented by 1, assuming #Add=2. Add 1 to BPos, assuming that BPos=3.

步骤3.将A集的数据元与B集的数据元进行比较。与上述步骤相似。Step 3. Compare the data elements of set A with the data elements of set B. Similar to the steps above.

该比较在图8中示出。99好于97(见算法步骤d)，因此#Del加1，假定#Del＝1。APos加1，假定Apos＝2。This comparison is shown in FIG. 8 . 99 is better than 97 (see algorithm step d), so #Del is incremented by 1, assuming #Del=1. Add 1 to APos, assuming Apos=2.

步骤4.将A集的数据元与B集的数据元进行比较。该比较在图9中示出。98好于97(见算法步骤d)，因此#Del加1，假定#Del＝2。APos加1，假定Apos＝3。Step 4. Compare the data elements of set A with the data elements of set B. This comparison is shown in FIG. 9 . 98 is better than 97 (see algorithm step d), so #Del is incremented by 1, assuming #Del=2. Add 1 to APos, assuming Apos=3.

步骤5.将A集的数据元与B集的数据元进行比较。该比较在图10中示出。键和数据部分都相同(见算法步骤e)，因此已经到达了屏障。Step 5. Compare the data elements of set A with the data elements of set B. This comparison is shown in FIG. 10 . Both key and data parts are the same (see algorithm step e), so the barrier has been reached.

使用方法B来启动选择器并选择运算符。Use method B to fire up the picker and select the operator.

步骤6.将#Del与#Add进行比较。根据方法B，步骤b可以在该情况下应用：Step 6. Compare #Del with #Add. According to method B, step b can be applied in this case:

(b)#Del(2)＝#Add(2)并且因此选择下列运算符：(b) #Del(2)=#Add(2) and thus select the following operators:

替换2(#Add)运算符，从B集选择数据，从逻辑位置AposStart＝1开始。Replace 2(#Add) operator, select data from set B, start from logical position AposStart=1.

所选择的运算符在图11中示出。The selected operators are shown in FIG. 11 .

步骤7.对数据集A和数据集B中的数据元的迭代和比较继续进行。Step 7. The iteration and comparison of the data elements in dataset A and dataset B continues.

设置#Add＝0和#Del＝0。Set #Add=0 and #Del=0.

为了达到屏障将计数器增加，APos加1并且BPos加1，假定Apos＝4并且Bpos＝4。To reach the barrier the counter is incremented, APos is incremented and BPos is incremented by 1, assuming Apos=4 and Bpos=4.

设置APosStart＝APos并且BPosStart＝BPos，假定APosStart＝4并且BPosStart＝4。Set APosStart=APos and BPosStart=BPos, assuming APosStart=4 and BPosStart=4.

继续算法中的步骤1并且比较下一数据元。Continue with step 1 in the algorithm and compare the next data element.

步骤8.将A集的数据元与B集的数据元进行比较。该比较在图12中示出。96好于93(见算法步骤d)，因此#Del加1，假定#Del＝1。APos加1，假定APos＝5。Step 8. Compare the data elements of set A with the data elements of set B. This comparison is shown in FIG. 12 . 96 is better than 93 (see algorithm step d), so #Del is incremented by 1, assuming #Del=1. Add 1 to APos, assuming APos=5.

步骤9.将A集的数据元与B集的数据元进行比较。该比较在图13中示出。95好于93(见算法步骤d)，因此#Del加1，假定#Del＝2。APos加1，假定APos＝6。Step 9. Compare the data elements of set A with the data elements of set B. This comparison is shown in FIG. 13 . 95 is better than 93 (see algorithm step d), so #Del is incremented by 1, assuming #Del=2. Add 1 to APos, assuming that APos=6.

步骤10.将A集的数据元与B集的数据元进行比较。该比较在图14中示出。94好于93(见算法步骤d)，因此#Del加1，假定#Del＝3。APos加1，假定APos＝7。Step 10. Compare the data elements of set A with the data elements of set B. This comparison is shown in FIG. 14 . 94 is better than 93 (see algorithm step d), so #Del is incremented by 1, assuming #Del=3. Add 1 to APos, assuming that APos=7.

步骤11.将A集的数据元与B集的数据元进行比较。该比较在图15中示出。键一致，但是数据部分不同(10不等于15)(见算法步骤b)。Step 11. Compare the data elements of set A with the data elements of set B. This comparison is shown in FIG. 15 . The keys are the same, but the data part is different (10 is not equal to 15) (see algorithm step b).

#Add加1，假定#Add＝1。#Add is incremented by 1, assuming that #Add=1.

#Del加1，假定#Del＝4。Add 1 to #Del, assuming #Del=4.

APos加1，假定APos＝8。Add 1 to APos, assuming APos=8.

BPos加1，假定BPos＝5。Add 1 to BPos, assuming that BPos=5.

步骤12.将A集的数据元与B集的数据元进行比较。该比较在图16中示出。没有B数据元(为空)(见算法步骤d)。Step 12. Compare the data elements of set A with the data elements of set B. This comparison is shown in FIG. 16 . There is no B data element (empty) (see algorithm step d).

#Del加1，假定#Del＝5。Apos加1，假定APos＝9。Add 1 to #Del, assuming #Del=5. Add 1 to Apos, assuming that APos=9.

步骤13.将A集的数据元与B集的数据元进行比较。该比较在图17中示出。数据集A中在APos(9)处的数据元和数据集B中在BPos(5)处的数据元为空，因此应用该算法中的步骤a。该算法中的步骤a启动选择器，因此应用如在步骤14中所述的方法B。Step 13. Compare the data elements of set A with the data elements of set B. This comparison is shown in FIG. 17 . The element at APos(9) in dataset A and the element at BPos(5) in dataset B are empty, so step a in the algorithm is applied. Step a in the algorithm activates the selector, so method B as described in step 14 applies.

在该选择之后继续进行步骤2(完成)。After this selection proceed to step 2 (Finish).

步骤14.通过使用选择方法B，将计数器#Del与计数器#Add进行比较。Step 14. By using selection method B, compare counter #Del with counter #Add.

根据选择方法B中的选择步骤(a)，#Del(5)＞#Add(1)，因此优选如下选择运算符：According to selection step (a) in selection method B, #Del(5)>#Add(1), so it is preferable to select the operator as follows:

在逻辑位置APosStart(4)的删除5(#Del)运算符。Delete 5 (#Del) operator at logical location APosStart(4).

接着是：followed by:

来自B集数据元的添加1(#Add)运算符，从逻辑位置APosStart(4)开始。Add 1 (#Add) operator from set B data element, starting from logical position APosStart(4).

所选择的运算符在图18中示出。The selected operators are shown in FIG. 18 .

步骤15.该算法中的最后步骤2(完成)如上所述在步骤14中实现。Step 15. The last step 2 (Completion) in the algorithm is implemented in step 14 as described above.

在完成该算法的迭代和方法B之后，描述从数据集A到数据集B的变化的运算符的全部数量被存储在存储器中，优选与对应的逻辑位置、键和数据部分一起存储。产生包括运算符、逻辑位置、键和数据部分的更新数据集，如在图19中示出的那样。After completion of iterations of the algorithm and method B, the full number of operators describing the change from dataset A to dataset B is stored in memory, preferably together with the corresponding logical locations, keys and data parts. An update data set including operator, logical location, key and data parts is generated as shown in FIG. 19 .

在另一实施例中，可以通过键来表示逻辑位置。在这种情况下，接收更新数据集的终端优选包括逻辑位置如何相互关联以便能够将数据元分类的信息。因此，逻辑位置不必是1、2、3...等等。也可以由A、B、C...等等来表示，只要位置具有相互关联以利于分类就可以。In another embodiment, logical locations may be represented by keys. In this case, the terminal receiving the updated data set preferably includes information on how the logical positions are related to each other in order to be able to classify the data elements. Therefore, the logical position does not have to be 1, 2, 3...etc. It can also be represented by A, B, C... and so on, as long as the positions are related to each other to facilitate classification.

利用对数据集A和对包括如在图19中示出的运算符的更新数据集的认识，可以通过从开始至结束顺次地每次使用一个运算符来生成数据集B。Using the knowledge of the data set A and the updated data set including the operators as shown in FIG. 19 , the data set B can be generated by sequentially using one operator at a time from the beginning to the end.

利用附图20-29在下面示出如何这样做的步骤方法说明。A step-by-step method description of how to do this is shown below using Figures 20-29.

图20示出在更新数据集被应用于数据集A之前的数据集A和数据集B的情况。因此，通过使用图19中的更新数据集，下面的步骤从数据集A生成数据集B。FIG. 20 shows the data set A and the data set B before the update data set is applied to the data set A. Therefore, the following steps generate dataset B from dataset A by using the updated dataset in Figure 19.

通过将在图21中示出的图19中的更新数据集的第一行应用于数据集A，获得了如在图22中示出的经过修改的数据集A。By applying the first row of the updated data set in FIG. 19 shown in FIG. 21 to data set A, a modified data set A as shown in FIG. 22 is obtained.

通过将在图23中示出的图19中的更新数据集的第二行应用于数据集A，获得了如在图24中示出的经过修改的数据集A。By applying the second row of the updated data set in FIG. 19 shown in FIG. 23 to data set A, a modified data set A as shown in FIG. 24 is obtained.

通过将在图25中示出的图19中的更新数据集的第三行应用于数据集A，获得了如在图26中示出的经过修改的数据集A。By applying the third row of the updated data set in FIG. 19 shown in FIG. 25 to data set A, a modified data set A as shown in FIG. 26 is obtained.

通过将在图27中示出的图19中的更新数据集的第四行应用于数据集A，获得了如在图28中示出的经过修改的数据集A。By applying the fourth row of the updated data set in FIG. 19 shown in FIG. 27 to data set A, a modified data set A as shown in FIG. 28 is obtained.

如在图29中示出的，运算符现在已经将数据集A转换成数据集B。As shown in FIG. 29, the operator has now transformed Data Set A into Data Set B.

在上述说明中，术语“包括”不排除其它数据元或者步骤，并且“一个”不排除多个。In the above description, the term "comprising" does not exclude other data elements or steps, and "a" does not exclude a plurality.

此外，术语“包含”不排除其它数据元或者步骤。Furthermore, the term "comprising" does not exclude other data elements or steps.

Claims

1. A computer system for generating an updated data set to be sent to a remote terminal, the updated data set including a difference between a first data set containing classified data elements and a second data set containing classified data elements operator determined from the group consisting of the following operators: addition operator, deletion operator, replacement operator,

The computer system includes:

a memory comprising a first data set and a second data set,

A comparator connected to the memory for sequentially comparing data elements in the second data set with data elements in the first data set, wherein a data element in the second data set is compared with a data element in the first data set The result of the first comparison between elements controls which element in the first dataset will be compared with which element in the second dataset in the second comparison in that element's comparison order, after each comparison updating change parameters in the memory, and activating the selector when it is detected that the data elements in the second data set are consistent with the data elements in the first data set, wherein the change parameters include the first counter associated with the first data set and the The second counter associated with the second data set also includes a first position parameter associated with the first data set and a second position parameter associated with the second data set, for comparing the comparator between the first data set and the second data set centralized location for tracking,

a said selector connected to a memory and a comparator for determining an operator based on a variable parameter stored in the memory and storing the determined operator in the memory,

This results in an updated data set to be sent to the remote terminal comprising an operator describing the difference between the first data set and the second data set.

2. The computer system of claim 1, further comprising a communicator associated with the selector for generating and sending an update message including the update data set.

3. The computer system of claim 1, wherein the first data set and the second data set include dynamic data elements that change over time.

4. The computer system of claim 1, wherein the second data set is a subsequent version of the first data set.

5. The computer system of claim 1, wherein the selector selects an operator based on a relationship between a first counter associated with a first data set and a second counter associated with a second data set.

6. The computer system of claim 1, wherein each data element includes a key and a data portion.

7. An electronic transaction system comprising the computer system according to claim 1.

8. A method for generating an updated data set to be sent to a remote terminal, said updated data set comprising an operation describing the difference between a first data set comprising classified data elements and a second data set comprising classified data elements operator, said operator is determined from the group comprising the following operators: add operator, delete operator, replace operator, said method comprising the following steps:

- sequentially comparing data elements in the first data set with data elements in the second data set, wherein the result of the first comparison between a data element in the second data set and a data element in the first data set controls which data element in the first data set is compared with which data element in the second data set in the second comparison in this data element comparison order,

- After each comparison, the change parameter is updated in the memory, wherein the change parameter includes a first counter related to the first data set and a second counter related to the second data set, and also includes a counter related to the first data set the first positional parameter associated with the set and the second positional parameter associated with the second data set,

- in case of detection that a data element in the second data set coincides with a data element in the first data set, initiating a selection process which determines an operator on the basis of variation parameters stored in memory,

- storing the determined operator in memory,

An updated data set is thereby generated to be sent to the remote terminal, the updated data set comprising an operator describing the difference between the first data set and the second data set.

9. The method of claim 8, further comprising the step of associating the determined operator with the compared data elements.

10. The method of claim 8, further comprising the step of determining an operator based on a relationship between a first counter associated with a first data set and a second counter associated with a second data set.

11. The method of claim 8, wherein each data element comprises a key and a data part, the method further comprising combining the key and data part of the data elements of the first data set with the keys and data parts of the data elements of the second data set Data sections are compared separately.

12. The method of claim 8, wherein the first data set and the second data set include dynamic information that changes over time.