WO2009065977A1

WO2009065977A1 - Multi-version cache with relaxed isolation for replicated and non-replicated systems

Info

Publication number: WO2009065977A1
Application number: PCT/ES2008/000636
Authority: WO
Inventors: Ricardo JIMÉNEZ PERIS; Marta PATIÑO MARTÍNEZ
Original assignee: Universidad Politécnica de Madrid
Priority date: 2007-11-23
Filing date: 2008-10-10
Publication date: 2009-05-28
Also published as: ES2331039A1

Abstract

The invention relates to a cache system for application servers in multi-layer transactional systems which holds one or more versions of each data item, corresponding to the values the item took in each committed transaction. The invention also relates to a method of managing the cache in replicated and non-replicated systems, which provides each executed transaction with an image of the database with the content it had when the transaction started. The method guarantees cache transparency, i.e. it guarantees that a system with the multi-version cache provides the same consistency as a system without a cache which always accesses the data through the data store. The method for a replicated system provides the same consistency as in a non-replicated system, thereby guaranteeing replication transparency.

Description

Technical Sector

Multi-version cache with relaxed isolation for replicated and non-replicated systems


The invention falls within the field of data replication to provide availability (fault tolerance) and scalability (increased performance through the addition of new nodes), applicable to software systems such as those based on multi-layer architectures and service-oriented architectures. Within multi-layer architectures it is applicable to stateful layers such as the application server layer and the database server layer.

State of the Art

Transactions are a mechanism proposed to guarantee the consistency of data in databases and other information systems such as application servers, multi-layer architectures, service-oriented architectures and web services. Transactions provide two types of consistency: isolation and failure atomicity. Isolation determines what type of consistency is provided when a set of transactions execute concurrently and may access common data. Failure atomicity guarantees that the transaction is executed as a unit: either it executes successfully in its entirety (in which case the transaction is said to commit), or, if there is any failure (in which case the transaction is said to abort), the final result is as if it had never been applied. Transactions provide an additional property known as durability, which dictates that once a transaction has committed, the updates it has made cannot be lost even in the case of system failures (e.g. a crash of the node on which the transactional system is running). There are different isolation levels such as serializability, snapshot isolation [Berenson95], read committed, etc. Serializability provides the highest level of consistency, ensuring that the concurrent execution of transactions is equivalent to a sequential execution of them. That is, the concurrent system behaves like a sequential system without any concurrency, which greatly simplifies the development of transactional applications. Serializability has an inherent cost, since it implies that writes to a data item conflict with both reads and writes of that item. In particular, conflicts between writes and reads, which are very frequent in most systems, greatly restrict the potential concurrency in the system and therefore its maximum performance. For this reason, other more relaxed isolation levels have been proposed.

Snapshot isolation [Berenson95] is one of the most popular isolation levels, since it provides a consistency very close to serializability but eliminates conflicts between reads and writes, retaining only conflicts between writes, which are less frequent than read-write conflicts. This gives systems better performance thanks to the greater potential concurrency. Snapshot isolation gives transactions the illusion that the database (and therefore its data and their values) is frozen at the point at which the transaction starts, as if a photograph (snapshot) of it had been taken. Snapshot isolation does not allow two concurrent transactions to modify data in common, as is also the case with serializability. However, it is possible for one transaction to read data item A and modify data item B while another transaction modifies A and reads B, a situation not allowed by serializability. When implemented in databases, snapshot isolation generally employs multiple versions of the data.
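By way of illustration only, the difference between the two conflict rules described above can be sketched as follows. The names `Txn`, `conflicts_under_snapshot_isolation` and `conflicts_under_serializability` are hypothetical, and the serializability check is a simplified, lock-style approximation (any read-write or write-write overlap between concurrent transactions counts as a conflict), not a full serializability test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Txn:
    """Read set and write set of one transaction (hypothetical helper type)."""
    reads: frozenset
    writes: frozenset


def conflicts_under_snapshot_isolation(t1: Txn, t2: Txn) -> bool:
    # Snapshot isolation: concurrent transactions conflict only on common writes.
    return bool(t1.writes & t2.writes)


def conflicts_under_serializability(t1: Txn, t2: Txn) -> bool:
    # Simplified rule: writes also conflict with reads of the same item.
    return bool(
        (t1.writes & t2.writes)
        or (t1.reads & t2.writes)
        or (t1.writes & t2.reads)
    )


# The situation from the text: one transaction reads A and modifies B,
# the other modifies A and reads B.
t1 = Txn(reads=frozenset({"A"}), writes=frozenset({"B"}))
t2 = Txn(reads=frozenset({"B"}), writes=frozenset({"A"}))
```

With these two transactions, the snapshot isolation rule reports no conflict while the simplified serializability rule does, which is the case the paragraph above singles out.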

Read committed is one of the most relaxed isolation levels. It only guarantees that a transaction always reads the last committed values of the data. This isolation level has numerous anomalies [Berenson95] that make it difficult to program concurrent applications on top of it.

Modern information systems are usually structured in multiple layers (tiers). Each layer is characterized by a type of server specialized in one type of processing. The two most relevant layers are the data storage layer and the application server layer. The data storage layer is usually represented by a database server, a file server, a persistent object repository, etc. The application server layer provides the functionality needed to deploy the business logic of user applications, typically providing the infrastructure necessary for their construction and deployment, such as a cache of the data storage server to increase efficiency, transactional semantics (typically taking charge of relating the transactions of the application server to those of the data storage server, typically a database), connections to data storage servers, session management for the clients of the application, etc. Session management makes it possible to maintain volatile state between requests from the same client (e.g. the shopping cart in an Internet sales service). Transaction management makes it possible to offer transactional semantics to applications. An important point is the transactional consistency offered by the combination of the application server and a database (or other transactional data repository). At present, this combination offers a formally characterized consistency only for serializability. Current application servers do not provide a formally characterized consistency for other, lower isolation levels such as snapshot isolation or read committed. The present invention proposes a method for combining an application server and a database (or other transactional data repository) that together provide snapshot isolation, this being one of the new contributions of the invention.

There may be multiple application server layers, in which each application server provides specific functionality for a given field (e.g. dynamic content for web pages). Between two different application server layers, one can always be regarded as an application server and the other as a data storage server. In the description of the present invention, the term application server is used in its most general sense, characterized by its ability to maintain a cache of the data of the data storage layer and by its support for transactions.

Replication is the main technique for providing fault tolerance and scalability (increasing the maximum throughput of the system by increasing the number of machines, or nodes, in the system). Replication consists of executing multiple instances of a piece of software, called replicas (usually in a distributed system, with each instance on a different node), in a coordinated manner so that requests made to the system by its clients can be served even if some of the replicas fail (the node goes down, the software hangs, the node is disconnected from the network, etc.). In this way, replication tolerates faults by introducing processing redundancy.

Replication can also be used to increase the throughput of a system. Taking advantage of the redundancy of the software in a distributed system, the work to be done can be distributed among the different replicas (instances of the software) so that each node performs a fraction of the overall work of the system. Replication for increased throughput has been applied mainly to stateless services, that is, services whose result is independent of the requests received in the past and depends only on the parameters of the request and, at most, on a persistent read-only state. This approach has been used, for example, by grid systems. In databases, scalable replication solutions have been developed during the last decade. The different solutions can be classified according to how they execute transactions, when they propagate the updates of transactions and the isolation level they provide. Transactions can be executed using primary-backup techniques [Plattner04] or update-everywhere techniques (update in any replica) [Kemme00, Patiño05, Lin05]. The primary-backup technique allows update transactions (those that modify the database) to be executed in a single replica only, called the primary. The rest of the replicas, called backups, can only execute read-only transactions. This technique has the disadvantage that the primary becomes a bottleneck, since it has to process all update transactions. In the update-everywhere technique, all replicas can process all types of transactions, including update transactions, which avoids the bottleneck of the primary in the primary-backup technique.

Database replication techniques can also be classified according to when and how updates are propagated. This classification is relevant for replication techniques in which the update transaction is processed in a single replica, the local replica (to which the client is connected), and the updated data is then propagated to the rest of the replicas (remote replicas). The propagation of updates can be eager or lazy. Eager replication propagates the updates atomically with the commit of the transaction in the local replica, so that either all active replicas apply the updates of the transaction or none does. Consistency between replicas is also guaranteed, since all updates are applied in a coordinated manner. In lazy replication the updates are propagated independently of the commit of the transaction in the local replica, so the state of the replicas can diverge, resulting in inconsistencies. That is, a client can observe the latest state of a data item and later observe a previous state or, in general, clients can observe sequences of states that would not be possible in a non-replicated system. The consistency of a replicated database is based on 1-copy correctness, which requires that the behavior visible to the clients of a replicated system be equivalent to that of a non-replicated system. The first definition of 1-copy correctness is 1-copy serializability [Bernstein87], which dictates that the replicated system must behave like a non-replicated serializable system. That is, the replicated execution must be equivalent to a non-replicated execution in a system providing serializability as its isolation level. More recently, the notion of 1-copy snapshot isolation [Lin05] has been defined, in which a replicated database has a behavior equivalent to that of a database with snapshot isolation.

It is important to note that all these replication techniques apply to database replication. That is, they only consider replication of the data store and the consistency between its different replicas. The present invention proposes a procedure for the replication of systems with multiple layers in which a copy of the data, generally called a cache, is maintained in a layer other than the data storage layer. In this different context, the updates of the database replicas must be coordinated with the system replicas that keep a copy of the data in a cache (e.g. the application server) to ensure consistency between the data maintained in the different layers, an issue not addressed by database replication.

The replication of multi-layer systems has been investigated in recent years. The first investigations were made in the context of CORBA, and their focus was to provide active, semi-active or passive replication of CORBA servers. These techniques, known as process replication, aim for the flow of execution to be the same in the different replicas. Process replication is more restrictive than data replication, since it requires not only that the replicas have the same state and that a coherent state be returned to the applications, but also that the flow of execution be identical. One of the most significant examples of CORBA replication is the Eternal system [Narasimhan02]. Process replication requires the replicated processes to be deterministic, which restricts the type of processes that can be executed. One of the main restrictions, that the process to be replicated be purely sequential, has recently been removed thanks to the deterministic scheduling of concurrent or multi-threaded servers [Jimenez00, Moser03, Moser03b]. Process replication only tolerates failures, providing availability, but does not provide scalability, unlike the data replication technique of the present patent, which allows the maximum throughput of the system to be increased by adding computers to the system. In leader-follower systems, other sources of non-determinism coming from the operating system are handled by a supervisor that intercepts the invocations to the operating system in the leader replica and communicates the non-deterministic decisions to the supervisors of the follower replicas, which are responsible for enforcing those decisions locally [Bressoud98]. Other process replication techniques seek to provide primary-backup replication transparently to the application [Vigna95].

In other types of multi-layer architectures such as Java™ Enterprise Edition (JEE, formerly J2EE) [JEE] there are solutions that replicate the components without persistent state (session beans) and share the persistent state (the database) [Wu04]. The main problem of this solution is that the shared database becomes the bottleneck of the system and a single point of failure. This type of solution has also been applied recently in CORBA [Zhao05]. The present patent solves the problem of the shared-database bottleneck by simultaneously replicating the application server layer and the database.

Caching systems generally maintain a single version of the data. It has recently been proposed, in the context of web pages and similar systems, that the cache may hold multiple versions for the same identifier (e.g. web pages in different languages) and return to each client the version most appropriate to its needs [Jacobs04]. This versioning in the cache has a purpose radically different from that proposed in the present patent, which is applied to multi-layer transactional systems to provide snapshot isolation.

Garbage collection methods have also been proposed to eliminate unnecessary copies from the cache. One of these methods, proposed in [Mattis01] for caches holding multiple fragments of an object, organizes the cache in different areas and determines how to delete fragments of the object from the cache and how to reorganize them after deletion. The proposed patent, on the other hand, focuses on determining when versions can be eliminated from the cache without violating the properties of snapshot isolation.

Detailed Description of the Invention

The present invention presents a cache system for multi-layer transactional systems. Multi-layer systems with at least two layers are considered: a data store layer (hereinafter database) and another layer that maintains a cache of the data store (hereinafter application server). The problem addressed by the present invention is how to provide the clients of multi-layer transactional systems with execution of their transactions under snapshot isolation. Both the replicated and the non-replicated case are covered.

With current single-version cache systems on application servers, combined with databases that provide snapshot isolation, it is not possible to provide snapshot isolation to the transactions executed by the clients unless the cache is disabled. That is, snapshot isolation could only be satisfied if no copies of the data were kept in the cache and the data were always accessed through the database. Unfortunately, disabling the cache results in very substantial performance losses. The present invention proposes a cache system that maintains multiple versions of the data, with the values they had when transactions committed, which, together with a procedure for managing said cache for both non-replicated and replicated systems, provides snapshot isolation.

To avoid the aforementioned problem experienced by current application servers with a cache that maintains only one version of each data item (or, at least, not the versions corresponding to the values generated by each committed transaction), the present invention proposes a cache system that maintains multiple versions of each data item, corresponding to the values generated by each committed transaction. The invention also proposes a procedure for managing the data of the cache which guarantees that each transaction observes a state of the database corresponding to the one it had at the time the transaction started. The basic idea is that each time a transaction reads a data item, the cache returns the appropriate version of the item to meet the objective mentioned above. This procedure achieves cache transparency, that is, the system with the cache behaves the same as a system with a data store providing snapshot isolation and no cache, which is the objective of the proposed cache system and of the procedure for managing its data. In addition, the data management procedure in the cache is responsible for not allowing the commit of transactions that have modified data in common with other concurrent transactions already committed (that is, transactions that committed after the transaction in question started). In this way, states that would not respect cache transparency, that is, states that would never occur if the cache were removed, are avoided. A procedure for the replication of the system is also presented, which guarantees that the replicated system as a whole behaves equivalently to the non-replicated system, thus providing replication transparency.

One of the difficulties to be solved is that other concurrent transactions can modify a data item after a transaction has started. If a single version were used, the transaction could see the modification made by another transaction, thus violating snapshot isolation. Therefore, the cache system proposed in the present invention maintains multiple versions of each data item, and when a transaction is about to modify a data item in the cache, a private version of the item is created that only that transaction can see. In this way, the existing versions make it possible to offer the other running transactions the appropriate image of the database.

The private version of the data item is the one that the transaction modifies and reads in successive accesses. In this way the transaction can observe its own modifications, a necessary requirement for satisfying snapshot isolation. When the transaction is about to commit, it is verified that no already committed transaction that was concurrent with it has modified data in common. If that is the case, the transaction aborts instead of committing. If the transaction commits and there are other transactions in progress that have modified the same data or attempt to modify it, they are aborted.
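The commit-time check just described can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the names `Validator`, `Aborted`, `begin` and `commit` are hypothetical, and start and commit marks are assumed to be values of a single integer commit counter.

```python
class Aborted(Exception):
    """Raised when a transaction fails validation at commit time."""


class Validator:
    """Remembers committed write sets and rejects write-write conflicts."""

    def __init__(self):
        self.clock = 0        # number of commits so far (assumed mark scheme)
        self.committed = []   # (commit_mark, write_set) per committed transaction

    def begin(self):
        # The start mark is the value of the commit counter at start time.
        return self.clock

    def commit(self, start_mark, write_set):
        write_set = frozenset(write_set)
        for commit_mark, other_writes in self.committed:
            # A committed transaction was concurrent if it committed
            # after this transaction started.
            concurrent = commit_mark > start_mark
            if concurrent and other_writes & write_set:
                raise Aborted("a concurrent committed transaction wrote common data")
        self.clock += 1
        self.committed.append((self.clock, write_set))
        return self.clock
```

In this sketch, two transactions that start together and both write item `x` cannot both commit: the first to commit succeeds and the second raises `Aborted`. A transaction that starts after the first commit may write `x` freely.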

When a transaction commits, its updates become visible to the transactions that start after its commit. For this, the private versions of the data created by the transaction are made public. To determine which version each transaction should read so that it observes an adequate image of the database, the proposed procedure records when each transaction begins and when it commits, as well as which transaction has produced which versions. When the private versions of the data are made public during the commit of a transaction, they are labeled so that they are associated with that transaction.
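A minimal sketch of private versions being made public and labeled at commit is given below. The class `VersionedCache`, its methods and the integer marks are assumptions made for illustration. Commit-time validation is deliberately omitted to keep the example short.

```python
class VersionedCache:
    """Toy multi-version cache: public versions are labeled with commit marks."""

    def __init__(self):
        self.clock = 0     # commit counter used for start and commit marks
        self.public = {}   # key -> list of (commit_mark, value)

    def begin(self):
        # Record the start mark and an empty set of private versions.
        return {"start": self.clock, "private": {}}

    def write(self, txn, key, value):
        # Writes go to a private version visible only to this transaction.
        txn["private"][key] = value

    def read(self, txn, key):
        if key in txn["private"]:
            return txn["private"][key]   # a transaction sees its own updates
        visible = [(m, v) for m, v in self.public.get(key, []) if m <= txn["start"]]
        if not visible:
            return None
        return max(visible, key=lambda mv: mv[0])[1]

    def commit(self, txn):
        # Private versions become public, labeled with the commit mark.
        self.clock += 1
        for key, value in txn["private"].items():
            self.public.setdefault(key, []).append((self.clock, value))
        return self.clock
```

A transaction that started before the commit keeps reading the old image, while one that starts afterwards sees the newly published version.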

Since a transaction can read data that is not in the cache, the database has to be accessed to obtain it. In order for the database to return the appropriate version of the data, the transaction executed in the database is synchronized with the application server transaction. This is achieved by starting an associated transaction in the database at the same time as a transaction is started on the application server. In this way, when data that is not in the cache is read, the database provides the appropriate version of the data thanks to its snapshot isolation and the synchronization between both transactions. As the database is not assumed to provide information about its possible internal versioning, the application server labels the version read as unknown. When the application server transaction commits, the changes are first propagated to the database in the context of the associated database transaction, which is then also committed. If the transaction in the application server aborts, its associated transaction in the database is also aborted.
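The cache-miss path can be illustrated with the following sketch. `ToyDatabase` is a hypothetical stand-in that imitates snapshot isolation by copying its rows when a transaction begins. A real database would not work this way. `AppServerTransaction`, `read_miss` and the `UNKNOWN` label are likewise assumed names.

```python
UNKNOWN = "unknown"  # label for versions whose producing transaction is not known


class ToyDatabase:
    """Toy stand-in for a database providing snapshot isolation (assumption)."""

    def __init__(self, rows):
        self.rows = dict(rows)

    def begin(self):
        # Copying the rows freezes the image seen by the new transaction.
        return dict(self.rows)


class AppServerTransaction:
    """Application server transaction paired with a database transaction."""

    def __init__(self, cache, database):
        self.cache = cache                    # key -> list of (label, value)
        # The associated database transaction starts at the same time.
        self.db_snapshot = database.begin()

    def read_miss(self, key):
        # The item is not cached: read it through the associated transaction
        # and cache the result labeled as unknown.
        value = self.db_snapshot[key]
        self.cache.setdefault(key, []).append((UNKNOWN, value))
        return value
```

Even if another transaction changes the row after the application server transaction has started, the read returns the value from the frozen image, and that value is cached with the unknown label.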

When a transaction tries to read a data item that is in the cache, the system determines which version it should read. The version read is the most recent one corresponding to a transaction that committed before the start of the reading transaction. If none of the known versions meets this condition and there is a version labeled as unknown, that version is returned. The cache management guarantees that there will always be a suitable version for all running transactions by not removing versions from the cache until it is known that they will no longer be required by any running transaction.
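The read rule above, including the fallback to the version labeled as unknown, might be sketched as follows. `Version` and `select_version` are hypothetical names. Marks are assumed to be integers such that a version was committed before a transaction started exactly when its commit mark is less than or equal to the start mark, and `None` stands for the unknown label.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class Version:
    value: Any
    commit_mark: Optional[int]  # None means the version is labeled "unknown"


def select_version(versions: List[Version], start_mark: int) -> Optional[Version]:
    """Pick the version a transaction started at start_mark should read."""
    known = [
        v for v in versions
        if v.commit_mark is not None and v.commit_mark <= start_mark
    ]
    if known:
        # Most recent version committed before the transaction started.
        return max(known, key=lambda v: v.commit_mark)
    for v in versions:
        if v.commit_mark is None:
            return v  # fall back to the version labeled as unknown
    return None       # nothing suitable cached: read through the database


versions = [Version("from-db", None), Version("v3", 3), Version("v7", 7)]
```

For the list above, a transaction with start mark 5 reads `v3`, one with start mark 9 reads `v7`, and one with start mark 2 falls back to the unknown version.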

The creation of a data item is a special case of update. When a data item is created there are no versions of it, so a private version is created with no public versions. Deletion is another special case of update. If all versions of a data item were deleted when the deleting transaction commits, snapshot isolation would be violated, since transactions that started before the commit must see the state of the database in which the item still existed. Therefore, when a data item is deleted by a transaction, a tombstone version of it is created. When the transaction commits, the tombstone version is made public with its corresponding label. If a transaction reads a tombstone version, the result is as if it had not found the data item, that is, as if the item had been deleted. However, transactions that observe an earlier version will find the data item.
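Deletion through a tombstone version can be illustrated as below. `VersionedItem`, `TOMBSTONE` and the integer marks are assumptions made for this sketch, which for brevity returns `None` both for "not found" and for "deleted".

```python
TOMBSTONE = object()  # sentinel value standing for "deleted in this version"


class VersionedItem:
    """Public versions of one data item, labeled with commit marks."""

    def __init__(self):
        self.versions = []  # list of (commit_mark, value)

    def publish(self, commit_mark, value):
        self.versions.append((commit_mark, value))

    def delete(self, commit_mark):
        # Deleting does not drop older versions; it publishes a tombstone.
        self.publish(commit_mark, TOMBSTONE)

    def read(self, start_mark):
        visible = [(m, v) for m, v in self.versions if m <= start_mark]
        if not visible:
            return None
        _, value = max(visible, key=lambda mv: mv[0])
        # Reading a tombstone behaves as if the item did not exist.
        return None if value is TOMBSTONE else value


item = VersionedItem()
item.publish(1, "cart")
item.delete(4)
```

A transaction that started at mark 3 still reads `"cart"`, whereas one that started at mark 4 or later sees the item as deleted.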

As the number of versions in the cache grows, the procedure needs, in order to be practical, a mechanism to eliminate unnecessary copies from the cache, also known as garbage collection. It is vital that garbage collection does not eliminate versions whose removal could result in violations of snapshot isolation. Therefore, the proposed procedure deletes a version only when it is certain that the version will not be needed in the future. A version produced by a transaction is deleted only when all running transactions started after that transaction committed. When a version of a data item is discarded, the version of that item labeled as unknown, if it exists, can always be discarded as well.
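The following sketch shows one conservative way of applying such a rule. It is an interpretation, not the procedure of the invention: a version is dropped only when a newer public version is already visible to every running transaction, so that no running transaction can still need the older one. The function name `collect_garbage` and the integer marks are assumptions.

```python
def collect_garbage(versions, running_start_marks):
    """Return the versions to keep.

    versions: list of (commit_mark, value), sorted by ascending commit mark.
    running_start_marks: start marks of the transactions still running.
    """
    if not running_start_marks:
        # No running transactions: only the newest version can be read again.
        return versions[-1:]
    oldest_start = min(running_start_marks)
    keep_from = 0
    for index, (commit_mark, _) in enumerate(versions):
        if commit_mark <= oldest_start:
            # Newest version visible to the oldest running transaction so far;
            # everything before it is superseded for all running transactions.
            keep_from = index
    return versions[keep_from:]


versions = [(1, "a"), (4, "b"), (7, "c")]
```

With running transactions started at marks 5 and 8, version `(1, "a")` can be dropped because `(4, "b")` is visible to both. If a transaction started at mark 0 is still running, nothing is dropped.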

A complementary alternative to garbage collection is hibernating data from the cache. It is always possible to free cache memory by moving some data to another medium (for example, a disk or another server). In this case, all versions of the data item must be moved out. If a transaction attempts to access hibernated data, all versions must be brought back and the corresponding version must be determined. Hibernation and dehibernation can be done at any time.

It should be noted that in the proposed method, if the application server has enough memory to keep in the cache all the data (and its versions) used by the clients (the whole database, or at least all the data accessed simultaneously by the clients), then the database does not have to provide snapshot isolation: it can provide serializability or read committed isolation, or even provide no transactionality at all.

When the system is replicated, new challenges arise. On the one hand, snapshot isolation must continue to be provided. On the other hand, the replicated execution must be coordinated so that the result is equivalent to that of a non-replicated system (that is, replication introduces no anomalies or inconsistencies). In addition, it is very important that the replicated system be scalable, so that adding new nodes (and new replicas on these nodes) increases the maximum system performance. The replication model proposed in the present invention takes application server and database pairs as the unit of replication. In this way, each application server is connected to a single database (called its local database), and the application servers interact with each other to ensure consistency between the replicas and the caches, as well as replication transparency. The procedure for managing the cache in the replicated model is detailed below.

In the replicated model, each client connects to the application server of any one of the replicas. This is the local replica for the transactions the client executes, and the rest are remote replicas. Transactions are first executed locally and then undergo remote processing that involves all replicas. Local processing is very similar to the processing described above for the non-replicated model. The only steps that are treated differently are commit and garbage collection, detailed below.

When a local read-only transaction wants to commit (respectively, abort), it commits (respectively, aborts) both in the application server and in the database.

When a local update transaction wants to commit in the replicated model, a message is sent to all the replicas, including the local replica, with the updated data and the start mark of the transaction. This message is sent atomically to all replicas, so that it reaches all of them or none. In addition, the message must have the same relative order in all replicas, so that messages are processed in that order everywhere. This common relative order can be achieved by any of the methods known in distributed systems, such as total-order broadcast, the use of a sequencer, an agreement protocol, etc. When the message with the changes of a transaction is processed by a replica, the transaction is first validated. This validation verifies that the concurrent execution in different replicas does not violate replication transparency, and thus guarantees that the replicated execution of the committed transactions is equivalent to a non-replicated execution, in addition to satisfying snapshot isolation. The validation verifies that the transaction being validated has not modified data in common with other concurrent transactions already committed. If it meets this condition (that is, it passes validation) it is committed; otherwise it is aborted.
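Because every replica processes the same messages in the same order and the validation is deterministic, all replicas reach the same commit or abort decision without further coordination. The sketch below illustrates this under stated assumptions: `WriteSetMessage` and `Replica` are hypothetical names, the total-order delivery is simulated by iterating over one list, and start marks are assumed to be values of a commit counter that evolves identically at every replica.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WriteSetMessage:
    txn_id: str
    start_mark: int      # commit counter value when the transaction started
    writes: frozenset    # identifiers of the updated data items


class Replica:
    """Validates delivered messages; state depends only on delivery order."""

    def __init__(self):
        self.commit_count = 0
        self.history = []     # (commit_mark, writes) of committed transactions
        self.decisions = {}   # txn_id -> "commit" or "abort"

    def deliver(self, msg):
        conflict = any(
            mark > msg.start_mark and writes & msg.writes
            for mark, writes in self.history
        )
        if conflict:
            self.decisions[msg.txn_id] = "abort"
        else:
            self.commit_count += 1
            self.history.append((self.commit_count, msg.writes))
            self.decisions[msg.txn_id] = "commit"
        return self.decisions[msg.txn_id]


# Messages in the (simulated) total order in which every replica receives them.
ordered_messages = [
    WriteSetMessage("t1", 0, frozenset({"x"})),
    WriteSetMessage("t2", 0, frozenset({"x", "y"})),
    WriteSetMessage("t3", 0, frozenset({"z"})),
    WriteSetMessage("t4", 2, frozenset({"x"})),
]
replica_a, replica_b = Replica(), Replica()
for message in ordered_messages:
    replica_a.deliver(message)
    replica_b.deliver(message)
```

Here `t2` is aborted at both replicas because `t1`, which was concurrent with it, already committed a write to `x`, while `t4` commits because it started after both earlier commits.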

When committing the transaction, two cases are distinguished: the replica is the local replica of the transaction (the one that sent the message) or it is a remote replica (any of the others). In the local replica, all running transactions that have modified data in common with it, or that later attempt to modify it, are aborted. In a remote replica, the associated transaction is created in the database and the changes of the transaction have to be applied, generating the corresponding versions of the updated data in the cache. Likewise, all transactions that have modified data in common, or that later attempt to modify it, are aborted. In either case, the changes are applied to the database in the context of the associated transaction, and the transaction is committed in the application server and in the database.

The following describes in detail how to combine the replicated model with session replication. In many cases, application servers maintain conversational state with the client (known as the session, a volatile state that is kept between calls from the same client), such as stateful session beans in JEE. To provide high availability, session replication must be combined with the data replication proposed above. Thus, if an application maintains conversational state and the replica to which a conversational client is connected fails, the client can be reconnected transparently to another replica and continue its execution without losing the session. The way to combine session replication with the proposed data replication is to propagate the session state to other replicas after each client invocation.

Since the session is replicated only to obtain high availability, not scalability (increased performance by increasing the number of replicas), it is not necessary to replicate the session state in all replicas, but only in a subset of them.

Fault management is detailed next. Clients connect to the application server through an API or proxy generally provided by the application server (for example, in JEE through JNDI, the Java Naming and Directory Interface). Since the proxies are generated by the application server, they can incorporate the replication logic in a way that is completely transparent to the clients. The proxy uses some method to discover the available replicas, for example broadcast protocols such as IP multicast, or access to a server that keeps the list of active replicas. Using IP multicast, for instance, the proxy sends a discovery message to a multicast IP address associated with the replicated system, so that all replicas receive the discovery message. Replicas have distinct identifiers (e.g. 0 to n-1, where n is the number of replicas). The requests sent by the clients are likewise uniquely associated with a client and carry an increasing, consecutive request number (maintained by the proxy). That is, each client sends requests identified by a unique client identifier and a request identifier that has a known initial value (e.g. 0) and grows one unit at a time. One or more replicas of the system reply to the discovery message indicating the list of available replicas, their network addresses (e.g. their IP addresses) and an indication of the load of each replica. For this purpose, the replicas periodically exchange messages reporting their current load. The proxy then randomly selects a replica with a probability inversely proportional to its current load, to balance the load between the replicas, and connects to the selected replica. Once the proxy has successfully connected to a replica, it sends all subsequent requests to that replica (client requests are local to that replica). If the replica fails, the proxy detects it by means of a timer and connects to a new replica among those it learned of through the discovery message, using the same method.

On the application server side, each replica contains an application server and database pair. If either of them fails, or the node they run on fails, the replica is considered failed. Mechanisms can be used to detect partial failures (only the application server or only the database fails) and force the replica to disconnect from the system.
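The load-based replica selection performed by the proxy can be sketched as follows. The function names, the representation of the load as a positive number per replica identifier, and the `min_load` floor that avoids division by zero are assumptions made for illustration.

```python
import random


def selection_probabilities(loads, min_load=1e-9):
    """Map replica id -> probability inversely proportional to its load."""
    weights = {rid: 1.0 / max(load, min_load) for rid, load in loads.items()}
    total = sum(weights.values())
    return {rid: weight / total for rid, weight in weights.items()}


def pick_replica(loads, rng=random):
    """Randomly pick a replica id, favoring lightly loaded replicas."""
    probabilities = selection_probabilities(loads)
    ids = list(probabilities)
    return rng.choices(ids, weights=[probabilities[r] for r in ids], k=1)[0]
```

For example, with reported loads `{0: 1.0, 1: 4.0}`, replica 0 is chosen with probability 0.8 and replica 1 with probability 0.2. After a failure, the proxy could call `pick_replica` again on the remaining replicas.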

On the client side, the failure is detected when the timer started at the last request sent to the application server expires without a response having been received. The proxy then connects to a new replica and resends the request. There are two possible scenarios: (1) the replica failed before the updates were propagated to the other replicas; (2) the replica successfully propagated the updates to the other replicas before failing.

If there were previous client interactions, the new replica to which the proxy connects will have a copy of the last state of the session associated with the client. This replica will regenerate the session from the last message received from the replica to which the client was connected.

In case (1), since requests are uniquely identified by a client identifier and a request identifier, the new replica to which the client connects will treat the resent request as one not previously submitted (since it knows nothing about it) and will process it normally. Case (2) can also be handled, but at a higher cost. When the state of the session is propagated, the response to be sent to the client must also be propagated. Thus, if the new replica to which the client connects recognizes the resent request as a duplicate (the last known state of the session already contains the result of its processing) and the associated transaction committed, the replica returns the stored response to the client. Otherwise, it returns an error message to the client notifying that the transaction could not be committed because it was not possible to guarantee snapshot isolation.

Description of the figures

Figure 1 illustrates the replicated model. Each replica (4) consists of a pair formed by an application server (2) and a database (3). Each client (1) connects to one of the replicas. In the non-replicated model there would be only the elements of a single replica, that is, one application server and database pair. The application servers (4) communicate through a communication network to coordinate and ensure the consistency of the replication.

Figure 2 shows an example of the evolution of the local processing of the cache (ignoring remote processing). The figure illustrates the application server (2), the database (3), and the cache (1) that the application server maintains. Within the database, the evolution of the versions of data items X and Y is shown. The cache is shown evolving from an initial state in which it is empty. For each data item (X and Y), its value (4) and its version (5) are shown. The execution of two transactions, T1 and T2, is illustrated. T2 has the following steps: (6) Start of transaction, (7) Read (X), (8) Write (Y, d), (9) Read (Y), (10) Commit. T1 has the following steps: (11) Start of transaction, (12) Read (X), (13) Read (Y), (14) Write (X, c), (15) Commit.

The figure starts from a state in which the cache is empty and the timestamp of the replica is 10. Transactions T1 and T2 obtain the same start timestamp, 10, and an associated transaction is created in the database for each of them. T2 reads X. Since there is no version of X in the cache, the data is read from the database and a version of X labeled -1 (to represent that the version is unknown) is created. The value of X in the cache is a. Now T1 reads X and Y. Since version -1 of X is in the cache and -1 ≤ 10, T1 reads version -1 of X. It also reads Y from the database, and this value is labeled as version -1. The value of version -1 of Y is b. Now T1 updates X with the value c and Y with the value d. To do so, it creates private versions of X and Y after acquiring the corresponding locks on X and Y.

Finally, T1 requests to commit. It receives the commit timestamp CT(T1) = 11; the private versions of X and Y are labeled with this timestamp and made public in the cache. The associated transaction in the database commits, the writes on X and Y having previously been applied to the database in the context of that transaction. Since the database provides snapshot isolation, it internally creates versions of X and Y. When T2 now reads Y, it does not read version 11 of Y, since 11 > ST(T2). Instead, it reads version -1 of Y, that is, the old value of Y, b. As T2 is read-only, it simply commits in the application server and in the database, without any commit timestamp being assigned.

Remote processing is illustrated by an example in Figures 3 and 4. In the example, there are two replicas, R1 (1) and R2 (2), which execute two transactions, T1 and T2, respectively. Each replica contains an application server (4) and database (3) pair. The application server maintains a data cache (5). The sequence of transaction processing (6) is also shown. For each data item, its value (7) and version (8) are shown. Transaction T1 consists of the following steps: Start, Read (X), Write (X, b), Commit. Transaction T2 consists of the following steps: Start, Read (X), Write (X, c), Commit.

In the illustrated example, T1 and T2 read and update data item X concurrently (they have the same start timestamp: ST(T1) = 0, ST(T2) = 0). When they finish executing in their respective local replicas, their changes are propagated. The relative order of the updates is T1, T2, and there are no other concurrent conflicting transactions. When T1 is received at R1, its validation succeeds, since no other concurrent transaction has committed. Its commit timestamp is obtained (CT(T1) = 1) and the transaction commits. Then T2 is processed. During the validation of T2 it is determined that T2 is concurrent with, and conflicts with, T1, which has already committed (ST(T2) < CT(T1)). Therefore, T2 aborts. In replica R2 the transactions are validated in the same order. However, when T1 is processed it is found that there is a lock on X held by T2. T2 is aborted and T1 commits. Thus, when the changes of T2 are received, the validation fails. In this way, the two replicas commit the same transactions.

Description of an embodiment of the invention

An embodiment of the invention is presented for the case in which the data store provides snapshot isolation [Berenson95], both for the replicated and the non-replicated system. The proposed implementation uses timestamps to determine when transactions begin and end, as well as to label the different versions of the data. The implementation uses locks (locking) to detect conflicts between concurrent transactions early.

In the replicated model, multiple instances or replicas of each application server and database pair are maintained. The embodiment of the invention is explained in two steps. The first step details how a request is processed locally in the replica that receives it, to guarantee consistency at the local level; the second step details how remote processing is performed in the rest of the replicas, to guarantee consistency at the global level.

The local processing of transactions is detailed below. Both the application server and the database have their own transaction management system. The application server maintains the relationship between the transactions of both systems, that is, when a transaction is started in the application server, a transaction is also started in the database. The application server also maintains the relationship between client identifier and transaction. Each transaction of the application server is associated with a start timestamp, assigned when it starts on the application server, and a commit timestamp, assigned when the transaction commits. The commit timestamp MC(T) of a transaction T is an increasing number that reflects the snapshots through which the data progresses (that is, the number of committed transactions). The start timestamp MI(T) of a transaction T is the largest commit timestamp at the time T starts. That is, it identifies the last transaction T' committed before T started, and indicates that if T reads a data item updated (modified, created or deleted) by T', it must read the value written by T'. If the data item was not updated by T', T must read the value written by the transaction with the highest commit timestamp not exceeding MI(T). Initially, all replicas begin with the same commit timestamp (which is the one assigned as start timestamp), e.g. 0. The proposed method guarantees that every update transaction (one that modifies, creates or deletes data) receives the same commit timestamp on all replicas. Transactions that conflict (update a common data item) are committed in the database of every replica in the same relative order.

The application server maintains a cache, that is, a copy of some or all of the data in the database. Initially there is no data in the application server cache, and the data accessed by a transaction is read from the database. Each data item X in the cache is labeled with a timestamp i and is called version i of X (Xi); i is the commit timestamp of the transaction that updated and committed that version. Since it is assumed that the database does not provide information about its internal versioning of the data, when a data item X is read from the database its version number is unknown. For this reason, a data item read from the database is labeled with a special version indicating that it is unknown. Hereinafter, the version identifier -1 is used to indicate that the version of the data is unknown.

When a transaction reads a data item X, it first checks whether the transaction itself has updated that data item; if so, it reads what the transaction wrote (as detailed later, this is available in the transaction's private version of the data item). If it has not updated the data item, X is looked up in the cache. The version of X that a transaction T accesses is the last version committed at the moment the transaction started. That is, the version Xi such that i ≤ MI(T) and there is no version Xj with i < j ≤ MI(T). If X is not cached, it is read from the database. Given that a database transaction is started whenever a transaction is started on the application server, and that the database provides snapshot isolation, the database returns the appropriate version of X. This process guarantees that each transaction sees a snapshot of the database in the state it was in when the transaction began, so that snapshot isolation is satisfied.
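
The read rule just described can be sketched as follows, purely as an illustration. The function names, the dictionary layout of the cache and the `read_database` callback are assumptions of this sketch; the description does not prescribe any particular data structure.

```python
UNKNOWN = -1  # label of a value read from the database (version unknown)


def read_from_cache(versions, start_ts):
    """Return (label, value) of the version visible at start_ts, or None.

    `versions` maps version labels to values for ONE data item. The visible
    version is the one with the largest label i such that i <= start_ts.
    Because UNKNOWN is -1, it is chosen only when no committed version
    qualifies.
    """
    visible = [label for label in versions if label <= start_ts]
    if not visible:
        return None  # cache miss: the caller must go to the database
    chosen = max(visible)
    return chosen, versions[chosen]


def read(key, private_writes, cache, start_ts, read_database):
    """Read `key` on behalf of a transaction with start timestamp start_ts."""
    if key in private_writes:            # 1. the transaction's own update
        return private_writes[key]
    hit = read_from_cache(cache.get(key, {}), start_ts)
    if hit is not None:                  # 2. last version committed at start
        return hit[1]
    value = read_database(key)           # 3. database snapshot of the txn
    cache.setdefault(key, {})[UNKNOWN] = value
    return value


# State of Y in the Figure 2 example after T1 has committed with timestamp 11.
cache = {"Y": {UNKNOWN: "b", 11: "d"}}
seen_by_t2 = read("Y", {}, cache, 10, read_database=lambda k: None)
```

In this sketch a transaction that started at timestamp 10 still reads the old value of Y, while one that started at 11 or later reads the value written by T1.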

Under snapshot isolation, when two concurrent transactions update the same data item, only one of them can commit and the other has to abort. This type of conflict can be detected as soon as it occurs, when the second of the transactions commits, or at any intermediate time. The method is described below using early conflict detection by means of locks. A transaction Ti must obtain a lock on a data item X before updating it. This lock prevents other concurrent transactions from updating X. If another local transaction (a transaction of a client connected to that replica) requests the lock on X, it remains blocked, and therefore cannot update X, until the lock is released. Once the lock has been obtained, it is checked whether the highest version of the data item in the cache, Xj, is greater than the start timestamp MI(Ti) of Ti; if so, this version was created by a concurrent transaction that has already committed, and Ti must abort. Otherwise, Ti can perform the update, that is, create its own version. This version is private and can only be seen by Ti until Ti commits. If Ti writes the same data item two or more times, the second and subsequent writes access its private version directly, without repeating the previous checks. This ensures that a transaction observes its own updates and prevents other transactions from observing uncommitted changes. If the transaction fails the validation, it is aborted in the application server and in the database, the private versions of the updated data are discarded, and its locks are released. The first transaction waiting for each lock is unblocked and granted the lock. After the completion of the transaction (commit or abort), all its locks are released.
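
A minimal single-threaded sketch of this write path is given below for illustration. The `Transaction` class, the `lock_owner` table and the convention of returning False instead of blocking are assumptions of the sketch; a real implementation would block the requesting thread and would also abort the associated database transaction.

```python
UNKNOWN = -1  # label of a value whose version is unknown


class Abort(Exception):
    """A concurrent transaction already committed an update to the item."""


class Transaction:
    def __init__(self, name, start_ts):
        self.name = name
        self.start_ts = start_ts   # MI(T)
        self.private = {}          # key -> private (uncommitted) value


def abort(txn, lock_owner):
    """Discard the private versions of txn and release all its locks."""
    txn.private.clear()
    for key in [k for k, owner in lock_owner.items() if owner is txn]:
        del lock_owner[key]


def write(txn, key, value, cache, lock_owner):
    """Try to update `key`. Returns False if txn has to wait for the lock."""
    if key in txn.private:         # later writes reuse the private version
        txn.private[key] = value
        return True
    holder = lock_owner.get(key)
    if holder is not None and holder is not txn:
        return False               # lock held by another local transaction
    lock_owner[key] = txn
    newest = max(cache.get(key, {}), default=UNKNOWN)
    if newest > txn.start_ts:      # committed by a concurrent transaction
        abort(txn, lock_owner)
        raise Abort(f"{txn.name} conflicts on {key}")
    txn.private[key] = value       # create the private version
    return True


cache = {"X": {UNKNOWN: "a"}}
locks = {}
t1, t2 = Transaction("T1", 10), Transaction("T2", 10)
first = write(t1, "X", "c", cache, locks)    # T1 obtains the lock
second = write(t2, "X", "z", cache, locks)   # T2 would have to wait
```

Returning False rather than blocking keeps the sketch deterministic; the waiting queue and its wake-up order are left out.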

When a transaction that has only performed reads commits, it is committed in the application server and in the database and the result is returned to the client. These transactions do not require coordination with the other replicas. For a transaction that has performed updates, when it commits on the application server, the application server propagates the updated data to the other replicas. Updates are propagated to all replicas atomically (the updates are received by all active replicas, or by none if the replica that sent the message has failed) and in total order (all replicas deliver the updates in the same order). All replicas validate the updates in the same relative order. The processing of these updates (validation) is deterministic (it behaves like a state machine) and has the same result in all replicas. The validation and processing of updates is, in principle, sequential, although transactions that do not conflict with each other can be validated and processed in parallel. A transaction Ti fails validation if there is a transaction Tj in the system (the set of replicas) that is concurrent with Ti and has committed (MI(Ti) < MC(Tj) < MC(Ti)), and Tj updated some data item in common with Ti. Transactions whose propagated changes reach the replica in which they are local without having been aborted always pass validation. The details of the validation of remote transactions are described in the next section.
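
The validation condition can be illustrated with the following sketch. The function name and the representation of the committed transactions as a list of (commit timestamp, write set) pairs are assumptions. Since Ti has not yet received its commit timestamp when it is validated, every transaction already in the list satisfies MC(Tj) < MC(Ti), so the sketch only needs to test MI(Ti) < MC(Tj).

```python
def passes_validation(start_ts, write_set, committed):
    """Return True if the transaction may commit.

    start_ts:  MI(Ti), the start timestamp of the transaction being validated
    write_set: set of keys updated by Ti
    committed: list of (commit_ts, write_set) pairs of the update
               transactions already committed, i.e. (MC(Tj), keys of Tj)
    """
    for commit_ts, other_writes in committed:
        concurrent = start_ts < commit_ts        # Tj committed after Ti began
        if concurrent and write_set & other_writes:
            return False                         # write-write conflict
    return True


# T1 committed with timestamp 1 after updating X (example of Figures 3 and 4).
committed = [(1, {"X"})]
t2_ok = passes_validation(0, {"X"}, committed)   # T2 started at 0, updates X
```

A transaction that started at timestamp 1 or later, or one that updates only other data items, passes the same check.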

When an update transaction passes validation, it is assigned a commit timestamp, which is incremented with each committed transaction. In the replica in which the transaction is local, each updated data item (the private version of the transaction) is labeled with the commit timestamp of the transaction and the version is made public, becoming part of the cache. The release of the locks causes the abort of the non-validated local transactions that were waiting for locks held by the committed transaction. Finally, the transaction in the database is committed. The database commit automatically applies the data updates to the database. Note that the versions of the updated data are kept in memory, in the cache, until they are determined to be unnecessary and are deleted.
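
As an illustration, the publication of the private versions of a validated local transaction might look like the sketch below. The function signature, the `waiters` table and the transaction name T3 are hypothetical, and the commit of the associated database transaction is omitted.

```python
def commit_local(private, cache, lock_owner, waiters, commit_counter):
    """Publish the private versions of a validated local update transaction.

    private:        key -> value written by the committing transaction
    cache:          key -> {version label: value}
    lock_owner:     key -> name of the transaction holding the lock
    waiters:        key -> names of local transactions blocked on that lock
    commit_counter: commit timestamp of the last committed transaction
    Returns the new commit timestamp and the waiting transactions to abort.
    """
    commit_ts = commit_counter + 1
    to_abort = []
    for key, value in private.items():
        cache.setdefault(key, {})[commit_ts] = value   # version becomes public
        lock_owner.pop(key, None)                      # release the lock
        to_abort.extend(waiters.pop(key, []))          # waiters now conflict
    return commit_ts, to_abort


# T1 of the Figure 2 example commits its writes on X and Y.
cache = {"X": {-1: "a"}, "Y": {-1: "b"}}
locks = {"X": "T1", "Y": "T1"}
waiters = {"X": ["T3"]}   # T3 is a hypothetical transaction waiting on X
commit_ts, aborted = commit_local({"X": "c", "Y": "d"}, cache, locks, waiters, 10)
```

In this sketch the older versions labeled -1 stay in the cache, so a transaction that started at timestamp 10 can still read them.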

Because the use of locks can lead to deadlocks, a mechanism for deadlock detection and resolution is necessary (e.g. based on detecting cycles in a wait-for graph). Deadlocks will in any case be rare, since they only involve conflicts between writes.
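
One possible way of detecting such cycles is a depth-first search over the wait-for graph, sketched below. The graph representation (a dictionary from each waiting transaction to the transactions it waits for) and the function name are assumptions; the description only requires that some detection and resolution mechanism exists.

```python
def find_cycle(waits_for):
    """Return a list of transactions forming a cycle, or None if there is none.

    `waits_for` maps each transaction to the set of transactions holding a
    lock it is waiting for.
    """
    WHITE, GREY, BLACK = 0, 1, 2      # unvisited, on current path, finished
    color = {}
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for nxt in waits_for.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GREY:                     # back edge: cycle found
                return path[path.index(nxt):]
            if state == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in list(waits_for):
        if color.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# T1 waits for a lock held by T2 while T2 waits for a lock held by T1.
deadlock = find_cycle({"T1": {"T2"}, "T2": {"T1"}})
```

Resolution would then abort one of the transactions in the returned cycle; which one to choose is left open here.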

Remote processing is described below. As mentioned for local processing, when a transaction that has made updates has been processed locally, its updates are sent to all replicas, including the local replica (the one in which the transaction was executed). This local transaction is a remote transaction in the rest of the replicas. Since all replicas can execute update transactions, each replica must perform validation (detect possible conflicts between concurrent transactions that update the same data) to ensure that snapshot isolation is provided globally (across all replicas). Because the changes are delivered in the same relative order at all replicas, the replicas perform the validation in this order and commit the same transactions in the same order. In this way, the state of the cache and of the database remains consistent across all replicas. That is, all instances of the database have the same data with the same values, and the caches have the versions needed to provide cache and replication transparency. Replication transparency means that for any replicated execution there is an equivalent non-replicated execution in which the clients see the same results and the databases end in the same final state. Cache transparency means that clients see the same results they would see if the system did not keep any data in the cache and always read from the database.

The first step in a remote replica (one that has not executed the transaction locally) after receiving the message propagating the changes of a transaction is validation. As with local transactions, it is verified that no concurrent transaction that updated the same data (that is, one with a write conflict) has committed. The validation verifies that the transaction has no conflict with any other concurrent transaction that was previously validated successfully, and therefore committed (that is, transactions that were not known before the updates of the transaction being validated were propagated). If the validation is unsuccessful, the replica discards the changes received. If the validation is successful, a commit timestamp is assigned to the transaction, as is the case with local transactions, and a transaction is created in the database. For each updated data item, it is checked whether there is a lock held by a non-validated transaction (local and not committed); if so, that local transaction is aborted (it is concurrent, has a write conflict and has not yet been validated). Next, a version of each updated data item is created, labeled with the commit timestamp and added to the cache (the local cache is updated). The remaining steps are the same as for local transactions: the database transaction commits, the commit counter is incremented and the transaction is stored in the list of committed transactions. If a remote transaction fails the validation, no further action is necessary except to discard the message.
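
The processing of the totally ordered update messages can be sketched as follows, replaying the example of Figures 3 and 4. The `Replica` class, its attribute names and the `deliver` method are assumptions of this sketch; the database transaction and the group communication layer that provides atomic, totally ordered delivery are not modeled.

```python
class Replica:
    def __init__(self):
        self.commit_counter = 0      # commit timestamp of the last commit
        self.committed = []          # (commit_ts, set of updated keys)
        self.cache = {}              # key -> {version label: value}
        self.lock_owner = {}         # key -> name of local unvalidated txn
        self.aborted_local = set()   # local transactions aborted here

    def deliver(self, name, start_ts, writes):
        """Process one update message; messages arrive in total order.

        Returns True if the transaction commits at this replica.
        """
        keys = set(writes)
        # Validation: depends only on the ordered message stream, so every
        # replica reaches the same decision.
        for commit_ts, other_keys in self.committed:
            if start_ts < commit_ts and keys & other_keys:
                return False         # discard the changes received
        self.commit_counter += 1
        for key, value in writes.items():
            holder = self.lock_owner.pop(key, None)
            if holder is not None and holder != name:
                self.aborted_local.add(holder)   # unvalidated local txn
            self.cache.setdefault(key, {})[self.commit_counter] = value
        self.committed.append((self.commit_counter, keys))
        return True


# R1 runs T1 locally and R2 runs T2 locally; both hold the lock on X.
r1, r2 = Replica(), Replica()
r1.lock_owner["X"] = "T1"
r2.lock_owner["X"] = "T2"
stream = [("T1", 0, {"X": "b"}), ("T2", 0, {"X": "c"})]   # total order
results_r1 = [r1.deliver(*message) for message in stream]
results_r2 = [r2.deliver(*message) for message in stream]
```

Both replicas commit T1 and reject T2, and their caches end with the same version of X, which is the property the total order is meant to guarantee.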

The creation and deletion of data is also handled by the proposed method. When a new data item X is created, its private version is created as in an update, except that no previous version exists. The lock on X is also requested, to prevent other concurrent transactions from creating the same data item X. Locks are associated with the keys of the data (which coincide with the database keys). When the transaction commits, the data item is inserted in the database, and the version becomes public and is available to other transactions that begin after the transaction that created it has committed.

Deletions are handled by creating a tombstone version of the data item. The tombstone version is also a private version of the transaction until it commits. If the transaction tries to access the data item, it will not find it, since the tombstone version indicates that the data item no longer exists. When the transaction commits, the data item is deleted and the tombstone version is made public. Note that even after the transaction has committed, the previous versions of the deleted data item cannot be removed, since there may be active transactions associated with those versions (all transactions that began before the deleting transaction committed), which can still read the data item and thus satisfy snapshot isolation.
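
The effect of a tombstone version on reads can be illustrated with the sketch below. The `TOMBSTONE` marker object, the function name and the version labels 3 and 7 are hypothetical; the case of a data item with no visible cached version is deliberately left to the caller.

```python
TOMBSTONE = object()   # stored in place of a value to mark a deletion


def visible_value(versions, start_ts):
    """Return (exists, value) for a transaction with start timestamp start_ts.

    `versions` maps version labels to values (or TOMBSTONE) for ONE data
    item. Raises LookupError when no cached version is visible, in which
    case the caller would read from the database.
    """
    visible = [label for label in versions if label <= start_ts]
    if not visible:
        raise LookupError("no cached version visible at this timestamp")
    value = versions[max(visible)]
    if value is TOMBSTONE:
        return False, None       # the item was deleted in this snapshot
    return True, value


# The item was written at timestamp 3 and deleted at timestamp 7.
versions = {3: "a", 7: TOMBSTONE}
before_delete = visible_value(versions, 5)   # started before the deletion
after_delete = visible_value(versions, 8)    # started after the deletion
```

Version 3 has to stay in the cache for as long as a transaction with a start timestamp below 7 is active.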

Because the memory (cache) is limited, not all the versions produced over time can be kept in the cache (in memory or even hibernated on disk). Therefore, they are removed from the cache when they are no longer needed, to free up space.

This task is the responsibility of the mechanism for eliminating unnecessary copies, or garbage collection. To carry out garbage collection, the application server of each replica propagates to the rest of the replicas the oldest (lowest) start timestamp among the active (not committed) transactions in that replica. To reduce the cost, this information can be piggybacked on the messages that propagate transaction updates. Each replica keeps an up-to-date vector with the oldest start timestamp of each replica. Each replica removes the versions of the data that are older than the oldest (lowest) timestamp among those of all the replicas. Versions labeled -1 require different processing. If a version Xi of a data item (with i other than -1) is no longer needed (there is no active transaction with a start timestamp less than i), then version -1 of the data item is not needed either. If the cache is full of versions that cannot be discarded, they can always be hibernated (stored in persistent storage other than the database, as in the hibernation used by JEE application servers to eject data from the cache). Hibernated data (all versions of the data item) can be brought back into memory at any time by the application server.
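
A possible reading of this garbage collection rule is sketched below for a single data item. The function name, the use of None for a replica without active transactions and the choice to keep the newest version that is still reachable by the oldest active transaction (as required by the conditions stated in the claims) are assumptions of the sketch.

```python
UNKNOWN = -1  # label of a value whose version is unknown


def discardable_versions(versions, oldest_starts):
    """Return the set of version labels of ONE data item that can be removed.

    versions:      iterable of version labels present in the cache
    oldest_starts: vector with the oldest start timestamp of the active
                   transactions of each replica, or None for a replica
                   without active transactions (assumption of this sketch)
    """
    labels = set(versions)
    committed = sorted(label for label in labels if label != UNKNOWN)
    if not committed:
        return set()
    active = [ts for ts in oldest_starts if ts is not None]
    # With no active transaction anywhere, only the newest version is kept.
    horizon = min(active) if active else committed[-1]
    reachable = [label for label in committed if label <= horizon]
    if not reachable:
        return set()               # the oldest transaction needs them all
    newest_reachable = reachable[-1]
    garbage = {label for label in reachable if label < newest_reachable}
    if UNKNOWN in labels:
        garbage.add(UNKNOWN)       # superseded by a version every txn sees
    return garbage


# Oldest active start timestamps: 8 at one replica, 12 at the other.
removable = discardable_versions({UNKNOWN, 3, 7, 11}, [8, 12])
```

With these values the sketch keeps versions 7 and 11, since the transaction that started at timestamp 8 still reads version 7, and discards versions 3 and -1.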

The proposed procedure for the centralized model is a simplification of the local processing proposed for the replicated model. In the centralized model there are no remote transactions, so neither the propagation of updates nor the global validation phase used to check for conflicts with remote transactions is necessary. Otherwise, the centralized model follows the same steps as the local processing of the replicated model.

Industrial application

The invention is applicable in the industrial sector of multilayer information systems. A representative example is that of application servers used in combination with databases, which provide two layers to which the invention can be applied. Application servers generally maintain a cache of data from the database to increase their efficiency. Currently, application servers guarantee serializability. With the application of the present invention (centralized model) they can offer snapshot isolation.

Within the same field of application servers there are currently replication solutions (also known as clustering) to provide availability and increase scalability. Scalability under serializability is very limited due to conflicts between reads and writes. The replicated model of the present invention avoids read-write conflicts, so it is more scalable while providing snapshot isolation, which is very close to serializability.

References

[Adya99] Adya, A. 1999. Weak Consistency: A Generalized Theory and Optimistic Implementations for Distributed Transactions. PhD Thesis, Massachusetts Institute of Technology.

[Berenson95] Berenson, H., Bernstein, P., Gray, J., Melton, J., O'Neil, E., and O'Neil, P. 1995. A critique of ANSI SQL isolation levels. In Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data (San Jose, California, United States, May 22-25, 1995). M. Carey and D. Schneider, Eds. SIGMOD '95. ACM, New York, NY, 1-10.

[Bernstein87] Philip A. Bernstein, Vassos Hadzilacos, Nathan Goodman: Concurrency Control and Recovery in Database Systems. Addison-Wesley, 1987.

[Bressoud98] Thomas C. Bressoud, John E. Ahern, Kenneth P. Birman, Robert C. B. Cooper, Bradford B. Glade, Fred B. Schneider, John D. Service. Transparent fault tolerant computer system. United States Patents 5802265 (1998) and 5968185 (1999).

[Jacobs04] Lawrence Jacobs, Xiang Liu, Shehzaad Nakhoda, Zheng Zeng, Rajiv Mishra. Multi-version data caching. United States Patent 6785769.

[JEE] JSR-000244 Java Platform, Enterprise Edition 5 Specification. Java Community Process. 8 May 2006.

[Jimenez00] Ricardo Jiménez-Peris, Marta Patiño-Martínez, Sergio Arévalo: Deterministic Scheduling for Transactional Multithreaded Replicas. SRDS 2000: 164-173.

[Kemme00] Bettina Kemme, Gustavo Alonso: Don't Be Lazy, Be Consistent: Postgres-R, A New Way to Implement Database Replication. In Proceedings of VLDB Conf. 2000. pp. 134-143.

[Lin05] Yi Lin, Bettina Kemme, Marta Patiño-Martínez, Ricardo Jiménez-Peris: Middleware based Data Replication providing Snapshot Isolation. SIGMOD Conference 2005: 419-430.

[Mattis01] Peter Mattis, John Plevyak, Adam Beguelin, Brian Totty, David Gourley, Matthew Haines. Garbage collection in an object cache. United States Patent 6209003. 2001.

[Moser03] Moser, Louise E.; Melliar-Smith, Peter M. Transparent consistent active replication of multithreaded application programs. European Software Patent EP1495414. 2003.

[Moser03b] Moser, Louise E.; Melliar-Smith, Peter M. Transparent consistent semi-active and passive replication of multithreaded application programs. US Patent (WO/2003/084116).

[Narasimhan02] Priya Narasimhan, Louise E. Moser, P. M. Melliar-Smith: Eternal - a component-based framework for transparent fault-tolerant CORBA. Softw., Pract. Exper. 32(8): 771-788 (2002).

[Patiño05] Marta Patiño-Martínez, Ricardo Jiménez-Peris, Bettina Kemme, Gustavo Alonso: MIDDLE-R: Consistent database replication at the middleware level. ACM Trans. Comput. Syst. 23(4): 375-423 (2005).

[Plattner04] Christian Plattner and Gustavo Alonso. Ganymed: Scalable Replication for Transactional Web Applications. In Proceedings of ACM/IFIP/USENIX 5th International Middleware Conference, Toronto, Canada, October 18-22, 2004.

[Vigna95] Del Vigna, Jr., Paul. System and method for providing a fault tolerant computer program runtime support environment. US Patent 5621885. 1995.

[Wu04] Huaigu Wu, Bettina Kemme, Vance Maverick: Eager Replication for Stateful J2EE Servers. In Proceedings of CoopIS/DOA/ODBASE (2) 2004: 1376-1394.

[Zhao05] Wenbing Zhao, Louise E. Moser, P. M. Melliar-Smith: Unification of Transactions and Replication in Three-Tier Architectures Based on CORBA. IEEE Trans. Dependable Sec. Comput. 2(1): 20-33 (2005).

Claims

1. Cache system for application servers in multi-layer transactional systems, characterized in that it comprises at least a cache that keeps a copy of the data stored in the database layer, said layer being transactional or non-transactional, and wherein, in addition, at least one version of each data item, or information sufficient to generate it, is maintained according to the values taken by the data item in each committed transaction, and wherein, in addition, said database layer provides snapshot isolation or a generalized version thereof such as PL2+.

2. Data management procedure for the system described in claim 1, characterized in that it comprises at least the steps of: a. a first stage of providing each executing transaction with an image of the database with the same content it had at the time the transaction was started; and b. a stage of allowing only the commit of transactions that satisfy the following condition: there is no committed transaction that: i. is concurrent with the transaction to be committed and ii. modifies common data.

3. Method according to claim 2, characterized in that, to eliminate an unnecessary version from the cache, it performs the following steps: a. if there are transactions in execution, for each data item, a version produced by a committed transaction is deleted only when there is a later version of the data item in the cache created by a committed transaction and all the transactions in execution were started after that transaction committed; and b. if there are no transactions in execution, all versions are eliminated except the most recent one.

4. Data management procedure according to claims 2 and 3 characterized in that the cache ejects (hibernates) data to a data repository different from the database layer.

5. Procedure for managing replicated data for the system described in claim 1, characterized in that it comprises: a. replicating the system, taking application server and database pairs as the unit of replication; b. executing each transaction locally on any replica of the system; c. maintaining the same data and values in the databases of all replicas; d. providing each executing transaction with an image of the database with the same content it had at the time the transaction was started; and e. allowing only the commit of transactions that satisfy the following condition: there is no committed transaction that 1) is concurrent with the transaction to be committed and 2) modifies common data.

6. Method according to claim 5, characterized in that, to eliminate an unnecessary version from the cache, it performs the following steps: a. if there are transactions in execution, for each data item, a version produced by a committed transaction is deleted only when there is a later version of the data item in the cache created by a committed transaction and all the transactions in execution were started after that transaction committed; and b. if there are no transactions in execution, all versions are eliminated except the most recent one.

7. Data management procedure according to claims 5 and 6, characterized in that the cache ejects (hibernates) data to a data repository different from the database layer.

8. Method according to claims 5 to 7, characterized in that it can be combined with a session replication procedure.