US20020023129A1 - Method and system for efficiently coordinating commit processing in a parallel or distributed database system - Google Patents


Info

Publication number
US20020023129A1
US20020023129A1 (application US09/120,379)
Authority
US
United States
Prior art keywords
node
transaction
nodes
list
participating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US09/120,379
Other versions
US6438582B1 (en)
Inventor
Hui-I Hsiao
Amy Chang
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to US09/120,379 (granted as US6438582B1)
Assigned to IBM CORPORATION (assignors: CHANG, AMY; HSIAO, HUI-I)
Publication of US20020023129A1
Application granted
Publication of US6438582B1
Anticipated expiration
Legal status: Expired - Lifetime

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/466 Transaction processing
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1474 Saving, restoring, recovering or retrying in transactions
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10 TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10S TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00 Data processing: database and file management or data structures
    • Y10S707/99931 Database or file accessing
    • Y10S707/99938 Concurrency, e.g. lock management in shared database
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10 TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10S TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00 Data processing: database and file management or data structures
    • Y10S707/99951 File or database maintenance
    • Y10S707/99952 Coherency, e.g. same view to multiple users
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10 TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10S TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00 Data processing: database and file management or data structures
    • Y10S707/99951 File or database maintenance
    • Y10S707/99952 Coherency, e.g. same view to multiple users
    • Y10S707/99953 Recoverability

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A method and system wherein participant node lists are maintained for each individual transaction. In addition, local nodes maintain participant node lists and provide same to coordinating nodes under certain circumstances. The coordinating nodes merge participant node lists and selectively utilize the lists to minimize message duplication. Further, connection node lists for each application are maintained at coordinating nodes and are utilized for application rollback procedures.

Description

    FIELD OF THE INVENTION
  • This invention relates to parallel or distributed data processing systems and more particularly to a mechanism which avoids duplication of messages when processing transaction commit and rollback procedures. [0001]
  • BACKGROUND OF THE INVENTION
  • In a multi-node environment, a transaction which has been started on an originating, or coordinating, node will be sent to associated remote nodes, referred to as participant nodes. After the transaction has started at the coordinating node, C, a remote node, R, will be added to the coordinating node's participant node list when a request is sent to node R on behalf of the transaction. Standard procedure on a coordinating node is to maintain one participant list, which is continually updated as needed. Since more than one transaction may be executing in sequence, and fewer than all of the nodes on the system may be participating in each transaction, communications regarding a given transaction which are sent to all nodes on the participant list may arrive at nodes which are not participating in the transaction. Furthermore, on behalf of the same transaction, the remote participant node R may send a request on to another remote node, N. Node N is then added to a transaction participant list at node R. It is possible, however, that the coordinating node C does not have any awareness of node N. Therefore, should node C need to communicate information, such as a transaction rollback request, to all nodes which have participated in the transaction, node C will only have sufficient information for contacting those nodes in the node C participant node list. In the prior art, node C would provide the rollback request to node R, and node R would then be responsible for passing the rollback request on to any nodes, N, which are on its participant list for the transaction, as illustrated in FIG. 1. FIG. 1 shows that any given “subordinate” node, such as node 4, which has transaction ties to nodes 1, 2 and 5, could receive a given message generated at the coordinating node three times (once each from nodes 1, 2 and 5). [0002]
  • In the alternative, participant node R could provide its participant list to coordinating node C in response to a message, with node C then resending the message to all participant nodes. In such a scenario, each of nodes 1, 2 and 5 could receive the message two times, while node 4 could receive the message six times. The sending of redundant messages results in unnecessary bandwidth consumption and unnecessary duplication of message processing at both the coordinating node and the remote participant nodes. [0003]
  • It is therefore an objective of the present invention to provide a mechanism by which, in parallel or distributed processing systems, transaction information can be most efficiently communicated to relevant nodes. [0004]
  • SUMMARY OF THE INVENTION
  • This and other objectives are realized by the present invention wherein participant node lists are maintained for each individual transaction. In addition, local nodes maintain transaction participant node lists and provide same to the transaction coordinating nodes under certain circumstances. The coordinating nodes merge transaction participant node lists and selectively utilize the lists to minimize message duplication. Further, connection node lists for each application are maintained at coordinating nodes and are utilized for application rollback procedures.[0005]
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The invention will now be detailed with specific reference to the appended drawings wherein: [0006]
  • FIG. 1 illustrates a prior art process flow for transaction operations conducted by a coordinating node of a parallel or a distributed transaction processing system. [0007]
  • FIGS. 2a through 2c illustrate commit processing when all participant nodes vote to commit in accordance with the present invention. [0008]
  • FIGS. 3a and 3b illustrate commit processing when one participant node votes read only while all others vote to commit in accordance with the present invention. [0009]
  • FIGS. 4a and 4b illustrate one implementation of abort processing in accordance with the present invention. [0010]
  • FIGS. 5a and 5b illustrate transaction rollback processing in accordance with the present invention. [0011]
  • DESCRIPTION OF THE PREFERRED EMBODIMENT
  • Since more than one transaction may be executed in sequence, and each transaction may involve only a subset of the nodes configured on the system, it is preferable that a participant node list be maintained for each transaction, which list identifies the subset of nodes which are participating in the current transaction. When the transaction commits, or is rolled back, only the nodes in the participant node list for the current transaction need to be informed of the transaction outcome and need to take part in the processing of that outcome. In addition, if a participant node must invoke the assistance of another node for transaction processing, that participant node will maintain its own transaction participant list, which will be provided to the coordinating node under certain circumstances, as further discussed below. [0012]
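As a rough illustration, the per-transaction bookkeeping described above might look like the following Python sketch; the class and method names are assumptions made for illustration, not taken from the patent text:

```python
class Coordinator:
    """Minimal sketch of a coordinating node's per-transaction
    participant bookkeeping (names are hypothetical)."""

    def __init__(self, node_id):
        self.node_id = node_id
        # One participant node list per transaction, keyed by transaction id.
        self.participants = {}

    def note_request(self, txn_id, remote_node):
        # A remote node joins the transaction's list the first time a
        # request is sent to it on behalf of that transaction.
        self.participants.setdefault(txn_id, set()).add(remote_node)

    def outcome_targets(self, txn_id):
        # Only the nodes on this transaction's list need to be informed
        # of the commit or rollback outcome.
        return sorted(self.participants.get(txn_id, set()))
```

Under this sketch, transactions that never touched a node never cause an outcome message to be sent to it.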
  • A coordinating node will also maintain a connection node list comprising the union of all the participant node lists for an application. The connection node list will be used to identify all of the nodes with which an application (and not simply a single transaction) has had contact. If the application terminates or a special transaction rollback is issued, all nodes on the connection node list will be contacted. [0013]
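The connection node list can be sketched as the running union of the per-transaction lists; the function name and input shape here are illustrative assumptions:

```python
def connection_node_list(participant_lists):
    """Union of every per-transaction participant node list for an
    application; consulted when the application terminates or a
    special transaction rollback is issued."""
    nodes = set()
    for txn_nodes in participant_lists.values():
        nodes |= set(txn_nodes)
    return sorted(nodes)
```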
  • A further aspect of the invention provides that a list of the participant nodes will be stored in a node's log files along with the commit or abort record for the completed transaction at each participating node as well as at the coordinating node. In that way, if the database has to be restarted due to problems which occurred in the middle of the transaction processing, the log information can be used to resend the transaction outcome to the proper subset of nodes on the system. [0014]
  • Transaction processing involving the aforementioned lists will now be detailed with reference to FIGS. 2a through 5b. After a transaction has started on node C, and as soon as a request is sent to a remote node R on behalf of the transaction, that remote node is added to the transaction's participant node list maintained at node C. At node R, if a request is sent to a remote node, N, on behalf of the transaction, node N is placed on a transaction participant node list which is maintained at node R. It is to be noted that, while there may be multiple agents at any given node which are concurrently working on behalf of the same transaction, only one agent per transaction per node is in charge of the commit processing. That agent (hereinafter referred to as the “local node agent”) is responsible for ensuring that notification has been sent to other agents working on the same transaction at the same node, such that no new database update is done on behalf of the transaction after commit processing commences. In addition, under certain circumstances, the local node agent for a participant node will additionally be responsible for providing the local transaction participant node list to the coordinating node for a transaction. [0015]
  • For a commit request which has been received by an application, as illustrated in FIGS. 2a through 2c, the local node agent for a coordinating node will send out a “prepare to commit” request to all nodes recorded in that transaction's participant node list. As illustrated, nodes 1-3 receive the request, since the coordinating node is not yet aware of the participation of nodes 4 and 5. At each remote node R (1-3) which is on the list and receives the request, a parallel local node agent will be selected to serve the “prepare to commit” request. If the local transaction state is “ready to commit,” the local node agent will force a prepare log to disk and then send a reply message (in this case, “yes”) to the coordinating node. The local transaction participant node list maintained at each participant node will be piggybacked onto the reply message which is being sent to the coordinating node. In the illustrated case, node 1 responds “yes” and provides its list which includes node 4; node 2 responds “yes” and provides its list which also includes node 4; and node 3 responds “yes” and provides its list which includes node 5. [0016]
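One possible shape for the piggybacked reply described above is sketched below; the message layout is an assumption made for illustration, not the patent's wire format:

```python
def prepare_reply(ready_to_commit, local_participants):
    """A participant's reply to 'prepare to commit': the vote, with
    the node's local transaction participant node list piggybacked on."""
    return {
        "vote": "yes" if ready_to_commit else "no",
        "participants": sorted(local_participants),
    }
```

Under this shape, node 1's reply in FIG. 2a would carry the vote "yes" together with the list containing node 4.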
  • At the coordinating node, all transaction participant node lists received from participant nodes will be merged, thereby creating an updated transaction participant node list for the transaction. Since it is necessary for the coordinating node to obtain unanimous consent of all participant nodes prior to committing a transaction, and the coordinating node has become aware of additional participant nodes for the current transaction, another “prepare to commit” message must be sent. The coordinating node will send the subsequent “prepare to commit” message to only those nodes which have been added to the updated transaction participant node list for the transaction. In that way, duplicate requests will not be processed by the nodes from the original participant node list, which had already received the first “prepare to commit” request. As illustrated, the coordinating node then becomes aware of the participation of nodes 4 and 5, merges its participant node lists and subsequently sends the “prepare to commit” request only to those newly-added nodes, 4 and 5, as shown in FIG. 2b. The procedure (of the coordinating node sending the requests; receiving replies along with local participant node lists; merging participant node lists; and sending subsequent requests to newly-added participants) repeats itself until all nodes participating in the current transaction have sent back their replies to the “prepare to commit” message(s). In this way, the coordinating node can be assured that all necessary participant nodes have been contacted without the need, or the inconvenience, of duplicating message transmission, receipt and processing. Since all of the participant nodes have responded “yes” to the “prepare to commit” request, FIG. 2c illustrates all participant nodes, 1-5, receiving the “commit” outcome message and acknowledging same. [0017]
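The repeated send/merge rounds can be simulated in a short, runnable sketch. Replies are drawn from a static table standing in for real message exchange, and the scenario mirrors FIGS. 2a and 2b (nodes 1 and 2 report node 4; node 3 reports node 5):

```python
def gather_votes(initial_list, reply_table):
    """Send 'prepare to commit' rounds until every participant,
    including nodes learned from piggybacked lists, has replied.
    reply_table maps node -> (vote, local participant list)."""
    known = set(initial_list)
    votes = {}
    pending = set(initial_list)
    while pending:
        next_round = set()
        for node in pending:
            vote, local_list = reply_table[node]
            votes[node] = vote
            # Merge the piggybacked list; only newly learned nodes are
            # sent the request in the next round, so no node receives
            # a duplicate "prepare to commit" message.
            for n in local_list:
                if n not in known:
                    known.add(n)
                    next_round.add(n)
        pending = next_round
    return votes

# FIG. 2 scenario: the coordinator initially knows only nodes 1-3.
replies = {
    1: ("yes", {4}),
    2: ("yes", {4}),
    3: ("yes", {5}),
    4: ("yes", set()),
    5: ("yes", set()),
}
votes = gather_votes({1, 2, 3}, replies)
```

Two rounds suffice here: the first contacts nodes 1-3, the second contacts only the newly learned nodes 4 and 5.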
  • As an optimization, a node can vote “read only” if a transaction did not update the database state at that node, as described with reference to FIGS. 3a and 3b. Assuming, for this description, that coordinating node C is aware of all of the relevant participant nodes, a “prepare to commit” message is sent to nodes 1-5 from coordinating node C. In response, node 3 votes “read only,” while nodes 1, 2, 4 and 5 vote “yes.” After voting “read only,” the participant node can proceed to release all resources held on behalf of the transaction and forget about the transaction. Similarly, when the coordinating agent receives a “read only” vote from a node, the coordinator will not send the subsequent commit or abort message to that node, as shown in FIG. 3b, wherein the commit message is sent only to nodes 1, 2, 4 and 5. If all participant nodes voted “read only,” the transaction would be regarded as a local transaction and no more messages would be sent out to the participants. [0018]
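The read-only optimization amounts to a filter over the recorded votes, which might be sketched as:

```python
def outcome_recipients(votes):
    """Nodes that voted 'read only' are dropped from the subsequent
    commit or abort message; if every node voted 'read only', the
    result is empty and no outcome message is sent at all."""
    return sorted(node for node, vote in votes.items() if vote == "yes")
```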
  • If at least one participant node voted “yes” and no nodes voted “no,” the transaction is to be committed. The coordinating node first performs commit processing locally, which includes writing a commit log record, including the most-recently updated participant node list for the transaction and a list of those participant nodes that voted “yes” to the “prepare to commit” request. Thereafter, the coordinating node sends a commit message to all participant nodes that answered “yes” at the “prepare to commit” phase and waits for commit acknowledgements from those nodes. [0019]
  • Upon receipt of the commit message at a participant node, the local node agent selected to perform commit processing locally writes a commit log record to signal the end of the transaction at the participant node and then sends a commit acknowledgement back to the coordinating node. Once all commit acknowledgements have been received, the coordinating node writes a forget log record, after which the transaction state can be “forgotten” at the coordinating node. [0020]
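The coordinator's commit-side steps in the two paragraphs above might be sketched as follows; the log-record shapes and the await_ack callback are assumptions for illustration:

```python
def commit_phase(participant_list, yes_voters, await_ack):
    """Write a commit log record carrying the participant node list and
    the 'yes' voters, collect commit acknowledgements from the 'yes'
    voters, then write a forget record once all acks are in."""
    log = [("commit", sorted(participant_list), sorted(yes_voters))]
    acked = {node for node in yes_voters if await_ack(node)}
    if acked == set(yes_voters):
        # All acknowledgements received: the transaction state can now
        # be "forgotten" at the coordinating node.
        log.append(("forget",))
    return log
```

Keeping the participant list inside the commit record is what lets a restarted coordinator resend the outcome to exactly the right subset of nodes.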
  • If any node voted “no” in response to the “prepare to commit” request, the transaction is to be aborted, as shown in FIGS. 4a and 4b. For normal transactions, the coordinating node sends the vote outcome to all nodes that answered “yes” to the “prepare to commit” request. As will be apparent, those nodes which answered “no” or “read only” do not require responses since they have already indicated that the outcome will not affect them. Therefore, as shown in FIG. 4a, node 4 voted “no” and node 3 voted “read only,” which results in the coordinating node sending “abort” messages only to nodes 1, 2 and 5, as shown in FIG. 4b. Upon receiving the vote outcome, each participant node will select an agent to service the request by locally aborting the transaction. No acknowledgement is needed in the case of an abort outcome. [0021]
  • If the application invokes a transaction rollback request, the transaction is to be aborted. The coordinating node will send an abort message to all nodes on its participant node list, as shown in FIG. 5a. Upon receiving the abort message, each participant node will select an agent to service the request. In the case of a rollback request, the agent at the participant node will send an acknowledgement along with its own participant list to the coordinating node. In the illustrated embodiment, node 1 acknowledges and provides its list, which includes node 4; node 2 acknowledges and provides its list, which also includes node 4; and node 3 acknowledges and provides its list, which includes node 5. As above, the acknowledgement step is necessary since the coordinating node may not yet be aware of the nodes on the local participant node list for the transaction. Thereafter, the participant node lists are merged at the coordinating node and the rollback request is sent to the newly-added nodes, 4 and 5, as shown in FIG. 5b. [0022]
  • On special transactions, where some cursors may remain open across transaction boundaries, the cursors will need to be closed in the case wherein the transaction is to be aborted. Since the coordinating node's current participant node list may not be complete, and may not include all nodes which have opened cursors, the aforementioned connection node list will be used. Every node which has participated in any transaction of an application will be provided with the transaction abort message. In addition, the coordinating node will await receipt of additional participant node lists in response to the abort message, update its lists, send the message to newly-added participant nodes, etc. until all nodes involved in the application have been notified. [0023]
  • Finally, if a transaction is to be rolled back due to a major system error, such as a remote node failure, then the abort message will be sent to all nodes on the system. This is because the coordinating node may not have the participant node list from the node that failed. Therefore, to ensure that all relevant participant nodes are informed of the transaction abort, and that no node is left holding the transaction open, all nodes in the system are notified. [0024]
  • The invention has been described with reference to several specific embodiments. One having skill in the relevant art will recognize that modifications may be made without departing from the spirit and scope of the invention as set forth in the appended claims. [0025]
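The rollback propagation described above — notify the known participants, merge the participant lists returned with their acknowledgements, then notify any newly-discovered nodes, repeating until no new nodes appear — can be sketched as follows. This is an illustrative model only, not the patented implementation; the function name `propagate_rollback` and the `local_lists` data shape are assumptions made for the example.

```python
def propagate_rollback(initial_list, local_lists):
    """Sketch of the coordinating node's rollback propagation.

    initial_list: nodes on the coordinator's own participant node list.
    local_lists:  maps each node to the local participant list it returns
                  with its acknowledgement (hypothetical data shape).
    Returns the full set of notified nodes and the per-round send lists.
    """
    notified = set()             # nodes already sent the abort message
    pending = set(initial_list)  # nodes to message in the next round
    rounds = []                  # which nodes were messaged in each round
    while pending:
        batch = sorted(pending)
        rounds.append(batch)
        notified.update(batch)
        # Each acknowledging node reports its own participant list; merge
        # them and keep only newly-discovered nodes (the FIG. 5b step).
        discovered = set()
        for node in batch:
            discovered.update(local_lists.get(node, []))
        pending = discovered - notified
    return notified, rounds

# Example from the description: the coordinator knows nodes 1-3; nodes 1 and
# 2 each report node 4, node 3 reports node 5, so a second round reaches 4, 5.
notified, rounds = propagate_rollback([1, 2, 3], {1: [4], 2: [4], 3: [5]})
```

The loop terminates because each round only messages nodes not yet in `notified`, so the notified set grows monotonically toward the finite set of nodes in the system.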

Claims (19)

Having thus described our invention, what we claim as new and desire to secure by Letters Patent is:
1. A method for a coordinating node to execute transactions in a multi-node system comprising the steps of:
maintaining a transaction participant list for all nodes participating in each of said transactions; and
communicating with only those nodes on said transaction participant node list for a given one of said transactions.
2. The method of claim 1 further comprising the step of maintaining a connection node list for all nodes participating in an application.
3. The method of claim 2 further comprising the step of communicating with all nodes on said connection node list to abort a transaction.
4. A method for a node to participate in transactions originating at a coordinating node in a multi-node system comprising the steps of:
maintaining a transaction participant list for all nodes participating with said node for each of said transactions; and
communicating said transaction participant node list for a given one of said transactions in response to a transaction request from said coordinating node.
5. A method for a participant node to respond to a transaction request from a coordinating node comprising the steps of:
receiving a transaction request from said coordinating node;
preparing a response message including a response to said request and a transaction participant node list maintained at said participant node; and
sending said response message to said coordinating node.
6. A method for a coordinating node to execute a transaction in a multi-node system comprising the steps of:
(a) issuing a first request to at least one participating node in said system;
(b) storing the identity of said at least one participating node on a first transaction participant node list;
(c) receiving a response message to said request from said at least one participating node, said response message including a response and at least one additional transaction participant node list; and
(d) storing said at least one additional transaction participant node list.
7. The method of claim 6 further comprising the step of:
(e) merging said first and said at least one additional transaction participant node lists to produce an updated transaction participant list.
8. The method of claim 7 further comprising the steps of:
(f) identifying at least one node on said updated transaction participant list which had not been on said first transaction participant list;
(g) issuing a successive request to said at least one identified node;
(h) repeating steps (c) through (g) until no further nodes have been identified; and
(i) finalizing said transaction based on said responses.
9. The method of claim 8 wherein said finalizing comprises notifying participating nodes on said updated transaction participant list of the outcome of said request.
10. The method of claim 9 wherein said request comprises a “prepare to commit” message.
11. The method of claim 10 wherein at least one of said participating node responses comprises a “no” response and wherein said finalizing comprises notifying participating nodes on said updated transaction participant list to abort said transaction.
12. The method of claim 10 wherein at least one of said participating node responses comprises a “no” response and wherein said finalizing comprises notifying all but said at least one of said participating nodes to abort said transaction.
13. The method of claim 10 wherein all of said participating node responses comprise “yes” responses and wherein said finalizing comprises notifying participating nodes on said updated transaction participant list to commit said transaction.
14. The method of claim 6 wherein a system error has occurred and wherein said issuing a first request comprises sending an abort message to all nodes on said system.
15. The method of claim 8 wherein said request comprises a “prepare to commit” message and at least one of said participating node responses comprises a “read only” response, wherein said finalizing comprises notifying all but said at least one of said participating nodes to commit said transaction.
16. A coordinating node system for executing transactions in a multi-node system comprising:
means for maintaining a transaction participant list for all nodes participating in each of said transactions; and
communication means for communicating with only those nodes on said transaction participant node list for a given one of said transactions.
17. The system of claim 16 further comprising:
means for receiving at least one response from at least one node participating in a transaction, said response including at least an additional transaction participant list of nodes with which said participating node communicated on behalf of said transaction; and
merging means for merging said additional transaction participant list with said transaction participant list.
18. The system of claim 17 wherein said communication means comprises means for sending communications to only those nodes merged from said additional transaction participant list.
19. A node system for participating in transactions originating at a coordinating node in a multi-node system comprising:
means for maintaining a transaction participant list for all nodes participating with said node for each of said transactions; and
communication means for communicating said transaction participant node list for a given one of said transactions in response to a transaction request from said coordinating node.
US09/120,379 1998-07-21 1998-07-21 Method and system for efficiently coordinating commit processing in a parallel or distributed database system Expired - Lifetime US6438582B1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US09/120,379 US6438582B1 (en) 1998-07-21 1998-07-21 Method and system for efficiently coordinating commit processing in a parallel or distributed database system


Publications (2)

Publication Number Publication Date
US20020023129A1 true US20020023129A1 (en) 2002-02-21
US6438582B1 US6438582B1 (en) 2002-08-20

Family

ID=22389907

Family Applications (1)

Application Number Title Priority Date Filing Date
US09/120,379 Expired - Lifetime US6438582B1 (en) 1998-07-21 1998-07-21 Method and system for efficiently coordinating commit processing in a parallel or distributed database system

Country Status (1)

Country Link
US (1) US6438582B1 (en)

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030079155A1 (en) * 2001-09-21 2003-04-24 Polyserve, Inc. System and method for efficient lock recovery
US6574749B1 (en) * 1999-10-29 2003-06-03 Nortel Networks Limited Reliable distributed shared memory
US20040098719A1 (en) * 2002-11-15 2004-05-20 International Business Machines Corporation Auto-commit processing in an IMS batch application
US20060236152A1 (en) * 2005-04-14 2006-10-19 International Business Machines Corporation Method and apparatus for template based parallel checkpointing
US20080104597A1 (en) * 2003-02-28 2008-05-01 International Business Machines Corporation Restarting failed ims auto-restart batch applications
US20080133668A1 (en) * 2001-07-31 2008-06-05 International Business Machines Corporation Managing intended group membership using domains
US20110191633A1 (en) * 2010-02-01 2011-08-04 International Business Machines Corporation Parallel debugging in a massively parallel computing system
US8015430B1 (en) * 2004-06-30 2011-09-06 Symantec Operating Corporation Using asset dependencies to identify the recovery set and optionally automate and/or optimize the recovery
US20130132458A1 (en) * 2011-11-21 2013-05-23 Mark Cameron Little System and method for managing participant order in distributed transactions
US20130246845A1 (en) * 2012-03-16 2013-09-19 Oracle International Corporation Systems and methods for supporting transaction recovery based on a strict ordering of two-phase commit calls
US20140108484A1 (en) * 2012-10-10 2014-04-17 Tibero Co., Ltd. Method and system for optimizing distributed transactions
US9389905B2 (en) 2012-03-16 2016-07-12 Oracle International Corporation System and method for supporting read-only optimization in a transactional middleware environment
US9659292B1 (en) * 2001-08-30 2017-05-23 EMC IP Holding Company LLC Storage-based replication of e-commerce transactions in real time
US9760584B2 (en) 2012-03-16 2017-09-12 Oracle International Corporation Systems and methods for supporting inline delegation of middle-tier transaction logs to database
CN108415757A (en) * 2018-02-02 2018-08-17 阿里巴巴集团控股有限公司 distributed transaction processing method and device
US20220229822A1 (en) * 2017-12-07 2022-07-21 Zte Corporation Data processing method and device for distributed database, storage medium, and electronic device
US20230004576A1 (en) * 2019-12-16 2023-01-05 Zte Corporation Data synchronization method and device for databases, and storage medium

Families Citing this family (16)

Publication number Priority date Publication date Assignee Title
US6941360B1 (en) * 1999-02-25 2005-09-06 Oracle International Corporation Determining and registering participants in a distributed transaction in response to commencing participation in said distributed transaction
JP4005739B2 (en) * 1999-03-24 2007-11-14 株式会社東芝 Agent system, information processing method, and recording medium recording information processing software
US6745240B1 (en) * 1999-11-15 2004-06-01 Ncr Corporation Method and apparatus for configuring massively parallel systems
US7555500B2 (en) * 2001-02-15 2009-06-30 Teradata Us, Inc. Optimized end transaction processing
GB0323780D0 (en) * 2003-10-10 2003-11-12 Ibm A data brokering method and system
US20050125556A1 (en) * 2003-12-08 2005-06-09 International Business Machines Corporation Data movement management system and method for a storage area network file system employing the data management application programming interface
US7318075B2 (en) * 2004-02-06 2008-01-08 Microsoft Corporation Enhanced tabular data stream protocol
US7523204B2 (en) * 2004-06-01 2009-04-21 International Business Machines Corporation Coordinated quiesce of a distributed file system
US7640317B2 (en) * 2004-06-10 2009-12-29 Cisco Technology, Inc. Configuration commit database approach and session locking approach in a two-stage network device configuration process
EP1631023B1 (en) * 2004-08-31 2006-10-11 Opportunity Solutions A/S System for handling electronic mail in a multiple user environment
US8538931B2 (en) * 2006-04-28 2013-09-17 International Business Machines Corporation Protecting the integrity of dependent multi-tiered transactions
US20100146033A1 (en) * 2008-12-10 2010-06-10 International Business Machines Corporation Selection of transaction managers based on runtime data
US8276141B2 (en) * 2008-12-10 2012-09-25 International Business Machines Corporation Selection of transaction managers based on transaction metadata
US8346851B2 (en) * 2009-08-31 2013-01-01 Red Hat, Inc. System and method for determining when to generate subordinate coordinators on local machines
US9417906B2 (en) * 2010-04-01 2016-08-16 Red Hat, Inc. Transaction participant registration with caveats
US9201919B2 (en) 2013-05-07 2015-12-01 Red Hat, Inc. Bandwidth optimized two-phase commit protocol for distributed transactions

Family Cites Families (12)

Publication number Priority date Publication date Assignee Title
US5202971A (en) * 1987-02-13 1993-04-13 International Business Machines Corporation System for file and record locking between nodes in a distributed data processing environment maintaining one copy of each file lock
JPH0415840A (en) * 1990-05-10 1992-01-21 Toshiba Corp Distributed data base control device
US5390302A (en) * 1991-02-21 1995-02-14 Digital Equipment Corporation Transaction control
JP2675968B2 (en) * 1992-08-20 1997-11-12 インターナショナル・ビジネス・マシーンズ・コーポレイション Extension of subscriber distributed two-phase commit protocol
US5557748A (en) * 1995-02-03 1996-09-17 Intel Corporation Dynamic network configuration
US5872969A (en) * 1995-06-23 1999-02-16 International Business Machines Corporation System and method for efficiently synchronizing cache and persistent data in an object oriented transaction processing system
US5799305A (en) * 1995-11-02 1998-08-25 Informix Software, Inc. Method of commitment in a distributed database transaction
US5884327A (en) * 1996-09-25 1999-03-16 International Business Machines Corporation System, method and program for performing two-phase commit with a coordinator that performs no logging
US6085295A (en) * 1997-10-20 2000-07-04 International Business Machines Corporation Method of maintaining data coherency in a computer system having a plurality of interconnected nodes
US5999712A (en) * 1997-10-21 1999-12-07 Sun Microsystems, Inc. Determining cluster membership in a distributed computer system
US5958004A (en) * 1997-10-28 1999-09-28 Microsoft Corporation Disabling and enabling transaction committal in transactional application components
US6247023B1 (en) * 1998-07-21 2001-06-12 International Business Machines Corp. Method for providing database recovery across multiple nodes

Cited By (38)

Publication number Priority date Publication date Assignee Title
US6574749B1 (en) * 1999-10-29 2003-06-03 Nortel Networks Limited Reliable distributed shared memory
US20080133668A1 (en) * 2001-07-31 2008-06-05 International Business Machines Corporation Managing intended group membership using domains
US9659292B1 (en) * 2001-08-30 2017-05-23 EMC IP Holding Company LLC Storage-based replication of e-commerce transactions in real time
US7266722B2 (en) * 2001-09-21 2007-09-04 Hewlett-Packard Development Company, L.P. System and method for efficient lock recovery
US20030079155A1 (en) * 2001-09-21 2003-04-24 Polyserve, Inc. System and method for efficient lock recovery
US7703097B2 (en) 2002-11-15 2010-04-20 International Business Machines Corporation Auto-commit processing in an IMS batch application
US20040098719A1 (en) * 2002-11-15 2004-05-20 International Business Machines Corporation Auto-commit processing in an IMS batch application
US8484644B2 (en) 2003-02-28 2013-07-09 International Business Machines Corporation Auto-restart processing in an IMS batch application
US20080104597A1 (en) * 2003-02-28 2008-05-01 International Business Machines Corporation Restarting failed ims auto-restart batch applications
US20080115136A1 (en) * 2003-02-28 2008-05-15 International Business Machines Corporation Auto-restart processing in an ims batch application
US7873859B2 (en) 2003-02-28 2011-01-18 International Business Machines Corporation Restarting failed IMS auto-restart batch applications
US8015430B1 (en) * 2004-06-30 2011-09-06 Symantec Operating Corporation Using asset dependencies to identify the recovery set and optionally automate and/or optimize the recovery
US20080195892A1 (en) * 2005-04-14 2008-08-14 International Business Machines Corporation Template based parallel checkpointing in a massively parallel computer system
US7487393B2 (en) * 2005-04-14 2009-02-03 International Business Machines Corporation Template based parallel checkpointing in a massively parallel computer system
US7478278B2 (en) * 2005-04-14 2009-01-13 International Business Machines Corporation Template based parallel checkpointing in a massively parallel computer system
US20080215916A1 (en) * 2005-04-14 2008-09-04 International Business Machines Corporation Template based parallel checkpointing in a massively parallel computer system
US20080092030A1 (en) * 2005-04-14 2008-04-17 International Business Machines Corporation Method and apparatus for template based parallel checkpointing
US20060236152A1 (en) * 2005-04-14 2006-10-19 International Business Machines Corporation Method and apparatus for template based parallel checkpointing
US7627783B2 (en) * 2005-04-14 2009-12-01 International Business Machines Corporation Template based parallel checkpointing in a massively parallel computer system
US20110191633A1 (en) * 2010-02-01 2011-08-04 International Business Machines Corporation Parallel debugging in a massively parallel computing system
US8307243B2 (en) * 2010-02-01 2012-11-06 International Business Machines Corporation Parallel debugging in a massively parallel computing system
US9055065B2 (en) * 2011-11-21 2015-06-09 Red Hat, lnc. Managing participant order in distributed transactions
US20130132458A1 (en) * 2011-11-21 2013-05-23 Mark Cameron Little System and method for managing participant order in distributed transactions
US9389905B2 (en) 2012-03-16 2016-07-12 Oracle International Corporation System and method for supporting read-only optimization in a transactional middleware environment
US9760584B2 (en) 2012-03-16 2017-09-12 Oracle International Corporation Systems and methods for supporting inline delegation of middle-tier transaction logs to database
US10289443B2 (en) 2012-03-16 2019-05-14 Oracle International Corporation System and method for sharing global transaction identifier (GTRID) in a transactional middleware environment
US9405574B2 (en) 2012-03-16 2016-08-02 Oracle International Corporation System and method for transmitting complex structures based on a shared memory queue
US20130246845A1 (en) * 2012-03-16 2013-09-19 Oracle International Corporation Systems and methods for supporting transaction recovery based on a strict ordering of two-phase commit calls
US9658879B2 (en) 2012-03-16 2017-05-23 Oracle International Corporation System and method for supporting buffer allocation in a shared memory queue
US9665392B2 (en) 2012-03-16 2017-05-30 Oracle International Corporation System and method for supporting intra-node communication based on a shared memory queue
US9146944B2 (en) * 2012-03-16 2015-09-29 Oracle International Corporation Systems and methods for supporting transaction recovery based on a strict ordering of two-phase commit calls
US10133596B2 (en) 2012-03-16 2018-11-20 Oracle International Corporation System and method for supporting application interoperation in a transactional middleware environment
US20140108484A1 (en) * 2012-10-10 2014-04-17 Tibero Co., Ltd. Method and system for optimizing distributed transactions
US20220229822A1 (en) * 2017-12-07 2022-07-21 Zte Corporation Data processing method and device for distributed database, storage medium, and electronic device
US11928089B2 (en) * 2017-12-07 2024-03-12 Zte Corporation Data processing method and device for distributed database, storage medium, and electronic device
CN108415757A (en) * 2018-02-02 2018-08-17 阿里巴巴集团控股有限公司 distributed transaction processing method and device
US20230004576A1 (en) * 2019-12-16 2023-01-05 Zte Corporation Data synchronization method and device for databases, and storage medium
US11836154B2 (en) * 2019-12-16 2023-12-05 Xi'an Zhongxing New Software Co., Ltd. Data synchronization method and device for databases, and storage medium

Also Published As

Publication number Publication date
US6438582B1 (en) 2002-08-20

Similar Documents

Publication Publication Date Title
US6438582B1 (en) Method and system for efficiently coordinating commit processing in a parallel or distributed database system
CN101706811B (en) Transaction commit method of distributed database system
US8140623B2 (en) Non-blocking commit protocol systems and methods
US7900085B2 (en) Backup coordinator for distributed transactions
US8166007B2 (en) Failure tolerant transaction processing system
EP0814590A2 (en) Preventing conflicts in distributed systems
Oki et al. Viewstamped replication: A new primary copy method to support highly-available distributed systems
US6247023B1 (en) Method for providing database recovery across multiple nodes
AU711220B2 (en) Method of commitment in a distributed database transaction
US6574749B1 (en) Reliable distributed shared memory
Guerraoui et al. Consensus in asynchronous distributed systems: A concise guided tour
KR100324165B1 (en) Method and apparatus for correct and complete transactions in a fault tolerant distributed database system
US20230274358A1 (en) 24 hours global low latency computerized exchange system
US20040240444A1 (en) System and method for managing transactions in a messaging system
US6823356B1 (en) Method, system and program products for serializing replicated transactions of a distributed computing environment
JPH06168169A (en) Distributed transaction processing using two-phase commit protocol with hypothetical commit without log force
US7873604B2 (en) Batch recovery of distributed transactions
US6922792B2 (en) Fault tolerance for computer programs that operate over a communication network
CN103731465A (en) Distributed system and transaction treatment method thereof
US6286110B1 (en) Fault-tolerant transaction processing in a distributed system using explicit resource information for fault determination
CN116775325A (en) Distributed transaction processing method and system
US8479044B2 (en) Method for determining a state associated with a transaction
US20040024807A1 (en) Asynchronous updates of weakly consistent distributed state information
US8725708B2 (en) Resolving a unit of work
Al-Houmaily et al. The implicit-yes vote commit protocol with delegation of commitment

Legal Events

Date Code Title Description
AS Assignment

Owner name: IBM CORPORATION, NEW YORK

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:HSIAO, HUI-I;CHANG, AMY;REEL/FRAME:009332/0729

Effective date: 19980716

STCF Information on status: patent grant

Free format text: PATENTED CASE

FPAY Fee payment

Year of fee payment: 4

FPAY Fee payment

Year of fee payment: 8

REMI Maintenance fee reminder mailed
FPAY Fee payment

Year of fee payment: 12

SULP Surcharge for late payment

Year of fee payment: 11