[go: up one dir, main page]

WO2015045589A1 - Fault-tolerant system and fault-tolerant system control method - Google Patents

Fault-tolerant system and fault-tolerant system control method Download PDF

Info

Publication number
WO2015045589A1
WO2015045589A1 PCT/JP2014/069305 JP2014069305W WO2015045589A1 WO 2015045589 A1 WO2015045589 A1 WO 2015045589A1 JP 2014069305 W JP2014069305 W JP 2014069305W WO 2015045589 A1 WO2015045589 A1 WO 2015045589A1
Authority
WO
WIPO (PCT)
Prior art keywords
computer
message
processing
node
processing content
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/JP2014/069305
Other languages
French (fr)
Japanese (ja)
Inventor
拓 下沢
雅徳 吉田
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hitachi Ltd
Original Assignee
Hitachi Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hitachi Ltd filed Critical Hitachi Ltd
Publication of WO2015045589A1 publication Critical patent/WO2015045589A1/en
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16Error detection or correction of the data by redundancy in hardware
    • G06F11/18Error detection or correction of the data by redundancy in hardware using passive fault-masking of the redundant circuits
    • G06F11/187Voting techniques
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16Error detection or correction of the data by redundancy in hardware
    • G06F11/1675Temporal synchronisation or re-synchronisation of redundant processing components
    • G06F11/1683Temporal synchronisation or re-synchronisation of redundant processing components at instruction level
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16Error detection or correction of the data by redundancy in hardware
    • G06F11/18Error detection or correction of the data by redundancy in hardware using passive fault-masking of the redundant circuits
    • G06F11/182Error detection or correction of the data by redundancy in hardware using passive fault-masking of the redundant circuits based on mutual exchange of the output between redundant processing components

Definitions

  • the present invention relates to a fault-tolerant system and a fault-tolerant system control method, and more specifically, in a fault-tolerant system composed of a plurality of computer nodes having different architectures and infrastructure software, etc., the performance difference between the computer nodes.
  • the present invention relates to a technology that can improve the availability of a system by reducing the influence of the overall performance due to the fluctuations and preventing excessive failure determination due to such influence.
  • a fault tolerant system (hereinafter referred to as the same kind of FT system) configured by a plurality of computer nodes using the same hardware and the same hardware, the execution time when the same task is executed in each computer node is Variations occur between computer nodes.
  • This variation stems from performance differences between computer nodes in a fault-tolerant system (hereinafter referred to as a heterogeneous FT system) in which computer nodes differing in one or more of the architecture, performance, and basic software of the central processing unit. There is a tendency to become more prominent.
  • an object of the present invention is to reduce the influence of the overall performance due to the difference in performance and variation between the computer nodes in a fault tolerant system composed of a plurality of computer nodes having different architectures and basic software, etc. It is an object of the present invention to provide a technology capable of improving availability in a system by preventing excessive failure determination.
  • the fault-tolerant system of the present invention that solves the above-described problem includes, in each of a plurality of computers forming a majority redundant configuration, a task communication unit that acquires the processing contents from a program operating inside the computer, and the majority redundant configuration.
  • a message processing unit that returns a message indicating whether or not follow-up is possible to the other computer according to the result, and the message processing unit of the computer that is the other computer is a process obtained by the task communication unit of the other computer.
  • the fault tolerant control method of the present invention is a method in which, in each of a plurality of computers forming a majority voting redundant configuration, a process for obtaining the processing contents from a program operating inside the computer, and forming the majority voting redundant configuration From another computer, a predetermined message indicating the processing content of the program operating on the other computer is received, and according to a result of the matching determination between the processing content indicated by the predetermined message and the processing content obtained by the processing, Processing for returning a followability message to another computer, and, in the other computer, in accordance with the processing content obtained from the program operating on the other computer, the target of waiting for reception of the followability message Further, a process of changing the number of computers other than the other computer is further executed.
  • the influence of the overall performance due to the performance difference and variation between the computer nodes is reduced, and this influence is also caused.
  • availability in the system is improved.
  • the same task is executed by the computer that is the leader and the computer that is the follower in the majority redundant configuration.
  • the leader sequentially sends a notification message including the task processing state to the follower, and the follower accepts or rejects it as an approval / denial message (follow-up / possibility message).
  • the leader makes a determination on the continuation of the operation corresponding to the majority redundant configuration (for example, message waiting) for the corresponding task according to the response from the follower, that is, the content of the message, the reception timing, and the message type.
  • FIG. 1 is a diagram illustrating a configuration example of a fault tolerant system according to the first embodiment.
  • the fault-tolerant system 1 shown in FIG. 1 is a fault-tolerant system composed of a plurality of computer nodes having different architectures and basic software, etc., and reduces the influence of overall performance due to performance differences and variations between computer nodes. It is a computer system that improves availability in the system by preventing excessive failure determination due to influence.
  • the fault tolerant system 1 includes a plurality of computer nodes 110.
  • the system configuration includes three computer nodes 110A, 110B, and 110C as the computer node 110.
  • the number of computer nodes constituting the fault-tolerant system 1 is not limited to “3”, and may be, for example, “2” or “4” or more.
  • one of the computer nodes 110 is a leader node and the remaining computer nodes 110 are follower nodes.
  • the roles of the leader node and the follower node need not be fixed in the computer node 110.
  • the role of each computer may be changed using an algorithm such as the computer node 110 that first transmitted the notification message as a leader node. Good.
  • the fault tolerant system 1 interrupts the process, and one computer is selected from the follower nodes.
  • Node 110 will be selected as the leader node.
  • an existing one may be adopted.
  • the computer node 110A is a leader
  • the computer node 110B and the computer node 110C are followers.
  • each computer node 110 includes a main storage device 121, one or more central processing units 120, an external communication interface 122, and an inter-node communication interface 123.
  • the external communication interface 122 is connected to one or more terminals 102 via the external network 111.
  • the inter-node communication interface 123 is connected to another computer node 110 via an inter-node network 112.
  • These networks 111 and 112 are networks configured in conformity with, for example, the Ethernet (registered trademark) standard or the Infiniband standard.
  • the main storage device 121 in the computer node 110 is a volatile storage device that can read and write data, and is accessed from the central processing unit 120.
  • the main storage device 121 is composed of, for example, a DRAM (Dynamic Random Access Memory).
  • the central processing unit 120 in the computer node 110 is a general-purpose processor, and reads main software 130, fault-tolerant middleware (hereinafter referred to as FT middleware) 131, an application 132, and the like from the auxiliary storage device 124 and the main memory.
  • the program is expanded on the device 121 and the program is executed.
  • the architecture adopted by the central processing unit 120 and the processing speed thereof may not be the same among a plurality of computer nodes 110.
  • Each computer node 110 may include two or more central processing units 120.
  • the auxiliary storage device 124 in the computer node 110 is a permanent non-volatile storage device, and stores data read / written from these programs in addition to programs such as the basic software 130, the FT middleware 131, and the application 132.
  • the auxiliary storage device 124 is configured by, for example, a hard disk, a flash memory, or the like.
  • the external communication interface 122 in the computer node 110 is an interface for the computer node 110 to communicate with the terminal 102.
  • the inter-node communication interface 123 is an interface for performing communication between the computer nodes 110. Note that the external communication interface 122 and the inter-node communication interface 123 do not have to be separate devices, and one communication interface may also serve as the external communication interface 122 and the inter-node communication interface 123.
  • the external communication interface 122 and the inter-node communication interface 123 are configured by communication interfaces corresponding to the types of the external network 111 and the inter-node network 112, respectively.
  • the application 132 held in the main storage device 121 is a program for realizing the function provided by the FT system 1.
  • the application 132 is composed of one or more tasks 133.
  • the same application 132 is executed by a plurality of computer nodes 110.
  • the application 132 may have a difference according to a difference in the architecture, infrastructure software, or the like of the computer node 110 to be executed. That is, the instruction group for the central processing unit 120 constituting the application 132 and its task 133 need not be the same among the computer nodes 110.
  • the basic software 130 held in the main storage device 121 is an operating system capable of multitask processing, and can execute a plurality of tasks 133 and FT middleware 131 in parallel.
  • the basic software 130 accesses the FT middleware 131 to the auxiliary storage device 124, functions to transmit and receive data to the external communication interface 122 and the inter-node communication interface 123, and secures and releases a partial occupied area of the main storage device 121.
  • the FT middleware 131 held in the main storage device 121 is a program for realizing a fault tolerant function in the FT system 1.
  • the FT middleware 131 provides the above-described application 132 with functions such as access to the auxiliary storage device 124, data transmission / reception with respect to the external communication interface 122, and exclusive processing with respect to a part of the main storage device 121.
  • the FT middleware 131 is implement
  • the FT middleware 131 may be an independent program, may be integrated into the application 132, or may be integrated into the basic software 130. Further, the application 132, the FT middleware 131, and the basic software 130 may serve as a single program.
  • the FT middleware 131 includes a task communication unit 140, an inter-node communication unit 141, and a message processing unit 142.
  • the task communication unit 140, the inter-node communication unit 141, and the message processing unit 142 can be said to be functions that are implemented when the central processing unit 120 of each computer node 110 executes the FT middleware 131.
  • the task communication unit 140 receives a processing request from the task 133 and transmits the request to the message processing unit 142.
  • the message processing unit 142 executes the requested processing after synchronizing with other computer nodes 110 and transmits the result to the task communication unit 140.
  • the task communication unit 140 that has obtained the above result from the message processing unit 142 informs the task 133 of the corresponding result.
  • the inter-node communication unit 141 When the message processing unit 142 communicates with another computer node 110, the inter-node communication unit 141 is requested to transmit data via the inter-node communication interface 123. On the other hand, the inter-node communication unit 141 transmits the data received from the inter-node communication interface 123 to the message processing unit 142.
  • Arbitrary means can be adopted as means for communication between the task 133 and the task communication unit 140, the task communication unit 140 and the message processing unit 142, and the message processing unit 142 and the inter-node communication unit 141.
  • communication means provided by the basic software 130 such as sockets and pipes, and the like can be used.
  • the inter-node communication unit 141 when the inter-node communication unit 141 receives data from the inter-node communication interface 123, the inter-node communication unit 141 transmits the contents to the message processing unit 142. In addition, when a request for data transmission to another computer node 110 is received from the message processing unit 142, transmission is performed using the inter-node communication interface 123 to the computer node 110 whose contents are designated.
  • the FT middleware 131 includes an approval message list 150, a notification message list 151, and a message type table 152 in addition to the processing units 140, 141, 142 described above. These retained data and their roles will be described later.
  • the message type table 152 is not necessarily required for processing. This will also be described later.
  • FIG. 2 is a diagram illustrating a relationship example between a plurality of computer nodes constituting the FT system 1 according to the first embodiment, and a relationship example of each part necessary for the FT system 1 to realize a fault tolerant function. It is the figure shown about.
  • the fault tolerant function realized by the FT system 1 is a function that can continuously provide the function of the task 133 even if a hardware or software failure of less than half of the computer nodes 110 occurs.
  • the same task 133 is operated in a plurality of computer nodes 110.
  • the processing in which operations must be matched between the computer nodes 110 hereinafter referred to as “operation matching”.
  • messages are exchanged between the computer nodes 110 so that the task 133 shows the same behavior in all the computer nodes 110.
  • each task 133 calls the FT middleware 131 at each operation matching point and requests execution of the process.
  • the FT middleware 131 compares the content with the request of the task 133 of the other computer node 110, and if it is the same for the majority of the computer nodes 110, actually executes it. As a result, even if less than half of the computer nodes 110 perform no response, response delay, or abnormal response due to hardware failure or software failure, the operation of the task 133 is continued according to the result of the other computer node 110. be able to.
  • the computer node 110 is divided into two types, the leader node 110A and the follower nodes 110B and 110C, according to roles, in order to increase the efficiency of consensus formation between the computer nodes 110 at the operation matching point.
  • the leader node 110A determines the operation, and notifies the follower nodes 110B and 110C of the operation content corresponding to the processing request by the notification message 201.
  • the follower nodes 110B and 110C collate with the processing request from their own task 133, and can follow this operation content, that is, if it is possible to realize the same behavior as the leader node 110A, the approval message 202 is impossible.
  • a reply message 203 is returned to the leader node 110A.
  • the leader node 110A treats the approval message 202 as an approval vote in the majority vote and the denial message 203 as a negative vote, and when the majority receives the approval message 202, it actually executes this.
  • the basic behavior of the leader node 110A and the follower nodes 110B and 110C accompanying such a majority vote redundant configuration is the same as that of the prior art.
  • the approval message list 150 and the notification message list 151 are used by the message processing unit 142 as follows.
  • the message processing unit 142 receives a processing request from the task communication unit 140, if the computer node 110 is a leader node, the message processing unit 142 transmits this request to another computer node, that is, a follower node.
  • the message processing unit 142 records the reception history of the approval message and the denial message in the approval message list 150 in order to manage whether or not the approval message or the denial message is received from the follower node.
  • the follower node uses a notification message list 151 that is a list of notification messages 201 in order to manage whether or not a notification message is received from the leader node.
  • the approval message list 150 is a table having values such as a notification reference 300, an approval node set 301, and a denial node set 302 as rows.
  • the notification reference 300 is used for reference to the notification message 201 managed by the row (reference may be any method such as by identifier, by memory address, or by reference destination copy, and so on). This information becomes the identification information of the notification message 201.
  • An approval node set 301 represents a set of computer nodes 110 that are follower nodes that have returned the approval message 202 in response to the notification message 201.
  • the denial node set 302 represents a set of computer nodes 110 that are follower nodes that have returned the denial message 203 to the notification message 201.
  • the notification message list 151 has zero or more notification messages 201 sent from the leader node and not responding with the approval message 202 or the denial message 203 as a list.
  • processing request refers to a data structure for transmitting a processing request from a task to the message processing unit 142 from the task communication unit 140
  • processing result refers to a task communication unit from the message processing unit 142
  • Reference numeral 140 denotes a data structure for transmitting a processing result (processing content) in response to a processing request.
  • processing request there are a request type and a request parameter corresponding to the request type.
  • processing request reference to a corresponding processing request and a processing result value that is a value representing the processing result. If the process to be referred to can be uniquely specified by the transmission method between the task communication unit 140 and the message processing unit 142, the process request reference can be omitted.
  • the request type indicates the type of request to the FT middleware 131 of the task 133, for example, reading from the auxiliary storage device 124, writing to the auxiliary storage device 124, receiving data from the external communication interface 122, or data Transmission, exclusive processing for a partial area of the main storage device 121, and the like.
  • the request parameters are values having different meanings depending on the request type, and the number thereof also differs depending on the request type. Taking the case where the request type is “read from auxiliary storage device 124” as an example, the file name on the auxiliary storage device 124, the position on the file, the position of the main storage device 121 that stores the read value, and the read The number of bytes is included in the request parameter.
  • the processing result is used to notify the task communication unit 140 from the message processing unit 142 of the processing result of the processing request notified so far from the task communication unit 140 to the message processing unit 142.
  • the processing result value is a value having a different meaning depending on the request type of the processing request referred to by the processing request reference, and the number thereof also differs depending on the request type. Note that this processing result value does not need to express all the results for the request. For example, if the processing request has a side effect such as receiving data from the external communication interface 122 and copying to the designated main storage device 121, the message processing unit 142 performs processing for copying the designated data content to the main storage device 121. After that, the processing result is notified to the task communication unit 140. In such a case, it can be said that only the size of the received data is returned as the processing result value, and the information of the received data itself is not explicitly included in the processing result value.
  • FIG. 6 shows an example of the data structure of the notification message 201 used when the processing request is transmitted between the leader node and the follower node, and the approval message 202 and the denial message 203 as responses thereto.
  • the notification message 201 is a reference value to the request to be transmitted, and has a processing request 400 that defines the type of processing request.
  • the accompanying data 406 may be included.
  • the approval message 202 and the denial message 203 have a notification reference 300 that is a value for referring to the notification message 201 sent from the corresponding leader node. If the notification message 201 can be uniquely identified, it can be substituted by referring to the processing request.
  • the message processing unit 142 determines the message synchronization type for each processing request as will be described later.
  • the message synchronization type is, for example, “synchronous” or “semi-synchronous”, and specifies the process when the message processing unit 142 proceeds to the next process in response to the response or arrival of the message in response to the processing request. Is.
  • the message processing unit 142 in the leader node waits for responses from all the computer nodes 110 currently operating normally in the FT system 1 in response to a processing request whose message synchronization type is “synchronous”.
  • the process proceeds to the next process, and for a message that is “semi-synchronous”, it waits for half of the approval messages 202 of the computer nodes 110 that are currently operating normally in the FT system 1 and then proceeds to the next process. Perform the action.
  • this determination method there is a method using the message type table 152.
  • This message type table 152 is for determining the message synchronization type 304 from the request type 303 for the processing request described above.
  • the message type table 152 is a table having values such as the request type 303 and the message synchronization type 304 as rows. Each line means that the type of message waiting behavior of the processing request having the request type 303 and the notification message 201 having the processing type as a processing request reference is the message synchronization type 304.
  • the determination of the message synchronization type is not limited to this method.
  • the computer node 110 assigns a serial number that uniquely identifies the message to the message, sets a message of a certain multiple number as “synchronized”, and sets the other as “semi-standard”.
  • a method such as “synchronization” can also be used.
  • FIG. 7 is a diagram illustrating an example of an operation flow of each processing unit of the computer node 110 in the FT system 1 according to the first embodiment.
  • the operation of the computer node 110A as the leader node and the computer node 110B as the follower node are shown. Since the other follower node 110C performs the same operation as the computer node 110B, description thereof is omitted.
  • the task communication unit 140 of the computer node 110A receives a request from the application 132 (510), and transmits this request to the message processing unit 142 in the form of a processing request 400 (511).
  • the message processing unit 142 performs processing as needed according to the processing request 400, and sends a notification message 201 including the processing request 400 to the follower node by requesting the inter-node communication unit 141. (512).
  • the message processing unit 142 of the computer node 110A determines the message synchronization type for the processing request 400 received from the task communication unit 140 (513).
  • the message processing unit 142 waits for a response of the approval message 202 or the denial message 203 from the computer nodes 110B and 110C, which are follower nodes, according to the message synchronization type determined in step 513 (514).
  • the message processing unit 142 After waiting for the response, the message processing unit 142 performs majority processing on the processing result of the corresponding task obtained from each follower node, and transmits the finally matched result to the task communication unit 140 as the processing result ( 515).
  • the task communication unit 140 notifies the result to the application 132 (516).
  • the message processing unit 142 of the computer node 110B as the follower node waits for the notification message 201 from the computer node 110A as the leader node (517).
  • the message processing unit 142 of the computer node 110B receives the notification message 201 received from the leader node and the content of the processing request received from the task 133 by its own task communication unit 140.
  • the data is transmitted to the leader node via the inter-node communication unit 141 (519).
  • FIGS. 8 and 9 are diagrams showing operation flow examples 1 and 2 in the message processing unit of the leader node in the fault tolerant system according to the first embodiment.
  • the message processing unit 142 of the leader node has 5 is a flowchart showing an example of processing for receiving a processing request from the task communication unit 140, performing processing corresponding to the processing request, and returning the result to the task communication unit 140.
  • the message processing unit 142 of the leader node first receives the processing request from the task communication unit 140 (600), and when there is an external influence, that is, except for the case of transmission from the external communication interface 122, Processing corresponding to this processing request is performed (601). At this time, the message processing unit 142 generates request accompanying data (corresponding to the accompanying data 406 in FIG. 6) if necessary. Based on this, the message processing unit 142 generates a notification message 201 that refers to the received processing request, adds any data associated with the request, and adds it to all follower nodes (computer node 110B and computer node 110C). And transmitted via the inter-node communication unit 141 (602).
  • the message processing unit 142 adds a line corresponding to the corresponding notification message 201 to the approval message list 150 in the FT middleware 131, that is, including the notification message 201 as the notification reference 300 (603).
  • the message processing unit 142 determines the message synchronization type for the processing request received in step 600 (604).
  • step 604 the message processing unit 142 collates the above-described processing request information with the message type table 152 in the FT middleware 131, for example, so that the value of the request type 400 in each record of the message type table 152 is set.
  • the line that matches the value of the processing request received in step 600 eg, sync, Network_Send, Network_Recv, Lock_Acquire
  • the method for determining the message synchronization type is not limited to this method using the message type table 152, and any method can be adopted as long as it can determine the message synchronization type as one. .
  • step 604 it is assumed that the message synchronization type of the processing request received in step 600 is “synchronization” (604: synchronization).
  • the message processing unit 142 of the leader node requires responses from all the follower nodes operating at that time in order to continue the operation related to the processing request. That is, the message processing unit 142 of the leader node waits for an approval message 202 or a denial message 203 from all follower nodes (computer node 110B and computer node 110C) (605).
  • the message processing unit 142 is notified of the reception of the approval message 202 from the inter-node communication unit 141 (hereinafter simply referred to as “received”) (606: Ack), the message processing unit 142 in the approval message list 150 ( (The row added in step 603 regarding the notification message 201 related to the corresponding processing request), the identification information of the computer node 110 of the transmission source (that is, the follower node that returned the approval message 202 to the notification message 201) in the approval node set 301. Is added (607).
  • the message processing unit 142 when the message processing unit 142 receives the denial message 203 from the follower node (606: Nack), similarly, in the corresponding row of the approval message list 150, the message processing unit 142 adds the denial node set 302 to the transmission source computer node 110 (that is, The identification information of the follower node that responded with the denial message 202 to the notification message 201 is added (608).
  • the message processing unit 142 of the leader node repeats this until all follower nodes enter either the approval node set 301 or the denial node set 302 in the approval message list 150 (609). As a result, the message processing unit 142 of the leader node can confirm that the tasks 133 of all the follower nodes have arrived until the time when the processing request corresponding to the message 201 is executed.
  • step 609 If it is determined in step 609 that all follower nodes have entered either the approval node set 301 or the denial node set 302 in the approval message list 150 (609: Yes), the message processing unit 142 of the leader node performs the same task. It is determined whether or not the processing is to be continued according to the principle of majority vote for the execution result in each computer node (610). When the number of computer nodes that transmitted the approval message 202 is greater than or equal to the number of rejection messages (610: Yes), the message processing unit 142 of the leader node stops the node that transmitted the rejection message 203. This is because the leader node itself naturally agrees with the contents of the notification message 201.
  • the message processing unit 142 of the leader node generates the processing result of step 611 (612), transmits it to the task communication unit 140 (613), and waits for the next processing request from the task communication unit 140. Return to processing (600).
  • the message synchronization type is “quasi-synchronization” as a result of the above-described step 604 (604: quasi-synchronization).
  • the leader node requires a response from a majority of the follower nodes operating at that time in order to continue the operation related to the task.
  • the operation is the same as that in the previous stage with respect to the reception of the approval message 202 and the denial message 203 (616, 617, 618, 620), but the conditions are different and whether more than half of the follower nodes are added to the approval node set 301. This is repeated until all follower nodes enter either the approval node set 301 or the denial node set 302 (619).
  • the leader nodes themselves naturally agree with this decision, so that more than half of the follower nodes have sent the approval message 202 means that a majority of the nodes have responded. Subsequent majority decision result determination processing (610) and subsequent processing is the same as in the previous stage. Even after the leader node advances the processing, the remaining follower nodes may transmit the approval message 202 or the denial message 203. For this reason, when receiving these, the message processing unit 142 of the leader node performs an operation of adding to the corresponding approval node set 301 or denial node set 302 as needed. When the denial message 203 is received, it is known that this is a minority, and the node is immediately stopped.
  • the above-mentioned content is a case where all the follower nodes respond correctly, but as described in the background, the follower node is caused by a failure in the computer node 110, the inter-node network 112, the inter-node communication interface 123, or the like. May become unresponsive or the response may be significantly delayed. Under this situation, the process of waiting for reception (605) performed by the message processing unit 142 of the leader node when the message synchronization type is “synchronization” (604: synchronization) in the above-described flow is affected.
  • timeout time a predetermined time (hereinafter referred to as timeout time) has elapsed from the transmission time of the notification message 201 in step 602 (606: timeout)
  • the message processing unit 142 of the leader node has received the approval message 202 or the denial message by that time. It is assumed that a failure has occurred in the follower node that did not send 203. Then, the message processing unit 142 of the leader node proceeds to the determination processing (610) of the majority result. At this time, the follower node that is considered to have failed may be considered to be in the denial node set 302 (621), or may be determined by removing it from the total number of computer nodes.
  • the computer node 110 may determine the timeout time by an arbitrary algorithm. For example, a predetermined algorithm is used to predict a time for executing a predetermined process in accordance with the notification message 201 described above, and an expected arrival time of either an approval / denial message from the follower node is specified, and this is calculated as a timeout time. You may do it. Even if a constant is used, failure determination using the timeout time is performed only for all messages whose message synchronization type is "synchronous", so the majority of computer nodes are operating without significant delay. In some cases, even if there is a temporary delay in less than half of the computer nodes, that is, a minority computer node, it can be absorbed.
  • an execution constraint time can be added to a request parameter of a processing request in which the message synchronization type 304 is “synchronous”, and a timeout time can be determined based on this. This is advantageous in that it is possible to perform failure determination according to application characteristics such as a periodic time of a task that is periodically executed by explicitly controlling the timeout time from the application.
  • the processing request in which the message synchronization type 304 is “synchronous” is, for example, an area that is repeatedly executed in a task (“periodic task”) that repeatedly executes the same program at a constant time interval under a known time constraint. For example, a request for receiving a message from the outside that arrives at an unknown timing, that is, a request for receiving data from the external communication interface 122. For two consecutive messages whose message synchronization type 304 is "synchronous”, the time from the earlier arrival time to the execution location of the task 133 that issues a processing request corresponding to them, the time from the earlier arrival time When the constraint (hereinafter referred to as “time constraint”) is clear, the time constraint is added to the request parameter of any processing request.
  • time constraint hereinafter referred to as “time constraint”
  • Timeout processing in step 605 using this time constraint is as follows.
  • the message processing unit 142 of the leader node changes the time constraint included in the request parameter or the request parameter of the processing request currently being processed from the reception time of the processing request whose previous message synchronization type 304 is “synchronous”.
  • a timeout is considered when only the included time constraints (which depends on the implementation) are passed.
  • the time constraint included in the request parameter, or A time obtained by apportioning the time constraint at an appropriate ratio can be used as the timeout time.
  • a failure determination may be performed using a timeout time determined by another method, or a timeout may not be used.
  • the leader node waits indefinitely for an approval / denial message from the follower node.
  • the FT system 1 is used. This is a state in which the operation cannot be continued and should be stopped, and this is not greatly impaired in terms of availability.
  • FIG. 10 is a diagram illustrating an example of an operation flow in the message processing unit 142 of the follower node in the FT system 1 according to the first embodiment.
  • the message processing unit 142 of the follower node includes the task communication unit 140.
  • 7 is a flowchart illustrating an example of processing for receiving a processing request from the server and performing corresponding processing and returning the result to the task communication unit 140.
  • the message processing unit 142 of the follower node first waits for a processing request from the task communication unit 140 or a message from the inter-node communication unit 141 in the follower node (700).
  • the message processing unit 142 of the follower node receives a processing request from the task communication unit 140 (701: Yes)
  • the notification message list 151 is confirmed and a notification corresponding to the processing request received from the task communication unit 140 is received. It is confirmed whether the message 201 has arrived from the reader (702).
  • the message processing unit 142 of the follower node compares the processing request referred to by the notification message 201 with the processing request received from the task communication unit 140. (703).
  • the comparison processing is the result of processing in accordance with two processing requests, such as the determination of whether the processing request obtained from the task communication unit 140 and the processing request included in the notification message 201 match each other's identification information or accompanying data. As long as they are sufficient to determine that they are the same.
  • step 703 If it is determined in step 703 that the processing request referred to by the notification message 201 is the same as the processing request received from the task communication unit 140 (703: YES), the message processing unit 142 of the follower node The approval message 202 is transmitted to the node through the inter-node communication unit 141 (704).
  • the message processing unit 142 of the follower node executes a corresponding predetermined process if necessary based on the corresponding notification message 201 (705), generates a processing result (706), and outputs it as a task communication unit. It transmits to 140 (707).
  • step 703 if it is determined in step 703 that the processing request referred to by the notification message 201 is not the same as the processing request received from the task communication unit 140 (703: No), the message processing unit 142 of the follower node Then, the denial message 203 is transmitted to the leader node through the inter-node communication unit 141 (708). At this time, since the behavior of the leader node can be regarded as abnormal from the standpoint of the follower, the message processing unit 142 of the follower node makes a proposal such as re-election of the leader node to each of the other computer nodes. Processing is performed (711).
  • the message processing unit 142 of the follower node receives the request from the leader node. (709).
  • the message processing unit 142 of the follower node is the same as when the processing request from the leader node already exists in the notification message list 151 (702: Yes). Proceed to processing.
  • the notification message 201 from the leader node may be significantly delayed or cannot be received due to a failure of the leader node or the inter-node communication interface 123 or the like. Therefore, the message processing unit 142 of the follower node receives the corresponding notification message 201 from the leader node even when a certain time (hereinafter referred to as a timeout time) has elapsed from the reception time of the processing request while waiting for step 709. If not, it is considered that a failure has occurred in the leader node or the inter-node communication interface 123 (710: Yes).
  • a timeout time a certain time
  • step 703 in the case where it is determined in step 703 that the processing request referred to by the notification message 201 and the processing request received from the task communication unit 140 are not the same, at the time of failure such as a leader node re-election proposal Processing is performed (711).
  • step 701 when the message processing unit 142 of the follower node receives the notification message 201 from the inter-node communication unit 141 (701: No), the processing request corresponding to the notification message 201 is sent from the task communication unit 140. Since it has not been received yet, the information of the corresponding notification message 201 is stored in the notification message list 151 for later processing (712). Thereafter, the message processing unit 142 of the follower node returns to the process of waiting for reception of the processing request or notification message 201 (700).
  • the operation of the leader node and the follower node in the majority redundant configuration is kept the same, and the timeout determination at the leader node (reception waiting time of the approval / denial message from the follower node) Is reduced to only those related to synchronization messages, it is possible to allow a temporary processing and transmission delay of a small number of follower nodes.
  • the task communication unit 140 provides an API for issuing a synchronization request for performing synchronization, whereby the timing at which the application 132 is synchronized is determined. Control of timeout determination that matches the characteristics of the application 132 can be realized.
  • the processing of the task communication unit 140 and the message processing unit 142 is performed for one task 133 by performing parallel processing such as threading for each task or by adding multiplexing processing waiting for input / output. It is necessary to prevent other tasks 133 from being delayed by processing the request.
  • the reception of the processing request from the task communication unit 140 and the reception of the notification message 201 from the inter-node communication unit 141 are described as a single flow, but by using a plurality of threads or the like for efficiency, May be executed in parallel.
  • Second Embodiment It can be assumed that the message processing unit 142 of the computer node performs processing using the request list.
  • This request list is a table for selecting, when the message processing unit 142 receives the notification message 201 from another computer node, whether to return the approval message 202 or the denial message 203 in response thereto. Therefore, the request list has sufficient data for determining whether the message to be returned is an approval / denial message, for example, each value of a processing request or request-accompanying data.
  • the FT system 1 in the second embodiment has the same configuration as that in FIG.
  • the behavior of the follower node in the second embodiment follows the flow shown in FIG. Since other operations and configurations are the same as those of the first embodiment, description of overlapping portions is omitted.
  • FIG. 11 and FIG. 12 are diagrams showing an operation flow example 1 and an example 2 in the message processing unit of the follower node of the fault tolerant system in the second embodiment.
  • the major difference from the first embodiment is that the processing for the processing request obtained by the task communication unit 140 proceeds without waiting for the notification message 201 from the leader node to arrive.
  • the message processing unit 142 of the follower node in the second embodiment waits for a processing request from the task communication unit 140 or a notification message from the inter-node communication unit 141 (800). For example, when a processing request is received from the task communication unit 140 (801: Yes), the message processing unit 142 of the follower node checks the notification message list 151 and determines whether the corresponding notification message 201 has arrived from the leader node. Confirm (802).
  • step 802 when it is determined that the notification message 201 corresponding to the processing request obtained from the task communication unit 140 has already arrived (802: Yes), the message processing unit 142 of the follower node is the first embodiment. The same processing is executed (803 to 810). Since the corresponding process is the same as that of the first embodiment, the description thereof is omitted.
  • the message processing unit 142 of the follower node responds to the processing request. Corresponding processing is performed (806). However, in the case of processing that affects the outside, such as transmission from the external communication interface 122, only the expected result is calculated without actually performing the processing. Further, the message processing unit 142 of the follower node adds request-accompanying data including information related to the processing request and the result of the processing to the above-described request list (807).
  • the request list in which the records are generated in this way is a collection of records including data in the same format as the processing request 400 and the accompanying data 406 of the notification message 201 shown in FIG.
  • the message processing unit 142 of the follower node generates a processing result from the result of step 806 described above (809), and transmits the processing result to the task communication unit 140 (810). Thereafter, the message processing unit 142 of the follower node returns to waiting for transmission from the task communication unit 140 and the inter-node communication unit 141 (800).
  • the message processing unit 142 of the follower node confirms the request list and corresponds to the processing request reference included in the notification message 201. It is confirmed whether there is a processing request to be performed (811).
  • the message processing unit 142 of the follower node performs task communication.
  • the notification message 201 obtained from the unit 140 is compared with the corresponding processing request included in the request list (812). This comparison process is similar to that described above with respect to step 703 of FIG.
  • the message processing unit 142 of the follower node displays the denial message 203.
  • the data is transmitted to the leader node (814), and a failure process related to the leader node is performed (815). If there is no corresponding processing request in the request list in step 811 described above, the message processing unit 142 of the follower node adds the information of the notification message 201 to the notification message list 151 (816).
  • the follower node can proceed with each process related to a task without waiting for a notification message from the leader node, and therefore the processing time delay can be reduced.
  • the follower node in this case proceeds with the process without using the result at the leader node, there is a high probability that a difference in processing contents will occur. Therefore, the follower node can also partially apply the present embodiment in which a part of the processing waits for a notification message from the leader node and the other processing proceeds without waiting.
  • the message synchronization type can also be used for the determination in the follower node when performing such partial application.
  • the message synchronization type is “synchronous” or “semi-synchronous”, it waits for a notification message from the leader node, and if the message synchronization type described in the next embodiment is “asynchronous”, notification from the leader node It is an algorithm that does not wait for a message.
  • Third embodiment In the first and second embodiments described above, an example in which the process is selectively performed when the message synchronization type is “synchronous” or “semi-synchronous” has been described. Here, in addition to the same machine type, an example in which the message synchronization type is “asynchronous” is assumed.
  • FIG. 13 is a diagram showing an example of an operation flow in the message processing unit 142 of the leader node of the FT system 1 in the third embodiment, and is a flowchart showing only a difference from the flow of FIG. Therefore, the description of the same configuration and processing as in the first embodiment is omitted.
  • the message processing unit 142 of the leader node waits for transmission of a processing request from the task communication unit 140 or reception of the notification message 201 from the inter-node communication unit 141 (1000). If a processing request is received from the task communication unit 140 during standby (1001: Yes), the message processing unit 142 of the leader node determines the message synchronization type after executing steps 601 to 603 in the flow of FIG. (1002). The processing when the message synchronization type is “synchronous” and “semi-synchronous” is the same as the case shown in the first embodiment.
  • the message processing unit 142 of the leader node proceeds to step 611 in the flow of FIG. 8 without waiting for an approval / denial message from the follower node. .
  • the message processing unit 142 of the leader node determines the majority process (step 610 in FIG. 8) as in the case of “semi-synchronization”. ) May be performed.
  • majority processing it is necessary to perform majority processing after returning a result to the task communication unit 140. In this case, when the notification message 201 is received from the inter-node communication unit 141 (1001: No) ) To run.
  • the message processing unit 142 of the leader node When the message processing unit 142 of the leader node receives the notification message 201 from the inter-node communication unit 141 (1001: No), if it is the approval message 202 (1003: Ack), the message processing unit 142 of the approval node in the corresponding approval message list 150 Information of the follower node of the transmission source is added to the approval node set 301 of the row (1004). On the other hand, if the message is the denial message 203 (1003: Nack), the message processing unit 142 of the leader node adds the information of the source follower node to the denial node set 302 (1005).
  • the message processing unit 142 of the leader node When there is a follower node not yet included in the approval node set 301 or the denial node set 302 regarding this row (1006: No), the message processing unit 142 of the leader node returns to reception waiting (1000). When the approval / denial message is received from all the follower nodes (1006: Yes), the message processing unit 142 of the leader node goes to the majority process. At this time, when there are a large number of rejection messages 203 (1007: No), the message processing unit 142 of the leader node performs a failure process related to the leader node.
  • the message processing unit 142 of the leader node deletes the corresponding line in the approval message list 150 (1009), and returns to waiting for reception (1000). At this time, the message processing unit 142 of the leader node may perform processing for stopping the follower node that has sent the denial message 203 as necessary.
  • the message processing unit 142 of the leader node may move the processing from step 1006 to majority processing (1007).
  • FIG. 14 is a diagram showing an example of an operation flow in the message processing unit 142 of the leader node of the FT system 1 in the fourth embodiment, and is a flowchart showing only a difference from the flow of FIG. Therefore, the description of the same configuration and processing as in the first embodiment is omitted. The description of the same configuration and processing as those in the third embodiment is also omitted.
  • the message processing unit 142 of the leader node waits for transmission of a processing request from the task communication unit 140 or reception of the notification message 201 from the inter-node communication unit 141 (1100).
  • the message processing unit 142 of the leader node determines the message synchronization type after executing steps 601 to 603 in the flow of FIG. (1102).
  • the processing when the message synchronization type is “synchronization” is the same as that shown in the first embodiment (to step 605 in the flow of FIG. 8).
  • the message processing unit 142 of the leader node proceeds to step 611 in the flow of FIG. 8 without waiting for an approval / denial message from the follower node. .
  • the message processing unit 142 of the leader node determines the majority process (step 610 in FIG. 8) as in the case of “semi-synchronization”. ) May be performed.
  • majority processing it is necessary to perform majority processing after returning the result to the task communication unit 140.
  • step 1101 when the notification message 201 is received from the inter-node communication unit 141 (1101: No) ) To run.
  • step 1101 when the notification message 201 is received from the inter-node communication unit 141 (1001: No), the subsequent processing (1103 to 1109) is the same as in the third embodiment.
  • the number of times that the leader node waits for the approval / denial message from the follower node can be reduced as compared with the case of the first embodiment. It becomes possible to tolerate appropriately.
  • messages for synchronization between computer nodes are classified, and depending on the type, only a part of the messages waits for the result of majority voting.
  • the behavior change of majority voting such as executing asynchronously is executed to reduce the number of processing waits between computer nodes, thereby relatively suppressing the influence of performance variation between computer nodes.
  • Such an effect also leads to reduction of processing delay and efficiency reduction in the entire system.
  • the message processing unit of the computer that has become the other computer collates the processing content obtained by the task communication unit of the other computer with a predetermined standard, and the processing content is either synchronous or semi-synchronous. If the processing content corresponds to synchronization, for all the computers other than the other computer, wait for reception of the followability message, When the processing content corresponds to quasi-synchronization, it is possible to wait for reception of the follow-up propriety message for at least half of the computers other than the other computers.
  • the computer that is the leader in the fault-tolerant system should wait for receipt of a message indicating whether the task can be tracked Control the number of follower calculators.
  • the time to wait for the above message is selectively increased / decreased according to necessity, and the above arrival time difference of the above message due to the performance difference or variation between computers is appropriately absorbed, thereby preventing excessive failure determination or The availability in the system can be improved.
  • the message processing unit of the computer that has become the other computer collates the processing content obtained by the task communication unit of the other computer with a predetermined standard, and the corresponding processing content is synchronous, semi-synchronous, and Judgment is made whether it is asynchronous or not, and if the processing contents correspond to synchronization, all the computers other than the other computer are subjected to waiting for reception of the followability message. If the processing content corresponds to quasi-synchronization, wait for reception of the followability message for at least half of the computers other than the other computer, and the processing content is If it is asynchronous, it does not wait to receive the followability message from a computer other than the other computer. It may be.
  • the computer that is the leader in the fault-tolerant system receives a message indicating whether the task can be followed or not. Control the number of follower computers to wait for. As a result, the time to wait for the above message is selectively increased / decreased according to necessity, and the above arrival time difference of the above message due to the performance difference or variation between computers is appropriately absorbed, thereby preventing excessive failure determination or The availability in the system can be improved.
  • the computer that is the leader does not need to wait to receive the above-mentioned message from the follower computer, and the use resources of each computer required for fault determination are made efficient. As a result, the availability of the system can be further improved.
  • the message processing unit of the computer that has become the other computer collates the processing content obtained by the task communication unit of the other computer with a predetermined standard, and the corresponding processing content is either synchronous or asynchronous. If the processing content corresponds to synchronization, the processing waits for reception of the follow-up propriety message for all computers other than the other computer, and the processing When the content corresponds asynchronously, it may be configured not to wait for reception of the follow-up enable / disable message from a computer other than the other computer.
  • the followers to wait for the follow-up message about the corresponding task in the fault-tolerant system, depending on the type of the same task that each computer should execute in parallel, that is, whether synchronization or asynchronous is necessary Control the number of calculators.
  • the time to wait for the above message is selectively increased / decreased according to necessity, and the above arrival time difference of the above message due to the performance difference or variation between computers is appropriately absorbed, thereby preventing excessive failure determination or
  • the availability in the system can be improved.
  • the computer that is the leader does not need to wait to receive the above-mentioned message from the follower computer, and the use resources of each computer required for fault determination are made efficient. As a result, the availability of the system can be further improved.
  • the message processing unit of the computer that has become the other computer collates the processing content obtained by the task communication unit of the other computer with a predetermined standard, and the processing content is synchronous or semi-synchronous. It is determined whether it corresponds to any one, and when the processing content corresponds to synchronization, for all the computers other than the other computer, wait for reception of the followability message, When the processing content corresponds to quasi-synchronization, waiting for reception of the followability message may be executed for at least half of the computers other than the other computers.
  • the computer that is the leader in the fault-tolerant system should wait for receipt of a message indicating whether the task can be tracked Control the number of follower calculators.
  • the time to wait for the above message is selectively increased / decreased according to necessity, and the above arrival time difference of the above message due to the performance difference or variation between computers is appropriately absorbed, thereby preventing excessive failure determination or The availability in the system can be improved.
  • the message processing unit of the computer that has become the other computer collates the processing content obtained by the task communication unit of the other computer with a predetermined standard, and the corresponding processing content is synchronous, semi-synchronous, And if the processing content corresponds to synchronization, and if all of the computers other than the other computer are the target, wait for reception of the follow-up enable / disable message. If the processing content corresponds to quasi-synchronization, wait for reception of the followability message for at least half of the computers other than the other computer, and execute the processing content Is not asynchronous, it does not wait to receive the follow-up message from other computers than the other computer. Good.
  • the computer that is the leader in the fault-tolerant system receives a message indicating whether the task can be followed or not. Control the number of follower computers to wait for. As a result, the time to wait for the above message is selectively increased / decreased according to necessity, and the above arrival time difference of the above message due to the performance difference or variation between computers is appropriately absorbed, thereby preventing excessive failure determination or The availability in the system can be improved.
  • the computer that is the leader does not need to wait to receive the above-mentioned message from the follower computer, and the use resources of each computer required for fault determination are made efficient. As a result, the availability of the system can be further improved.
  • the message processing unit of the computer that has become the other computer collates the processing content obtained by the task communication unit of the other computer with a predetermined standard, and the processing content is either synchronous or asynchronous. If the processing content corresponds to synchronization, for all the computers other than the other computer, wait for reception of the followability message, When the processing contents are asynchronous, it may be determined not to wait for reception of the followability message from a computer other than the other computer.
  • the followers to wait for the follow-up message about the corresponding task in the fault-tolerant system, depending on the type of the same task that each computer should execute in parallel, that is, whether synchronization or asynchronous is necessary Control the number of calculators.
  • the time to wait for the above message is selectively increased / decreased according to necessity, and the above arrival time difference of the above message due to the performance difference or variation between computers is appropriately absorbed, thereby preventing excessive failure determination or
  • the availability in the system can be improved.
  • the computer that is the leader does not need to wait to receive the above-mentioned message from the follower computer, and the use resources of each computer required for fault determination are made efficient. As a result, the availability of the system can be further improved.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Hardware Redundancy (AREA)

Abstract

Provided is a fault-tolerant system with a plurality of computers which form a majority voting redundancy configuration, each of which is provided with: a task communication unit (140) for acquiring process content from a program which operates in that computer (110B, 110C); and a message processing unit (142) which receives, from any of the other computers (110A) which form the majority voting redundancy configuration, a prescribed message which indicates the process content of a program which operates on the other computer (110A) and returns a replication permissibility message to the other computer (110A) according to the result of a match determination between the process content indicated by the prescribed message and the process content obtained by the task communication unit (140). The message processing unit (142) of the computer which is the other computer (110A) further executes a process for changing the number of computers (110B, 110C), aside from the other computer, for which the other computer is to await receipt of replication permissibility messages, depending on the process content obtained by the task communication unit (140) of the other computer (110A).

Description

フォールトトレラントシステム及びフォールトトレラントシステム制御方法Fault tolerant system and fault tolerant system control method

 本発明は、フォールトトレラントシステム及びフォールトトレラントシステム制御方法に関するものであり、具体的には、アーキテクチャや基盤ソフトウェア等が互いに異なる複数の計算機ノードで構成されたフォールトトレラントシステムにおいて、計算機ノード間の性能差やばらつきによる全体性能の影響を低減し、またこうした影響に起因する過度な障害判定を防止することにより、システムにおける可用性向上を可能とする技術に関する。 The present invention relates to a fault-tolerant system and a fault-tolerant system control method, and more specifically, in a fault-tolerant system composed of a plurality of computer nodes having different architectures and infrastructure software, etc., the performance difference between the computer nodes. The present invention relates to a technology that can improve the availability of a system by reducing the influence of the overall performance due to the fluctuations and preventing excessive failure determination due to such influence.

 ミッションクリティカルなコンピュータシステムにおいて、所定の冗長性を持たせることで高い可用性を確保するフォールトトレランスの概念が導入されている。そうした概念に対応する技術としては、該当システムを複数の計算機から構成し、同期動作する各計算機の出力信号を多数決チェックにて比較、監視して、不具合のある計算機をシステムから切り離すといったものが提案されている。例えば、ユーザが「インタフェースと対話して、モニタされる変数、すなわち多数決される変数と、多数決が行われるブレークポイントと、多数決および回復パラメータなどの情報と」を指定し、「指定されたブレークポイントで相異なるコピーによって生成されたユーザ指定変数のうちの1つ以上の値を比較することによって、コピーのうちの1つにフォールトが検出される」といった従来技術(特許文献1参照)がある。 In the mission critical computer system, the concept of fault tolerance has been introduced to ensure high availability by giving a predetermined redundancy. As a technology corresponding to such a concept, a system is proposed in which the corresponding system is composed of a plurality of computers, the output signals of the computers operating in synchronization are compared and monitored by majority vote, and the defective computer is separated from the system. Has been. For example, the user may specify “interacting with the interface, monitored variables, ie variables to be voted on, breakpoints on which the majority will be taken, and information such as majority and recovery parameters” and “specified breakpoints” There is a conventional technique (see Patent Document 1) in which a fault is detected in one of the copies by comparing one or more values of user-specified variables generated by different copies.

特許第3624119号公報Japanese Patent No. 3624119

 ところで、同じハードウェアで同じ基盤ソフトウェアを使用する複数の計算機ノードで構成したフォールトトレラントシステム(以下、同種FTシステムという)であっても、各計算機ノードにて同じタスクを実行した場合の実行時間は、計算機ノード間でばらつきを生じる。このばらつきは、中央演算装置のアーキテクチャ、性能、基盤ソフトウェアの種類の一つあるいは複数が異なる計算機ノードを組み合わせたフォールトトレラントシステム(以下、異種FTシステムという)において、計算機ノード間の性能差に由来し更に顕著なものとなる傾向にある。 By the way, even in a fault tolerant system (hereinafter referred to as the same kind of FT system) configured by a plurality of computer nodes using the same hardware and the same hardware, the execution time when the same task is executed in each computer node is Variations occur between computer nodes. This variation stems from performance differences between computer nodes in a fault-tolerant system (hereinafter referred to as a heterogeneous FT system) in which computer nodes differing in one or more of the architecture, performance, and basic software of the central processing unit. There is a tendency to become more prominent.

 あるプログラムを実行するのにかかる時間は、計算機ノードにおけるハードウェアの性能だけでなく様々な外的要因により変化するため、一般に予測が困難である。このため、同種FTシステムおよび異種FTシステムのいずれの場合においても、この計算機ノード間の実行時間の差を許容し、かつ、無応答あるいは大幅な処理遅延の事象に基づく障害ノードの判定とその排除を、可用性を損なわずに行う必要がある。 Since the time required to execute a certain program varies depending on various external factors as well as the hardware performance of the computer node, it is generally difficult to predict. For this reason, in both cases of the homogeneous FT system and the heterogeneous FT system, the difference in execution time between the computer nodes is allowed, and the determination of the failed node based on the event of no response or a large processing delay and its elimination Must be done without sacrificing availability.

 一方、従来技術においては、出力間で多数決を行うにあたり、ある時点での変数内容の比較を行う際に、同期的に各コピー(計算機ノード)から変数内容を収集し比較するために、必ず全ノードが該当時点で同期する必要がある。このことは、各コピーの実行時間のばらつきを該当時点で吸収する必要があるため、全体の処理効率低下を招くことにつながりかねない。また、そのようなばらつきを許容する障害判定時間の設定も、実行時間の予測困難性により困難である。 On the other hand, in the prior art, when making a majority decision between outputs, when comparing variable contents at a certain point in time, it is necessary to collect and compare variable contents from each copy (computer node) synchronously. Nodes need to be synchronized at the appropriate time. This is necessary to absorb the variation in the execution time of each copy at the corresponding time, which may lead to a decrease in the overall processing efficiency. Also, it is difficult to set a failure determination time that allows such variations due to difficulty in predicting the execution time.

 そこで本発明の目的は、アーキテクチャや基盤ソフトウェア等が互いに異なる複数の計算機ノードで構成されたフォールトトレラントシステムにおいて、計算機ノード間の性能差やばらつきによる全体性能の影響を低減し、またこうした影響に起因する過度な障害判定を防止することにより、システムにおける可用性向上を可能とする技術を提供することにある。 Accordingly, an object of the present invention is to reduce the influence of the overall performance due to the difference in performance and variation between the computer nodes in a fault tolerant system composed of a plurality of computer nodes having different architectures and basic software, etc. It is an object of the present invention to provide a technology capable of improving availability in a system by preventing excessive failure determination.

 上記課題を解決する本発明のフォールトトレラントシステムは、多数決冗長構成を形成する複数の計算機各々において、当該計算機の内部で動作するプログラムからその処理内容を取得するタスク通信部と、前記多数決冗長構成を形成するいずれかの他計算機より、前記他計算機で動作するプログラムの処理内容を示す所定メッセージを受信し、前記所定メッセージが示す処理内容と前記タスク通信部が得ている前記処理内容との一致判定の結果に応じ、前記他計算機に追従可否のメッセージを返信するメッセージ処理部とを備えており、前記他計算機となった計算機のメッセージ処理部は、当該他計算機のタスク通信部が得ている処理内容に応じて、前記追従可否のメッセージの受信待ち対象とする、当該他計算機以外の計算機の数を変更する処理を更に実行するものである、ことを特徴とする。 The fault-tolerant system of the present invention that solves the above-described problem includes, in each of a plurality of computers forming a majority redundant configuration, a task communication unit that acquires the processing contents from a program operating inside the computer, and the majority redundant configuration. Receiving a predetermined message indicating the processing content of the program operating on the other computer from any other computer to be formed, and determining whether the processing content indicated by the predetermined message and the processing content obtained by the task communication unit are consistent A message processing unit that returns a message indicating whether or not follow-up is possible to the other computer according to the result, and the message processing unit of the computer that is the other computer is a process obtained by the task communication unit of the other computer. Depending on the content, change the number of computers other than the other computers that are subject to reception of the followability message. It is to a process for further execution, and wherein the.

 また、本発明のフォールトトレラント制御方法は、多数決冗長構成を形成する複数の計算機各々において、当該計算機の内部で動作するプログラムからその処理内容を取得する処理と、前記多数決冗長構成を形成するいずれかの他計算機より、前記他計算機で動作するプログラムの処理内容を示す所定メッセージを受信し、前記所定メッセージが示す処理内容と前記処理で得ている前記処理内容との一致判定の結果に応じ、前記他計算機に追従可否のメッセージを返信する処理とを実行し、更に、前記他計算機においては、当該他計算機で動作するプログラムから得ている処理内容に応じて、前記追従可否のメッセージの受信待ち対象とする、当該他計算機以外の計算機の数を変更する処理を更に実行する、ことを特徴とする。 Further, the fault tolerant control method of the present invention is a method in which, in each of a plurality of computers forming a majority voting redundant configuration, a process for obtaining the processing contents from a program operating inside the computer, and forming the majority voting redundant configuration From another computer, a predetermined message indicating the processing content of the program operating on the other computer is received, and according to a result of the matching determination between the processing content indicated by the predetermined message and the processing content obtained by the processing, Processing for returning a followability message to another computer, and, in the other computer, in accordance with the processing content obtained from the program operating on the other computer, the target of waiting for reception of the followability message Further, a process of changing the number of computers other than the other computer is further executed.

 本発明によれば、アーキテクチャや基盤ソフトウェア等が互いに異なる複数の計算機ノードで構成されたフォールトトレラントシステムにおいて、計算機ノード間の性能差やばらつきによる全体性能の影響を低減し、またこうした影響に起因する過度な障害判定を防止することにより、システムにおける可用性が向上する。 According to the present invention, in a fault-tolerant system composed of a plurality of computer nodes having different architectures, infrastructure software, and the like, the influence of the overall performance due to the performance difference and variation between the computer nodes is reduced, and this influence is also caused. By preventing excessive fault determination, availability in the system is improved.

第1の実施形態に係るフォールトトレラントシステムの構成例を示す図である。It is a figure which shows the structural example of the fault tolerant system which concerns on 1st Embodiment. 第1の実施形態に係るフォールトトレラントシステムを構成する複数の計算機ノード間の関係例を示す図である。It is a figure which shows the example of a relationship between the some computer nodes which comprise the fault tolerant system which concerns on 1st Embodiment. 第1の実施形態に係るフォールトトレラントシステムを構成する計算機ノードが保持する承認メッセージリストの構成例を示す図である。It is a figure which shows the structural example of the approval message list which the computer node which comprises the fault tolerant system which concerns on 1st Embodiment hold | maintains. 第1の実施形態に係るフォールトトレラントシステムを構成する計算機ノードが保持する通知メッセージリストの構成例を示す図である。It is a figure which shows the structural example of the notification message list which the computer node which comprises the fault tolerant system which concerns on 1st Embodiment hold | maintains. 第1の実施形態に係るフォールトトレラントシステムを構成する計算機ノードが保持するメッセージ種類テーブルの構成例を示す図である。It is a figure which shows the structural example of the message type table which the computer node which comprises the fault tolerant system which concerns on 1st Embodiment hold | maintains. 第1の実施形態に係るフォールトトレラントシステムを構成する複数の計算機ノード間で授受されるデータの構造例を示す図である。It is a figure which shows the structural example of the data transferred between the some computer nodes which comprise the fault tolerant system which concerns on 1st Embodiment. 第1の実施形態に係るフォールトトレラントシステムにおける計算機ノードの各処理部の動作フロー例を示す図である。It is a figure which shows the example of an operation | movement flow of each process part of the computer node in the fault tolerant system which concerns on 1st Embodiment. 第1の実施形態に係るフォールトトレラントシステムにおけるリーダーノードのメッセージ処理部での動作フロー例1を示す図である。It is a figure which shows the example 1 of an operation | movement flow in the message processing part of the leader node in the fault tolerant system which concerns on 1st Embodiment. 第1の実施形態に係るフォールトトレラントシステムにおけるリーダーノードのメッセージ処理部での動作フロー例2を示す図である。It is a figure which shows the example 2 of an operation | movement flow in the message processing part of the leader node in the fault tolerant system which concerns on 1st Embodiment. 第1の実施形態に係るフォールトトレラントシステムにおけるフォロワーノードのメッセージ処理部での動作フロー例を示す図である。It is a figure which shows the example of an operation | movement flow in the message processing part of a follower node in the fault tolerant system which concerns on 1st Embodiment. 第2の実施形態におけるフォールトトレラントシステムのフォロワーノードのメッセージ処理部での動作フロー例1を示す図である。It is a figure which shows the example 1 of an operation | movement flow in the message processing part of the follower node of the fault tolerant system in 2nd Embodiment. 第2の実施形態におけるフォールトトレラントシステムのフォロワーノードのメッセージ処理部での動作フロー例2を示す図である。It is a figure which shows the example 2 of an operation | movement flow in the message processing part of the follower node of the fault tolerant system in 2nd Embodiment. 第3の実施形態におけるフォールトトレラントシステムのリーダーノードのメッセージ処理部での動作フロー例を示す図である。It is a figure which shows the example of an operation | movement flow in the message processing part of the leader node of the fault tolerant system in 3rd Embodiment. 第4の実施形態におけるフォールトトレラントシステムのリーダーノードのメッセージ処理部での動作フロー例を示す図である。It is a figure which shows the example of an operation | movement flow in the message processing part of the leader node of the fault tolerant system in 4th Embodiment.

 関連出願の相互参照
 この出願は、2013年9月26日に出願された日本特許出願、特願2013-199614に基づく優先権を主張し、その内容を援用する。
This application claims priority based on Japanese Patent Application No. 2013-199614 filed on Sep. 26, 2013, the contents of which are incorporated herein by reference.

---第1の実施形態---
 以下に本発明における第1の実施形態について図面を用いて詳細に説明する。第1の実施形態のフォールトトレラントシステムにおいては、多数決冗長構成におけるリーダーたる計算機およびフォロワーたる各計算機で、同一のタスクを実行する。リーダーはタスクの処理状体が含まれた通知メッセージをフォロワーに対して逐次送信し、フォロワーはそれに対して、リーダー同様に処理可能であるかどうかを承認/否認のメッセージ(追従可否のメッセージ)としてリーダーに応答する。この場合、リーダーはフォロワーからの応答すなわち上述のメッセージの内容、受信タイミングおよびメッセージの種類によって、該当タスクに関して多数決冗長構成に対応した動作(例:メッセージ待ち受け)の継続に関して判断する。
--- First embodiment ---
Hereinafter, a first embodiment of the present invention will be described in detail with reference to the drawings. In the fault-tolerant system of the first embodiment, the same task is executed by the computer that is the leader and the computer that is the follower in the majority redundant configuration. The leader sequentially sends a notification message including the task processing state to the follower, and the follower accepts or rejects it as an approval / denial message (follow-up / possibility message). Respond to the leader. In this case, the leader makes a determination on the continuation of the operation corresponding to the majority redundant configuration (for example, message waiting) for the corresponding task according to the response from the follower, that is, the content of the message, the reception timing, and the message type.

 図1は、第1の実施形態に係るフォールトトレラントシステムの構成例を示す図である。図1に示すフォールトトレラントシステム1は、アーキテクチャや基盤ソフトウェア等が互いに異なる複数の計算機ノードで構成されたフォールトトレラントシステムにおいて、計算機ノード間の性能差やばらつきによる全体性能の影響を低減し、またこうした影響に起因する過度な障害判定を防止することにより、システムにおける可用性を向上させるコンピュータシステムである。 FIG. 1 is a diagram illustrating a configuration example of a fault tolerant system according to the first embodiment. The fault-tolerant system 1 shown in FIG. 1 is a fault-tolerant system composed of a plurality of computer nodes having different architectures and basic software, etc., and reduces the influence of overall performance due to performance differences and variations between computer nodes. It is a computer system that improves availability in the system by preventing excessive failure determination due to influence.

 当該フォールトトレラントシステム1は、複数の計算機ノード110で構成される。図1のフォールトトレラントシステム1においては、計算機ノード110として3台の計算機ノード110A、110B、110Cでシステム構成した例を示している。勿論、フォールトトレラントシステム1を構成する計算機ノード数は「3」台である場合に限定されず、例えば「2」台や「4」台以上であってもよい。 The fault tolerant system 1 includes a plurality of computer nodes 110. In the fault tolerant system 1 of FIG. 1, an example is shown in which the system configuration includes three computer nodes 110A, 110B, and 110C as the computer node 110. Of course, the number of computer nodes constituting the fault-tolerant system 1 is not limited to “3”, and may be, for example, “2” or “4” or more.

 多数決冗長構成に対応したフォールトトレラントシステム1においては、計算機ノード110のうち1台をリーダーノードとし、残りの計算機ノード110をフォロワーノードとする。リーダーノードとフォロワーノードの役割は、計算機ノード110の中で固定的である必要はない。例えば、あるタスク毎、すなわち当該タスクに関する上述の通知メッセージごとに、一番最初にその通知メッセージを送信した計算機ノード110をリーダーノードとするなどといったアルゴリズムを用いて、それぞれの役割を交代してもよい。ただし、フォールトトレラントシステム1の動作中の任意の時点において、リーダーノードは、高々一つが存在する。リーダーノードが存在しない場合、もしくは、リーダーノードに何らかの障害が発生して処理を停止しリーダーノードが存在しなくなった場合、フォールトトレラントシステム1は処理を中断し、フォロワーノードの中から、一つの計算機ノード110をリーダーノードとして選ぶこととなる。こうした多数決冗長構成を成す計算機ノード中からリーダーノードを選択する処理技術については既存のものを採用してよい。なお、第1の実施形態においては、3台の計算機ノード110のうち、計算機ノード110Aをリーダーとし、計算機ノード110B及び計算機ノード110Cをフォロワーとしている。 In the fault tolerant system 1 corresponding to the majority redundant configuration, one of the computer nodes 110 is a leader node and the remaining computer nodes 110 are follower nodes. The roles of the leader node and the follower node need not be fixed in the computer node 110. For example, for each task, that is, for each of the above-described notification messages related to the task, the role of each computer may be changed using an algorithm such as the computer node 110 that first transmitted the notification message as a leader node. Good. However, at any time during the operation of the fault tolerant system 1, there is at most one leader node. When the leader node does not exist, or when a failure occurs in the leader node and the process stops and the leader node does not exist, the fault tolerant system 1 interrupts the process, and one computer is selected from the follower nodes. Node 110 will be selected as the leader node. As a processing technique for selecting a leader node from among computer nodes having such a majority redundant configuration, an existing one may be adopted. In the first embodiment, among the three computer nodes 110, the computer node 110A is a leader, and the computer node 110B and the computer node 110C are followers.

 また、各計算機ノード110のハードウェア構成は以下の如くとなる。計算機ノード110は、主記憶装置121と1つ以上の中央演算装置120、外部通信インタフェース122、およびノード間通信インタフェース123を備える。このうち外部通信インタフェース122は、1台以上の端末102と外部ネットワーク111を介し接続されている。また、ノード間通信インタフェース123は、他の計算機ノード110とノード間ネットワーク112を介し接続されている。これらのネットワーク111、112は、例えばイーサネット(登録商標)規格やInfiniband規格に準拠して構成されたネットワークである。 The hardware configuration of each computer node 110 is as follows. The computer node 110 includes a main storage device 121, one or more central processing units 120, an external communication interface 122, and an inter-node communication interface 123. Among these, the external communication interface 122 is connected to one or more terminals 102 via the external network 111. The inter-node communication interface 123 is connected to another computer node 110 via an inter-node network 112. These networks 111 and 112 are networks configured in conformity with, for example, the Ethernet (registered trademark) standard or the Infiniband standard.

 また、計算機ノード110における主記憶装置121は、データの読み込み及び書き込みが可能な揮発性記憶装置であり、中央演算装置120からアクセスされる。この主記憶装置121は、例えばDRAM(Dynamic Random Access Memory)等で構成されるものである。 The main storage device 121 in the computer node 110 is a volatile storage device that can read and write data, and is accessed from the central processing unit 120. The main storage device 121 is composed of, for example, a DRAM (Dynamic Random Access Memory).

 また、計算機ノード110における中央演算装置120は、汎用プロセッサであり、補助記憶装置124等から、基本ソフトウェア130、フォールトトレラントミドルウェア(以下、FTミドルウェア)131、及びアプリケーション132等のプログラムを読み出して主記憶装置121に展開し、該当プログラムを実行する。中央演算装置120の採用するアーキテクチャや、その処理速度は、複数ある計算機ノード110で同一のものでなくてもよい。また、各計算機ノード110は2つ以上の中央演算装置120を備えていてもよい。 The central processing unit 120 in the computer node 110 is a general-purpose processor, and reads main software 130, fault-tolerant middleware (hereinafter referred to as FT middleware) 131, an application 132, and the like from the auxiliary storage device 124 and the main memory. The program is expanded on the device 121 and the program is executed. The architecture adopted by the central processing unit 120 and the processing speed thereof may not be the same among a plurality of computer nodes 110. Each computer node 110 may include two or more central processing units 120.

 また、計算機ノード110における補助記憶装置124は、永続的な不揮発性記憶装置であり、基本ソフトウェア130、FTミドルウェア131、アプリケーション132等のプログラムに加えて、これらプログラムから読み書きされるデータ等が格納される。補助記憶装置124は、例えばハードディスク、フラッシュメモリなどにより構成される。 The auxiliary storage device 124 in the computer node 110 is a permanent non-volatile storage device, and stores data read / written from these programs in addition to programs such as the basic software 130, the FT middleware 131, and the application 132. The The auxiliary storage device 124 is configured by, for example, a hard disk, a flash memory, or the like.

 また、計算機ノード110における外部通信インタフェース122は、計算機ノード110が端末102と通信を行うためのインタフェースである。一方、ノード間通信インタフェース123は、計算機ノード110同士で通信を行うためのインタフェースである。なお、外部通信インタフェース122及びノード間通信インタフェース123は、個別の装置である必要はなく、一つの通信インタフェースが外部通信インタフェース122及びノード間通信インタフェース123を兼ねていてもよい。外部通信インタフェース122及びノード間通信インタフェース123は、それぞれ外部ネットワーク111及びノード間ネットワーク112の種類に応じた通信インタフェース等で構成される。 Further, the external communication interface 122 in the computer node 110 is an interface for the computer node 110 to communicate with the terminal 102. On the other hand, the inter-node communication interface 123 is an interface for performing communication between the computer nodes 110. Note that the external communication interface 122 and the inter-node communication interface 123 do not have to be separate devices, and one communication interface may also serve as the external communication interface 122 and the inter-node communication interface 123. The external communication interface 122 and the inter-node communication interface 123 are configured by communication interfaces corresponding to the types of the external network 111 and the inter-node network 112, respectively.

 また、主記憶装置121に保持されるアプリケーション132は、FTシステム1が提供する機能を実現するためのプログラムである。アプリケーション132は、一つ以上のタスク133から構成される。FTシステム1においては、複数の計算機ノード110で同じアプリケーション132が実行される。なおこの時、アプリケーション132は、実行される計算機ノード110のアーキテクチャや基盤ソフトウェア等の差異に応じた差異をもっていてもよい。すなわち、アプリケーション132およびそのタスク133を構成する、中央演算装置120のための命令群は、計算機ノード110間において同一である必要はない。 Further, the application 132 held in the main storage device 121 is a program for realizing the function provided by the FT system 1. The application 132 is composed of one or more tasks 133. In the FT system 1, the same application 132 is executed by a plurality of computer nodes 110. At this time, the application 132 may have a difference according to a difference in the architecture, infrastructure software, or the like of the computer node 110 to be executed. That is, the instruction group for the central processing unit 120 constituting the application 132 and its task 133 need not be the same among the computer nodes 110.

 また、主記憶装置121に保持される基本ソフトウェア130は、マルチタスク処理が可能なオペレーティングシステムであり、複数のタスク133およびFTミドルウェア131を並列に実行することができる。この基本ソフトウェア130は、FTミドルウェア131に対して、補助記憶装置124に対するアクセス、外部通信インタフェース122及びノード間通信インタフェース123に対するデータ送受信の機能、主記憶装置121の一部分の占有領域の確保および解放等の機能を提供する。 Further, the basic software 130 held in the main storage device 121 is an operating system capable of multitask processing, and can execute a plurality of tasks 133 and FT middleware 131 in parallel. The basic software 130 accesses the FT middleware 131 to the auxiliary storage device 124, functions to transmit and receive data to the external communication interface 122 and the inter-node communication interface 123, and secures and releases a partial occupied area of the main storage device 121. Provides the functionality of

 また、主記憶装置121に保持されるFTミドルウェア131は、FTシステム1においてフォールトトレラント機能を実現するためのプログラムである。FTミドルウェア131は、上述のアプリケーション132に対して、補助記憶装置124に対するアクセス、外部通信インタフェース122に対するデータ送受信、主記憶装置121の一部分に対する排他処理の機能等を提供する。これらの機能について、FTミドルウェア131は、基本ソフトウェア130等の提供する機能を用いて実現する。なお、FTミドルウェア131は、独立したプログラムであっても、アプリケーション132に組み込まれて一体となる形であっても、あるいは、基本ソフトウェア130に組み込まれて一体となる形であってもよい。また、アプリケーション132、FTミドルウェア131、基本ソフトウェア130の3つは、単一のプログラムが兼ねていてもよい。 Also, the FT middleware 131 held in the main storage device 121 is a program for realizing a fault tolerant function in the FT system 1. The FT middleware 131 provides the above-described application 132 with functions such as access to the auxiliary storage device 124, data transmission / reception with respect to the external communication interface 122, and exclusive processing with respect to a part of the main storage device 121. About these functions, the FT middleware 131 is implement | achieved using the function which the basic software 130 grade | etc., Provides. The FT middleware 131 may be an independent program, may be integrated into the application 132, or may be integrated into the basic software 130. Further, the application 132, the FT middleware 131, and the basic software 130 may serve as a single program.

 こうしたFTミドルウェア131は、タスク通信部140、ノード間通信部141、およびメッセージ処理部142を備えている。これらタスク通信部140、ノード間通信部141、メッセージ処理部142は、各計算機ノード110の中央演算装置120がFTミドルウェア131を実行することで実装される機能と言える。 The FT middleware 131 includes a task communication unit 140, an inter-node communication unit 141, and a message processing unit 142. The task communication unit 140, the inter-node communication unit 141, and the message processing unit 142 can be said to be functions that are implemented when the central processing unit 120 of each computer node 110 executes the FT middleware 131.

 このうちタスク通信部140は、タスク133からの処理の要求を受け取り、その要求をメッセージ処理部142に対して伝達する。一方、メッセージ処理部142は、他の計算機ノード110との同期をとった上で要求された処理を実行し、その結果をタスク通信部140に対して伝達する。メッセージ処理部142から上述の結果を得たタスク通信部140は、該当結果をタスク133に対して伝える。 Among these, the task communication unit 140 receives a processing request from the task 133 and transmits the request to the message processing unit 142. On the other hand, the message processing unit 142 executes the requested processing after synchronizing with other computer nodes 110 and transmits the result to the task communication unit 140. The task communication unit 140 that has obtained the above result from the message processing unit 142 informs the task 133 of the corresponding result.

 メッセージ処理部142が他の計算機ノード110との通信を行う際には、ノード間通信部141に対し、ノード間通信インタフェース123を経由したデータ送信を要求する。一方、ノード間通信部141はノード間通信インタフェース123から受信したデータをメッセージ処理部142に伝達する。こうした、タスク133とタスク通信部140、タスク通信部140とメッセージ処理部142、メッセージ処理部142とノード間通信部141の各間の伝達の手段としては、任意のものを採用できる。例えば、上述の伝達手段として、関数呼び出しとその返り値や、ソケットやパイプといった基本ソフトウェア130の提供する通信手段等を用いることができる。 When the message processing unit 142 communicates with another computer node 110, the inter-node communication unit 141 is requested to transmit data via the inter-node communication interface 123. On the other hand, the inter-node communication unit 141 transmits the data received from the inter-node communication interface 123 to the message processing unit 142. Arbitrary means can be adopted as means for communication between the task 133 and the task communication unit 140, the task communication unit 140 and the message processing unit 142, and the message processing unit 142 and the inter-node communication unit 141. For example, as the above-described transmission means, function calls and their return values, communication means provided by the basic software 130 such as sockets and pipes, and the like can be used.

 また、ノード間通信部141は、ノード間通信インタフェース123からデータを受信すると、その内容をメッセージ処理部142に伝達する。また、メッセージ処理部142から、他の計算機ノード110に対してのデータ送信の要求を受けると、その内容を指定された計算機ノード110に対して、ノード間通信インタフェース123を用いた送信を行う。 Further, when the inter-node communication unit 141 receives data from the inter-node communication interface 123, the inter-node communication unit 141 transmits the contents to the message processing unit 142. In addition, when a request for data transmission to another computer node 110 is received from the message processing unit 142, transmission is performed using the inter-node communication interface 123 to the computer node 110 whose contents are designated.

 FTミドルウェア131では、前述した処理部140、141、142の他に、承認メッセージリスト150、通知メッセージリスト151、およびメッセージ種類テーブル152を備える。これらの保持するデータ及びその役割については後述する。なお、メッセージ種類テーブル152は、処理において必ずしも必要なものではない。これについても後述する。 The FT middleware 131 includes an approval message list 150, a notification message list 151, and a message type table 152 in addition to the processing units 140, 141, 142 described above. These retained data and their roles will be described later. The message type table 152 is not necessarily required for processing. This will also be described later.

 続いて、FTシステム1を構成する各計算機ノード110の間の関係について説明する。図2は、第1の実施形態に係るFTシステム1を構成する複数の計算機ノード間の関係例を示す図であり、FTシステム1がフォールトトレラント機能を実現するために必要な各部分の関係例について示した図である。FTシステム1の実現するフォールトトレラント機能は、半数未満の計算機ノード110のハードウェアあるいはソフトウェアの障害が発生しても、タスク133の機能を継続して提供できる機能である。 Subsequently, the relationship between the computer nodes 110 constituting the FT system 1 will be described. FIG. 2 is a diagram illustrating a relationship example between a plurality of computer nodes constituting the FT system 1 according to the first embodiment, and a relationship example of each part necessary for the FT system 1 to realize a fault tolerant function. It is the figure shown about. The fault tolerant function realized by the FT system 1 is a function that can continuously provide the function of the task 133 even if a hardware or software failure of less than half of the computer nodes 110 occurs.

 そのために、先に説明したように、複数の計算機ノード110において同じタスク133を動作させる。タスク133が外部からの入力あるいは外部への出力を行う時、あるいは、複数のスレッドの動作順序を決定する時など、動作を計算機ノード110間で一致化させなければならない処理(以下、「動作一致化点」と呼ぶ)において、計算機ノード110間でメッセージをやりとりして、全ての計算機ノード110でタスク133が同一の挙動を示すようにする。 Therefore, as described above, the same task 133 is operated in a plurality of computer nodes 110. When the task 133 performs input from the outside or output to the outside, or when determining the operation order of a plurality of threads, the processing in which operations must be matched between the computer nodes 110 (hereinafter referred to as “operation matching”). In this case, messages are exchanged between the computer nodes 110 so that the task 133 shows the same behavior in all the computer nodes 110.

 このために、各タスク133は、各動作一致化点において、FTミドルウェア131を呼び出し、その処理の実行を要求する。一方、FTミドルウェア131は、その内容を他の計算機ノード110のタスク133の要求と比較し、過半数の計算機ノード110に関して同一であれば、それを実際に実行する。これにより、計算機ノード110の半数未満が、ハードウェア障害やソフトウェア障害によって無応答、応答遅延、あるいは、異常応答を行った場合でも、他の計算機ノード110の結果により、タスク133の動作を継続することができる。 For this purpose, each task 133 calls the FT middleware 131 at each operation matching point and requests execution of the process. On the other hand, the FT middleware 131 compares the content with the request of the task 133 of the other computer node 110, and if it is the same for the majority of the computer nodes 110, actually executes it. As a result, even if less than half of the computer nodes 110 perform no response, response delay, or abnormal response due to hardware failure or software failure, the operation of the task 133 is continued according to the result of the other computer node 110. be able to.

 本実施形態では、上述したように、動作一致化点において計算機ノード110間で合意形成の効率化のために、計算機ノード110を、役割別にリーダーノード110Aとフォロワーノード110B、110Cの二種に分けるものとする。動作一致化点において、リーダーノード110Aが、その動作を決定し、フォロワーノード110B、110Cに対して、その処理要求に対する動作内容を通知メッセージ201により通知する。フォロワーノード110B、110Cは、自身のタスク133からの処理要求と照合し、この動作内容に追従可能、すなわちリーダーノード110Aと同じ挙動を実現することが可能であれば承認メッセージ202、不可能であれば否認メッセージ203をリーダーノード110Aに返答する。リーダーノード110Aは、承認メッセージ202を多数決における賛成票、否認メッセージ203を反対票として扱い、過半数が承認メッセージ202を受け取った時には、これを実際に実行する。このような多数決冗長構成に伴うリーダーノード110A、フォロワーノード110B、110Cの基本的な挙動に関しては従来と同様である。 In the present embodiment, as described above, the computer node 110 is divided into two types, the leader node 110A and the follower nodes 110B and 110C, according to roles, in order to increase the efficiency of consensus formation between the computer nodes 110 at the operation matching point. Shall. At the operation matching point, the leader node 110A determines the operation, and notifies the follower nodes 110B and 110C of the operation content corresponding to the processing request by the notification message 201. The follower nodes 110B and 110C collate with the processing request from their own task 133, and can follow this operation content, that is, if it is possible to realize the same behavior as the leader node 110A, the approval message 202 is impossible. If the message is rejected, a reply message 203 is returned to the leader node 110A. The leader node 110A treats the approval message 202 as an approval vote in the majority vote and the denial message 203 as a negative vote, and when the majority receives the approval message 202, it actually executes this. The basic behavior of the leader node 110A and the follower nodes 110B and 110C accompanying such a majority vote redundant configuration is the same as that of the prior art.

 続いて、FTミドルウェア131内の構造であるタスク通信部140、メッセージ処理部142、ノード間通信部141の動作と、その際に使用されるデータ構造である承認メッセージリスト150、通知メッセージリスト151、およびメッセージ種類テーブル152について述べる。 Subsequently, operations of the task communication unit 140, the message processing unit 142, and the inter-node communication unit 141, which are structures in the FT middleware 131, and an approval message list 150, a notification message list 151, which are data structures used at that time, The message type table 152 will be described.

 図3、図4、図5は、承認メッセージリスト150、通知メッセージリスト151、およびメッセージ種類テーブル152の内部構造の一例をそれぞれ示した図である。このうち、承認メッセージリスト150及び通知メッセージリスト151は、メッセージ処理部142によって下記のように使用される。メッセージ処理部142は、タスク通信部140から処理要求を受け取ったとき、当該計算機ノード110がリーダーノードであれば、これを他の計算機ノードすなわちフォロワーノードに対して伝達する。メッセージ処理部142は、フォロワーノードからの承認メッセージや否認メッセージの受信の有無を管理するために、承認メッセージリスト150に承認メッセージ、否認メッセージの受信履歴を記録する。他方、フォロワーノードでは、リーダーノードからの通知メッセージの受信の有無を管理するために、通知メッセージ201のリストである通知メッセージリスト151を使用する。 3, 4, and 5 are diagrams illustrating examples of internal structures of the approval message list 150, the notification message list 151, and the message type table 152, respectively. Among these, the approval message list 150 and the notification message list 151 are used by the message processing unit 142 as follows. When the message processing unit 142 receives a processing request from the task communication unit 140, if the computer node 110 is a leader node, the message processing unit 142 transmits this request to another computer node, that is, a follower node. The message processing unit 142 records the reception history of the approval message and the denial message in the approval message list 150 in order to manage whether or not the approval message or the denial message is received from the follower node. On the other hand, the follower node uses a notification message list 151 that is a list of notification messages 201 in order to manage whether or not a notification message is received from the leader node.

 承認メッセージリスト150は、通知参照300、承認ノード集合301、及び、否認ノード集合302などの各値を各行として持つテーブルである。通知参照300は、その行の管理するところの通知メッセージ201への参照(参照は、識別子によるもの、メモリ番地によるもの、あるいは、参照先のコピーによるものなど任意の方法でよい。以下同様)用の情報で通知メッセージ201の識別情報となる。また、承認ノード集合301は、通知メッセージ201に対して承認メッセージ202を返答したフォロワーノードたる計算機ノード110の集合を表す。同様に、否認ノード集合302は、通知メッセージ201に対して否認メッセージ203を返答したフォロワーノードたる計算機ノード110の集合を表す。 The approval message list 150 is a table having values such as a notification reference 300, an approval node set 301, and a denial node set 302 as rows. The notification reference 300 is used for reference to the notification message 201 managed by the row (reference may be any method such as by identifier, by memory address, or by reference destination copy, and so on). This information becomes the identification information of the notification message 201. An approval node set 301 represents a set of computer nodes 110 that are follower nodes that have returned the approval message 202 in response to the notification message 201. Similarly, the denial node set 302 represents a set of computer nodes 110 that are follower nodes that have returned the denial message 203 to the notification message 201.

 また、通知メッセージリスト151は、リーダーノードから送付されていて、かつ、承認メッセージ202もしくは否認メッセージ203を返答していない0個以上の通知メッセージ201をリストとして持つ。 Further, the notification message list 151 has zero or more notification messages 201 sent from the leader node and not responding with the approval message 202 or the denial message 203 as a list.

 次に、タスク通信部140とメッセージ処理部142への処理の要求及びその結果の伝達に用いられるデータ構造について述べる。以下では、「処理要求」とはこのタスク通信部140からメッセージ処理部142に対して、タスクからの処理要求を伝達するデータ構造を指し、「処理結果」とはメッセージ処理部142からタスク通信部140に対して、処理要求に対する処理の結果(処理内容)を伝達するデータ構造を指す。処理要求の内部構造の一例としては、要求種別と、それに応じた要求パラメータがある。他方、処理結果の内部構造の一例としては、対応している処理要求への処理要求参照と、処理の結果を表す値である処理結果値がある。なお、タスク通信部140とメッセージ処理部142の間の伝達方式により、参照する処理が一意に特定できるのであれば、処理要求参照を省略することができる。 Next, the processing structure to the task communication unit 140 and the message processing unit 142 and the data structure used for transmitting the result will be described. Hereinafter, “processing request” refers to a data structure for transmitting a processing request from a task to the message processing unit 142 from the task communication unit 140, and “processing result” refers to a task communication unit from the message processing unit 142. Reference numeral 140 denotes a data structure for transmitting a processing result (processing content) in response to a processing request. As an example of the internal structure of the processing request, there are a request type and a request parameter corresponding to the request type. On the other hand, as an example of the internal structure of a processing result, there are a processing request reference to a corresponding processing request and a processing result value that is a value representing the processing result. If the process to be referred to can be uniquely specified by the transmission method between the task communication unit 140 and the message processing unit 142, the process request reference can be omitted.

 処理要求において、要求種別は、タスク133のFTミドルウェア131に対する要求の種別を示し、例えば、補助記憶装置124からの読み込みや補助記憶装置124への書き込み、外部通信インタフェース122からのデータ受信、あるいはデータ送信、主記憶装置121の一部領域に対する排他処理といったものである。一方、要求パラメータは、それぞれ要求種別により意味を異とする値であり、またその数も要求種別に応じて異なる。要求種別が「補助記憶装置124からの読み込み」であった場合を一例とすれば、補助記憶装置124上のファイル名、ファイル上の位置、読み込んだ値を格納する主記憶装置121の位置、読み込むバイト数等が、要求パラメータに含まれる。 In the processing request, the request type indicates the type of request to the FT middleware 131 of the task 133, for example, reading from the auxiliary storage device 124, writing to the auxiliary storage device 124, receiving data from the external communication interface 122, or data Transmission, exclusive processing for a partial area of the main storage device 121, and the like. On the other hand, the request parameters are values having different meanings depending on the request type, and the number thereof also differs depending on the request type. Taking the case where the request type is “read from auxiliary storage device 124” as an example, the file name on the auxiliary storage device 124, the position on the file, the position of the main storage device 121 that stores the read value, and the read The number of bytes is included in the request parameter.

 また処理結果は、タスク通信部140からメッセージ処理部142に対してそれまでに通知された処理要求の処理結果を、メッセージ処理部142からタスク通信部140に通知するために使用される。処理結果値は、処理要求参照により参照される処理要求の要求種別により意味を異とする値であり、またその数も要求種別に応じて異なる。なお、この処理結果値は、要求に対する結果を全て表現する必要はない。たとえば外部通信インタフェース122からデータを受信し指定した主記憶装置121へのコピーといった副作用を持つ処理要求であれば、メッセージ処理部142は、指定したデータ内容の主記憶装置121へのコピー処理を行った上で、処理結果をタスク通信部140に通知する。このような場合において、受信したデータのサイズのみを処理結果値として返し、受信データそのものの情報は、処理結果値には陽に含めないということができる。 Further, the processing result is used to notify the task communication unit 140 from the message processing unit 142 of the processing result of the processing request notified so far from the task communication unit 140 to the message processing unit 142. The processing result value is a value having a different meaning depending on the request type of the processing request referred to by the processing request reference, and the number thereof also differs depending on the request type. Note that this processing result value does not need to express all the results for the request. For example, if the processing request has a side effect such as receiving data from the external communication interface 122 and copying to the designated main storage device 121, the message processing unit 142 performs processing for copying the designated data content to the main storage device 121. After that, the processing result is notified to the task communication unit 140. In such a case, it can be said that only the size of the received data is returned as the processing result value, and the information of the received data itself is not explicitly included in the processing result value.

 図6にて、リーダーノードとフォロワーノードの間で、処理要求の伝達を行う際に用いられる通知メッセージ201及び、それに対する応答である承認メッセージ202と否認メッセージ203 のデータ構造の一例を示す。ここで通知メッセージ201は、伝達するリクエストへの参照値であり、処理要求の種別を規定した処理要求400をもつ。処理要求への参照値に加えて、あるいはそれに代えて、後述するフォロワーノードにおける内容比較のために十分な内容、すなわちその処理要求について挙動が同一であると判定するのに十分な情報を含む任意の付随データ406を含んでいてもよい。 FIG. 6 shows an example of the data structure of the notification message 201 used when the processing request is transmitted between the leader node and the follower node, and the approval message 202 and the denial message 203 as responses thereto. Here, the notification message 201 is a reference value to the request to be transmitted, and has a processing request 400 that defines the type of processing request. In addition to or instead of the reference value for the processing request, any content that is sufficient for content comparison in the follower node described later, that is, information that is sufficient to determine that the processing request has the same behavior The accompanying data 406 may be included.

 また、承認メッセージ202及び否認メッセージ203は、対応するリーダーノードから送付された通知メッセージ201を参照するための値である通知参照300を持つ。なお、通知メッセージ201を一意に識別できるのであれば、処理要求の参照で代用することもできる。 Also, the approval message 202 and the denial message 203 have a notification reference 300 that is a value for referring to the notification message 201 sent from the corresponding leader node. If the notification message 201 can be uniquely identified, it can be substituted by referring to the processing request.

 続いて図5に示すメッセージ種類テーブル152について述べる。メッセージ処理部142は、後述するように各処理要求に対して、メッセージ同期種類を判定する。メッセージ同期種類は、例えば「同期」、「準同期」のいずれかであり、その処理要求に対するメッセージの応答あるいは到着に対して、メッセージ処理部142がいつ次の処理に進むかの処理を規定するものである。例えば、リーダーノードにおけるメッセージ処理部142は、そのメッセージ同期種類が「同期」である処理要求に対しては、現在FTシステム1で正常に動作している全ての計算機ノード110からの応答を待ってから次の処理に進み、「準同期」であるメッセージに対しては、現在FTシステム1で正常に動作している計算機ノード110の半数の承認メッセージ202を待ってから次の処理に進む、といった動作を行う。この判断の方法の一例として、メッセージ種類テーブル152を用いた方法がある。 このメッセージ種類テーブル152は、上述した処理要求について、その要求種別303からメッセージ同期種類304を判断するものである。メッセージ種類テーブル152は、要求種別303とメッセージ同期種類304などの各値を各行として持つテーブルである。各行は、要求種別303をもつ処理要求、および、それを処理要求参照としてもつ通知メッセージ201のメッセージ待ちの挙動の種類が、メッセージ同期種類304であることを意味する。なお、メッセージ同期種類の判定は、この方法に限定されない。例えば、メッセージ種類テーブル152が存在しない場合であっても、計算機ノード110がメッセージにそのメッセージを一意に識別する連番を振り、ある倍数の番号のメッセージを「同期」とし、それ以外を「準同期」とするといった方法をとることもできる。 Next, the message type table 152 shown in FIG. 5 will be described. The message processing unit 142 determines the message synchronization type for each processing request as will be described later. The message synchronization type is, for example, “synchronous” or “semi-synchronous”, and specifies the process when the message processing unit 142 proceeds to the next process in response to the response or arrival of the message in response to the processing request. Is. For example, the message processing unit 142 in the leader node waits for responses from all the computer nodes 110 currently operating normally in the FT system 1 in response to a processing request whose message synchronization type is “synchronous”. The process proceeds to the next process, and for a message that is “semi-synchronous”, it waits for half of the approval messages 202 of the computer nodes 110 that are currently operating normally in the FT system 1 and then proceeds to the next process. Perform the action. As an example of this determination method, there is a method using the message type table 152. This message type table 152 is for determining the message synchronization type 304 from the request type 303 for the processing request described above. The message type table 152 is a table having values such as the request type 303 and the message synchronization type 304 as rows. Each line means that the type of message waiting behavior of the processing request having the request type 303 and the notification message 201 having the processing type as a processing request reference is the message synchronization type 304. Note that the determination of the message synchronization type is not limited to this method. For example, even when the message type table 152 does not exist, the computer node 110 assigns a serial number that uniquely identifies the message to the message, sets a message of a certain multiple number as “synchronized”, and sets the other as “semi-standard”. A method such as “synchronization” can also be used.

 続いて、FTシステム1を構成する各計算機ノード110A、110B、110Cの動作と連携について説明する。図7は第1の実施形態に係るFTシステム1における計算機ノード110の各処理部の動作フロー例を示す図である。この図においては、リーダーノードとして計算機ノード110A、フォロワーノードとして計算機ノード110Bの動作を示している。この他のフォロワーノード110Cについては、計算機ノード110Bと同様の動作を行うので説明を省略している。 Subsequently, the operation and cooperation of the computer nodes 110A, 110B, and 110C constituting the FT system 1 will be described. FIG. 7 is a diagram illustrating an example of an operation flow of each processing unit of the computer node 110 in the FT system 1 according to the first embodiment. In this figure, the operation of the computer node 110A as the leader node and the computer node 110B as the follower node are shown. Since the other follower node 110C performs the same operation as the computer node 110B, description thereof is omitted.

 まず、リーダーノードたる計算機ノード110Aの動作について述べる。計算機ノード110Aのタスク通信部140は、アプリケーション132から要求を受け(510)、これを処理要求400の形でメッセージ処理部142に伝達する(511)。メッセージ処理部142は、この処理要求400に応じて必要であれば適宜処理を行い、その処理要求400等を含む通知メッセージ201を、ノード間通信部141に依頼してフォロワーノードに対して送信する(512)。 First, the operation of the computer node 110A as a leader node will be described. The task communication unit 140 of the computer node 110A receives a request from the application 132 (510), and transmits this request to the message processing unit 142 in the form of a processing request 400 (511). The message processing unit 142 performs processing as needed according to the processing request 400, and sends a notification message 201 including the processing request 400 to the follower node by requesting the inter-node communication unit 141. (512).

 次に計算機ノード110Aのメッセージ処理部142は、タスク通信部140から受けた上述の処理要求400について、メッセージ同期種類を判断する(513)。メッセージ処理部142は、ステップ513で判断したメッセージ同期種類に応じた、フォロワーノードたる計算機ノード110B、110Cからの承認メッセージ202あるいは否認メッセージ203の応答待ちを行う(514)。この時、計算機ノード110Aのメッセージ処理部142は、すべてのフォロワーノードからの応答について受信待ち(メッセージ同期種類=「同期」)、半数のフォロワーノードからの承認メッセージ202について受信待ち(メッセージ同期種類=「準同期」)といった、メッセージ同期種類に応じた待機を行う。こうした応答待ちの後、メッセージ処理部142は、各フォロワーノードより得た該当タスクの処理結果に関して多数決処理を行い、タスク通信部140に対して、最終的に一致した結果を処理結果として伝達する(515)。タスク通信部140は、この結果をアプリケーション132に通知する(516)。 Next, the message processing unit 142 of the computer node 110A determines the message synchronization type for the processing request 400 received from the task communication unit 140 (513). The message processing unit 142 waits for a response of the approval message 202 or the denial message 203 from the computer nodes 110B and 110C, which are follower nodes, according to the message synchronization type determined in step 513 (514). At this time, the message processing unit 142 of the computer node 110A is waiting to receive responses from all follower nodes (message synchronization type = “synchronization”), and is waiting to receive approval messages 202 from half of the follower nodes (message synchronization type = Wait according to the message synchronization type, such as “semi-synchronous”). After waiting for the response, the message processing unit 142 performs majority processing on the processing result of the corresponding task obtained from each follower node, and transmits the finally matched result to the task communication unit 140 as the processing result ( 515). The task communication unit 140 notifies the result to the application 132 (516).

 他方、フォロワーノードたる計算機ノード110Bの動作について説明する。なお、フォロワーノードである計算機ノード110Bのタスク通信部140の動作は、リーダーノードたる計算機ノード110Aの場合と同様であるので説明を省略する。フォロワーノードたる計算機ノード110Bのメッセージ処理部142は、リーダーノードたる計算機ノード110Aからの通知メッセージ201を待つ(517)。リーダーノードから通知メッセージ201が送信されてきた場合、計算機ノード110Bのメッセージ処理部142は、リーダーノードから受信した通知メッセージ201と、自身のタスク通信部140でタスク133から受けている処理要求の内容とを比較し、リーダーノードの挙動に追従可能(すなわち同じタスクについて処理を同期的に実行可能)かどうかを判断し、追従可能であれば承認メッセージ202を、追従不可であれば否認メッセージ203をノード間通信部141経由でリーダーノードに送信する(519)。 On the other hand, the operation of the computer node 110B as a follower node will be described. Note that the operation of the task communication unit 140 of the computer node 110B that is the follower node is the same as that of the computer node 110A that is the leader node, and thus description thereof is omitted. The message processing unit 142 of the computer node 110B as the follower node waits for the notification message 201 from the computer node 110A as the leader node (517). When the notification message 201 is transmitted from the leader node, the message processing unit 142 of the computer node 110B receives the notification message 201 received from the leader node and the content of the processing request received from the task 133 by its own task communication unit 140. To determine whether it is possible to follow the behavior of the leader node (that is, processing can be executed synchronously for the same task). The data is transmitted to the leader node via the inter-node communication unit 141 (519).

 以下では、図7の点線で囲んだフローについて、その詳細を述べる。リーダーノードのメッセージ処理部142のフロー(501)について図8及び図9で、フォロワーノードのメッセージ処理部142のフロー(502)について図10でそれぞれ述べる。 Below, the details of the flow enclosed by the dotted line in FIG. 7 will be described. The flow (501) of the message processing unit 142 of the leader node will be described with reference to FIGS. 8 and 9, and the flow (502) of the message processing unit 142 of the follower node will be described with reference to FIG.

 図8、図9は第1の実施形態に係るフォールトトレラントシステムにおけるリーダーノードのメッセージ処理部での動作フロー例1、2を示す図であり、具体的には、リーダーノードのメッセージ処理部142が、タスク通信部140から処理要求を受け、この処理要求に対応する処理を行い、その結果をタスク通信部140に返す処理の一例を示したフローチャートである。 8 and 9 are diagrams showing operation flow examples 1 and 2 in the message processing unit of the leader node in the fault tolerant system according to the first embodiment. Specifically, the message processing unit 142 of the leader node has 5 is a flowchart showing an example of processing for receiving a processing request from the task communication unit 140, performing processing corresponding to the processing request, and returning the result to the task communication unit 140.

 この場合、リーダーノードのメッセージ処理部142は、まずタスク通信部140から処理要求を受け取る(600)と、外部に影響がある場合、すなわち、外部通信インタフェース122からの送信などの場合を除いて、この処理要求に対応する処理を行う(601)。このときメッセージ処理部142は、必要であれば要求付随データ(図6の付随データ406に対応)を生成する。メッセージ処理部142は、これをもとに、受信した処理要求を参照する通知メッセージ201を生成し、要求付随データがあればそれを加え、全フォロワーノード(計算機ノード110B及び計算機ノード110C)に対してノード間通信部141を介して送信する(602)。 In this case, the message processing unit 142 of the leader node first receives the processing request from the task communication unit 140 (600), and when there is an external influence, that is, except for the case of transmission from the external communication interface 122, Processing corresponding to this processing request is performed (601). At this time, the message processing unit 142 generates request accompanying data (corresponding to the accompanying data 406 in FIG. 6) if necessary. Based on this, the message processing unit 142 generates a notification message 201 that refers to the received processing request, adds any data associated with the request, and adds it to all follower nodes (computer node 110B and computer node 110C). And transmitted via the inter-node communication unit 141 (602).

 次にメッセージ処理部142は、FTミドルウェア131における承認メッセージリスト150に、該当通知メッセージ201に対応する、すなわち通知参照300として当該通知メッセージ201を含む行を追加する(603)。次にメッセージ処理部142は、ステップ600で受けた処理要求について、メッセージ同期種類を判定する(604)。 Next, the message processing unit 142 adds a line corresponding to the corresponding notification message 201 to the approval message list 150 in the FT middleware 131, that is, including the notification message 201 as the notification reference 300 (603). Next, the message processing unit 142 determines the message synchronization type for the processing request received in step 600 (604).

 ステップ604において、メッセージ処理部142は、例えばFTミドルウェア131におけるメッセージ種類テーブル152に対し、上述の処理要求の情報を照合することにより、メッセージ種類テーブル152の各レコードのうち、要求種別400の値が、ステップ600で受けた処理要求(例:sync,Network_Send,Network_Recv,Lock_Acquire)の値にマッチした行を特定し、該当処理要求のメッセージ同期種類304を判断する。但し、既に上述したように、メッセージ同期種類を判断する手法は、メッセージ種類テーブル152を用いたこの方法に限定されず、メッセージ同期種類を一つに決定できる方法であればいずれのものも採用できる。 In step 604, the message processing unit 142 collates the above-described processing request information with the message type table 152 in the FT middleware 131, for example, so that the value of the request type 400 in each record of the message type table 152 is set. The line that matches the value of the processing request received in step 600 (eg, sync, Network_Send, Network_Recv, Lock_Acquire) is identified, and the message synchronization type 304 of the corresponding processing request is determined. However, as described above, the method for determining the message synchronization type is not limited to this method using the message type table 152, and any method can be adopted as long as it can determine the message synchronization type as one. .

 上述のステップ604の結果、ステップ600で受けた処理要求のメッセージ同期種類が「同期」であったとする(604:同期)。このとき、リーダーノードのメッセージ処理部142は、その時点で動作している全てのフォロワーノードからの応答を、該当処理要求に関する動作の継続に必要とする。すなわちリーダーノードのメッセージ処理部142は、全フォロワーノード(計算機ノード110B及び計算機ノード110C)からの承認メッセージ202あるいは否認メッセージ203を待つ(605)。 As a result of step 604 described above, it is assumed that the message synchronization type of the processing request received in step 600 is “synchronization” (604: synchronization). At this time, the message processing unit 142 of the leader node requires responses from all the follower nodes operating at that time in order to continue the operation related to the processing request. That is, the message processing unit 142 of the leader node waits for an approval message 202 or a denial message 203 from all follower nodes (computer node 110B and computer node 110C) (605).

 その後、メッセージ処理部142は、承認メッセージ202の受信をノード間通信部141から通知された(以下、簡単に「受信した」とする)場合(606:Ack)、承認メッセージリスト150の当該行(該当処理要求に関する通知メッセージ201に関してステップ603で追加した行)において、承認ノード集合301に、送信元の計算機ノード110(すなわち該当通知メッセージ201に対して承認メッセージ202を返答したフォロワーノード)の識別情報を追加する(607)。 Thereafter, when the message processing unit 142 is notified of the reception of the approval message 202 from the inter-node communication unit 141 (hereinafter simply referred to as “received”) (606: Ack), the message processing unit 142 in the approval message list 150 ( (The row added in step 603 regarding the notification message 201 related to the corresponding processing request), the identification information of the computer node 110 of the transmission source (that is, the follower node that returned the approval message 202 to the notification message 201) in the approval node set 301. Is added (607).

 他方、メッセージ処理部142は、否認メッセージ203をフォロワーノードから受信した場合(606:Nack)、同様に、承認メッセージリスト150の当該行において、否認ノード集合302に、送信元の計算機ノード110(すなわち該当通知メッセージ201に対して否認メッセージ202を返答したフォロワーノード)の識別情報を追加する(608)。 On the other hand, when the message processing unit 142 receives the denial message 203 from the follower node (606: Nack), similarly, in the corresponding row of the approval message list 150, the message processing unit 142 adds the denial node set 302 to the transmission source computer node 110 (that is, The identification information of the follower node that responded with the denial message 202 to the notification message 201 is added (608).

 リーダーノードのメッセージ処理部142は、全てのフォロワーノードが承認メッセージリスト150における承認ノード集合301あるいは否認ノード集合302のいずれかに入るまで、これを繰り返す(609)。これにより、リーダーノードのメッセージ処理部142は、該当メッセージ201に対応する処理要求を実行する時点まで、すべてのフォロワーノードのタスク133が到達したことを確認できる。 The message processing unit 142 of the leader node repeats this until all follower nodes enter either the approval node set 301 or the denial node set 302 in the approval message list 150 (609). As a result, the message processing unit 142 of the leader node can confirm that the tasks 133 of all the follower nodes have arrived until the time when the processing request corresponding to the message 201 is executed.

 ステップ609において、全てのフォロワーノードが承認メッセージリスト150における承認ノード集合301あるいは否認ノード集合302のいずれかに入ったと判定したならば(609:Yes)、リーダーノードのメッセージ処理部142は、同タスクに関する各計算機ノードでの実行結果について、多数決の原理に従って、処理を継続するかどうか判定する(610)。承認メッセージ202を送信した計算機ノードが否認メッセージより多数、ないしは否認メッセージと同数である場合(610:Yes)、リーダーノードのメッセージ処理部142は、否認メッセージ203を送信したノードを停止させる。これは、リーダーノード自身は当然この通知メッセージ201の内容に同意しているためで、承認メッセージ202を送信したフォロワーノードと否認メッセージ203を送信したフォロワーノードが同数であれば、過半数のノードが同意していることになる。承認メッセージ202を送信したノードが多数である場合に、まだ処理要求に対応する処理を実行していない場合には、この時点においては過半数の計算機ノードの同意が得られているため、リーダーノードはこれを実行する(611)。最後に、リーダーノードのメッセージ処理部142は、ステップ611の処理結果を生成して(612)、それをタスク通信部140に伝達し(613)、タスク通信部140からの次の処理要求を待つ処理(600)に戻る。 If it is determined in step 609 that all follower nodes have entered either the approval node set 301 or the denial node set 302 in the approval message list 150 (609: Yes), the message processing unit 142 of the leader node performs the same task. It is determined whether or not the processing is to be continued according to the principle of majority vote for the execution result in each computer node (610). When the number of computer nodes that transmitted the approval message 202 is greater than or equal to the number of rejection messages (610: Yes), the message processing unit 142 of the leader node stops the node that transmitted the rejection message 203. This is because the leader node itself naturally agrees with the contents of the notification message 201. If the number of follower nodes that transmitted the approval message 202 and the number of follower nodes that transmitted the denial message 203 are the same, the majority of the nodes agree. Will be. If there are a large number of nodes that have sent the approval message 202 and the processing corresponding to the processing request has not yet been executed, the agreement of the majority of the computer nodes has been obtained at this point, so the leader node This is executed (611). Finally, the message processing unit 142 of the leader node generates the processing result of step 611 (612), transmits it to the task communication unit 140 (613), and waits for the next processing request from the task communication unit 140. Return to processing (600).

 他方、否認メッセージ203を送信したノードのほうが多数である場合には(610:No)、リーダーノード自身に障害が発生したとみなせるので、リーダーノードのメッセージ処理部142は当該リーダーノード自身の動作を停止するなどの障害時の処理を実行する(614)。 On the other hand, when the number of nodes that transmitted the denial message 203 is larger (610: No), it can be considered that a failure has occurred in the leader node itself, so the message processing unit 142 of the leader node performs the operation of the leader node itself. Processing at the time of failure such as stopping is executed (614).

 一方、上述のステップ604の結果、メッセージ同期種類が「準同期」であったとする(604:準同期)。このとき、リーダーノードは、その時点で動作しているフォロワーノードのうち過半数からの応答を、該当タスクに関する動作の継続に必要とする。動作は、承認メッセージ202や否認メッセージ203の受信に関しては、前段の場合と同様であるが(616,617,618,620)、条件が異なり、フォロワーノードの半数以上が承認ノード集合301に加わるか、全てのフォロワーノードが承認ノード集合301あるいは否認ノード集合302のいずれかに入るまでこれを繰り返す(619)。リーダノード自身は、この決断に当然同意しているので、フォロワーノードの半数以上が承認メッセージ202を送信したということは、過半数のノードが応答したということになる。その後の多数決結果の判定処理(610)以降の処理は、前段の場合と同様である。なお、リーダーノードが処理を進めた後も、残りのフォロワーノードは、承認メッセージ202あるいは否認メッセージ203を送信する可能性がある。このために、リーダーノードのメッセージ処理部142は、これらを受け取った場合に、該当する承認ノード集合301あるいは否認ノード集合302に加わる操作を随時行う。否認メッセージ203を受信した場合には、これが少数派であることがわかるので、ただちにそのノードを停止させる。 On the other hand, it is assumed that the message synchronization type is “quasi-synchronization” as a result of the above-described step 604 (604: quasi-synchronization). At this time, the leader node requires a response from a majority of the follower nodes operating at that time in order to continue the operation related to the task. The operation is the same as that in the previous stage with respect to the reception of the approval message 202 and the denial message 203 (616, 617, 618, 620), but the conditions are different and whether more than half of the follower nodes are added to the approval node set 301. This is repeated until all follower nodes enter either the approval node set 301 or the denial node set 302 (619). The leader nodes themselves naturally agree with this decision, so that more than half of the follower nodes have sent the approval message 202 means that a majority of the nodes have responded. Subsequent majority decision result determination processing (610) and subsequent processing is the same as in the previous stage. Even after the leader node advances the processing, the remaining follower nodes may transmit the approval message 202 or the denial message 203. For this reason, when receiving these, the message processing unit 142 of the leader node performs an operation of adding to the corresponding approval node set 301 or denial node set 302 as needed. When the denial message 203 is received, it is known that this is a minority, and the node is immediately stopped.

 上述した内容は、全てのフォロワーノードが正しく応答をした場合であるが、背景で述べたように、計算機ノード110や、ノード間ネットワーク112、ノード間通信インタフェース123での障害発生等により、フォロワーノードが無応答となる場合、あるいは、応答が著しく遅延する場合がある。この状況下では、上述のフローのうち、メッセージ同期種類が「同期」(604:同期)であった際の、リーダーノードのメッセージ処理部142が行う受信待ち(605)の処理に影響が及ぶ。 The above-mentioned content is a case where all the follower nodes respond correctly, but as described in the background, the follower node is caused by a failure in the computer node 110, the inter-node network 112, the inter-node communication interface 123, or the like. May become unresponsive or the response may be significantly delayed. Under this situation, the process of waiting for reception (605) performed by the message processing unit 142 of the leader node when the message synchronization type is “synchronization” (604: synchronization) in the above-described flow is affected.

 ステップ602での通知メッセージ201の送信時刻から所定時間(以下、タイムアウト時間とする)が経過した場合(606:タイムアウト)、リーダーノードのメッセージ処理部142は、その時刻までに承認メッセージ202もしくは否認メッセージ203を送らなかったフォロワーノードで障害が発生したとみなす。そして、リーダーノードのメッセージ処理部142は、多数決結果の判定判定(610)の処理にうつる。このとき、障害が発生したとみなされたフォロワーノードは、否認ノード集合302にあるとみなしてもよいし(621)、計算機ノードの全数から取り除いて判断してもよい。 When a predetermined time (hereinafter referred to as timeout time) has elapsed from the transmission time of the notification message 201 in step 602 (606: timeout), the message processing unit 142 of the leader node has received the approval message 202 or the denial message by that time. It is assumed that a failure has occurred in the follower node that did not send 203. Then, the message processing unit 142 of the leader node proceeds to the determination processing (610) of the majority result. At this time, the follower node that is considered to have failed may be considered to be in the denial node set 302 (621), or may be determined by removing it from the total number of computer nodes.

 上述のタイムアウト時間としては、最も単純な例では定数値が想定できる。或いは、例えば計算機ノード110が、任意のアルゴリズムによってタイムアウト時間を決定するとしてもよい。例えば、上述の通知メッセージ201に応じて所定処理を実行する時間を所定アルゴリズムで予測して、フォロワーノードからの承認/否認のいずれかのメッセージの到達期待時間を特定し、これをタイムアウト時間と算出するなどしてもよい。定数を用いた場合であっても、全メッセージのうちメッセージ同期種類が「同期」のものについてのみ、タイムアウト時間を用いた障害判定が行われるため、過半数の計算機ノードが大きな遅延なく動作している場合には、半数未満の、すなわち少数派の計算機ノードで一時的な遅れがあっても、それを吸収することができる。 ∙ As the above timeout period, a constant value can be assumed in the simplest example. Alternatively, for example, the computer node 110 may determine the timeout time by an arbitrary algorithm. For example, a predetermined algorithm is used to predict a time for executing a predetermined process in accordance with the notification message 201 described above, and an expected arrival time of either an approval / denial message from the follower node is specified, and this is calculated as a timeout time. You may do it. Even if a constant is used, failure determination using the timeout time is performed only for all messages whose message synchronization type is "synchronous", so the majority of computer nodes are operating without significant delay. In some cases, even if there is a temporary delay in less than half of the computer nodes, that is, a minority computer node, it can be absorbed.

 また、メッセージ同期種類304が「同期」となる処理要求の要求パラメータに、実行制約時間を加え、これを基準にタイムアウト時間を定めることもできる。これは、タイムアウト時間を明示的にアプリケーションから制御することで、例えば周期的に実行されるタスクの周期時間というようなアプリケーションの特性に応じた障害判定を行うことが可能になるメリットがある。 Also, an execution constraint time can be added to a request parameter of a processing request in which the message synchronization type 304 is “synchronous”, and a timeout time can be determined based on this. This is advantageous in that it is possible to perform failure determination according to application characteristics such as a periodic time of a task that is periodically executed by explicitly controlling the timeout time from the application.

 メッセージ同期種類304が「同期」となる処理要求は、例えば、既知の時間制約のもと一定の時間間隔をおいて同じプログラムを繰り返し実行するタスク(「周期タスク」)での、繰り返し実行する領域の開始点を示す要求であるとか、未知のタイミングで到達する外部からのメッセージ受信、つまり、外部通信インタフェース122からのデータ受信の要求などである。メッセージ同期種類304が「同期」である二つの連続したメッセージについて、それらに対応する処理要求を発行するタスク133の実行箇所への早い方の到達時刻から、遅い方の到達時刻の時間についての時間制約(以下「時間制約」という)が明確である場合に、その時間制約を、いずれかの処理要求の要求パラメータに加える。 The processing request in which the message synchronization type 304 is “synchronous” is, for example, an area that is repeatedly executed in a task (“periodic task”) that repeatedly executes the same program at a constant time interval under a known time constraint. For example, a request for receiving a message from the outside that arrives at an unknown timing, that is, a request for receiving data from the external communication interface 122. For two consecutive messages whose message synchronization type 304 is "synchronous", the time from the earlier arrival time to the execution location of the task 133 that issues a processing request corresponding to them, the time from the earlier arrival time When the constraint (hereinafter referred to as “time constraint”) is clear, the time constraint is added to the request parameter of any processing request.

 この時間制約を用いたステップ605におけるタイムアウト処理は、下記の通りになる。リーダーノードのメッセージ処理部142は、直前のメッセージ同期種類304が「同期」である処理要求の受信時刻から、その要求パラメータに含まれる時間制約、もしくは、現に処理している処理要求の要求パラメータに含まれる時間制約(どちらを用いるかは実装に依存する)だけ経った場合に、タイムアウトとみなす。これ以外のメッセージ同期種類304についても、あるいは、フォロワーノードについても、例えば、直前のメッセージ同期種類304が「同期」である処理要求の受信時刻から、その要求パラメータに含まれる時間制約を、あるいは、その時間制約を適当な割合で按分した時間をタイムアウト時間として用いることができる。 Timeout processing in step 605 using this time constraint is as follows. The message processing unit 142 of the leader node changes the time constraint included in the request parameter or the request parameter of the processing request currently being processed from the reception time of the processing request whose previous message synchronization type 304 is “synchronous”. A timeout is considered when only the included time constraints (which depends on the implementation) are passed. For other message synchronization types 304 or follower nodes, for example, from the reception time of the processing request whose previous message synchronization type 304 is “synchronous”, the time constraint included in the request parameter, or A time obtained by apportioning the time constraint at an appropriate ratio can be used as the timeout time.

 なお、後述するが、メッセージ種類が「非同期」の時の受信待ちにおいては、別の方法で決められたタイムアウト時間を用いて、障害判定を行ってもよいし、タイムアウトを用いないということもできる。後者では、過半数のノードが応答をしない場合にはリーダーノードはフォロワーノードからの承認/否認メッセージを無限に待つことになるが、そもそも過半数のノードが無応答である場合には、FTシステム1として動作を継続できない状態であって、停止させられるべき状態であり、可用性の点ではこれを大きく損なうものではない。 As will be described later, when waiting for reception when the message type is “asynchronous”, a failure determination may be performed using a timeout time determined by another method, or a timeout may not be used. . In the latter case, when a majority of the nodes do not respond, the leader node waits indefinitely for an approval / denial message from the follower node. However, if a majority of the nodes are not responding, the FT system 1 is used. This is a state in which the operation cannot be continued and should be stopped, and this is not greatly impaired in terms of availability.

 続いて、フォロワーノードのメッセージ処理部142における処理について説明する。図10は第1の実施形態に係るFTシステム1におけるフォロワーノードのメッセージ処理部142での動作フロー例を示す図であり、具体的には、フォロワーノードのメッセージ処理部142が、タスク通信部140から処理要求を受けて対応する処理を行い、その結果をタスク通信部140に返す処理の一例を示したフローチャートである。 Subsequently, processing in the message processing unit 142 of the follower node will be described. FIG. 10 is a diagram illustrating an example of an operation flow in the message processing unit 142 of the follower node in the FT system 1 according to the first embodiment. Specifically, the message processing unit 142 of the follower node includes the task communication unit 140. 7 is a flowchart illustrating an example of processing for receiving a processing request from the server and performing corresponding processing and returning the result to the task communication unit 140.

 この場合、フォロワーノードのメッセージ処理部142は、まず当該フォロワーノードにおけるタスク通信部140からの処理要求、またはノード間通信部141からのメッセージを待つ(700)。ここで、フォロワーノードのメッセージ処理部142は、タスク通信部140から処理要求を受け取った場合(701:Yes)、通知メッセージリスト151を確認し、タスク通信部140から受けた処理要求に対応する通知メッセージ201がリーダーから到着しているかどうか確認する(702)。 In this case, the message processing unit 142 of the follower node first waits for a processing request from the task communication unit 140 or a message from the inter-node communication unit 141 in the follower node (700). Here, when the message processing unit 142 of the follower node receives a processing request from the task communication unit 140 (701: Yes), the notification message list 151 is confirmed and a notification corresponding to the processing request received from the task communication unit 140 is received. It is confirmed whether the message 201 has arrived from the reader (702).

 リーダーノードから対応する通知メッセージ201が到着している場合(702:Yes)、フォロワーノードのメッセージ処理部142は、通知メッセージ201の参照する処理要求を、タスク通信部140から受け取った処理要求と比較する(703)。ここでの比較処理は、タスク通信部140から得た処理要求と通知メッセージ201が含んでいた処理要求の互いの識別情報や付随データ等の一致判定など、二つの処理要求に応じた処理の結果が同一であると判定するのに十分な処理であればよい。 When the corresponding notification message 201 has arrived from the leader node (702: Yes), the message processing unit 142 of the follower node compares the processing request referred to by the notification message 201 with the processing request received from the task communication unit 140. (703). The comparison processing here is the result of processing in accordance with two processing requests, such as the determination of whether the processing request obtained from the task communication unit 140 and the processing request included in the notification message 201 match each other's identification information or accompanying data. As long as they are sufficient to determine that they are the same.

 このステップ703の結果、通知メッセージ201の参照する処理要求と、タスク通信部140から受け取った処理要求が同一であると判定したならば(703:YES)、フォロワーノードのメッセージ処理部142は、リーダーノードに対する承認メッセージ202の送信をノード間通信部141を通じて行う(704)。 If it is determined in step 703 that the processing request referred to by the notification message 201 is the same as the processing request received from the task communication unit 140 (703: YES), the message processing unit 142 of the follower node The approval message 202 is transmitted to the node through the inter-node communication unit 141 (704).

 また、フォロワーノードのメッセージ処理部142は、該当通知メッセージ201をもとに、必要であれば対応する所定処理を実行して(705)、処理結果を生成し(706)、これをタスク通信部140に対して伝達する(707)。 Further, the message processing unit 142 of the follower node executes a corresponding predetermined process if necessary based on the corresponding notification message 201 (705), generates a processing result (706), and outputs it as a task communication unit. It transmits to 140 (707).

 他方、上述のステップ703の結果、通知メッセージ201の参照する処理要求と、タスク通信部140から受け取った処理要求が同一でないと判定したならば(703:No)、フォロワーノードのメッセージ処理部142は、リーダーノードに対する否認メッセージ203の送信をノード間通信部141を通じて行う(708)。なお、このときフォロワーの立場からはリーダーノードの挙動が異常とみなせるので、フォロワーノードのメッセージ処理部142は、リーダーノードの再選出等の提案を他の計算機ノード各々に行うなど、障害発生時の処理を行う(711)。 On the other hand, if it is determined in step 703 that the processing request referred to by the notification message 201 is not the same as the processing request received from the task communication unit 140 (703: No), the message processing unit 142 of the follower node Then, the denial message 203 is transmitted to the leader node through the inter-node communication unit 141 (708). At this time, since the behavior of the leader node can be regarded as abnormal from the standpoint of the follower, the message processing unit 142 of the follower node makes a proposal such as re-election of the leader node to each of the other computer nodes. Processing is performed (711).

 上述のステップ701にてタスク通信部140からの処理要求を受け取った時点で、リーダーノードからの処理要求が到着していない場合(702:No)、フォロワーノードのメッセージ処理部142は、リーダーノードからの処理要求の到着を待つこととなる(709)。リーダーノードからの処理要求を受信した場合(710:No)、フォロワーノードのメッセージ処理部142は、リーダーノードからの処理要求が既に通知メッセージリスト151に存在していた場合(702:Yes)と同じ処理に進む。 When the processing request from the leader node has not arrived when the processing request from the task communication unit 140 is received in step 701 described above (702: No), the message processing unit 142 of the follower node receives the request from the leader node. (709). When the processing request from the leader node is received (710: No), the message processing unit 142 of the follower node is the same as when the processing request from the leader node already exists in the notification message list 151 (702: Yes). Proceed to processing.

 他方、リーダーノードの場合と同様に、フォロワーノードにおいて、リーダーノードもしくはノード間通信インタフェース123の障害等により、リーダーノードからの通知メッセージ201が著しく遅延するか、あるいは、これを受信できないことがある。このため、フォロワーノードのメッセージ処理部142は、ステップ709の待ちにおいて、処理要求の受信時刻からある時間(以下、タイムアウト時間という)が経過しても、リーダーノードから相当する通知メッセージ201を受信しなかった時は、リーダーノードあるいはそのノード間通信インタフェース123において障害が発生したとみなす(710:Yes)。そして、ステップ703で、通知メッセージ201の参照する処理要求と、タスク通信部140から受け取った処理要求とが同一でないと判定された場合と同様に、リーダーノードの再選出提案などの障害発生時の処理を行う(711)。 On the other hand, as in the case of the leader node, in the follower node, the notification message 201 from the leader node may be significantly delayed or cannot be received due to a failure of the leader node or the inter-node communication interface 123 or the like. Therefore, the message processing unit 142 of the follower node receives the corresponding notification message 201 from the leader node even when a certain time (hereinafter referred to as a timeout time) has elapsed from the reception time of the processing request while waiting for step 709. If not, it is considered that a failure has occurred in the leader node or the inter-node communication interface 123 (710: Yes). Then, in the case where it is determined in step 703 that the processing request referred to by the notification message 201 and the processing request received from the task communication unit 140 are not the same, at the time of failure such as a leader node re-election proposal Processing is performed (711).

 なお、ステップ701において、フォロワーノードのメッセージ処理部142が、ノード間通信部141から通知メッセージ201を受け取った場合(701:No)、当該通知メッセージ201に対応する処理要求は、タスク通信部140から未だ受信していないため、後の処理のために、該当通知メッセージ201の情報を通知メッセージリスト151に格納する(712)。その後、フォロワーノードのメッセージ処理部142は、処理要求もしくは通知メッセージ201の受信待ち(700)の処理に戻る。 In step 701, when the message processing unit 142 of the follower node receives the notification message 201 from the inter-node communication unit 141 (701: No), the processing request corresponding to the notification message 201 is sent from the task communication unit 140. Since it has not been received yet, the information of the corresponding notification message 201 is stored in the notification message list 151 for later processing (712). Thereafter, the message processing unit 142 of the follower node returns to the process of waiting for reception of the processing request or notification message 201 (700).

 以上の第1の実施形態を採用することで、多数決冗長構成を成すリーダーノードとフォロワーノードの動作を同一に保ちつつ、リーダーノードでのタイムアウト判定(フォロワーノードからの承認/否認メッセージの受信待ち期限の判定)を同期メッセージに関するもののみに減らすことにより、少数のフォロワーノードの一時的な処理や伝送の遅延を許容することができる。 By adopting the above first embodiment, the operation of the leader node and the follower node in the majority redundant configuration is kept the same, and the timeout determination at the leader node (reception waiting time of the approval / denial message from the follower node) Is reduced to only those related to synchronization messages, it is possible to allow a temporary processing and transmission delay of a small number of follower nodes.

 また、メッセージ種類テーブル152を使用したメッセージ同期種類の判定を行うことによって、例えば、同期を行うための同期要求を発行するAPIをタスク通信部140が提供することにより、アプリケーション132が同期するタイミングを制御できるようになり、アプリケーション132の特性に一致したタイムアウト判定の制御を実現することができる。 Further, by determining the message synchronization type using the message type table 152, for example, the task communication unit 140 provides an API for issuing a synchronization request for performing synchronization, whereby the timing at which the application 132 is synchronized is determined. Control of timeout determination that matches the characteristics of the application 132 can be realized.

 なお、以上の説明では、タスク133が単一の場合を述べたが、複数ある場合も同様である。ただし、タスク通信部140およびメッセージ処理部142の処理は、タスクごとにスレッド化を行うなどの並列化を行うか、あるいは、入出力待ちの多重化処理を加えることによって、一つのタスク133に対する処理要求の処理によって、他のタスク133が滞らないようにする必要がある。 In the above description, the case where there is a single task 133 has been described. However, the processing of the task communication unit 140 and the message processing unit 142 is performed for one task 133 by performing parallel processing such as threading for each task or by adding multiplexing processing waiting for input / output. It is necessary to prevent other tasks 133 from being delayed by processing the request.

 また、タスク通信部140からの処理要求の受け取りとノード間通信部141からの通知メッセージ201の受け取りを単一の流れとして記述したが、効率化のために複数のスレッド等を用いることによって、それぞれを並列的に実行しても構わない。
---第2の実施形態---
 計算機ノードのメッセージ処理部142が要求リストを用いて処理を行う形態も想定できる。この要求リストとは、メッセージ処理部142が、他の計算機ノードから通知メッセージ201を受け取った際に、それに対して承認メッセージ202を返すか否認メッセージ203を返すかを選択するためのテーブルである。そのため要求リストは、返信するメッセージを承認/否認メッセージのいずれにするか判断するための十分なデータ、例えば、処理要求や要求付随データの各値を持つ。
In addition, the reception of the processing request from the task communication unit 140 and the reception of the notification message 201 from the inter-node communication unit 141 are described as a single flow, but by using a plurality of threads or the like for efficiency, May be executed in parallel.
--- Second Embodiment ---
It can be assumed that the message processing unit 142 of the computer node performs processing using the request list. This request list is a table for selecting, when the message processing unit 142 receives the notification message 201 from another computer node, whether to return the approval message 202 or the denial message 203 in response thereto. Therefore, the request list has sufficient data for determining whether the message to be returned is an approval / denial message, for example, each value of a processing request or request-accompanying data.

 こうした形態を第2の実施例として説明する。なお、第2の実施形態におけるFTシステム1は図1と同様の構成である。第2の実施形態におけるフォロワーノードの挙動は、図11に示すフローに沿ったものとなる。これ以外の動作と構成は、第1の実施形態と同様であるため、重複箇所の説明については省略する。 Such a form will be described as a second embodiment. The FT system 1 in the second embodiment has the same configuration as that in FIG. The behavior of the follower node in the second embodiment follows the flow shown in FIG. Since other operations and configurations are the same as those of the first embodiment, description of overlapping portions is omitted.

 図11、図12は第2の実施形態におけるフォールトトレラントシステムのフォロワーノードのメッセージ処理部での動作フロー例1、例2を示す図である。ここでの第1の実施形態との大きな違いは、タスク通信部140で得た処理要求に対する処理を、リーダーノードからの通知メッセージ201の到着を待たずに進行させる点となる。 FIG. 11 and FIG. 12 are diagrams showing an operation flow example 1 and an example 2 in the message processing unit of the follower node of the fault tolerant system in the second embodiment. The major difference from the first embodiment is that the processing for the processing request obtained by the task communication unit 140 proceeds without waiting for the notification message 201 from the leader node to arrive.

 第2の実施形態におけるフォロワーノードのメッセージ処理部142は、タスク通信部140からの処理要求、またはノード間通信部141からの通知メッセージの伝達を待つ(800)。例えば、タスク通信部140から処理要求を受け取った場合(801:Yes)、フォロワーノードのメッセージ処理部142は、通知メッセージリスト151を確認し、対応する通知メッセージ201がリーダーノードから到着しているかどうか確認する(802)。 The message processing unit 142 of the follower node in the second embodiment waits for a processing request from the task communication unit 140 or a notification message from the inter-node communication unit 141 (800). For example, when a processing request is received from the task communication unit 140 (801: Yes), the message processing unit 142 of the follower node checks the notification message list 151 and determines whether the corresponding notification message 201 has arrived from the leader node. Confirm (802).

 ステップ802の結果、タスク通信部140から得た処理要求に対応する通知メッセージ201が既に到着していると判定した場合(802:Yes)、フォロワーノードのメッセージ処理部142は、第1の実施形態と同様の処理を実行する(803~810)。該当処理は第1の実施形態と同様であるので説明は省略する。 As a result of step 802, when it is determined that the notification message 201 corresponding to the processing request obtained from the task communication unit 140 has already arrived (802: Yes), the message processing unit 142 of the follower node is the first embodiment. The same processing is executed (803 to 810). Since the corresponding process is the same as that of the first embodiment, the description thereof is omitted.

 他方、ステップ802の結果、タスク通信部140から得た処理要求に対応する通知メッセージ201が到着していないと判定した場合(802:No)、フォロワーノードのメッセージ処理部142は、該当処理要求に対応する処理を行う(806)。ただし、外部通信インタフェース122からの送信などの、外部に対して影響を与える処理の場合には、実際にはその処理を行わずに、その想定される結果のみ計算する。また、フォロワーノードのメッセージ処理部142は、上述した要求リストに対して、処理要求に関する情報と、その処理の結果などを含む要求付随データを付け加える(807)。こうしてレコードが生成された要求リストは、図6に示した通知メッセージ201の処理要求400および付随データ406と同様形式のデータを含むレコードの集合体となる。 On the other hand, when it is determined in step 802 that the notification message 201 corresponding to the processing request obtained from the task communication unit 140 has not arrived (802: No), the message processing unit 142 of the follower node responds to the processing request. Corresponding processing is performed (806). However, in the case of processing that affects the outside, such as transmission from the external communication interface 122, only the expected result is calculated without actually performing the processing. Further, the message processing unit 142 of the follower node adds request-accompanying data including information related to the processing request and the result of the processing to the above-described request list (807). The request list in which the records are generated in this way is a collection of records including data in the same format as the processing request 400 and the accompanying data 406 of the notification message 201 shown in FIG.

 フォロワーノードのメッセージ処理部142は、上述のステップ806の結果から処理結果を生成し(809)、当該処理結果をタスク通信部140に伝達する(810)。その後、フォロワーノードのメッセージ処理部142は、タスク通信部140とノード間通信部141からの伝達待ちに戻る(800)。 The message processing unit 142 of the follower node generates a processing result from the result of step 806 described above (809), and transmits the processing result to the task communication unit 140 (810). Thereafter, the message processing unit 142 of the follower node returns to waiting for transmission from the task communication unit 140 and the inter-node communication unit 141 (800).

 一方、ステップ800において、ノード間通信部141から通知メッセージ201を受け取った場合(801:No)、フォロワーノードのメッセージ処理部142は、要求リストを確認し、通知メッセージ201のもつ処理要求参照に対応する処理要求があるかどうか確認する(811)。この確認処理の結果、ノード間通信部141から受け取った通知メッセージ201に対応する処理要求が、要求リスト中に存在すると判定した場合(811:Yes)、フォロワーノードのメッセージ処理部142は、タスク通信部140から得た通知メッセージ201と、要求リストに含まれていた該当処理要求とを比較する (812)。この比較の処理は、図10のステップ703に関して上述したものと同様である。 On the other hand, when the notification message 201 is received from the inter-node communication unit 141 in Step 800 (801: No), the message processing unit 142 of the follower node confirms the request list and corresponds to the processing request reference included in the notification message 201. It is confirmed whether there is a processing request to be performed (811). As a result of this confirmation processing, when it is determined that a processing request corresponding to the notification message 201 received from the inter-node communication unit 141 exists in the request list (811: Yes), the message processing unit 142 of the follower node performs task communication. The notification message 201 obtained from the unit 140 is compared with the corresponding processing request included in the request list (812). This comparison process is similar to that described above with respect to step 703 of FIG.

 こうした比較処理の結果、通知メッセージ201の参照する処理要求と、タスク通信部140から受け取った処理要求とが同一であるとみなされた場合(812:Yes)、フォロワーノードのメッセージ処理部142は、承認メッセージ202をリーダーノードに送信する(813)。他方、通知メッセージ201の参照する処理要求と、タスク通信部140から受け取った処理要求とが同一でなかったとみなされた場合(812:NO)、フォロワーノードのメッセージ処理部142は、否認メッセージ203をリーダーノードに送信し(814)、リーダーノードに関する障害時処理を行う(815)。なお、上述のステップ811において、対応する処理要求が要求リスト中に無かった場合、フォロワーノードのメッセージ処理部142は、該当通知メッセージ201の情報を通知メッセージリスト151に追加する(816)。 As a result of such comparison processing, when the processing request referred to by the notification message 201 and the processing request received from the task communication unit 140 are considered to be the same (812: Yes), the message processing unit 142 of the follower node An approval message 202 is transmitted to the leader node (813). On the other hand, when the processing request referred to by the notification message 201 and the processing request received from the task communication unit 140 are not the same (812: NO), the message processing unit 142 of the follower node displays the denial message 203. The data is transmitted to the leader node (814), and a failure process related to the leader node is performed (815). If there is no corresponding processing request in the request list in step 811 described above, the message processing unit 142 of the follower node adds the information of the notification message 201 to the notification message list 151 (816).

 以上の処理により、フォロワーノードは、リーダーノードからの通知メッセージを待つことなく、タスクに関する処理を各々進めることができるため、処理時間の遅延を短縮することができる。ただし、この場合のフォロワーノードは、リーダーノードでの結果を使わずに処理を進めるため、処理内容の相違が発生する確率が高くなる。そのためフォロワーノードは、一部の処理については、リーダーノードからの通知メッセージを待ち、その他の処理については、待たずに進めるといった、本実施例の部分的な適用も可能である。こうした部分的な適用をする際のフォロワーノードにおける判断には、メッセージ同期種類を用いることもできる。例えば、メッセージ同期種類が「同期」あるいは「準同期」のものについてはリーダーノードからの通知メッセージを待ち、次の実施例で述べるメッセージ同期種類が「非同期」の場合については、リーダーノードからの通知メッセージを待たないとするアルゴリズムである。
---第3の実施形態---
 上述の第1および第2の実施形態では、メッセージ同期種類が「同期」、「準同期」のいずれかの場合について選択的に処理を行う例を示した。ここでは、これら同機種類に加えて、メッセージ同期種類が「非同期」の場合についても想定した例を示すものとする。図13は第3の実施形態におけるFTシステム1のリーダーノードのメッセージ処理部142での動作フロー例を示す図であり、図8のフローに対する差分のみを示したフローチャートである。従って、第1の実施例と同様の構成と処理については説明を省略する。
With the above processing, the follower node can proceed with each process related to a task without waiting for a notification message from the leader node, and therefore the processing time delay can be reduced. However, since the follower node in this case proceeds with the process without using the result at the leader node, there is a high probability that a difference in processing contents will occur. Therefore, the follower node can also partially apply the present embodiment in which a part of the processing waits for a notification message from the leader node and the other processing proceeds without waiting. The message synchronization type can also be used for the determination in the follower node when performing such partial application. For example, if the message synchronization type is “synchronous” or “semi-synchronous”, it waits for a notification message from the leader node, and if the message synchronization type described in the next embodiment is “asynchronous”, notification from the leader node It is an algorithm that does not wait for a message.
--- Third embodiment ---
In the first and second embodiments described above, an example in which the process is selectively performed when the message synchronization type is “synchronous” or “semi-synchronous” has been described. Here, in addition to the same machine type, an example in which the message synchronization type is “asynchronous” is assumed. FIG. 13 is a diagram showing an example of an operation flow in the message processing unit 142 of the leader node of the FT system 1 in the third embodiment, and is a flowchart showing only a difference from the flow of FIG. Therefore, the description of the same configuration and processing as in the first embodiment is omitted.

 ここで、リーダーノードのメッセージ処理部142は、タスク通信部140からの処理要求の伝達、もしくはノード間通信部141からの通知メッセージ201の受信を待つ(1000)。待機中、タスク通信部140から処理要求を受け取った場合(1001:Yes)、リーダーノードのメッセージ処理部142は、図8のフローにおけるステップ601~603を同様に実行した後、メッセージ同期種類を判定する(1002)。メッセージ同期種類が「同期」であった場合及び「準同期」であった場合の処理は、第1の実施形態で示した場合と同様である。 Here, the message processing unit 142 of the leader node waits for transmission of a processing request from the task communication unit 140 or reception of the notification message 201 from the inter-node communication unit 141 (1000). If a processing request is received from the task communication unit 140 during standby (1001: Yes), the message processing unit 142 of the leader node determines the message synchronization type after executing steps 601 to 603 in the flow of FIG. (1002). The processing when the message synchronization type is “synchronous” and “semi-synchronous” is the same as the case shown in the first embodiment.

 他方、メッセージ同期種類が「非同期」であった場合(1002:非同期)、リーダーノードのメッセージ処理部142は、フォロワーノードからの承認/否認メッセージを待つことなく、図8のフローにおけるステップ611に進む。なお、十分な数の承認/否認メッセージをフォロワーノードから既に受信していた場合、リーダーノードのメッセージ処理部142は、この時点で「準同期」の場合と同様に多数決処理(図8のステップ610)を行なってもよい。他方、多数決処理をしない場合は、タスク通信部140に対して結果を返した後に多数決処理を行う必要があるが、その場合、ノード間通信部141から通知メッセージ201を受信した場合(1001:No)に実行する。 On the other hand, when the message synchronization type is “asynchronous” (1002: asynchronous), the message processing unit 142 of the leader node proceeds to step 611 in the flow of FIG. 8 without waiting for an approval / denial message from the follower node. . When a sufficient number of approval / denial messages have already been received from the follower node, the message processing unit 142 of the leader node at this point determines the majority process (step 610 in FIG. 8) as in the case of “semi-synchronization”. ) May be performed. On the other hand, when majority processing is not performed, it is necessary to perform majority processing after returning a result to the task communication unit 140. In this case, when the notification message 201 is received from the inter-node communication unit 141 (1001: No) ) To run.

 リーダーノードのメッセージ処理部142は、ノード間通信部141から通知メッセージ201を受信したとき(1001:No)、それが承認メッセージ202であれば(1003:Ack)、それに対応する承認メッセージリスト150の行の承認ノード集合301に、その送信元のフォロワーノードの情報を加える(1004)。他方、否認メッセージ203であれば(1003:Nack)、リーダーノードのメッセージ処理部142は、否認ノード集合302に送信元のフォロワーノードの情報を加える(1005)。この行に関して、承認ノード集合301あるいは否認ノード集合302にまだ入っていないフォロワーノードが存在する場合(1006:No)、リーダーノードのメッセージ処理部142は受信待ちに戻る(1000)。すべてのフォロワーノードから承認/否認メッセージを受信した場合(1006:Yes)、リーダーノードのメッセージ処理部142は、多数決処理にうつる。この時、否認メッセージ203が多数の場合(1007:No)、リーダーノードのメッセージ処理部142は、当該リーダーノードに関する障害時処理を行う。 When the message processing unit 142 of the leader node receives the notification message 201 from the inter-node communication unit 141 (1001: No), if it is the approval message 202 (1003: Ack), the message processing unit 142 of the approval node in the corresponding approval message list 150 Information of the follower node of the transmission source is added to the approval node set 301 of the row (1004). On the other hand, if the message is the denial message 203 (1003: Nack), the message processing unit 142 of the leader node adds the information of the source follower node to the denial node set 302 (1005). When there is a follower node not yet included in the approval node set 301 or the denial node set 302 regarding this row (1006: No), the message processing unit 142 of the leader node returns to reception waiting (1000). When the approval / denial message is received from all the follower nodes (1006: Yes), the message processing unit 142 of the leader node goes to the majority process. At this time, when there are a large number of rejection messages 203 (1007: No), the message processing unit 142 of the leader node performs a failure process related to the leader node.

 他方、承認メッセージ202が多数の場合(1007:Yes)、リーダーノードのメッセージ処理部142は、承認メッセージリスト150の当該行の削除などを行い(1009)、受信待ち(1000)に戻る。この時、リーダーノードのメッセージ処理部142は、必要に応じて否認メッセージ203を送ったフォロワーノードを停止させる処理等を行なってもよい。 On the other hand, when there are many approval messages 202 (1007: Yes), the message processing unit 142 of the leader node deletes the corresponding line in the approval message list 150 (1009), and returns to waiting for reception (1000). At this time, the message processing unit 142 of the leader node may perform processing for stopping the follower node that has sent the denial message 203 as necessary.

 また、十分な数の承認/否認メッセージをフォロワーノードから既に受信していた場合、リーダーノードのメッセージ処理部142は、ステップ1006から多数決処理(1007)へ処理を移すとしてもよい。 If a sufficient number of approval / denial messages have already been received from the follower node, the message processing unit 142 of the leader node may move the processing from step 1006 to majority processing (1007).

 以上の実施形態により、第1の実施形態の場合と比べて、リーダーノードがフォロワーノードからの承認/否認メッセージを待つ回数を低減することができるため、個々の計算機ノードの遅れをより効率的かつ適切に許容することが可能になる。
---第4の実施形態---
 ここでは、メッセージ同期種類が「同期」、「非同期」の2種である場合について想定した例を示すものとする。図14は第4の実施形態におけるFTシステム1のリーダーノードのメッセージ処理部142での動作フロー例を示す図であり、図8のフローに対する差分のみを示したフローチャートである。従って、第1の実施例と同様の構成と処理については説明を省略する。また、第3の実施形態と同様の構成と処理についても説明を省略する。
According to the above embodiment, the number of times that the leader node waits for the approval / denial message from the follower node can be reduced as compared with the case of the first embodiment. It becomes possible to tolerate appropriately.
--- Fourth embodiment ---
Here, an example is assumed in which there are two types of message synchronization types, “synchronous” and “asynchronous”. FIG. 14 is a diagram showing an example of an operation flow in the message processing unit 142 of the leader node of the FT system 1 in the fourth embodiment, and is a flowchart showing only a difference from the flow of FIG. Therefore, the description of the same configuration and processing as in the first embodiment is omitted. The description of the same configuration and processing as those in the third embodiment is also omitted.

 ここで、リーダーノードのメッセージ処理部142は、タスク通信部140からの処理要求の伝達、もしくはノード間通信部141からの通知メッセージ201の受信を待つ(1100)。待機中、タスク通信部140から処理要求を受け取った場合(1101:Yes)、リーダーノードのメッセージ処理部142は、図8のフローにおけるステップ601~603を同様に実行した後、メッセージ同期種類を判定する(1102)。メッセージ同期種類が「同期」であった場合の処理は、第1の実施形態で示した場合と同様である(図8のフローにおけるステップ605へ)。 Here, the message processing unit 142 of the leader node waits for transmission of a processing request from the task communication unit 140 or reception of the notification message 201 from the inter-node communication unit 141 (1100). When a processing request is received from the task communication unit 140 during standby (1101: Yes), the message processing unit 142 of the leader node determines the message synchronization type after executing steps 601 to 603 in the flow of FIG. (1102). The processing when the message synchronization type is “synchronization” is the same as that shown in the first embodiment (to step 605 in the flow of FIG. 8).

 他方、メッセージ同期種類が「非同期」であった場合(1102:非同期)、リーダーノードのメッセージ処理部142は、フォロワーノードからの承認/否認メッセージを待つことなく、図8のフローにおけるステップ611に進む。なお、十分な数の承認/否認メッセージをフォロワーノードから既に受信していた場合、リーダーノードのメッセージ処理部142は、この時点で「準同期」の場合と同様に多数決処理(図8のステップ610)を行なってもよい。他方、多数決処理をしない場合は、タスク通信部140に対して結果を返した後に多数決処理を行う必要があるが、その場合、ノード間通信部141から通知メッセージ201を受信した場合(1101:No)に実行する。また、ステップ1101において、ノード間通信部141から通知メッセージ201を受信したとき(1001:No)、以降の処理(1103~1109)については、第3の実施形態と同様である。 On the other hand, when the message synchronization type is “asynchronous” (1102: asynchronous), the message processing unit 142 of the leader node proceeds to step 611 in the flow of FIG. 8 without waiting for an approval / denial message from the follower node. . When a sufficient number of approval / denial messages have already been received from the follower node, the message processing unit 142 of the leader node at this point determines the majority process (step 610 in FIG. 8) as in the case of “semi-synchronization”. ) May be performed. On the other hand, when majority processing is not performed, it is necessary to perform majority processing after returning the result to the task communication unit 140. In this case, when the notification message 201 is received from the inter-node communication unit 141 (1101: No) ) To run. In step 1101, when the notification message 201 is received from the inter-node communication unit 141 (1001: No), the subsequent processing (1103 to 1109) is the same as in the third embodiment.

 以上の実施形態により、第1の実施形態の場合と比べて、リーダーノードがフォロワーノードからの承認/否認メッセージを待つ回数を低減することができるため、個々の計算機ノードの遅れをより効率的かつ適切に許容することが可能になる。 According to the above embodiment, the number of times that the leader node waits for the approval / denial message from the follower node can be reduced as compared with the case of the first embodiment. It becomes possible to tolerate appropriately.

 以上、本発明を実施するための最良の形態などについて具体的に説明したが、本発明はこれに限定されるものではなく、その要旨を逸脱しない範囲で種々変更可能である。 The best mode for carrying out the present invention has been specifically described above. However, the present invention is not limited to this, and various modifications can be made without departing from the scope of the present invention.

 こうした本実施形態によれば、フォールトトレラントシステムにおいて、計算機ノード間の同期のためのメッセージを分類し、その種類に応じて、一部のメッセージでのみ多数決投票の結果を待ち、そうでない場合には非同期的に実行するといった多数決投票の挙動変更を実行して、計算機ノード間での処理の待ち合わせ回数を低減し、以って計算機ノード間での性能ばらつきの影響を相対的に抑制可能となる。そうした効果は、システム全体における処理の遅延や効率低下を軽減できることにもつながる。また、障害判定を行うにあたり、アプリケーションの制約から明確である時間を障害判定の基準として用い、かつ、その基準点とメッセージ種類を一致させることにより、性能差を許容しつつ、過度な障害判定による可用性の低減を防止することも出来る。 According to the present embodiment, in the fault-tolerant system, messages for synchronization between computer nodes are classified, and depending on the type, only a part of the messages waits for the result of majority voting. The behavior change of majority voting such as executing asynchronously is executed to reduce the number of processing waits between computer nodes, thereby relatively suppressing the influence of performance variation between computer nodes. Such an effect also leads to reduction of processing delay and efficiency reduction in the entire system. Also, when performing failure determination, use the time that is clear from the application constraints as the failure determination criterion, and match the reference point and message type to allow for performance differences while allowing excessive failure determination. It is also possible to prevent a reduction in availability.

 したがって、アーキテクチャや基盤ソフトウェア等が互いに異なる複数の計算機ノードで構成されたフォールトトレラントシステムにおいて、計算機ノード間の性能差やばらつきによる全体性能の影響を低減し、またこうした影響に起因する過度な障害判定を防止することにより、システムにおける可用性が向上可能となる。 Therefore, in a fault-tolerant system composed of multiple computer nodes with different architectures and basic software, etc., the influence of overall performance due to performance differences and variations between computer nodes is reduced, and excessive fault determination due to such influences is reduced. By preventing this, availability in the system can be improved.

 本明細書の記載により、少なくとも次のことが明らかにされる。すなわち、フォールトトレラントシステムにおいて、前記他計算機となった計算機のメッセージ処理部は、当該他計算機のタスク通信部が得ている処理内容を所定基準に照合し、該当処理内容が同期または準同期のいずれかに対応したものであるか判定し、前記処理内容が同期に対応するものであった場合、当該他計算機以外の計算機の全てを対象として、前記追従可否のメッセージの受信待ちを実行し、前記処理内容が準同期に対応するものであった場合、当該他計算機以外の計算機のうち少なくとも半数以上の計算機を対象として、前記追従可否のメッセージの受信待ちを実行するものである、としてもよい。 記載 At least the following will be made clear by the description in this specification. That is, in the fault tolerant system, the message processing unit of the computer that has become the other computer collates the processing content obtained by the task communication unit of the other computer with a predetermined standard, and the processing content is either synchronous or semi-synchronous. If the processing content corresponds to synchronization, for all the computers other than the other computer, wait for reception of the followability message, When the processing content corresponds to quasi-synchronization, it is possible to wait for reception of the follow-up propriety message for at least half of the computers other than the other computers.

 こうして、各計算機が並行して実行すべき同一タスクの種類、すなわち同期、準同期の必要性有無に応じて、フォールトトレラントシステムにおけるリーダーたる計算機での、該当タスクに関する追従可否のメッセージ受信を待ち受けるべきフォロワー計算機の数をコントロールする。これにより、上述のメッセージを待ち受けるべき時間を必要性に応じて選択的に増減して、計算機間の性能差やばらつきによる上述のメッセージの到着時間差を適宜吸収し、ひいては過度な障害判定の防止や、システムにおける可用性向上が可能となる。 In this way, depending on the type of the same task that each computer should execute in parallel, that is, whether synchronization or semi-synchronization is necessary, the computer that is the leader in the fault-tolerant system should wait for receipt of a message indicating whether the task can be tracked Control the number of follower calculators. As a result, the time to wait for the above message is selectively increased / decreased according to necessity, and the above arrival time difference of the above message due to the performance difference or variation between computers is appropriately absorbed, thereby preventing excessive failure determination or The availability in the system can be improved.

 また、フォールトトレラントシステムにおいて、前記他計算機となった計算機のメッセージ処理部は、当該他計算機のタスク通信部が得ている処理内容を所定基準に照合し、該当処理内容が同期、準同期、および非同期のいずれかに対応したものであるか判定し、前記処理内容が同期に対応するものであった場合、当該他計算機以外の計算機の全てを対象として、前記追従可否のメッセージの受信待ちを実行し、前記処理内容が準同期に対応するものであった場合、当該他計算機以外の計算機のうち少なくとも半数以上の計算機を対象として、前記追従可否のメッセージの受信待ちを実行し、前記処理内容が非同期に対応するものであった場合、当該他計算機以外の計算機からの前記追従可否のメッセージの受信待ちを実行しないものである、としてもよい。 In the fault tolerant system, the message processing unit of the computer that has become the other computer collates the processing content obtained by the task communication unit of the other computer with a predetermined standard, and the corresponding processing content is synchronous, semi-synchronous, and Judgment is made whether it is asynchronous or not, and if the processing contents correspond to synchronization, all the computers other than the other computer are subjected to waiting for reception of the followability message. If the processing content corresponds to quasi-synchronization, wait for reception of the followability message for at least half of the computers other than the other computer, and the processing content is If it is asynchronous, it does not wait to receive the followability message from a computer other than the other computer. It may be.

 こうして、各計算機が並行して実行すべき同一タスクの種類、すなわち同期、準同期、非同期の必要性有無に応じて、フォールトトレラントシステムにおけるリーダーたる計算機での、該当タスクに関する追従可否のメッセージ受信を待ち受けるべきフォロワー計算機の数をコントロールする。これにより、上述のメッセージを待ち受けるべき時間を必要性に応じて選択的に増減して、計算機間の性能差やばらつきによる上述のメッセージの到着時間差を適宜吸収し、ひいては過度な障害判定の防止や、システムにおける可用性向上が可能となる。特に、該当タスクが各計算機間で非同期で処理されるものである場合、リーダーたる計算機において、フォロワー計算機からの上述のメッセージの受信待ちが不要となり、障害判定に要する各計算機の使用リソースを効率化し、ひいてはシステムにおける更なる可用性向上が可能となる。 In this way, depending on the type of the same task that each computer should execute in parallel, that is, whether synchronization, quasi-synchronization, or asynchronous is necessary, the computer that is the leader in the fault-tolerant system receives a message indicating whether the task can be followed or not. Control the number of follower computers to wait for. As a result, the time to wait for the above message is selectively increased / decreased according to necessity, and the above arrival time difference of the above message due to the performance difference or variation between computers is appropriately absorbed, thereby preventing excessive failure determination or The availability in the system can be improved. In particular, when the corresponding task is processed asynchronously between the computers, the computer that is the leader does not need to wait to receive the above-mentioned message from the follower computer, and the use resources of each computer required for fault determination are made efficient. As a result, the availability of the system can be further improved.

 また、フォールトトレラントシステムにおいて、前記他計算機となった計算機のメッセージ処理部は、当該他計算機のタスク通信部が得ている処理内容を所定基準に照合し、該当処理内容が同期または非同期のいずれかに対応したものであるか判定し、前記処理内容が同期に対応するものであった場合、当該他計算機以外の計算機の全てを対象として、前記追従可否のメッセージの受信待ちを実行し、前記処理内容が非同期に対応するものであった場合、当該他計算機以外の計算機からの前記追従可否のメッセージの受信待ちを実行しないものである、としてもよい。 In the fault tolerant system, the message processing unit of the computer that has become the other computer collates the processing content obtained by the task communication unit of the other computer with a predetermined standard, and the corresponding processing content is either synchronous or asynchronous. If the processing content corresponds to synchronization, the processing waits for reception of the follow-up propriety message for all computers other than the other computer, and the processing When the content corresponds asynchronously, it may be configured not to wait for reception of the follow-up enable / disable message from a computer other than the other computer.

 こうして、各計算機が並行して実行すべき同一タスクの種類、すなわち同期、非同期の必要性有無に応じて、フォールトトレラントシステムにおけるリーダーたる計算機での、該当タスクに関する追従可否のメッセージ受信を待ち受けるべきフォロワー計算機の数をコントロールする。これにより、上述のメッセージを待ち受けるべき時間を必要性に応じて選択的に増減して、計算機間の性能差やばらつきによる上述のメッセージの到着時間差を適宜吸収し、ひいては過度な障害判定の防止や、システムにおける可用性向上が可能となる。特に、該当タスクが各計算機間で非同期で処理されるものである場合、リーダーたる計算機において、フォロワー計算機からの上述のメッセージの受信待ちが不要となり、障害判定に要する各計算機の使用リソースを効率化し、ひいてはシステムにおける更なる可用性向上が可能となる。 In this way, the followers to wait for the follow-up message about the corresponding task in the fault-tolerant system, depending on the type of the same task that each computer should execute in parallel, that is, whether synchronization or asynchronous is necessary Control the number of calculators. As a result, the time to wait for the above message is selectively increased / decreased according to necessity, and the above arrival time difference of the above message due to the performance difference or variation between computers is appropriately absorbed, thereby preventing excessive failure determination or The availability in the system can be improved. In particular, when the corresponding task is processed asynchronously between the computers, the computer that is the leader does not need to wait to receive the above-mentioned message from the follower computer, and the use resources of each computer required for fault determination are made efficient. As a result, the availability of the system can be further improved.

 また、フォールトトレラント制御方法において、前記他計算機となった計算機のメッセージ処理部が、当該他計算機のタスク通信部が得ている処理内容を所定基準に照合し、該当処理内容が同期または準同期のいずれかに対応したものであるか判定し、前記処理内容が同期に対応するものであった場合、当該他計算機以外の計算機の全てを対象として、前記追従可否のメッセージの受信待ちを実行し、前記処理内容が準同期に対応するものであった場合、当該他計算機以外の計算機のうち少なくとも半数以上の計算機を対象として、前記追従可否のメッセージの受信待ちを実行する、としてもよい。 In the fault tolerant control method, the message processing unit of the computer that has become the other computer collates the processing content obtained by the task communication unit of the other computer with a predetermined standard, and the processing content is synchronous or semi-synchronous. It is determined whether it corresponds to any one, and when the processing content corresponds to synchronization, for all the computers other than the other computer, wait for reception of the followability message, When the processing content corresponds to quasi-synchronization, waiting for reception of the followability message may be executed for at least half of the computers other than the other computers.

 こうして、各計算機が並行して実行すべき同一タスクの種類、すなわち同期、準同期の必要性有無に応じて、フォールトトレラントシステムにおけるリーダーたる計算機での、該当タスクに関する追従可否のメッセージ受信を待ち受けるべきフォロワー計算機の数をコントロールする。これにより、上述のメッセージを待ち受けるべき時間を必要性に応じて選択的に増減して、計算機間の性能差やばらつきによる上述のメッセージの到着時間差を適宜吸収し、ひいては過度な障害判定の防止や、システムにおける可用性向上が可能となる。 In this way, depending on the type of the same task that each computer should execute in parallel, that is, whether synchronization or semi-synchronization is necessary, the computer that is the leader in the fault-tolerant system should wait for receipt of a message indicating whether the task can be tracked Control the number of follower calculators. As a result, the time to wait for the above message is selectively increased / decreased according to necessity, and the above arrival time difference of the above message due to the performance difference or variation between computers is appropriately absorbed, thereby preventing excessive failure determination or The availability in the system can be improved.

 また、フォールトトレラント制御方法において、前記他計算機となった計算機のメッセージ処理部が、当該他計算機のタスク通信部が得ている処理内容を所定基準に照合し、該当処理内容が同期、準同期、および非同期のいずれかに対応したものであるか判定し、前記処理内容が同期に対応するものであった場合、当該他計算機以外の計算機の全てを対象として、前記追従可否のメッセージの受信待ちを実行し、前記処理内容が準同期に対応するものであった場合、当該他計算機以外の計算機のうち少なくとも半数以上の計算機を対象として、前記追従可否のメッセージの受信待ちを実行し、前記処理内容が非同期に対応するものであった場合、当該他計算機以外の計算機からの前記追従可否のメッセージの受信待ちを実行しない、としてもよい。 In the fault tolerant control method, the message processing unit of the computer that has become the other computer collates the processing content obtained by the task communication unit of the other computer with a predetermined standard, and the corresponding processing content is synchronous, semi-synchronous, And if the processing content corresponds to synchronization, and if all of the computers other than the other computer are the target, wait for reception of the follow-up enable / disable message. If the processing content corresponds to quasi-synchronization, wait for reception of the followability message for at least half of the computers other than the other computer, and execute the processing content Is not asynchronous, it does not wait to receive the follow-up message from other computers than the other computer. Good.

 こうして、各計算機が並行して実行すべき同一タスクの種類、すなわち同期、準同期、非同期の必要性有無に応じて、フォールトトレラントシステムにおけるリーダーたる計算機での、該当タスクに関する追従可否のメッセージ受信を待ち受けるべきフォロワー計算機の数をコントロールする。これにより、上述のメッセージを待ち受けるべき時間を必要性に応じて選択的に増減して、計算機間の性能差やばらつきによる上述のメッセージの到着時間差を適宜吸収し、ひいては過度な障害判定の防止や、システムにおける可用性向上が可能となる。特に、該当タスクが各計算機間で非同期で処理されるものである場合、リーダーたる計算機において、フォロワー計算機からの上述のメッセージの受信待ちが不要となり、障害判定に要する各計算機の使用リソースを効率化し、ひいてはシステムにおける更なる可用性向上が可能となる。 In this way, depending on the type of the same task that each computer should execute in parallel, that is, whether synchronization, quasi-synchronization, or asynchronous is necessary, the computer that is the leader in the fault-tolerant system receives a message indicating whether the task can be followed or not. Control the number of follower computers to wait for. As a result, the time to wait for the above message is selectively increased / decreased according to necessity, and the above arrival time difference of the above message due to the performance difference or variation between computers is appropriately absorbed, thereby preventing excessive failure determination or The availability in the system can be improved. In particular, when the corresponding task is processed asynchronously between the computers, the computer that is the leader does not need to wait to receive the above-mentioned message from the follower computer, and the use resources of each computer required for fault determination are made efficient. As a result, the availability of the system can be further improved.

 また、フォールトトレラント制御方法において、前記他計算機となった計算機のメッセージ処理部が、当該他計算機のタスク通信部が得ている処理内容を所定基準に照合し、該当処理内容が同期または非同期のいずれかに対応したものであるか判定し、前記処理内容が同期に対応するものであった場合、当該他計算機以外の計算機の全てを対象として、前記追従可否のメッセージの受信待ちを実行し、前記処理内容が非同期に対応するものであった場合、当該他計算機以外の計算機からの前記追従可否のメッセージの受信待ちを実行しない、としてもよい。 In the fault tolerant control method, the message processing unit of the computer that has become the other computer collates the processing content obtained by the task communication unit of the other computer with a predetermined standard, and the processing content is either synchronous or asynchronous. If the processing content corresponds to synchronization, for all the computers other than the other computer, wait for reception of the followability message, When the processing contents are asynchronous, it may be determined not to wait for reception of the followability message from a computer other than the other computer.

 こうして、各計算機が並行して実行すべき同一タスクの種類、すなわち同期、非同期の必要性有無に応じて、フォールトトレラントシステムにおけるリーダーたる計算機での、該当タスクに関する追従可否のメッセージ受信を待ち受けるべきフォロワー計算機の数をコントロールする。これにより、上述のメッセージを待ち受けるべき時間を必要性に応じて選択的に増減して、計算機間の性能差やばらつきによる上述のメッセージの到着時間差を適宜吸収し、ひいては過度な障害判定の防止や、システムにおける可用性向上が可能となる。特に、該当タスクが各計算機間で非同期で処理されるものである場合、リーダーたる計算機において、フォロワー計算機からの上述のメッセージの受信待ちが不要となり、障害判定に要する各計算機の使用リソースを効率化し、ひいてはシステムにおける更なる可用性向上が可能となる。 In this way, the followers to wait for the follow-up message about the corresponding task in the fault-tolerant system, depending on the type of the same task that each computer should execute in parallel, that is, whether synchronization or asynchronous is necessary Control the number of calculators. As a result, the time to wait for the above message is selectively increased / decreased according to necessity, and the above arrival time difference of the above message due to the performance difference or variation between computers is appropriately absorbed, thereby preventing excessive failure determination or The availability in the system can be improved. In particular, when the corresponding task is processed asynchronously between the computers, the computer that is the leader does not need to wait to receive the above-mentioned message from the follower computer, and the use resources of each computer required for fault determination are made efficient. As a result, the availability of the system can be further improved.

1 フォールトトレラントシステム
102 端末
110 計算機ノード(計算機)
110A リーダーノード
110B、110C フォロワーノード
111 外部ネットワーク
112 ノード間ネットワーク
120 中央演算装置
121 主記憶装置
122 外部通信インタフェース
123 ノード間通信インタフェース
124 補助記憶装置
130 基本ソフトウェア
131 FTミドルウェア
132 アプリケーション
133 タスク
140 タスク通信部
141 ノード間通信部
142 メッセージ処理部
150 承認メッセージリスト
151 通知メッセージリスト
152 メッセージ種類テーブル
1 fault tolerant system 102 terminal 110 computer node (computer)
110A leader node 110B, 110C follower node 111 external network 112 inter-node network 120 central processing unit 121 main storage unit 122 external communication interface 123 inter-node communication interface 124 auxiliary storage unit 130 basic software 131 FT middleware 132 application 133 task 140 task communication unit 141 Inter-node communication unit 142 Message processing unit 150 Approval message list 151 Notification message list 152 Message type table

Claims (8)

 多数決冗長構成を形成する複数の計算機各々において、
 当該計算機の内部で動作するプログラムからその処理内容を取得するタスク通信部と、
 前記多数決冗長構成を形成するいずれかの他計算機より、前記他計算機で動作するプログラムの処理内容を示す所定メッセージを受信し、前記所定メッセージが示す処理内容と前記タスク通信部が得ている前記処理内容との一致判定の結果に応じ、前記他計算機に追従可否のメッセージを返信するメッセージ処理部とを備えており、
 前記他計算機となった計算機のメッセージ処理部は、当該他計算機のタスク通信部が得ている処理内容に応じて、前記追従可否のメッセージの受信待ち対象とする、当該他計算機以外の計算機の数を変更する処理を更に実行するものである、
 ことを特徴とするフォールトトレラントシステム。
In each of a plurality of computers that form a majority redundant configuration,
A task communication unit for acquiring the processing contents from a program operating inside the computer;
The predetermined message indicating the processing content of the program running on the other computer is received from any other computer forming the majority voting redundant configuration, and the processing content indicated by the predetermined message and the processing obtained by the task communication unit In accordance with the result of determination of coincidence with the contents, a message processing unit that returns a followability message to the other computer,
The message processing unit of the computer that has become the other computer has a number of computers other than the other computer that are subject to reception of the follow-up / possibility message according to the processing content obtained by the task communication unit of the other computer. To further execute the process of changing
A fault-tolerant system.
 前記他計算機となった計算機のメッセージ処理部は、
 当該他計算機のタスク通信部が得ている処理内容を所定基準に照合し、該当処理内容が同期または準同期のいずれかに対応したものであるか判定し、前記処理内容が同期に対応するものであった場合、当該他計算機以外の計算機の全てを対象として、前記追従可否のメッセージの受信待ちを実行し、前記処理内容が準同期に対応するものであった場合、当該他計算機以外の計算機のうち少なくとも半数以上の計算機を対象として、前記追従可否のメッセージの受信待ちを実行するものである、
 ことを特徴とする請求項1に記載のフォールトトレラントシステム。
The message processing unit of the computer that has become the other computer is:
The processing content obtained by the task communication unit of the other computer is checked against a predetermined standard to determine whether the corresponding processing content corresponds to either synchronization or quasi-synchronization, and the processing content corresponds to synchronization If it is, for all computers other than the other computer, wait for reception of the followability message, and if the processing content corresponds to quasi-synchronization, the computer other than the other computer For at least half of the computers, waiting for receiving the follow-up message is executed.
The fault tolerant system according to claim 1.
 前記他計算機となった計算機のメッセージ処理部は、
 当該他計算機のタスク通信部が得ている処理内容を所定基準に照合し、該当処理内容が同期、準同期、および非同期のいずれかに対応したものであるか判定し、前記処理内容が同期に対応するものであった場合、当該他計算機以外の計算機の全てを対象として、前記追従可否のメッセージの受信待ちを実行し、前記処理内容が準同期に対応するものであった場合、当該他計算機以外の計算機のうち少なくとも半数以上の計算機を対象として、前記追従可否のメッセージの受信待ちを実行し、前記処理内容が非同期に対応するものであった場合、当該他計算機以外の計算機からの前記追従可否のメッセージの受信待ちを実行しないものである、
 ことを特徴とする請求項1に記載のフォールトトレラントシステム。
The message processing unit of the computer that has become the other computer is:
The processing content obtained by the task communication unit of the other computer is checked against a predetermined standard to determine whether the corresponding processing content corresponds to one of synchronous, semi-synchronous, and asynchronous, and the processing content is synchronized. If it is compatible, execute waiting for reception of the follow-up enable / disable message for all computers other than the other computer, and if the processing content corresponds to semi-synchronization, If at least half of the other computers are targeted for receiving the follow-up success / failure message, and the processing content is asynchronous, the follow-up from a computer other than the other computer It does not execute the waiting for receiving the message of availability.
The fault tolerant system according to claim 1.
 前記他計算機となった計算機のメッセージ処理部は、
 当該他計算機のタスク通信部が得ている処理内容を所定基準に照合し、該当処理内容が同期または非同期のいずれかに対応したものであるか判定し、前記処理内容が同期に対応するものであった場合、当該他計算機以外の計算機の全てを対象として、前記追従可否のメッセージの受信待ちを実行し、前記処理内容が非同期に対応するものであった場合、当該他計算機以外の計算機からの前記追従可否のメッセージの受信待ちを実行しないものである、
 ことを特徴とする請求項1に記載のフォールトトレラントシステム。
The message processing unit of the computer that has become the other computer is:
The processing content obtained by the task communication unit of the other computer is checked against a predetermined standard, it is determined whether the corresponding processing content corresponds to either synchronous or asynchronous, and the processing content corresponds to synchronization. If there is, execute the waiting for reception of the followability message for all the computers other than the other computer, and if the processing content is asynchronous, if from the computer other than the other computer It does not execute reception waiting for the follow-up message.
The fault tolerant system according to claim 1.
 多数決冗長構成を形成する複数の計算機各々において、
 当該計算機の内部で動作するプログラムからその処理内容を取得する処理と、
 前記多数決冗長構成を形成するいずれかの他計算機より、前記他計算機で動作するプログラムの処理内容を示す所定メッセージを受信し、前記所定メッセージが示す処理内容と前記処理で得ている前記処理内容との一致判定の結果に応じ、前記他計算機に追従可否のメッセージを返信する処理とを実行し、
 更に、前記他計算機においては、当該他計算機で動作するプログラムから得ている処理内容に応じて、前記追従可否のメッセージの受信待ち対象とする、当該他計算機以外の計算機の数を変更する処理を更に実行する、
 ことを特徴とするフォールトトレラントシステム制御方法。
In each of a plurality of computers that form a majority redundant configuration,
Processing to acquire the processing contents from a program operating inside the computer;
A predetermined message indicating the processing content of a program operating on the other computer is received from any other computer forming the majority voting redundant configuration, and the processing content indicated by the predetermined message and the processing content obtained by the processing In accordance with the result of the match determination, a process of returning a followability message to the other computer is executed.
Furthermore, in the other computer, a process of changing the number of computers other than the other computer, which is subject to reception of the followability message, according to the processing content obtained from the program operating on the other computer. Perform further,
A fault tolerant system control method characterized by the above.
 前記他計算機となった計算機のメッセージ処理部が、
 当該他計算機のタスク通信部が得ている処理内容を所定基準に照合し、該当処理内容が同期または準同期のいずれかに対応したものであるか判定し、前記処理内容が同期に対応するものであった場合、当該他計算機以外の計算機の全てを対象として、前記追従可否のメッセージの受信待ちを実行し、前記処理内容が準同期に対応するものであった場合、当該他計算機以外の計算機のうち少なくとも半数以上の計算機を対象として、前記追従可否のメッセージの受信待ちを実行する、
 ことを特徴とする請求項5に記載のフォールトトレラント制御方法。
The message processing unit of the computer that has become the other computer,
The processing content obtained by the task communication unit of the other computer is checked against a predetermined standard to determine whether the corresponding processing content corresponds to either synchronization or quasi-synchronization, and the processing content corresponds to synchronization If it is, for all computers other than the other computer, wait for reception of the followability message, and if the processing content corresponds to quasi-synchronization, the computer other than the other computer Waiting for reception of the followability message for at least half of the computers
The fault tolerant control method according to claim 5.
 前記他計算機となった計算機のメッセージ処理部が、
 当該他計算機のタスク通信部が得ている処理内容を所定基準に照合し、該当処理内容が同期、準同期、および非同期のいずれかに対応したものであるか判定し、前記処理内容が同期に対応するものであった場合、当該他計算機以外の計算機の全てを対象として、前記追従可否のメッセージの受信待ちを実行し、前記処理内容が準同期に対応するものであった場合、当該他計算機以外の計算機のうち少なくとも半数以上の計算機を対象として、前記追従可否のメッセージの受信待ちを実行し、前記処理内容が非同期に対応するものであった場合、当該他計算機以外の計算機からの前記追従可否のメッセージの受信待ちを実行しない、
 ことを特徴とする請求項5に記載のフォールトトレラント制御方法。
The message processing unit of the computer that has become the other computer,
The processing content obtained by the task communication unit of the other computer is checked against a predetermined standard to determine whether the corresponding processing content corresponds to one of synchronous, semi-synchronous, and asynchronous, and the processing content is synchronized. If it is compatible, execute waiting for reception of the follow-up enable / disable message for all computers other than the other computer, and if the processing content corresponds to semi-synchronization, If at least half of the other computers are targeted for receiving the follow-up success / failure message, and the processing content is asynchronous, the follow-up from a computer other than the other computer Do not wait for receipt of acceptable messages,
The fault tolerant control method according to claim 5.
 前記他計算機となった計算機のメッセージ処理部が、
 当該他計算機のタスク通信部が得ている処理内容を所定基準に照合し、該当処理内容が同期または非同期のいずれかに対応したものであるか判定し、前記処理内容が同期に対応するものであった場合、当該他計算機以外の計算機の全てを対象として、前記追従可否のメッセージの受信待ちを実行し、前記処理内容が非同期に対応するものであった場合、当該他計算機以外の計算機からの前記追従可否のメッセージの受信待ちを実行しない、
 ことを特徴とする請求項5に記載のフォールトトレラント制御方法。
The message processing unit of the computer that has become the other computer,
The processing content obtained by the task communication unit of the other computer is checked against a predetermined standard, it is determined whether the corresponding processing content corresponds to either synchronous or asynchronous, and the processing content corresponds to synchronization. If there is, execute the waiting for reception of the followability message for all the computers other than the other computer, and if the processing content is asynchronous, if from the computer other than the other computer Do not wait to receive the follow-up message.
The fault tolerant control method according to claim 5.
PCT/JP2014/069305 2013-09-26 2014-07-22 Fault-tolerant system and fault-tolerant system control method Ceased WO2015045589A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2013-199614 2013-09-26
JP2013199614A JP6100135B2 (en) 2013-09-26 2013-09-26 Fault tolerant system and fault tolerant system control method

Publications (1)

Publication Number Publication Date
WO2015045589A1 true WO2015045589A1 (en) 2015-04-02

Family

ID=52742746

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2014/069305 Ceased WO2015045589A1 (en) 2013-09-26 2014-07-22 Fault-tolerant system and fault-tolerant system control method

Country Status (2)

Country Link
JP (1) JP6100135B2 (en)
WO (1) WO2015045589A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11822535B2 (en) * 2021-06-08 2023-11-21 Salesforce, Inc. Director-based database system for transactional consistency

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4321666A (en) * 1980-02-05 1982-03-23 The Bendix Corporation Fault handler for a multiple computer system
JP2004355233A (en) * 2003-05-28 2004-12-16 Nec Corp Fault-tolerant system, program parallel execution method, fault detector for fault-tolerant system, and program
WO2012032572A1 (en) * 2010-09-08 2012-03-15 株式会社日立製作所 Computing device
WO2012056487A1 (en) * 2010-10-25 2012-05-03 株式会社日立製作所 Computer system
WO2012127652A1 (en) * 2011-03-23 2012-09-27 株式会社日立製作所 Computer system, data processing method, and data processing program

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4321666A (en) * 1980-02-05 1982-03-23 The Bendix Corporation Fault handler for a multiple computer system
JP2004355233A (en) * 2003-05-28 2004-12-16 Nec Corp Fault-tolerant system, program parallel execution method, fault detector for fault-tolerant system, and program
WO2012032572A1 (en) * 2010-09-08 2012-03-15 株式会社日立製作所 Computing device
WO2012056487A1 (en) * 2010-10-25 2012-05-03 株式会社日立製作所 Computer system
WO2012127652A1 (en) * 2011-03-23 2012-09-27 株式会社日立製作所 Computer system, data processing method, and data processing program

Also Published As

Publication number Publication date
JP6100135B2 (en) 2017-03-22
JP2015064833A (en) 2015-04-09

Similar Documents

Publication Publication Date Title
US20220239602A1 (en) Scalable leadership election in a multi-processing computing environment
US9934242B2 (en) Replication of data between mirrored data sites
US8117156B2 (en) Replication for common availability substrate
CN111368002A (en) Data processing method, system, computer equipment and storage medium
CN101383690B (en) Grid synchronization method for fault tolerant computer system based on socket
JP6353086B2 (en) Multi-database log with multi-item transaction support
US7865763B2 (en) Data replication method
US20180027048A1 (en) File transmission method, apparatus, and distributed cluster file system
CN103607448B (en) A kind of method of ATC system dynamic data storage
CN105069152B (en) data processing method and device
CN115794499B (en) Method and system for dual-activity replication data among distributed block storage clusters
WO2020024615A1 (en) Consensus process recovery method and related nodes
CN109739435B (en) File storage and updating method and device
JP4612714B2 (en) Data processing method, cluster system, and data processing program
CN111913837A (en) System for realizing distributed middleware message recovery policy management in big data environment
CN109347906B (en) Data transmission method, device and server
CN108462737A (en) Individual-layer data consistency protocol optimization method based on batch processing and assembly line
CN105323271A (en) Cloud computing system, and processing method and apparatus thereof
JP6100135B2 (en) Fault tolerant system and fault tolerant system control method
CN119537483A (en) Data synchronization method, device, equipment and medium
CN110309224A (en) A kind of data copy method and device
RU2714602C1 (en) Method and system for data processing
CN118158022B (en) A multi-bus communication method, port machinery edge computing device and related equipment
Lin et al. An optimized multi-Paxos protocol with centralized failover mechanism for cloud storage applications
CN114138530B (en) Cloud-native-based message processing method, device, equipment, and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14848627

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14848627

Country of ref document: EP

Kind code of ref document: A1