HK1181578A - Clustered client failover - Google Patents
- Publication number: HK1181578A (application number HK13108754.4A)
- Authority: HK (Hong Kong)
- Prior art keywords: resource, client, request, application, access
Description
Technical Field
The invention relates to clustered client failover.
Background
Clustered environments (e.g., environments in which workload is distributed across multiple machines) are often used to provide clients with failover and high availability of information. A clustered environment allows clients to access resources via one or more nodes that are part of the environment. A clustered environment may act as a client, a server, or both. In a client cluster, an application may reside on any of the nodes that make up the cluster. The application may issue a request for a resource stored locally within the client cluster or remotely. If an error occurs on the node on which the application resides, the client fails over (migrates) to a different node in the cluster. However, when the client again requests the resource it was working with at the time of the error, the server may still be blocking or locking the resource on behalf of the previous client node on which the application resided.
It is with respect to these and other considerations that the various embodiments have been made. Moreover, while relatively specific problems have been discussed, it should be understood that the embodiments should not be limited to solving the specific problems identified in the background.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
Systems and methods are disclosed herein that provide an application or process with continuous access to resources after the application is migrated to a new node in a clustered client environment. An application or process residing on a node in a client cluster sends a request to a server to access a resource. In an embodiment, a unique application instance identifier is used to identify the application requesting the resource. The unique application instance identifier may be provided with the request. When a client accesses a resource, an application instance identifier is associated with the requested resource.
Before the application or process completes its operations on the resource, the node in the cluster environment where the client resides may experience errors that cause it to fail or otherwise lose access to the resource before the application properly releases the resource. In this case, the resource may remain in a blocked or locked state on the server on account of the previous client request. After failover to a different node in the client cluster, the application on the new client node may reestablish a connection with the server managing the resource and make a second request for the resource that the application was accessing at the time of the error. The second request may include the application instance identifier that was sent with the first request. Although the second request for the resource may be received from a different node in the cluster environment, the application instance identifier allows the server managing the request to determine that the second request belongs to the same application or process that previously locked the resource. This allows the server to invalidate the resource and grant the client's second request to access the resource while ensuring that no conflict occurs.
Embodiments may be implemented as a computer process, a computing system, or as an article of manufacture such as a computer program product or computer readable media. The computer program product may be a computer storage media readable by a computer system and encoding a computer program of instructions for executing a computer process. The computer program product may also be a propagated signal on a carrier readable by a computing system and encoding a computer program of instructions for executing a computer process.
Drawings
Non-limiting and non-exhaustive embodiments are described with reference to the following drawings.
FIG. 1 illustrates a system that may be used to implement embodiments described herein.
FIG. 2 is a block diagram illustrating a software environment that may be used to implement embodiments disclosed herein.
FIG. 3 is an embodiment of a method that a client may execute to obtain continuous access to a resource in a clustered environment.
FIG. 4 is an embodiment of a method performed by a node in a cluster environment to provide continuous access to a resource.
FIG. 5 illustrates a block diagram of a computing environment suitable for implementing embodiments.
Detailed Description
Various embodiments will now be described more fully hereinafter with reference to the accompanying drawings, which form a part hereof, and which show specific exemplary embodiments. The embodiments may, however, be implemented in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of these embodiments to those skilled in the art. Accordingly, embodiments may take the form of a hardware implementation, an entirely software implementation, or an implementation combining software and hardware aspects. The following detailed description is, therefore, not to be taken in a limiting sense.
Embodiments of the present disclosure are directed to providing a cluster client failover mechanism that allows a requestor to regain access to a resource after a failover event. In embodiments, the requestor may be a process, an application, or one or more sub-processes of an application. A resource may be a file, an object, data, or any other type of resource in a computing environment. In embodiments, the resources may reside on separate servers, or the resources may reside in a clustered environment. In embodiments disclosed herein, a cluster environment may include one or more nodes (e.g., client and/or server devices).
In an example embodiment, an application residing on a node in a cluster environment may request access to a particular resource. In embodiments, the resource may be stored locally (e.g., on the client node), in a remote device (e.g., a remote server or a different node in a client cluster environment), or in a cluster environment different from the client cluster environment (e.g., an environment containing multiple nodes). For example, in embodiments, a cluster environment may be a client or server cluster; however, those skilled in the art will appreciate that the systems and methods disclosed herein may be used in any other type of environment, such as, but not limited to, a virtual network.
In such an environment, resources may be shared between various clients and applications. When an application accesses a resource, the resource may be blocked or locked, thereby prohibiting other applications from accessing the resource until the accessing application releases the resource. Blocking or locking a resource may be used to protect against conflicts, i.e., to protect against modification of the resource by another application before the accessing application has performed its operations on the resource. However, if a node in the clustered client environment fails, the application that is accessing the resource may not properly release the resource from the blocked or locked state. For example, a client node accessing the resource on behalf of the application may lose network connectivity, may crash, or may otherwise lose access to the resource before the application completes its operations and properly releases the resource. Thus, the resource may remain in a state in which the resource is unavailable to other clients or applications. A mechanism may be employed to automatically release a resource from a blocked or locked state, thereby preventing the resource from being permanently locked out. However, such mechanisms typically wait for a period of time before releasing the blocked or locked resource.
In some cases, when the application fails over and migrates from the failed client node to a different client node in the client cluster, the application may attempt to reestablish its previous connection with the server via the different client node and resume its operations on the resource. However, because the failed client node did not properly release the resource on the application's behalf, the application may be unable to resume the access it held at the time of the error until the server releases the resource from its blocked or locked state. Moreover, because a different node is now attempting to access the resource on behalf of the application, the server may not be able to identify the application as the same application that previously established the lock on the resource. Yet because it is the same application attempting to access the resource, no conflict can actually occur. In such a situation, waiting for the server to release the previous lock on the resource may result in an unacceptable delay for the application.
As described above, because the application is operating in a clustered client environment, when the application requests access to the resource a second time, the request to access the resource may be made from a different location (such as a different node in the clustered client environment). Thus, the second request may come from a different location or a different IP address. Because the request may be made from a different location, it may be difficult for the server to ensure that the client or application attempting to access the resource again is in fact the same client that previously accessed the resource. The systems and methods disclosed herein provide a mechanism to identify situations in which the same application is attempting to access a resource, thereby avoiding such delays and providing the application with continuous access to the resource.
FIG. 1 illustrates a system 100 that may be used to implement some embodiments disclosed herein. The system 100 includes a client cluster 102 and a server cluster 106. The client cluster includes a plurality of nodes, such as clients 102A and 102B. The clients 102A and 102B may be devices or applications residing in the client cluster 102. The client cluster 102 may communicate with the server cluster 106 over a network 108. In embodiments, the network 108 may be the internet, a WAN, a LAN, or any other type of network known in the art. The server cluster 106 stores resources that are accessed by applications on the client cluster 102 (e.g., applications that reside on the client 102A or the client 102B). In an embodiment, a client (e.g., client 102A) may establish a session with the cluster 106 to access resources on the cluster 106 on behalf of an application residing on the client. Although the client cluster 102 includes only two clients (e.g., client 102A and client 102B) in FIG. 1, one skilled in the art will appreciate that any number of clients may be included in the client cluster 102.
As shown in FIG. 1, the server cluster 106 includes servers 106A, 106B, and 106C, which provide both high availability and redundancy of information stored on the cluster 106. In embodiments, the cluster 106 may have a file system, database, or other information accessed by the clients 102A and 102B. Although three servers are shown in FIG. 1, in other embodiments the cluster 106 may include more or fewer than three servers. Moreover, although the embodiments described herein involve a client communicating with a server that is part of a server cluster, those skilled in the art will appreciate that the embodiments disclosed herein may also be performed using a stand-alone server.
In an embodiment, the client cluster 102 provides a failover mechanism that allows clients to migrate from a first client node to a second client node in the event of an error or failure on the first client node. One skilled in the art will appreciate that any type of failover mechanism may be used with the systems and methods disclosed herein. The methods and systems disclosed herein may be used to avoid excessive delays when an application attempts to gain access again to a resource that is migrated from one client to another in the event of failover. In embodiments, an application instance identifier that identifies an application that accesses the resource may be associated with the resource. The application instance identifier may be a Globally Unique Identifier (GUID) associated with the application, an action performed by the application, or a sub-process of the application. For example, in one embodiment, an application may be associated with an application instance identifier that is a GUID. In another embodiment, the application instance identifier may be associated with a particular operation or action performed by the application. For example, if the application issues two different open requests for two different files, each open request may have its own application instance identifier. In yet another embodiment, the application instance identifier may be associated with one or more sub-processes of the application. As will be apparent to those skilled in the art from the embodiments disclosed herein, associating an application instance identifier of an application with one or more child processes thereof will allow the child processes to access a resource belonging to the application when the resource is placed in a locked or blocked state. In embodiments, the application instance identifier may be sent by the client at the time of sending the request for the resource or after sending the request for the resource.
According to another embodiment, in addition to storing information accessed by clients that are part of the client cluster 102, the server cluster 106 also provides a failover mechanism that allows continuous access to resources in the event of a server node failover. Also, one skilled in the art will appreciate that any type of failover mechanism may be used with the systems and methods disclosed herein.
In an embodiment, when a client requests access to a resource on behalf of an application, an application instance identifier for the application is sent with the request. The server receiving the request may associate the application instance identifier with the resource. For example, the server cluster can store the application instance identifier in a table or cache located on one or more nodes (e.g., servers such as servers 106A, 106B, and/or 106C) in the server cluster 106 in such a manner that the application instance identifier is associated with the resource. Before the client completes its operations on the resource, the client may experience an error that forces it to lose its connection to the resource. For example, a client hosting the application or performing a request or operation on behalf of the application may lose its network connection to the server cluster, the client may crash, or any other type of error may occur that interferes with the application's use of the resource. After experiencing the error, the application may fail over to a new client node in the client cluster 102. The new client node may reconnect to the server cluster and send a second request for access to the resource on behalf of the application. In embodiments, the client may reconnect to the same node or a different node in the server cluster 106. The second request to access the resource may include an application instance identifier for the application. After receiving the second request, the server (e.g., server 106A of server cluster 106) compares the application instance identifier of the second request to the application instance identifier associated with the resource. If the two application instance identifiers match, the server cluster invalidates the resource. In embodiments, invalidating the resource may include closing a file, removing a lock on the resource, or otherwise taking any action that releases the resource for use.
The server node may then grant the application's second request to access the resource. If the application instance identifier of the second request does not match the application instance identifier associated with the resource, the server will not allow access to the resource until the resource is released.
To illustrate one embodiment, a requestor (e.g., a process, an application, etc.) on a client 102A in a client cluster 102 can request that the client 102A establish a session with a server of the server cluster 106. For example, the client 102A may establish a session with the server 106A to access a database stored on the server 106A, or to access a database that is part of the server cluster 106 and accessible to the server 106A. The client 102A then issues a request for a resource on behalf of the requestor. An application instance identifier that identifies the requestor is associated with the request. In embodiments, the request may include the application instance identifier, or the application instance identifier may be sent separately in a manner that allows server 106A to determine that the application instance identifier is associated with the request. In yet another embodiment, the server 106A or server cluster 106 may already have the information needed to associate the application instance identifier with the request without receiving the application instance identifier with the request. Server 106A then grants the requestor access to the resource, thereby allowing the requestor to perform an operation on or otherwise access the resource. When the requestor is granted access to the resource, server 106A associates the application instance identifier with the resource in a manner that indicates that the requestor is currently accessing the resource. The resource may then be blocked or locked so that other clients or applications cannot access or modify the resource until the client 102A has completed its operations.
Before the requestor completes its operations on the resource, an error occurs that causes the client 102A to fail or otherwise lose its access to the resource. Because the requestor has not completed its operations, it has not released control of the resource. Thus, the resource may remain in a blocked or locked state. The requestor or client cluster 102 may employ a failover mechanism to migrate the requestor from client 102A to client 102B. Once the failover operation is complete, the client 102B may reconnect to the server cluster 106 on behalf of the requestor. The client 102B may reconnect to the server 106A or establish a new connection with any other server in the server cluster 106 (e.g., server 106B or 106C). In an example scenario, the client 102B reconnects to the server 106A. After the reconnection, the client 102B may send a second request to access the resource on behalf of the requestor. As previously noted, the resource is still in the locked or blocked state because the requestor has not released control of the resource. To access the resource without waiting for the server to automatically change the state of the resource (e.g., through a timeout operation), the requestor may again provide its application instance identifier with the second request. Server 106A then compares the application instance identifier received with, or otherwise associated with, the second request to the application instance identifier that server 106A previously associated with the resource. The associated application instance identifier may be stored in a local cache or table of the server 106A, or may be stored elsewhere in the server cluster 106.
If the application instance identifier stored in the cache matches the application instance identifier associated with the resource, server 106A invalidates or otherwise releases the resource and allows client 102B to access the resource again on behalf of the requestor without waiting for the resource to be released by some other mechanism (e.g., by blocking or a lock state timeout). If the application instance identifiers do not match, the client 102B will have to wait for the resource to become released before accessing the resource.
Although in the above example the client 102B reconnects to the same server 106A, in other embodiments it is possible for the client to connect to another node in the server cluster 106. For example, the client 102B may reconnect to the server 106B and submit a second request to regain access to the resource on behalf of the requestor. The second request may again be associated with the application instance identifier of the requestor, for example, by being included in or otherwise associated with the second request. In this example, server 106B may not have the application instance identifier associated with the resource stored in its local cache because the original access of the resource was on server 106A. In this case, the server 106B may contact other servers in the server cluster 106 to determine whether they have an application instance identifier associated with the resource. If the application instance identifier associated with the resource is stored on a different node in the server cluster (e.g., server 106A), the application instance identifier on the other node in the server cluster is compared to the application instance identifier provided with the second request. If they match, server 106B may send a request to server 106A to invalidate the resource, and then server 106B may allow the requestor (now on client 102B) to access the resource. If the application instance identifiers do not match, the client 102B will have to wait for the resource to be released.
Based on the above example, those skilled in the art will appreciate that any client node in the client cluster 102 can request access to a resource on behalf of a requestor in the client cluster 102 and subsequently provide that access to the requestor. Moreover, any server node in the server cluster (e.g., any server in the server cluster 106) can determine whether the requestor previously had access to the resource, even if the access occurred on a different server node in the server cluster. Those skilled in the art will appreciate that the foregoing description is only one example of how the embodiment shown in FIG. 1 may operate, and that other embodiments exist. For example, a client node may perform the embodiments described herein to provide a requestor (e.g., an application or process) with continuous access to resources residing in a cluster environment (e.g., on the same or different client cluster nodes that comprise the client cluster), rather than accessing resources on a remote server or server cluster. As described in more detail below, the embodiments described herein may involve a variety of different steps or operations. Moreover, the embodiments described herein may be implemented using any suitable software or hardware components or modules.
Turning now to FIG. 2, FIG. 2 illustrates a block diagram of software environment 200, which shows a client node cluster 201 having a plurality of client nodes (e.g., clients 202 and 204) and a server node cluster 206 having a plurality of server nodes (e.g., node 1 (208) and node 2 (216)). In an embodiment, a client 202 requests access to a resource (such as resource 226) in the server cluster environment 206 on behalf of a requestor. Client node cluster 201 may be a client cluster such as client cluster 102 (FIG. 1). Although not shown, a client cluster may include more than two nodes. The server node cluster 206 may be a server cluster, such as server cluster 106 (FIG. 1), or may be any other type of cluster environment, such as, but not limited to, a virtual network. Resources 226 can be stored in a data store 228 that is part of the cluster environment. Although not shown, in an alternative embodiment, the data store 228 may not be part of the clustered environment, but may be connected to the clustered environment via a network. Examples of such networks include, but are not limited to: the internet, WAN, LAN, or any other type of network known in the art. In further embodiments, the data store may be part of a node (e.g., a device) that is part of the cluster 206.
Server node cluster 206 may include one or more nodes, such as node 1 (208) and node 2 (216). Although only two nodes are shown in FIG. 2, any number of nodes may be included in the cluster environment 206. In an embodiment, the nodes 208 and 216 can receive requests to perform operations on the resource 226 and/or grant access to the resource 226. In embodiments, the resource 226 may be a file, an object, an application, data, or any other type of resource stored on or accessible by a node or stand-alone server in the cluster 206.
In an embodiment, a client sends an initial request 222 to the clustered environment 206. As shown in FIG. 2, initial request 222 may be sent by client 202 and received by node 1 (208). However, in alternative embodiments, the initial request 222 may be sent by any other client node in the client cluster 201 and received by node 2 (216) or any other node in the server cluster 206. Example requests include, but are not limited to: a request to create, open, or otherwise access a file. Request 222 may be transmitted from the client to the node cluster over a network such as, but not limited to, the internet, a WAN, a LAN, or any other type of network known in the art. Initial request 222 may include a request to access a resource, such as resource 226. In embodiments, request 222 may also include an application instance identifier that identifies the requestor on whose behalf client 202 is making the request. In an embodiment, the initial request 222 may be made up of one or more messages. For example, request 222 may be a single message containing both the request and the application instance identifier. In another embodiment, request 222 may be a plurality of messages including one or more requests and one or more application instance identifiers. In embodiments, client 202 may include an application instance cache 214 to store and/or generate one or more application instance identifiers that may be transmitted with request 222.
As shown in FIG. 2, node 1 (208) may receive request 222 and an application instance identifier from client 202. If the requested resource 226 is available (e.g., not blocked or locked by another client or application), node 1 may grant the client's (e.g., client 202) request to access the resource 226 on behalf of the requestor executing on the client. Upon granting access to resource 226, filter driver 210 may assign or otherwise create an association between client 202 and resource 226 by storing the application instance identifier it received from client 202. In an embodiment, the association may be stored as an object in application instance cache 212 on node 1. Although the illustrated embodiment shows the application instance cache 212 as part of node 1 (208), in embodiments, the application instance cache 212 may be stored elsewhere as part of the node cluster 206. Those skilled in the art will appreciate that node cluster 206 may include one or more application instance caches, such as application instance cache 220 on node 2 (216). In embodiments, when there is more than one application instance cache, the data stored in the multiple application instance caches may be replicated across all application instance caches, or each application instance cache may store separate data.
In one embodiment, an application instance identifier received from a client identifying a requestor (e.g., an application or process) may be stored in a _NETWORK_APP_INSTANCE_ECP_CONTEXT structure. The _NETWORK_APP_INSTANCE_ECP_CONTEXT structure may be defined as follows:
typedef struct _NETWORK_APP_INSTANCE_ECP_CONTEXT {
    USHORT Size;
    USHORT Reserved;
    GUID   AppInstanceID;
} NETWORK_APP_INSTANCE_ECP_CONTEXT,
  *PNETWORK_APP_INSTANCE_ECP_CONTEXT;
In such an embodiment, the variable Size may store information about the size of the structure, while the variable AppInstanceID may be a unique application instance identifier for a failover-clustered client application, such as a requestor executing on client 202. In an embodiment, a _NETWORK_APP_INSTANCE_ECP_CONTEXT structure, or another object or variable containing the requestor's application instance identifier, may be stored in a Globally Unique Identifier (GUID) cache 214. In an embodiment, a _NETWORK_APP_INSTANCE_ECP_CONTEXT structure may be sent from a client to a server in conjunction with a request to access a resource (e.g., a create or open request). In one embodiment, the application instance identifier of the requestor may be stored in the GUID cache of the client node in the clustered client environment 201 on which the requestor is executing. In another embodiment, although not shown in FIG. 2, the client node cluster 201 may have a central repository that stores application instance identifiers. In such an embodiment, multiple client nodes in the client node cluster 201 may access the centralized repository. In yet another embodiment, the application instance identifier may be stored across multiple GUID caches (e.g., GUID cache 214 and GUID cache 228). In such an embodiment, the cluster of client nodes 201 can employ a replication algorithm to ensure that the multiple GUID caches contain the same application instance identifiers.
As previously described, the application instance identifier may be associated with a resource 226 when the client 202 accesses the resource 226 on behalf of a requestor. The server node cluster 206 may store such associations in one or more application instance caches, such as application instance caches 212 and 220. In one embodiment, the application instance identifier may be associated with the resource by adding the application instance identifier to an extra create parameters (ECP) list for the resource 226. The ECP list can be stored in an application instance cache, such as application instance caches 212 and 220, that is part of server node cluster 206. In embodiments, when an ECP is received by the server, the server extracts the application instance identifier from the ECP and adds the application instance identifier to the cache for association with the resource, resource handle, or the like. As described with reference to storing application instance identifiers in the client cluster 201, an application instance identifier associated with a resource may be stored in an individual application instance cache on a node in the server node cluster 206, in a central repository in the server cluster 206, or replicated across multiple application instance caches on multiple nodes in the server node cluster 206.
In an embodiment, resource 226 is blocked or locked while a requestor executing on client 202 has access to resource 226, thereby preventing other clients or applications from accessing resource 226 and avoiding any potential conflicts. In an embodiment, client 202 experiences an error that causes it to lose its connection to resource 226 before the requestor completes its operations on that resource. For example, the client may crash, be taken offline, or lose its network connection to the server node 208. In this case, the resource 226 may still be in the blocked or locked state because the requestor did not release its lock on the resource, thereby preventing other clients from accessing the resource 226.
When client 202 experiences an error, the requestor may utilize client failover mechanism 232 to migrate to a new client node (e.g., client 204) in client cluster 201. Those skilled in the art will appreciate that any type of failover mechanism may be employed at client failover 232. In embodiments, the failover mechanism 232 may also include migration of the requestor's application instance identifier, which may have been stored in the GUID cache 214 on the now-failed client 202. After the migration is complete, the requestor may attempt to regain access to the resource 226. In an embodiment, client 204 may send a second request 224 to node 1 208 to request access to resource 226 on behalf of the requestor. However, without the use of the continuous access embodiments disclosed herein, when node 1 208 receives the request to access resource 226 on behalf of client 204 (the sender of second request 224), it may deny the request because resource 226 is still in a blocked or locked state as a result of the previous access to the resource by client 202. Without the use of the embodiments disclosed herein, node 1 208 would recognize that the second request to access resource 226 is from a different location (e.g., client 204). Node 1 208 would not be able to determine that the request is from the same requestor that still holds the lock on resource 226 and would therefore determine that granting the request would result in a conflict. However, if the same requestor is attempting to access resource 226, there is no conflict, and forcing the client to wait for the resource to be released by the system may result in excessive delay.
The application instance identifier may be used to address this issue. In an embodiment, the second request 224 may also include the application instance identifier that identifies the requestor that migrated to the client 204 during the failover shown at 232. In an embodiment, the application instance identifier of the requestor may be present in the GUID cache 228 of the client 204 prior to migration of the requestor during the client failover 232. For example, a replication mechanism may have been employed to replicate the application instance identifier of the requestor across the nodes in the client cluster 201. In another embodiment, requestor 203 may store its application instance identifier. In yet another embodiment, the application instance identifier of the requestor 203 may be migrated during the client failover 232.
As described with reference to request 222, the application instance identifier may be transmitted in the same message as second request 224, or second request 224 may be made up of multiple different messages. When the second request is received at the node cluster 206, or at an individual node in the cluster such as node 1 208, and the receiving server determines that the resource is blocked or locked, a determination is made as to whether the application instance identifier in the second request 224 is the same as the application instance identifier associated with the resource 226. In an embodiment, node 1 208 may compare the application instance identifier received with second request 224 against the application instance identifier associated with resource 226. The application instance identifier associated with resource 226 may be stored in application instance cache 212 of node 1 208. In embodiments where there are multiple application instance caches in the node cluster 206, the determination may examine more than one application instance cache in the node cluster 206. In such embodiments, if a matching application instance identifier is not located in application instance cache 212, node 1 208 can send a request to node 2 216 to determine whether a matching application instance identifier is located in application instance cache 220.
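The multi-cache determination can be sketched as follows. Treating peer caches as in-memory dictionaries is an assumption for illustration; real nodes would exchange messages to query each other's caches:

```python
def identifiers_match(request_id, resource, local_cache, peer_caches):
    """Return True if the identifier sent with the second request matches
    the identifier associated with the resource, checking the local
    application instance cache first and then the peer nodes' caches."""
    associated = local_cache.get(resource)
    if associated is None:
        # Not found locally; ask the other nodes in the cluster
        # (e.g., node 1 asking node 2 to check cache 220).
        for cache in peer_caches:
            associated = cache.get(resource)
            if associated is not None:
                break
    return associated is not None and associated == request_id

# Identifier stored on a peer node (cache 220) rather than locally (212):
cache_212 = {}
cache_220 = {"resource-226": "guid-of-requestor"}
```

With this layout, a request carrying `"guid-of-requestor"` matches even though the local cache 212 is empty, while a request carrying any other GUID does not.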
In one embodiment, if the application instance identifier received in the second request 224 does not match the application instance identifier associated with the resource 226 (which may be stored in the application instance caches 212 and/or 220), the second request 224 may not be granted until the resource 226 is released. However, if a match is found, the receiving server (e.g., node 1 208) and/or server cluster 206 performs an action to grant access to resource 226 without causing excessive delay to client 204 and requestor 203. In such a case, the node cluster 206 may invalidate the previous access to the resource 226, thereby removing the resource 226 from the blocked or locked state. In an embodiment, invalidating the previous access may include any action that releases the resource from the blocked or locked state. One non-limiting example is closing an open file (e.g., if the resource 226 is a file). Once the previous access is invalidated, the second request 224 to access resource 226 may be granted, thereby providing the requestor 203 with continued access.
In one embodiment, a node receiving the second request 224 (such as node 1 208 in FIG. 2) may itself perform the actions required to invalidate a previous access to the resource 226 if it has the access and/or permission to invalidate the previous access. However, in some cases, the node receiving the request may not have the access rights or permission to invalidate the previous access. This situation may arise, for example, if the original request 222 was made to node 2 216, in which case node 2 216 may have control over the resource. In this case, the node receiving the second request 224 may send a request to the controlling node to invalidate the previous access. The node receiving the second request 224 may grant the second request 224 once the controlling node has invalidated the previous access. In other embodiments, the node receiving the second request 224 may send a request to a different node to grant the client 204 and/or requestor 203 (now residing on client 204) access to resource 226.
By using the application instance identifier, the described process avoids excessive delay in granting the second request 224 to access the resource 226 from a requestor 203 that previously accessed the resource 226 and still holds a lock on it. Moreover, the application instance identifier ensures that any granted request does not conflict over resource 226. For example, if a request is received from a different application, the request will include an application instance identifier that is different from the application instance identifier associated with the resource, which may result in the request being denied. Because the application instance identifier is a globally unique identifier, the application instance identifiers of different applications will not be the same.
FIG. 3 is an embodiment of a method 300 that a requestor may use to gain continuous access to a resource in a client cluster environment. For example, the requestor may reside on a client, such as client 202 (FIG. 2), that employs method 300 to access a resource (e.g., resource 226). In an embodiment, the resource may reside on a remote machine, such as a server. The server may be a standalone server or part of a clustered environment, such as server cluster 206 (FIG. 2). Flow begins at operation 302 where a request for a resource is sent to a server. In an embodiment, the request may be a request to access a resource. In embodiments, accessing a resource may include opening a file, creating a file, or otherwise accessing or performing an operation on a resource that may be remote to the client. In an embodiment, the requestor may operate in a client cluster environment. In these embodiments, the request sent at operation 302 may be sent from a first client in the client cluster environment.
Flow continues to operation 304 where the application instance identifier is sent to, for example, a server (e.g., a standalone server, or a node in a clustered environment). In one embodiment, the first client sending the request may also send an application instance identifier on behalf of the requestor. As previously described, the application instance identifier is a GUID that identifies a requestor (e.g., an application, client, or sub-process of an application requesting access to a resource). In one embodiment, the application instance identifier may be sent in a message transmitted over a network. The application instance identifier may be transmitted in the same message that contains the request in operation 302, or may be transmitted in a different message. In these embodiments, an object containing the application instance identifier, such as but not limited to the _NETWORK_APP_INSTANCE_ECP_CONTEXT structure described with reference to FIG. 2, may be sent at operation 302.
In one embodiment, the application instance identifier may be sent using an interface at operation 304. The interface may be a kernel-level interface located on the client or available to clients operating in a client cluster environment. In embodiments, the kernel-level interface may be used by a requestor and/or a client to send an application instance identifier to a server. The following are non-limiting examples of kernel-level interfaces that may be employed at operation 304 to send the application instance identifier:
Although a specific kernel-level interface is provided, those skilled in the art will appreciate that other kernel-level interfaces may be employed to transmit the application instance identifier at operation 304.
In another embodiment, an application programming interface (API) may be employed to send the application instance identifier at operation 304. In such embodiments, the requestor and/or client may send the application instance identifier by making a call to the API. The API may be hosted on the client performing operation 304 (e.g., the first client in a client cluster), or the API may be hosted on another device and accessed by the requestor or another application or process. The following is a non-limiting example of an API that may be employed at operation 304 to send the application instance identifier:
NTSTATUS RegisterAppInstance(
    _in PGUID AppInstance
);
Although one specific API is provided, those skilled in the art will appreciate that other APIs may be employed at operation 304. Moreover, although operation 304 is illustrated as a separate operation, those skilled in the art will appreciate that sending the application instance identifier may be performed at the same time that the request is sent in operation 302.
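The client-side registration and send step can be sketched as follows. The `ToyServer` class, the `open_resource` call, and the shape of the ECP payload are assumptions made for this illustration, not the actual kernel interface or API contract described above:

```python
import uuid

class ToyServer:
    """Stand-in for a server node; records what the client sent."""
    def __init__(self):
        self.last_request = None

    def open_resource(self, path, ecps):
        self.last_request = (path, ecps)
        return "handle-to-" + path

def open_with_app_instance(server, path, app_instance_id):
    # Send the open request (operation 302) together with the
    # application instance identifier (operation 304) in one message.
    ecps = {"APP_INSTANCE_ID": app_instance_id}
    return server.open_resource(path, ecps)

server = ToyServer()
requestor_id = uuid.uuid4()
handle = open_with_app_instance(server, "share/file.txt", requestor_id)
```

Bundling the identifier with the request mirrors the single-message embodiment; a two-message embodiment would call a registration routine first and send the open request separately.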
When the requested resource is not locked, the request sent at operation 302 is granted and the flow continues to operation 306 where the resource is accessed at operation 306. As previously described, when a requestor accesses a resource at operation 306, the server or device controlling the resource may place the resource in a blocked or locked state. At some point in accessing the resource, an error occurs, such as the error described with reference to FIG. 2, which may cause the client to fail or otherwise lose connection to the resource. The error may cause the client (e.g., the first client in the server cluster) to lose access to the resource before the requestor completes its use of the resource. In this case, the resource may not be released from its blocked or locked state.
Flow continues to operation 308 where a failover operation is performed. In an embodiment, the failover operation may include cloning the requestor and its state to a different client (e.g., a second client) in the client node cluster. In an embodiment, the state of the requestor may be cloned onto the second client, and the requestor may execute on the second client in such a way that it can resume execution from the point where the first client failed. In another embodiment, the requestor may communicate with the first client (rather than execute on the first client) upon failover of the first client. In such embodiments, the failover operation may include the requestor establishing communication with a second client in the client cluster.
In an embodiment, state information (including but not limited to a requestor application instance identifier) may be transmitted from a first client to a second client. In one embodiment, the first client may send a message including an application instance identifier of the requestor and/or status information of the requestor. The application instance identifier and/or state may be sent during the failover process, or in embodiments, may be sent prior to failure of the first client, such as during a replication process that clones information across clients in a client cluster environment. In another embodiment, the application instance identifier and/or state information of the requestor may be stored at a central location or repository in the client cluster network. In these embodiments, the failover process may provide the location of the requestor's application instance identifier and/or state information to the second client. In yet another embodiment, the requestor may maintain its application instance identifier. In these embodiments, the client failover operation may include reallocating or otherwise establishing a connection between the requestor and the second client.
In an embodiment, after the client failover operation, operational flow continues to operation 310. At operation 310, a second request for the same resource is sent to the clustered environment. In an embodiment, the second request is sent by a second client in the client cluster on behalf of the requestor. The second request may be sent in the same manner as described with reference to the first request at operation 302. To maintain continuous access to the resource and avoid excessive delay, flow continues to operation 312 where the application instance identifier is again sent to the clustered environment. The application instance identifier may be sent at operation 312 according to one of the embodiments described with reference to operation 304. In embodiments, because a different client (e.g., the second client) is sending the second request, the server receiving the request may not be able to identify the second request as belonging to the same requestor that holds the lock on the resource (e.g., because the request was made from a different machine, a different address, etc.). However, by sending the application instance identifier at operations 304 and 312, the server will be able to identify the requests as belonging to the same requestor and will grant continuous access to the resource, as described previously with reference to FIGS. 1 and 2. Flow continues to operation 314 and the requestor resumes access to the resource. In an embodiment, the second client may receive a response to the second request from the server indicating that the server granted the second request. In embodiments, upon receiving the indication, the second client may access the resource on behalf of the requestor.
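The client-side flow of method 300 can be simulated end to end. The toy server below applies the matching rule described with reference to FIG. 2; all class and variable names are illustrative assumptions:

```python
import uuid

class LockingServer:
    """Grants an open unless the resource is locked under a different GUID."""
    def __init__(self):
        self.locks = {}   # resource -> application instance identifier

    def open_resource(self, resource, app_instance_id):
        holder = self.locks.get(resource)
        if holder is not None and holder != app_instance_id:
            return "denied"                      # conflicting requestor
        self.locks[resource] = app_instance_id   # lock (or re-lock) it
        return "granted"

server = LockingServer()
requestor_id = uuid.uuid4()

# Operations 302/304: first client opens the resource for the requestor.
first_result = server.open_resource("resource-226", requestor_id)

# Operation 308: first client fails without releasing the lock; the
# requestor's application instance identifier migrates to the second client.

# Operations 310/312: second client resends the request with the same GUID.
second_result = server.open_resource("resource-226", requestor_id)

# A different application with its own GUID would still be turned away.
other_result = server.open_resource("resource-226", uuid.uuid4())
```

The second open succeeds because the GUID survives the failover, while an unrelated GUID is still denied, which is exactly the conflict-avoidance property discussed above.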
FIG. 4 is an embodiment of a method 400 performed by a node in a server cluster environment to provide continuous access to a resource. Embodiments of method 400 may be performed by a node, such as node 1 208 (FIG. 2), in a cluster environment, such as node cluster 206 (FIG. 2). In an embodiment, method 400 may be performed by a node having access to a resource. Flow begins at operation 402 where the node receives a request for a resource. In embodiments, a resource may be a file, object, method, data, or any other type of resource that is under the control of and/or accessible by the node performing method 400. At operation 402, an application instance identifier may be received with the request.
Flow continues to decision operation 404 where a determination is made as to whether the resource is in a blocked or locked state. Those skilled in the art will appreciate that any manner of determining whether a resource is in a blocked or locked state may be employed at operation 404. If the resource is not in a blocked or locked state, flow branches "No" to operation 412 where the request for the resource is granted at operation 412. In embodiments, granting the request may include allowing the requestor access to the resource, performing an operation on the resource on behalf of the requestor, or granting any kind of access or modification to the resource. For example, granting the request in operation 412 may include opening a file or creating a file.
If the resource is in a blocked or locked state, flow branches "Yes" from operation 404 to decision operation 406. At decision operation 406, the application instance identifier received with the request at operation 402 is compared to the application instance identifier associated with the resource. For example, as described with reference to FIG. 2, when a client or application accesses a resource, the node may associate an application instance identifier with the resource. As described earlier, the application instance identifier of the requestor accessing the resource may be stored on the node, e.g., in an application instance cache, as described in the embodiments discussed in FIG. 2. In an embodiment, an application instance identifier provided in an ECP sent with the request for the resource (e.g., provided in a _NETWORK_APP_INSTANCE_ECP_CONTEXT structure) may be added to the list of ECPs associated with the resource.
In one embodiment, the association of the application instance identifier with the resource may reside locally on the node performing the method 400. In these cases, the comparison may be made against a local application instance cache residing on that node. However, as discussed with reference to FIG. 2, a clustered environment may include multiple application instance caches distributed across different nodes, and these different application instance caches may each store separate and/or different data. The application instance identifier associated with the blocked or locked resource may therefore be stored on a different node in the cluster environment. In this case, operation 406 may include sending a request to the different node to perform the comparison. The request may include the application instance identifier received at operation 402.
If the received application instance identifier is not the same as the application instance identifier associated with the resource, flow branches "No" to operation 410. At operation 410, the request to access the resource received at operation 402 is denied. In an embodiment, the request may be denied to avoid resource conflicts. Because the received application identifier is different from the associated application instance identifier, the request to access the resource received at operation 402 is from a different requestor or application. Granting requests to different clients or applications, as may be the case in this example, may result in a conflict situation that would interfere with the application currently accessing the resource. For example, the disparate application may modify the resource in a manner that modifies or otherwise interferes with operations performed on the resource by requesters currently holding locks on the resource.
However, receiving with the request at operation 402 the same application instance identifier that is associated with the blocked or locked resource indicates that an error may have occurred that caused a requestor that was previously accessing the resource to lose its access without correctly releasing the resource. For example, the requestor may operate in a cluster of client nodes. The particular client on which the requestor operates may have lost access to the server or otherwise failed before the requestor completed its access to the resource. Flow branches "Yes" to operation 408 in order to provide continuous access to the resource, i.e., to allow the requestor to regain access to the resource without experiencing excessive or unacceptable delay.
At operation 408, the resource is invalidated. As described earlier herein, invalidating the resource may include changing the blocking status of the resource or otherwise removing the lock on the resource. For example, if the resource is a file, invalidating the resource may include closing the file. Those skilled in the art will appreciate that any method of releasing the block or lock on the resource may be employed at operation 408.
Referring back to FIG. 2, in an embodiment, access to a resource may be under the control of a different node in the clustered environment than the node receiving the request to access the resource at operation 402. For example, the handle to the resource may reside on a different node in the cluster environment. In these embodiments, invalidating the resource may include sending a request to a node controlling access to the resource to invalidate the resource. In response to sending the request, the remote node may invalidate the resource.
After the resource is invalidated, flow continues to operation 412 where the request to access the resource is granted. Granting the request may include allowing the requestor access to the resource, performing an operation on the resource on behalf of the requestor, or granting any kind of access or modification to the resource. For example, granting the request in operation 412 may include opening a file or creating a file. Granting such access may be performed by the node receiving the request at operation 402 or by another node in the cluster environment.
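Method 400 as a whole can be sketched as a single decision function. The lock set and association map are illustrative data structures assumed for this sketch, not the node's actual internal state:

```python
def handle_request(resource, request_id, locks, assoc):
    """Sketch of FIG. 4 (operations 402-412).

    locks - resources currently in a blocked/locked state
    assoc - application instance identifier associated with each resource
    """
    if resource not in locks:                  # operation 404: not locked
        locks.add(resource)
        assoc[resource] = request_id
        return "granted"                       # operation 412
    if assoc.get(resource) != request_id:      # operation 406: compare GUIDs
        return "denied"                        # operation 410: different requestor
    locks.discard(resource)                    # operation 408: invalidate
    locks.add(resource)                        # re-establish the lock for the retry
    assoc[resource] = request_id
    return "granted"                           # operation 412

locks, assoc = set(), {}
```

Calling the function with the same GUID after a simulated failure follows the "Yes" branch through invalidation to a grant, while a mismatched GUID follows the "No" branch to a denial.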
The methods 300 and 400 are only some examples of operational flows that may be performed according to embodiments. Embodiments are not limited to the specific descriptions provided above with reference to FIGS. 3 and 4, but may include additional operations. Moreover, the illustrated operational steps may be combined into other steps and/or rearranged. Also, fewer or additional steps may be used with the methods described with reference to FIGS. 3 and 4.
FIG. 5 illustrates a general computer system 500, which can be used to implement embodiments described herein. The computer system 500 is only one example of a computing environment and is not intended to suggest any limitation as to the scope of use or functionality of the computer and network architectures. Neither should the computer system 500 be interpreted as having any dependency or requirement relating to any one or combination of components illustrated in the example computer system 500. In an embodiment, the system 500 may function as a client and/or server as described above with reference to fig. 1 and 2.
In its most basic configuration, system 500 typically includes at least one processing unit 502 and memory 504. Depending on the exact configuration and type of computing device, memory 504 may be volatile (such as RAM), non-volatile (such as ROM, flash memory, etc.), or some combination of the two. This most basic configuration is illustrated in FIG. 5 by dashed line 506. System memory 504 stores instructions 520, such as instructions to perform the continuous availability methods disclosed herein, and data 522, such as application instance identifiers, which may be stored in a file storage system with storage such as storage 508.
The term computer readable media as used herein may include computer storage media. Computer storage media may include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information, such as computer readable instructions, data structures, program modules, or other data. System memory 504, removable storage, and non-removable storage 508 are all computer storage media examples (e.g., memory storage). Computer storage media may include, but is not limited to, RAM, ROM, electrically erasable read-only memory (EEPROM), flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store information and which can be accessed by computing device 500. Any such computer storage media may be part of device 500. Computing device 500 may also have input device(s) 514 such as keyboard, mouse, pen, voice input device, touch input device, etc. Output device(s) 516 such as a display, speakers, printer, etc. may also be included. The above devices are examples, and other devices may be used.
The term computer readable media as used herein may also include communication media. Communication media may be embodied by computer readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism, and includes any information delivery media. The term "modulated data signal" may describe a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, Radio Frequency (RF), infrared and other wireless media.
Embodiments of the invention may be practiced via a system on a chip (SOC) in which each or many of the components illustrated in FIG. 5 may be integrated onto a single integrated circuit. Such SOC devices may include one or more processing units, graphics units, communication units, system virtualization units, and various application functions, all integrated (or "burned") onto a chip substrate as a single integrated circuit. When operating via an SOC, the functionality described herein with respect to providing continuous access to resources may operate via application-specific logic integrated with other components of the computing device/system 500 on the single integrated circuit (chip).
Reference throughout this specification to "one embodiment" or "an embodiment" means that a particular described feature, structure, or characteristic is included in at least one embodiment. Thus, usage of such phrases may refer to more than just one embodiment. Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
One skilled in the relevant art will recognize, however, that the embodiments may be practiced without one or more of the specific details, or with other methods, resources, materials, etc. In other instances, well-known structures, resources, or operations are not shown or described in detail merely to avoid obscuring aspects of the embodiments.
While example embodiments and applications have been illustrated and described, it is to be understood that the present embodiments are not limited to the precise configuration and resources described above. Various modifications, changes, and variations apparent to those skilled in the art may be made in the arrangement, operation, and details of the methods and systems disclosed herein without departing from the scope of the claimed embodiments.
Claims (10)
1. A method of providing continuous access to a resource, the method comprising:
receiving (402) a first request from a requestor to access a resource, wherein the request is received from a first client;
associating a first application instance identifier with the resource;
allowing (412) the first request to access the resource;
receiving (402), after recovering from the failure, a second request for the resource from the requestor, wherein the second request is received from a second client different from the first client;
receiving a second application identifier associated with the second request;
determining (406) whether the first application identifier and the second application identifier are the same; and when the first and second application identifiers are the same, performing steps comprising:
invalidating (408) the first request; and
granting (412) the second request to access the resource.
2. The method of claim 1, wherein the first application identifier is associated with an application instance that opens a request.
3. The method of claim 1, wherein the first application identifier is associated with a process.
4. The method of claim 1, wherein the first application identifier is associated with at least one sub-process of an application.
5. The method of claim 1, wherein associating the first application instance identifier comprises receiving the first application instance identifier in a _NETWORK_APP_INSTANCE_ECP_CONTEXT structure.
6. A method for providing cluster client failover, the method comprising:
receiving, at a second client, an application instance identifier of a requestor, wherein the requestor previously accessed a resource using a first client;
sending (310), from a second client, a second request to access the resource on behalf of the requestor;
sending (312), from the second client, the application instance identifier of the requestor;
receiving an indication that the server grants the second request; and
accessing (314) the resource by the second client on behalf of the requestor.
7. The method of claim 6, wherein the server granting the second request previously granted a first request from the first client to access the resource on behalf of the requestor.
8. The method of claim 6, wherein the second client sends the second request in response to a client failover.
9. A system for facilitating client failover in a clustered environment, the system comprising:
at least one server (208), the at least one server comprising:
at least one processor configured to execute computer-executable instructions;
at least one computer-readable storage medium storing the computer-executable instructions, which when executed by the at least one processor provide:
receiving (402), from a first client, a first request to access a resource on behalf of a requestor;
associating a first application instance identifier with the resource;
allowing (412) access to the resource by the requestor;
receiving (402) a second request for the resource from a second client, wherein the second client is different from the first client;
receiving a second application identifier associated with the second request;
determining (406) whether the first application identifier and the second application identifier are the same;
when the first and second application identifiers are the same, performing steps comprising:
invalidating (408) the first request; and
granting (412) the second request to access the resource.
10. The system of claim 9, wherein the system further comprises:
the first client (202) comprising:
at least one processor configured to execute computer-executable instructions;
at least one computer-readable storage medium storing the computer-executable instructions that, when executed by the at least one processor:
sending (302) the first request;
sending (304) the application instance identifier to the server.
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US13/228,732 | 2011-09-09 |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| HK1181578A true HK1181578A (en) | 2013-11-08 |
| HK1181578B HK1181578B (en) | 2017-11-24 |