CN100394726C

CN100394726C - A Method of Improving the Reliability of a Component-Based Software System

Info

Publication number: CN100394726C
Application number: CNB2005100428759A
Authority: CN
Inventors: Zhao Tianhai (赵天海); Hou Di (候迪); Zhao Jizhong (赵季中); Qi Yong (齐勇); Xi Min (郗旻)
Original assignee: Xi'an Jiaotong University
Current assignee: Xi'an Jiaotong University
Priority date: 2005-06-30
Filing date: 2005-06-30
Publication date: 2008-06-11
Anticipated expiration: 2025-06-30
Also published as: CN1710865A

Abstract

A method for improving the reliability of a component-based software system. The method adopts a distributed load-distribution strategy that allows a multi-node JToneFrame application server cluster to add and remove nodes dynamically without stopping service. It supports horizontal and vertical partitioning, object-level and method-level load balancing in the EJB container, and clustering of stateless session beans, stateful session beans and entity beans. There is no single point of failure: instead of a separate, centralized load distributor, every node carries the load distributor as one of its components, which removes the centralization problem and spreads request distribution across all nodes.

Description

A Method of Improving the Reliability of a Component-Based Software System

Technical Field

The invention relates to application server clustering and fault tolerance, and in particular to a method for improving the reliability of a component-based software system.

Background Art

The server-side Java platform (the J2EE standard) has become the best way to provide Web information services. For mission-critical systems, reliability has become a major challenge on the Internet: failures that directly affect key business users or business partners are unacceptable, so the infrastructure must be highly available in order to provide the enterprise with uninterrupted service.

In today's dynamic business environment, enterprises must be able to increase capacity dynamically to meet demand. The infrastructure supporting such applications must be highly scalable, so that it can grow almost linearly without changes to software or hardware. Clustering, load balancing and failure recovery for the J2EE platform are therefore features that any high-performance, scalable and stable application server must support.

A cluster is a group of independent servers that appears on the network as a single system and is managed as a single system, providing highly reliable services to client workstations. In most cases all computers in the cluster share a common name, and a service running on any system in the cluster can be used by all network clients. The cluster must coordinate the handling of errors and failures of its separate components, and components must be able to join the cluster transparently. A cluster contains multiple (at least two) servers with shared data storage. Whichever server runs an application, the application data is stored in the shared storage, while each server's operating system and application files reside on its own local storage. The node servers in the cluster communicate with one another over an internal LAN. When a node server fails, the applications running on it are automatically taken over by another node server; when an application service fails, it is restarted or taken over by another server. In either case, clients can quickly reconnect to the new application service.

Cluster services are necessary for J2EE servers used in large enterprise applications. However, the J2EE specification does not define clustering, so each server product decides how to implement it. An enterprise-level application server must handle a large number of concurrent requests, and the data it processes is important and must not be corrupted. To improve performance under heavy access and to meet reliability requirements, the application server needs load balancing and fault tolerance, which give it the following properties:

Scalability: the ability of an application to support a growing number of users. Dynamically adding or removing servers must not affect the system. Because growing system load usually requires new servers to be added at runtime, this must be transparent to client calls and must not interrupt the work of the other servers.

Reliability: the system has a certain degree of redundancy. In a cluster, the crash of one node does not affect the service of the system: the system automatically migrates the task, and a server able to handle that task automatically processes the client's request.

The features above show that clustering covers two aspects: load balancing and fault tolerance. Load balancing lets the server group share the workload, using a load-distribution strategy to optimize server performance, and all of this is transparent to client access and requires no client intervention. Ideally, if every server in the cluster has the same processing capacity, each carries the same amount of load. The load-distribution algorithm therefore depends on the performance of the servers themselves: how that information is collected and processed, and which algorithm is used to distribute the load. Fault tolerance uses multiple replicas of data and tolerates a certain degree of error and failure. Ideally, failure recovery is completely transparent to clients: when a server fails, the clients interacting with it are automatically transferred to other servers. The key point of failure recovery is that clients can keep using the service even if some application servers fail, and that a repaired server can immediately resume providing service together with the others. All of this concerns only the server group; the client does not need to know about it at all.

As analyzed above, cluster services are necessary for J2EE servers used in large enterprise applications. Most current application servers do not support fine-grained clustering, method-level request redirection, or state replication for object groups, and because the J2EE specification does not define clustering, its implementation is left to each server.

Summary of the Invention

The purpose of the present invention is to overcome the above shortcomings of the prior art by providing a method for improving the reliability of a component-based software system. The method enables a multi-node JToneFrame application server cluster to add and remove nodes dynamically without stopping service; supports horizontal and vertical partitioning; provides object-level and method-level load balancing in the EJB container; and supports clustering of stateless session beans, stateful session beans and entity beans.

To achieve the above purpose, the present invention adopts the following technical solution:

1) First, the client issues a server invocation request through its client-side component interface proxy;

2) For EJB components that comply with the EJB 2.1 specification, load-balancing and fault-tolerance tags are added to their extended deployment descriptors, indicating that the component supports clustering and uses a random load-balancing algorithm by default. A precompiler compiles the EJB components, generating implementation classes for the corresponding EJBHome and EJBObject interfaces and the stubs for those implementation classes. The precompilation tool then changes the stubs' original remote-call logic so that, before each business method call, the stub requests from the server side the list of references to that EJB component on every server, which allows requests to be redistributed;

3) The client's request is received by an arbitrary server in the server group, which communicates with the other servers over the underlying server-group communication protocol. Based on the EJB component's extended deployment descriptor, the customer selects load parameters as needed. If the customer has defined custom load parameters and a custom distribution strategy, these are loaded, and the server then invokes the load-collection command on the other servers, according to the loaded parameter and strategy modules, to collect load information on each node. Otherwise, the server loads the default parameters and distribution strategy;

4) Define the load parameters:

Basic runtime environment parameters: in a J2EE application server middleware system, these are defined as the JVM heap size and the memory in use;

Component containers: these comprise the Servlet container and the EJB container. The Servlet container parameters are the number of running servlets and the number of user requests per unit time; the EJB container parameters are the number of created components, running components and pooled components, tracked separately for SessionBean, EntityBean and MessageDrivenBean components;

Other referenced resources, grouped by category into data-source, JCA and JMS resources: for each, the parameters are the number of connections, the number of available connections, and the number of requests waiting for the resource;

5) Based on the load parameters defined above, each node periodically collects load information from the Java virtual machine and dynamically computes its server load at the current time as $L_i(t) = \alpha B(t) + \beta C(t) + \lambda R(t)$, where $B(t)$ is the load of the basic runtime environment at time $t$, $C(t)$ is the load of the component containers at time $t$, $R(t)$ is the amount of resource references at time $t$, and $\alpha$, $\beta$, $\lambda$ are weights. Each node returns its current load $L_i(t)$ to the requesting node, and the node that received the client request computes the total load of the server group at time $t$ as $L(t) = \sum_{i=1}^{n} L_i(t)$ and the current threshold as $T(t) = \frac{L(t)}{n}$, where $n$ is the number of servers;

6) The stubs of the server nodes are sorted by load, and the list of all available servers is returned to the client interface proxy;

7) When $L_i(t) > T(t)$, the client selects nodes in the order of the returned list: it takes the reference of the most lightly loaded node and issues the request through that server reference, so the request executes on the lightly loaded node;

8) If this node processes the request correctly and no exception is thrown, the state information of the other nodes is synchronized so that the next request is invoked and distributed correctly. If the call fails, the client falls back to the returned node list, selects the next most lightly loaded server node, and invokes the EJB component's business method there. If every node in the list fails, the client call cannot be processed.
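Steps 5) to 7) can be illustrated with a minimal Java sketch. It assumes that the three load terms $B(t)$, $C(t)$ and $R(t)$ have already been normalized into comparable numbers and arrive as plain doubles; the names NodeSample, LoadModel and choose are hypothetical, keeping the current node when it is not overloaded is an assumption (the patent only describes the $L_i(t) > T(t)$ case), and the heap-usage helper is just one possible input to $B(t)$, since the patent does not say how the JVM parameters are combined.

```java
import java.util.List;

/** One node's sampled load terms at time t (hypothetical holder; values assumed pre-normalized). */
final class NodeSample {
    final String node;
    final double b, c, r; // B(t), C(t), R(t)

    NodeSample(String node, double b, double c, double r) {
        this.node = node;
        this.b = b;
        this.c = c;
        this.r = r;
    }
}

/** Weighted load L_i(t) = alpha*B(t) + beta*C(t) + lambda*R(t) and threshold T(t) = L(t)/n. */
final class LoadModel {
    private final double alpha, beta, lambda; // weights; the patent leaves their values open

    LoadModel(double alpha, double beta, double lambda) {
        this.alpha = alpha;
        this.beta = beta;
        this.lambda = lambda;
    }

    /** L_i(t) for one node. */
    double nodeLoad(NodeSample s) {
        return alpha * s.b + beta * s.c + lambda * s.r;
    }

    /** T(t) = (sum of all L_i(t)) / n; samples must be non-empty. */
    double threshold(List<NodeSample> samples) {
        double total = 0.0;
        for (NodeSample s : samples) {
            total += nodeLoad(s);
        }
        return total / samples.size();
    }

    /** Step 7: leave the current node only if its load exceeds T(t); then take the lightest node. */
    String choose(String current, List<NodeSample> samples) {
        double t = threshold(samples);
        NodeSample lightest = samples.get(0);
        for (NodeSample s : samples) {
            if (s.node.equals(current) && nodeLoad(s) <= t) {
                return current; // assumption: a node at or below the threshold keeps the request
            }
            if (nodeLoad(s) < nodeLoad(lightest)) {
                lightest = s;
            }
        }
        return lightest.node;
    }

    /** One possible B(t) input (assumption): fraction of the maximum JVM heap currently in use. */
    static double heapUsage() {
        Runtime rt = Runtime.getRuntime();
        return (double) (rt.totalMemory() - rt.freeMemory()) / rt.maxMemory();
    }
}
```

With weights 0.5, 0.3 and 0.2, a node sampled at (0.9, 0.8, 0.7) has load 0.83. If the three nodes' loads average to a threshold of 0.5, that node is overloaded and the request moves to the lightest node.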

Because the present invention adopts a distributed load-distribution strategy, there is no centralized load distributor and therefore no single point of failure. The load distributor is not a separate component; it is part of every node, which removes the centralization problem and spreads request distribution across all nodes.

Brief Description of the Drawings

Figure 1 shows the client-side invocation flow;

Figure 2 shows a cluster system consisting of two server replica nodes.

Detailed Description of the Embodiments

The present invention is described in further detail below with reference to the accompanying drawings.

To implement the application server clustering and fault tolerance described above, the invention divides the implementation into two parts: the EJB component client side and the server side.

Client side:

First, the client issues a server invocation request through the client-side component interface proxy. The client side is responsible for forwarding requests, a task performed by the client proxy (stub) of the RMI architecture. The client stub sends the client's business method call to any server replica node, and achieves fault tolerance by moving the actual invocation up into its superclass. Because the client stub forwards requests, it must hold server-side information; a list view of the whole server group is therefore maintained synchronously on every server node, and this information is stored in the stub's member field "server". The "server" field holds the client stubs (both home stubs and remote stubs) that this EJB has generated on each server replica node. The client stub only needs to take the stub of any server node from this field, obtain that stub's remote reference, use it in place of its own remote reference, and make the business method call on that reference; the call is then delivered to the server replica node that corresponds to the chosen stub.

Referring to Figure 1: 1. After the getServerList() method has been added to the client stub, and before the client has made any method call, "server" contains a single entry: the client stub that holds it.

2. Before every business method call, the client stub calls getServerList(); the server side returns the cluster's current list of server replicas, and the stub updates its local "server" contents with it. getServerList() is implemented in the server-side implementation classes of the Home/Remote interfaces, so the client stub's getServerList() call is itself a remote method call.

3. Once it has the server list, the client stub selects a node from it and issues the business method call. On receiving the call, that server invokes the method on the bean instance of the actually deployed EJB and returns the result to the client.
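The three steps above amount to a "refresh, then dispatch" wrapper around every business call. The Java sketch below models this with a hypothetical ReplicaRef interface standing in for an RMI stub. As a simplification, it invokes the chosen replica directly instead of swapping remote references inside a generated stub, and it picks the head of the returned list because the server side sends that list sorted by load.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical view of one server replica; in the patent this role is played by a generated RMI stub. */
interface ReplicaRef {
    /** Remote call: the cluster's current replica list, lightest-loaded first. */
    List<ReplicaRef> getServerList();

    /** Remote business-method call on this replica. */
    Object invoke(String method, Object... args);
}

/** Hypothetical client-side wrapper that refreshes its "server" list before each business call. */
final class ClusterAwareStub {
    private List<ReplicaRef> server; // mirrors the stub's "server" member field

    ClusterAwareStub(ReplicaRef self) {
        // Step 1: before any call, the list holds only the stub's own reference.
        this.server = new ArrayList<>(List.of(self));
    }

    Object call(String method, Object... args) {
        // Step 2: getServerList() is itself a remote call to a known replica.
        server = new ArrayList<>(server.get(0).getServerList());
        // Step 3: dispatch the business call to the selected (here: first, i.e. lightest) replica.
        return server.get(0).invoke(method, args);
    }

    int knownReplicas() {
        return server.size();
    }
}
```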

To implement these functions, automatic code generation can be used together with an extended deployment descriptor. For EJB components that comply with the EJB 2.1 specification, load-balancing and fault-tolerance tags are added to the extended deployment descriptor, indicating that the component supports clustering and uses a random load-balancing algorithm by default. A precompiler compiles the EJB components, generating implementation classes for the corresponding EJBHome and EJBObject interfaces, and then modifies the stubs generated for those implementation classes to change the original remote-call logic: before each business method call, the stub requests from the server side the list of references to the EJB component on every server, which allows requests to be redistributed.

The precompilation process consists of the following steps:

1. Obtain the metadata of the input jar file from the deployment descriptor and through reflection, and combine it with template files to generate the source files of the EJB implementation classes.

2. Invoke javac to compile the generated source files into class files.

3. Invoke rmic to generate the corresponding stub and skeleton files, keeping the generated stub and skeleton source files.

4. Modify the stub and skeleton source files to add support for transactions and security.

5. Recompile the modified stub and skeleton files to obtain new class files.

6. Finally, update the input jar file with these generated class files.

7. The final output is a new jar (ear) file containing the implementation classes, the stub and skeleton classes, and the class files defined by the original client.
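Steps 2, 3, 5 and 6 are ordinary JDK tool invocations, so they can be written down as a command plan. The Java sketch below only builds the commands and does not run them. The jar path, class name and directories are hypothetical, steps 1 and 4 are source transformations shown only as comments, and rmic is the J2EE-era RMI compiler, which has since been removed from the JDK (in Java 15).

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Builds (but does not execute) the tool commands of the precompilation steps; all paths are hypothetical. */
final class PrecompilePlan {
    static List<List<String>> commands(String inputJar, String implClass, String srcDir, String outDir) {
        String classPath = outDir + File.pathSeparator + inputJar;
        String relPath = implClass.replace('.', File.separatorChar);
        String implSource = srcDir + File.separator + relPath + ".java";
        String stubBase = outDir + File.separator + relPath; // rmic -keep writes sources next to the classes

        List<List<String>> plan = new ArrayList<>();
        // Step 1 (not shown): deployment descriptor + reflection metadata merged with templates into implSource.
        // Step 2: compile the generated implementation class.
        plan.add(List.of("javac", "-cp", inputJar, "-d", outDir, implSource));
        // Step 3: generate stub and skeleton; -keep retains their .java sources, -vcompat also emits skeletons.
        plan.add(List.of("rmic", "-keep", "-vcompat", "-classpath", classPath, "-d", outDir, implClass));
        // Step 4 (not shown): edit the _Stub/_Skel sources to add transaction and security handling.
        // Step 5: recompile the edited stub and skeleton sources.
        plan.add(List.of("javac", "-cp", classPath, "-d", outDir,
                stubBase + "_Stub.java", stubBase + "_Skel.java"));
        // Step 6: update the input jar with the regenerated classes.
        plan.add(List.of("jar", "uf", inputJar, "-C", outDir, "."));
        return plan;
    }
}
```

Each inner list can be handed to java.lang.ProcessBuilder to run that step on a JDK that still ships rmic.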

The underlying server-group communication protocol, used to communicate with the other servers, is the standard that guarantees a unified view of the servers; it serves as the bus for inter-server calls.

1. The client's request is initiated by the stub proxy produced by precompilation;

2. The client's request is received by an arbitrary server in the server group, which communicates with the other servers over the underlying server-group communication protocol;

3. The server node that received the request sends a load-collection command to the other servers, and the load balancer on each node collects load information. The underlying collector gathers three categories of load parameters on each node (basic runtime environment parameters, component container parameters, and other referenced resources) and dynamically computes the node's current server load $L_i(t) = \alpha B(t) + \beta C(t) + \lambda R(t)$, where $B(t)$ is the load of the basic runtime environment at time $t$, $C(t)$ is the load of the component containers at time $t$, $R(t)$ is the amount of resource references at time $t$, and $\alpha$, $\beta$, $\lambda$ are weights. Each node returns its current load $L_i(t)$ to the requesting node.

4. The node that received the client's call computes the total load of the server group at time $t$ as $L(t) = \sum_{i=1}^{n} L_i(t)$ and obtains the current threshold $T(t) = \frac{L(t)}{n}$. After collecting node information and computing the loads, it sorts the stubs of the nodes in serverList by load priority and delivers the sorted serverList to the client stub through the getServerList() method. When the load of the current node satisfies $L_i(t) > T(t)$, the client stub invokes the nodes in serverList order.
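Steps 3 and 4 describe a scatter-gather: the receiving node asks every node for its load, then orders serverList by the answers. A possible Java sketch follows, using a thread pool to query the nodes in parallel. LoadProbe and LoadCollector are hypothetical names, and leaving nodes that fail or time out out of the ranking is an assumption the patent does not state.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical handle for asking one node to run the load-collection command. */
interface LoadProbe {
    String node();

    double currentLoad() throws Exception; // that node's L_i(t)
}

final class LoadCollector {
    /** Queries all nodes in parallel; nodes that fail or exceed the per-node timeout are skipped (assumption). */
    static Map<String, Double> collect(List<LoadProbe> probes, long timeoutMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, probes.size()));
        try {
            Map<String, Future<Double>> pending = new LinkedHashMap<>();
            for (LoadProbe p : probes) {
                Callable<Double> task = p::currentLoad;
                pending.put(p.node(), pool.submit(task));
            }
            Map<String, Double> loads = new LinkedHashMap<>();
            for (Map.Entry<String, Future<Double>> e : pending.entrySet()) {
                try {
                    loads.put(e.getKey(), e.getValue().get(timeoutMillis, TimeUnit.MILLISECONDS));
                } catch (ExecutionException | TimeoutException ex) {
                    // node failed or was too slow: leave it out of this round
                } catch (InterruptedException ex) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
            return loads;
        } finally {
            pool.shutdownNow();
        }
    }

    /** Node names sorted by load, lightest first: the order of the serverList sent to the client stub. */
    static List<String> ranked(Map<String, Double> loads) {
        List<String> names = new ArrayList<>(loads.keySet());
        names.sort((a, b) -> Double.compare(loads.get(a), loads.get(b)));
        return names;
    }
}
```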

Server side

The main requirement on the server side of the cluster is that every server replica node be identical, that is, in a consistent state, with respect to any particular client at any moment. Following the EJB life cycle, the cluster's server side goes through three stages for each EJB:

1. Creation: when an EJB's create method is called on some node, that node can receive method calls for this EJB. At that point the other nodes in the cluster must also perform some creation work, so that any of them can receive method calls for this EJB.

2. Method invocation (that is, use of the EJB): for stateless session beans nothing needs to be done, since the bean keeps no state. For stateful session beans and entity beans, which keep state, the state must also be replicated on every method call so that all nodes hold a consistent state for this EJB, which makes transparent migration possible.

3. Removal: once the remove method has been called for an EJB on some node, that node can no longer receive method calls for this EJB. The other nodes in the cluster must also perform removal, so that no node can receive method calls for this EJB.

Each type of EJB handles these stages differently, so the server-side design is divided into stateful session beans, entity beans and stateless session beans. Because the server-side implementations of the three bean types are basically similar, the implementation for stateful session beans is described in detail first, followed by the parts specific to entity beans and stateless session beans.
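The clustering-relevant differences between the bean types can be summarized in a small Java enum. This is an illustrative sketch, not part of JToneFrame. BeanKind is a hypothetical name, the replication flags follow stage 2 above, and the finder rule follows the entity bean section at the end of this description, relying on the EJB 2.x convention that finder methods are named find<Something>.

```java
/** Clustering-relevant differences between the three EJB kinds (hypothetical, illustrative). */
enum BeanKind {
    STATELESS_SESSION(false, false),
    STATEFUL_SESSION(true, false),
    ENTITY(true, true);

    private final boolean replicatesOnInvoke;     // stage 2: copy state to all replicas after each call
    private final boolean finderStartsLifecycle;  // life cycle may also begin with a finder call

    BeanKind(boolean replicatesOnInvoke, boolean finderStartsLifecycle) {
        this.replicatesOnInvoke = replicatesOnInvoke;
        this.finderStartsLifecycle = finderStartsLifecycle;
    }

    /** Must the bean state be replicated to every node after each method call? */
    boolean replicatesOnInvoke() {
        return replicatesOnInvoke;
    }

    /** Can a call to this home method begin the bean's clustered life cycle? */
    boolean startsLifecycle(String homeMethod) {
        return homeMethod.startsWith("create")
                || (finderStartsLifecycle && homeMethod.startsWith("find"));
    }
}
```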

Stateful Session Beans

Clustering stateful session beans raises several specific issues: first, which state must be replicated; second, state replication must preserve the correct correspondence between instances on different nodes; and third, the effect of passivation and activation on state replication. The container is the runtime environment of EJB components. Container initialization is not part of an EJB's life cycle, but it does important work for clustering: it registers the EnterpriseContextRequestMessage and EnterpriseContextReplicationWrap class instances with the state manager, and registers the home implementation class instance with the replication manager so that this node is added to the home stub's serverList.

Creation process

1. Creation starts when the client calls the create() method of the home interface;

2. When the call passes through the instance interceptor, a new context instance is associated with this create operation;

3. In the container class, the ejbCreate() method of the bean implementation class instance is called first. The context instance is then assigned an id; to make it unique across the entire cluster, the id is composed of the node name and a monotonically increasing long value. Next, the ejbObject instance is started and exported on the RMI port to make it available to clients, and the ejbObject is registered with the replication manager so that this node is added to serverList.

4. When the call returns through the instance interceptor and creation has succeeded, the context is inserted into the cache, a creation request is sent to the other nodes, and the result is returned to the client.
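The cluster-wide id from step 3 (node name plus a counter that only increases) can be generated as in this minimal Java sketch. ClusterIdGenerator and the ':' separator are assumptions: the separator keeps pairs such as node1 + 11 and node11 + 1 from colliding, provided node names are unique and contain no ':'.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical generator of cluster-unique context ids of the form "<nodeName>:<n>", n increasing per node. */
final class ClusterIdGenerator {
    private final String nodeName;
    private final AtomicLong counter = new AtomicLong(); // starts at 0; increments are thread-safe

    ClusterIdGenerator(String nodeName) {
        if (nodeName.indexOf(':') >= 0) {
            throw new IllegalArgumentException("node name must not contain ':'");
        }
        this.nodeName = nodeName;
    }

    String nextId() {
        return nodeName + ":" + counter.incrementAndGet();
    }
}
```

An AtomicLong keeps the counter correct when several client threads create beans on the same node at the same time.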

Business method invocation process

1. The invocation starts when the client calls a business method of the Remote interface;

2. In the instance interceptor, the context instance with the corresponding id is taken from the cache, and a hold message is sent before the call proceeds to the next interceptor. Each node receiving this message locks its context instance with the same id. Because this method call must have the same effect on all nodes, the same-id context instances on the other nodes must be prevented from being passivated, or even removed, while the call is in progress on this node.

3. When the method call returns through the instance interceptor, state synchronization is performed whether or not the call succeeded: the context instance is wrapped and sent to all other nodes.

4. On receiving a state synchronization request, each other node does two things: it points the instance reference of its local context instance to the received instance, synchronizing the instance's state changes; and it sets its local locked value to the received locked value, synchronizing the context instance's invocation state.

5. Post-creation is a special creation process. If a node that has newly joined the cluster receives a state synchronization request for an EJB business method, that EJB has not yet been created on the new node, so post-creation must be performed first.
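Steps 2 to 4 form a hold, invoke, synchronize pattern around each business call. The Java sketch below shows it with hypothetical ReplicaContext and PeerChannel types. A real implementation would serialize the instance and send it over the server-group protocol, whereas here the peer receives the object reference directly; post-creation (step 5) is left out.

```java
import java.util.function.Function;

/** Hypothetical minimal stand-in for a stateful bean's EnterpriseContext on one node. */
final class ReplicaContext {
    final String id;
    Object instance; // the bean instance whose state is replicated
    boolean locked;  // true while a call on this bean is in progress somewhere in the cluster

    ReplicaContext(String id, Object instance) {
        this.id = id;
        this.instance = instance;
    }
}

/** Hypothetical group-communication channel to the other replica nodes. */
interface PeerChannel {
    void sendHold(String id);                                  // step 2: peers lock their same-id context

    void sendSync(String id, Object instance, boolean locked); // step 3: ship the wrapped state
}

final class ReplicatingInterceptor {
    private final PeerChannel peers;

    ReplicatingInterceptor(PeerChannel peers) {
        this.peers = peers;
    }

    /** Invoking node: hold peers, run the business method, then synchronize whether or not it succeeded. */
    Object invoke(ReplicaContext ctx, Function<Object, Object> businessMethod) {
        peers.sendHold(ctx.id);
        ctx.locked = true;
        try {
            return businessMethod.apply(ctx.instance);
        } finally {
            ctx.locked = false;
            peers.sendSync(ctx.id, ctx.instance, ctx.locked);
        }
    }

    /** Receiving node (step 4): adopt the received instance and the received locked value. */
    static void applySync(ReplicaContext local, Object instance, boolean locked) {
        local.instance = instance;
        local.locked = locked;
    }
}
```

Placing the synchronization in a finally block is what makes it run "whether or not the call succeeded", as step 3 requires.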

Removal process

1. Removal starts when the client calls the remove() method of the Remote interface.

2. Passing through the instance interceptor works the same as for an ordinary business method call.

3. In the container class, the ejbObject instance is first stopped so that it is no longer available to clients; then ejbRemove() is called; finally, the id of the context instance is set to null.

4. When the call returns through the instance interceptor, the context instance is removed from the cache because its id has been set to null.
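Steps 3 and 4 use a null id as the signal to evict the context. One detail worth noting is that the cache key must be remembered when the call enters the interceptor, because the id is gone by the time the call returns. A minimal Java sketch, with hypothetical class names:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical minimal context whose id the container clears on remove(). */
final class CachedContext {
    String id;

    CachedContext(String id) {
        this.id = id;
    }
}

/** Hypothetical instance cache consulted by the instance interceptor. */
final class InstanceCache {
    private final Map<String, CachedContext> byId = new HashMap<>();

    void put(CachedContext ctx) {
        byId.put(ctx.id, ctx);
    }

    CachedContext get(String id) {
        return byId.get(id);
    }

    /** Interceptor: keep the key from entry, run the container call, evict on exit if the id was nulled. */
    void intercept(String id, Runnable containerCall) {
        CachedContext ctx = byId.get(id);
        try {
            containerCall.run();
        } finally {
            if (ctx != null && ctx.id == null) {
                byId.remove(id); // the key captured at entry, since ctx.id is now null
            }
        }
    }
}
```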

Server-side state synchronization

Referring to Figure 2, which shows a cluster system consisting of two server replica nodes, the complete process of a method call and state replication on node1 is as follows. Figure 2 shows, respectively, the creation process and business method invocation process described above, and the corresponding creation response and business method state synchronization response on the other node.

1、客户的方法调用由节点1接收，方法分为两种类型：Home接口的创建方法和业务方法调用。1. The client's method call is received by node 1, and the method is divided into two types: the creation method of the Home interface and the business method call.

2、如果是Home接口的创建方法，首先在节点1进行组件对象创建，然后进行状态同步方法。节点2接收到状态同步方法，判断同步方法的类型，如果是Home创建方法，则在节点2创建相同的对象实例，完成同步处理后，由节点1返回调用结果给客户。2. If it is the creation method of the Home interface, first create the component object on node 1, and then perform the state synchronization method. Node 2 receives the state synchronization method and judges the type of the synchronization method. If it is a Home creation method, it creates the same object instance on Node 2. After the synchronization process is completed, Node 1 returns the call result to the client.

3. For a business method call, node 1 first uses the Id targeted by the client's call to retrieve the corresponding object context (EnterpriseContext) from the cache. At the same time, it issues a context lock request (LockingRequest). Node 2 retrieves the object context from its own cache and locks it (setLock), which puts the context into the waiting state (Wait). Node 1 then processes the business method call and, when it finishes, performs state synchronization. Node 2 receives the synchronization call, runs the synchronization for that business method, and updates the state attributes of the object instance. After synchronization completes, node 1 returns the call result to the client.

4. If node 2 has already created the EJB object instance, only the state of the current business method call needs to be synchronized. Otherwise, node 2 first creates the EJB object instance and then synchronizes its state with node 1.
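The two-node sequence above can be sketched as a single-threaded, in-process model. `ClusterNode`, `BeanContext` and the handler method names are hypothetical. The "business method" just stores one attribute. The lock is modeled as a flag rather than a blocking wait, and direct method calls stand in for the group communication protocol.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical single-threaded model of Figure 2: node 1 serves the client and
// node 2 is its replica. Direct method calls stand in for group communication.
final class BeanContext {
    final Map<String, Object> state = new HashMap<>();
    boolean locked; // models setLock/Wait; a real replica would block here
}

final class ClusterNode {
    final Map<String, BeanContext> cache = new HashMap<>();
    ClusterNode peer; // the replica, or null if there is none

    // Home create: create locally, replicate the creation, then return to the client.
    String create(String id) {
        cache.put(id, new BeanContext());
        if (peer != null) peer.onCreateSync(id);
        return id;
    }

    // Business method: lock the replica's context, run locally, sync, then return.
    Object invoke(String id, String attr, Object value) {
        BeanContext ctx = cache.get(id);  // look up the context by Id
        if (ctx == null) throw new IllegalStateException("unknown EJB id " + id);
        if (peer != null) peer.onLockingRequest(id);
        ctx.state.put(attr, value);       // the business method itself
        if (peer != null) peer.onBusinessSync(id, new HashMap<>(ctx.state));
        return value;
    }

    void onCreateSync(String id) { cache.putIfAbsent(id, new BeanContext()); }

    void onLockingRequest(String id) {
        // Step 4: post-create the instance if this node has never seen it.
        cache.computeIfAbsent(id, k -> new BeanContext()).locked = true;
    }

    void onBusinessSync(String id, Map<String, Object> newState) {
        BeanContext ctx = cache.computeIfAbsent(id, k -> new BeanContext());
        ctx.state.clear();
        ctx.state.putAll(newState);       // copy the state attributes
        ctx.locked = false;               // release the waiting context
    }
}
```

In this model, a replica attached after `create` has already run still ends up holding the instance. `onLockingRequest` creates the instance on demand, which mirrors step 4.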

Fault-tolerant process

1. If the client call returns a correct result, the call is complete.

2. If the client's call catches an IOException or a RuntimeException, the client selects the next most lightly loaded node from the ServerList. It then retries the business method call on that server.

3. If no available nodes remain in the ServerList and no correct result has been returned, the call fails.
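The following sketch shows the client-side retry loop. It assumes the ServerList is already ordered from lightest to heaviest load. `ServerRef` is a hypothetical stand-in for a generated Stub. As in step 2, only IOException and RuntimeException trigger failover.

```java
import java.io.IOException;
import java.util.List;

// Hypothetical stand-in for a generated Stub bound to one server.
interface ServerRef {
    Object invoke(String method, Object... args) throws IOException;
}

final class FailoverInvoker {
    // servers: the ServerList, assumed sorted from lightest to heaviest load.
    static Object call(List<ServerRef> servers, String method, Object... args) {
        Exception last = null;
        for (ServerRef server : servers) {
            try {
                return server.invoke(method, args); // step 1: correct result, done
            } catch (IOException | RuntimeException e) {
                last = e;                           // step 2: try the next node
            }
        }
        // Step 3: no node left and no correct result, so the call fails.
        throw new IllegalStateException("all servers in ServerList failed", last);
    }
}
```

The last caught exception is attached as the cause of the final failure. A caller can then see why the final node in the list also failed.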

Entity Bean

The process is largely the same as for a stateful session bean. The difference lies in how the life cycle begins. A stateful session bean's life cycle always begins with a create method call. An entity bean's life cycle can begin with either a create method call or a finder method call.

Finder methods are divided into singular and plural finders. A singular finder returns a single ejbObject instance, while a plural finder returns a batch of ejbObject instances. The singular finder process closely resembles the creation process. The plural finder process, by contrast, only creates and starts a batch of ejbObject instances; all other work is deferred until a method is actually called through a particular ejbObject instance. For this reason, the create process and the singular finder process are referred to collectively here as singular processes, and the plural finder process as the plural process.

The synchronization responses to singular and plural processes also differ. The response to a singular process is essentially the same as the response creation for a stateful session bean. The one difference is that the bean instance is not copied; only its valid flag is set to false. For a plural process, the node obtains the id list from ejbObjectIdList and creates and starts those ejbObject instances locally. All other work is deferred until a method is actually called through one of those instances.
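A short sketch can contrast the two replica-side responses. `EntityReplica` and `EntityContext` are hypothetical names. Each ejbObject instance is reduced to an entry in a map, and starting one just records its id.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical replica-side handling of entity-bean synchronization.
final class EntityContext {
    final String id;
    boolean valid; // false => reload from the database before the next call
    EntityContext(String id) { this.id = id; }
}

final class EntityReplica {
    final Map<String, EntityContext> contexts = new HashMap<>();
    final Map<String, Boolean> ejbObjects = new HashMap<>(); // id -> started

    // Singular process (create or singular finder): like stateful response
    // creation, but the instance is not copied; the context is only invalidated.
    void onSingularSync(String id) {
        ejbObjects.put(id, true);
        EntityContext ctx = contexts.computeIfAbsent(id, EntityContext::new);
        ctx.valid = false;
    }

    // Plural process (plural finder): only create and start the ejbObjects named
    // in ejbObjectIdList; contexts are created later, on the first method call.
    void onPluralSync(List<String> ejbObjectIdList) {
        for (String id : ejbObjectIdList) ejbObjects.put(id, true);
    }
}
```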

Business method call process

This is the same as the business method invocation process of a stateful session bean, with one difference: the bean instance is not copied. Instead, the valid flag of the local context instance is set to false. This forces the context to resynchronize from the database before it handles the next method call.
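The replica receives only an invalidation, so its correctness depends on reloading from the database before the next call. The minimal sketch below assumes a shared map stands in for the database and a single balance field stands in for the bean state. `EntityNode` is a hypothetical name.

```java
import java.util.Map;

// Hypothetical model: the replica never receives the instance, only an invalidation.
final class EntityNode {
    final Map<String, Integer> database; // shared store standing in for the database
    Integer cachedBalance;               // local instance state
    boolean valid;                       // starts false, so the first call loads
    int loads;                           // number of database reloads performed

    EntityNode(Map<String, Integer> database) { this.database = database; }

    // Synchronization handler: mark the local copy stale instead of copying it.
    void onBusinessSync() { valid = false; }

    // Before any business method, reload from the database if the copy is stale.
    int getBalance(String key) {
        if (!valid) {
            cachedBalance = database.get(key); // assumes the key exists
            loads++;
            valid = true;
        }
        return cachedBalance;
    }
}
```

An update committed by node 1 becomes visible on the replica only after `onBusinessSync` clears the flag. The next `getBalance` call then reloads the value.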

Stateless Session Bean

Stateless session beans hold no state, so they differ considerably from the two bean types above. The basic design, however, is largely the same. Only the parts that differ from stateful session beans are described below.

Container initialization

Once started, the container of a stateless session bean registers its own instance with a global access point. This global access point is the static member variable containerList, which holds every started stateless session bean container instance on the node.

Business method invocation and post-creation

A stateless session bean holds no state and therefore needs no state synchronization. As a result, a business method call never sends a state synchronization request to other nodes. For the same reason, the bean itself cannot initiate post-creation, because it has no way of knowing that a new node has joined.

Post-creation for stateless session beans therefore takes an indirect route. When a node in the cluster detects that a new node has joined, it notifies every stateless session bean container already started on that node. On their next business method call (any call other than remove()), the beans of those containers send a creation request to the other nodes. A node that receives a creation request checks whether it has already created the bean. If it has, it ignores the request; otherwise, it performs post-creation.
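The indirect post-creation path can be sketched as follows. `containerList` is the static member named in the text. `StatelessContainer`, `PeerNode`, the `newNodePending` flag and the method names are hypothetical. The creation request is modeled as a direct method call rather than a cluster message.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class StatelessContainer {
    // Global access point: all stateless containers started on this node.
    static final List<StatelessContainer> containerList = new ArrayList<>();

    final String beanName;
    private boolean newNodePending; // hypothetical flag set when a node join is detected

    StatelessContainer(String beanName) { this.beanName = beanName; }

    // Container initialization: register with the global access point.
    void start() { containerList.add(this); }

    // Called by the cluster layer when it senses that a new node has joined.
    static void onNodeJoined() {
        for (StatelessContainer c : containerList) c.newNodePending = true;
    }

    // Business-method path: no state sync, but the next call other than
    // remove() carries a creation request to the other nodes.
    void invoke(String method, List<PeerNode> peers) {
        if (newNodePending && !method.equals("remove")) {
            for (PeerNode p : peers) p.onCreateRequest(beanName);
            newNodePending = false;
        }
        // The business method itself would run here.
    }
}

final class PeerNode {
    final Set<String> created = new HashSet<>();
    int postCreations;

    // Ignore the request if the bean already exists; otherwise post-create it.
    void onCreateRequest(String beanName) {
        if (created.add(beanName)) postCreations++;
    }
}
```

A remove() call leaves the pending flag set. The creation request therefore rides on the first ordinary business call that follows.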

The key implementation techniques of the application server cluster have been described in detail above, and each of them has been verified in the JToneCluster cluster server. An application server that follows these implementation schemes can form a highly reliable, high-performance server cluster.

Claims

1. A method for improving the reliability of a component-based software system, characterized in that:

1) the client first sends a server call request through its client-side component interface proxy;

2) for EJB components that comply with the EJB 2.1 specification, load-balancing and fault-tolerance tags are added to their extended deployment descriptors. These tags indicate that the EJB component supports cluster functions and uses the random-allocation load-balancing algorithm by default. The EJB components are compiled with a precompiler, which generates the implementation classes of the EJBHome and EJBObject interfaces as well as the Stub of each implementation class. The precompilation tool changes the Stub class's original remote-call logic so that, before each business method call, the Stub requests from the server the list of references to the EJB component on each server, thereby redistributing requests;

3) the client's request is received by a randomly chosen server in the server group. That server communicates with the other servers through the underlying server-group communication protocol. The load parameters are selected according to the user's needs as recorded in the extended deployment descriptor of the EJB component. If the user has customized load parameters and an allocation strategy, the server loads them. It then invokes the load-collection command on the other servers, which collect load information on each node according to the loaded custom parameters and policy module. If the user has not customized this item, the server loads the default parameters and allocation strategy;

4) Define load parameters

Basic operating environment parameters of the system: in a J2EE-based application server middleware system, these are defined as the JVM heap size and the amount of memory in use;

Component container: the component containers comprise the Servlet container and the EJB container. The Servlet container parameters are the number of running Servlets and the number of user unit event requests. The EJB container parameters are the number of created components, the number of running components, and the number of pooled components;

Other referenced resources are classified by category (data source resources, JCA resources, and JMS resources). For each category, the parameters defined are the number of connections, the number of available connections, and the number of requests waiting for the resource;

5) based on these load parameter definitions, each node periodically starts load information collection, gathers data from the Java virtual machine, and dynamically computes its server load at the current time t as $L_i(t) = \alpha B(t) + \beta C(t) + \lambda R(t)$. Here $B(t)$ is the load of the system's basic operating environment at time t, $C(t)$ is the load of the component containers at time t, $R(t)$ is the amount of resource references at time t, and $\alpha$, $\beta$, $\lambda$ are weights. Each node returns its current load $L_i(t)$ to the requesting node. The node that received the client request then computes the total load of the server group at time t as

$$L(t) = \sum_{i=1}^{n} L_i(t)$$

and the threshold at the current moment as

$$T(t) = \frac{L(t)}{n},$$

where n is the number of servers;

6) the Stubs of the server nodes are sorted by load, and the list of all available servers is returned to the client interface proxy;

7) when $L_i(t) > T(t)$, the client selects nodes in the order of the returned list. It takes the reference of the most lightly loaded node and sends the request through that server reference, so that the request executes on the lightly loaded node;

8) if this node processes the request correctly and no exception is thrown, its state information is synchronized to the other nodes, which ensures that subsequent request calls and allocation are correct. If the call is unsuccessful, the client uses the returned server node list to select the next most lightly loaded server node and calls the business method of the EJB component on that node. If every node in the list fails, the client call cannot be processed correctly.
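Steps 5) to 7) can be illustrated with a small calculation. The sketch below is hypothetical and rests on three assumptions. First, $B(t)$, $C(t)$ and $R(t)$ are already reduced to comparable numbers, because the claim does not say how they are derived from the parameters in step 4). Second, each node's Stub is represented only by the node's name. Third, step 7) is read as keeping a request on its node unless that node's load exceeds $T(t)$.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of claim steps 5) to 7); names are illustrative only.
final class NodeLoad {
    final String node;  // stands in for the node's Stub / server reference
    final double load;  // L_i(t)
    NodeLoad(String node, double load) { this.node = node; this.load = load; }
}

final class ClusterBalancer {
    // Step 5: L_i(t) = alpha*B(t) + beta*C(t) + lambda*R(t)
    static double nodeLoad(double alpha, double beta, double lambda,
                           double b, double c, double r) {
        return alpha * b + beta * c + lambda * r;
    }

    // Step 5: T(t) = L(t) / n, where L(t) is the sum of all L_i(t)
    static double threshold(List<NodeLoad> nodes) {
        if (nodes.isEmpty()) throw new IllegalArgumentException("no servers");
        double total = 0;
        for (NodeLoad n : nodes) total += n.load;
        return total / nodes.size();
    }

    // Step 6: available servers sorted by load, lightest first
    static List<NodeLoad> sortedByLoad(List<NodeLoad> nodes) {
        List<NodeLoad> copy = new ArrayList<>(nodes);
        copy.sort(Comparator.comparingDouble((NodeLoad n) -> n.load));
        return copy;
    }

    // Step 7 (one reading): keep the request on `current` unless its load exceeds
    // T(t); otherwise route it to the most lightly loaded node in the sorted list.
    static String target(String current, List<NodeLoad> nodes) {
        double t = threshold(nodes);
        List<NodeLoad> sorted = sortedByLoad(nodes);
        for (NodeLoad n : sorted) {
            if (n.node.equals(current) && n.load <= t) return current;
        }
        return sorted.get(0).node;
    }
}
```

Take nodes A, B and C with loads 0.75, 0.25 and 0.5. Then L(t) = 1.5 and T(t) = 0.5. A request arriving at A (0.75 > 0.5) is routed to B, while C (0.5) keeps its request. If the chosen node then fails, step 8) moves on through the same sorted list.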