KR20040076385A

KR20040076385A - An apparatus and method for mirroring file system of high availability system

Info

Publication number: KR20040076385A
Application number: KR1020030011715A
Authority: KR
Inventors: 원수영
Original assignee: 엘지엔시스(주)
Priority date: 2003-02-25
Filing date: 2003-02-25
Publication date: 2004-09-01

Abstract

The present invention relates to a file system mirroring apparatus and mirroring method for a high availability system, intended to reduce the number of task takeovers caused by file system failures and to increase service availability time. The file system mirroring method of a high availability system according to the present invention comprises: a first step of performing a requested task based on the task status information and per-node status information stored in the memory of each node; a second step of detecting, while the task is being performed, changes in the state of files in each file system that is a component of the task, and mirroring the changed files to a designated file system; a third step of determining whether the file system has failed and storing the result of the determination in memory; and a fourth step of handling the failure according to the type of failure of the file system determined to have failed. By recovering from file system failures, the invention improves the operational reliability of service applications, allows disks to be configured efficiently, and shortens failure response time so that requested services can be provided more quickly.

Description

An apparatus and method for mirroring file system of high availability system

The present invention relates to a file system mirroring apparatus and mirroring method for a high availability system that reduce the number of task takeovers caused by file system failures and increase service availability time.

FIG. 1 shows the network configuration of a high availability (HA) system 100 consisting of N server systems (nodes). As shown in FIG. 2, each of the N nodes (A Node, B Node, C Node, ..., N-1 Node, N Node) includes a node manager (NM) 10, a task manager (TM) 20, an information manager (IM) 30, and a shared memory 40 for storing task status information, node information, and the like.

To respond to service (task) requests from external clients, the high availability system 100 has the nodes continuously check one another's operating status to determine whether a problem has occurred in a running service or in a running node. When a problem occurs, the right to provide the requested service is switched to a designated node so that service continues without interruption. This is handled by the three management daemons located on each node, namely the NM 10, the TM 20, and the IM 30.

FIG. 2 shows the operation of the nodes in the high availability system 100. For convenience of explanation, only the operation of node A and node B is shown.

As shown in FIG. 2, the operation comprises: step S1, in which the HA master starts HA and activates the NM 10, TM 20, and IM 30; step S2, in which the TM 20 stores the task status information it has collected in the shared memory 40; step S3, in which the IM 30 receives node information from the other connected nodes; step S4, in which the IM 30 stores the received node information in the shared memory 40; step S5, in which the NM 10 refers to the status information of the local tasks stored in the shared memory 40; step S6, in which the NM 10 reads the status information of the other nodes stored in the shared memory 40; step S7, in which the NM 10 multicasts the overall information of its own node, compiled from the information read in steps S5 and S6, to the other connected nodes; step S8, in which the TM 20 manages tasks (starting, stopping, and monitoring them) according to the information sent from the NM 10; step S9, in which the NM 10 refers to the information of the IM 30 to make decisions; and step S10, in which the TM 20 restarts the NM 10 and the IM 30 when they fail.
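The information flow of steps S2 to S7 through the shared memory 40 can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the patented implementation: the shared memory is modeled as an in-process object, node information is exchanged as JSON text, the multicast transport itself is omitted, and every class and function name is hypothetical.

```python
import json
from dataclasses import dataclass, field


@dataclass
class SharedMemory:
    """Hypothetical stand-in for the shared memory 40 of one node."""
    task_status: dict = field(default_factory=dict)  # written by the TM (S2)
    peer_nodes: dict = field(default_factory=dict)   # written by the IM (S4)


def tm_store_task_status(shm: SharedMemory, statuses: dict) -> None:
    # S2: the TM stores the task status information it has collected.
    shm.task_status.update(statuses)


def im_store_peer_info(shm: SharedMemory, message: str) -> None:
    # S3/S4: the IM receives a peer's multicast message, checks it, and stores it.
    info = json.loads(message)
    if "node" not in info or "tasks" not in info:
        raise ValueError("malformed node information")
    shm.peer_nodes[info["node"]] = info


def nm_build_multicast(shm: SharedMemory, node_name: str) -> str:
    # S5/S6: the NM reads local task status and peer status from shared memory.
    # S7: it compiles a summary of its own node to be multicast to the peers.
    summary = {
        "node": node_name,
        "tasks": dict(shm.task_status),
        "known_peers": sorted(shm.peer_nodes),
    }
    return json.dumps(summary, sort_keys=True)
```

Keeping each manager's writes in its own field mirrors the division of labor in the text: the TM and IM only write, while the NM only reads and summarizes.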

As described above, when HA is started, the NM 10, TM 20, and IM 30 each begin operating, and when a problem occurs, the affected task is taken over in the order specified in the system configuration file. This is described in detail below.

As shown in FIG. 1, while each node is performing its assigned tasks (a to m), no task takeover occurs between the nodes as long as no problem arises. That is, when an external client requests service a, node A recognizes that it must handle the request and responds appropriately. However, if there is no response to the service request, the high availability system 100 hands over the task of node A (IP, FS, AP) to another node in the order specified in the configuration file.

Before a task takeover occurs, the TM 20 of node A detects that its own task has been killed and attempts to restart it. If the restart succeeds, no takeover occurs; however, if an unrecoverable situation such as a network failure arises, the task is handed over in the order specified in the configuration file.
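The restart-first, takeover-second behavior of the TM can be sketched as below. This is an illustrative Python sketch only: the restart attempt and the liveness probe are passed in as callables, `takeover_order` is assumed to be the node order read from the configuration file (excluding the failing node), and the function name is hypothetical.

```python
from typing import Callable, Optional, Sequence


def handle_killed_task(restart: Callable[[], bool],
                       takeover_order: Sequence[str],
                       node_alive: Callable[[str], bool]) -> Optional[str]:
    """Return None if the local restart succeeded; otherwise return the node
    that should take over the task (the first live node in configured order)."""
    if restart():
        return None  # restart succeeded, so no takeover is needed
    for node in takeover_order:
        if node_alive(node):
            return node
    raise RuntimeError("no node is available to take over the task")
```

Trying the local restart first matches the text: a takeover is the more expensive path and is reserved for failures that a restart cannot fix.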

The TM 20 is always in a monitoring state and performs the operation described above immediately whenever a service (task) request arrives from outside.

When the problem lies in a node rather than in a task (regardless of service requests), the nodes first check one another's active state at all times through the heartbeat. If the heartbeat signal is confirmed, the peer node is active and no task takeover occurs. However, if the signal cannot be confirmed because the node is dead or because of a heartbeat cable fault, the node that would take over the task performs a second check using ping; if it receives a normal response, the peer node is active and no takeover occurs. Finally, the file system, which is a component of the task, is checked.

If both the heartbeat and the ping fail, the node checks the peer node's file system over the SCSI channel. If that file system is active, the nodes merely cannot reach each other by heartbeat and ping, and the peer node can be regarded as still providing normal service. If the file system's operation cannot be confirmed, the peer node is considered dead, and its task is taken over by another node in the order specified in the configuration file.
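The three-stage check (heartbeat, then ping, then the peer's file system over SCSI) amounts to an escalation ladder. The Python sketch below shows only the decision logic; how each probe is actually performed is out of scope, so the probe results are passed in as booleans, and the enum and function names are hypothetical.

```python
from enum import Enum


class PeerVerdict(Enum):
    ACTIVE = "active"                                    # no takeover
    UNREACHABLE_BUT_SERVING = "unreachable_but_serving"  # no takeover
    DEAD = "dead"                                        # take over the peer's task


def judge_peer(heartbeat_ok: bool, ping_ok: bool, peer_fs_active: bool) -> PeerVerdict:
    # First check: the heartbeat signal exchanged between the nodes.
    if heartbeat_ok:
        return PeerVerdict.ACTIVE
    # Second check: a ping sent by the node that would take over the task.
    if ping_ok:
        return PeerVerdict.ACTIVE
    # Last check: the peer's file system, inspected over the SCSI channel.
    if peer_fs_active:
        return PeerVerdict.UNREACHABLE_BUT_SERVING
    return PeerVerdict.DEAD
```

Only the last verdict triggers a takeover, which prevents a broken heartbeat cable alone from causing two nodes to run the same task.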

Meanwhile, the file system failure check performed in the high availability system 100 of FIGS. 1 and 2 searches the file that holds the node's mount information; if a mount point is not mounted, this is recognized as a file system failure. However, although the file system failure is recognized in this case, a task takeover does not always follow. A task takeover occurs only when the unmounted state of the file system affects the operation of the service application, in which case it is recognized and handled as a failure of the service application.

Affecting the operation of the service application means affecting the execution of the daemon registered in the configuration file to check the state of the service application, that is, the case in which the file system containing the service application's executable files is not mounted.
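The conventional check can be sketched as follows to show its blind spot. This Python sketch assumes the mount information file uses the Linux `/proc/mounts` text format (device, mount point, type, options, dump, pass); the patent does not name the file, and the function names are hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def mounted_points(mount_table: str) -> Set[str]:
    """Return the mount points listed in text in the /proc/mounts format."""
    points = set()
    for line in mount_table.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            # The second field is the mount point; spaces are escaped as \040.
            points.add(fields[1].replace("\\040", " "))
    return points


def check_service_filesystems(mount_table: str,
                              required: Iterable[str],
                              executable_fs: Set[str]) -> Tuple[List[str], bool]:
    """Conventional check: report every unmounted file system, but request a
    takeover only if one of them holds the service application's executables."""
    mounted = mounted_points(mount_table)
    failed = [fs for fs in required if fs not in mounted]
    takeover = any(fs in executable_fs for fs in failed)
    return failed, takeover
```

With `/app` holding the executables and `/data` holding only data, an unmounted `/data` is reported as failed but yields no takeover, which is exactly the gap described in the next paragraph.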

If the unmounted file system holds only data, even data that the service application writes, no task takeover occurs because the service application daemon is still running; in this case the service cannot be considered to be operating normally, even though the service application is running. As described above, the conventional high availability system 100 suffers from frequent task takeovers caused by file system failures, which reduces service availability time.

Accordingly, the present invention has been devised to solve the problems described above, and its object is to provide a file system mirroring apparatus and mirroring method for a high availability system that use file system mirroring to reduce the number of task takeovers caused by file system failures and to increase service availability time.

FIG. 1 shows the network configuration of a high availability system consisting of N server systems;

FIG. 2 shows the detailed configuration and operation of a node constituting the system of FIG. 1;

FIG. 3 shows the detailed configuration and operation of the file system mirroring apparatus of a high availability system according to the present invention; and

FIG. 4 shows the internal configuration and operation of the file system mirroring part and the file system reporting part constituting the file system manager of FIG. 3, the file system fault checking part constituting the task manager, and the fault handling part constituting the node manager.

※ Explanation of reference numerals for the main parts of the drawings

10: Node Manager 11: File status detection unit

12: Changed file storage unit 13: Processing unit

14: Mirroring file storage unit 15: Mirroring data storage unit

16: Mirroring result output unit 20: Task Manager

21: Mirroring result input unit 22: Fault diagnosis result input unit

23: Fault handling result input unit 24: Result screen output unit

30: Information Manager

31: File system check unit 32: System signal reception/interpretation unit

33: Fault determination unit 34: Fault diagnosis output unit

40: Shared Memory 41: Fault input unit

42: Fault type/handling determination unit 43: File system reset unit

44: Fault handling output unit

50: File System Manager

51: File System Mirroring Part

52: File System Reporting Part

53: File System Fault Checking Part

54: File System Fault Handling Part

100: High Availability System

To achieve the above object, the file system mirroring apparatus of a high availability system according to the present invention is characterized by comprising: a task manager that performs task operation (starting or stopping the tasks of the local node) and monitors the operating state of those tasks; an information manager that receives status information multicast from other nodes, checks the received data, and stores it in memory; a node manager that collects the status information of the local node and multicasts it to other nodes; and a file system manager that mirrors the file system designated as the service file system of a local task to a designated file system and outputs the check results of the service file system and the results of responding to its failures.

In addition, the file system mirroring method of a high availability system according to the present invention is characterized by comprising: a first step of performing a requested task based on the task status information and per-node status information stored in the memory of each node; a second step of detecting, while the task is being performed, changes in the state of files in each file system that is a component of the task, and mirroring the changed files to a designated file system; a third step of determining whether the file system has failed and storing the result of the determination in memory; and a fourth step of handling the failure according to the type of failure of the file system determined to have failed.

An embodiment of the file system mirroring apparatus and mirroring method of a high availability system according to the present invention is described in detail below with reference to the accompanying drawings.

First, the file system mirroring apparatus of a high availability system according to the present invention is provided in each node of FIG. 1. As shown in FIG. 3, the apparatus in each node further includes a file system manager (FSM) 50 in addition to the node manager (NM) 10, task manager (TM) 20, information manager (IM) 30, and shared memory 40 of FIG. 2.

The nodes of the high availability system 100 are interconnected by heartbeat links for exchanging information, and services are provided to the outside through the connected service network. These connections are made through network cards installed in the system nodes.

In addition, for file system duplication, the disks of the high availability system 100 must provide one spare file system for each file system.

FIG. 3 shows the operation of the nodes in the high availability system 100. For convenience of explanation, only the operation of node A and node B is shown.

In the high availability system 100 provided with the file system mirroring apparatus according to the present invention, the operation comprises: step S1, in which the HA master starts HA and runs the NM 10, TM 20, IM 30, and FSM 50 daemons; step S2, in which the TM 20 stores the task status information it has collected in the shared memory 40; step S3, in which the IM 30 receives node information from the other connected nodes; step S4, in which the IM 30 stores the received node information in the shared memory 40; step S5, in which the NM 10 refers to the status information of the local tasks stored in the shared memory 40; step S6, in which the NM 10 reads the status information of the other nodes stored in the shared memory 40; step S7, in which the NM 10 multicasts the overall information of its own node, compiled from the information read in steps S5 and S6, to the other connected nodes; step S8, in which the TM 20 manages tasks (starting, stopping, and monitoring them) according to the information sent from the NM 10; step S9, in which the NM 10 refers to the information of the IM 30 to make decisions; step S10, in which the TM 20 restarts the NM 10 and the IM 30 when they fail; step S11, in which the TM 20 checks the file system and stores the result in the shared memory 40; step S12, in which the NM 10 checks the shared memory 40 for a file system failure and handles the failure; and step S13, in which the FSM 50 reads the file-system-related results from the shared memory 40 and outputs them.

FIG. 4 shows the internal configuration and operation of the file system mirroring part 51 and the file system reporting part 52 constituting the FSM 50, the file system fault checking part 53 constituting the TM 20, and the file system fault handling part 54 constituting the NM 10. Their internal configuration and operation are described in detail below.

The file system mirroring part 51 mirrors (backs up) the file system of the high availability system 100 to the spare file system whenever it changes. The file system fault checking part 53 checks for various file system failures, including by the conventional file system check method described above, and stores whether a failure has occurred in the shared memory 40.

The file system fault handling part 54 reads from the information stored in the shared memory 40 whether a file system has failed and, when a failure has occurred, recovers the task using the mirroring file system. The file system reporting part 52 reports the results of the file system mirroring part 51, the file system fault checking part 53, and the file system fault handling part 54 to the user.

The file systems #1, #2, #3, and #4 shown in FIG. 4 are located in partial regions of the shared hard disk of FIG. 1.

The file system mirroring method of a high availability system according to the present invention is described in detail below with reference to the configuration and operation of FIGS. 3 and 4.

First, when the HA master starts the NM 10, TM 20, IM 30, and FSM 50, each manager begins transmitting and acquiring the information needed to operate the high availability system 100.

The TM 20 is responsible for task operation, that is, starting or stopping the tasks of the local node according to commands from the NM 10, and for monitoring the operating state of those tasks. The IM 30 is responsible for receiving status information multicast from other nodes, checking the received data, and storing it in the shared memory 40.

In addition, the TM 20 checks the states of the NM 10 and the IM 30 and restarts them if they are dead. The NM 10 collects the status information of the local node and multicasts it to the other nodes, and it also checks the states of the TM 20 and the FSM 50, restarting either manager if it is dead.
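The mutual supervision described above (the TM watches the NM and IM, and the NM watches the TM and FSM) can be sketched as a small table plus one supervision pass. This Python sketch is illustrative only: the liveness check and the restart action are injected callables, and a real system might instead track process IDs; the table and function names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Which manager restarts which, as described in the text above.
SUPERVISES: Dict[str, Tuple[str, ...]] = {
    "TM": ("NM", "IM"),
    "NM": ("TM", "FSM"),
}


def supervise(supervisor: str,
              is_alive: Callable[[str], bool],
              restart: Callable[[str], None]) -> List[str]:
    """Run one supervision pass for `supervisor` and restart every watched
    manager that is found dead. Returns the names of restarted managers."""
    restarted = []
    for name in SUPERVISES.get(supervisor, ()):
        if not is_alive(name):
            restart(name)
            restarted.append(name)
    return restarted
```

Because the TM and NM watch each other, a single dead manager is always restarted by a surviving one.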

The FSM 50 mirrors the file system designated as the service file system of a local task (for example, file system #1) to a designated spare file system (for example, file system #2), and it outputs the check results and failure response results of the service file system. The internal configuration and operation of the file system mirroring part 51 and the file system reporting part 52 constituting the FSM 50, the file system fault checking part 53 constituting the TM 20, and the fault handling part 54 constituting the NM 10 are described in detail below.

First, the file system mirroring part 51 comprises: a file status detection unit 11 that detects changes in the file system in real time; a changed file storage unit 12 that stores the information received from the file status detection unit 11; a processing unit 13 that processes and controls the overall file backup operation; a mirroring file storage unit 14 that stores the list of files to be backed up; a mirroring data storage unit 15 that stores the data of the mirrored files; and a mirroring result output unit 16 that outputs the result of the backup performed by the processing unit 13. The file status detection unit 11 detects changes to files on the file system in real time (F1) and sends each file whose state has changed to the changed file storage unit 12 (F2), which stores the identifiers of the changed files as they continue to arrive (F3). When a backup starts in this state, the processing unit 13 reads the file list from the mirroring file storage unit 14 (F4), reads from the file system only the files corresponding to the identifiers stored in the changed file storage unit 12, and transfers them to the mirroring data storage unit 15 (F5), where they are stored (F6).
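Steps F1 to F6 can be sketched as below. This is a minimal Python sketch under an explicit substitution: the patent describes real-time change detection, which the Python standard library does not provide, so this sketch detects changes by comparing two snapshots of modification time and size (polling). Deleted files are not handled, and all function names are hypothetical.

```python
import os
import shutil
from typing import Dict, Iterable, List, Set, Tuple


def snapshot(root: str) -> Dict[str, Tuple[int, int]]:
    """Map each file under root (as a relative path) to (mtime_ns, size)."""
    state = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            state[os.path.relpath(full, root)] = (st.st_mtime_ns, st.st_size)
    return state


def detect_changes(before: Dict[str, Tuple[int, int]],
                   after: Dict[str, Tuple[int, int]]) -> Set[str]:
    # F1-F3: identifiers of files that were added or modified.
    return {path for path, sig in after.items() if before.get(path) != sig}


def mirror_changed(service_root: str, mirror_root: str,
                   changed: Set[str], mirror_list: Iterable[str]) -> List[str]:
    """F4-F6: copy only the changed files that appear in the mirroring list."""
    copied = []
    for rel in sorted(changed & set(mirror_list)):
        dst = os.path.join(mirror_root, rel)
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.copy2(os.path.join(service_root, rel), dst)
        copied.append(rel)
    return copied  # result handed to the mirroring result output unit 16
```

Copying only the intersection of "changed" and "listed" files is what keeps each backup pass incremental rather than a full copy.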

The file system reporting part 52 comprises: a mirroring result input unit 21 that receives the results of the file system mirroring part 51; a fault diagnosis result input unit 22 that receives the results of the file system fault checking part 53; a fault handling result input unit 23 that receives the results of the file system fault handling part 54; and a result screen output unit 24 that processes the inputs from the input units 21, 22, and 23 and outputs a result screen. Each of the input units 21, 22, and 23 periodically accesses its corresponding location in the shared memory 40 and reads the stored results (R1 to R3), and the result screen output unit 24 outputs the results sent from the input units 21, 22, and 23 (R4).
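One pass of the reporting part (R1 to R4) can be sketched as follows. This Python sketch assumes the shared memory 40 exposes the three results under fixed keys; the slot names, the "n/a" placeholder, and the text layout are all hypothetical, and the periodic polling loop is omitted.

```python
from typing import Dict, Mapping

# Hypothetical shared-memory keys for the three result locations.
RESULT_SLOTS = ("mirroring", "fault_diagnosis", "fault_handling")


def read_results(shared_memory: Mapping[str, str]) -> Dict[str, str]:
    # R1-R3: each input unit reads its own slot; a missing result shows as "n/a".
    return {slot: shared_memory.get(slot, "n/a") for slot in RESULT_SLOTS}


def render_report(results: Mapping[str, str]) -> str:
    # R4: the result screen output unit lays the results out as aligned text.
    width = max(len(slot) for slot in RESULT_SLOTS)
    return "\n".join(f"{slot.ljust(width)} : {results[slot]}" for slot in RESULT_SLOTS)
```

Reading from shared memory, rather than querying the TM and NM directly, keeps the reporting part decoupled from the managers that produce the results.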

The file system fault checking part 53 comprises: a file system check unit 31 that continuously checks the current mount state of the file system (C1); a system signal reception/interpretation unit 32 that examines file-system-related error signals delivered by the operating system (C2); a fault determination unit 33 that determines file system failures (C3) from the results of the file system check unit 31 and the system signal reception/interpretation unit 32; and a fault diagnosis output unit 34 that stores the fault determination results sent from the fault determination unit 33 (C4) in the shared memory 40 (C5).
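Steps C1 to C5 combine two evidence sources into one verdict per file system. In the Python sketch below, the operating system's error signals are modeled as hypothetical (mount point, errno value) pairs, and treating `EIO` and `EROFS` as file system failures is an illustrative assumption; the patent does not specify which signals are interpreted.

```python
import errno
from typing import Dict, List, Tuple

# Error codes treated as file system failures (an assumption for illustration).
FS_ERROR_CODES = {errno.EIO, errno.EROFS}


def diagnose(mount_point: str, mounted: bool,
             os_errors: List[Tuple[str, int]]) -> str:
    """C1-C3: combine the mount check with OS-reported errors for one file system."""
    if not mounted:                      # C1: mount state check
        return "unmounted"
    codes = {code for mp, code in os_errors if mp == mount_point}
    if codes & FS_ERROR_CODES:           # C2: interpreted system signals
        return "io_error"
    return "ok"


def store_diagnosis(shared_memory: Dict, mount_point: str, verdict: str) -> None:
    # C4/C5: the fault diagnosis output unit records the verdict in shared memory.
    shared_memory.setdefault("fault_diagnosis", {})[mount_point] = verdict
```

The second source matters because a file system can remain mounted while every write to it fails, a case the conventional mount-only check cannot see.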

The fault handling part 54 comprises: a fault input unit 41 that continuously checks the fault state of the file system in the shared memory 40 (H1); a fault type/handling determination unit 42 that determines the type of the fault and how to handle it (H2); a file system reset unit 43 that resets the file system to the mirroring file system (H3); and a fault handling output unit 44 that stores the fault handling results in the shared memory 40 (H4).

For example, when a node is executing a task that includes the file systems "/app" and "/data" as components, the file system mirroring part 51 detects changes in the state of files in each of the designated "/app" and "/data" file systems and, upon detecting a change, applies the same change to the spare file system designated to mirror that file.
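Applying "the same change" to the spare file system requires mapping each changed path to its counterpart on the designated mirror. The Python sketch below shows one way to do that; the mirror mount points `/app_mirror` and `/data_mirror` are hypothetical, since the patent only refers to the spare file systems by number.

```python
import posixpath
from typing import Dict, Optional

# Hypothetical pairing of service file systems with their spare mirrors.
MIRROR_OF: Dict[str, str] = {"/app": "/app_mirror", "/data": "/data_mirror"}


def mirror_path(changed_path: str) -> Optional[str]:
    """Return the path on the spare file system that should receive the same
    change, or None if the file is not on a mirrored service file system."""
    path = posixpath.normpath(changed_path)
    for service, mirror in MIRROR_OF.items():
        # Match whole path components so "/application" does not match "/app".
        if path == service or path.startswith(service + "/"):
            return mirror + path[len(service):]
    return None
```

Normalizing first and matching on whole path components avoids mirroring files from unrelated directories that merely share a name prefix.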

The file system fault checking part 53 checks the service file system of the local node using two methods: checking the mount state of the file system, and interpreting the system signals generated by the operating system. The fault determination unit 33 determines whether the check result indicates a failure, and the determination result is stored in the shared memory 40 through the fault diagnosis output unit 34.

The file system fault handling part 54 identifies the type of failure of a service file system determined to have failed and responds accordingly. If the failure is determined to be one that can be resolved by switching to the mirroring file system, the file system reset unit 43 instructs the TM 20 to reset the service file system; the mirroring file system (for example, file system #2) then becomes the service file system.

If, however, the mirrored file system is not usable either, as in the case of a disk cable failure, the task is handed over to another node.
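The choice between the two responses can be sketched as a small decision function. The description names only two outcomes: switch to the mirror, or hand the task over when the mirror is also unusable (disk cable failure). The cause strings, the `mirror_healthy` flag, and the names `Action` and `decide_action` are hypothetical.

```python
from enum import Enum


class Action(Enum):
    SWITCH_TO_MIRROR = "switch_to_mirror"  # resolved locally: TM 20 resets to the mirror
    HAND_OVER_TASK = "hand_over_task"      # task taken over by another node


# Hypothetical set of causes after which the mirror is assumed unusable as well.
MIRROR_UNUSABLE_CAUSES = {"disk_cable_failure"}


def decide_action(cause, mirror_healthy=True):
    """Pick the response for a service file system judged to have failed."""
    if cause in MIRROR_UNUSABLE_CAUSES or not mirror_healthy:
        return Action.HAND_OVER_TASK
    return Action.SWITCH_TO_MIRROR
```

Defaulting to `SWITCH_TO_MIRROR` reflects the intent of the invention: the slower node-to-node takeover is reserved for cases where the local mirror cannot help.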

As described above, file system mirroring is a function that could otherwise be provided by physical RAID. When the software of the high availability system 100 provides this function itself by applying the present invention, the mirroring targets can be limited to the selected critical file systems.

Furthermore, conventional high availability software treats a service as normal as long as the service application daemon is alive, without considering faults in the data file system to which the service application actually writes its data. The present invention detects failures of the data file system and, when one occurs, recovers by switching to the mirroring file system. As a result, not only is the service application's daemon alive, but the application is also guaranteed to operate normally.
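The difference between a daemon-only check and one that also covers the data file system can be sketched as follows. This is an illustration under assumptions: daemon liveness is passed in as a boolean, and the data file system is probed by creating, writing, syncing, and deleting a temporary file in the data directory. `data_fs_writable` and `service_status` are hypothetical names.

```python
import os
import tempfile


def data_fs_writable(data_dir):
    """Probe the data file system by writing, syncing and deleting a small temporary file."""
    try:
        with tempfile.NamedTemporaryFile(dir=data_dir) as probe:
            probe.write(b"probe")
            probe.flush()
            os.fsync(probe.fileno())
        return True
    except OSError:
        return False


def service_status(daemon_alive, data_dir):
    """Report OK only if the daemon is alive AND its data file system accepts writes."""
    if not daemon_alive:
        return "DAEMON_DOWN"
    if not data_fs_writable(data_dir):
        return "DATA_FS_FAULT"  # a daemon-only check would have reported this case as OK
    return "OK"
```

The `DATA_FS_FAULT` branch is exactly the case the conventional approach misses, and it is the case in which the invention switches to the mirroring file system instead of continuing to report a healthy service.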

In addition, when conventional high availability software detects and responds to a failure, it must go through the following steps: the NM 10 detects the failure of the local task (about 5 s); the TM 20 is commanded to stop the task (about 5 s); the TM 20 stops the task (about 15 s); another node is negotiated to take over the local task (about 10 s); and the node selected by the negotiation starts the task (about 15 s). Handing a task over to another node because of a task failure therefore takes about 50 seconds. In the present invention, by contrast, the software only needs to detect the failure of the local task (about 5 s) and reset the task on the TM 20 (about 10 s), about 15 seconds in total, so the time required is reduced by 70% compared with the conventional approach.
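The 70% figure follows directly from the approximate step times given above; the short calculation below simply sums them. The step labels are paraphrases of the description, and the durations are the description's own estimates.

```python
# Approximate step durations in seconds, as estimated in the description.
conventional = {
    "NM 10 detects local task failure": 5,
    "command TM 20 to stop the task": 5,
    "TM 20 stops the task": 15,
    "negotiate takeover with another node": 10,
    "other node starts the task": 15,
}
proposed = {
    "detect local task failure": 5,
    "reset the task on TM 20 to the mirror": 10,
}

t_old = sum(conventional.values())
t_new = sum(proposed.values())
reduction = (t_old - t_new) / t_old
print(t_old, t_new, f"{reduction:.0%}")  # prints: 50 15 70%
```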

On the other hand, because the invention writes the same data to two file systems at once, the doubled I/O can produce up to twice the current write overhead. This overhead can be reduced by applying a technique such as buffering to the method used to write data to the backup file system.
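One way the buffering idea could look is sketched below; the description only says "a function such as buffering", so this design and the class name `BufferedMirrorWriter` and `flush_threshold` parameter are assumptions. Every write is issued to the primary file at once, while writes to the mirror are accumulated in memory and issued as one larger write when the pending size reaches the threshold or when the writer is closed.

```python
class BufferedMirrorWriter:
    """Write each chunk to the primary file immediately; batch chunks for the mirror file."""

    def __init__(self, primary_path, mirror_path, flush_threshold=64 * 1024):
        self.primary = open(primary_path, "ab")
        self.mirror = open(mirror_path, "ab")
        self.flush_threshold = flush_threshold
        self.pending = []        # chunks not yet written to the mirror
        self.pending_bytes = 0
        self.mirror_writes = 0   # number of write calls actually issued to the mirror

    def write(self, data: bytes):
        self.primary.write(data)
        self.pending.append(data)
        self.pending_bytes += len(data)
        if self.pending_bytes >= self.flush_threshold:
            self.flush_mirror()

    def flush_mirror(self):
        """Issue all pending chunks to the mirror as a single write."""
        if self.pending:
            self.mirror.write(b"".join(self.pending))
            self.mirror.flush()
            self.mirror_writes += 1
            self.pending.clear()
            self.pending_bytes = 0

    def close(self):
        self.flush_mirror()
        self.primary.close()
        self.mirror.close()
```

The trade-off is freshness: until the next flush, the mirror lags the primary by whatever is still pending, so a node crash between flushes would leave those bytes only on the primary. The pending data would also need to be flushed before the mirror is promoted to service file system.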

The preferred embodiments of the present invention described above are disclosed for purposes of illustration. Those skilled in the art may improve, change, substitute, or add various other embodiments within the technical spirit and scope of the present invention as disclosed in the appended claims.

The file system mirroring apparatus and mirroring method of a high availability system according to the present invention, configured as described above, improve the operational reliability of service applications by recovering from data file system failures, allow disks to be configured efficiently, and shorten the failure response time so that requested services can be provided more quickly. It is therefore a very useful invention.

Claims

A file system mirroring apparatus in a high availability system consisting of a plurality of nodes, comprising:

a task manager which starts or stops tasks on the local node and monitors the operating state of those tasks;

an information manager which receives state information multicast from other nodes, inspects the received data, and stores it in a memory;

a node manager which collects state information of the local node and multicasts it to the other nodes; and

a file system manager which mirrors a file system designated as a service file system of a local task to a designated file system, and outputs the result of checking the service file system and the result of handling a failure of the service file system.

The apparatus of claim 1,

wherein the plurality of nodes are interconnected by a heartbeat for exchanging information.

The apparatus of claim 1,

wherein the file system manager comprises: file system failure checking means for checking the service file system of the local node to determine whether it has failed and storing the determination result in the memory;

file system failure management means for determining a handling method for the file system determined to have failed and storing in the memory the result of the failure handling performed according to the determined method;

file system mirroring means for continuously detecting file state changes in each file system, mirroring the file system, and storing the mirroring result in the memory; and

file system reporting means for receiving, through the memory, the processing results of the file system mirroring means, the file system failure checking means, and the file system failure management means, processing them, and outputting a result screen.

The apparatus of claim 3, wherein

the file system failure checking means comprises: a file system checking unit for continuously checking the current mount state of the file system;

a system signal reception/interpretation unit for checking file-system-related error signals transmitted from the operating system;

a failure determination unit for determining whether the file system has failed from the check results of the file system checking unit and the system signal reception/interpretation unit; and

a failure diagnosis output unit for storing the determination result of the failure determination unit in the memory.

The apparatus of claim 4, wherein

the file system failure management means comprises: a failure input unit for continuously checking, from the memory, whether the file system has failed;

a failure type/processing determination unit for determining, for a file system found by that check to have failed, the type of failure and the corresponding handling method;

a file system reset unit for resetting the file system to the mirroring file system according to the determined handling method; and

a failure processing output unit for storing in the memory the result of the failure handling performed according to the determined handling method.

The apparatus of claim 5, wherein

the file system mirroring means comprises: a file state detection unit for detecting changes in the state of the file system in real time;

a change file storage unit for storing the information transmitted from the file state detection unit;

a mirroring file storage unit for storing a list of files to be backed up;

a mirroring data storage unit for storing the data of the mirroring files;

a processing unit for controlling the overall file backup process; and

a mirroring result output unit for outputting the result of the backup performed by the processing unit.

The apparatus of claim 6, wherein

the file system reporting means comprises: an input unit for receiving, through the memory, the results produced by the file system mirroring means, the file system failure checking means, and the file system failure management means; and

a result screen output unit for processing the received results and outputting result screens.

A service method of a high availability system composed of a plurality of nodes, comprising:

a first step of performing a requested task based on the task state information and the node state information stored in the memory of each node;

a second step of, while the task is being performed, detecting file state changes in each file system that is a component of the task and mirroring the changed files to a designated file system;

a third step of determining whether the file system has failed and storing the determination result in the memory; and

a fourth step of handling the failure according to the failure type of the file system determined to have failed.

The method of claim 8,

wherein, in the fourth step, when the failure of the file system determined to have failed is of a type that can be resolved by switching to the mirroring file system, the file system is reset to the mirroring file system.

The method of claim 8,

wherein, in the fourth step, when the failure type of the file system determined to have failed is a disk cable failure, the task is handed over to another node.