Summary of the invention
In view of the above problems, the present invention aims to provide an integrated platform for IT system disaster recovery that can effectively control the business hosts in a geographically distributed system while maintaining data consistency and business continuity.
The integrated platform for IT system disaster recovery of the present invention effectively solves the problem of centrally managing the hosts in a geographically distributed IT system, exercises effective control over the business hosts, and ensures business continuity through flexible business process functions. The platform realizes real-time communication with each business host and unifies data recovery and business switchover within business processes.
The integrated platform for IT system disaster recovery of the present invention is used for centralized management of a geographically distributed IT system. The geographically distributed IT system comprises a plurality of local business hosts, a plurality of remote business hosts, and two management centers that manage the local business hosts and the remote business hosts. The integrated platform comprises:
a system management module, for monitoring and managing the geographically distributed IT system in real time so that the management centers can obtain information about each business host in real time;
a system communication module, deployed on each local business host, each remote business host, and the management centers, for realizing communication between each local business host and the management centers, between each remote business host and the management centers, and between the management centers themselves;
a data synchronization module, for realizing real-time data synchronization in the geographically distributed IT system;
a data comparison and analysis module, for verifying the consistency of the data in the geographically distributed IT system;
a data storage module, for realizing data storage in the geographically distributed IT system;
a business process module, for realizing the various business processes of the geographically distributed IT system;
a business recovery module, for taking over business processes at the remote site when a disaster occurs in the geographically distributed IT system;
a security audit module, for encrypting and decrypting the messages sent and received between the business hosts and the management centers.
Preferably, the system management module, the data synchronization module, the data comparison and analysis module, the data storage module, the business process module, the business recovery module, and the security audit module are all associated with the system communication module, and the data synchronization module, the data comparison and analysis module, the data storage module, and the business process module are all associated with the business recovery module.
Preferably, the system communication module is used to realize message sending and receiving, message parsing, command execution, and result feedback between the business hosts and the management centers.
Preferably, each management center and each unit module in each business host are connected to the Tuxedo/Q service through the WTC interface.
Preferably, the security audit module encrypts the messages sent and received between the management centers and the unit modules in the business hosts by using WSL access for the messages.
Preferably, the security audit module configures RSA encryption by using the -z parameter when messages are accessed through WSL.
Preferably, the data comparison and analysis module can locate and analyze data according to the differences found and remedy the data accordingly.
Preferably, the business recovery module comprises: a first submodule for starting the applications and database of the takeover system at the remote site; a second submodule for obtaining the time of the disaster switchover; and a third submodule for performing the network switchover.
Preferably, the data synchronization module is used to deploy mirrored storage in the local business system so as to realize synchronous data replication between local business systems.
Preferably, the data synchronization module is used to realize asynchronous data replication between the local mirrored storage and the remote storage.
Preferably, the data synchronization module is also used to overwrite the local database with the remote data after the local system resumes business.
Preferably, the two management centers are functionally identical and back each other up.
Preferably, the management centers use mirrored LDAP servers for authentication.
The technical problems mainly solved by the present invention are as follows: (1) how to centrally manage and control a geographically distributed system; (2) how to realize rapid business switchover in a geographically distributed system so as to guarantee business continuity when a disaster occurs; (3) how to automate business processing; (4) how to monitor the state of the controlled ends; (5) how to compare the consistency of the business databases at the two sites.
For technical problem (1), the technical means adopted is: control over all business hosts is realized through a message sending and receiving mechanism between the management centers and each controlled end, using the Tuxedo /Q reliable message queue.
For technical problem (2), the technical means adopted is: for each business system in operation, a corresponding remote switchover flow and switchback flow are established and combined with the data remedy module. The former guarantees business continuity when a disaster occurs, and the latter guarantees data integrity when a disaster occurs.
For technical problem (3), the technical means adopted is: because the management centers can manage and control all business hosts, the daily operation of the business systems can be automated by having the management center send the job instructions of a fixed flow. More specific business needs can additionally be met by an ad hoc flow defined by the business personnel themselves, which makes the approach flexible.
For technical problem (4), the technical means adopted is: the WSL service of the corresponding Tuxedo is deployed on each of the two management centers, and the WSNADDR environment variable is set on every controlled end. The value of the environment variable is the address (IP address:port number) of the WSL service published by the Tuxedo server, which the Tuxedo client program (the controlled-end application) uses to connect to the Tuxedo server. If the connection fails, the controlled end reconnects after an interval of 30 seconds. In addition, each controlled end periodically sends heartbeat messages to the management centers, and the management centers judge from them whether the state of the controlled end is normal.
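The heartbeat-based state judgement described above can be sketched as follows. This is only a minimal Python illustration and not the platform's Tuxedo implementation: the class name `HeartbeatMonitor`, the host identifiers, and the 90-second timeout are assumptions, since the text states only that heartbeats are sent periodically.

```python
import time


class HeartbeatMonitor:
    """Tracks the last heartbeat seen from each controlled end (hypothetical helper)."""

    def __init__(self, timeout_seconds=90.0):
        # Assumed threshold: the source does not say how long a management
        # center waits before treating a controlled end as abnormal.
        self.timeout_seconds = timeout_seconds
        self.last_seen = {}

    def record_heartbeat(self, host_id, now=None):
        """Store the arrival time of a heartbeat message from host_id."""
        self.last_seen[host_id] = time.time() if now is None else now

    def is_normal(self, host_id, now=None):
        """A controlled end is normal if a heartbeat arrived within the timeout."""
        now = time.time() if now is None else now
        seen = self.last_seen.get(host_id)
        return seen is not None and (now - seen) <= self.timeout_seconds

    def abnormal_hosts(self, now=None):
        """Return the sorted list of known hosts whose heartbeat is overdue."""
        return sorted(h for h in self.last_seen if not self.is_normal(h, now))
```

The optional `now` argument exists only so that the logic can be exercised with fixed timestamps; in normal use the wall clock is read.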
For technical problem (5), the technical means adopted is: through the data comparison module, any table or set of tables in the business databases at the two sites can be compared. Several comparison methods are available: 1] comparing the number of records in a table; 2] comparing certain fields of a table; 3] comparing tables using the MD5 algorithm. These comparison methods determine accurately whether the local and remote business databases are consistent and, if they are not, tell the user where they differ.
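The three comparison methods listed above can be illustrated with a short Python sketch. It uses in-memory SQLite databases as stand-ins for the two business databases; the table name `trades`, its columns, and all function names are hypothetical, and the way rows are serialized before hashing is an assumption rather than the method of the invention.

```python
import hashlib
import sqlite3

_MISSING = object()  # marks a key that exists at only one site


def row_count(conn, table):
    """Method 1: number of records in the table."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


def field_values(conn, table, key, field):
    """Method 2: map each key to the value of one selected field."""
    return dict(conn.execute(f"SELECT {key}, {field} FROM {table}").fetchall())


def table_md5(conn, table, key):
    """Method 3: MD5 digest over all rows, read in key order."""
    digest = hashlib.md5()
    for row in conn.execute(f"SELECT * FROM {table} ORDER BY {key}"):
        digest.update(repr(row).encode("utf-8"))
    return digest.hexdigest()


def compare_tables(local, remote, table, key, field):
    """Run the three comparisons and report the keys at which the sites differ.

    Table and column names are interpolated directly, so they are assumed
    to come from trusted configuration, not from user input.
    """
    lv = field_values(local, table, key, field)
    rv = field_values(remote, table, key, field)
    differing = sorted(
        k for k in lv.keys() | rv.keys()
        if lv.get(k, _MISSING) != rv.get(k, _MISSING)
    )
    return {
        "count_equal": row_count(local, table) == row_count(remote, table),
        "md5_equal": table_md5(local, table, key) == table_md5(remote, table, key),
        "differing_keys": differing,
    }


def make_demo_db(rows):
    """Demo only: build an in-memory database with a hypothetical trades table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE trades (id INTEGER PRIMARY KEY, amount INTEGER)")
    conn.executemany("INSERT INTO trades VALUES (?, ?)", rows)
    return conn
```

For example, `compare_tables(make_demo_db([(1, 100), (2, 200)]), make_demo_db([(1, 100), (2, 250)]), "trades", "id", "amount")` reports equal record counts, unequal digests, and key 2 as the point of difference. The record count and the digest are cheap checks for whether anything differs, while the field comparison locates the difference.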
In summary, the integrated platform for IT system disaster recovery of the present invention realizes real-time communication with each business host and unifies data recovery and business switchover within business processes. The present invention therefore provides an integrated platform for IT system disaster recovery that effectively combines hardware recovery, data recovery, and business recovery when a disaster occurs in a geographically distributed IT system.
Embodiment
The following describes some of the embodiments of the present invention and is intended to provide a basic understanding of the invention. It is not intended to identify key or critical elements of the invention or to limit the claimed scope.
Fig. 1 is a schematic structural diagram of a geographically distributed IT system centrally managed by the integrated platform for IT system disaster recovery of the present invention. As shown in Fig. 1, the geographically distributed IT system comprises a plurality of local business hosts (business host 1, business host 2, business host 3, ... at the local site), a plurality of remote business hosts (business host 4, business host 5, business host 6, ... at the remote site), and two management centers that manage the local business hosts and the remote business hosts. The local business hosts, the remote business hosts, and the two management centers are connected by communication lines. The two management centers have identical management functions and back each other up. The geographically distributed IT system includes all of the above business hosts and management centers. The "controlled end" referred to in the present invention is the module deployed on every business host. From the viewpoint of the integrated platform, every business host can thus accept instructions from the management centers and perform the corresponding operations (a host on which this unit module is deployed may therefore also be referred to simply as a "controlled end").
Fig. 2 is a schematic structural diagram of the integrated platform for IT system disaster recovery of the present invention.
As shown in Fig. 2, the integrated platform for IT system disaster recovery of the present invention comprises: a system management module 100 for monitoring and managing the geographically distributed IT system in real time so that the management centers can obtain information about each business host in real time; a system communication module 200 deployed on each local business host, each remote business host, and the management centers, for realizing communication between each local business host and the management centers, between each remote business host and the management centers, and between the management centers themselves; a data synchronization module 300 for realizing real-time data synchronization in the geographically distributed IT system; a data comparison and analysis module 400 for verifying the consistency of the data in the geographically distributed IT system; a data storage module 500 for realizing data storage in the geographically distributed IT system; a business process module 600 for realizing the various business processes of the geographically distributed IT system; a business recovery module 700 for taking over business processes at the remote site when a disaster occurs in the geographically distributed IT system; and a security audit module 800 for encrypting and decrypting the messages sent and received between the business hosts and the management centers.
The system management module 100, the data synchronization module 300, the data comparison and analysis module 400, the data storage module 500, the business process module 600, the business recovery module 700, and the security audit module 800 are all associated with the system communication module 200. The data synchronization module 300, the data comparison and analysis module 400, the data storage module 500, and the business process module 600 are all associated with the business recovery module 700.
The system management module 100 monitors and manages the geographically distributed IT system in real time. Through the reliable message communication mechanism between the associated unit modules and the management centers, it enables the management centers to obtain information about each business host in real time.
The system communication module 200 is deployed on each unit module within the geographically distributed IT system and on the management centers. It handles real-time message sending and receiving, message parsing, command execution, and result feedback, and is the foundation for realizing IT system disaster recovery. The management centers and the unit modules are connected to the Tuxedo/Q service through the WTC interface. WTC, whose full name is WebLogic Tuxedo Connector, is BEA's tool for connecting its web application server product WebLogic with its middleware product Tuxedo, and it gives WebLogic and Tuxedo two-way access to each other. Tuxedo is likewise a BEA middleware product. Its Tuxedo/Q component provides reliable queuing: queued messages can be stored on a persistent medium such as disk, or on a non-persistent medium such as memory, for later use. In the present invention, the management centers are deployed with the WebLogic platform and Java applications, and each unit module (that is, each controlled end) is deployed with a Tuxedo /Q reliable message queue, which is used to receive the commands sent by the management center, execute them, and store the result messages in a response queue. The communication between WebLogic at the management centers and Tuxedo on each unit module uses the WTC interface.
The management center sends command messages and receives the result messages returned after execution. Each unit module receives the command messages from the management center and sends back the result messages. When a command takes an exceptionally long time to execute or the network fails, Tuxedo/Q provides a reliable message service that guarantees the integrity of message delivery. Such a mechanism also provides an asynchronous execution method that is more flexible and more reliable than tpacall(), which meets the needs of a geographically distributed system. By adopting Tuxedo/Q between the management centers and the controlled ends, the present invention can therefore maintain continuous centralized management and control of the geographically distributed system.
The data synchronization module 300, the data comparison and analysis module 400, and the data storage module 500 together guarantee data consistency in the geographically distributed system. The data synchronization module 300 and the data storage module 500 realize real-time synchronization of the business system data distributed across sites. The data comparison and analysis module 400 verifies the consistency of the business system data distributed across sites, and can also locate and analyze data according to the differences found and remedy the data accordingly. The data synchronization module 300, the data comparison and analysis module 400, and the data storage module 500 are the foundation of the business recovery module 700.
The business process module 600 is used to realize the various business processes of the geographically distributed system. Flow information is built from basic elements such as flows, steps, functions, and combined functions, is defined in the database in the form of ordered functional steps, and can be customized through scripts. The management center program reads the flow information, interprets it, and executes it, thereby carrying out the business process functions and realizing the fixed routine work flows of the system. These business processes are referred to as fixed flows. In addition, to handle temporary system requirements such as equipment replacement, line maintenance, and troubleshooting, a series of necessary business functions must be executed on demand. This gives rise to the ad hoc flow function, which supports selecting and executing specific functions that have already been defined. It supplements the fixed business flows and is a very flexible way of controlling the business systems.
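The idea of defining a flow as ordered functional steps that the management center program reads and interprets can be sketched as follows. This is a minimal Python illustration under stated assumptions: the `Step` structure, the `run_flow` function, and the function names used in the example are hypothetical, and the steps are held in memory rather than read from a database. The convention that a return value of 0 means success is the one the description gives for function processing scripts.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One ordered step of a flow: a function name plus its parameters."""
    order: int
    function: str
    params: dict = field(default_factory=dict)


def run_flow(steps, functions):
    """Execute steps in ascending order and stop at the first failure.

    `functions` maps a function name to a callable that returns 0 on
    success and a non-zero value on failure. Returns a list of
    (function name, return code) pairs for the steps that were attempted.
    """
    results = []
    for step in sorted(steps, key=lambda s: s.order):
        code = functions[step.function](**step.params)
        results.append((step.function, code))
        if code != 0:
            break  # later steps are not run after a failed step
    return results
```

Under this sketch a fixed flow is a stored list of `Step` records, while an ad hoc flow is simply a list of steps that an operator selects from the already defined functions at run time. Stopping at the first failed step is one possible policy and is an assumption here.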
The business recovery module 700 builds on the guarantee of data consistency and is embodied through the business process module. It ensures that when abnormal conditions such as a disaster occur in the geographically distributed system, business processes can be taken over quickly at the remote site, guaranteeing the continuity of data and business. The business recovery module 700 comprises the following submodules: a first submodule for starting the business system applications and database at the remote site; a second submodule for obtaining the time of the disaster switchover; and a third submodule for performing the network switchover. For example, if the transfer business system at the Shanghai center fails, business must be switched to the Beijing center immediately. The first submodule of the business recovery module 700 then starts the applications and database of the takeover system at the remote site and judges whether the conditions for switchover are met, the second submodule obtains the time point of the disaster switchover (for use in the subsequent data remedy), and the third submodule performs the network switchover and related work.
If these flows are decomposed, each step is in essence a control operation on a particular business host, which automatically carries out the operation instruction sent by the management center. Once the flow has finished, the change is entirely transparent to users, while the actual place of transaction processing has moved from Shanghai to Beijing, which guarantees business continuity. After the business system at the Shanghai center recovers, a corresponding business switchback flow is executed, after which transactions are again sent to Shanghai for processing as normal. The data remedy flow then brings the transactions that were processed in Beijing during the switchover period back to the Shanghai center.
The security audit module 800 avoids plaintext transmission of messages between the management centers and the unit modules by adding an encryption setting to the messages: RSA encryption is configured by using the -z parameter when messages are accessed through WSL. Messages are both sent and received in encrypted form and are decrypted automatically after receipt, which ensures the security of data transmission. In addition, the management centers use a unified LDAP (Lightweight Directory Access Protocol) server for authentication. Operator permission settings are likewise taken from the LDAP server, and the relevant authorization is checked before any function is carried out. Only authorized users can execute the functions of the business processes. The security audit module 800 also records and audits login information, operation logs, and flow executions.
When the integrated platform for IT system disaster recovery shown in Fig. 2 centrally manages the geographically distributed IT system shown in Fig. 1, a controlled end is deployed on each local and remote business host, and a management center is deployed at each of the local and remote sites. Each management center communicates with all local and remote business hosts, so as to achieve system management, business realization, and recovery.
Fig. 3 is a schematic diagram of the data storage and data synchronization processing performed by the integrated platform for IT system disaster recovery of the present invention. As shown in Fig. 3, a set of mirrored storage is deployed in the local business system, which realizes synchronous data replication between the local primary business systems, and this replication is bidirectional. Asynchronous data replication is realized between the local mirrored storage and the remote storage, and this replication is unidirectional. This data synchronization mechanism ensures that when a disaster strikes the local business system or its data, business can be recovered quickly at the remote site without data loss. After the local system resumes business, the remote data can be written back over the local database.
About " data covering ", can understand like this, for example, local service system (Shanghai) continues an example of mentioning above when describing " business recovery module 700 ": when need to be switched Beijing, can record the time point T0 switching, after switching, all transaction reality has been transformed into center, Beijing and has processed.After center, Shanghai recovery business, can carry out the switchback flow process of corresponding operation system, also can record the time point T1 of switching simultaneously, follow-uply will remedy flow process by executing data, because time difference of T1-T0 is exactly that transaction is in the time period of Beijing center processing.So remedying flow process, data now can start, Beijing administrative center can send instruction, from the transaction data base at center, Beijing, read data (namely strange land data) during this period of time, fiber optic network by this segment data by Beijing to Shanghai passes to Shanghai administrative center, and then Shanghai administrative center can be inserted into these data in corresponding Service Database.Like this, no matter for operation system or user, transaction data is all complete, just as not switching.
Fig. 4 shows the message sending and receiving flow of a unit module (that is, a controlled end) managed by the integrated platform for IT system disaster recovery of the present invention, which is the detailed flow carried out by the system communication module 200. As shown in Fig. 4, in a unit module the process is first initialized, memory is allocated, and a linked list with a head node is created. The WSL service of the corresponding Tuxedo is deployed on the server of each of the two management centers, and the WSNADDR environment variable is set on every client server. The value of the environment variable is the address (IP address:port number) of the WSL service published by the Tuxedo server, which the Tuxedo client program (the controlled-end application) uses to connect to the Tuxedo server. If the connection fails, the client reconnects after an interval of 30 seconds.
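The reconnect behaviour described above can be sketched in Python as follows. This is not Tuxedo client code: `connect` stands for whatever call actually opens the session, and the function names, the `max_attempts` limit, and the tolerance for a leading `//` in the address are assumptions. Only the 30-second interval and the IP address:port form of the address come from the description.

```python
import time


def parse_wsnaddr(value):
    """Split an 'ip:port' address string into (host, port)."""
    if value.startswith("//"):  # assumption: tolerate an optional '//' prefix
        value = value[2:]
    host, _, port = value.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"invalid address: {value!r}")
    return host, int(port)


def connect_with_retry(connect, address, interval=30.0, max_attempts=None,
                       sleep=time.sleep):
    """Call connect(address) until it succeeds, waiting `interval` seconds after each failure.

    `connect` is a caller-supplied callable that raises OSError on failure.
    With max_attempts=None the loop retries indefinitely.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return connect(address)
        except OSError:
            if max_attempts is not None and attempts >= max_attempts:
                raise
            sleep(interval)
```

In use, the address string would be read from the WSNADDR environment variable, for example with `os.environ["WSNADDR"]`, and passed through `parse_wsnaddr` before connecting. The `sleep` parameter is injectable so that the retry logic can be exercised without real delays.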
The linked list is mainly used to store the state information of the execution processes that are currently running. Each node contains the process number, the message function number, the unique message identifier, the set of parameter values, the start time of the process, and a flag indicating whether the node is available (0 means available, 1 means unavailable). After an execution process finishes, the main process clears the information in the node of the linked list that corresponds to that execution process and sets the availability flag to 0 so that the node can be reused.
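The node layout and the reuse of cleared nodes can be sketched as follows. This is a minimal Python illustration of a singly linked list with a head node. The field names and the `ProcessList` class are hypothetical, and reusing the first available node before appending a new one is an assumption about how nodes are recycled.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    pid: int = 0                 # process number
    func_id: int = 0             # message function number
    msg_id: str = ""             # unique message identifier
    params: list = field(default_factory=list)  # set of parameter values
    start_time: float = 0.0      # start time of the execution process
    in_use: int = 0              # 0 = available, 1 = unavailable
    next: Optional["Node"] = None


class ProcessList:
    def __init__(self):
        self.head = Node()       # head node carries no process data

    def acquire(self, pid, func_id, msg_id, params, now=None):
        """Fill the first available node, or append a new one, and mark it unavailable."""
        prev, node = self.head, self.head.next
        while node is not None and node.in_use == 1:
            prev, node = node, node.next
        if node is None:
            node = Node()
            prev.next = node
        node.pid, node.func_id, node.msg_id = pid, func_id, msg_id
        node.params = list(params)
        node.start_time = time.time() if now is None else now
        node.in_use = 1
        return node

    def release(self, pid):
        """Clear the node of a finished execution process and mark it available."""
        node = self.head.next
        while node is not None:
            if node.in_use == 1 and node.pid == pid:
                node.pid, node.func_id, node.msg_id = 0, 0, ""
                node.params, node.start_time, node.in_use = [], 0.0, 0
                return True
            node = node.next
        return False

    def running(self):
        """Process numbers of all nodes currently marked unavailable, in list order."""
        pids, node = [], self.head.next
        while node is not None:
            if node.in_use == 1:
                pids.append(node.pid)
            node = node.next
        return pids
```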
Judging the validity of a message mainly consists of validating the values (value) of the fields in the command message, such as the application system identifier (system), the function number (func_id), the IP address (ip), the time (time), and the message type (type).
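A field-by-field validity check of this kind might look like the following Python sketch. The field names `system`, `func_id`, `ip`, `time`, and `type` are taken from the description, but every rule applied to them is an assumption: the sets of known systems and message types, the requirement that the function number be a positive integer, and the timestamp format are hypothetical.

```python
import ipaddress
from datetime import datetime

KNOWN_SYSTEMS = {"transfer"}             # hypothetical application system identifiers
KNOWN_TYPES = {"command", "interrupt"}   # hypothetical message types
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"        # assumed timestamp format


def validate_message(msg):
    """Return the names of the invalid fields; an empty list means the message is valid."""
    problems = []
    if msg.get("system") not in KNOWN_SYSTEMS:
        problems.append("system")
    func_id = msg.get("func_id")
    if not isinstance(func_id, int) or isinstance(func_id, bool) or func_id <= 0:
        problems.append("func_id")
    try:
        ipaddress.ip_address(msg.get("ip", ""))
    except ValueError:
        problems.append("ip")
    try:
        datetime.strptime(msg.get("time", ""), TIME_FORMAT)
    except (TypeError, ValueError):
        problems.append("time")
    if msg.get("type") not in KNOWN_TYPES:
        problems.append("type")
    return problems
```

Returning the list of failing fields rather than a single boolean makes it easier to log why a message was rejected.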
When a function processing script performs a functional operation, it judges, according to the situation, whether the operation should be carried out and how, so as to avoid erroneous or unnecessary operations, and returns a corresponding value. A return value of 0 indicates that the operation succeeded, and a non-zero value indicates failure.
When the message received is an interrupt message, the main process sends an interrupt signal to the corresponding execution process. On receiving the interrupt signal, the execution process stops its loop and performs no further operations.
In Fig. 4, the left part is the execution flow of the main process. The main process is a loop that mainly sends heartbeat messages, receives messages and judges their validity, spawns the corresponding execution process according to the message content, and manages the execution processes that are running.
Management by the integrated platform for IT system disaster recovery of the present invention, as described above, provides a fast, simple, and effective disaster recovery mechanism. The design targets are RPO = 0 and RTO = 0, and when an actual disaster occurs, continued business service can be provided within the shortest possible time. In disaster recovery, the industry currently recognizes three targets that must be worked toward. The first is recovery time: how long the enterprise can tolerate being without IT and out of operation. The second is how long the network takes to recover. The third is the recovery of the business layer. Across the whole recovery process, the two most critical metrics are RTO and RPO. RTO (Recovery Time Objective) is the period between two points: the moment after a disaster when an IT system outage brings business to a halt, and the moment when the IT system has recovered enough to support the operation of all departments and business resumes. RPO (Recovery Point Objective) concerns the system and application data: it is the degree of currency to which the system and production data must be restored in order to support the business operation of all departments. That degree of currency may be the backup data of the previous week, or it may be the real-time data of the last transaction. The management provided by the integrated platform of the present invention can thus deliver continued business service within the shortest possible time when a disaster occurs.
Furthermore, according to the integrated platform for IT system disaster recovery of the present invention, each controlled end (that is, each unit module) automatically maintains its connection with the server side and is robust enough to run reliably.
Furthermore, according to the integrated platform for IT system disaster recovery of the present invention, the running state of all deployed controlled ends (that is, the unit modules) can be monitored effectively, the running state of the various business flows can be monitored effectively, and a management and maintenance facility is provided for configurable parameters.
Furthermore, according to the integrated platform for IT system disaster recovery of the present invention, business flows can be configured and combined flexibly. For example, general changes to business functions can be handled through parameterized configuration, and exception handling is provided within the flow to deal effectively with errors that occur during flow execution.
Furthermore, to meet the RTO and RPO performance requirements of the geographically distributed system, the integrated platform for IT system disaster recovery of the present invention designs and implements the various business flows so as to control the flows executed in daily, planned, and disaster scenarios. The realization of fixed business flows and ad hoc function flows is the core function that the disaster recovery platform provides for the application systems. To realize the business flows effectively, flow information is built from basic elements such as flows, steps, functions, and combined functions, is defined in the database in the form of ordered functional steps, and can be customized through scripts. The management center program reads the flow information, interprets it, and executes it, thereby carrying out the business process functions. These business processes are referred to as fixed flows. In addition, to handle temporary business system requirements such as equipment replacement, line maintenance, and troubleshooting, a series of necessary business functions must be executed on demand.
The above examples mainly describe the integrated platform for IT system disaster recovery of the present invention. Although only some specific embodiments of the present invention have been described, those of ordinary skill in the art should understand that the present invention may be implemented in many other forms without departing from its spirit and scope. The examples and embodiments shown are therefore to be regarded as illustrative and not restrictive, and the present invention may cover various modifications and substitutions without departing from the spirit and scope of the invention as defined by the appended claims.