A FAULT PROTECTION SCHEME
Field of the Invention
The present invention relates to an improved 1:N fault protection scheme.
Background to the Invention
Fault tolerance is the ability of a system to continue correct performance of its tasks after the occurrence of hardware or software faults. The physical replication of hardware is perhaps the most common form of fault tolerance used in systems. In an active approach to hardware redundancy, fault tolerance is achieved by detecting the existence of faults and performing some action to remove faulty hardware from the system. In other words, active techniques use fault detection, fault location, and fault recovery in an attempt to achieve fault tolerance. An example of an active approach to hardware redundancy is 1:1 standby sparing, in which one unit operates as a spare to replace a primary unit when it fails. An extension to this scheme is 1:N standby sparing, where one unit operates as a spare to replace any one unit in a group of N similar units when that unit fails. A 1:N fault protection scheme can be a more cost-effective solution since it does not require the same level of redundancy as a 1:1 fault protection scheme. This is particularly so in cases where the cost and/or dimensions of standby units are critical. However, a 1:N protection scheme does require a mechanism for appropriately connecting the standby unit following the failure of any one of the primary units, which inevitably introduces its own reliability issues.
Summary of the Invention
According to a first aspect of the present invention, a fault protection unit comprises an array of nxm switches, where n and m ≥ 2, each switch being arranged to couple a respective signal received at an input port of the switch to an output port of the switch, wherein a normally unused output port of a first switch in the array is coupled to a normally unused input port of a second switch in the array so that a change in switch state of one switch completes a signal protection path through the other switch for the respective signal received at an input port of the one switch.
Preferably, each switch in the array is a 2x2 switch. The switches would normally be in one state, for example the cross state, and one or more switches are switched to the bar state to complete a signal protection path.
Preferably, the switches are optical switches. More preferably, the switches are opto-mechanical switching devices, in which the switch state of the device is controlled by the energisation of a relay coil within each device.
In one example, the signal protection path for a signal received at one switch is completed by the operation of only that switch. However, dependent on the implementation of the fault protection scheme, the signal protection path for a signal received at one switch may need to be completed by the operation of that switch and one other switch in the array.
According to a second aspect of the present invention, a device comprises N working units and a fault protection unit, the fault protection unit including an array of N nxm switches, where n and m ≥ 2, each switch having an input port that receives a signal and couples the signal to a respective working unit via an output port of the switch, wherein a normally unused output port of one switch in the array is coupled to a normally unused input port of another switch in the array so that a change in switch state of one switch completes a signal protection path for the signal received at its input port through said other switch.
Preferably, the device comprises a standby protection unit, wherein the signal protection path is coupled to the protection unit. A change in switch state of a switch is made in response to a detected failure of the respective working unit so that the standby protection unit can replace the faulty working unit.
Preferably, the working units are optical transponders.
Preferably, the device is an optical node for a communications network. More preferably, the optical node comprises an optical add/drop multiplexer (OADM).
According to a third aspect of the present invention, a method of providing fault tolerance in a system having N working units, where N ≥ 2, comprises the step of establishing a fault protection path when a working unit fails in order to re-route a signal normally coupled to the failed working unit to a protection unit by changing the switch state of one or more switches in a concatenated array of switches so that the signal propagates along a path connecting normally unused ports of switches between the faulty working unit and the protection unit.
Preferably, the switches form an Nx1 array, each switch being associated with a respective working unit and being arranged to couple a signal to the working unit through the switch in normal operation.
Brief Description of the Drawings
Examples of the present invention will now be described in detail with reference to the accompanying drawings, in which:
Figure 1 shows an example of an optical routing node;
Figure 2 shows an example of the general form of a 1:N transponder protection scheme in accordance with the present invention;
Figures 3 and 4 illustrate the operation of a transponder protection unit in accordance with the present invention;
Figure 5 shows the switching mechanism for an optical switch in the transponder protection unit shown in Figures 3 and 4; and,
Figures 6 and 7 show another example of a transponder protection unit in accordance with the present invention.
Detailed Description
Figure 1 illustrates an example of an optical routing node 10. Traffic arriving at a number of input fibres 11 can be routed to a number of output fibres 12: this is referred to as an optical cross-connect (OXC) function 13. Furthermore, the optical routing node 10 incorporates an optical add/drop multiplexing (OADM) function 14 that provides flexible traffic management at the interface 15 with local clients 16. The OADM enables a variety of protocols to be transported across a network in a transparent manner.
In operation, some traffic arriving at the input fibres 11 of the optical routing node 10 will traverse the node through the OXC 13 without being dropped to local clients 16. This is termed "through traffic". The remainder is directed through the OADM 14, as "drop traffic". Traffic that originates from a local client 16 is termed "add traffic", and is routed through the OADM 14 to the output fibres 12 of the OXC 13.
A number of optical transponders 17 are provided at the interface 15 of the OADM 14 of the optical routing node 10. These transponders 17 may either be fixed-wavelength or tuneable, depending upon requirements. The transponders 17 provide a gateway between the core network and the clients requiring access to it. They ensure that the data rates, data format, power levels and wavelengths of the client signals are groomed appropriately for transport through the network. This is achieved through optical-electrical-optical conversion. As will be described below, it is desirable to provide a transponder protection scheme to switch client traffic to a protection path (not shown) when a working transponder fails.
A transponder failure may occur when any one or more of its constituent components cease to function correctly. Any behavioural change in a component that takes one of the transponder performance parameters outside defined boundaries constitutes a unit failure. Component failures may generally be grouped into two major categories: gradual degradation and sudden failure. An example of the former is the reduction in the optical power output of a laser for a fixed drive current, over its lifetime. An example of the latter is the spontaneous short-circuit failure of a transistor.
Figure 2 shows an example of the general form of a 1:N transponder protection scheme 20 in accordance with the present invention. In this example, there are four working transponders (not shown) each having an input port 21 and an output port 22, respectively, connected to a transponder protection unit 23. A single protection transponder (not shown), also having an input port 24 and an output port 25, is provided for each of the four working transponders, to (or from) which client traffic can be directed by the transponder protection unit 23 in the event of a detected failure of one of the working transponders. The transponder protection unit 23 provides the necessary local control to implement the protection scheme, via the detection of local status alarms.
As shown, the working transponders are each connected to a respective 2x2 optical switch 26 in an array. Preferably, the switches are opto-mechanical since these devices are proven technology with high reliability. Furthermore, this type of optical switch 26 can have a low insertion loss of only around 0.2 dB. The normal state of the optical switches is the "cross" state, with the unused optical output port of each switch 26 connected to the unused optical input port of the adjacent switch. An unused optical port of the fourth switch 26₄ in each set is connected to the respective port of the protection transponder.
Figures 3 and 4 illustrate the operation of the transponder protection unit 20 in more detail. In simple terms, the detected failure of one of the working transponders 21₂ causes the optical switch 26₂ associated with the failed transponder to operate, changing from "cross" to "bar", thereby providing an optical protection path 27 to the protection transponder 24, through a concatenated chain of any adjacent switches 26₃ and 26₄ in the array.
Operation of an optical switch to reconfigure the optical path can be triggered by one of a number of local status alarms indicating some form of failure. The switch would then remain in that state until the associated working transponder had been repaired or replaced.
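The routing behaviour of the concatenated chain can be illustrated with a small model. The following is a minimal sketch only, assuming a particular port mapping for each 2x2 switch that is not taken from the figures; the names `State` and `trace` are hypothetical and introduced purely for illustration.

```python
from enum import Enum


class State(Enum):
    CROSS = "cross"  # normal state
    BAR = "bar"      # state after the associated transponder has failed


def trace(states, client):
    """Follow the signal entering switch `client` (0-based) to its destination.

    Assumed port mapping (an illustration, not taken from the figures):
      cross: client input -> working transponder, chain input -> chain output
      bar:   client input -> chain output,        chain input -> working transponder
    The chain output of the last switch feeds the protection transponder.
    """
    on_chain = False  # True once the signal travels on the normally unused ports
    for k in range(client, len(states)):
        is_bar = states[k] is State.BAR
        # The signal leaves on the chain output when exactly one of
        # "switch is in bar state" and "signal arrived on the chain input" holds.
        leaves_on_chain = is_bar != on_chain
        if not leaves_on_chain:
            return f"working-{k + 1}"
        on_chain = True
    return "protection"


states = [State.CROSS] * 4      # four switches, all in the normal state
normal = trace(states, 1)       # second client reaches its own transponder
states[1] = State.BAR           # second working transponder fails
protected = trace(states, 1)    # second client now reaches the protection unit
```

In this model only the switch associated with the failed transponder changes state; the downstream switches stay in the "cross" state and simply pass the signal along their normally unused ports, while traffic for the other working transponders is unaffected.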
Figure 5 shows a simplified schematic illustrating how an optical switch 31 in a transponder protection unit 30 can be triggered in the event of a detected failure of the associated working transponder 32. For simplicity, only two working transponders are shown.
Each transponder 32 includes a dedicated microprocessor 33 that controls the function of the transponder and is arranged to monitor its status. In the event of a detected failure of the transponder, the microprocessor 33, implementing a predetermined logic function, generates a control signal SWCTRL that is coupled via an AND gate 36 to an input 37 of the associated optical switch 31 in the transponder protection unit 30. A change in the control signal SWCTRL causes the relay coil (not shown) within the optical switch 31 to energise, thereby changing the state of the switch from "cross" to "bar". In addition, a microprocessor 34 on board the transponder protection unit 30 receives the control signal SWCTRL and, implementing a predetermined logic function, generates a number of inhibit signals, each coupled to the AND gate associated with a respective one of the other optical switches 31 in the array. With these signals lowered, the transponder protection unit 30 inhibits the operation of the other switches in the array to prevent further switching should one of the remaining working transponders fail. In addition, a control signal PTCTRL is directed to the protection transponder 35, enabling it to emulate the status of the failed working transponder. This signal is generated on the basis of stored information, which may be periodically updated, defining operational parameters of each of the working transponders, for example the wavelength of the optical output required.
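The gating of SWCTRL by the inhibit signals can be sketched in software as follows. This is an illustrative model only: the class name `ProtectionUnitLogic`, the tie-break by lowest index, and the release of the inhibit lines once the failed transponder's SWCTRL is de-asserted are all assumptions, since the predetermined logic function itself is not specified.

```python
class ProtectionUnitLogic:
    """Models the AND-gate inhibit scheme for n switches (hypothetical sketch)."""

    def __init__(self, n):
        self.n = n
        self.active = None  # index of the switch currently holding the protection path

    def update(self, swctrl):
        """Take one SWCTRL level per transponder, return one drive level per relay coil."""
        # Assumption: the latch is released when the failed unit's SWCTRL is de-asserted,
        # i.e. after the working transponder has been repaired or replaced.
        if self.active is not None and not swctrl[self.active]:
            self.active = None
        # Assumption: if several failures are first seen together, the lowest index wins.
        if self.active is None:
            self.active = next((i for i, s in enumerate(swctrl) if s), None)
        # Inhibit lines are active low: False means the switch is inhibited.
        enable = [self.active is None or i == self.active for i in range(self.n)]
        # Each relay coil is driven by SWCTRL AND its inhibit line.
        return [s and e for s, e in zip(swctrl, enable)]


logic = ProtectionUnitLogic(4)
first = logic.update([False, True, False, False])    # second transponder fails
second = logic.update([False, True, False, True])    # fourth fails while path is in use
third = logic.update([False, False, False, True])    # second repaired, fourth still failed
```

Under these assumptions the second failure is ignored while the protection transponder is in use, and is served only once the first failed transponder has been restored.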
An advantage of this local approach to fault protection is that it is extremely fast since it relies on local communication between devices rather than utilising a solution implemented at a higher level in management software. In high bit rate data communications, even relatively short delays can cause a loss of client traffic, which is clearly undesirable.
Figures 6 and 7 show an extension to the architecture of the transponder protection scheme shown in Figures 3 and 4. This architecture allows the use of any one of five working transponders 40 as a protection transponder 41. As shown in Figure 7, this protection scheme requires two optical switches 42 to operate, rather than one, in the event of a transponder failure. One of the working transponders 40₁ is pre-designated as the protection transponder 41 by node management software. The traffic throughput of each transponder is prioritised according to the class of service required by each client and, in the event of a failure, the transponder dealing with the lowest priority traffic behaves as the protection transponder. The appropriate pair of switches 42₁ and 42₄ are operated and traffic associated with the failed transponder 40₄ is then protected at the expense of the traffic previously carried by what is now functioning as the protection transponder 41.
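The selection of the protection transponder and of the pair of switches to operate can be sketched as follows. This is a hedged illustration, not the node management software itself: the function name `plan_protection`, the numeric priority scale (larger means more important) and the choice to take no action when the failed transponder is itself the lowest-priority one are assumptions.

```python
def plan_protection(priorities, failed):
    """Pick the transponder to pre-empt and the two switches to operate.

    priorities: dict mapping transponder number -> traffic priority
                (assumed scale: larger value = more important traffic).
    failed:     number of the transponder that has failed.
    Returns (protector, (switch_a, switch_b)), or None when the failed
    transponder already carries the lowest-priority traffic (assumption).
    """
    lowest = min(priorities, key=priorities.get)
    if lowest == failed:
        return None
    # One switch is associated with the failed transponder, the other with
    # the transponder that now behaves as the protection transponder.
    return lowest, tuple(sorted((lowest, failed)))


# Five working transponders; number 1 carries the lowest-priority traffic.
priorities = {1: 0, 2: 3, 3: 2, 4: 5, 5: 4}
plan = plan_protection(priorities, failed=4)
```

With these example priorities, a failure of the fourth transponder selects the first transponder as the protection transponder and identifies the first and fourth switches as the pair to operate.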