US20140355453A1

US20140355453A1 - Method and arrangement for fault analysis in a multi-layer network

Info

Publication number: US20140355453A1
Application number: US14/367,735
Authority: US
Inventors: Ying Zhang; Neda Beheshti-Zavareh; Alisa Devlic; Catalin MEIROSU
Original assignee: Telefonaktiebolaget LM Ericsson AB
Current assignee: Telefonaktiebolaget LM Ericsson AB
Priority date: 2011-12-21
Filing date: 2012-01-16
Publication date: 2014-12-04
Also published as: US20130192469A1; WO2013095247A1; EP2795841B1; US9259671B2; EP2795841A1; EP2795841A4

Abstract

The invention relates to a method for fault analysis in a multi-layer network. The method includes receiving a first set of performance measurements occurring in response to the fault and representing at least a first and a second layer in a multi-layer communications network. A probable set of network elements affected by the fault are localized. The type of fault is inferred from one or more symptoms in the performance measurements. The root cause of the failure is determined from a combination of information on probable set of network elements and inferred type of fault.

The invention also relates to a network management node for carrying out the method. The node includes a root cause analyzer to detect affected network elements and fault type upon network failure, wherein affected network elements are determined in a spatial localization unit and the fault type is determined in a fault type inference unit.

Description

TECHNICAL FIELD

The present invention relates to a method and a network node for diagnosing one or more faults in a multi-layer communications network.

BACKGROUND

Fast failure detection and failure diagnosis is an important area in network management. After a failure is detected and data switched to alternative paths, there is a need to quickly localize the failure so that measures may be taken to replace or repair the faulty network element. A communications network involves a large set of distributed hardware and software components. Errors may occur in each device in the network despite best practices in design, implementation, and testing. Root causes can also be affected by external factors.
The multi-layer network comprises network elements in several network protocol layers. An Ethernet E-line service on top of a packet-optical integrated network is an example of a network service provided in a multi-layer network, but other large-scale distribution networks may also be configured as multi-layer networks. Fault localization in multi-layer networks is generally difficult. Path-trace capabilities, such as IP trace-route, are not available in the optical layer.
In multi-layer networks, network elements with measurement capability in the network layers generate measurement records on network traffic. These records may be collected and used for statistical and/or reporting purposes. When faults are experienced in a multi-layer network, the collected information is used to detect failures and performance degradations that exist in the network. Typically, the information is gathered in a network management system NMS that handles fault management and performance management.
Known fault localization methods aim at finding a correlation between a network fault and one or more fault carrying events, i.e., events occurring in response to the network fault. This is usually a difficult task due to the relatively large number of fault carrying events caused by a network fault. Events may in this context be the issuance of messages informing about something happening in the network, such as the occurrence of an alarm, a performance parameter increment, or a service/action request made by one node to itself or to another node.
A disadvantage of these methods is that they are limited in the scope of correlation of fault carrying events. In a complex communication system, processing of fault carrying events exclusively may not be sufficient to succeed in fault localization. The fault carrying events may be an effect of a fault but may also occur as a result of symptoms of the real fault. These symptoms may be localized far away from the actual location of the fault.
There are existing techniques to search for dependencies between fault carrying event(s) and the problem causing the fault carrying events, within one or more different subsystems. Dependencies between for instance an alarm in a certain subsystem and the cause of the problem, if the cause resides in a different subsystem, are thus not considered. Network performance monitoring, automated failure localization and diagnosis are critical to service providers of large distribution networks, due to the increases in scale, diversity and complexity of the application services. There are significant network management challenges that arise from the combination of rapid growth and increased complexity of the network. Failure and performance degradation diagnosis is an important area in network management.
The complexity and prevalence of communication networks, systems, and associated services have increased over the past several years. Many new applications are increasingly complex in terms of the resources required to operate and deliver the applications, the application functions, and storage architecture, for example. The resources necessary to conceive, develop, activate, and eventually to provide increasingly complex applications continue to increase. In addition to the increasing complexity of applications and services, there is increased demand for applications and services that traverse various network technologies and systems.
From a network management standpoint, these various networks and network devices often report operational information in different ways. For example, the networks and network devices may employ particular network management approaches and technologies for monitoring operation of the network system, and network management personnel associated with particular networks and network devices may rely upon specific, and varied, network management systems and methods. Furthermore, modern networks increasingly rely upon third party vendors to provide hardware and/or software for offered services. These hardware and software devices frequently operate and report according to systems, methods, and even protocols that are not the same as the network providing the services.
None of the techniques for localizing faults and failure diagnosis discussed above are applicable in a large scale multi-layer network. Thus, there is a need for a method to support diagnosis of failure and performance degradation instances in a multi-layer network.

SUMMARY

It is an object of the present invention to provide such a method for diagnosis of failure in a multi-layer network. This object is achieved through a method for root cause diagnosis of network failures and performance degradations in a multi-layer network, the method identifying the network elements that are the source of the failure as well as the type of fault in these elements.
In an embodiment of a method according to the invention, a first set of performance measurements occurring in response to the fault and representing at least a first and a second layer in a multi-layer communications network are received in the network management system. The fault is localized by identifying a probable set of network elements affected by the fault. A type of fault is inferred from one or more symptoms in the first set of performance measurements. The root cause of the failure is identified from a combination of the information on probable set of network elements and inferred type of fault.
In another embodiment of a method according to the invention, the root cause analysis is performed in a pre-generated fault type inference graph.
In a further embodiment of a method according to the invention, additional measurements, a second set of measurements, are triggered when the analysis of localization of a fault and/or the fault type is non-conclusive. The steps of localizing the fault by identifying a probable set of network elements affected by the fault, inferring the type of fault from one or more symptoms incurred from one or more performance measurements, and determining the root cause by combining the information on the probable set of network elements with the inferred type of fault are repeated for the additional measurements.
It is another object of the invention to provide an arrangement in a network management node in a multi-layer network for carrying out the inventive method. The arrangement includes a receiver configured to receive a first set of performance measurements representing performance of network elements on at least a first and a second layer in the multi-layer communications network. The arrangement further includes a root cause analyzer arranged to detect affected network elements and fault type upon network failure. The root cause analyzer further comprises a spatial localization unit configured to identify a probable set of network elements affected by the fault based on the first set of performance measurements, and a fault type inference unit configured to infer the type of fault from one or more symptoms incurred from one or more performance measurements in the set of performance measurements. A transmitter is configured to output the information from the root cause analyzer.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 schematically illustrates a multi-layer metro Ethernet network

FIG. 2 is a flowchart illustrating embodiments of method steps

FIG. 3 is a block diagram illustrating an embodiment of an arrangement in a network management node

FIG. 4 is an illustration of a fault type inference graph for a three-layer network

FIG. 5 illustrates a simplified scenario in a three-layer network

FIG. 6 illustrates a fault type inference graph for the simplified scenario

FIG. 7 illustrates the triggering of on-demand measurement on a new path

DETAILED DESCRIPTION

FIG. 1 schematically illustrates a multi-layer packet-optical network 100 with Ethernet services running on top of a generic MPLS transport, which is carried by an optical network. The network could be a so called Metro Ethernet used as a metropolitan access network to connect subscribers and businesses to a large service network. Such a network encompasses end-customers with intermediate distribution devices that operate on the different protocol layers: the optical layer 130, the MPLS layer 120 and the Ethernet layer 110.
Failure diagnosis in multi-layer networks requires the ability to localize faults in a large set of distributed components. FIG. 2 discloses a flowchart illustrating an embodiment of method steps for fault analysis in a multi-layer network. The method could be carried out in a root cause analysis, RCA, module disclosed in FIG. 3. The following description will be based on execution in such a module. However, it is apparent to the person skilled in the art that the execution of the inventive method may be performed in any type of arrangement in a multi-layer network and in a multi-layer network that consists of other types of protocols than MPLS, Ethernet and Optical networks.
In a first step of the illustrated embodiment of the inventive method, a first set of performance measurements is received 210 representing elements within at least a first and a second layer in the multi-layer communications network. The performance measurements are preferably collected in measurement tools implemented on routers, switches or on hosts, capturing network performance, statistics of traffic performance metrics, e.g., optical bit-error-rate, MPLS delay, Ethernet bandwidth, or any other suitable metric from any protocol layer element, in the multi-layer network. In an embodiment of the invention the first set of performance measurements is represented by input events, wherein a set of network elements is associated with each event. This set of elements is referred to as ‘Elements of e’. The resolution in terms of elements associated with an event may vary between different events. For example, for an event e indicating a long delay on a path, the associated set of elements includes all network components that are part of the path. A router can be one of these components. However, the operator may also need to make the resolution finer and include measurements that have not previously been included. This could be accomplished by adding all involved routers' interfaces to the Element set.
In a next step 220, fault localization is performed identifying a probable set of network elements affected by the fault. The goal of the fault localization is to find all elements that could be the source of the failure/degradation. The fault localization is performed based on an assumption that at any given time, there is only a single fault leading to many input events and corresponding performance measurement anomalies from the elements associated with the input events. In the fault localization step, one or more elements that are part of all the input events are identified.
The first set of performance measurements are represented by input events e(i). A set of network elements e(i).elements are associated to each input event, where a network element may be associated to a plurality of input events. The fault localization process further involves identifying the network elements, common-elements, associated to each of the input events.


Fault-Localization Algorithm [1]

1. let e(1), e(2), ..., e(n) be the input events
2. common-elements = e(1).elements
3. for each event e(i) do
4.     common-elements = common-elements ∩ e(i).elements
5. end
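The intersection in the Fault-Localization Algorithm can be sketched in Python as follows. This is a minimal illustration, not the implementation of the invention: the dictionary layout, the function name `localize_fault`, and the example event names and element labels are assumptions introduced here.

```python
def localize_fault(events):
    """Intersect the element sets of all input events (Algorithm [1])."""
    if not events:
        # Assumption: with no input events there are no candidate elements.
        return set()
    common = set(events[0]["elements"])
    for event in events[1:]:
        common &= set(event["elements"])
    return common


# Hypothetical events; names and element labels are illustrative only.
example_events = [
    {"name": "delay_path_A", "elements": {"R1", "R2", "R3"}},
    {"name": "loss_path_B", "elements": {"R2", "R3", "R4"}},
    {"name": "delay_path_C", "elements": {"R3", "R5"}},
]

print(localize_fault(example_events))  # prints {'R3'}
```

Because the method assumes a single fault at any given time, an element that is missing from even one event is dropped from the candidate set.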

In step 230 of the fault analysis method, the type of failure is assessed. In a root cause analysis, inferring the type of the root cause is uncorrelated to determining the location of the fault. Different types of faults result in different groups of measurement events, e.g., an optical link problem results in a first group of measurement events, and congestion at the MPLS layer in a second group of measurement events that does not overlap with the first group. Congestion at the Ethernet layer, on the other hand, will result in a third group of measurement events that may in part overlap with a fourth group of measurement events following a situation of near congestion at the MPLS layer. In the step of assessing the fault type, the possible combinations of events are tracked through a graph of events. The step of inferring the type of failure is preferably performed in sequence with the step of localizing the fault. FIG. 4 illustrates an embodiment of a fault type inference graph in a three-layer network, based on which the root cause could be identified from a group or set of events. By means of tracking in the graph, it is possible to infer a most likely root cause type given the set of measurements.
In step 240, the root cause is determined by combining information on the location of the fault from step 220 and the type of fault from step 230.
In the disclosed inference graph, each oval represents a measurement event and the rims of the ovals represent a state transition. The boxes map to possible causes, fault types, associated with the measurement event linked to the fault type in the inference graph. The graph contains types of measurements and is not specific to any location in the network. For each group of events, the most likely locations are identified using the Fault-Localization Algorithm [1], prior to inferring the type of root cause. The output of the two consecutive steps will be an identification of the fault location and the type of fault.
The identification of measurement events is preferably threshold based, with thresholds set to detect anomalies in the performance measurement results. The invention can be applied with different event detection mechanisms.
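Threshold-based event identification could look like the sketch below. The record fields, the function name `detect_events`, the metric names and the threshold values are all hypothetical. Treating "value greater than threshold" as the anomaly condition is also an assumption, since some metrics may instead be anomalous when they fall below a limit.

```python
def detect_events(measurements, thresholds):
    """Turn raw measurement records into input events with an anomaly flag."""
    events = []
    for m in measurements:
        limit = thresholds.get(m["type"])
        if limit is None:
            continue  # no threshold configured for this metric: skip it
        events.append({
            "type": m["type"],
            "value": m["value"],
            "elements": set(m["elements"]),
            # Assumption: exceeding the threshold marks an anomaly.
            "anomaly": m["value"] > limit,
        })
    return events


# Hypothetical measurements and thresholds, for illustration only.
example_measurements = [
    {"type": "mpls_loss", "value": 0.04, "elements": ["R1", "R2", "R3"]},
    {"type": "optical_ber", "value": 1e-12, "elements": ["R1"]},
]
example_thresholds = {"mpls_loss": 0.01, "optical_ber": 1e-9}

example_detected = detect_events(example_measurements, example_thresholds)
```

Each resulting event keeps its associated elements, so the same records can feed both the localization step and the fault type inference step.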
In an embodiment of the invention, the step of inferring a fault type involves a search process starting at a lower level and moving to a higher level in the networking protocol stack. Events in higher layers are often symptoms of events in lower layers. Thus, starting the search at a lower level may lead to identification of a fault type faster than performing the fault type inference with a starting point at a higher level. However, the invention is not limited to a lower level starting point; the search may equally start at a higher level and move downward.
When deciding on starting points within the same protocol layer, the search process checks measurement events with lower measurement overhead first. Execution of performance measurements in the network introduces a measurement load in the multilayer network, also known as the measurement overhead in the multilayer network. Starting with lower overhead measurements, the search process will be initiated for those measurements that may be performed with little impact on the overall performance of the multi-layer network; thus, unnecessary measurement load may be reduced in the system. Low overhead measurements are usually conducted more frequently and may thus provide more accurate information to help isolate the fault type faster.
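The search order described above (lower protocol layers first, and within a layer the measurements with lower overhead first) can be expressed as a two-part sort key. The layer ranking, the check names and the numeric overhead values below are assumptions for illustration; the source does not quantify overhead.

```python
# Assumed ranking for a three-layer network: lower layers are checked first.
LAYER_ORDER = {"optical": 0, "mpls": 1, "ethernet": 2}


def search_order(checks):
    """Sort checks by protocol layer, then by measurement overhead."""
    return sorted(checks, key=lambda c: (LAYER_ORDER[c["layer"]], c["overhead"]))


# Hypothetical measurement checks with made-up overhead scores.
example_checks = [
    {"name": "ethernet_bandwidth", "layer": "ethernet", "overhead": 3},
    {"name": "mpls_traceroute", "layer": "mpls", "overhead": 2},
    {"name": "optical_ber", "layer": "optical", "overhead": 1},
    {"name": "mpls_loss_counter", "layer": "mpls", "overhead": 1},
]

ordered_names = [c["name"] for c in search_order(example_checks)]
```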
The inference type search uses a pre-generated inference graph as input. The graph is traversed from a root node to a root cause node. For each node, a search is performed in an event set E for all events with the same type as the node. The search stops when a root cause node is detected.
In the step of inferring the fault type, the first set of performance measurements are represented by input events e(i). An event set E is formed by a set of input events e(i) forming a sequence of input events. Root cause analysis is performed for event set E, determining one or more symptoms.


Root Cause Analysis Algorithm (G,E) [2]

1. Let event set E = {e(1), e(2), ..., e(n)} be the set of input events
2. Start from the root node in fault type inference graph G
3. for current node n in G do
4.     find all events in E where e.type = n.type
5.     start transition to the next node n.transition(e.value)
6.     if n.type is a root cause type
7.         break
8. end
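A hedged Python sketch of the traversal is given below. The graph encoding (a dictionary of nodes with `on_anomaly` and `on_normal` successors), the node names and the toy graph itself are assumptions made for this example; they do not reproduce the graphs of FIG. 4 or FIG. 6. The transition on `e.value` is simplified to a boolean anomaly flag per event.

```python
def infer_fault_type(graph, root, events):
    """Walk the inference graph from the root until a root cause node is hit.

    Returns the name of the root cause node, or None if the walk ends
    without reaching one (an inconclusive result).
    """
    node = root
    visited = set()
    while node is not None and node not in visited:
        visited.add(node)  # guard against cycles in a malformed graph
        spec = graph[node]
        if spec.get("root_cause"):
            return node
        matching = [e for e in events if e["type"] == spec["type"]]
        anomalous = any(e["anomaly"] for e in matching)
        node = spec["on_anomaly"] if anomalous else spec["on_normal"]
    return None


# Toy graph, illustrative only.
toy_graph = {
    "ber": {"type": "BER", "on_anomaly": "optical_link_fault", "on_normal": "mpls_loss"},
    "mpls_loss": {"type": "MPLS_LOSS", "on_anomaly": "mpls_congestion", "on_normal": None},
    "optical_link_fault": {"root_cause": True},
    "mpls_congestion": {"root_cause": True},
}

toy_events = [
    {"type": "BER", "anomaly": False},
    {"type": "MPLS_LOSS", "anomaly": True},
]

print(infer_fault_type(toy_graph, "ber", toy_events))  # prints mpls_congestion
```

A return value of `None` corresponds to the inconclusive case, in which additional measurements would be needed before a fault type can be named.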

In an embodiment of the invention, a pre-generated inference graph is used as an input, mapping a type of fault to the symptoms. The graph is traversed from the root node. Starting in a root node, a search is performed in the event set E for all events with the same type as the current node. If any event of the current type has an anomaly, traversing will continue to the next node with the “Y” branch. If there is no anomaly, traversing will continue through the “X” branch. If the current node is a root cause node the search stops. If there are no unambiguous results relating to a root cause at this point of the search, e.g., when the search results in two or more possible root causes, an on-demand algorithm for generating further measurements should be triggered at this point.
FIG. 5 illustrates a scenario in a three-layer network, involving four routers R1-R4. The three-layer network includes an Ethernet layer, an MPLS layer and an Optical layer. FIG. 6 exemplifies an inference graph for such a three-layer network.
In order to accurately identify the root cause, a complete set of information on performance at each layer and at each network element is required. However, the completeness of the set of performance information must be balanced against the concerns for measurement overhead. In the disclosed scenario, measurement capability is available involving the routers R1-R4 in the three-layer network. In response to detection of some type of malfunction in the network, performance measurements are collected in measurement tools implemented on the routers. The performance measurements are represented by input events. Each input event represents measurement results from one or more measurement tools. In the disclosed simplified scenario, a group of input events e1-e4, b1, b2 and m1, m2, respectively representing the optical layer, the Ethernet layer and the MPLS layer, are identified. The resolution in terms of elements associated with an event varies between the events. The input events e1-e4 are based on bit error rate measurements in the routers; the input events b1, b2 concern Ethernet bandwidth and include measurements relating to multiple routers, b1:{R1, R2, R3}, b2:{R2, R3, R4}; the same applies for input events m1, m2 that concern MPLS trace-route, m1:{R1, R2, R3}, m2:{R2, R3, R4}.
A receiver 360 in an arrangement 300 in a network management system receives a first set of input events b1, b2, m1, m2, e1-e4. The performance measurements and the corresponding input events represent the Ethernet layer {b1, b2}, the MPLS layer {m1, m2}, and the Optical layer {e1, e2, e3, e4}. The input events are generated by an event detection module in response to a malfunction in the disclosed three-layer network. The event identification is usually performed as a threshold based method, wherein thresholds based on time T, packet loss L and bandwidth B are defined. An Ethernet bandwidth measurement is represented by an input event b1 defined as an Ethernet Bandwidth>B. A further event b2 is also defined as an Ethernet Bandwidth>B. An MPLS traceroute measurement is represented by an input event m1 defined as an MPLS Loss>L; a further event in the MPLS layer is identified as input event m2, also defined as an MPLS Loss>L. In the optical layer, events e1-e4 represent bit error rate measurements in the routers R1-R4.
The localization of the fault is performed by identifying a probable set of network elements affected by the fault, namely the set of routers most commonly represented in all of the events. A search for the routers associated with each event is performed. In the illustrated scenario, the result of this search is a set of most likely routers for each event:

- b1:{R1, R2, R3}
- b2:{R2, R3, R4}
- m1:{R1, R2, R3}
- m2:{R2, R3, R4}
- e1:{R1}
- e2: . . .

Based on the likely locations for b1, b2, m1 and m2, the most common set among all locations is identified as locations R2, R3.
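The result R2, R3 can be reproduced with the short sketch below. It assumes that only the events showing an anomaly (b1, b2, m1, m2) enter the intersection, which matches the text's use of those four events and its later statement that e1-e4 are in the normal range. The `anomaly` flags and the dictionary layout are assumptions of this example. Only e1 is included from the optical layer, because the source does not list the elements of the remaining optical events.

```python
# Element sets as listed in the scenario; the "anomaly" flags are assumed.
scenario_events = {
    "b1": {"elements": {"R1", "R2", "R3"}, "anomaly": True},
    "b2": {"elements": {"R2", "R3", "R4"}, "anomaly": True},
    "m1": {"elements": {"R1", "R2", "R3"}, "anomaly": True},
    "m2": {"elements": {"R2", "R3", "R4"}, "anomaly": True},
    "e1": {"elements": {"R1"}, "anomaly": False},
}


def localize(events):
    """Intersect the element sets of the anomalous events only."""
    sets = [e["elements"] for e in events.values() if e["anomaly"]]
    return set.intersection(*sets) if sets else set()


likely_locations = localize(scenario_events)

# For comparison: intersecting over every event, including the normal e1,
# leaves no common element at all.
all_sets = [e["elements"] for e in scenario_events.values()]
unfiltered = set.intersection(*all_sets)
```

Here `likely_locations` is the set containing R2 and R3, while `unfiltered` is empty, which shows why events in the normal range should not constrain the candidate set in this sketch.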
The type of fault is inferred through a fault analysis performed as a root cause analysis in an inference graph. Following the inference graph illustrated in FIG. 6, the starting point for type of fault inference would be based on a Bit Error Rate, BER, event based on BER measurements, since this type of measurement will directly reveal problems caused by optical link errors. In the illustrated scenario, the measurements represented by events e1-e4 are assumed to be in the normal range. The search then continues with detection of an MPLS Packet Loss. In the exemplified scenario, m1 and m2 are found to exceed a predetermined threshold L. The next step for the root cause analysis in the inference graph, exemplified in FIG. 6, involves detection of Ethernet Loss. In the exemplified scenario, the events b1 and b2 exceed the predetermined threshold L. From the root cause analysis it may then be determined that the most likely root cause, i.e., type of fault, is congestion at the MPLS layer.
As previously disclosed for FIG. 4, a fault type inference graph is structured to narrow down the most likely fault type for a given set of symptoms. A root cause analysis is performed based on a set of observations (events) from existing measurements. If sufficient measurements are lacking to complete the searching of root cause, the determination of fault type is prevented. In such a situation, there is a need to trigger additional measurements of a different type to those previously available.
The root cause inference graph can be used for improving the accuracy of root cause analysis, especially when sufficient measurements for an actual fault location and type determination are lacking. In an embodiment of the invention, a second set of measurements is triggered based on the results of root cause analysis. Three types of triggering conditions are foreseen: when sufficient measurements to draw a conclusion on root cause are missing, when sufficient measurements to identify the location of the fault are missing, and when there is no strong symptom of service degradation but an indication of future congestion.
An additional second set of measurements may be triggered following a root cause analysis based on the first set of measurements. During the first root cause analysis, it is possible to identify a set of additional events required in order to form a conclusion on a fault type. A second set of performance measurements could represent measurements of a type not previously included in the performance measurements due to measurement overhead. However, in a situation where a fault exists in the multi-layer network and where fault-localization attempts have failed, such increased overhead will be acceptable. New types of measurement could include high-frequency loss measurement, more accurate/more aggressive bandwidth measurement, jitter measurement, etc. When encountering possible fault types without matching events, a triggering command initiates additional measurements to generate a second set of measurements. The second set of measurements are represented by input events and included in the step of inferring a fault type through root cause analysis in the fault inference graph.
A second set of measurements may also be triggered when a set of available measurements does not provide sufficient information to localize the fault. FIG. 7 illustrates a situation where additional measurements from a new path, Path 3, are required. It is an assumption for the illustrated scenario that the available events enable a narrowing down of the localization to two routers {R1, R2} following the measurement paths Path 1 and Path 2, and that both are likely locations without any possibility to distinguish a most likely location based on the available data. In this case, a second set of measurements is needed from a different path. This additional path, Path 3, is shown in a dashed line in the figure. With the help of the measurements triggered on Path 3, it is possible to accurately isolate the fault to router R1. In order to identify where to initiate measurements for the second set of measurements, a search is performed in the finest grained location set. In the disclosed example, this corresponds to R1 and R2. The search identifies one or more possible paths in the topology that traverse only a subset of the locations. If there are such paths, additional measurements from elements along the path may help reduce the set of most likely locations to an identification of a most likely location. In this part of the process of triggering a second set of measurements, it is assumed that the measurement type is given.
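The path search can be sketched as follows: among the known paths, keep those that cross some but not all of the candidate locations, then use the outcome of a measurement on such a path to shrink the candidate set. The hop lists for Path 1 to Path 3, the extra routers R3 to R5 and both function names are hypothetical, since the source does not give the topology of FIG. 7. The refinement step relies on the single-fault assumption stated for the localization step.

```python
def discriminating_paths(candidates, paths):
    """Return paths that traverse a non-empty proper subset of the candidates."""
    result = {}
    for name, hops in paths.items():
        overlap = candidates & set(hops)
        if overlap and overlap != candidates:
            result[name] = overlap
    return result


def refine(candidates, path_hops, path_is_anomalous):
    """Narrow the candidates using one new path measurement (single fault)."""
    on_path = candidates & set(path_hops)
    return on_path if path_is_anomalous else candidates - on_path


# Hypothetical topology; only the candidate set {R1, R2} comes from the text.
candidates = {"R1", "R2"}
paths = {
    "Path 1": ["R1", "R2", "R3"],
    "Path 2": ["R4", "R1", "R2"],
    "Path 3": ["R1", "R5"],
}

useful = discriminating_paths(candidates, paths)
isolated = refine(candidates, paths["Path 3"], path_is_anomalous=True)
```

With these assumed hop lists only Path 3 is selected, and an anomalous measurement on it isolates R1, while a normal measurement would instead point to R2.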
In order to prevent overloading of the network and the devices, measuring is usually performed in a low-rate manner. When failure or performance degradation occurs, the frequency of measuring may be increased to obtain more accurate measurement results. In order to be able to identify and initiate preventive actions prior to network failure, it is possible to increase the measurement frequency. A second set of measurements representing higher frequency measurements will be triggered when one or more detected events are close to a defined threshold for triggering alarms. Similarly, when the fault type diagnosis suggests that the network is close to congestion, more frequent measurements are required in order to detect or predict future congestion.
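One possible way to express the "close to a defined threshold" trigger is a margin below the alarm threshold at which the measurement interval is shortened. The function name, the 90 % margin and the interval values are assumptions chosen for illustration; the source does not specify them.

```python
def next_interval(value, threshold, base_interval_s=60.0,
                  fast_interval_s=5.0, margin=0.9):
    """Pick the next measurement interval in seconds.

    Assumption: a metric within `margin` of its alarm threshold (or above it)
    is measured at the faster rate; otherwise the low-rate default is kept.
    """
    if value >= margin * threshold:
        return fast_interval_s
    return base_interval_s


# Hypothetical link utilisation values against an alarm threshold of 1.0.
calm = next_interval(0.50, 1.0)      # well below the threshold
near = next_interval(0.95, 1.0)      # close to the threshold
```

Keeping the default interval long and shortening it only near the threshold follows the stated goal of limiting measurement load while still catching approaching congestion.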
In the disclosed embodiments for triggering additional measurements, the process of choosing which measurements to require for the second set of measurements is based on data flagged as missing by the root cause analysis algorithm. The process of initiating generation of the second set of measurements is automated and enables an improved method for localizing faults and identifying fault types in a multi-layer network. It is of course possible to generate a second set of measurements including measurements from different paths, as well as measurements of another type or an increased frequency. However, the analysis of the second set of measurements would then preferably be carried out for each individual subset.
A benefit of the disclosed method for localizing and inferring type of fault is that the method considers performance measurements at different protocol layers in the network. The performance measurements are collected by one or more measurement tools. The root cause of a fault is inferred from a set of symptoms extracted from measurements. The method can be deployed without access to the network devices.
Working with on-demand measurements, triggered only when additional measurements are required to draw a conclusion on localization and type of faults, reduces measurement overhead in the fault analysis process while enabling improved inference accuracy.
The disclosed method may operate in a packet-optical integrated network with Ethernet services, e.g., a Metro-Ethernet Self Organizing Network MESON. The disclosed method can be performed without integration with software deployment and billing systems. The method may automatically determine the configuration of the measurement tools required for gathering the data required in the root cause analysis process.
FIG. 3 discloses a root cause analysis, RCA, module for carrying out the invention. The root cause analysis module is an arrangement included in a network management system in a network management node in a multi-layer network or in a network management system distributed amongst several nodes in a multi-layer network. The network management system receives measurements originated in distributed measurement units in the multi-layer network. In the management system, a first set of input events corresponding to measurements in a first set of performance measurements may be generated. A receiver 360 in the RCA module receives the first set of input events or the first set of measurements. The first set of input events is processed in an event correlator 310, where the events are correlated in time, space or protocol. The correlated events form the input to a root cause analyzer 320 for further processing of the events or for processing of the first set of performance measurements. A spatial localization unit 330 determines a set of network elements most likely to be affected in response to a failure in the multi-layer network. The type of fault is determined in a fault type inference unit 350. If further measurements are required in order to draw a conclusion on the root cause of a failure, further measurements may be triggered. The RCA module includes a measurement generating unit 340 communicating with the spatial localization unit 330 and the fault type inference unit 350 and initiating generation of a second set of measurements based on input from the spatial localization unit 330 or the fault type inference unit 350. A transmitter 370 provides an output from the RCA module for further processing in a network management system.

Claims

1. A method in a network management system for diagnosing a root cause of a fault in a multi-layer communications network, comprising:

receiving a first set of performance measurements representing performance of network elements on at least a first and a second layer in the multi-layer communications network;

localizing the fault by identifying, based on the first set of performance measurements, a probable set of network elements affected by the fault;

inferring a type of fault from one or more symptoms incurred from one or more performance measurements in the first set of performance measurements; and

determining the root cause by combining information on the probable set of network elements with the inferred type of fault.

2. The method according to claim 1, wherein localizing the fault includes:

representing the first set of performance measurements by input events (e(i));

associating a set of network elements (e(i).elements) to each input event (e(i)), wherein a network element may be associated to a plurality of input events (e(i)); and

localizing the fault by identifying at least one network element associated to each of the input events (e(i)).
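As a non-limiting illustration of the localization of claim 2, one possible reading is that a small set of network elements is sought such that every input event e(i) has at least one of its associated elements in that set. The sketch below uses a greedy selection for this purpose; the greedy strategy, the function name `localize_fault` and the dictionary representation of e(i).elements are assumptions made for the example, not features recited in the claim.

```python
from collections import Counter


def localize_fault(events):
    """Pick network elements so that every input event is explained.

    events: dict mapping an event id to the set of network elements
            associated with that event (a stand-in for e(i).elements).
    Returns a list of suspect elements, chosen greedily: at each step the
    element shared by the most still-unexplained events is selected.
    """
    # Events without any associated element cannot be explained; skip them.
    uncovered = {eid: set(elems) for eid, elems in events.items() if elems}
    suspects = []
    while uncovered:
        counts = Counter(el for elems in uncovered.values() for el in elems)
        # Highest count first; ties are broken alphabetically for determinism.
        best = min(counts, key=lambda el: (-counts[el], el))
        suspects.append(best)
        uncovered = {eid: elems for eid, elems in uncovered.items()
                     if best not in elems}
    return suspects


if __name__ == "__main__":
    observed = {"e1": {"A", "B"}, "e2": {"B", "C"}, "e3": {"B", "D"}}
    print(localize_fault(observed))
```

When a single element is associated with all input events, as element B is in the usage example, the greedy selection returns that element alone.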

3. The method according to claim 1, wherein inferring the type of fault includes:

representing the first set of performance measurements by input events (e(i));

determining one or more symptoms following from a sequence of input events (e(i)); and

mapping at least one type of fault to the symptoms.

4. The method according to claim 3, wherein the sequence of input events (e(i)) is tracked in a graph of events.

5. The method according to claim 4, wherein tracking in the graph is initiated at a lower protocol level moving to a higher level in the multi-layer network.

6. The method according to claim 4, wherein the graph is pre-generated based on basic domain knowledge of protocol layers, upon setting up of the multi-layer network.
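As a non-limiting illustration of claims 3 to 6, the tracking of input events in a pre-generated graph, starting at a lower protocol level and moving to a higher level, could be sketched as follows. The event names, layer numbers, graph edges and the fault type table are hypothetical examples chosen for this sketch and are not taken from the disclosure.

```python
# Hypothetical pre-generated graph: each lower-layer event points to the
# higher-layer events it may give rise to.
EVENT_GRAPH = {
    "link_loss_of_signal": ["ip_packet_loss"],
    "ip_packet_loss": ["tcp_retransmissions"],
    "ip_high_delay": ["tcp_retransmissions"],
    "tcp_retransmissions": [],
}

# Hypothetical protocol layer of each event type (lower number = lower layer).
LAYER = {
    "link_loss_of_signal": 1,
    "ip_packet_loss": 3,
    "ip_high_delay": 3,
    "tcp_retransmissions": 4,
}

# Hypothetical mapping from a sequence of symptoms to a type of fault.
FAULT_TYPES = {
    ("link_loss_of_signal", "ip_packet_loss", "tcp_retransmissions"):
        "physical link failure",
    ("ip_high_delay", "tcp_retransmissions"): "congestion",
}


def infer_fault_type(observed):
    """Track observed events through the graph, lowest layer first.

    Returns (fault_type_or_None, tuple_of_symptoms_in_tracking_order).
    """
    observed = set(observed)
    known = [e for e in observed if e in LAYER]
    if not known:
        return None, ()
    # Start at the observed event on the lowest protocol layer.
    current = min(known, key=lambda e: (LAYER[e], e))
    path = [current]
    while True:
        # Follow an edge to a higher-layer event that was also observed.
        successors = [e for e in EVENT_GRAPH.get(current, [])
                      if e in observed and e not in path]
        if not successors:
            break
        current = successors[0]
        path.append(current)
    symptoms = tuple(path)
    return FAULT_TYPES.get(symptoms), symptoms


if __name__ == "__main__":
    print(infer_fault_type(["tcp_retransmissions", "ip_high_delay"]))
```

A sequence of symptoms that has no entry in the table yields no fault type, which in the method of claim 7 could be one reason to trigger a second set of performance measurements.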

7. The method according to claim 1, further comprising: triggering a second set of performance measurements in addition to the measurements included in the first set of performance measurements.

8. The method according to claim 7, wherein the first set of performance measurements includes performance measurements from multiple first measurement paths through the network elements, each measurement path including a subset of network elements, the second set of performance measurements representing performance measurements generated on at least a second measurement path, and wherein localizing the fault is repeated for the second set of performance measurements.

9. The method according to claim 7, wherein the second set of performance measurements represents performance measurements of other types than those given in the first set of performance measurements and wherein the second set of performance measurements is provided as an input to the inferring of the type of fault.

10. The method according to claim 7, wherein the second set of performance measurements represents more frequent performance measurements than those given in the first set of performance measurements and wherein the second set of performance measurements is used to anticipate future service degradation.

11. The method according to claim 8, wherein the second set of performance measurements is a combination of measurements on the at least second measurement path, measurements of other types and more frequent measurements to those given in the first set of performance measurements.

12. A network management node in a multi-layer network comprising:

a receiver configured to receive a first set of performance measurements representing performance of network elements on at least a first and a second layer in the multi-layer communications network;

a root cause analyzer arranged to detect affected network elements and fault type upon a network failure, wherein the root cause analyzer (320) further includes:

a spatial localization unit configured to identify a probable set of network elements affected by the fault based on the first set of performance measurements, and

a fault type inference unit configured to infer a type of fault from one or more symptoms incurred from one or more performance measurements in the first set of performance measurements; and

a transmitter configured to output information from the root cause analyzer.

13. The network management node according to claim 12, further comprising:

an event correlator arranged to correlate events in time, space or protocol and to output the result to the root cause analyzer.

14. The network management node according to claim 12, further comprising:

a measurement generating unit adapted to communicate with the spatial localization unit and the fault type inference unit and adapted to initiate generation of a second set of measurements based on input from the spatial localization unit or the fault type inference unit.

15. The network management node according to claim 12, wherein the receiver is further configured to receive a second set of performance measurements in addition to the measurements included in the first set of performance measurements.

16. The network management node according to claim 15, wherein the first set of performance measurements includes performance measurements from multiple first measurement paths through the network elements, each measurement path including a subset of network elements, the second set of performance measurements representing performance measurements generated on at least a second measurement path, and wherein the step of localizing the fault is repeated for the second set of performance measurements.

17. The network management node according to claim 15, wherein the second set of performance measurements represents performance measurements of other types than those given in the first set of performance measurements and wherein the second set of performance measurements is provided as an input to the fault type inference unit to infer the type of fault.

18. The network management node according to claim 15, wherein the second set of performance measurements represents more frequent performance measurements than those given in the first set of performance measurements, and wherein the fault type inference unit uses the second set of performance measurements to anticipate future service degradation.

19. The network management node according to claim 16, wherein the second set of performance measurements is a combination of measurements on the at least second measurement path, measurements of other types and more frequent measurements to those given in the first set of performance measurements.