US20140355453A1 - Method and arrangement for fault analysis in a multi-layer network - Google Patents
- Publication number
- US20140355453A1 (application number US14/367,735)
- Authority
- US
- United States
- Prior art keywords
- performance measurements
- fault
- measurements
- network
- performance
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0631—Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
- H04L41/065—Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis involving logical or physical relationship, e.g. grouping and hierarchies
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B01—PHYSICAL OR CHEMICAL PROCESSES OR APPARATUS IN GENERAL
- B01D—SEPARATION
- B01D19/00—Degasification of liquids
- B01D19/02—Foam dispersion or prevention
- B01D19/04—Foam dispersion or prevention by addition of chemical substances
- B01D19/0404—Foam dispersion or prevention by addition of chemical substances characterised by the nature of the chemical substance
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L1/00—Arrangements for detecting or preventing errors in the information received
- H04L1/24—Testing correct operation
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B01—PHYSICAL OR CHEMICAL PROCESSES OR APPARATUS IN GENERAL
- B01D—SEPARATION
- B01D19/00—Degasification of liquids
- B01D19/02—Foam dispersion or prevention
- B01D19/04—Foam dispersion or prevention by addition of chemical substances
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B01—PHYSICAL OR CHEMICAL PROCESSES OR APPARATUS IN GENERAL
- B01D—SEPARATION
- B01D19/00—Degasification of liquids
- B01D19/02—Foam dispersion or prevention
- B01D19/04—Foam dispersion or prevention by addition of chemical substances
- B01D19/0404—Foam dispersion or prevention by addition of chemical substances characterised by the nature of the chemical substance
- B01D19/0409—Foam dispersion or prevention by addition of chemical substances characterised by the nature of the chemical substance compounds containing Si-atoms
Definitions
- the present invention relates to a method and a network node for diagnosing one or more faults in a multi-layer communications network.
- Fast failure detection and failure diagnosis is an important area in network management. After a failure is detected and data switched to alternative paths, there is a need to quickly localize the failure so that measures may be taken to replace or repair the faulty network element.
- a communications network involves a large set of distributed hardware and software components. Errors may occur in each device in the network despite best practices in design, implementation, and testing. Root causes can also be affected by external factors.
- the multi-layer network comprises network elements in several network protocol layers.
- An Ethernet E-line service on top of a packet-optical integrated network is an example of a network service provided in a multi-layer network, but other large-scale distribution networks may also be configured as multi-layer networks.
- Fault localization in multi-layer networks is generally difficult.
- Path-trace capabilities, such as IP trace-route, are not available in the optical layer.
- network elements with measurement capability in the network layers generate measurement records on network traffic. These records may be collected and used for statistical and/or reporting purposes. When faults are experienced in a multi-layer network, the collected information is used to detect failures and performance degradations that exist in the network. Typically, the information is gathered in a network management system (NMS) that handles fault management and performance management.
- Known fault localization methods aim at finding a correlation between a network fault and one or more fault carrying events, i.e., events occurring in response to the network fault. This is usually a difficult task due to the relatively large amount of fault carrying events caused by a network fault. Events may in this context be the issuance of messages informing about something happening in the network, such as the occurrence of an alarm, a performance parameter increment, or a service/action request made by one node to itself or to another node.
- a disadvantage of these methods is that they are limited in the scope of correlation of fault carrying events.
- processing of fault carrying events exclusively may not be sufficient to succeed in fault localization.
- the fault carrying events may be an effect of a fault but may also occur as a result of symptoms of the real fault. These symptoms may be localized far away from the actual location of the fault.
- these various networks and network devices often report operational information in different ways.
- the networks and network devices may employ particular network management approaches and technologies for monitoring operation of the network system, and network management personnel associated with particular networks and network devices may rely upon specific, and varied, network management systems and methods.
- modern networks increasingly rely upon third party vendors to provide hardware and/or software for offered services. These hardware and software devices frequently operate and report according to systems, methods, and even protocols that are not the same as the network providing the services.
- a first set of performance measurements occurring in response to the fault and representing at least a first and a second layer in a multi-layer communications network are received in the network management system.
- the fault is localized by identifying a probable set of network elements affected by the fault.
- a type of fault is inferred from one or more symptoms in the first set of performance measurements.
- the root cause of the failure is identified from a combination of the information on probable set of network elements and inferred type of fault.
- the root cause analysis is performed in a pre-generated fault type inference graph.
- additional measurements are triggered when the analysis of localization of a fault and/or the fault type is non-conclusive.
- the steps of localizing the fault by identifying a probable set of network elements affected by the fault, inferring the type of fault from one or more symptoms incurred from one or more performance measurements, and determining the root cause by combining the information on the probable set of network elements with the inferred type of fault are repeated for the additional measurements.
- the arrangement includes a receiver configured to receive a first set of performance measurements representing performance of network elements on at least a first and a second layer in the multi-layer communications network.
- the arrangement further includes a root cause analyzer arranged to detect affected network elements and fault type upon network failure.
- the root cause analyzer further comprises a spatial localization unit configured to identify a probable set of network elements affected by the fault based on the first set of performance measurements, and a fault type inference unit configured to infer the type of fault from one or more symptoms incurred from one or more performance measurements in the set of performance measurements.
- a transmitter is configured to output the information from the root cause analyzer.
- FIG. 1 schematically illustrates a multi-layer metro Ethernet network
- FIG. 2 is a flowchart illustrating embodiments of method steps
- FIG. 3 is a block diagram illustrating an embodiment of an arrangement in a network management node
- FIG. 4 is an illustration of a fault type inference graph for a three-layer network
- FIG. 5 illustrates a simplified scenario in a three-layer network
- FIG. 6 illustrates a fault type inference graph for the simplified scenario
- FIG. 7 illustrates the triggering of on-demand measurement on a new path
- FIG. 1 schematically illustrates a multi-layer packet-optical network 100 with Ethernet services running on top of a generic MPLS transport, which is carried by an optical network.
- the network could be a so called Metro Ethernet used as a metropolitan access network to connect subscribers and businesses to a large service network.
- Such a network encompasses end-customers with intermediate distribution devices that operate on the different protocol layers: the optical layer 130 , the MPLS layer 120 and the Ethernet layer 110 .
- FIG. 2 discloses a flowchart illustrating an embodiment of method steps for fault analysis in a multi-layer network. The method could be carried out in a root cause analysis (RCA) module disclosed in FIG. 3. The following description will be based on execution in such a module. However, it is apparent to the person skilled in the art that the inventive method may be performed in any type of arrangement in a multi-layer network, and in a multi-layer network that uses protocols other than MPLS, Ethernet and optical networks.
- a first set of performance measurements are received 210 representing elements within at least a first and a second layer in the multi-layer communications network.
- the performance measurements are preferably collected in measurement tools implemented on routers, switches or on hosts, capturing network performance, statistics of traffic performance metrics, e.g., optical bit-error-rate, MPLS delay, Ethernet bandwidth, or any other suitable metric from any protocol layer element, in the multi-layer network.
- the first set of performance measurements are represented by input events, wherein a set of network elements is associated with each event. This set of elements is referred to as ‘Elements of e’. The resolution in terms of elements associated with an event may vary between different events.
- the associated set of elements includes all network components that are part of the path.
- a router can be one of these components.
- the operator may also need to make the resolution finer and include measurements that have not previously been included. This could be accomplished by adding all involved routers' interfaces to the Element set.
- fault localization is performed by identifying a probable set of network elements affected by the fault.
- the goal of the fault localization is to find all elements that could be the source of the failure/degradation.
- the fault localization is performed based on an assumption that at any given time, there is only a single fault leading to many input events and corresponding performance measurement anomalies from the elements associated with the input events.
- one or more elements that are part of all the input events are identified.
- the first set of performance measurements are represented by input events e(i).
- a set of network elements e(i).elements are associated to each input event, where a network element may be associated to a plurality of input events.
- the fault localization process further involves identifying the network elements, common-elements, associated to each of the input events.
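The single-fault localization described above amounts to intersecting the element sets of all input events. The following is a minimal sketch under that reading; the function name `common_elements`, the event names and the element names are hypothetical and do not come from the patent.

```python
def common_elements(events):
    """Return the network elements shared by every input event.

    `events` maps an event name to the set of elements associated with it
    (e(i).elements in the text). Under the single-fault assumption, the
    faulty element should be part of every input event.
    """
    element_sets = [set(elements) for elements in events.values()]
    if not element_sets:
        return set()
    return set.intersection(*element_sets)


# Hypothetical input events; names are illustrative only.
events = {
    "e1": {"router_a", "link_ab", "router_b"},
    "e2": {"router_b", "link_bc", "router_c"},
    "e3": {"router_b", "link_bd", "router_d"},
}

print(sorted(common_elements(events)))
```

An empty result would indicate that no single element explains all events, which is one situation in which finer-grained or additional measurements could be needed.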
- in step 230 of the fault analysis method, the type of failure is assessed.
- a root cause analysis inferring the type of the root cause is uncorrelated to determining the location of the fault.
- Different types of faults result in different groups of measurement events, e.g., an optical link problem results in a first group of measurement events, and congestion at the MPLS layer results in a second group of measurement events that does not overlap with the first group.
- a congestion at the Ethernet layer on the other hand, will result in a third set of measurement events, that may in part overlap with a fourth group of measurement events following on a situation of near congestion at the MPLS layer.
- the possible combination of events is tracked through a graph of events.
- FIG. 4 illustrates an embodiment of a fault type inference graph in a three-layer network, based on which the root cause could be identified from a group or set of events. By means of tracking in the graph, it is possible to infer a most likely root cause type given the set of measurements.
- in step 240, the root cause is determined by combining information on the location of the fault from step 220 and the type of fault from step 230.
- each oval represents a measurement event and the rims of the ovals represent a state transition.
- the boxes map to possible causes, fault types, associated with the measurement event linked to the fault type in the inference graph.
- the graph contains types of measurements and is not specific to any location in the network. For each group of events, the most likely locations are identified using the Fault-Localization Algorithm [1], prior to inferring the type of root cause. The output of the two consecutive steps will be an identification of the fault location and the type of fault.
- the identification of measurement events is preferably threshold based, with thresholds set to detect anomalies in the performance measurement results.
- the invention can be applied with different event detection mechanisms.
- the step of inferring a fault type involves a search process starting at a lower level and moving to a higher level in the networking protocol stack. Events in higher layers are often symptoms of events in lower layers. Thus, starting the search at a lower level may identify the fault type faster than starting the fault type inference at a higher level.
- the invention is not limited to a lower level starting point, but is equally possible from an opposite protocol starting point.
- the search process checks measurement events with lower measurement overhead first. Execution of performance measurements in the network introduces a measurement load in the multilayer network, also known as the measurement overhead in the multilayer network. Starting with lower overhead measurements, the search process will be initiated for those measurements that may be performed with little impact on the overall performance of the multi-layer network; thus, unnecessary measurement load may be reduced in the system. Low overhead measurements are usually conducted more frequently and may thus provide more accurate information to help isolate the fault type faster.
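One way to picture the search order described above (lower layers first, and within a layer the cheaper measurements first) is a sort on a two-part key. This is only a sketch: the `MeasurementType` class, the layer ranks and the overhead figures are assumptions made for illustration, not values from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MeasurementType:
    name: str
    layer: int       # 0 = lowest layer (optical); larger = higher in the stack
    overhead: float  # relative measurement load; made-up numbers


def search_order(types):
    """Order measurement types by layer (low to high), then by overhead."""
    return sorted(types, key=lambda t: (t.layer, t.overhead))


# Hypothetical measurement types for a three-layer network.
types = [
    MeasurementType("ethernet_bandwidth", layer=2, overhead=0.8),
    MeasurementType("mpls_traceroute", layer=1, overhead=0.6),
    MeasurementType("optical_ber", layer=0, overhead=0.1),
    MeasurementType("mpls_loss", layer=1, overhead=0.3),
]

ordered_names = [t.name for t in search_order(types)]
print(ordered_names)
```

Since the text notes that the search may equally start from the opposite end of the protocol stack, passing `reverse=True` on the layer part of the key would be the corresponding variation.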
- the inference type search uses a pre-generated inference graph as input.
- the graph is traversed from a root node to a root cause node.
- a search is performed in an event set E for all events with the same type as the node. The search stops when a root cause node is detected.
- the first set of performance measurements are represented by input events e(i).
- An event set E is formed by a set of input events e(i) forming a sequence of input events. Root cause analysis is performed for event set E, determining one or more symptoms.
- a pre-generated inference graph is used as an input, mapping a type of fault to the symptoms.
- the graph is traversed from the root node.
- a search is performed in the event set E for all events with the same type as the current node. If any event of the current type has an anomaly, traversing will continue to the next node with the “Y” branch. If there is no anomaly, traversing will continue through the “X” branch. If the current node is a root cause node, the search stops. If the search yields no unambiguous root cause at this point, e.g., when it results in two or more possible root causes, an on-demand algorithm for generating further measurements should be triggered.
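The traversal just described can be sketched as a walk over a small decision graph. The graph below is hypothetical and is not a reproduction of FIG. 4 or FIG. 6; the dictionary layout, the branch keys `"yes"`/`"no"` (standing in for the two branches of each node), and the placeholder causes in the `"unclear"` leaf are all assumptions. The graph is assumed to be acyclic.

```python
def infer_fault_type(graph, root, event_set):
    """Walk the inference graph from `root` until a root cause node is reached.

    `event_set` is a list of (event_type, is_anomalous) pairs, i.e. the
    event set E. Returns (causes, needs_more_measurements); the flag is
    True when the search does not end in exactly one root cause.
    """
    node = root
    while "causes" not in graph[node]:
        check = graph[node]
        # Look at all events in E with the same type as the current node.
        anomalous = any(
            is_anomalous
            for event_type, is_anomalous in event_set
            if event_type == check["event_type"]
        )
        node = check["yes"] if anomalous else check["no"]
    causes = graph[node]["causes"]
    return causes, len(causes) != 1


# Hypothetical three-layer graph, searched from the lowest layer upwards.
GRAPH = {
    "ber": {"event_type": "ber", "yes": "optical", "no": "mpls_loss"},
    "mpls_loss": {"event_type": "mpls_loss", "yes": "eth_loss", "no": "unclear"},
    "eth_loss": {"event_type": "ethernet_loss", "yes": "mpls_congestion", "no": "unclear"},
    "optical": {"causes": ["optical link error"]},
    "mpls_congestion": {"causes": ["congestion at MPLS layer"]},
    "unclear": {"causes": ["hypothetical cause A", "hypothetical cause B"]},
}

E = [("ber", False), ("mpls_loss", True), ("ethernet_loss", True)]
print(infer_fault_type(GRAPH, "ber", E))
```

When the returned flag is True, a caller would hand over to whatever on-demand measurement mechanism is available rather than report a root cause.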
- FIG. 5 illustrates a scenario in a three-layer network, involving four routers R1-R4.
- the three-layer network includes an Ethernet layer, an MPLS layer and an Optical layer.
- FIG. 6 exemplifies an inference graph for such a three-layer network.
- measurement capability is available involving the routers R1-R4 in the three-layer network.
- performance measurements are collected in measurement tools implemented on the routers.
- the performance measurements are represented by input events.
- Each input event represents measurement results from one or more measurement tools.
- a group of input events e1-e4, b1, b2 and m1, m2, respectively representing the optical layer, the Ethernet layer and the MPLS layer, are identified.
- the resolution in terms of elements associated with an event varies between the events.
- the input events e1-e4 are based on bit error rate measurements in the routers; the input events b1, b2 concern Ethernet bandwidth and include measurements relating to multiple routers, b1: {R1, R2, R3}, b2: {R2, R3, R4}; the same applies for input events m1, m2, which concern MPLS trace-route, m1: {R1, R2, R3}, m2: {R2, R3, R4}.
- a receiver 360 in an arrangement 300 in a network management system receives a first set of input events b1, b2, m1, m2, e1-e4.
- the performance measurements and the corresponding input events represent the Ethernet layer {b1, b2}, the MPLS layer {m1, m2}, and the Optical layer {e1, e2, e3, e4}.
- the input events are generated by an event detection module in response to a malfunction in the disclosed three-layer network.
- the event identification is usually performed as a threshold based method, wherein thresholds based on time T, packet loss L and bandwidth B are defined.
- An Ethernet bandwidth measurement is represented by an input event b1 defined as an Ethernet Bandwidth>B.
- a further event b2 is also defined as an Ethernet Bandwidth>B.
- An MPLS traceroute measurement is represented by an input event m1 defined as an MPLS Loss>L; a further event in the MPLS layer is identified as input event m2, also defined as an MPLS Loss>L.
- events e1-e4 represent bit error rate measurements in the routers R1-R4.
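Threshold-based event identification of the kind described above can be sketched as a comparison of each measurement against a per-metric threshold. The numeric thresholds, the sample values and the BER threshold are all assumptions for illustration; the text only names thresholds for time T, packet loss L and bandwidth B.

```python
# Hypothetical thresholds; B and L follow the naming in the text, the BER
# threshold is an extra assumption.
THRESHOLDS = {
    "ethernet_bandwidth": 800.0,  # B
    "mpls_loss": 0.01,            # L
    "ber": 1e-9,
}


def detect_events(samples, thresholds):
    """Turn raw samples into input events flagged as anomalous or not.

    Each sample is (event_name, metric, value, elements). An event is
    anomalous when its value exceeds the threshold for its metric.
    """
    events = []
    for name, metric, value, elements in samples:
        events.append({
            "name": name,
            "type": metric,
            "elements": set(elements),
            "anomaly": value > thresholds[metric],
        })
    return events


# Made-up measurement values.
samples = [
    ("b1", "ethernet_bandwidth", 950.0, {"R1", "R2", "R3"}),
    ("m1", "mpls_loss", 0.05, {"R1", "R2", "R3"}),
    ("e1", "ber", 1e-12, {"R1"}),
]

anomalous = [e["name"] for e in detect_events(samples, THRESHOLDS) if e["anomaly"]]
print(anomalous)
```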
- the localization of the fault is performed by identifying a probable set of network elements affected by the fault, here the set of routers most commonly represented across all of the events.
- a search for the routers associated with each event is performed. In the illustrated scenario, the result of this search is a set of most likely r for each event:
- the most common set among all locations is identified as locations R2, R3.
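The outcome R2, R3 can be checked with a simple frequency count over the router sets given for the scenario. The sets for b1, b2, m1 and m2 are taken from the text; the one-to-one mapping of e1-e4 onto R1-R4 is an assumption, as is the function name.

```python
from collections import Counter

# Router sets per input event. b1, b2, m1, m2 follow the text; mapping
# e1..e4 to single routers R1..R4 is an assumption.
EVENT_ELEMENTS = {
    "e1": {"R1"}, "e2": {"R2"}, "e3": {"R3"}, "e4": {"R4"},
    "b1": {"R1", "R2", "R3"}, "b2": {"R2", "R3", "R4"},
    "m1": {"R1", "R2", "R3"}, "m2": {"R2", "R3", "R4"},
}


def most_common_elements(event_elements):
    """Return the elements that appear in the largest number of events."""
    counts = Counter(
        element
        for elements in event_elements.values()
        for element in elements
    )
    if not counts:
        return set()
    top = max(counts.values())
    return {element for element, count in counts.items() if count == top}


print(sorted(most_common_elements(EVENT_ELEMENTS)))
```

Under these assumptions R2 and R3 each appear in five events and R1 and R4 in three, so the count agrees with the locations named in the text.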
- the type of fault is inferred through a fault analysis performed as a root cause analysis in an inference graph.
- the starting point for type of fault inference would be a Bit Error Rate (BER) event based on BER measurements, since this type of measurement directly reveals problems caused by optical link errors.
- the measurements represented by events e1-e4 are assumed to be in the normal range.
- the search then continues with detection of an MPLS Packet Loss.
- m1 and m2 are found to exceed a predetermined threshold L.
- the next step for the root cause analysis in the inference graph, exemplified in FIG. 6, involves detection of Ethernet Loss.
- the events b1 and b2 exceed the predetermined threshold L. From the root cause analysis it may then be determined that the most likely root cause, i.e., type of fault, is congestion at the MPLS layer.
- a fault type inference graph is structured to narrow down the most likely fault type for a given set of symptoms.
- a root cause analysis is performed based on a set of observations (events) from existing measurements. If sufficient measurements are lacking to complete the searching of root cause, the determination of fault type is prevented. In such a situation, there is a need to trigger additional measurements of a different type to those previously available.
- the root cause inference graph can be used for improving the accuracy of root cause analysis, especially when sufficient measurements for an actual fault location and type determination are lacking.
- a second set of measurements are triggered based on the results of root cause analysis.
- Three types of triggering conditions are foreseen: when sufficient measurements to draw a conclusion on root cause are missing, when sufficient measurements to identify the location of the fault are missing, and when there is no strong symptom of service degradation but an indication of future congestion.
- An additional second set of measurements may be triggered following a root cause analysis based on the first set of measurements.
- in a root cause analysis, it is possible to identify a set of additional events required in order to form a conclusion on a fault type.
- a second set of performance measurements could represent measurements of a type not previously included in the performance measurements due to measurement overhead. However, in a situation where a fault exists in the multi-layer network and where fault-localization attempts have failed, such increased overhead will be acceptable. New types of measurement could include high-frequency loss measurement, more accurate/more aggressive bandwidth measurement, jitter measurement, etc.
- a triggering command initiates additional measurements to generate a second set of measurements.
- the second set of measurements are represented by input events and included in the step of inferring a fault type through root cause analysis in the fault inference graph.
- a second set of measurements may also be triggered when a set of available measurements do not provide sufficient information to localize the fault.
- FIG. 7 illustrates a situation where additional measurements from a new path, Path 3, are required. The illustrated scenario assumes that the available events enable the localization to be narrowed down to two routers {R1, R2} along the measurement paths Path 1 and Path 2; both are likely locations, and the available data cannot distinguish a most likely one. In this case, a second set of measurements is needed from a different path. This additional path, Path 3, is shown as a dashed line in the figure. With the help of the measurements triggered on Path 3, it is possible to accurately isolate the fault to router R1.
- a search is performed in the finest grained location set. In the disclosed example, this corresponds to R1 and R2.
- the search identifies one or more possible paths in the topology that traverse only a subset of the locations. If there are such paths, additional measurements from elements along the path may help reduce the set of most likely locations to an identification of a most likely location. In this part of the process of triggering a second set of measurements, it is assumed that the measurement type is given.
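The path search can be sketched as a filter over known paths that keeps those touching some, but not all, of the candidate locations. The topology below is invented for illustration (the hop lists for Path 1 to Path 3 and the routers R5 to R7 are not from the patent); only the candidate set {R1, R2} follows the text.

```python
def discriminating_paths(paths, candidates):
    """Find paths that traverse only a proper, non-empty subset of candidates.

    `paths` maps a path name to its list of hops. A measurement on such a
    path can help separate the candidates it covers from those it avoids.
    Returns {path_name: covered_candidates}.
    """
    result = {}
    for name, hops in paths.items():
        covered = candidates & set(hops)
        if covered and covered != candidates:
            result[name] = covered
    return result


# Hypothetical topology: Path 1 and Path 2 both cross R1 and R2, so they
# cannot separate the two; Path 3 crosses R1 only.
paths = {
    "Path 1": ["R1", "R2", "R5"],
    "Path 2": ["R6", "R1", "R2"],
    "Path 3": ["R1", "R7"],
}

print(discriminating_paths(paths, {"R1", "R2"}))
```

An anomaly on the additional path would point at the candidates it covers, while a normal result would point at the remaining ones.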
- measuring is usually performed in a low-rate manner.
- the frequency of measuring may be increased to obtain more accurate measurement results.
- a second set of measurements representing higher frequency measurements will be triggered when one or more detected events are close to a defined threshold for triggering alarms.
- when the fault type diagnosis suggests that the network is close to congestion, more frequent measurements are required in order to detect or predict future congestion.
- the process of choosing which measurements to require for the second set of measurements is based on data flagged as missing by the root cause analysis algorithm.
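A near-threshold trigger for higher-frequency measuring might look like the sketch below. The 90% margin, the one-second fast interval and the function name are assumptions; the text only states that measuring frequency may be increased when events are close to a defined threshold.

```python
def next_interval(value, threshold, interval_s, margin=0.9, fast_interval_s=1.0):
    """Pick the next measurement interval in seconds.

    If the measured value is close to (within `margin` of) the alarm
    threshold but has not exceeded it, switch to the faster interval;
    otherwise keep the current low-rate interval. `margin` and
    `fast_interval_s` are illustrative defaults.
    """
    near_threshold = margin * threshold <= value <= threshold
    if near_threshold:
        return min(interval_s, fast_interval_s)
    return interval_s


print(next_interval(95.0, 100.0, 60.0))  # close to the threshold
print(next_interval(50.0, 100.0, 60.0))  # well below the threshold
```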
- the process of initiating generation of the second set of measurements is automated and enables an improved method for localizing faults and identifying fault types in a multi-layer network. It is of course possible to generate a second set of measurements including measurements from different paths, as well as measurements of another type or an increased frequency. However, the analysis of the second set of measurements would then preferably be carried out for each individual subset.
- a benefit of the disclosed method for localizing and inferring type of fault is that the method considers performance measurements at different protocol layers in the network.
- the performance measurements are collected by one or more measurement tools.
- the root cause of a fault is inferred from a set of symptoms extracted from measurements.
- the method can be deployed without access to the network devices.
- the disclosed method may operate in a packet-optical integrated network with Ethernet services, e.g., a Metro-Ethernet Self Organizing Network (MESON).
- the disclosed method can be performed without integration with software deployment and billing systems.
- the method may automatically determine the configuration of the measurement tools required for gathering the data required in the root cause analysis process.
- FIG. 3 discloses a root cause analysis, RCA, module for carrying out the invention.
- the root cause analysis module is an arrangement included in a network management system in a network management node in a multi-layer network or in a network management system distributed amongst several nodes in a multi-layer network.
- the network management system receives measurements originated in distributed measurement units in the multi-layer network.
- a first set of input events corresponding to measurements in a first set of performance measurements may be generated.
- a receiver in the RCA module receives the first set of input events or the first set of measurements.
- the first set of input events are processed in an event correlator 310 for processing input events corresponding to measurements in a first set of performance measurements received by a receiver 360 .
- the events are correlated in time, space or protocol.
- the correlated events form the input to a root cause analyzer 320 for further processing of the events or for processing of the first set of performance measurements.
- a spatial localization unit 330 determines a set of network elements most likely to be affected in response to a failure in the multi-layer network. The type of fault is determined in a fault type inference unit 350 . If further measurements are required in order to make a conclusion on the root cause of a failure, further measurements may be triggered.
- the RCA module includes a measurement generating unit 340 communicating with the spatial localization unit 330 and the fault type inference unit 350 and initiating generation of a second set of measurements based on input from the spatial localization unit 330 or the fault type inference unit 350 .
- a transmitter 370 provides an output from the RCA module for further processing in a network management system.
Landscapes
- Chemical & Material Sciences (AREA)
- Health & Medical Sciences (AREA)
- Toxicology (AREA)
- Dispersion Chemistry (AREA)
- Chemical Kinetics & Catalysis (AREA)
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
- Degasification And Air Bubble Elimination (AREA)
Abstract
The invention relates to a method for fault analysis in a multi-layer network. The method includes receiving a first set of performance measurements occurring in response to the fault and representing at least a first and a second layer in a multi-layer communications network. A probable set of network elements affected by the fault is localized. The type of fault is inferred from one or more symptoms in the performance measurements. The root cause of the failure is determined from a combination of information on the probable set of network elements and the inferred type of fault.
The invention also relates to a network management node for carrying out the method. The node includes a root cause analyzer to detect affected network elements and fault type upon network failure, wherein affected network elements are determined in a spatial localization unit and the fault type is determined in a fault type inference unit.
Description
- The present invention relates to a method and a network node for diagnosing one or more faults in a multi-layer communications network.
- Fast failure detection and failure diagnosis is an important area in network management. After a failure is detected and data switched to alternative paths, there is a need to quickly localize the failure so that measures may be taken to replace or repair the faulty network element. A communications network involves a large set of distributed hardware and software components. Errors may occur in each device in the network despite best practices in design, implementation, and testing. Root causes can also be affected by external factors.
- The multi-layer network comprises network elements in several network protocol layers. An Ethernet E-line service on top of a packet-optical integrated network is an example of a network service provided in a multi-layer network, but other large-scale distribution networks may also be configured as multi-layer networks. Fault localization in multi-layer networks is generally difficult. Path-trace capabilities, such as IP trace-route, are not available in the optical layer.
- In multi-layer networks, network elements with measurement capability in the network layers generate measurement records on network traffic. These records may be collected and used for statistical and/or reporting purposes. When faults are experienced in a multi-layer network, the collected information is used to detect failures and performance degradations that exist in the network. Typically, the information is gathered in a network management system (NMS) that handles fault management and performance management.
- Known fault localization methods aim at finding a correlation between a network fault and one or more fault carrying events, i.e., events occurring in response to the network fault. This is usually a difficult task due to the relatively large amount of fault carrying events caused by a network fault. Events may in this context be the issuance of messages informing about something happening in the network, such as the occurrence of an alarm, a performance parameter increment, or a service/action request made by one node to itself or to another node.
- A disadvantage of these methods is that they are limited in the scope of correlation of fault carrying events. In a complex communication system, processing of fault carrying events exclusively may not be sufficient to succeed in fault localization. The fault carrying events may be an effect of a fault but may also occur as a result of symptoms of the real fault. These symptoms may be localized far away from the actual location of the fault.
- There are existing techniques to search for dependencies between fault carrying event(s) and the problem causing the fault carrying events, within one or more different subsystems. Dependencies between for instance an alarm in a certain subsystem and the cause of the problem, if the cause resides in a different subsystem, are thus not considered. Network performance monitoring, automated failure localization and diagnosis are critical to service providers of large distribution networks, due to the increases in scale, diversity and complexity of the application services. There are significant network management challenges that arise from the combination of rapid growth and increased complexity of the network. Failure and performance degradation diagnosis is an important area in network management.
- The complexity and prevalence of communication networks, systems, and associated services have increased over the past several years. Many new applications are increasingly complex in terms of the resources required to operate and deliver the applications, the application functions, and storage architecture, for example. The resources necessary to conceive, develop, activate, and eventually to provide increasingly complex applications continue to increase. In addition to the increasing complexity of applications and services, there is increased demand for applications and services that traverse various network technologies and systems.
- From a network management standpoint, these various networks and network devices often report operational information in different ways. For example, the networks and network devices may employ particular network management approaches and technologies for monitoring operation of the network system, and network management personnel associated with particular networks and network devices may rely upon specific, and varied, network management systems and methods. Furthermore, modern networks increasingly rely upon third party vendors to provide hardware and/or software for offered services. These hardware and software devices frequently operate and report according to systems, methods, and even protocols that are not the same as the network providing the services.
- None of the techniques for localizing faults and failure diagnosis discussed above are applicable in a large scale multi-layer network. Thus, there is a need for a method to support diagnosis of failure and performance degradation instances in a multi-layer network.
- It is an object of the present invention to provide such a method for diagnosis of failure in a multi-layer network. This object is achieved through a method for root cause diagnosis of network failures and performance degradations in a multi-layer network, the method identifying the network elements that are the source of the failure as well as the type of fault in these elements.
- In an embodiment of a method according to the invention, a first set of performance measurements occurring in response to the fault and representing at least a first and a second layer in a multi-layer communications network are received in the network management system. The fault is localized by identifying a probable set of network elements affected by the fault. A type of fault is inferred from one or more symptoms in the first set of performance measurements. The root cause of the failure is identified from a combination of the information on probable set of network elements and inferred type of fault.
- In another embodiment of a method according to the invention, the root cause analysis is performed in a pre-generated fault type inference graph.
- In a further embodiment of a method according to the invention, additional measurements, a second set of measurements, are triggered when the analysis of localization of a fault and/or the fault type is non-conclusive. The steps of localizing the fault by identifying a probable set of network elements affected by the fault, inferring the type of fault from one or more symptoms incurred from one or more performance measurements, and determining the root cause by combining the information on the probable set of network elements with the inferred type of fault are repeated for the additional measurements.
- It is another object of the invention to provide an arrangement in a network management node in a multi-layer network for carrying out the inventive method. The arrangement includes a receiver configured to receive a first set of performance measurements representing performance of network elements on at least a first and a second layer in the multi-layer communications network. The arrangement further includes a root cause analyzer arranged to detect affected network elements and fault type upon network failure. The root cause analyzer further comprises a spatial localization unit configured to identify a probable set of network elements affected by the fault based on the first set of performance measurements, and a fault type inference unit configured to infer the type of fault from one or more symptoms incurred from one or more performance measurements in the set of performance measurements. A transmitter is configured to output the information from the root cause analyzer.
- FIG. 1 schematically illustrates a multi-layer metro Ethernet network
- FIG. 2 is a flowchart illustrating embodiments of method steps
- FIG. 3 is a block diagram illustrating an embodiment of an arrangement in a network management node
- FIG. 4 is an illustration of a fault type inference graph for a three-layer network
- FIG. 5 illustrates a simplified scenario in a three-layer network
- FIG. 6 illustrates a fault type inference graph for the simplified scenario
- FIG. 7 illustrates the triggering of on-demand measurement on a new path
- FIG. 1 schematically illustrates a multi-layer packet-optical network 100 with Ethernet services running on top of a generic MPLS transport, which is carried by an optical network. The network could be a so-called Metro Ethernet used as a metropolitan access network to connect subscribers and businesses to a large service network. Such a network encompasses end-customers with intermediate distribution devices that operate on the different protocol layers: the optical layer 130, the MPLS layer 120 and the Ethernet layer 110.
- Failure diagnosis in multi-layer networks requires the ability to localize faults in a large set of distributed components.
- FIG. 2 discloses a flowchart illustrating an embodiment of method steps for fault analysis in a multi-layer network. The method could be carried out in a root cause analysis, RCA, module disclosed in FIG. 3. The following description will be based on execution in such a module. However, it is apparent to the person skilled in the art that the execution of the inventive method may be performed in any type of arrangement in a multi-layer network and in a multi-layer network that consists of other types of protocols than MPLS, Ethernet and optical networks.
- In a first step of the illustrated embodiment of the inventive method, a first set of performance measurements is received 210, representing elements within at least a first and a second layer in the multi-layer communications network. The performance measurements are preferably collected in measurement tools implemented on routers, switches or hosts, capturing network performance, statistics of traffic performance metrics, e.g., optical bit-error-rate, MPLS delay, Ethernet bandwidth, or any other suitable metric from any protocol layer element in the multi-layer network. In an embodiment of the invention the first set of performance measurements is represented by input events, wherein a set of network elements is associated with each event. This set of elements is referred to as ‘Elements of e’. The resolution in terms of elements associated with an event may vary between different events. For example, for an event e indicating a long delay on a path, the associated set of elements includes all network components that are part of the path. A router can be one of these components. However, the operator may also need to make the resolution finer and include measurements that have not previously been included. This could be accomplished by adding all involved routers' interfaces to the Element set.
- In a next step 220, fault localization is performed, identifying a probable set of network elements affected by the fault. The goal of the fault localization is to find all elements that could be the source of the failure/degradation. The fault localization is performed based on an assumption that at any given time, there is only a single fault leading to many input events and corresponding performance measurement anomalies from the elements associated with the input events. In the fault localization step, one or more elements that are part of all the input events are identified.
- The first set of performance measurements is represented by input events e(i). A set of network elements e(i).elements is associated to each input event, where a network element may be associated to a plurality of input events. The fault localization process further involves identifying the network elements, common-elements, associated to each of the input events.
- Fault-Localization Algorithm [1]
    1. let e(1), e(2), ..., e(n) be the input events
    2. common-elements = e(1).elements
    3. for each event e(i) do
    4.     common-elements = common-elements ∩ e(i).elements
    5. end
- In step 230 of the fault analysis method, the type of failure is assessed. In a root cause analysis, inferring the type of the root cause is uncorrelated to determining the location of the fault. Different types of faults result in different groups of measurement events, e.g., an optical link problem results in a first group of measurement events, and congestion at the MPLS layer in a second group of measurement events, non-overlapping with the first group. Congestion at the Ethernet layer, on the other hand, will result in a third group of measurement events that may in part overlap with a fourth group of measurement events following on a situation of near congestion at the MPLS layer. In the step of assessing the fault type, the possible combination of events is tracked through a graph of events. The step of inferring the type of failure is preferably performed after the step of localizing the fault. FIG. 4 illustrates an embodiment of a fault type inference graph in a three-layer network, based on which the root cause could be identified from a group or set of events. By means of tracking in the graph, it is possible to infer a most likely root cause type given the set of measurements.
- In step 240, the root cause is determined by combining information on the location of the fault from step 220 and the type of fault from step 230.
- In the disclosed inference graph, each oval represents a measurement event and the rims of the ovals represent a state transition. The boxes map to possible causes, fault types, associated with the measurement event linked to the fault type in the inference graph. The graph contains types of measurements and is not specific to any location in the network. For each group of events, the most likely locations are identified using the Fault-Localization Algorithm [1], prior to inferring the type of root cause. The output of the two consecutive steps will be an identification of the fault location and the type of fault.
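- As an illustration only, the Fault-Localization Algorithm [1] and the combination performed in step 240 can be sketched in Python. The class and function names (`InputEvent`, `RootCause`, `common_elements`, `determine_root_cause`) and the sample events are assumptions made for this sketch and do not appear in the disclosure.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class InputEvent:
    """An input event e(i) together with e(i).elements (hypothetical type)."""
    name: str
    elements: FrozenSet[str]


@dataclass(frozen=True)
class RootCause:
    """Location from step 220 combined with fault type from step 230."""
    elements: FrozenSet[str]
    fault_type: str


def common_elements(events: Iterable[InputEvent]) -> FrozenSet[str]:
    """Intersect e(i).elements over all input events, as in Algorithm [1]."""
    events = list(events)
    if not events:
        return frozenset()
    common = set(events[0].elements)
    for event in events[1:]:
        common &= event.elements
    return frozenset(common)


def determine_root_cause(events: Iterable[InputEvent], fault_type: str) -> RootCause:
    """Step 240: pair the localized elements with an already inferred fault type."""
    return RootCause(common_elements(events), fault_type)


# Placeholder events; the element names and the fault type are illustrative.
events = [
    InputEvent("x1", frozenset({"A", "B"})),
    InputEvent("x2", frozenset({"B", "C"})),
]
root_cause = determine_root_cause(events, "congestion")
```

- The sketch relies on the single-fault assumption stated for step 220: with several simultaneous faults the intersection could be empty.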
- The identification of measurement events is preferably threshold based, with thresholds set to detect anomalies in the performance measurement results. The invention can be applied with different event detection mechanisms.
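- A minimal sketch of threshold based event identification is given below, assuming one upper threshold per metric. The metric names, threshold values and the `Measurement` type are hypothetical and serve only to illustrate the idea.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List


@dataclass(frozen=True)
class Measurement:
    metric: str               # hypothetical metric name, e.g. "mpls_loss"
    value: float
    elements: FrozenSet[str]  # network elements covered by the measurement


def detect_events(measurements: Iterable[Measurement],
                  thresholds: Dict[str, float]) -> List[Measurement]:
    """Return the measurements whose value exceeds the threshold of their metric."""
    return [m for m in measurements if m.value > thresholds[m.metric]]


# Illustrative thresholds and readings, not taken from the disclosure.
thresholds = {"mpls_loss": 0.01, "optical_ber": 1e-9}
measurements = [
    Measurement("mpls_loss", 0.05, frozenset({"R1", "R2", "R3"})),
    Measurement("optical_ber", 1e-12, frozenset({"R1"})),
]
anomalies = detect_events(measurements, thresholds)
```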
- In an embodiment of the invention, the step of inferring a fault type involves a search process starting at a lower level moving to a higher level in the networking protocol stack. Events in higher layers may many times be a symptom of events in lower levels. Thus starting the search at a lower level may lead to identification of a fault type faster than performing the fault type inference with a starting point at a higher level. However, the invention is not limited to a lower level starting point, but is equally possible from an opposite protocol starting point.
- When deciding on starting points within the same protocol layer, the search process checks measurement events with lower measurement overhead first. Execution of performance measurements in the network introduces a measurement load in the multilayer network, also known as the measurement overhead in the multilayer network. Starting with lower overhead measurements, the search process will be initiated for those measurements that may be performed with little impact on the overall performance of the multi-layer network; thus, unnecessary measurement load may be reduced in the system. Low overhead measurements are usually conducted more frequently and may thus provide more accurate information to help isolate the fault type faster.
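- One possible way to express the search order described above (lower layers first and, within a layer, lower measurement overhead first) is a sort on a two-part key. The layer indices and overhead figures below are invented for the example.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class MeasurementType:
    name: str
    layer: int       # 0 is the lowest protocol layer (assumed numbering)
    overhead: float  # relative measurement overhead (assumed scale)


def search_order(types: Iterable[MeasurementType]) -> List[MeasurementType]:
    """Order measurement types by layer, then by overhead within a layer."""
    return sorted(types, key=lambda t: (t.layer, t.overhead))


types = [
    MeasurementType("ethernet_bandwidth", layer=2, overhead=8.0),
    MeasurementType("mpls_delay", layer=1, overhead=5.0),
    MeasurementType("optical_ber", layer=0, overhead=1.0),
    MeasurementType("mpls_loss", layer=1, overhead=2.0),
]
ordered = [t.name for t in search_order(types)]
```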
- The inference type search uses a pre-generated inference graph as input. The graph is traversed from a root node to a root cause node. For each node, a search is performed in an event set E for all events with the same type as the node. The search stops when a root cause node is detected.
- In the step of inferring the fault type, the first set of performance measurements are represented by input events e(i). An event set E is formed by a set of input events e(i) forming a sequence of input events. Root cause analysis is performed for event set E, determining one or more symptoms.
- Root Cause Analysis Algorithm (G, E) [2]
    1. let event set E = {e(1), e(2), ..., e(n)} be the set of input events
    2. start from the root node in fault type inference graph G
    3. for current node n in G do
    4.     find all events in E where e.type = n.type
    5.     start transition to the next node n.transition(e.value)
    6.     if n.type is a root cause type
    7.         break
    8. end
- In an embodiment of the invention, a pre-generated inference graph is used as an input, mapping a type of fault to the symptoms. The graph is traversed from the root node. Starting in a root node, a search is performed in the event set E for all events with the same type as the current node. If any event of the current type has an anomaly, traversing will continue to the next node with the “Y” branch. If there is no anomaly, traversing will continue through the “X” branch. If the current node is a root cause node the search stops. If there are no unambiguous results relating to a root cause at this point of the search, e.g., when the search results in two or more possible root causes, an on-demand algorithm for generating further measurements should be triggered at this point.
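- The traversal can be sketched as follows. The graph below is a hypothetical three-layer example and is not a reproduction of the graph in any figure; the node names, the `Node` and `Event` types, and the simplification that a missing event is treated like a normal reading are all assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Node:
    type: str
    is_root_cause: bool = False
    on_anomaly: Optional[str] = None  # next node when an anomaly is present
    on_normal: Optional[str] = None   # next node when no anomaly is present


@dataclass(frozen=True)
class Event:
    type: str
    anomalous: bool


def infer_fault_type(graph: Dict[str, Node], start: str,
                     events: Iterable[Event]) -> Optional[str]:
    """Walk the graph from the root node until a root cause node is reached."""
    events = list(events)
    key: Optional[str] = start
    while key is not None:
        node = graph[key]
        if node.is_root_cause:
            return node.type
        # All events in E with the same type as the current node.
        anomaly = any(e.anomalous for e in events if e.type == node.type)
        key = node.on_anomaly if anomaly else node.on_normal
    return None  # inconclusive: further measurements could be triggered


# Hypothetical acyclic graph keyed by node type.
graph = {
    "ber": Node("ber", on_anomaly="optical_link_fault", on_normal="mpls_loss"),
    "optical_link_fault": Node("optical_link_fault", is_root_cause=True),
    "mpls_loss": Node("mpls_loss", on_anomaly="ethernet_loss"),
    "ethernet_loss": Node("ethernet_loss", on_anomaly="mpls_congestion"),
    "mpls_congestion": Node("mpls_congestion", is_root_cause=True),
}
events = [Event("ber", False), Event("mpls_loss", True), Event("ethernet_loss", True)]
fault_type = infer_fault_type(graph, "ber", events)
```

- Returning `None` marks the inconclusive case in which an on-demand measurement step would take over.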
- FIG. 5 illustrates a scenario in a three-layer network, involving four routers R1-R4. The three-layer network includes an Ethernet layer, an MPLS layer and an optical layer. FIG. 6 exemplifies an inference graph for such a three-layer network.
- In order to accurately identify the root cause, a complete set of information on performance at each layer and at each network element is required. However, the completeness of the set of performance information must be balanced against the concerns for measurement overhead. In the disclosed scenario, measurement capability is available involving the routers R1-R4 in the three-layer network. In response to detection of some type of malfunction in the network, performance measurements are collected in measurement tools implemented on the routers. The performance measurements are represented by input events. Each input event represents measurement results from one or more measurement tools. In the disclosed simplified scenario, a group of input events e1-e4, b1, b2 and m1, m2, respectively representing the optical layer, the Ethernet layer and the MPLS layer, are identified. The resolution in terms of elements associated with an event varies between the events. The input events e1-e4 are based on bit error rate measurements in the routers; the input events b1, b2 concern Ethernet bandwidth and include measurements relating to multiple routers, b1:{R1, R2, R3}, b2:{R2, R3, R4}; the same applies for input events m1, m2 that concern MPLS trace-route, m1:{R1, R2, R3}, m2:{R2, R3, R4}.
- A receiver 360 in an arrangement 300 in a network management system receives a first set of input events b1, b2, m1, m2, e1-e4. The performance measurements and the corresponding input events represent the Ethernet layer {b1, b2}, the MPLS layer {m1, m2}, and the optical layer {e1, e2, e3, e4}. The input events occur in response to a malfunction in the disclosed three-layer network and are generated by an event detection module. The event identification is usually performed as a threshold based method, wherein thresholds based on time T, packet loss L and bandwidth B are defined. An Ethernet bandwidth measurement is represented by an input event b1 defined as an Ethernet Bandwidth>B. A further event b2 is also defined as an Ethernet Bandwidth>B. An MPLS traceroute measurement is represented by an input event m1 defined as an MPLS Loss>L; a further event in the MPLS layer is identified as input event m2, also defined as an MPLS Loss>L. In the optical layer, events e1-e4 represent bit error rate measurements in the routers R1-R4.
- The localization of the fault is performed by identifying the set of routers most commonly represented in all of the events, which gives the probable set of network elements affected. A search for the routers associated with each event is performed. In the illustrated scenario, the result of this search is a set of most likely routers for each event:
-
- b1:{R1, R2, R3}
- b2:{R2, R3, R4}
- m1:{R1, R2, R3}
- m2:{R2, R3, R4}
- e1:{R1}
- e2: . . .
- Based on the likely locations for b1, b2, m1 and m2, the most common set among all locations is identified as locations R2, R3.
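- The result for this scenario can be checked with a few lines of Python. Only the four events named in the preceding paragraph are intersected; the dictionary layout is an assumption of the sketch.

```python
# Router sets per input event, as listed for the illustrated scenario.
locations = {
    "b1": {"R1", "R2", "R3"},
    "b2": {"R2", "R3", "R4"},
    "m1": {"R1", "R2", "R3"},
    "m2": {"R2", "R3", "R4"},
}

# Routers that appear in every one of the four events.
likely = set.intersection(*locations.values())
print(sorted(likely))  # ['R2', 'R3']
```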
- The type of fault is inferred through a fault analysis performed as a root cause analysis in an inference graph. Following the inference graph illustrated in FIG. 6, the starting point for type of fault inference would be a Bit Error Rate (BER) event based on BER measurements, since this type of measurement will directly reveal problems caused by optical link errors. In the illustrated scenario, the measurements represented by events e1-e4 are assumed to be in the normal range. The search then continues with detection of an MPLS Packet Loss. In the exemplified scenario, m1 and m2 are found to exceed a predetermined threshold L. The next step for the root cause analysis in the inference graph, exemplified in FIG. 6, involves detection of Ethernet Loss. In the exemplified scenario, the events b1 and b2 exceed the predetermined threshold L. From the root cause analysis it may then be determined that the most likely root cause, i.e., type of fault, is congestion at the MPLS layer.
- As previously disclosed for FIG. 4, a fault type inference graph is structured to narrow down the most likely fault type for a given set of symptoms. A root cause analysis is performed based on a set of observations (events) from existing measurements. If sufficient measurements are lacking to complete the search for a root cause, the determination of fault type is prevented. In such a situation, there is a need to trigger additional measurements of a different type to those previously available.
- The root cause inference graph can be used for improving the accuracy of root cause analysis, especially when sufficient measurements for an actual fault location and type determination are lacking. In an embodiment of the invention, a second set of measurements is triggered based on the results of root cause analysis. Three types of triggering conditions are foreseen: when sufficient measurements to draw a conclusion on root cause are missing, when sufficient measurements to identify the location of the fault are missing, and when there is no strong symptom of service degradation but an indication of future congestion.
- An additional second set of measurements may be triggered following a root cause analysis based on the first set of measurements. During the first root cause analysis, it is possible to identify a set of additional events required in order to form a conclusion on a fault type. A second set of performance measurements could represent measurements of a type not previously included in the performance measurements due to measurement overhead. However, in a situation where a fault exists in the multi-layer network and where fault-localization attempts have failed, such increased overhead will be acceptable. New types of measurement could include high-frequency loss measurement, more accurate/more aggressive bandwidth measurement, jitter measurement, etc. When encountering possible fault types without matching events, a triggering command initiates additional measurements to generate a second set of measurements. The second set of measurements are represented by input events and included in the step of inferring a fault type through root cause analysis in the fault inference graph.
- A second set of measurements may also be triggered when a set of available measurements do not provide sufficient information to localize the fault. FIG. 7 illustrates a situation where additional measurements from a new path, Path 3, are required. It is an assumption for the illustrated scenario that the available events enable a narrowing down of the localization to two routers {R1, R2} following the measurement paths Path 1 and Path 2, that both are likely locations without any possibility to distinguish a most likely location based on the available data. In this case, a second set of measurements is needed from a different path. This additional path, Path 3, is shown in a dashed line in the figure. With the help of the measurements triggered on Path 3, it is possible to accurately isolate the fault to router R1. In order to identify where to initiate measurements for the second set of measurements, a search is performed in the finest grained location set. In the disclosed example, this corresponds to R1 and R2. The search identifies one or more possible paths in the topology that traverse only a subset of the locations. If there are such paths, additional measurements from elements along the path may help reduce the set of most likely locations to an identification of a most likely location. In this part of the process of triggering a second set of measurements, it is assumed that the measurement type is given.
- In order to prevent overloading of the network and the devices, measuring is usually performed in a low-rate manner. When failure or performance degradation occurs, the frequency of measuring may be increased to obtain more accurate measurement results. In order to be able to identify and initiate preventive actions prior to network failure, it is possible to increase the measurement frequency. A second set of measurements representing higher frequency measurements will be triggered when one or more detected events are close to a defined threshold for triggering alarms. Similarly, when the fault type diagnosis suggests that the network is close to congestion, more frequent measurements are required in order to detect or predict future congestion.
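- The search for a new measurement path can be sketched as a filter over known paths: a path is useful when it traverses some, but not all, of the remaining candidate locations. The topology below, including router R5 and the element lists of the three paths, is invented for the example and is not taken from FIG. 7.

```python
from typing import Dict, List, Sequence, Set


def discriminating_paths(candidates: Set[str],
                         paths: Dict[str, Sequence[str]]) -> List[str]:
    """Return the paths that traverse a non-empty proper subset of the candidates."""
    selected = []
    for name, elements in paths.items():
        overlap = candidates & set(elements)
        if overlap and overlap != candidates:
            selected.append(name)
    return selected


candidates = {"R1", "R2"}  # finest grained location set after the first analysis
paths = {                  # hypothetical topology
    "Path 1": ["R1", "R2", "R3"],
    "Path 2": ["R4", "R1", "R2"],
    "Path 3": ["R1", "R5"],
}
new_paths = discriminating_paths(candidates, paths)

# If measurements on Path 3 also show the anomaly, the location set narrows
# to the candidates that lie on that path.
narrowed = candidates & set(paths["Path 3"])
```

- Paths 1 and 2 are rejected because they contain both candidates and therefore cannot separate them.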
- In the disclosed embodiments for triggering additional measurements, the process of choosing which measurements to require for the second set of measurements is based on data flagged as missing by the root cause analysis algorithm. The process of initiating generation of the second set of measurements is automated and enables an improved method for localizing faults and identifying fault types in a multi-layer network. It is of course possible to generate a second set of measurements including measurements from different paths, as well as measurements of another type or an increased frequency. However, the analysis of the second set of measurements would then preferably be carried out for each individual subset.
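- How the three triggering conditions might be combined into one decision is sketched below. The `Trigger` names, the `margin` parameter that defines "close to a threshold", and the sample readings are assumptions of this sketch rather than features of the disclosed method.

```python
from enum import Enum
from typing import Dict, Optional, Set


class Trigger(Enum):
    OTHER_TYPES = "measurements of other types"
    NEW_PATH = "measurements on a new path"
    HIGHER_FREQUENCY = "more frequent measurements"


def select_triggers(fault_type: Optional[str], locations: Set[str],
                    readings: Dict[str, float], thresholds: Dict[str, float],
                    margin: float = 0.9) -> Set[Trigger]:
    """Pick the kinds of second-set measurements to request."""
    triggers: Set[Trigger] = set()
    if fault_type is None:   # no conclusion on the fault type
        triggers.add(Trigger.OTHER_TYPES)
    if len(locations) > 1:   # location is still ambiguous
        triggers.add(Trigger.NEW_PATH)
    # A reading within the assumed margin below its threshold hints at future trouble.
    if any(margin * thresholds[m] <= v <= thresholds[m] for m, v in readings.items()):
        triggers.add(Trigger.HIGHER_FREQUENCY)
    return triggers


all_three = select_triggers(None, {"R1", "R2"},
                            {"mpls_loss": 0.0095}, {"mpls_loss": 0.01})
none_needed = select_triggers("mpls_congestion", {"R1"},
                              {"mpls_loss": 0.002}, {"mpls_loss": 0.01})
```

- Returning a set reflects the remark above that several kinds of additional measurements may be requested together, with each subset analyzed individually.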
- A benefit of the disclosed method for localizing and inferring type of fault is that the method considers performance measurements at different protocol layers in the network. The performance measurements are collected by one or more measurement tools. The root cause of a fault is inferred from a set of symptoms extracted from measurements. The method can be deployed without access to the network devices.
- Working with on-demand measurements, triggered only when additional measurements are required to draw a conclusion on localization and type of faults, reduces measurement overhead in the fault analysis process while enabling improved inference accuracy.
- The disclosed method may operate in a packet-optical integrated network with Ethernet services, e.g., a Metro-Ethernet Self Organizing Network MESON. The disclosed method can be performed without integration with software deployment and billing systems. The method may automatically determine the configuration of the measurement tools required for gathering the data required in the root cause analysis process.
- FIG. 3 discloses a root cause analysis, RCA, module for carrying out the invention. The root cause analysis module is an arrangement included in a network management system in a network management node in a multi-layer network or in a network management system distributed amongst several nodes in a multi-layer network. The network management system receives measurements originated in distributed measurement units in the multi-layer network. In the management system, a first set of input events corresponding to measurements in a first set of performance measurements may be generated. A receiver in the RCA module receives the first set of input events or the first set of measurements. The first set of input events are processed in an event correlator 310 for processing input events corresponding to measurements in a first set of performance measurements received by a receiver 360. The events are correlated in time, space or protocol. The correlated events form the input to a root cause analyzer 320 for further processing of the events or for processing of the first set of performance measurements. A spatial localization unit 330 determines a set of network elements most likely to be affected in response to a failure in the multi-layer network. The type of fault is determined in a fault type inference unit 350. If further measurements are required in order to make a conclusion on the root cause of a failure, further measurements may be triggered. The RCA module includes a measurement generating unit 340 communicating with the spatial localization unit 330 and the fault type inference unit 350 and initiating generation of a second set of measurements based on input from the spatial localization unit 330 or the fault type inference unit 350. A transmitter 370 provides an output from the RCA module for further processing in a network management system.
Claims (19)
1. A method in a network management system for diagnosing a root cause of a fault in a multi-layer communications network, comprising:
receiving a first set of performance measurements representing performance of network elements on at least a first and a second layer in the multi-layer communications network;
localizing the fault by identifying, based on the first set of performance measurements, a probable set of network elements affected by the fault;
inferring a type of fault from one or more symptoms incurred from one or more performance measurements in the first set of performance measurements; and
determining the root cause by combining information on the probable set of network elements with the inferred type of fault.
2. The method according to claim 1, wherein localizing the fault includes:
representing the first set of performance measurements by input events (e(i));
associating a set of network elements (e(i).elements) to each input event (e(i)); wherein a network element may be associated to a plurality of input events (e(i)); and
localizing the fault by identifying at least one network element associated to each of the input events (e(i)).
3. The method according to claim 1, wherein inferring the type of fault includes:
representing the first set of performance measurements by input events (e(i));
determining one or more symptoms following from a sequence of input events (e(i)); and
mapping at least one type of fault to the symptoms.
4. The method according to claim 3, wherein the sequence of input events (e(i)) is tracked in a graph of events.
5. The method according to claim 4, wherein tracking in the graph is initiated at a lower protocol level moving to a higher level in the multi-layer network.
6. The method according to claim 4, wherein the graph is pre-generated based on basic domain knowledge of protocol layers, upon setting up of the multi-layer network.
7. The method according to claim 1, further comprising: triggering a second set of performance measurements in addition to the measurements included in the first set of performance measurements.
8. The method according to claim 7, wherein the first set of performance measurements includes performance measurements from multiple first measurement paths through the network elements, each measurement path including a subset of network elements, the second set of performance measurements representing performance measurements generated on at least a second measurement path, and wherein localizing the fault is repeated for the second set of performance measurements.
9. The method according to claim 7, wherein the second set of performance measurements represents performance measurements of other types to those given in the first set of performance measurements and wherein the second set of performance measurements are provided as an input to the inferring the type of fault.
10. The method according to claim 7, wherein the second set of performance measurements represents more frequent performance measurements than those given in the first set of performance measurements and wherein the second set of performance measurements is used to anticipate future service degradation.
11. The method according to claim 8, wherein the second set of performance measurements is a combination of measurements on the at least second measurement path, measurements of other types and more frequent measurements to those given in the first set of performance measurements.
12. A network management node in a multi-layer network comprising:
a receiver configured to receive a first set of performance measurements representing performance of network elements on at least a first and a second layer in the multi-layer communications network;
a root cause analyzer arranged to detect affected network elements and fault type upon a network failure, wherein the root cause analyzer (320) further includes:
a spatial localization unit configured to identify a probable set of network elements affected by the fault based on the first set of performance measurements, and
a fault type inference unit configured to infer a type of fault from one or more symptoms incurred from one or more performance measurements in the first set of performance measurements; and
a transmitter configured to output information from the root cause analyzer.
13. The network management node according to claim 12, further comprising:
an event correlator arranged to correlate events in time, space or protocol and to output the result to the root cause analyzer.
14. The network management node according to claim 12, further comprising:
a measurement generating unit adapted to communicate with the spatial localization unit and the fault type inference unit and adapted to initiate generation of a second set of measurements based on input from the spatial localization unit or the fault type inference unit.
15. The network management node according to claim 12, wherein the receiver is further configured to receive a second set of performance measurements in addition to the measurements included in the first set of performance measurements.
16. The network management node according to claim 15, wherein the first set of performance measurements includes performance measurements from multiple first measurement paths through the network elements, each measurement path including a subset of network elements, the second set of performance measurements representing performance measurements generated on at least a second measurement path, and wherein the step of localizing the fault is repeated for the second set of performance measurements.
17. The network management node according to claim 15, wherein the second set of performance measurements represents performance measurements of other types to those given in the first set of performance measurements and wherein the second set of performance measurements are provided as an input to the fault type inference unit to infer the type of fault.
18. The network management node according to claim 15, wherein the second set of performance measurements represents more frequent performance measurements than those given in the first set of performance measurements, and wherein the fault type inference unit uses the second set of performance measurements to anticipate future service degradation.
19. The network management node according to claim 16, wherein the second set of performance measurements is a combination of measurements on the at least second measurement path, measurements of other types and more frequent measurements to those given in the first set of performance measurements.
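As an illustration only, the spatial localization and fault type inference recited in the claims above might be sketched as follows. This is not the claimed implementation. It assumes a single fault, assumes that a healthy measurement path exonerates every element on it, and uses a hypothetical symptom-to-fault table; the names `PathMeasurement`, `localize`, `infer_fault_type` and `FAULT_RULES` are invented for this sketch.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Set, Tuple


@dataclass(frozen=True)
class PathMeasurement:
    """One performance measurement taken along a measurement path."""
    path: Tuple[str, ...]  # network elements traversed by the measurement
    metric: str            # hypothetical metric name, e.g. "loss" or "delay"
    degraded: bool         # True if the measurement shows a symptom


def localize(measurements: Iterable[PathMeasurement]) -> Set[str]:
    """Return a probable set of affected network elements.

    Assumption (not from the claims): a single fault, so suspects are the
    elements shared by every degraded path, minus any element that also
    lies on a healthy path.
    """
    measurements = list(measurements)
    degraded = [set(m.path) for m in measurements if m.degraded]
    if not degraded:
        return set()
    healthy: Set[str] = set()
    for m in measurements:
        if not m.degraded:
            healthy |= set(m.path)
    return set.intersection(*degraded) - healthy


# Hypothetical mapping from observed symptoms to a fault type.
FAULT_RULES = {
    frozenset({"loss"}): "link failure",
    frozenset({"delay"}): "congestion",
    frozenset({"loss", "delay"}): "congestion with drops",
}


def infer_fault_type(measurements: Iterable[PathMeasurement]) -> str:
    """Infer a fault type from the metrics that show symptoms."""
    symptoms: FrozenSet[str] = frozenset(
        m.metric for m in measurements if m.degraded
    )
    return FAULT_RULES.get(symptoms, "unknown")


# First set of measurements: the suspect set is still ambiguous.
first = [
    PathMeasurement(("A", "B", "C", "E"), "loss", True),
    PathMeasurement(("A", "D"), "loss", False),
]
# Second set, triggered on additional paths, narrows the suspects.
second = [
    PathMeasurement(("C", "E"), "loss", True),
    PathMeasurement(("B", "F"), "loss", False),
]

print(sorted(localize(first)))           # suspects after the first set
print(sorted(localize(first + second)))  # suspects after repeating localization
print(infer_fault_type(first + second))
```

Repeating `localize` over the union of both sets mirrors the idea in claims 8 and 16 of repeating localization once measurements from a second path are available. A real system would need to handle multiple simultaneous faults and noisy measurements, which this sketch does not attempt.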
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US14/367,735 US20140355453A1 (en) | 2011-12-21 | 2012-01-16 | Method and arrangement for fault analysis in a multi-layer network |
Applications Claiming Priority (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US201161578419P | 2011-12-21 | 2011-12-21 | |
| US14/367,735 US20140355453A1 (en) | 2011-12-21 | 2012-01-16 | Method and arrangement for fault analysis in a multi-layer network |
| PCT/SE2012/050028 WO2013095247A1 (en) | 2011-12-21 | 2012-01-16 | Method and arrangement for fault analysis in a multi-layer network |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| US20140355453A1 (en) | 2014-12-04 |
Family
ID=48668956
Family Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US14/367,735 Abandoned US20140355453A1 (en) | 2011-12-21 | 2012-01-16 | Method and arrangement for fault analysis in a multi-layer network |
| US13/722,924 Active 2034-03-22 US9259671B2 (en) | 2011-12-21 | 2012-12-20 | Apparatus, system, and method for defoaming a waste tank |
Family Applications After (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US13/722,924 Active 2034-03-22 US9259671B2 (en) | 2011-12-21 | 2012-12-20 | Apparatus, system, and method for defoaming a waste tank |
Country Status (3)
| Country | Link |
|---|---|
| US (2) | US20140355453A1 (en) |
| EP (1) | EP2795841B1 (en) |
| WO (1) | WO2013095247A1 (en) |
Cited By (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20160119089A1 (en) * | 2013-07-08 | 2016-04-28 | Huawei Technologies Co., Ltd. | Method for bit error rate detection, and network device |
| US11086768B1 (en) * | 2020-02-20 | 2021-08-10 | International Business Machines Corporation | Identifying false positives in test case failures using combinatorics |
| US11176026B2 (en) | 2020-02-20 | 2021-11-16 | International Business Machines Corporation | Assignment of test case priorities based on combinatorial test design model analysis |
| US11307975B2 (en) | 2020-02-20 | 2022-04-19 | International Business Machines Corporation | Machine code analysis for identifying software defects |
| US11316728B2 (en) * | 2018-05-27 | 2022-04-26 | Sedonasys Systems Ltd | Method and system for assessing network resource failures using passive shared risk resource groups |
| US20220224591A1 (en) * | 2018-10-18 | 2022-07-14 | Nippon Telegraph And Telephone Corporation | Network management apparatus, method, and program |
| US11436132B2 (en) * | 2020-03-16 | 2022-09-06 | International Business Machines Corporation | Stress test impact isolation and mapping |
| US20220286348A1 (en) * | 2019-09-04 | 2022-09-08 | Zte Corporation | Method and device for determining root cause of fault, server and computer-readable medium |
| US11663113B2 (en) | 2020-02-20 | 2023-05-30 | International Business Machines Corporation | Real time fault localization using combinatorial test design techniques and test case priority selection |
Families Citing this family (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| EP3089409A4 (en) * | 2013-12-26 | 2017-11-01 | Telefonica, S.A. | Method and system for restoring qos deteriorations in mpls networks |
| US11695618B2 (en) * | 2021-01-07 | 2023-07-04 | Accenture Global Solutions Limited | Quantum computing in root cause analysis of 5G and subsequent generations of communication networks |
| CN113285840B (en) * | 2021-06-11 | 2021-09-17 | 云宏信息科技股份有限公司 | Storage network fault root cause analysis method and computer readable storage medium |
| CN117964031B (en) * | 2023-12-29 | 2024-11-19 | 宁夏宁东清大国华环境资源有限公司 | A defoaming treatment method for high-concentration waste liquid evaporation process |
| DE102024122217A1 (en) | 2024-08-02 | 2026-02-05 | Deutsche Telekom Ag | Method and system for monitoring a data transmission link |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20020129295A1 (en) * | 2001-03-07 | 2002-09-12 | Itaru Nishioka | Network node apparatus, network system using the same and fault location detecting method |
| US20030191841A1 (en) * | 2000-05-15 | 2003-10-09 | Deferranti Marcus | Communication system and method |
| US20040078683A1 (en) * | 2000-05-05 | 2004-04-22 | Buia Christhoper A. | Systems and methods for managing and analyzing faults in computer networks |
| US20050204028A1 (en) * | 2004-01-30 | 2005-09-15 | Microsoft Corporation | Methods and systems for removing data inconsistencies for a network simulation |
| US20060277295A1 (en) * | 2005-06-03 | 2006-12-07 | Mineyoshi Masuda | Monitoring system and monitoring method |
Family Cites Families (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JPS61136406A (en) * | 1984-12-05 | 1986-06-24 | Toray Silicone Co Ltd | Slid silicone defoaming agent and its preparation |
| US8484336B2 (en) * | 2006-11-15 | 2013-07-09 | Cisco Technology, Inc. | Root cause analysis in a communication network |
| US8889048B2 (en) * | 2007-10-18 | 2014-11-18 | Ecolab Inc. | Pressed, self-solidifying, solid cleaning compositions and methods of making them |
| US8411577B2 (en) * | 2010-03-19 | 2013-04-02 | At&T Intellectual Property I, L.P. | Methods, apparatus and articles of manufacture to perform root cause analysis for network events |
2012
- 2012-01-16 EP EP12861017.7A patent/EP2795841B1/en not_active Not-in-force
- 2012-01-16 US US14/367,735 patent/US20140355453A1/en not_active Abandoned
- 2012-01-16 WO PCT/SE2012/050028 patent/WO2013095247A1/en not_active Ceased
- 2012-12-20 US US13/722,924 patent/US9259671B2/en active Active
Cited By (14)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US9948434B2 (en) * | 2013-07-08 | 2018-04-17 | Huawei Technologies Co., Ltd. | Method for bit error rate detection, and network device |
| US20160119089A1 (en) * | 2013-07-08 | 2016-04-28 | Huawei Technologies Co., Ltd. | Method for bit error rate detection, and network device |
| US11316728B2 (en) * | 2018-05-27 | 2022-04-26 | Sedonasys Systems Ltd | Method and system for assessing network resource failures using passive shared risk resource groups |
| US11489715B2 (en) * | 2018-05-27 | 2022-11-01 | Sedonasys Systems Ltd | Method and system for assessing network resource failures using passive shared risk resource groups |
| US20220224591A1 (en) * | 2018-10-18 | 2022-07-14 | Nippon Telegraph And Telephone Corporation | Network management apparatus, method, and program |
| US11736338B2 (en) * | 2018-10-18 | 2023-08-22 | Nippon Telegraph And Telephone Corporation | Network management apparatus, method, and program |
| US20220286348A1 (en) * | 2019-09-04 | 2022-09-08 | Zte Corporation | Method and device for determining root cause of fault, server and computer-readable medium |
| US11750439B2 (en) * | 2019-09-04 | 2023-09-05 | Zte Corporation | Method and device for determining root cause of fault, server and computer-readable medium |
| US11307975B2 (en) | 2020-02-20 | 2022-04-19 | International Business Machines Corporation | Machine code analysis for identifying software defects |
| US11176026B2 (en) | 2020-02-20 | 2021-11-16 | International Business Machines Corporation | Assignment of test case priorities based on combinatorial test design model analysis |
| US11086768B1 (en) * | 2020-02-20 | 2021-08-10 | International Business Machines Corporation | Identifying false positives in test case failures using combinatorics |
| US11663113B2 (en) | 2020-02-20 | 2023-05-30 | International Business Machines Corporation | Real time fault localization using combinatorial test design techniques and test case priority selection |
| US11436132B2 (en) * | 2020-03-16 | 2022-09-06 | International Business Machines Corporation | Stress test impact isolation and mapping |
| US11636028B2 (en) | 2020-03-16 | 2023-04-25 | International Business Machines Corporation | Stress test impact isolation and mapping |
Also Published As
| Publication number | Publication date |
|---|---|
| US20130192469A1 (en) | 2013-08-01 |
| WO2013095247A1 (en) | 2013-06-27 |
| EP2795841B1 (en) | 2021-11-24 |
| US9259671B2 (en) | 2016-02-16 |
| EP2795841A1 (en) | 2014-10-29 |
| EP2795841A4 (en) | 2015-09-30 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| EP2795841B1 (en) | Method and arrangement for fault analysis in a multi-layer network | |
| Kompella et al. | Detection and localization of network black holes | |
| JP5195953B2 (en) | Abnormal link estimation device, abnormal link estimation method, program, and abnormal link estimation system | |
| Markopoulou et al. | Characterization of failures in an IP backbone | |
| JP5120784B2 (en) | Method for estimating quality degradation points on a network in a communication network system | |
| US7907535B2 (en) | Anomaly detection and diagnosis using passive monitoring | |
| US9712290B2 (en) | Network link monitoring and testing | |
| EP2081321A2 (en) | Sampling apparatus distinguishing a failure in a network even by using a single sampling and a method therefor | |
| Herodotou et al. | Scalable near real-time failure localization of data center networks | |
| US20110270957A1 (en) | Method and system for logging trace events of a network device | |
| CN110224883B (en) | A Grey Fault Diagnosis Method Applied in Telecom Bearer Network | |
| JP2009049708A (en) | Network failure information collection device, system, method and program | |
| CN101483547A (en) | Evaluation method and system for network burst affair | |
| US20160308709A1 (en) | Method and system for restoring qos degradations in mpls networks | |
| JP4412031B2 (en) | Network monitoring system and method, and program | |
| CN109039763A (en) | A kind of network failure nodal test method and Network Management System based on backtracking method | |
| US8631115B2 (en) | Connectivity outage detection: network/IP SLA probes reporting business impact information | |
| US9203719B2 (en) | Communicating alarms between devices of a network | |
| Tang et al. | Efficient fault diagnosis using incremental alarm correlation and active investigation for internet and overlay networks | |
| Bouillard et al. | Hidden anomaly detection in telecommunication networks | |
| Zhang et al. | A framework for measuring and predicting the impact of routing changes | |
| JP6586067B2 (en) | Fault location device, fault location method, and fault location program | |
| Tayal et al. | Congestion-aware probe selection for fault detection in networks | |
| KR100921558B1 (en) | TCCP Network Management System and Method | |
| KR100921335B1 (en) | Apparatus and Method for Stability Diagnosis of Circuits Using Internet Traffic Characteristics | |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | AS | Assignment | Owner name: TELEFONAKTIEBOLAGET L M ERICSSON (PUBL), SWEDEN. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: BEHESHTI-ZAVAREH, NEDA; DEVLIC, ALISA; MEIROSU, CATALIN; AND OTHERS; SIGNING DATES FROM 20111222 TO 20120116; REEL/FRAME: 033151/0822 |
| | STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |