WO2016150524A1

WO2016150524A1 - Method and apparatus for managing a service chain in a communication network

Info

Publication number: WO2016150524A1
Application number: PCT/EP2015/069972
Authority: WO
Inventors: Sergio Lanzone; Böttcher JENS; Victor Manuel Avila Gonzalez
Original assignee: Telefonaktiebolaget LM Ericsson AB
Current assignee: Telefonaktiebolaget LM Ericsson AB
Priority date: 2015-03-24
Filing date: 2015-09-01
Publication date: 2016-09-29
Anticipated expiration: 2017-09-24

Abstract

A method (100) for managing a service chain in a communication network is disclosed, the service chain comprising a traffic entry point, a traffic exit point and at least one service function between the traffic entry point and the traffic exit point. The method comprises determining that a service function in the service chain has failed or become overloaded (110), and causing traffic in the service chain to follow a secondary path to the traffic exit point (170), the secondary path bypassing the failed or overloaded service function. Also disclosed are a controller node (500, 600) and a computer program product configured to carry out a method for managing a service chain in a communication network.

Description

Method And Apparatus For Managing A Service Chain In A Communication Network

Technical Field

The present invention relates to a method and an apparatus for managing a service chain (SC) in a communication network. The present invention also relates to a computer program product configured, when run on a computer, to carry out a method for managing a service chain in a communication network.

Background

An attractive feature of Software Defined Networking (SDN) architecture for communication networks is the ability to create software controlled service chains, in which network traffic is steered through appropriate service functions for a given traffic flow based on customer profile, service type, traffic patterns or other characteristics. Network Address Translation (NAT), Deep Packet Inspection (DPI) and Content Filtering are examples of small Service Functions (SFs) which may be combined in a service chain to satisfy customer requirements.

Figure 1 illustrates an example service chain 2. Traffic flow from a subscriber 4, for example a residential user, is passed through a Service Edge node such as a Border Network Gateway (BNG) 6 before being delivered to the Internet 8 via the service chain 2. The service chain 2 comprises a plurality of service functions 12 through which the traffic flow is passed via flow switches 14 in a transport network 10. The particular service functions 12 forming the service chain 2 are determined by a user profile of the subscriber 4. In the illustrated example, a first service function SF1 is a Parental Control function and a second service function SF2 is a Deep Packet Inspection function. Forwarding of the traffic flow through the service chain 2 is controlled by a central network controller 16 which instructs the flow switches 14 to direct the traffic flow through the service chain 2.

The particular service functions that are to form part of a service chain, together with the order in which traffic will flow through the service functions, are determined by a network operator when the service chain is first established. Filters may be defined for each service chain established in the network to isolate traffic that is allowed to be forwarded through the service chain. These filters may be static, that is they may be defined by a network operator at the set up of the service chain, and/or they may be dynamic, defined through inspection of the traffic arriving at the service chain. Service functions which may be included in a service chain may be divided into three main types as illustrated in Figure 2. These types are Bump-in-the-wire, Mirror and Injector. Bump-in-the-wire service functions are inserted into the data path of a service chain to process traffic flowing through the service chain. Processing traffic may include analysing, altering and/or deleting packets, following which the processed packets are then forwarded along the data path. Examples of Bump-in-the-wire service functions include video optimization services, header enrichment and spam filters.

Mirror service functions act on data packets which have been duplicated and forwarded by a flow switch to the service function for processing, which involves for example charging assessment or statistics gathering. An example of Mirror service functions is offline-Deep Packet Inspection (DPI). Injector service functions may send messages in the uplink or downlink directions according to the content of data packets arriving at the service function. An example of Injector service functions is Content Filtering for parental control.

SDN technology offers various redundancy schemes to enhance the availability of service chaining. These redundancy schemes include Load Balancing, Revertive Failover and Permanent Failover. Load Balancing is illustrated in Figure 3 and may be achieved by deploying a service function 12 as a collection of service instances (SIs) 17 or service instance groups (SIGs) 19. Each single instance of a service function is independent and individually connected to the relevant flow switches 14. Traffic arriving at the service function is distributed between the service instances 17 according to their capacity and availability. The distribution is organized by the network controller. In the event of a failure of one of the service instances 17, traffic is automatically redistributed at runtime through the other service instances 17 of the service function 12.
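Purely by way of illustration, the capacity-weighted distribution described above may be sketched as follows. The class `ServiceInstance`, the function `distribute` and the numeric capacities are hypothetical names and values introduced for this sketch; the proportional-to-capacity rule is one assumed way of distributing traffic "according to capacity and availability".

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ServiceInstance:
    name: str
    capacity: float          # relative capacity weight (hypothetical unit)
    available: bool = True   # False once the instance has failed


def distribute(total_traffic: float, instances: list[ServiceInstance]) -> dict[str, float]:
    """Split traffic across available instances in proportion to their capacity."""
    live = [si for si in instances if si.available]
    total_capacity = sum(si.capacity for si in live)
    if total_capacity == 0:
        # No instance can take traffic: total failure of the service function.
        return {}
    return {si.name: total_traffic * si.capacity / total_capacity for si in live}


instances = [
    ServiceInstance("SI1", 2.0),
    ServiceInstance("SI2", 1.0),
    ServiceInstance("SI3", 1.0),
]
before = distribute(100.0, instances)   # all instances in service
instances[0].available = False          # SI1 fails
after = distribute(100.0, instances)    # traffic is redistributed at runtime
```

An empty result corresponds to the case in which every service instance has failed, which Load Balancing alone cannot cover.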

Revertive Failover is illustrated in Figure 4 and also involves deployment of a service function 12 as a collection of service instances (SIs) 17. During normal functioning (situation 1) traffic is processed by one or more of the available service instances, illustrated as SI1 in the example of Figure 4. In the event of a failure affecting the active service instance or instances SI1, traffic is switched to the next available passive instance of the service function, SIk (situation 2). As soon as the original service instance or instances SI1 return to service after a failure, traffic reverts back to it (situation 3). In the Permanent Failover redundancy scheme, the newly active instance SIk continues to be used even when the original service instance returns to service.
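The difference between the Revertive and Permanent Failover schemes may be illustrated by the following minimal sketch. The function names `select_active` and `run`, and the representation of instance health as a set of names, are assumptions made for illustration only.

```python
from typing import List, Optional, Set


def select_active(current: str, primary: str, healthy: Set[str],
                  order: List[str], revertive: bool) -> Optional[str]:
    """Pick the service instance that should carry traffic."""
    if revertive and primary in healthy:
        return primary              # Revertive: go back to the original instance
    if current in healthy:
        return current              # otherwise stay on the currently active instance
    for name in order:              # active instance failed: next available instance
        if name in healthy:
            return name
    return None                     # no instance left


def run(revertive: bool) -> List[Optional[str]]:
    """Replay situations 1 to 3: normal operation, SI1 fails, SI1 returns."""
    order = ["SI1", "SIk"]
    active: Optional[str] = "SI1"
    history = []
    for healthy in ({"SI1", "SIk"}, {"SIk"}, {"SI1", "SIk"}):
        active = select_active(active, "SI1", healthy, order, revertive)
        history.append(active)
    return history
```

With `revertive=True` the sequence of active instances is SI1, SIk, SI1; with `revertive=False` (Permanent Failover) it is SI1, SIk, SIk.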

The service instances of a service function may be logically implemented inside a single server or in multiple servers which may also be located in different sites to enhance redundancy.

The above discussed redundancy schemes provide cover for the failure of a single service instance or multiple service instances, but are unable to protect against total failure of a service function, when all service instances fail. Total failure of a service function may occur, for example, if all service instances of a service function are deployed inside a single server: failure of the server causes failure of all the service instances and therefore failure of the entire service function. Such failure is also more likely in the case of service functions having a limited number of service instances.

The effect on traffic in a service chain of failure of a service function depends upon the type of the service function, as illustrated in Figure 5. For example, failure of a Mirror service function may have no impact upon delivery of the traffic: the traffic is delivered to its destination although the duplication and processing of duplicate packets is not performed. In contrast, failure of a Bump-in-the-wire service function results in traffic disruption and failure to deliver the traffic to the destination. Both these situations may be problematic, as it may sometimes be desirable for traffic to be forwarded to its destination even in the event of failure of a service function. Conversely, failure of some service functions may jeopardize the integrity of the traffic and it may thus be undesirable for the traffic to be delivered.

The only current solution to manage failure of a service function in a service chain is for the network operator to manually intervene to either bring the failed service function back into service or to modify the service chain. Such manual intervention is both time consuming and costly, resulting in increased expenditure for the network operator and significantly reduced service to end users during the period of manual intervention.

A service function can have a limited capacity such that, whilst not suffering from a complete failure, for example a hardware or software failure as described above, the total traffic required to be processed by the service function cannot be managed due to some or all of the service function's processing capacity being exhausted or overloaded.

A portion of the traffic may not be processed or, in the worst case, may be dropped, while the remaining traffic may still be processed by the service function. The result could therefore be a loss of performance, delay or loss of user traffic.

The 'overload' may occur in one or several service functions in the chain. For example this could happen if one or more service instances of the service function do not have enough capacity to process the incoming traffic. A degraded service may then be provided to the final user.

The overload as described above may be handled in two ways depending on the network architecture. In the case of a physical service function, a notification is given to the operator that an overload situation has been detected, and it is up to the operator to react to it. In the case of virtualized service functions running in a cloud system, facilities outside a controller node such as an SDN controller can be used to monitor the service functions and to trigger corrective actions when overload is detected, e.g. via a cloud manager or SDN orchestrator to start a new instance of the overloaded service function. The result is not always ideal or effective, as the corrective actions could fail, require time or not be desired (if too costly, for instance); the user could therefore experience either a degraded service or a loss of the traffic.

In the case of a physical service function, as above, action by the operator is required to modify the sequence of service functions inside the impacted service chain. This maintenance action requires time and, during this period, the traffic is disrupted and no service is provided to the end user.

Summary

It is an aim of the present invention to provide a method and apparatus which obviate or reduce at least one or more of the disadvantages mentioned above.

According to a first aspect of the present invention, there is provided a method for managing a service chain in a communication network, the service chain comprising a traffic entry point, a traffic exit point and at least one service function between the traffic entry point and the traffic exit point. The method comprises determining that a service function in the service chain has failed or become overloaded, and causing traffic in the service chain to follow a secondary path to the traffic exit point, the secondary path bypassing the failed or overloaded service function.

According to some examples, the communication network may employ a Software Defined Networking (SDN) architecture, and the method may be carried out at an SDN controller. In some examples, the traffic entry and exit points of the service chain may themselves be service functions, or the traffic entry and exit points may alternatively be flow switches. In some examples, the method may further comprise marking the service chain as degraded.

According to some examples, a data path of the service chain may comprise a plurality of flow switches, and causing traffic in the service chain to follow a secondary path to the traffic exit point may comprise configuring at least one flow switch to direct traffic onto the secondary path.
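As a minimal sketch of how a controller might work out which flow switch entries need reconfiguring, the data path can be modelled as a next-hop table. The functions `build_flow_rules` and `rules_to_update`, and the modelling of a flow entry as a simple "hop to next hop" mapping, are assumptions made for illustration and do not represent any particular switch protocol.

```python
from typing import Dict, List


def build_flow_rules(path: List[str]) -> Dict[str, str]:
    """Map each hop to its next hop along the path (a stand-in for flow entries)."""
    return {src: dst for src, dst in zip(path, path[1:])}


def rules_to_update(primary: List[str], secondary: List[str]) -> Dict[str, str]:
    """Return only the next-hop entries that differ on the secondary path."""
    old = build_flow_rules(primary)
    new = build_flow_rules(secondary)
    return {hop: nxt for hop, nxt in new.items() if old.get(hop) != nxt}


primary = ["entry", "SF1", "SF2", "SF3", "exit"]
secondary = ["entry", "SF1", "SF3", "exit"]   # SF2 is bypassed
updates = rules_to_update(primary, secondary)
```

In this example a single entry changes (the hop after SF1 becomes SF3), consistent with configuring at least one flow switch to direct traffic onto the secondary path.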

According to some examples, the method may further comprise checking whether bypass of the failed or overloaded service function is permitted and, if bypass of the failed or overloaded service function is permitted, causing traffic in the service chain to follow the secondary path to the traffic exit point. The method may further comprise, if bypass of the failed or overloaded service function is not permitted, marking the service chain as failed. According to some examples the method may further comprise, if bypass of the failed or overloaded service function is not permitted, marking the service chain as degraded. According to such examples, permission to bypass a service function may be set by a communication network operator and may be set according to operator preferences as to which services may or may not be bypassed in a service chain.

According to some examples, checking whether bypass of the failed or overloaded service function is permitted may comprise checking whether the failed or overloaded service function has been marked as essential to operation of the service chain. A service function may be marked as essential to operation of the service chain by a communication network operator according to operator priorities or requirements. For example, service functions relating to communication network security or to protection of traffic passing through the service chain may be marked as essential to operation of the service chain. Optimisation service functions relating for example to service quality may not be marked as essential to operation of the service chain.

According to some examples, the method may further comprise checking whether traffic can flow through the failed or overloaded service function and, if traffic cannot flow through the failed or overloaded service function, causing traffic in the service chain to follow the secondary path to the traffic exit point, and if traffic can flow through the failed or overloaded service function, checking whether bypass of the failed or overloaded service function is instructed. According to such examples, service functions which, upon failure or upon becoming overloaded, no longer allow traffic to pass through the function (such as Bump in the Wire type service functions) may be bypassed, subject for example to the checks regarding permission or marking as essential discussed above. Service functions which, upon failure or upon becoming overloaded, still allow traffic to pass through the function (such as Mirror or Injector type service functions) may or may not be bypassed according to instruction, which may for example be provided by a network operator.

According to some examples, the method may further comprise, if bypass of the failed or overloaded service function is instructed, causing traffic in the service chain to follow the secondary path to the traffic exit point and, if bypass of the failed or overloaded service function is not instructed, marking the service chain as degraded.
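The checks described in the preceding paragraphs may be combined, for example, as in the sketch below. The ordering of the checks, the names `Action`, `ServiceFunctionState` and `decide`, and the choice to mark the chain as degraded when an essential function still passes traffic are assumptions made for this illustration; other orderings are equally consistent with the examples above.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    BYPASS = "follow secondary path"
    MARK_FAILED = "mark service chain as failed"
    MARK_DEGRADED = "mark service chain as degraded"


@dataclass
class ServiceFunctionState:
    essential: bool          # operator has marked the function as essential (no bypass)
    passes_traffic: bool     # traffic can still flow through it (e.g. Mirror, Injector)
    bypass_instructed: bool  # operator instruction for functions that still pass traffic


def decide(sf: ServiceFunctionState) -> Action:
    """Choose how to handle a failed or overloaded service function."""
    if not sf.passes_traffic:
        # Traffic is blocked (e.g. Bump-in-the-wire): bypass only if permitted.
        return Action.MARK_FAILED if sf.essential else Action.BYPASS
    if sf.bypass_instructed and not sf.essential:
        return Action.BYPASS
    # Traffic still flows and no permitted bypass was instructed.
    return Action.MARK_DEGRADED
```

For instance, a non-essential function that blocks traffic yields `Action.BYPASS`, whereas an essential one yields `Action.MARK_FAILED`.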

According to some examples, the method may further comprise, if the service function is overloaded, checking whether offloading of the affected service chain is configured; and, if offloading of the service chain is configured, checking if the overloaded service function is offloadable and, if the service function is offloadable, causing at least some traffic in the service chain to follow the secondary path to the traffic exit point.

According to some examples, the method may further comprise, if the service function is overloaded, checking whether traffic can flow through the overloaded service function and, if traffic cannot flow through the overloaded service function, the service chain is not offloading and the service function is not offloadable, causing a portion of the traffic to be dropped. In such examples, if traffic can flow through the overloaded service function, the method may further comprise causing a portion of the traffic to flow through the service function as degraded.
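The overload handling described in the two preceding paragraphs may be sketched as a small decision function. The names `OverloadAction` and `handle_overload`, and the precedence given to offloading over the other outcomes, are assumptions introduced for illustration.

```python
from enum import Enum


class OverloadAction(Enum):
    OFFLOAD = "send excess traffic along the secondary path"
    DEGRADE = "let excess traffic flow through the service function as degraded"
    DROP = "drop excess traffic"


def handle_overload(offloading_configured: bool, offloadable: bool,
                    passes_traffic: bool) -> OverloadAction:
    """Decide what happens to the traffic an overloaded service function cannot process."""
    if offloading_configured and offloadable:
        return OverloadAction.OFFLOAD
    if passes_traffic:
        return OverloadAction.DEGRADE
    return OverloadAction.DROP
```

Dropping is thus reached only when offloading is unavailable and traffic cannot flow through the overloaded service function.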

According to some examples, the method may comprise determining the portion of traffic dropped as a percentage of the traffic originally intended for the primary path, wherein the portion of traffic dropped or caused to follow a secondary path is X percent of the traffic initially flowing through the primary path before the overload condition, and 100-X percent is the portion of traffic which can still be processed by the overloaded service function.
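A short worked example of this split follows, using the convention of the preceding paragraph in which X percent is the portion dropped or diverted. The function names, the traffic figures and, in particular, the derivation of X from a capacity figure in `overload_percent` are assumptions made for illustration; the text above does not prescribe how X is obtained.

```python
from typing import Tuple


def overload_percent(load: float, capacity: float) -> float:
    """Hypothetical way to derive X: the share of the load exceeding capacity."""
    if load <= 0:
        return 0.0
    return max(0.0, 100.0 * (load - capacity) / load)


def split_traffic(primary_load: float, x_percent: float) -> Tuple[float, float]:
    """Return (diverted_or_dropped, still_processed) for a given X."""
    if not 0.0 <= x_percent <= 100.0:
        raise ValueError("X must lie between 0 and 100")
    diverted = primary_load * x_percent / 100.0
    return diverted, primary_load - diverted


# 200 units of traffic arrive at a service function able to process 150 units.
x = overload_percent(200.0, 150.0)
diverted, processed = split_traffic(200.0, x)
```

Here X is 25, so 50 units are dropped or follow the secondary path and the remaining 150 units (100-X = 75 percent) are still processed by the overloaded service function.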

According to some examples, the method may further comprise determining that functionality has been restored to the failed or overloaded service function, and causing traffic in the service chain to follow a primary path to the traffic exit point, the primary path including the restored service function. According to such examples, the method is of a restorative nature, according to which traffic may revert to an original service chain when functionality is restored to a failed or overloaded service function.

According to some examples, the at least one service function may comprise a plurality of service instances, and determining that a service function in the service chain has failed or become overloaded may comprise determining that some or all service instances of the service function have failed or become overloaded.

According to such examples, wherein the at least one service function comprises a plurality of service instances, the method may further comprise dynamically determining a load balancing scheme between service instances of the overloaded service function and/or service instances of a service function in the secondary path. According to such examples, the load balancing scheme may comprise at least one iteration of load allocation within a period of time, wherein the at least one iteration adapts the load balancing in response to an overload condition in one or more service instances. According to such examples, the load balancing scheme may further comprise a service instance reporting one or more thresholds, wherein the thresholds comprise a high load threshold and/or a critical threshold, and wherein the load balancing scheme may increase the load in a service instance which is below said threshold.
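One iteration of such a threshold-driven scheme might look like the sketch below. The function `rebalance`, the interpretation that load above the critical threshold is moved onto instances below the high load threshold, and the numeric values are all assumptions made for illustration only.

```python
from typing import Dict


def rebalance(loads: Dict[str, int], high: int, critical: int) -> Dict[str, int]:
    """One iteration: move load above `critical` onto instances below `high`."""
    new = dict(loads)
    for name in loads:
        excess = new[name] - critical
        if excess <= 0:
            continue                      # this instance is not overloaded
        for other in loads:
            if other == name:
                continue
            headroom = high - new[other]  # room before `other` reaches high load
            if headroom <= 0:
                continue
            moved = min(excess, headroom)
            new[other] += moved
            new[name] -= moved
            excess -= moved
            if excess == 0:
                break
    return new


# Loads expressed as a percentage of each instance's capacity.
relieved = rebalance({"SI1": 95, "SI2": 40, "SI3": 60}, high=70, critical=90)
residual = rebalance({"SI1": 100, "SI2": 65, "SI3": 68}, high=70, critical=90)
```

In the second call SI1 remains above the critical threshold after the iteration, which is the kind of residual overload that the secondary path could absorb.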

According to such examples, determining that functionality has been restored to the service function may comprise determining that functionality has been restored to at least one service instance of the service function.

According to some examples, causing traffic in the service chain to follow a primary path to the traffic exit point may comprise causing traffic to be directed to a restored service instance of the service function. According to some examples, the secondary path to the traffic exit point may bypass only the failed service function.

According to some examples, causing traffic in the service chain to follow a secondary path to the traffic exit point may comprise causing traffic to be directed to a dummy service instance of the failed service function, the dummy service instance causing traffic to flow transparently through the failed service function to a subsequent element in the service chain.

According to some examples, causing traffic in the service chain to follow a secondary path to the traffic exit point may comprise causing traffic to be directed directly to the traffic exit point. According to such examples, the entire service chain may be bypassed, passing traffic directly to the exit point of the service chain via any suitable transport system. According to some examples, causing traffic in the service chain to follow a secondary path to the traffic exit point may comprise causing traffic to be directed to an alternative service function instead of the failed or overloaded service function.

According to some examples, the alternative service function may be comprised within the service chain. In some examples, the alternative service function may comprise the traffic exit point of the service chain or the alternative service function may be a service function immediately preceding the traffic exit point.

According to some examples, causing traffic to be directed to an alternative service function may comprise retrieving an alternative service function for the failed or overloaded service function. According to such examples, a predetermined alternative service function may be stored in a memory and retrieved on failure or overload of the service function to enable traffic to be directed to the alternative service function. According to some examples, causing traffic to be directed to an alternative service function may comprise selecting an alternative service function for the failed or overloaded service function. According to some examples, selecting an alternative service function for the failed or overloaded service function may comprise identifying the next functional service function in the service chain after the failed or overloaded service function. According to such examples, the method may enable reuse of as much of the original service chain as possible, redirecting traffic to the next operational service function in the service chain.
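The retrieval and selection options described above may be sketched as follows. The function `pick_alternative`, the dictionary of stored alternatives and the fallback to the traffic exit point when no functional service function remains are assumptions introduced for this illustration.

```python
from typing import Dict, List, Set


def pick_alternative(failed_sf: str, chain: List[str], failed: Set[str],
                     exit_point: str, stored: Dict[str, str]) -> str:
    """Return where traffic should go instead of `failed_sf`."""
    # Option 1: retrieve a predetermined alternative stored in memory.
    if failed_sf in stored:
        return stored[failed_sf]
    # Option 2: select the next functional service function in the chain.
    position = chain.index(failed_sf)
    for sf in chain[position + 1:]:
        if sf not in failed:
            return sf
    # Nothing functional remains after the failed function.
    return exit_point


chain = ["SF1", "SF2", "SF3"]
next_hop = pick_alternative("SF2", chain, {"SF2"}, "exit", stored={})
```

Selecting the next functional service function keeps as much of the original service chain in use as possible.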

According to some examples, the secondary path is either pre-configured or dynamically calculated as a result of the failure or overload situation. According to such examples, if a concurrent failure or overload of another service function within the primary service path occurs, a subsequent secondary service path is dynamically derived from the earlier secondary service path and/or the primary service path.

According to another aspect of the present invention, there is provided a computer program product configured, when run on a computer, to execute a method according to the first aspect of the present invention.

According to another aspect of the present invention, there is provided a controller node for managing a service chain in a communication network, the service chain comprising a traffic entry point, a traffic exit point and at least one service function between the traffic entry point and the traffic exit point. The controller node comprises a monitoring unit for determining that a service function in the service chain has failed or become overloaded, and a management unit for causing traffic in the service chain to follow a secondary path to the traffic exit point, the secondary path bypassing the failed or overloaded service function. According to some examples, a data path of the service chain may comprise a plurality of flow switches, and the management unit may be for causing traffic in the service chain to follow a secondary path to the traffic exit point by configuring at least one flow switch to direct traffic onto the secondary path.

According to some examples, the controller node may further comprise a permission unit for checking whether bypass of the failed or overloaded service function is permitted and, if bypass of the failed or overloaded service function is permitted, for instructing the management unit to cause traffic in the service chain to follow the secondary path to the traffic exit point; and, if bypass of the failed or overloaded service function is not permitted, for instructing the management unit to mark the service chain as failed or degraded. According to some examples, the permission unit may be for checking whether the failed or overloaded service function has been marked as essential to operation of the service chain.

According to some examples, the controller node may further comprise a service function unit for checking whether traffic can flow through the failed or overloaded service function and, if traffic cannot flow through the failed or overloaded service function, for instructing the management unit to cause traffic in the service chain to follow the secondary path to the traffic exit point; and, if traffic can flow through the failed or overloaded service function, for checking whether bypass of the failed or overloaded service function is instructed.

According to some examples, if bypass of the failed or overloaded service function is instructed, the service function unit may be for instructing the management unit to cause traffic in the service chain to follow the secondary path to the traffic exit point; and, if bypass of the failed or overloaded service function is not instructed, the service function unit may be for instructing the management unit to mark the service chain as degraded.

According to some examples, if the service function has become overloaded, the service function unit may be for checking whether traffic can flow through the overloaded service function and, if traffic cannot flow through the overloaded service function, the service chain is not offloading and the service function is not offloadable, for instructing the management unit to cause a portion of the traffic to be dropped. According to such examples, if the service function unit determines that traffic can flow through the overloaded service function, the service function unit may instruct the management unit to cause a portion of the traffic to flow through the service function classed as degraded.

According to some examples, if the service function has become overloaded and a portion of the traffic is to be dropped, the service function unit may determine the portion of traffic as a percentage of the traffic originally intended to flow through the primary service chain, such that the portion of traffic instructed to be dropped or instructed to be caused to follow a secondary path is 100-X percent of the traffic initially flowing through the primary path before the overload condition and X percent is the portion of traffic which can still be processed by the overloaded service function.

According to some examples, the monitoring unit may also be for determining that functionality has been restored to the failed or overloaded service function; and the management unit may also be for causing traffic in the service chain to follow a primary path to the traffic exit point, the primary path including the restored service function.

According to some examples, the at least one service function may comprise a plurality of service instances, and the monitoring unit may be for determining that a service function in the service chain has failed or become overloaded by determining that some or all service instances of the service function have failed or become overloaded.

According to some examples, the controller node may further comprise a load balancing unit for dynamically determining a load balancing scheme between service instances of the overloaded service function and/or service instances of a service function in the secondary path. According to such examples, the load balancing scheme may comprise at least one iteration of load allocation within a period of time, wherein the at least one iteration adapts the load balancing in response to an overload condition in one or more service instances. According to such examples, the load balancing scheme determined by the load balancing unit may comprise a service instance reporting one or more thresholds, wherein the thresholds comprise a high load threshold and/or a critical threshold, and wherein the controller unit, according to the load balancing scheme, may increase the load in a service instance which is below said threshold.

According to some examples, the management unit may be for causing traffic in the service chain to follow a primary path to the traffic exit point by causing traffic to be directed to a restored service instance of the service function.

According to some examples, the management unit may be for causing traffic in the service chain to follow a secondary path to the traffic exit point by causing traffic to be directed to a dummy service instance of the failed service function, the dummy service instance causing traffic to flow transparently through the failed service function to a subsequent element in the service chain. According to some examples, the management unit may be for causing traffic in the service chain to follow a secondary path to the traffic exit point by causing traffic to be directed directly to the traffic exit point.

According to some examples, the management unit may be for causing traffic in the service chain to follow a secondary path to the traffic exit point by causing traffic to be directed to an alternative service function instead of the failed or overloaded service function.

According to some examples, the management unit may be for retrieving an alternative service function for the failed or overloaded service function.

According to some examples, the management unit may be for selecting an alternative service function for the failed or overloaded service function. According to some examples, the management unit may be for identifying the next functional service function in the service chain after the failed or overloaded service function.

According to another aspect of the present invention, there is provided a controller node for managing a service chain in a communication network, the service chain comprising a traffic entry point, a traffic exit point and at least one service function between the traffic entry point and the traffic exit point. The controller node comprises a processor and a memory, the memory containing instructions executable by the processor such that the controller node is operable to determine that a service function in the service chain has failed or become overloaded, and cause traffic in the service chain to follow a secondary path to the traffic exit point, the secondary path bypassing the failed or overloaded service function.

Brief description of the drawings

For a better understanding of the present invention, and to show more clearly how it may be carried into effect, reference will now be made, by way of example, to the following drawings in which:

Figure 1 illustrates a service chain comprising several service functions;

Figure 2 illustrates different types of service function;

Figure 3 is a representation of Load Balancing behaviour across service instances or service instance groups;

Figure 4 is a representation of Revertive Failover behaviour across service instances or service instance groups;

Figure 5 illustrates behaviour of different service function types under failure conditions;

Figure 6 illustrates failure of a service function in a service chain;

Figure 7 illustrates configuration of a service chain according to an example method for managing a service chain;

Figure 8 illustrates bypass of a failed service function according to a first example method for managing a service chain;

Figure 9 illustrates bypass of a failed service function according to a second example method for managing a service chain;

Figure 10 illustrates bypass of a failed service function according to a third example method for managing a service chain;

Figures 11a and 11b are flow charts illustrating process steps in a method for managing a service chain;

Figure 12 is a schematic illustration of bypass of a failed service function according to the first example method for managing a service chain;

Figure 13 is a flow chart illustrating process steps in the first example method for managing a service chain;

Figures 14a and 14b illustrate updating of service chain configuration according to the first example method for managing a service chain;

Figures 15 to 17 are schematic illustrations of bypass of a failed service function according to the second example method for managing a service chain;

Figure 18 is a flow chart illustrating process steps in the second example method for managing a service chain;

Figure 19 is a schematic illustration of bypass of a failed service function according to the third example method for managing a service chain;

Figure 20 is a flow chart illustrating process steps in the third example method for managing a service chain;

Figures 21 to 23 are schematic illustrations of bypass of a failed service function according to the third example method for managing a service chain;

Figure 24 illustrates an overloaded service chain;

Figure 25 is an example of an embodiment to manage a service chain for overload;

Figure 26 is a flow chart depicting an exemplary method for managing a service chain for overload;

Figure 27 is an example of an embodiment to recover a service chain when a service function is overloaded;

Figure 28 is an example of a further embodiment to recover a service chain when a service function is overloaded;

Figure 29 is a block diagram illustrating functional units in a controller node; and

Figure 30 is a block diagram illustrating functional units in another example of a controller node.

Detailed description

Aspects of the present invention provide a method and apparatus for managing a service chain according to which a failed or overloaded service function may be bypassed. The method may be carried out at an SDN controller and includes determining that a service function in a service chain has failed or become overloaded and causing traffic to follow an alternative path that bypasses at least the failed or overloaded service function. The traffic which is caused to follow the alternative path may be all of the traffic in the service chain or only some of the traffic in the service chain. Examples of the method enable traffic in a service chain to reach the traffic exit point of the service chain, even when a service function in the service chain has failed or is overloaded. If multiple service functions in the chain fail or are overloaded simultaneously, the described methods may be applied multiple times or iteratively to determine an alternative path.

Figure 6 illustrates a failure scenario in which a service function 12 in a service chain 2 has failed. In the illustrated example the failed service function 12 is a Bump in the wire service function, and traffic is thus interrupted, unable to flow past the failed service function which thus causes failure of the entire service chain 2. According to examples of the present invention, the failed service function may be bypassed, enabling traffic to reach the destination and thus providing continued service to the subscriber, even if that service may be reduced owing to the failed service function. Examples of the present invention provide three alternative methods for bypassing a failed service function, illustrated at Figures 8 to 10. According to a first example method, illustrated in Figure 8, only the failed service function may be bypassed, with traffic continuing to flow through the remaining service functions of the service chain 2 before being delivered to its destination. According to a second example method, illustrated in Figure 9, all service functions in the service chain may be bypassed, the traffic being transferred to a predetermined alternative which may be another service function or may be the exit point of the affected service chain 2. Any suitable transport mechanism, including for example a LAN function, may be used to transport the data flow from the service chain including the failed service function to the final destination. According to a third example method, illustrated in Figure 10, a predefined alternative service chain may be chosen or an alternative service chain may be calculated in real time, with the controller being enabled to select an alternative service function to which traffic may be directed. According to this example, one or more of the service functions in the existing or original service chain may be bypassed, according to the current operating conditions and network operator priorities.

Each of the three example methods described above can also be applied for a service function which has become overloaded and may be enhanced by enabling a network operator to set bypass permissions for each service function in a service chain. In one example, a network operator may designate service functions in a service chain as either essential or non essential to operation of the service chain. A non essential service function may for example be a service function which provides some form of optimisation, and so although included in the service chain, could safely be bypassed for a period of time in the event of failure or overload, allowing the traffic to reach its final destination. Examples of Bump in the wire service functions which may be designated as non essential include video optimization services, spam filters and some types of header enrichment. An essential service function may be a service function which should not be bypassed for security reasons. A Firewall is an example of an essential Bump in the wire service function. In the event of a firewall failure, it is preferable to allow failure of the entire service chain as opposed to bypassing the firewall and compromising security.

Figure 7 illustrates configuration of a service chain 2 in which a first service function 12a is designated by the network operator as essential to the operation of the service chain, and a second service function 12b is designated by the network operator as non essential to the operation of the service chain. In a further enhancement, a check may be made as to the type of the service function. The nature of Bump in the wire service functions means that it is desirable for all non essential failed or overloaded Bump in the wire service functions to be bypassed, in order to allow traffic to reach its destination. Failed or overloaded Mirror and Injector service functions do not prevent traffic from flowing through the logical instance of the failed or overloaded service function, but it may still be desirable to bypass them in the event of failure or overload. A network operator may therefore set preferences for bypass instruction in the event of failure or overload of Mirror or Injector type service functions, which preferences may be checked by an SDN controller when implementing methods according to examples of the present invention.
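The operator designations described above can be pictured as a small per-function configuration record. The following is only a minimal sketch under stated assumptions: the names `SFType`, `ServiceFunctionConfig`, `bypass_if_passthrough` and `bypass_candidates` are hypothetical and do not appear in the description, and the example chain contents are illustrative.

```python
from dataclasses import dataclass
from enum import Enum


class SFType(Enum):
    BUMP_IN_THE_WIRE = "bump_in_the_wire"
    MIRROR = "mirror"
    INJECTOR = "injector"


@dataclass(frozen=True)
class ServiceFunctionConfig:
    """Hypothetical per-function settings chosen by the operator at chain setup."""
    name: str
    sf_type: SFType
    essential: bool                      # essential functions are never bypassed
    bypass_if_passthrough: bool = False  # operator preference for Mirror/Injector types


def bypass_candidates(chain):
    """Return the names of functions that would be bypassed on failure or overload."""
    candidates = []
    for sf in chain:
        if sf.essential:
            continue  # e.g. a firewall: let the chain fail rather than bypass it
        # Non essential Bump in the wire functions are always bypass candidates;
        # Mirror/Injector functions only when the operator has asked for it.
        if sf.sf_type is SFType.BUMP_IN_THE_WIRE or sf.bypass_if_passthrough:
            candidates.append(sf.name)
    return candidates


chain = [
    ServiceFunctionConfig("firewall", SFType.BUMP_IN_THE_WIRE, essential=True),
    ServiceFunctionConfig("video_optimizer", SFType.BUMP_IN_THE_WIRE, essential=False),
    ServiceFunctionConfig("traffic_mirror", SFType.MIRROR, essential=False),
    ServiceFunctionConfig("header_injector", SFType.INJECTOR, essential=False,
                          bypass_if_passthrough=True),
]
print(bypass_candidates(chain))  # ['video_optimizer', 'header_injector']
```

Keeping the designation and the Mirror/Injector preference as separate fields mirrors the two separate checks that a controller is described as making.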

Figures 11a and 11b are flow charts illustrating a method for managing a service chain according to an example of the present invention. The example method 100 includes the step 110 of determining that a service function in the service chain has failed or become overloaded, and the step 170 of causing traffic in the service chain to follow a secondary path to the traffic exit point, the secondary path bypassing the failed or overloaded service function. The method 100 may also include a plurality of additional steps, as discussed in detail below. The method 100 illustrated in Figures 11a and 11b includes process steps which may be conducted according to any one of the first, second and third example methods discussed above. The first, second and third example methods discussed above represent alternative ways in which the functionality of the method 100 may be achieved, and are discussed in further detail with reference to Figures 12 to 23 below. The method 100 may for example be conducted as a run-time procedure by a controller node such as an SDN controller.

Referring to Figure 11a, in a first step 110, the entity carrying out the method, referred to herein as a controller node, determines that a service function in the service chain in question has failed or become overloaded. This may comprise determining, in step 115, that all service instances of the service function have failed or become overloaded. The controller node then checks, in step 120, whether bypass of the failed or overloaded service function is permitted. If bypass is not permitted, for example because the failed or overloaded service function has been marked as essential to operation of the service chain by the network operator, then the controller node proceeds, in step 130, to mark the service chain as failed if all instances have failed, or as degraded if some or all instances have become overloaded.
If bypass of the failed or overloaded service function is permitted, the controller node proceeds to check at step 140 whether or not traffic can still flow through the failed or overloaded service function. While traffic may not be able to flow through the physical components of the failed or overloaded service function, traffic may be able to flow through the logical instance of the failed or overloaded service function. This is the case for Mirror and Injector type service functions, in which a preceding flow switch may be programmed by a controller to ensure the physical components of the service function receive a copy of each packet, with the original packet being forwarded to the next element in the service chain. If the physical service function fails or becomes overloaded, the duplicated packet may be lost but the original packet will still be forwarded to the next element in the service chain. Traffic may thus be considered as being able to flow through a failed or overloaded Mirror or Injector type service function. This is not the case for Bump in the wire type service functions, in which no duplication takes place, and the original packets are required to flow through the physical components of the service function. If traffic can still flow through the failed or overloaded service function (as in the case of Mirror or Injector type service functions), then the controller node checks, at step 150 whether or not bypass of the failed or overloaded service function has been instructed by a network operator. This instruction may be set by a network operator at setup of the service chain. If bypass has not been instructed then the controller node simply marks the service chain as degraded at step 160. By marking the service chain as degraded, an operator is informed that the service chain status is abnormal, with traffic being delivered but not all service functions operating correctly. 
If bypass of the service function has been instructed, or if traffic cannot flow through the failed or overloaded service function (as in the case of Bump in the wire type service functions), the controller node proceeds to cause traffic in the service chain to follow a secondary path to the traffic exit point of the service chain in step 170. This may comprise the controller node configuring one or more flow switches in step 171 to ensure traffic in the service chain is directed to the secondary flow path. The controller node may at some later time determine in step 180 that functionality has been restored to the failed or overloaded service function, for example by determining in step 181 that functionality has been restored to at least one service instance of the service function. The controller node then proceeds, in step 190, to cause traffic in the service chain to follow a primary path to the traffic exit point, the primary path including the restored service function. This may comprise causing traffic to be directed to the restored service instance in step 191. Steps 190 and 191 may comprise configuring flow switches as described above. The completion of steps 190 and 191 renders the method 100 revertive, in the sense that traffic is reverted back to a failed or overloaded service function once that failed or overloaded service function is restored to service.

Figure 11b illustrates alternative ways in which the step 170 of causing traffic in the service chain to follow a secondary path to the traffic exit point may be accomplished. In a first alternative, illustrated at step 172, the controller node may cause traffic to be directed to a dummy service instance of the failed or overloaded service function, the dummy service instance causing traffic to flow transparently through the failed or overloaded service function to a subsequent element in the service chain.
In a second alternative, illustrated at step 173, the controller node may cause traffic to be directed directly to the traffic exit point. In a third alternative, illustrated in step 174, the controller node may cause traffic to be directed to an alternative service function instead of the failed or overloaded service function, which alternative service function may or may not be comprised within the service chain. The alternative service function may be retrieved from a memory in step 176 or may be selected or calculated in step 174. In some examples, selecting an alternative service function at step 174 may comprise identifying the next functional service function in the service chain after the failed or overloaded service function.
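The decision sequence of steps 120 to 170 of the method 100 can be summarised as a small function. This is a sketch only, assuming the outcome of each check is already known to the controller node; the function name `manage_failure`, its parameters and the returned action labels are hypothetical and chosen purely for illustration.

```python
def manage_failure(condition, bypass_permitted, traffic_can_flow, bypass_instructed):
    """Sketch of the checks in steps 120 to 170 for one affected service function.

    condition         -- "failed" or "overloaded" (outcome of step 110)
    bypass_permitted  -- result of the check in step 120
    traffic_can_flow  -- result of the check in step 140 (True for Mirror/Injector types)
    bypass_instructed -- result of the check in step 150 (operator preference)
    """
    if condition not in ("failed", "overloaded"):
        raise ValueError("condition must be 'failed' or 'overloaded'")

    if not bypass_permitted:
        # Step 130: an essential function may not be bypassed.
        return "mark_chain_failed" if condition == "failed" else "mark_chain_degraded"

    if traffic_can_flow and not bypass_instructed:
        # Step 160: traffic is still delivered, but one function is not operating.
        return "mark_chain_degraded"

    # Step 170: redirect traffic onto the secondary path.
    return "use_secondary_path"


# A failed, non essential Bump in the wire function blocks traffic, so it is bypassed.
print(manage_failure("failed", True, False, False))  # use_secondary_path
```

How the secondary path itself is realised (dummy service instance, direct forwarding to the exit point, or an alternative service function) is left out of the sketch, since those are the alternatives of Figure 11b.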

Figures 12 to 14b illustrate in greater detail the first example method discussed above. The first example method is one example of how the functionality of the method 100 may be realised, in particular, the first example method demonstrates one way in which traffic may be caused to follow a secondary path to the traffic exit point of an affected service chain.

Figure 12 illustrates an example service chain comprising a Traffic Source and Sink (TSS) entry point 18, a Traffic Source and Sink (TSS) exit point 20, service functions 12 and flow switches 14, which direct the traffic flow from the traffic entry point 18, through the service functions 12 and to the traffic exit point 20. During setup of the service chain, a network operator or user of the service chain identifies which service functions in the service chain are not essential to the operation of the service chain, and so may be bypassed in the event of service function failure or overload. In the event of a failure of a service function that has been marked as essential to operation of the service chain, the service chain will be marked as failed, with traffic dropped. In the event of an overload of a service function that has been marked as essential to operation of the service chain, the service chain will be marked as degraded, with a portion of the traffic dropped. Mirror or Injector service functions will continue to allow traffic to flow through them even in the event of failure or overload. Bump in the wire service functions will disrupt traffic, and so are modified to enable bypass by creating an additional service instance inside the service function. This newly created dummy service instance (DSI) allows traffic to pass transparently through the service function. An example Bump in the wire service function 12i is illustrated in Figure 12 and comprises a plurality of service instances 22 and a dummy service instance 24. During normal functioning, traffic is directed to one of the active service instances 22 of the service function 12i by the relevant flow switches 14. This is illustrated as situation 1 in Figure 12.
In the event of failure or overload of all normal service instances 22 in the service function 12i, the flow switches 14 are instructed to forward traffic through the dummy service instance 24, effectively bypassing the failed or overloaded service instances 22 of the service function and enabling the traffic to be delivered to its destination. This is illustrated as situation 2 in Figure 12. When at least one of the service instances 22 of the service function 12i is restored, the flow switches 14 are instructed to forward the traffic through the restored service instance or instances 22, so that the functionality of the previously failed or overloaded service function 12i may again be provided. This is illustrated as situation 3 in Figure 12. In the case of Mirror or Injector type service functions, flow switches 14 may be instructed simply to stop duplicating traffic and to forward the original data flow.

Figure 13 is a flow chart illustrating process steps in the first example method 200. The example method 200 may be carried out as a run-time process in a controller node such as an SDN controller. Referring to Figure 13, the step 110 in method 100 of determining that a service function has failed or become overloaded is carried out through a series of sub-steps 211 to 216. In step 211 a controller node determines that a status of a service instance has changed. In step 212 the controller node checks the change and in step 213 the controller node determines whether or not the service instance in question has failed or gone down or become overloaded. If the service instance has failed or become overloaded, the controller node checks the status of the service function of the failed or overloaded service instance in step 214, and checks in step 215 whether or not the service function retains at least one active service instance.
If no service instances remain active in the service function, the controller node determines that the service function has failed or become overloaded at step 216. If at least one service instance remains active, the controller node proceeds to check other service functions within the service chain at steps 293 and 294, described in greater detail below. Once the controller node has determined that a service function has failed or become overloaded in step 216, the controller node checks in step 221 whether the service function has been designated by a network operator as essential to the operation of the service chain, abbreviated in the illustrated flow chart to a check as to whether the service function is "vital". If the service function is vital, the controller node sets the service chain to failed or degraded in step 230 and the service chain status 231 is shown as "SC failed or degraded, no failover".

If the failed or overloaded service function is not vital, the controller node proceeds to check whether the service function is of type Bump in the wire, in step 241. If the service function is not Bump in the wire type, then traffic will still be able to flow, and the controller node simply marks the service chain as degraded, leading to a service chain status 261 of "SC degraded". If the failed or overloaded service function is of Bump in the wire type, the controller node instructs the relevant flow switches to use the dummy service instance of the service function in step 272, before marking the service chain as degraded. It is thus the step 272 of instructing flow switches to use the dummy service instance as a bypass that has the effect of step 170 in method 100 of causing the traffic in the service chain to follow a secondary path, bypassing the failed or overloaded service function.

Returning to step 213, the procedure for reverting to a primary path including a restored service function is illustrated in the right half of the flow chart. If at step 213 it is determined that the status change of a service instance indicates that the service instance has been restored to functionality, the controller node determines at step 281 that the service instance has recovered. The controller node then checks whether the service function of the recovered service instance was down in step 282. If this was not the case then traffic need not be redirected, and the controller node proceeds to check other service functions in steps 293 and 294. If the service function was down, then the controller node instructs the relevant flow switches to stop using the dummy service instance of the service function and to recommence using the restored service instance in step 291.

In step 293, the controller node checks whether any other service function in the service chain is down. If no other service function in the service chain is down, then the controller node sets the service chain to active in step 295, resulting in a service chain status 297 of "SC active". If at least one other service function in the service chain is down, the controller node then checks at step 294 whether the at least one service function that is down has been marked as vital. If that is the case, then the service chain remains failed or degraded, and the service chain status remains at "SC failed or degraded, no failover" 231. If the at least one service function that is down has not been marked as vital then the service chain status is set to, or remains at, "SC degraded" 261.
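The status checks of steps 293 to 295 amount to a short function over the states of the service functions in the chain. The sketch below assumes each service function is represented as a hypothetical `(is_down, is_vital)` pair; the status strings are those quoted in the flow chart description, while the function name `chain_status` is an assumption.

```python
def chain_status(service_functions):
    """Derive the service chain status from its service functions.

    service_functions -- iterable of (is_down, is_vital) pairs, one per function.
    """
    # Step 293: is any service function in the chain down?
    vital_flags_of_down = [is_vital for is_down, is_vital in service_functions if is_down]
    if not vital_flags_of_down:
        return "SC active"  # step 295, status 297

    # Step 294: has any of the functions that are down been marked as vital?
    if any(vital_flags_of_down):
        return "SC failed or degraded, no failover"  # status 231

    return "SC degraded"  # status 261


print(chain_status([(False, True), (True, False)]))  # SC degraded
```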

As discussed above, the steps 272 and 291 of using a dummy service instance as bypass and reverting to a restored service instance may be conducted through the appropriate configuration of flow switches. Figures 14a and 14b illustrate flow switch configurations either side of a service function i which fails and is then restored. As can be seen in Figure 14a, the effect of using the dummy service instance is that a flow switch directly preceding the failed or overloaded service function directs traffic directly to the next service function in the chain, bypassing the failed or overloaded service function. The reverse procedure is illustrated in Figure 14b, in which the original configuration is restored.
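The three situations of Figure 12 can be illustrated by the choice a controller node makes when programming the flow switch that precedes a Bump in the wire service function. This is a simplified sketch: the mapping of instance names to an active flag, the label "DSI" and the function name `forwarding_target` are assumptions for illustration, and picking the first active instance stands in for whatever load balancing policy is actually in use.

```python
def forwarding_target(instances, dummy="DSI"):
    """Pick where the preceding flow switch should forward traffic.

    instances -- dict mapping service instance name to True (active) or False
                 (failed or overloaded), in preference order.
    dummy     -- label of the dummy service instance used as bypass.
    """
    for name, active in instances.items():
        if active:
            return name  # a normal service instance is available
    return dummy         # all instances are down: pass traffic through transparently


# Situation 1: normal operation.
print(forwarding_target({"SI-1": True, "SI-2": True}))    # SI-1
# Situation 2: all normal service instances have failed or become overloaded.
print(forwarding_target({"SI-1": False, "SI-2": False}))  # DSI
# Situation 3: one service instance has been restored.
print(forwarding_target({"SI-1": False, "SI-2": True}))   # SI-2
```

Because the target is recomputed from the current instance states, the same function covers both the failover of step 272 and the reversion of step 291.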

Figures 15 to 18 illustrate in greater detail the second example method discussed above. The second example method is another example of how the functionality of the method 100 may be realised, in particular, the second example method demonstrates another way in which traffic may be caused to follow a secondary path to the traffic exit point of an affected service chain.

Figure 15 illustrates another example service chain comprising a traffic entry point 18, a traffic exit point 20, service functions 12 and flow switches 14, which direct the traffic flow from the traffic entry point 18, through the service functions 12 and to the traffic exit point 20. During setup of the service chain, a network operator or user of the service chain identifies which service functions in the service chain are not essential to the operation of the service chain, and so may be bypassed in the event of service function failure. In the event of a failure of a service function that has been marked as essential to operation of the service chain, the service chain will be marked as failed, with traffic dropped. In the event of an overload of a service function that has been marked as essential to operation of the service chain, the service chain will be marked as degraded, with a portion of the traffic dropped. On failure or overload, Mirror or Injector service functions will continue to allow traffic to flow through them even in the event of failure or overload, although an operator may choose to bypass them in the event of failure or overload for operational reasons, should they be deemed to be non essential to the operation of the service chain. Bump in the wire service functions will always disrupt traffic, and so it is desirable to bypass all Bump in the wire service functions that are not deemed essential to the service chain. The network operator or user also identifies to which other service chain or transport service traffic shall be diverted in the event of failure or overload of each service function that is to be bypassed. Figure 15 illustrates the primary path of the original or primary active service chain 26, which includes all service functions operating normally. Figure 15 also illustrates the alternative or secondary path 28 which has been identified in the case of failure or overload of an example non essential service function 12ii. 
This secondary path is also referred to as a fallback service chain or alternative service chain. Figure 15 illustrates a normal operational situation, with an active traffic path 30 following the original or primary service chain 26, flowing through all service functions, and in the case of service function 12ii, being directed through a first service instance 22 of the service function 12ii. In contrast to the first example method, no dummy service instance is created in any of the service functions.

In the event of failure or overload of all service instances 22 in the service function 12ii, the controller node, which may be an SDN controller, identifies the data traffic flows associated to the affected service chain and causes the data flows to be redirected to the appropriate fallback service chain for the failed or overloaded service function by reconfiguring the appropriate flow switches to update the relevant flow entries with the new actions. This situation is illustrated in Figure 16, in which the active traffic path 30 can be seen to follow the fallback service chain 28, with both service instances 22 of the service function 12ii having failed or become overloaded. The failed or overloaded service function 12ii is thus bypassed and traffic delivered to the service chain traffic exit point 20. When at least one of the service instances 22 of the service function 12ii is restored, the flow switches 14 are instructed to revert to the original active service chain, forwarding traffic through the restored service instance or instances 22, so that the functionality of the previously failed or overloaded service function 12ii may again be provided. This situation is illustrated in Figure 17, with the active traffic path 30 once again following the original active service chain 26.

Figure 18 is a flow chart illustrating process steps in the second example method 300. The example method 300 may also be carried out as a run-time process in a controller node such as an SDN controller. Many of the process steps are equivalent to those of the method 200, and are thus not described again in detail. Only those process steps that differ from the method 200 are discussed in full below. Referring to Figure 18, on determining that a non essential service function has failed or become overloaded, through a combination of steps 311 to 316 and 321, and on determining that the failed or overloaded service function is a Bump in the wire service function at step 341, the controller node establishes that failover to the appropriate fallback service chain is required at step 373. The controller node thus identifies all filters that determine the traffic flows being allowed to traverse the original service chain. The controller node changes the associations in these filters to direct data traffic onto the fallback service chain and removes all flow entries related to these filters from the relevant flow switches. The controller node adds flow entries in the flow switches so that all earlier identified traffic flows are directed along the fallback service chain. It is thus the step 373 of instructing flow switches to use the fallback service chain that has the effect of step 170 in method 100 of causing the traffic in the service chain to follow a secondary path, bypassing the failed or overloaded service function.
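The filter and flow entry handling of step 373 can be sketched as three operations: identify the affected filters, change their association, and replace the related flow entries. The data model below (a dict of filter to chain, and flow entries as small dicts) and the name `switch_chain` are assumptions made for illustration; a real controller would issue flow modification messages to the flow switches instead of editing a list.

```python
def switch_chain(filters, flow_entries, from_chain, to_chain):
    """Move all traffic flows of from_chain onto to_chain.

    filters      -- dict mapping a filter id to the service chain it is associated with
                    (updated in place)
    flow_entries -- list of {"filter": ..., "chain": ...} entries installed in flow switches
    Returns the new list of flow entries.
    """
    # Identify all filters that steer traffic onto the affected chain.
    affected = [f for f, chain in filters.items() if chain == from_chain]

    # Change the association of each such filter.
    for f in affected:
        filters[f] = to_chain

    # Remove flow entries related to those filters, then add entries for the new chain.
    kept = [e for e in flow_entries if e["filter"] not in affected]
    added = [{"filter": f, "chain": to_chain} for f in affected]
    return kept + added


filters = {"subscriber-a": "primary", "subscriber-b": "other"}
entries = [{"filter": "subscriber-a", "chain": "primary"},
           {"filter": "subscriber-b", "chain": "other"}]
entries = switch_chain(filters, entries, "primary", "fallback")
print(filters["subscriber-a"])  # fallback
```

Calling the same function with the chain arguments swapped models the reversion to the original active service chain once all service functions are operational again.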

On determining that a previously failed or overloaded service function has been restored, through a combination of process steps 311, 312, 313, 381 and 382, the controller node sets the restored service function to active at step 392. The controller node then checks the remaining service functions in the service chain in steps 393 and 394, and if all service functions in the service chain are operational, the controller node causes the active traffic flow to revert back to the original active service chain in step 395, as the fallback service chain is no longer required. To do this the controller node identifies all filters that were associated with the original, active service chain and changes the associations in these filters back to the original active service chain. The controller node then removes all flow entries from the flow switches that are related to these filters and adds new flow entries in the flow switches in order to direct data traffic along the original active service chain.

Figures 19 to 23 illustrate in greater detail the third example method discussed above. The third example method is another example of how the functionality of the method 100 may be realised, in particular, the third example method demonstrates another way in which traffic may be caused to follow a secondary path to the traffic exit point of an affected service chain.

Figure 19 illustrates another example service chain comprising a traffic entry point 18, a traffic exit point 20, service functions 12 and flow switches 14, which direct the traffic flow from the traffic entry point 18, through the service functions 12 and to the traffic exit point 20. During setup of the service chain, a network operator or user of the service chain identifies which service functions in the service chain are not essential to the operation of the service chain, and so may be bypassed in the event of service function failure or overload. Mirror or Injector service functions will continue to allow traffic to flow through them even in the event of failure or overload, although an operator may choose to bypass them in the event of failure or overload for operational reasons, should they be deemed to be non essential to the operation of the service chain. Bump in the wire service functions will always disrupt traffic, and so it is desirable to bypass all Bump in the wire service functions that are not deemed essential to the service chain.
As for the first and second example methods described above, in the event of a failure or overload of a service function that has been marked as essential to operation of the service chain, the service chain will be marked as failed or degraded, with some or all traffic dropped.

During normal functioning, traffic is passed through the service chain, including example service function 12iii, illustrated with its two service instances 22, and passed to the traffic exit point 20. This is illustrated as situation 1 in Figure 19. In the event of failure or overload of example service function 12iii, which has not been marked as essential by a network operator or user, a controller node calculates in real time a fallback service chain which bypasses the failed or overloaded service function 12iii. In some examples, the calculated fallback service chain may include all of the service functions in the original service chain that are still operational. The controller node then configures the relevant flow switches to direct data traffic onto the fallback service chain. This is illustrated as situation 2 in Figure 19. When functionality of at least one of the service instances 22 of the service function 12iii is restored, the controller node reconfigures the relevant flow switches to direct traffic back onto the original service chain. This is illustrated as situation 3 in Figure 19.
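The real-time calculation of a fallback service chain can be sketched as a filter over the original chain that keeps every service function that is still operational. The representation of a chain as an ordered list of names, the two sets passed in, and the function name `calculate_fallback_chain` are assumptions; returning `None` when an essential function is down reflects the rule that such a chain is marked failed or degraded rather than bypassed.

```python
def calculate_fallback_chain(original_chain, down, essential):
    """Compute a fallback chain that skips failed or overloaded service functions.

    original_chain -- ordered list of service function names
    down           -- set of names currently failed or overloaded
    essential      -- set of names the operator has marked as essential
    Returns the fallback chain, or None if an essential function is down.
    """
    if any(sf in essential for sf in down):
        return None  # bypass not permitted for an essential function

    # Walk the original chain in order and keep every operational function, so
    # several simultaneously failed functions are skipped in a single pass.
    return [sf for sf in original_chain if sf not in down]


chain = ["nat", "video_optimizer", "dpi", "content_filter"]
print(calculate_fallback_chain(chain, {"video_optimizer"}, {"nat"}))
# ['nat', 'dpi', 'content_filter']
```

Recomputing from the original chain on every status change, instead of patching the previous fallback chain, means that restoration of a service function needs no separate logic: an empty `down` set simply yields the original chain again.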

Figures 21 to 23 illustrate the three situations of normal operation, failure or overload of a service function 12iii, and restoration of a service function 12iii in greater detail. Figure 21 illustrates normal operation, with the original primary path, also named administrative service chain, 32 passing through all service functions 12, and the active traffic path 34 following the administrative service chain 32. In Figure 22, failure or overload of both service instances 22 of the service function 12iii causes the controller node to calculate a secondary or fallback service chain 36 and to configure the relevant flow switches 14 to direct data traffic onto the fallback service chain 36. The active traffic path 34 can therefore be seen to follow the fallback service chain 36. In Figure 23, at least one of the service instances 22 of the service function 12iii is restored, and the controller node therefore instructs flow switches 14 to revert to the original administrative service chain, forwarding traffic through the restored service instance or instances 22, so that the functionality of the previously failed or overloaded service function 12iii may again be provided. The active traffic path 34 can therefore be seen to follow the original administrative service chain 32.

Figure 20 is a flow chart illustrating process steps in the third example method 400. The example method 400 may also be carried out as a run-time process in a controller node such as an SDN controller. As in the case of the second example method, only those process steps that differ from the first example method 200 are discussed in full below. Referring to Figure 20, on determining that a non essential service function has failed or become overloaded, through a combination of steps 411 to 416 and 421, the controller node calculates a fallback, or operative service chain in step 474 for each service chain including the failed or overloaded service function. In some examples, the operative service chain may include all operational service functions of the original administrative service chain. The process of calculating the operative service chain may be iterative, meaning that one or more additional failed or overloaded service functions may also be excluded from the operative service chain, with the controller node identifying the next operational service function in the administrative service chain. Having calculated the operative service chain for each affected administrative service chain at step 474, the controller node switches traffic to the operative service chain or chains in step 475. This involves identifying all data flows associated to an affected administrative service chain and re-associating those data flows to the newly calculated operative service chain, instructing relevant flow switches to forward data according to the operative service chain. It is thus the steps 474 and 475 of calculating an operative service chain and instructing flow switches to use the operative service chain that have the effect of step 170 in method 100 of causing the traffic in the service chain to follow a secondary path, bypassing the failed or overloaded service function.
On determining that a previously failed or overloaded service function has been restored, through a combination of process steps 411, 412, 413, 481 and 482, the controller node sets the restored service function to active at step 492. The controller node then checks the remaining service functions in the service chain in steps 493 and 494, and if all service functions in the service chain are operational, the controller node causes the active traffic flow to revert back to the original administrative service chain in step 496, as the operative service chain is no longer required. This involves identifying all data flows associated to the operative service chain and re-associating those data flows to the original administrative service chain, instructing relevant flow switches to forward data according to the administrative service chain so that the restored service function 12iii may again be provided. In the event of failure or overload of another service function, or repeated failure or overload of the same service function, a new operative service chain is calculated, taking account of the current circumstances and the current state of the remaining service functions in the service chain.

According to the techniques further described herein, in order to enable the network to adapt to an overloaded service function without manual intervention, an automatic run-time procedure (i.e. without the action of the operator) is provided which is able to react dynamically to the overload condition of one of the service functions in the service chain.

An overloaded service function is a service function not able to process all of the incoming traffic; a portion of the traffic can either flow through the service function without any processing or may be dropped. In the case where the service function is virtualised, a coordination node such as an orchestrator could be programmed to request the SDN controller to update the service chain. However, this requires knowledge in the orchestrator about the service functions and the service chains. In Figure 24 an overload scenario is depicted in which a service function 42 in a service chain 40 has become overloaded. In the illustrated example the overloaded service function 42 is a bump-in-the-wire service function, and traffic is thus interrupted: not all of the traffic is able to flow past the overloaded service function, which causes the service chain 40 to be degraded from that service function to the exit point of the service chain.

It is desirable that, during the configuration of a service chain, the operator can decide whether the service chain can be dynamically adapted in response to overload in one or more service functions. In order to convey this capability and decision to other nodes, for example a controller node such as an SDN controller, a parameter is used to indicate this configuration.

The parameter is referred to herein as the 'offloading scheme', which can have two values: 'offloading' (OL) and 'not offloading' (NL). During the configuration of the service chain an operator may configure the service chain with the offloading scheme parameter.

In order that a service function which is under an overload condition can be controlled as part of the automatic run-time procedure, in some embodiments a parameter is used to identify whether the service function is 'offloadable' or 'not offloadable'. An offloadable service function is a service function that is able to detect an overload status and, if overloaded, can be protected using the mechanism as described in more detail below. This parameter may be specified by the operator during the configuration of the service function. In some embodiments the service function may be configured with a threshold T for the maximum allowed load for the service function. In other examples the maximum allowed load is determined by a default value.
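
As a purely illustrative sketch, the two configuration parameters described above (the offloading scheme of a service chain and the offloadable flag of a service function, with its optional threshold T) could be modelled as below. All class and field names, and the value chosen for `DEFAULT_MAX_LOAD`, are assumptions made for this example; the description does not specify a data model or a default value.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class OffloadingScheme(Enum):
    OFFLOADING = "OL"
    NOT_OFFLOADING = "NL"


# Assumed default maximum allowed load (as a fraction); not taken from the description.
DEFAULT_MAX_LOAD = 0.8


@dataclass
class ServiceFunctionConfig:
    name: str
    offloadable: bool = False
    threshold: Optional[float] = None  # threshold T; None means "use the default"

    def max_allowed_load(self) -> float:
        """Return the configured threshold T, or the default when none is set."""
        return self.threshold if self.threshold is not None else DEFAULT_MAX_LOAD


@dataclass
class ServiceChainConfig:
    name: str
    scheme: OffloadingScheme = OffloadingScheme.NOT_OFFLOADING
    functions: List[ServiceFunctionConfig] = field(default_factory=list)
```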

During run-time a service function, which is configured as offloadable, may determine the load of the service function. In other examples an external entity such as a supervision function may perform the measurement of the load in the service function.

The load may comprise one or more factors, examples of which are: Central Processing Unit (CPU) usage, traffic throughput, traffic loss, packet size, memory consumption. The supervision may be passive (by measurements) or active (by injecting test traffic and measuring the service function's reaction).

When the threshold T is crossed, an overload indication may be sent to a controller node such as an SDN controller. Depending on the type of service function this threshold can have different meanings. For example it could represent the quantity of dropped traffic (i.e. when the service function is dropping more than T traffic). For a virtual service function the threshold could represent the percentage of CPU usage (i.e. when the service function is reaching the limit of the CPU capability).
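
The threshold crossing described above might be sketched as follows. This is a minimal illustration, assuming a scalar load value and a callback standing in for the message sent to the controller node; the class name and the `notify` callback are hypothetical. Sending a second indication when the load falls back below T is also an assumption of this sketch.

```python
class LoadMonitor:
    """Reports a change of overload status when the load crosses threshold T."""

    def __init__(self, threshold, notify):
        self.threshold = threshold  # threshold T (e.g. CPU fraction or dropped traffic)
        self.notify = notify        # hypothetical callback towards the controller node
        self.overloaded = False

    def report(self, load):
        """Record a new load measurement and notify only on a status change."""
        now_overloaded = load > self.threshold
        if now_overloaded != self.overloaded:
            self.overloaded = now_overloaded
            self.notify(now_overloaded)
        return self.overloaded
```

Notifying only on a change of status avoids sending a fresh indication for every measurement taken while the service function remains overloaded.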

An example of an external supervision function may be a Deep Packet Inspection (DPI) function, which is inserted in the data path. For example it may measure Quality of Experience (QoE) or Key Performance Indicators (KPIs), and trigger offloading of one or multiple service functions if QoE thresholds are reached for specific traffic types.

An alternative to a direct offload trigger from a service function supervision function to a controller node such as an SDN controller is that a central network analytics and policing function collects data from multiple sources in the network and decides based on policies to instruct a controller node such as an SDN controller to offload certain service functions.

In some embodiments a status variable may be used to indicate to another network node, such as an SDN controller, the status of a service function or instance of a service function. The status variable indicates whether the service function is overloaded or not overloaded.

Figure 25 is an example of an aspect of an embodiment wherein a service function 50 is configured by a controller node such as an SDN controller 52 as offloadable. The service chain 51 may be considered an administrative service chain, which is the service chain as configured by the operator for the service prior to any automatic run-time procedure to provide an alternative route through the service chain. A second service function 53 is configured by the controller node (such as an SDN controller) as not offloadable.

In some examples a controller node such as an SDN controller is informed of the status of the service functions in a service chain which is configured as offloading (OL), and responds to an indication that a service function, or at least one instance within a service function, is overloaded by redirecting some or all of the traffic away from the overloaded service function.

In some examples the redirection is achieved via an alternative service chain, herein called an Offload service chain, wherein the Offload service chain may contain all service functions of the administrative service chain that are not overloaded and bypass the overloaded service function. An Offload service chain may be pre-configured or dynamically calculated as a result of the overload situation. In the case of a concurrent failure or overload of another service function within the same administrative service chain, the Offload service chain may be dynamically derived from the fallback or operative service chain described earlier.

In some examples a percentage of the traffic can still be processed by the overloaded service function and continues to flow through the administrative service chain, while the remaining percentage of the traffic is diverted to the Offload service chain.

It should be noted that, in the examples where some of the traffic bypasses the overloaded service function to reach the final destination (e.g. TSS exit 20) via the Offload service chain, that traffic may do so without having the functionality of the overloaded service function applied to it, whereas the remaining percentage of traffic which flows through the administrative service chain may have the functionality applied. This is however an advantage over the traffic flow being dropped.

It will be appreciated that, for a service chain which is not offloading (NL), if any of the service functions are overloaded then not all of the traffic will be able to be processed by the overloaded service function. Depending on the service function type, as described earlier, the traffic may still be able to flow through the administrative service chain, declared as degraded, without the overloaded function being applied. Alternatively, the traffic which the overloaded service function is unable to process may be dropped.

Figure 26 is a flow chart illustrating process steps in an example method 700. The example method 700 may also be carried out as a run-time process in a controller node such as an SDN controller. The method begins with the process of monitoring the load of service functions in a service chain at step 701. At step 702 it is determined whether a service function is overloaded or not. If no service functions in the administrative service chain are overloaded the method proceeds to step 703 where a hundred percent of the traffic flows through the administrative service chain and all required services are applied to the traffic flow. Such a service chain and its respective service functions may be declared 'active'. If at step 702 it is determined that at least one service function is overloaded the method proceeds to step 704 where the service chain and its respective service functions may be declared 'degraded'. The method proceeds to step 705 where it is determined whether or not the service chain is configured for offloading.

If the service chain is not configured for offloading (NL) the method proceeds to step 706, where a determination is made as to whether the traffic can flow through the overloaded service function or not.

If the portion of traffic which is not able to be processed by the service function is able to flow through the overloaded service function, the method concludes with step 707, where X percent of the traffic is not processed and is thus degraded, and 100-X percent is correctly processed by the service function.

If the portion of traffic which is not able to be processed by the service function is not able to flow through the overloaded service function, then the percentage of traffic which is not able to be processed by the service function is dropped at step 708.

If the service chain is configured for offloading (OL) then the method proceeds with determining whether the overloaded service function is offloadable.

If the service function is offloadable then the method proceeds with re-routing the percentage of traffic which cannot be processed by the overloaded service function to an Offload service chain at step 710. As part of step 710 the Offload service chain may be configured dynamically (on the fly) or pre-arranged during configuration time, either by the operator preferences or by a controller node such as an SDN controller. In the case of a pre-arranged Offload service chain this may be considered operative when a portion of the traffic is re-routed through this service chain. If all of the traffic is able to be processed by the administrative service chain as shown in 703 then such a prearranged Offload service chain may be considered inoperative.

If the overloaded service function is not offloadable (NL) the method proceeds with the determination step 706 as described above.

Figure 27 is an example of an administrative service chain and a corresponding Offload service chain. The administrative service chain 51 is configured with a first service function 50 as offloadable (OL) and a second service function 53 as not offloadable (NL). The Offload service chain is only configured with service function 53. As a result of an overload condition in the service function 50 a percentage of the traffic is offloaded to the Offload service chain 54 and the remaining percentage of traffic continues via the administrative service chain 51.
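
The decision flow of example method 700 described above can be summarised in a short sketch. The function name, the boolean inputs and the `Outcome` enumeration are assumptions made for illustration; only the step numbers in the comments are taken from the description.

```python
from enum import Enum


class Outcome(Enum):
    ACTIVE = "all traffic follows the administrative service chain"
    DEGRADED_PASS_THROUGH = "X percent passes through unprocessed"
    DROPPED = "X percent is dropped"
    OFFLOADED = "X percent is re-routed to the Offload service chain"


def method_700(overloaded, chain_offloading, function_offloadable, can_flow_through):
    """Return the outcome for one overloaded-or-not service function."""
    if not overloaded:                      # step 702 -> step 703
        return Outcome.ACTIVE
    # step 704: the service chain is declared 'degraded'
    if chain_offloading and function_offloadable:   # step 705 and offloadable check
        return Outcome.OFFLOADED            # step 710
    if can_flow_through:                    # step 706
        return Outcome.DEGRADED_PASS_THROUGH  # step 707
    return Outcome.DROPPED                  # step 708
```

In the Figure 27 example, an overload of service function 50 in an offloading chain would correspond to `method_700(True, True, True, ...)` and therefore to the offloaded outcome.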

In some embodiments criteria are defined to determine which portion of traffic follows which operative service chain. The criteria to select the traffic can be various: random choice of packets, deterministic choice based on any combination of header fields and their values that a flow switch is able to process during flow classification like Destination IP Address (dIP), Source IP Address (sIP), packet size, etc. In some examples the criteria may be defined by a coordination node such as an SDN controller. In other examples the operator may decide which criteria to select.
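
One possible deterministic criterion of the kind described above is sketched below: a hash over the source and destination IP addresses places each flow in one of 100 buckets, and flows in the lowest X buckets are sent to the Offload service chain. The function name and the use of a CRC-32 hash are assumptions for this illustration; the description does not prescribe a particular selection function, and an actual flow switch would apply such a rule during flow classification.

```python
import zlib


def goes_to_offload(src_ip: str, dst_ip: str, offload_percent: int) -> bool:
    """Return True if this flow should follow the Offload service chain.

    The decision depends only on the header fields, so all packets of the
    same (sIP, dIP) pair consistently follow the same service chain.
    """
    key = f"{src_ip}|{dst_ip}".encode()
    bucket = zlib.crc32(key) % 100  # bucket in the range 0..99
    return bucket < offload_percent
```

With a reasonably uniform hash, roughly X percent of distinct flows are selected, although the share of traffic volume may differ if flows are of unequal size.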

Figure 28 is an example of a service chain offload wherein a controller 55 (for example an SDN controller) is informed by a service function 50 that it is experiencing overload. The controller 55 determines a percentage (X%) of the traffic to be re-routed through the Offload service chain 54. The controller 55 instructs the flow switches 56 to re-route X percent of the traffic received from the TSS entry 18, bypassing the overloaded service function 50 and resuming its processing at the next service function 53 before exiting the service chain at TSS exit 20.

In some embodiments the service function in the administrative service chain which has been overloaded may recover from the overload condition, and the traffic which was being dropped, passed without being processed, or re-routed may be switched back to being processed within the administrative service chain. In some preferred embodiments a period of time elapses before destroying the Offload service chain, since a service function that was overloaded may become overloaded again shortly after recovering. This has the advantage of sparing Controller CPU capacity that would otherwise be needed to calculate and activate the Offload service chain again if the overload condition recurs. In some embodiments the elapsed time is controlled by a 'timeout' parameter which may be associated to the administrative service chain and configurable by the operator.

In some examples of the embodiments described above a service function comprises a plurality of service instances, wherein the service function and its respective service chain may be considered active when no single service instance is overloaded.
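
The 'timeout' behaviour described above, in which the Offload service chain is kept for a period after recovery before being destroyed, might be sketched as follows. The class and method names are hypothetical, and time is passed in explicitly as a number of seconds so that the example does not depend on a real clock.

```python
class OffloadChainLifetime:
    """Keeps an Offload service chain alive for timeout_s seconds after recovery."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s   # the operator-configurable 'timeout' parameter
        self.exists = False
        self.recovered_at = None

    def on_overload(self, now):
        """Overload reported: the chain is (re)used and any pending destruction is cancelled."""
        self.exists = True
        self.recovered_at = None

    def on_recovery(self, now):
        """Recovery reported: start the timeout instead of destroying the chain at once."""
        if self.exists:
            self.recovered_at = now

    def tick(self, now):
        """Destroy the chain once the timeout has elapsed; return whether it still exists."""
        if (self.exists and self.recovered_at is not None
                and now - self.recovered_at >= self.timeout_s):
            self.exists = False
            self.recovered_at = None
        return self.exists
```

If the overload recurs before the timeout expires, the existing Offload service chain is simply reused, so no recalculation is needed.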

In some embodiments a controller node such as an SDN controller may be informed of the service function status wherein the service function is composed of multiple service instances that are load-balanced by the controller node. The controller node (such as an SDN controller) may re-calculate the load-balancing weight among the service function service instances dynamically if an individual service function service instance reports overload.

In relation to the example method 700, a service function which has at least one of its service instances overloaded may result in the service chain being declared degraded. The method proceeds as described earlier depending on the offloading scheme configured for the service chain. For a service chain whose offloading scheme is offloading (OL) and whose overloaded service function is offloadable 705, the method includes calculating on-the-fly a new load-balancing scheme among the service instances to offload the overloaded service instance.

This can be an adaptive process where other service instances may in turn also indicate overload. If a new load balance can be achieved within a reasonable time, so that none of the service instances indicate overload anymore, the service function and the respective service chain are declared active. The traffic may continue flowing through the administrative service chain.

If no balance can be achieved within reasonable time, so that at least one of the service instances still indicates overload, the service function and the respective service chain are declared degraded. The traffic may continue flowing through the administrative service chain, but a percentage could miss the processing of the overloaded service function or the traffic may be dropped. As a subsequent recovery step, the controller may either create a dummy service instance to which the controller may send traffic which cannot be processed by the overloaded service function instance, or the controller may order the switch-over of excess traffic to the Offload service chain. The calculation of the new load-balancing scheme may be supported by the service instances reporting multiple thresholds, such as one 'high load' and one 'critical' threshold. In this way, the traffic portion for the service instance not reporting high or critical load may be increased first.
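
A single iteration of such a load-balancing recalculation could look like the sketch below. It assumes that each service instance reports one of three statuses ('normal', 'high' or 'critical'), that weights are fractions of the traffic, and that a fixed `step` of weight is moved per iteration; the function name, the status strings and the step size are all assumptions introduced for this example.

```python
def rebalance(weights, status, step=0.1):
    """Shift weight from 'critical' instances to 'normal' ones (one iteration).

    weights: dict mapping instance name -> fraction of traffic.
    status:  dict mapping instance name -> 'normal', 'high' or 'critical'.
    Returns the new weights, or None when no instance has spare capacity,
    in which case a dummy instance or the Offload service chain may be used.
    """
    donors = [name for name in weights if status[name] == "critical"]
    receivers = [name for name in weights if status[name] == "normal"]
    if not donors:
        return dict(weights)   # nothing is overloaded, keep the current scheme
    if not receivers:
        return None            # no balance can be achieved in this iteration
    new_weights = dict(weights)
    moved = 0.0
    for name in donors:
        delta = min(step, new_weights[name])
        new_weights[name] -= delta
        moved += delta
    for name in receivers:
        new_weights[name] += moved / len(receivers)
    return new_weights
```

In this sketch, instances reporting 'high' neither give up nor receive traffic, so that only instances below both thresholds have their share increased first. A controller would repeat the call as new status reports arrive and stop after a bounded number of iterations.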

The methods 100, 200, 300, 400, 700 may be implemented by a controller node such as an SDN controller. An example controller node 500 is illustrated in Figure 29. The example controller node 500 may implement the methods 100, 200, 300, 400, 700 for example on receipt of suitable instructions from a computer program. Referring to Figure 29, the controller node 500 comprises a processor 501 and a memory 502. The memory 502 contains instructions executable by the processor 501 such that the controller node 500 is operative to conduct the steps of any or all of the methods 100, 200, 300, 400, 700.

Figure 30 illustrates an alternative example of a controller node 600, which may implement the methods 100, 200, 300, 400, 700 for example on receipt of suitable instructions from a computer program. It will be appreciated that the units illustrated in Figure 30 may be realised in any appropriate combination of hardware and/or software. For example, the units may comprise one or more processors and one or more memories containing instructions executable by the one or more processors. The units may be integrated to any degree. Referring to Figure 30, the controller node 600 comprises a monitoring unit 601 for determining that a service function in the service chain has failed or become overloaded, and a management unit 602 for causing traffic in the service chain to follow a secondary path to the traffic exit point, the secondary path bypassing the failed or overloaded service function. The monitoring unit 601 may determine that a service function in the service chain has failed or become overloaded by determining that some or all service instances of the service function have failed or become overloaded. The management unit 602 may cause traffic in the service chain to follow a secondary path to the traffic exit point by configuring at least one flow switch to direct traffic onto the secondary path.

The controller node 600 may further comprise a permission unit 603 for checking whether bypass of the failed or overloaded service function is permitted and, if bypass of the failed or overloaded service function is permitted, for instructing the management unit 602 to cause traffic in the service chain to follow the secondary path to the traffic exit point, and, if bypass of the failed or overloaded service function is not permitted, for instructing the management unit 602 to mark the service chain as failed or degraded. The permission unit may also check whether the failed or overloaded service function has been marked as essential to operation of the service chain.

The controller node 600 may also comprise a service function unit 604 for checking whether traffic can flow through the failed or overloaded service function and, if traffic cannot flow through the failed or overloaded service function, for instructing the management unit 602 to cause traffic in the service chain to follow the secondary path to the traffic exit point, and, if traffic can flow through the failed or overloaded service function, for checking whether bypass of the failed or overloaded service function is instructed. If bypass of the failed or overloaded service function is instructed, the service function unit 604 may instruct the management unit to cause traffic in the service chain to follow the secondary path to the traffic exit point, and, if bypass of the failed or overloaded service function is not instructed, the service function unit 604 may instruct the management unit to mark the service chain as degraded.
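
The checks attributed above to the permission unit 603 and the service function unit 604 can be combined into one illustrative decision function. The function name, the returned strings and the order in which the two units are consulted are assumptions of this sketch; the description does not fix how the units are sequenced.

```python
def decide_action(essential: bool, traffic_can_flow: bool, bypass_instructed: bool) -> str:
    """Return the instruction given to the management unit 602 (illustrative only)."""
    # Permission unit 603: an essential service function may not be bypassed.
    if essential:
        return "mark_failed_or_degraded"
    # Service function unit 604: traffic is blocked, so the secondary path is needed.
    if not traffic_can_flow:
        return "secondary_path"
    # Traffic can still flow: bypass only if it has been instructed.
    if bypass_instructed:
        return "secondary_path"
    return "mark_degraded"
```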

The service function unit 604 may also be for instructing the management unit 602 to cause a portion of the traffic to be dropped when the overloaded service function is not offloadable and/or belongs to a service chain which is not offloading, and traffic which is not able to be processed by the service function is not able to flow through the service function.

The service function unit 604 may also be for instructing the management unit 602 of the portion of traffic to be dropped or to be caused to follow a secondary path as X percent of the traffic initially flowing through the primary path before the overload condition, wherein 100-X percent is the portion of traffic which can still be processed by the overloaded service function.

The monitoring unit 601 may also be for determining that functionality has been restored to the failed or overloaded service function (for example, the traffic load over the overloaded service function has declined below an acceptable threshold), and the management unit 602 may also be for causing traffic in the service chain to follow a primary path to the traffic exit point, the primary path including the restored service function.

The management unit 602 may cause traffic in the service chain to follow a secondary path to the traffic exit point by causing traffic to be directed to a dummy service instance of the failed service function, the dummy service instance causing traffic to flow transparently through the failed service function to a subsequent element in the service chain. Alternatively, the management unit 602 may cause traffic in the service chain to follow a secondary path to the traffic exit point by causing traffic to be directed directly to the traffic exit point. Alternatively, the management unit 602 may cause traffic in the service chain to follow a secondary path to the traffic exit point by causing traffic to be directed to an alternative service function instead of the failed or overloaded service function.

The management unit 602 may retrieve an alternative service function for the failed or overloaded service function or may select an alternative service function for the failed or overloaded service function, for example by identifying the next functional service function in the service chain after the failed or overloaded service function.

The management unit 602 may cause traffic in the service chain to follow a primary path to the traffic exit point by causing traffic to be directed to a restored service instance of the service function.

The controller node 600 may also comprise a load balancing unit 605 for dynamically determining a load balancing scheme to balance the load between service instances either within the overloaded service function or between the service instances of the overloaded service function and the service instances of an alternative service function which may be associated to the secondary path.

The load balancing unit 605 may perform the load balancing scheme in a number of iterations, wherein the load balancing unit 605 may instruct the management unit 602 to cause traffic to flow to one or more service instances, which may result in one or more other service instances becoming overloaded, in which case the load balancing unit 605 performs subsequent iterations to enable the traffic to be processed by service instances with capacity. In some examples the load balancing unit 605 and/or the management unit 602 restricts the number of iterations to ensure that the load balancing process does not take an unreasonable length of time. The load balancing scheme performed by the load balancing unit 605 may involve the load of a service instance having a certain status and may include thresholds to determine to which service instance the load balancing unit may direct the management unit 602 to cause traffic to flow. In some examples multiple thresholds may be used, such as one high load threshold and one critical threshold, such that the load balancing unit 605 directs the management unit 602 to first increase the traffic in service instances which are not reporting high or critical load.

Aspects of the present invention thus provide methods and apparatus according to which a failed or overloaded service function in a service chain may be bypassed, enabling traffic in the service chain to be directed to its destination. Examples of the methods enable a network operator to identify service functions that are essential to the operation of the service chain, and to ensure that such service functions are not bypassed. Non-essential service functions may be bypassed via an automatic, run-time procedure that allows an end user to receive a service, even if that service is somewhat degraded or lacking optimisation, owing to the absence of the failed or overloaded service function. Examples of the methods may bypass only the failed or overloaded service function or may bypass some or all of the service chain, using a dummy service instance, a predetermined fallback service chain or an operative service chain calculated in real time. Operator input may be provided at the setup of a service chain to identify essential and non-essential service functions. Operator input may also be provided to identify predetermined fallback service chains. Such input is not required for real time calculated operative service chains, which are calculated automatically on the basis of the failed or overloaded service function and the conditions in the service chain. In the case of large numbers of service chains, this reduction of manual intervention can reduce operational expenditure and the possibility of errors.

Advantages of the methods of the present invention include continuity of service for an end user, even if such service may be degraded with respect to an optimum level, and a reduction in operational expenditure. No action is required by the network operator to enable bypass of the failed or overloaded service function, as the methods described above may be carried out substantially autonomously by a controller node. In addition, urgency to repair a failed service function may be reduced, as some level of service is still being provided to the end user.

It will be appreciated that any of the methods described in the present specification may be used in combination with any of the redundancy mechanisms described in the background section for managing failure of individual service instances.

The methods of the present invention may be implemented in hardware, or as software modules running on one or more processors. The methods may also be carried out according to the instructions of a computer program, and the present invention also provides a computer readable medium having stored thereon a program for carrying out any of the methods described herein. A computer program embodying the invention may be stored on a computer readable medium, or it could, for example, be in the form of a signal such as a downloadable data signal provided from an Internet website, or it could be in any other form.

It should be noted that the above-mentioned examples illustrate rather than limit the invention, and that those skilled in the art will be able to design many alternative embodiments without departing from the scope of the appended claims. The word "comprising" does not exclude the presence of elements or steps other than those listed in a claim, "a" or "an" does not exclude a plurality, and a single processor or other unit may fulfil the functions of several units recited in the claims. Any reference signs in the claims shall not be construed so as to limit their scope.

Claims

1. A method for managing a service chain in a communication network, the service chain comprising a traffic entry point, a traffic exit point and at least one service function between the traffic entry point and the traffic exit point, the method comprising: determining that a service function in the service chain has failed or become overloaded; and

causing traffic in the service chain to follow a secondary path to the traffic exit point, the secondary path bypassing the failed or overloaded service function.

2. A method as claimed in claim 1, wherein a data path of the service chain comprises a plurality of flow switches, and wherein causing traffic in the service chain to follow a secondary path to the traffic exit point comprises configuring at least one flow switch to direct traffic onto the secondary path.

3. A method as claimed in claim 1 or 2, further comprising:

checking whether bypass of the failed or overloaded service function is permitted, and

if bypass of the failed or overloaded service function is permitted, causing traffic in the service chain to follow the secondary path to the traffic exit point; and

if bypass of the failed or overloaded service function is not permitted, marking the service chain as failed or degraded.

4. A method as claimed in claim 3, wherein checking whether bypass of the failed or overloaded service function is permitted comprises checking whether the failed or overloaded service function has been marked as essential to operation of the service chain.

5. A method as claimed in any one of the preceding claims, further comprising: checking whether traffic can flow through the failed or overloaded service function; and

if traffic cannot flow through the failed or overloaded service function, causing traffic in the service chain to follow the secondary path to the traffic exit point; and

if traffic can flow through the failed or overloaded service function, checking whether bypass of the failed or overloaded service function is instructed.

6. A method as claimed in any one of the preceding claims, wherein if the service function is overloaded;

checking whether offloading of the affected service chain is configured; and if offloading of the service chain is configured;

checking if the overloaded service function is offloadable; and

if the service function is offloadable causing at least some traffic in the service chain to follow the secondary path to the traffic exit point.

7. A method as claimed in claim 6, further comprising:

checking whether traffic can flow through the overloaded service function; and if traffic cannot flow through the overloaded service function; and

if the service chain is not offloading and the service function is not offloadable causing a portion of the traffic to be dropped; and

if traffic can flow through the overloaded service function causing a portion of the traffic to flow through the service function as degraded.

8. A method as claimed in any of claims 6 or 7, wherein the portion of traffic dropped or caused to follow a secondary path is X percent of the traffic initially flowing through the primary path before the overload condition, wherein 100-X percent is the portion of traffic which can still be processed by the overloaded service function.

9. A method as claimed in claim 5, further comprising:

if bypass of the failed service function is instructed, causing traffic in the service chain to follow the secondary path to the traffic exit point; and

if bypass of the failed service function is not instructed, marking the service chain as degraded.

10. A method as claimed in any one of the preceding claims, further comprising: determining that functionality has been restored to the failed or overloaded service function; and

causing traffic in the service chain to follow a primary path to the traffic exit point, the primary path including the restored service function.

11. A method as claimed in any one of the preceding claims, wherein the at least one service function comprises a plurality of service instances, and wherein determining that a service function in the service chain has failed or become overloaded comprises determining that some or all service instances of the service function have failed or become overloaded.

12. A method as claimed in claim 11, further comprising dynamically determining a load balancing scheme between service instances of the overloaded service function and/or service instances of a service function in the secondary path.

13. A method as claimed in claim 12, wherein the load balancing scheme comprises at least one iteration of load allocation within a period of time, wherein the at least one iteration adapts the load balancing in response to an overload condition in one or more service instances.

14. A method as claimed in one of claims 12 or 13, wherein the load balancing scheme comprises a service instance reporting one or more thresholds, wherein the thresholds comprise a high load threshold and/or a critical threshold, and wherein the load balancing scheme may increase the load in a service instance which is below said threshold.

15. A method as claimed in any of the preceding claims, when dependent upon claim 7, wherein causing traffic in the service chain to follow a primary path to the traffic exit point comprises causing traffic to be directed to a restored service instance of the service function.

16. A method as claimed in any one of the preceding claims, wherein the secondary path to the traffic exit point bypasses only the failed or overloaded service function.

17. A method as claimed in any one of the preceding claims, wherein causing traffic in the service chain to follow a secondary path to the traffic exit point comprises causing traffic to flow through a dummy service instance of the failed service function, the dummy service instance causing traffic to flow transparently through the failed service function to a subsequent element in the service chain.

18. A method as claimed in any one of claims 1 to 16, wherein causing traffic in the service chain to follow a secondary path to the traffic exit point comprises causing traffic to be directed directly to the traffic exit point.

19. A method as claimed in any one of claims 1 to 16, wherein causing traffic in the service chain to follow a secondary path to the traffic exit point comprises causing traffic to be directed to an alternative service function instead of the failed or overloaded service function.

20. A method as claimed in claim 19, wherein the alternative service function is comprised within the service chain.

21. A method as claimed in claim 19 or 20, wherein causing traffic to be directed to an alternative service function comprises retrieving an alternative service function for the failed or overloaded service function.

22. A method as claimed in claim 19 or 20, wherein causing traffic to be directed to an alternative service function comprises selecting an alternative service function for the failed or overloaded service function.

23. A method as claimed in claim 22, wherein selecting an alternative service function for the failed or overloaded service function comprises identifying the next functional service function in the service chain after the failed or overloaded service function.

24. A method as claimed in any of the preceding claims wherein the secondary path is pre-configured.

25. A method as claimed in any of the claims 1 to 23 wherein the secondary path is dynamically calculated as a result of the failure or overload situation.

26. A method as claimed in any of the preceding claims, wherein,

if a concurrent failure of another service function within the service chain occurs, a subsequent secondary service path is dynamically derived from the secondary path, and wherein the method causes some or all of the traffic to flow through the subsequent secondary path.

27. A computer program product configured, when run on a computer, to execute a method according to any one of the preceding claims.

28. A controller node for managing a service chain in a communication network, the service chain comprising a traffic entry point, a traffic exit point and at least one service function between the traffic entry point and the traffic exit point, the controller node comprising:

a monitoring unit for determining that a service function in the service chain has failed or become overloaded; and

a management unit for causing some or all of the traffic in the service chain to follow a secondary path to the traffic exit point, the secondary path bypassing the failed or overloaded service function.

29. A controller node as claimed in claim 28, wherein a data path of the service chain comprises a plurality of flow switches, and wherein the management unit is for causing traffic in the service chain to follow a secondary path to the traffic exit point by configuring at least one flow switch to direct traffic onto the secondary path.

30. A controller node as claimed in claim 28 or 29, further comprising:

a permission unit for checking whether bypass of the failed or overloaded service function is permitted, and

if bypass of the failed or overloaded service function is permitted, for instructing the management unit to cause traffic in the service chain to follow the secondary path to the traffic exit point; and

if bypass of the failed or overloaded service function is not permitted, for instructing the management unit to mark the service chain as failed or degraded.

31. A controller node as claimed in claim 30, wherein the permission unit is for checking whether the failed or overloaded service function has been marked as essential to operation of the service chain.

32. A controller node as claimed in any one of claims 28 to 31, further comprising:

a service function unit for checking whether traffic can flow through the failed or overloaded service function; and

if traffic cannot flow through the failed or overloaded service function, for instructing the management unit to cause traffic in the service chain to follow the secondary path to the traffic exit point; and

if traffic can flow through the failed or overloaded service function, for checking whether bypass of the failed or overloaded service function is instructed.

33. A controller node as claimed in claim 32, further comprising:

a service function unit for checking whether traffic can flow through the overloaded service function; and

if traffic cannot flow through the overloaded service function, and if the service chain is not offloading and the service function is not offloadable, for instructing the management unit to cause a portion of the traffic to be dropped; and

if traffic can flow through the overloaded service function, for instructing the management unit to cause a portion of the traffic to flow through the service function classed as degraded.

34. A controller node as claimed in any of claims 28 to 33, wherein the portion of traffic instructed to be dropped or instructed to be caused to follow a secondary path is X percent of the traffic initially flowing through the primary path before the overload condition, wherein 100-X percent is the portion of traffic which can still be processed by the overloaded service function.

35. A controller node as claimed in claim 28, wherein:

if bypass of the failed or overloaded service function is instructed, the service function unit is for instructing the management unit to cause traffic in the service chain to follow the secondary path to the traffic exit point; and

if bypass of the failed or overloaded service function is not instructed, the service function unit is for instructing the management unit to mark the service chain as degraded.

36. A controller node as claimed in any one of claims 28 to 35, wherein the monitoring unit is also for determining that functionality has been restored to the failed or overloaded service function; and the management unit is also for causing traffic in the service chain to follow a primary path to the traffic exit point, the primary path including the restored service function.

37. A controller node as claimed in claim 36, wherein the functionality has been restored due to the traffic load over the overloaded service function having declined below an acceptable threshold.

38. A controller node as claimed in any one of claims 28 to 37, wherein the at least one service function comprises a plurality of service instances, and wherein the monitoring unit is for determining that a service function in the service chain has failed or become overloaded by determining that some or all of the service instances of the service function have failed or become overloaded.

39. A controller node as claimed in claim 38, further comprising a load balancing unit for dynamically determining a load balancing scheme between service instances of the overloaded service function and/or service instances of a service function in the secondary path.

40. A controller node as claimed in claim 39, wherein the load balancing scheme comprises at least one iteration of load allocation within a period of time, wherein the at least one iteration adapts the load balancing in response to an overload condition in one or more service instances.

41. A controller node as claimed in one of claims 39 or 40, wherein the load balancing scheme comprises a service instance reporting one or more thresholds, wherein the thresholds comprise a high load threshold and/or a critical threshold, and wherein the controller node may increase the load in a service instance which is below said threshold.

42. A controller node as claimed in any of the claims 28 to 41, when dependent upon claim 36, wherein the management unit is for causing traffic in the service chain to follow a primary path to the traffic exit point by causing traffic to be directed to a restored service instance of the service function.

43. A controller node as claimed in any one of claims 28 to 42, wherein the management unit is for causing traffic in the service chain to follow a secondary path to the traffic exit point by causing traffic to flow through a dummy service instance of the failed or overloaded service function, the dummy service instance causing traffic to flow transparently through the failed or overloaded service function to a subsequent element in the service chain.

44. A controller node as claimed in any one of claims 28 to 42, wherein the management unit is for causing traffic in the service chain to follow a secondary path to the traffic exit point by causing traffic to be directed directly to the traffic exit point.

45. A controller node as claimed in any one of claims 28 to 42, wherein the management unit is for causing traffic in the service chain to follow a secondary path to the traffic exit point by causing traffic to be directed to an alternative service function instead of the failed or overloaded service function.

46. A controller node as claimed in claim 45, wherein the management unit is for retrieving an alternative service function for the failed or overloaded service function.

47. A controller node as claimed in claim 45, wherein the management unit is for selecting an alternative service function for the failed or overloaded service function.

48. A controller node as claimed in claim 47, wherein the management unit is for identifying the next functional service function in the service chain after the failed or overloaded service function.

49. A controller node for managing a service chain in a communication network, the service chain comprising a traffic entry point, a traffic exit point and at least one service function between the traffic entry point and the traffic exit point, the controller node comprising a processor and a memory, the memory containing instructions executable by the processor such that the controller node is operable to:

determine that a service function in the service chain has failed or become overloaded; and

cause traffic in the service chain to follow a secondary path to the traffic exit point, the secondary path bypassing the failed or overloaded service function.
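
By way of illustration only, and without limiting the claims, the bypass behaviour recited above might be sketched as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: the names ServiceFunction, next_functional, secondary_path and diverted_percent, the "essential" flag, the "entry" and "exit" labels, and the load and capacity figures are all hypothetical and do not appear in the source. The sketch shows a permission check before bypass, construction of a secondary path that bypasses only the affected service function, identification of the next functional service function, and computation of the X percent of traffic to be dropped or diverted under overload.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ServiceFunction:
    name: str
    healthy: bool = True      # False once failure or overload is detected
    essential: bool = False   # hypothetical flag: bypass is not permitted


def next_functional(chain: List[ServiceFunction], index: int) -> Optional[ServiceFunction]:
    """Return the first healthy service function after position `index`."""
    for sf in chain[index + 1:]:
        if sf.healthy:
            return sf
    return None  # nothing left: traffic would go directly to the exit point


def secondary_path(chain: List[ServiceFunction], index: int) -> Optional[List[str]]:
    """Build a path that bypasses only the service function at `index`.

    Returns None when bypass is not permitted, in which case the caller
    would mark the service chain as failed or degraded.
    """
    if chain[index].essential:
        return None
    hops = [sf.name for i, sf in enumerate(chain) if i != index]
    return ["entry"] + hops + ["exit"]


def diverted_percent(offered_load: float, capacity: float) -> float:
    """X percent of traffic to drop or divert; 100 - X stays on the primary path."""
    if offered_load <= 0 or offered_load <= capacity:
        return 0.0
    return 100.0 * (offered_load - capacity) / offered_load


# Example chain: NAT -> DPI -> Content Filtering, with DPI failed.
chain = [
    ServiceFunction("NAT"),
    ServiceFunction("DPI", healthy=False),
    ServiceFunction("ContentFilter"),
]
print(secondary_path(chain, 1))
print(next_functional(chain, 1).name)
print(diverted_percent(offered_load=200.0, capacity=150.0))
```

In this sketch the controller would translate the returned hop list into flow switch configuration; how that configuration is pushed to the switches is outside the scope of the example.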