Delay-gradient congestion control
Most congestion-control algorithms are based on the use of packet loss as a signal that indicates congestion somewhere along the path used by the connection. When all goes well, a connection should not experience packet loss. If a router's buffers fill, though, that router will start to drop packets; a congestion-control algorithm will respond to that packet loss by reducing the number of packets that can be outstanding (the "congestion window") at any given time. Typically, the congestion window will then be allowed to creep back up until the next packet loss occurs, indicating that the limit has once again been reached.
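To make the loss-based approach concrete: stripped of its details, the window management amounts to growing the window steadily while acknowledgments arrive and cutting it back when a loss is detected. The following is a minimal, generic sketch of that loop (halving on loss, one packet of growth per round trip); it is purely illustrative and is not taken from any kernel code.

    /*
     * Illustrative sketch of a loss-based ("AIMD") window update; generic
     * pseudo-logic for the behavior described above, not code from any
     * particular TCP implementation.
     */
    #include <stdio.h>

    static unsigned int cwnd = 10;    /* congestion window, in packets */

    static void on_rtt_elapsed(int loss_detected)
    {
        if (loss_detected)
            cwnd = cwnd > 1 ? cwnd / 2 : 1;    /* multiplicative decrease */
        else
            cwnd += 1;                         /* additive increase */
    }

    int main(void)
    {
        /* A few round trips with a single loss in the middle. */
        int losses[] = { 0, 0, 0, 1, 0, 0 };

        for (unsigned int i = 0; i < sizeof(losses) / sizeof(losses[0]); i++) {
            on_rtt_elapsed(losses[i]);
            printf("rtt %u: cwnd = %u\n", i, cwnd);
        }
        return 0;
    }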
Loss-based congestion control has served the net well for the better part of thirty years, but it is not without its drawbacks. By its nature, it will cause packets to be dropped occasionally; those drops necessarily introduce latency into a connection that, perhaps, can ill afford it. Loss-based algorithms can only find the limit for a given connection by pushing the slowest link to its limit, meaning that they force a router buffer somewhere to overflow; this behavior can also worsen bufferbloat-related problems. There are also problems when packets are lost for reasons other than congestion, as can happen with wireless links, for example. The congestion-control code will interpret such a loss as a congestion signal, slowing transmission unnecessarily.
The alternative, as implemented by CDG, is to try to infer the state of a connection by looking at the variation in the round-trip time (RTT) — the time it takes for a packet to transit the connection in both directions. Using RTT to estimate congestion is easy if one knows the actual characteristics of the connection in use; one need only look at how much "extra" time is currently needed to get a packet through the link. That is why, for example, smartphone driving-time estimates for a well-known commute can be a useful indication of road congestion. But that knowledge is not available on the Internet as a whole, so some other approach will be required.
The CDG approach is to look at the minimum and maximum RTTs observed for a given connection over a period of time. The minimum is called τmin, while the maximum is τmax. From subsequent observations of τmin and τmax, the algorithm calculates the rate of change of each. The rate at which the minimum RTT is changing is δmin, while the rate at which the maximum RTT is changing is δmax. These two parameters are then used in a couple of interesting ways.
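In concrete terms, each gradient is simply the difference between successive per-interval measurements (the CDG paper additionally smooths these values with a moving average). A hypothetical sketch of that bookkeeping follows; the names and layout are invented for illustration and are not those used in the actual module.

    /*
     * Hypothetical bookkeeping for the delay gradients described above; the
     * names and structure are invented for illustration, not taken from
     * tcp_cdg.c.
     */
    #include <stdio.h>

    struct cdg_rtt_state {
        unsigned int tau_min;        /* minimum RTT seen this interval (us) */
        unsigned int tau_max;        /* maximum RTT seen this interval (us) */
        unsigned int prev_tau_min;   /* values from the previous interval */
        unsigned int prev_tau_max;
        int delta_min;               /* gradient of the minimum RTT */
        int delta_max;               /* gradient of the maximum RTT */
    };

    /* Called once per RTT interval, after the interval's samples are in. */
    static void update_gradients(struct cdg_rtt_state *s)
    {
        s->delta_min = (int)s->tau_min - (int)s->prev_tau_min;
        s->delta_max = (int)s->tau_max - (int)s->prev_tau_max;
        s->prev_tau_min = s->tau_min;
        s->prev_tau_max = s->tau_max;
    }

    int main(void)
    {
        struct cdg_rtt_state s = { .prev_tau_min = 10000, .prev_tau_max = 15000 };

        /* In the next interval, both RTT extremes have risen a bit. */
        s.tau_min = 10400;
        s.tau_max = 15100;
        update_gradients(&s);
        printf("delta_min = %d us, delta_max = %d us\n", s.delta_min, s.delta_max);
        return 0;
    }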
The key use, of course, is to try to come up with an estimate of how congested the link is. To simplify a bit: if τmin is growing (δmin is positive), chances are that the link is getting more congested. Every RTT interval, CDG calculates a "probability of backoff" based on δmin; as δmin grows, that probability approaches one. That probability is then compared against a random number to determine whether the congestion window should be decreased; should a decrease be decided upon, the congestion window will be scaled down by a configurable backoff factor (0.7 by default, so the window loses roughly 30% of its size).
In cycles where the algorithm decides not to decrease the congestion window, it will, instead, increase it by one packet. That allows the window to creep upward and continually test the limits of the connection. In theory, the delay-gradient should detect that limit without pushing the connection to the point of packet loss.
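Putting those two steps together, the per-RTT decision might look something like the sketch below. The exponential form of the backoff probability follows the description in the CDG paper, and the 0.7 backoff factor matches the default mentioned above; everything else here (the names, the scaling constant, the use of rand()) is an illustrative assumption rather than the kernel's code.

    /*
     * Illustrative per-RTT window update in the style of CDG; a userspace
     * sketch of the behavior described above, not the tcp_cdg.c code.
     */
    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define SCALE_G       3.0   /* gradient scaling parameter (assumed value) */
    #define BACKOFF_BETA  0.7   /* multiplicative backoff factor */

    static unsigned int cwnd = 40;   /* congestion window, in packets */

    static void per_rtt_update(double delta_min_ms)
    {
        double p_backoff = 0.0;

        /*
         * A growing minimum RTT suggests a growing queue; the larger the
         * gradient, the closer the backoff probability gets to one.
         */
        if (delta_min_ms > 0)
            p_backoff = 1.0 - exp(-delta_min_ms / SCALE_G);

        if ((double)rand() / RAND_MAX < p_backoff)
            cwnd = (unsigned int)(cwnd * BACKOFF_BETA);   /* back off */
        else
            cwnd += 1;                                    /* keep probing upward */
    }

    int main(void)
    {
        /* Gradients (in ms) of the sort that might be seen as a queue builds. */
        double gradients[] = { 0.0, 0.2, 0.8, 2.0, 4.0 };

        for (unsigned int i = 0; i < sizeof(gradients) / sizeof(gradients[0]); i++) {
            per_rtt_update(gradients[i]);
            printf("delta_min = %.1f ms -> cwnd = %u\n", gradients[i], cwnd);
        }
        return 0;
    }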
There is an interesting question that comes up at about this point, though: imagine a situation where a system using CDG is sharing a slow link with another system using traditional loss-based congestion control. As RTT increases, the CDG system will back off, but the loss-based system will continue to pump out packets until the router starts dropping things. Indeed, it may increase its transmission rate to soak up the bandwidth that the CDG system is no longer using. If CDG allows itself to be muscled out of a contended link in that manner, one can predict with high confidence that it will not find many users.
To deal with this problem, the CDG authors developed a heuristic to detect situations where competition with a loss-based system is occurring. If CDG is working properly, a decision to slow down the transmission rate should result in the RTT values getting smaller — δmin and δmax should go negative. If that fails to happen after a few backoff operations, the algorithm will conclude that somebody else is playing by different rules and stop backing off for a while. CDG also remembers the previous maximum value of the congestion window (as the "shadow window"); this value can be used to quickly restore the congestion window in the event of a packet drop.
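The shape of that heuristic can be sketched as follows. The counter, the threshold of five ineffectual backoffs, and the structure names are all assumptions made for illustration; they are not taken from the kernel module.

    /*
     * Sketch of the "competing with loss-based flows" heuristic described
     * above; the names and the threshold are illustrative assumptions.
     */
    #include <stdbool.h>
    #include <stdio.h>

    #define INEFFECTUAL_BACKOFF_LIMIT 5   /* assumed threshold */

    struct competition_state {
        unsigned int failed_backoffs;   /* consecutive backoffs with no RTT relief */
        bool backoff_disabled;          /* set while a loss-based competitor is assumed */
        unsigned int shadow_wnd;        /* window value to restore after a loss */
    };

    /* Called after a delay-gradient backoff, once the next gradients are known. */
    static void check_backoff_effect(struct competition_state *s,
                                     int delta_min, int delta_max,
                                     unsigned int cwnd)
    {
        /* Remember the largest window seen so it can be restored quickly. */
        if (cwnd > s->shadow_wnd)
            s->shadow_wnd = cwnd;

        if (delta_min < 0 || delta_max < 0) {
            /* Backing off worked: the queues are draining. */
            s->failed_backoffs = 0;
            return;
        }

        if (++s->failed_backoffs >= INEFFECTUAL_BACKOFF_LIMIT) {
            /*
             * RTTs refuse to come down; assume a loss-based competitor
             * and stop backing off for a while.
             */
            s->backoff_disabled = true;
        }
    }

    int main(void)
    {
        struct competition_state s = { 0 };

        /* Several backoffs in a row while RTTs keep rising. */
        for (int i = 0; i < 6; i++)
            check_backoff_effect(&s, 100, 50, 40 + i);
        printf("backoff disabled: %d, shadow window: %u\n",
               s.backoff_disabled, s.shadow_wnd);
        return 0;
    }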
CDG's handling of packet drops is interesting. The motivations for using delay gradients are to avoid depending on packet loss as a signal and to avoid slowing transmission in response to packet losses that do not result from congestion. But congestion-related packet loss will still happen when CDG is in use, and the algorithm should respond accordingly. Backing off in response to packet loss is easy; the tricky part is determining whether that loss is a congestion signal or not.
As a general rule, congestion manifests itself as an overflow of the packet queue for the slowest link used by a connection. If that queue is known to be full, then a packet loss is likely to be a result of the congestion there; if, instead, it is known to be relatively empty, congestion is probably not the problem. The heuristic used to determine the state of the queue is this: when a queue fills, τmax will reach its largest possible value and stop increasing (because the queue cannot get any longer), but τmin will continue to increase. CDG will only treat a packet loss as a congestion signal when a full queue has been detected in this manner.
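That test can be expressed compactly. The sketch below encodes only the full-queue case described above; the function name and the exact comparison against zero are assumptions, and real measurements would need some tolerance for noise.

    /*
     * Sketch of the full-queue inference described above: the maximum RTT
     * has stopped growing while the minimum RTT is still rising.  Not the
     * kernel's implementation.
     */
    #include <stdbool.h>
    #include <stdio.h>

    static bool queue_looks_full(int delta_min, int delta_max)
    {
        return delta_max == 0 && delta_min > 0;
    }

    int main(void)
    {
        /* A loss seen while the queue looks full is treated as congestion. */
        printf("full queue? %d\n", queue_looks_full(300, 0));     /* yes */
        printf("full queue? %d\n", queue_looks_full(300, 250));   /* no  */
        return 0;
    }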
At least, that is how the algorithm was designed; the current Linux patch does not quite work that way. In the patch posting, Kenneth Klette Jonassen notes: "We decided to disable the loss tolerance heuristic by default due to concerns about its safety outside closed environments." So that aspect of CDG will have to wait until the algorithm's behavior on the full Internet is better understood.
Even so, networking developer Eric Dumazet said that the new congestion-control module "looks awesome". Its true level of awesomeness can only be determined via years of real-world experience. But getting CDG into the Linux kernel is a good first step toward the acquisition of that experience. Should things go well, loss-based congestion control might end up backing off in its role on the net.
(See this paper [PDF] for the full details on how the CDG algorithm works.)