Improving syncookies
Back in 1997, TCP SYN flood attacks were all the rage among script kiddies. A SYN flood is a denial-of-service attack that uses up server resources by initiating, but not completing, connections. Attacks of this kind remain a problem today, though they are now more likely to be launched by sophisticated botnets than by an individual. A first-line defense against SYN floods is the syncookie. Syncookies were not designed specifically for Linux, but they found their way into kernel 2.1.44 via a patch from Andi Kleen.
This venerable feature generated some recent discussion when a patch was submitted adding syncookie support to IPv6. That patch has now been queued for acceptance, but along the way the discussion also began to tackle some long-standing limitations of syncookies and reaffirmed how relevant the feature remains.
To fully describe syncookies, some background on how TCP uses a three-way handshake to establish a connection is in order. The first packet of any TCP session received by the server is known as the SYN packet because it carries the synchronize (SYN) control flag. The SYN flag indicates that its sender wishes to open a new connection; it is used only during the opening sequence. The server responds with a packet that also carries the SYN flag, because the connection must be opened in both directions. This second packet also carries the ACK flag and is known as the SYN-ACK. It serves both to open the connection from the server to the client and to acknowledge receipt of the client's opening packet. Finally, the client sends a bare ACK packet to the server to acknowledge receipt of the server-to-client SYN-ACK, and the connection is then fully established.
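The sequence and acknowledgment numbers exchanged in the handshake can be modeled in a few lines of Python. This is a minimal illustrative sketch, not kernel code: the `Segment` class and `three_way_handshake()` function are hypothetical names, and the initial sequence numbers are borrowed from the first tcpdump capture in this article.

```python
from dataclasses import dataclass
from typing import List, Optional

SEQ_MOD = 2 ** 32  # TCP sequence numbers are 32 bits wide and wrap around


@dataclass
class Segment:
    flags: str            # "SYN", "SYN-ACK" or "ACK"
    seq: int              # sequence number carried by the segment
    ack: Optional[int]    # acknowledgment number, if the ACK flag is set


def three_way_handshake(client_isn: int, server_isn: int) -> List[Segment]:
    """Return the three segments that open a TCP connection."""
    # 1. Client -> server: SYN carrying the client's initial sequence number (ISN).
    syn = Segment("SYN", client_isn, None)
    # 2. Server -> client: the SYN consumes one sequence number, so the server
    #    acknowledges client_isn + 1 while sending its own ISN.
    syn_ack = Segment("SYN-ACK", server_isn, (client_isn + 1) % SEQ_MOD)
    # 3. Client -> server: bare ACK of the server's SYN, again ISN + 1.
    ack = Segment("ACK", (client_isn + 1) % SEQ_MOD, (server_isn + 1) % SEQ_MOD)
    return [syn, syn_ack, ack]


syn, syn_ack, ack = three_way_handshake(1061746051, 2800702917)
print(syn_ack.ack)  # 1061746052, matching the SYN-ACK in the capture
print(ack.ack)      # 2800702918 (tcpdump prints this as the relative value "ack 1")
```

The "plus one" in the final ACK is what the syncookie scheme relies on: whatever the server puts in its ISN comes back, incremented, in the client's acknowledgment number.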
During a SYN flood, a server receives the first packet of the three-way handshake and responds with a SYN-ACK, but no further packets ever arrive from the initiating client. When the SYN-ACK is generated, most servers also create an entry in the SYN queue, the waiting area for half-open connections awaiting handshake completion. The attacker intentionally orphans those entries and instead sends more SYN packets, each of which takes up another entry in the queue. The server must wait for a long timeout before giving up on an entry and recovering its resources, and during that time the attacker can flood it with many more half-open connections. Eventually the server runs out of resources and cannot accept any new connection without dropping some, perhaps legitimate, connection from the queue. Simple solutions such as placing a quota on the number of partially open connections per peer, or using dynamically adjusted packet filters, do not work because SYN packets are easy to forge with fake source addresses.
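The effect of a flood on a bounded SYN queue can be seen in a toy model. This is a deliberately simplistic sketch under stated assumptions: `SynQueue` is a hypothetical class, the capacity of 128 is arbitrary, the addresses come from documentation-only ranges, and the model ignores the timeouts and pruning a real stack performs.

```python
from typing import Dict, Tuple

Peer = Tuple[str, int]  # (source address, source port) as claimed in the SYN


class SynQueue:
    """Toy model of a listener's SYN queue: a bounded table of half-open connections.

    Real stacks also time entries out and prune them; this model deliberately
    ignores time so that the effect of a flood is easy to see.
    """

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.pending: Dict[Peer, int] = {}  # peer -> client ISN from its SYN

    def on_syn(self, peer: Peer, isn: int) -> bool:
        """Store state for a new half-open connection; False means the SYN was dropped."""
        if len(self.pending) >= self.capacity:
            return False
        self.pending[peer] = isn
        return True

    def on_ack(self, peer: Peer) -> bool:
        """Complete a handshake if state for this peer is still queued."""
        return self.pending.pop(peer, None) is not None


queue = SynQueue(capacity=128)
# Spoofed SYNs: every one claims a different source, and none is ever ACKed,
# so a per-address quota would never trigger.
for i in range(1000):
    queue.on_syn((f"198.51.100.{i % 256}", 1024 + i), isn=i)

legit_ok = queue.on_syn(("203.0.113.7", 40000), isn=42)
print(len(queue.pending), legit_ok)  # 128 False
```

Because every forged SYN claims a fresh source, nothing keyed to the peer's address can distinguish the attacker from a legitimate client.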
A syncookie allows the server to defer committing any resources until the third packet of the three-way handshake has been received. At that point the peer's address has been mildly authenticated, because the final packet of the handshake acknowledges the sequence number that the server sent in the second packet; a host that forged its source address would never have seen that number. With this assurance, packet filters and resource quotas keyed to the peer's address again become useful defenses against resource attacks.
The basic syncookie mechanism works by carefully constructing the initial sequence number of the server's side of the connection instead of choosing it at random. Upon receiving a SYN, the server encodes the vital information that would otherwise have been stored as state in the SYN queue. This information is combined with a cryptographic hash keyed with a server secret to form the sequence number of the SYN-ACK, which is sent to the client. The third packet of a legitimate handshake, the ACK from the client back to the server, carries this sequence number plus one in its acknowledgment number field. In this way all the information needed to fully open the connection is presented back to the server, without the server having to maintain state while the handshake completes.
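The stateless round trip can be sketched as follows. The layout is in the spirit of the original syncookie design (5 bits of a coarse time counter `t`, 3 bits of MSS index, 24 bits of keyed hash), but it is not necessarily the layout Linux uses; HMAC-SHA256, the secret, the counter's period, and all function names are assumptions made for this sketch.

```python
import hashlib
import hmac

SECRET = b"example secret"  # hypothetical; a real server would use random, private bytes
MAC_BITS = 24
MAC_MASK = (1 << MAC_BITS) - 1
SEQ_MOD = 2 ** 32


def _mac(saddr: str, sport: int, daddr: str, dport: int, t: int, mss_index: int) -> int:
    """Keyed hash over the 4-tuple, time counter and MSS index, truncated to 24 bits."""
    msg = f"{saddr}|{sport}|{daddr}|{dport}|{t}|{mss_index}".encode()
    return int.from_bytes(hmac.new(SECRET, msg, hashlib.sha256).digest()[:3], "big")


def make_cookie(saddr: str, sport: int, daddr: str, dport: int, t: int, mss_index: int) -> int:
    """Build the SYN-ACK initial sequence number; no per-connection state is stored."""
    if not 0 <= mss_index < 8:
        raise ValueError("mss_index must fit in 3 bits")
    # Layout: bits 27-31 = t mod 32, bits 24-26 = MSS index, bits 0-23 = MAC.
    return ((t % 32) << 27) | (mss_index << 24) | _mac(saddr, sport, daddr, dport, t, mss_index)


def check_cookie(saddr: str, sport: int, daddr: str, dport: int,
                 ack_number: int, now: int, max_age: int = 2):
    """Validate the third packet's acknowledgment number.

    Returns the recovered MSS index, or None if the cookie is forged or too old.
    """
    cookie = (ack_number - 1) % SEQ_MOD  # the client acknowledges ISN + 1
    t_low, mss_index = cookie >> 27, (cookie >> 24) & 0x7
    for age in range(max_age + 1):  # only accept counters from the last few periods
        t = now - age
        if t % 32 == t_low and _mac(saddr, sport, daddr, dport, t, mss_index) == cookie & MAC_MASK:
            return mss_index
    return None


tuple4 = ("192.0.2.1", 57985, "192.0.2.2", 4050)
isn = make_cookie(*tuple4, t=1000, mss_index=3)
print(check_cookie(*tuple4, ack_number=(isn + 1) % SEQ_MOD, now=1001))  # 3
```

Because the MSS index is also fed into the hash, an attacker who flips those bits in the acknowledgment number invalidates the cookie rather than changing the recovered value.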
The major downside of syncookies is that they have room to encode only the most basic TCP handshake options. When syncookies were first deployed this was not a big problem, because the only option in prominent use was the Maximum Segment Size (MSS) option. MSS tells the peer the largest segment the sender is able to receive, so that the peer can avoid unnecessary fragmentation by not sending packets that are known a priori to be too large. This is exactly the kind of information that is normally stored as state in the SYN queue. The syncookie designers knew that this option was important to performance and found 3 bits for it in the encoded cookie. These bits approximate the real value of the option with one of 8 common values.
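Squeezing the MSS into 3 bits can be sketched as a lookup into an eight-entry table. The table values below are hypothetical, chosen to illustrate the idea rather than copied from the kernel; the sketch rounds down so the server does not overstate what the peer can receive.

```python
from bisect import bisect_right

# Hypothetical table of eight common MSS values; a 3-bit index selects one.
MSS_TABLE = (64, 256, 512, 536, 1024, 1440, 1460, 4312)


def encode_mss(peer_mss: int) -> int:
    """Return the index of the largest table entry that does not exceed peer_mss."""
    # bisect_right counts entries <= peer_mss; subtract one to get the last of them.
    # Values below the smallest entry are clamped to index 0 in this sketch.
    return max(bisect_right(MSS_TABLE, peer_mss) - 1, 0)


def decode_mss(index: int) -> int:
    """Recover the approximated MSS when the cookie comes back in the ACK."""
    return MSS_TABLE[index]


for mss in (1460, 1452, 16396):
    print(mss, "->", decode_mss(encode_mss(mss)))
# 1460 -> 1460
# 1452 -> 1440
# 16396 -> 4312
```

The last case shows the cost of the approximation: the loopback MSS of 16396 seen in the capture later in this article would be remembered as something much smaller.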
In the intervening years, new options have come into prominence, and these are not syncookie compatible. The most important are the window scaling and Selective Acknowledgment (SACK) options. Respectively, these features allow the advertised TCP window to grow beyond 64KB and allow TCP to recover more efficiently from occasional packet losses within those large windows. Without them it is impossible to get good transfer rates on networks with large bandwidth or large latency. Many household broadband links require at least the window scaling option to fully utilize the connection. Because of this limitation, and the modest computational overhead of the cryptographic hash, the Linux stack resorts to syncookie-based connections only when the number of half-open connections exceeds a high watermark controlled by the net.ipv4.tcp_max_syn_backlog sysctl. These connections are less featureful than normal connections, but they are used only when the queue would otherwise require active pruning.
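The arithmetic behind the window scaling requirement is the bandwidth-delay product: to keep a path busy, roughly bandwidth times round-trip time worth of data must be in flight. The sketch below uses a hypothetical 100 Mbit/s path with a 100 ms round-trip time; the function names are illustrative.

```python
def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to keep the path busy."""
    return bandwidth_bps * rtt_s / 8


def max_throughput_bps(window_bytes: int, rtt_s: float) -> float:
    """Best-case throughput when at most one window of data is sent per round trip."""
    return window_bytes * 8 / rtt_s


def min_wscale(window_needed: float) -> int:
    """Smallest window-scale shift s such that 65535 << s covers window_needed."""
    s = 0
    while (65535 << s) < window_needed:
        s += 1
    if s > 14:  # RFC 1323 limits the shift count to 14
        raise ValueError("window too large even with scaling")
    return s


# Hypothetical path: 100 Mbit/s with a 100 ms round-trip time.
need = bdp_bytes(100e6, 0.100)           # 1,250,000 bytes in flight
cap = max_throughput_bps(65535, 0.100)   # about 5.2 Mbit/s without window scaling
print(round(need), round(cap / 1e6, 1), min_wscale(need))  # 1250000 5.2 5
```

Without scaling, such a connection tops out at roughly a twentieth of the link's capacity no matter how fast the link is.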
Until now, the cookie mechanism has been implemented only for IPv4. Recently, Glenn Griffin posted patches that add syncookie support for IPv6. Andi Kleen, author of the original syncookie patch, wondered whether the mechanism should be continued at all, much less added to IPv6.
Andi's argument was three-pronged. His first point concerned the reduced capabilities of cookie-initiated connections, as already described in this article: as the value of these options has grown over time, so has the cost of using syncookies. His second point was that Linux no longer allocates all of the memory needed for a full connection until the new connection is fully open. Instead, it uses a "minisock" for that period. The minisock is a 96-byte struct tcp_request_sock holding the minimum state needed to get the connection fully opened, whereas a fully established struct tcp_sock is 1616 bytes (both sizes measured on a 64-bit kernel). Finally, Andi pointed out that the queue management routines for an overloaded SYN queue are now more sophisticated than the dumb head-drop algorithm that was in place when syncookies were first deployed. The suggestion was that, in aggregate, these advances might make Linux robust enough without syncookies that they could be removed altogether.
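A quick calculation shows why the minisock matters to Andi's second point. The flood size below is hypothetical, and only the two structure sizes quoted in the discussion are counted; any other per-connection allocations are ignored.

```python
TCP_REQUEST_SOCK = 96   # bytes, struct tcp_request_sock on 64-bit (figure from the discussion)
TCP_SOCK = 1616         # bytes, struct tcp_sock on 64-bit (figure from the discussion)


def queue_memory(half_open: int, bytes_per_entry: int) -> int:
    """Memory held by the per-connection structures alone (ignores other allocations)."""
    return half_open * bytes_per_entry


half_open = 100_000  # hypothetical number of half-open connections during a flood
print(queue_memory(half_open, TCP_REQUEST_SOCK))   # 9600000 (about 9.6 MB with minisocks)
print(queue_memory(half_open, TCP_SOCK))           # 161600000 (about 162 MB with full sockets)
print(round(TCP_SOCK / TCP_REQUEST_SOCK, 1))       # 16.8
```

Syncookies go one step further than the minisock: while the handshake is in progress, the per-connection cost on the server is nothing at all.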
Instead of engaging in a theoretical discussion, some participants set up and ran their own experiments. One of the best parts of the Linux community is its tendency to put real data behind arguments. While there is often disagreement over how realistic the measured scenarios are, the data points always help us better understand the dynamics of kernel code.
The resulting data compellingly supported the continued value of syncookies, and that position seems to have won the day. The IPv6 syncookie patches are now queued in the networking development tree for 2.6.26.
However, the biggest news is probably that this discussion brought renewed energy to the problem of lost handshake options. Florian Westphal and Glenn Griffin have recently presented a solution to the most damaging aspect of that problem too.
Their solution is to leverage the echoed TCP timestamp option, much as classic syncookies leverage the echoing of the SYN-ACK sequence number in the subsequent ACK. The timestamp option was introduced with RFC 1323 and is widely deployed on modern Linux, Windows, FreeBSD, and FreeBSD-derived systems such as OS X. Its main purpose is to allow more frequent round-trip time measurements when congestion control windows are large.
Using the timestamp to preserve the window scale and SACK option values requires modifying the server's timestamp value in the SYN-ACK to carry the necessary state. During a normal handshake, the client echoes that modified value back to the server in the timestamp option of the third packet of the handshake, thus propagating the SACK and window scale information without the server keeping any state.
To make room in the timestamp for this information, the least significant 9 bits of the timestamp are given over to it. The encoded window scale and SACK options are then carried back and forth at the minor cost of reduced timestamp granularity during the handshake: the SYN-ACK timestamp loses its resolution below 512 jiffies.
Below are two different TCP handshakes completed with syncookies and the timestamp patch. The lowest bits of the SYN-ACK timestamp are the same in both handshakes, even though they took place minutes apart, because both handshakes negotiated the same SACK and window scaling options. The full SYN-ACK timestamp values differ, but their lower nine bits share the same value, 0x166.
```
13:51:04.582464 IP 127.0.0.1.57985 > 127.0.0.1.4050: S 1061746051:1061746051(0)
win 32792 <mss 16396,sackOK,timestamp 0xfffea013 0,nop,wscale 6>
13:51:04.582478 IP 127.0.0.1.4050 > 127.0.0.1.57985: S 2800702917:2800702917(0)
ack 1061746052 win 32768 <mss 16396,sackOK,timestamp 0xfffe9f66 0xfffea013,nop,wscale 6>
13:51:04.582480 IP 127.0.0.1.57985 > 127.0.0.1.4050: .
ack 1 win 513 <nop,nop,timestamp 0xfffea013 0xfffe9f66>

13:59:19.047306 IP 127.0.0.1.45979 > 127.0.0.1.4050: S 218483035:218483035(0)
win 32792 <mss 16396,sackOK,timestamp 0x0001bed4 0,nop,wscale 6>
13:59:19.047320 IP 127.0.0.1.4050 > 127.0.0.1.45979: S 1141094138:1141094138(0)
ack 218483036 win 32768 <mss 16396,sackOK,timestamp 0x0001bd66 0x0001bed4,nop,wscale 6>
13:59:19.047322 IP 127.0.0.1.45979 > 127.0.0.1.4050: .
ack 1 win 513 <nop,nop,timestamp 0x0001bed4 0x0001bd66>
```
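The SYN-ACK timestamps in the capture above can be reproduced with a short sketch. The bit layout (one window scale value in each of the two low nibbles, the SACK flag in bit 8) and the rule of stepping back one 512-tick block when the result would lie in the future are assumptions inferred from the captured values, not taken from the patch itself. Because both ends advertise a window scale of 6, the capture cannot show which nibble holds which side's value. On loopback both ends share a clock, so the client's timestamp stands in for the server's clock here.

```python
from typing import Tuple

TSBITS = 9
TSMASK = (1 << TSBITS) - 1  # 0x1ff: the low 9 bits given over to option state


def encode_ts_options(snd_wscale: int, rcv_wscale: int, sack_ok: bool) -> int:
    """Pack the options into 9 bits (assumed layout): bits 0-3, 4-7, then SACK in bit 8."""
    return snd_wscale | (rcv_wscale << 4) | (int(sack_ok) << 8)


def cookie_timestamp(ts_now: int, options: int) -> int:
    """SYN-ACK timestamp value: the clock with its low 9 bits replaced by options.

    If that would put the value ahead of the clock, step back one 512-tick block.
    32-bit wraparound is ignored in this sketch.
    """
    ts = (ts_now & ~TSMASK) | options
    if ts > ts_now:
        ts = (((ts >> TSBITS) - 1) << TSBITS) | options
    return ts


def decode_ts_options(echoed: int) -> Tuple[int, int, bool]:
    """Recover (snd_wscale, rcv_wscale, sack_ok) from the echoed timestamp."""
    bits = echoed & TSMASK
    return bits & 0xF, (bits >> 4) & 0xF, bool(bits >> 8)


opts = encode_ts_options(6, 6, True)  # both ends advertised wscale 6 and sackOK
print(hex(opts))                                # 0x166
print(hex(cookie_timestamp(0xfffea013, opts)))  # 0xfffe9f66 (first handshake)
print(hex(cookie_timestamp(0x0001bed4, opts)))  # 0x1bd66 (second handshake)
print(decode_ts_options(0xfffe9f66))            # (6, 6, True)
```

Stepping back rather than forward keeps the server from sending a timestamp that runs ahead of its own clock, at the price of the value being up to roughly a thousand ticks behind.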
While there is no guarantee that every TCP peer supports the timestamp option, timestamps are widely deployed on the most common operating systems. Additionally, because timestamps, window scaling, and selective acknowledgments are all features aimed at high-latency, high-bandwidth networks, it would be unusual to find an implementation that supports only a subset of them.
One shortcoming of the scheme is that it is not general enough to be future-proof, since new handshake-based options may continue to be deployed. At this time the MSS, SACK, window scaling, and timestamp options are the only handshake options seen with any regularity, aside from the NOP option, which is used only for alignment padding. However, the whole point of an extensible option scheme is to leave room for future improvements. The IANA registry that records option values was last updated in February 2007, to reserve option code 27 for use with Experimental RFC 4782, "Quick-Start for TCP and IP". Only time will tell whether that particular option will be the next challenge to the syncookie scheme or whether something else will arise first.
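To make the future-proofing concern concrete, the sketch below parses the options field of a SYN and reports which options a cookie-based handshake would have no room to remember. The kind numbers are the IANA-assigned values; treating MSS, window scale, SACK-permitted, and timestamps as carried reflects this article's description of the patch, and the Quick-Start option is given a zero-filled placeholder payload rather than a meaningful one. The function names are hypothetical.

```python
import struct
from typing import List, Tuple

# TCP option kinds, as assigned in the IANA registry.
EOL, NOP, MSS, WSCALE, SACK_PERMITTED, TIMESTAMP, QUICK_START = 0, 1, 2, 3, 4, 8, 27
# Handshake options the timestamp-extended cookies can carry (MSS only approximately).
COOKIE_CARRIED = {MSS, WSCALE, SACK_PERMITTED, TIMESTAMP}


def parse_options(data: bytes) -> List[Tuple[int, bytes]]:
    """Split a TCP options field into (kind, value) pairs, skipping NOP padding."""
    opts, i = [], 0
    while i < len(data):
        kind = data[i]
        if kind == EOL:
            break
        if kind == NOP:
            i += 1
            continue
        if i + 1 >= len(data) or data[i + 1] < 2 or i + data[i + 1] > len(data):
            raise ValueError(f"malformed option of kind {kind} at offset {i}")
        length = data[i + 1]
        opts.append((kind, data[i + 2:i + length]))
        i += length
    return opts


def lost_by_cookie(data: bytes) -> List[int]:
    """Option kinds in a SYN that the cookie has no room to remember."""
    return [kind for kind, _ in parse_options(data) if kind not in COOKIE_CARRIED]


# The SYN from the first handshake in the capture: mss, sackOK, timestamp, nop, wscale.
syn = (struct.pack("!BBH", MSS, 4, 16396) + bytes([SACK_PERMITTED, 2])
       + struct.pack("!BBII", TIMESTAMP, 10, 0xfffea013, 0) + bytes([NOP, WSCALE, 3, 6]))
print(lost_by_cookie(syn))                                       # []
print(lost_by_cookie(syn + bytes([QUICK_START, 8]) + bytes(6)))  # [27]
```

Any option outside the carried set is simply forgotten when a connection is opened through a cookie, which is why each newly deployed handshake option reopens the question.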
The timestamp patch was posted only very recently, and there has been little discussion of it beyond the developers who worked on it directly. It is not clear whether it will be accepted into the mainline right away, but it certainly seems to address a well-known core problem with syncookies at a minor cost.
With the updates for IPv6 and modern TCP option schemes, syncookies appear primed to keep providing sweet relief in their somewhat esoteric networking-security niche. Perhaps they will keep chugging away for another 10 years without having to be re-baked.
| Index entries for this article | |
|---|---|
| GuestArticles | McManus, Patrick |