Hi, I'm Iljitsch van Beijnum. These are general network-related posts.
The second half of February saw two main topics on the NANOG list: DS3 performance and satellite latency. The long round trip times for satellite connections wreak havoc on TCP performance. In order to be able to utilize the available bandwidth, TCP needs to keep sending data without waiting for an acknowledgment for at least a full round trip time. Or in other words: TCP performance is limited to the window size divided by the round trip time. The TCP window (the amount of data TCP will send before stopping and waiting for an acknowledgment) is limited by two factors: the send buffer on the sending system and the 16 bit window size field in the TCP header. So on a 600 ms RTT satellite link the maximum TCP performance is limited to 107 kilobytes per second (850 kbps) by the size of the header field, and if a sender uses a 16 kilobyte buffer (a fairly common size) this drops to as little as 27 kilobytes per second (215 kbps). Because of the TCP slow start mechanism, it takes several seconds to reach this speed as well. Fortunately, RFC 1323, "TCP Extensions for High Performance", introduces a "window scale" option that increases the maximum TCP window to 1 GB, provided both ends of the connection allocate enough buffer space.
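For reference, the arithmetic behind those numbers (using 65535 bytes as the largest window the 16 bit field can express, and the 16 kilobyte send buffer mentioned above):

\[
\text{maximum throughput} = \frac{\text{window size}}{\text{RTT}}
\]
\[
\frac{65535\ \text{bytes}}{0.6\ \text{s}} \approx 107\ \text{KB/s} \approx 850\ \text{kbps}
\qquad
\frac{16384\ \text{bytes}}{0.6\ \text{s}} \approx 27\ \text{KB/s} \approx 215\ \text{kbps}
\]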
The other subject that received a lot of attention, the maximum usable bandwidth of a DS3/T3 line, is also related to TCP performance. When the line gets close to being fully utilized, short data bursts (which are very common in IP) will fill up the send queue. When the queue is full, additional incoming packets are discarded. This is called a "tail drop". If the TCP session which loses a packet doesn't support "fast retransmit", or if several packets from the same session are dropped, this TCP session will go into "slow start" and slow down a lot. This often happens to several TCP sessions at once, so they all perform slow start together, ramp back up together, and reach the point where the line can't handle the traffic load at the same moment, when another small burst triggers the next round of tail drops. This is known as global synchronization.
A possible solution is to use Random Early Detection (RED) queuing rather than First In, First Out (FIFO). RED starts dropping more and more packets as the queue fills up, which triggers TCP congestion avoidance and slows the TCP sessions down more gently. But this only works if there aren't (m)any tail drops, which is unlikely if there is only limited buffer space. Unfortunately, Cisco uses a default queue size of 40 packets. Queuing theory tells us this queue will be filled entirely (on average) at about 97% line utilization. So at 97%, even a one-packet burst will result in a tail drop. The solution is to increase the queue size, in addition to enabling RED. On a Cisco:
interface ATM0
random-detect
hold-queue 500 out
This gives RED the opportunity to start dropping individual packets long before the queue fills up entirely and tail drops occur. The price is a somewhat longer queuing delay. At 99% utilization, there will be an average of 98 packets in the queue, but at 45 Mbps this will only introduce a delay of 9 ms.
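The queue figures above follow the textbook M/M/1 result for the average number of packets waiting, where ρ is the line utilization. A quick check, assuming Poisson arrivals and an average packet size of around 500 bytes (both are assumptions; the discussion didn't specify a model or packet size):

\[
L_q = \frac{\rho^2}{1-\rho}
\qquad
L_q \Big|_{\rho = 0.99} = \frac{0.99^2}{0.01} \approx 98\ \text{packets}
\]
\[
\text{added queuing delay} \approx \frac{98 \times 500 \times 8\ \text{bits}}{45 \times 10^{6}\ \text{bit/s}} \approx 9\ \text{ms}
\]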
Permalink - posted 2002-03-31
Jaap Akkerhuis from the .nl TLD registry made an analysis of the impact of the events of September 11th on the net, which he presented at the ICANN general meeting in mid-November.
Slides of the presentation (PDF)
Extensive archives of the ICANN meeting
(but hard to find specific information)
Permalink - posted 2001-12-27
Between August 20 and 26 an interesting subject came up on the NANOG list: when using Gigabit Ethernet for exchange points, there can be a nice performance gain if a larger MTU than the standard 1500 bytes is used. However, this only works if all attached layer 3 devices agree on the MTU. If a single MTU can't be agreed upon, this can still be accomplished by creating several VLANs and setting the MTU per subinterface, as sketched below.
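A minimal sketch of that VLAN approach on a Cisco router at such an exchange point; the interface name and VLAN numbers are made up, and this assumes the hardware supports jumbo frames and per-subinterface IP MTUs:

interface GigabitEthernet0/0
 mtu 9000
!
interface GigabitEthernet0/0.100
 description example VLAN for peers that agreed on a 9000 byte MTU
 encapsulation dot1Q 100
 ip mtu 9000
!
interface GigabitEthernet0/0.200
 description example VLAN for peers that keep the standard 1500 byte MTU
 encapsulation dot1Q 200
 ip mtu 1500

The physical interface is set to the largest MTU in use; each subinterface then only uses the MTU that the layer 3 devices in its VLAN have agreed on.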
Permalink - posted 2001-09-30
The terrorist attack on the World Trade Center in New York City resulted in outages for a number of ISPs. Of the destroyed buildings, WTC 1 and 7 housed colocation facilities. The Telehouse America facility on 25 Broadway in Manhattan, not far from the WTC, lost power. The facility was not damaged, but commercial power was lost, and after running on generator power for two days, the generators overheated and had to be turned off for several hours. Affected ISPs received many offers for temporary connectivity and assistance in rebuilding their networks.
The phone network experienced congestion in many places on the day of the attack. Although individual (news) sites were slow or hard to reach, general Internet connectivity held up very well. While phone traffic was much higher than usual, Internet traffic rose shortly after the attack, then declined and stayed somewhat below normal for the rest of the day, with some unusual traffic patterns.
It seems obvious that packet switched networks degrade more gracefully than circuit switched networks. A phone call always uses the same amount of bandwidth, so either you are lucky and it works, or you are unlucky and you get nothing. Packet networks, on the other hand, slow down but generally don't cut off users completely until things get really, really bad. And while the current Internet holds its own in many-to-many communication, it can't really cope with massive one-to-many traffic.
Photos of an affected telephone Central Office in New York
Permalink - posted 2001-09-29