Content

   

RFC 2581 - TCP Congestion Control  

RFC 2581TCP Congestion Control was written by M. Allman, V. Paxson and W. Stevens in April 1999. Here I'm trying to make a very brief summary of this document:

Definitions
 
SEGMENT : Any TCP/IP data or ACK packet.
 
SENDER MAXIMUM SEGMENT SIZE (SMSS) : size of the largest segment that the sender can transmit.
 
RECEIVER MAXIMUM SEGMENT SIZE (RMSS) : size of the largest segment that the receiver is willing to accept.
 
FULL-SIZED SEGMENT : a segment that contains the maximum number of data bytes permitted.
 
RECEIVER WINDOW (rwnd) : the most recently advertised receiver window.
 
CONGESTION WINDOW (cwnd) : state variable that limits the amount of data a TCP sender can send. At any given time, a TCP sender must not send data with a sequence number higher than the sum of the highest acknowledged sequence number and the minimum of cwnd and rwnd.
 
INITIAL WINDOW (IW) : size of the cwnd after the three-way handshake is completed.
 
LOSS WINDOW (LW) : size of the cwnd after a TCP sender detects loss using its retransmission timer.
 
RESTART WINDOW (RW) : size of the cwnd after a TCP sender restarts transmission after an idle period (if the slow-start algorithm is used).
 
FLIGHT SIZE : amount of data that has been sent but not yet acknowledged (acked).
 
Congestion Control Algorithm
 
TCP must not be more aggresive than the algorithm of this specification. Then, it must not send data when cwnd computed by the algorithm would not allow the data to be sent.
 
   

 

Slow-start and Congestion Avoidance
The cwnd variable is a sender-side limit on the amount of data the sender can transmit into the network before receiving an ACK. rwnd state variable is a receiver-side limit on the amount of outstanding data. The minimun of cwnd and rwnd governs data transmission.
Slow-start threshold (ssthresh) state variable is used to determine whether the slow-start or congestion avoidance algorithm is used to control data transmission.
The slow-start algorithm is used at the beginning of a transfer, or after repairing a loss detected by the retransmission timer, to slowly probe the network to determine its available capacity.
IW, the initial cwnd value, must be less than or equal to 2 * SMSS bytes and must not be more than 2 segments.
The initial value of ssthresh may be arbitrarily high (i.e., the size of the advertised window), but it may be reduced in response to congestion. When cwnd < ssthresh, the slow-start algorithm is used and when cwnd > ssthresh, the congestion avoidance algorithm is used. When cwnd and ssthresh are equal, the sender may use either of them.
During slow-start, a TCP increments cwnd by at most SMSS bytes for each ACK received that acknowledges new data. During congestion avoidance, cwnd is incremented by 1 full-sized segment per round-trip time (RTT). Slow-start ends when cwnd exceeds ssthresh or when congestion is detected. Congestion avoidance ends when congestion is detected. One formula commonly used to update cwnd during congestion avoidance is:
  cwnd += SMSS * SMSS / cwnd (2)
This adjustment is executed on every incoming non-duplicated ACK. It provides an acceptable aproximation to the underlying principle of increasing cwnd by 1 full-sized segment per RTT. If the formula yields 0, the result should be rounded up to 1 byte. Another acceptable way to increase cwnd during congestion avoidance is to count the number of bytes that have been acked for new data. When the number of bytes acked reaches cwnd, the cwnd is incremented by up to SMSS bytes.
When a TCP sender detects segment loss using the retransmission timer, the value of ssthresh must be set to:
  ssthresh = max ( FlightSize/2, 2*SMSS ) (3)
Where FlightSize is the amount of outstanding data in the network. Furthermore, upon a timeout cwnd must be set to no more that the loss window LW, which equals 1 full-sized segment. Therefore, after retransmitting the dropped segment the TCP sender uses the slow-start algorithm to increase the window from 1 full-sized segment to the new value of ssthresh, at which point congestion avoidance again takes over.
Fast Retransmit/Fast Recovery
 
A TCP receiver should send an immediate duplicate ACK when an out-of-order segment arrives; this is to inform that a segment was received out-of-order and which sequence number is expected (caused by dropping, reordering or duplication in the network). In addition, a TCP receiver should send an immediate ACK when the incoming segment fills in all or part of a gap in the sequence space. This will generate more timely information for the sender recovery.
 
The TCP sender should use the fast retransmit algorithm to detect and repair loss based on incoming duplicate ACKs. After the arrival of 3 duplicate ACKs (4 identical ACKs without the arrival of any other intervening packet), TCP performs a retransmission of what appears to be the missing segment, without waiting for the retransmission timer to expire.
 
After the fast retransmit algorithm sends what apperars to be the missing segment, the fast recovery algorithm govers the transmission of new data until a non-duplicate ACK arrives. Duplicate ACKs indicate that data is flowing, then the slow-start algorithm is not applied. Since the receiver can only generate a duplicate ACK when a segment has arrived, that segment has left the network and is in the receiver's buffer, so we know is no longer consuming network resources. Furthermore, since the ACK clock is preserved, the TCP sender can continue to transmit new segments, using a reduced cwnd.
 
The fast retransmit and fast recovery algorithms are implemented together as follows:
 
  1. When the third duplicate ACK is received, set ssthresh to no more than the value given in equation (3) above.
     
  2. Retransmit the lost segment and set cwnd to ssthresh + 3 * SMSS. This artificially "inflates" cwnd by three segments that have left the network and are in the receiver buffer (the 3 duplicate ACK indicates 3 segments have left the network).
     
  3. For each additional duplicate ACK received, increment cwnd by SMSS. This artificially increments cwnd to reflect additional segments that have left the network.
     
  4. Transmit a segment, if allowed by the new value of cwnd and the receiver's advertised window.
     
  5. When the next ACK arrives that acknowledges new data, set cwnd to ssthresh (value in step 1). This is termed "deflating" the window. This ACK should be the acknowledgment elicited by the retransmission from step 1, one RTT after the retransmission (this conclusion assumes that only one packet has been lost in the network; this is not always the case, see the note below). Additionally, this ACK should acknowledge all the intermediate segments sent between the lost segment and the receipt of the third duplicate ACK, if none of these were lost.
 
   

 

Note: this algorithm is known to generally not recover very efficiently from multiple losses in a single flight of packets.
Re-starting Idle Connections
A know problem with the algorithms above is that they allow a potentially inappropriate burst of traffic to be transmitted after TCP has been idle for a relative long period of time. After an idle period, TCP cannot use the ACK clock to strobe new segments into the network, as all ACKs have been drained from the network. Therefore, TCP can potentially send a cwnd-size line-rate burst into the network after an idle period.
Van Jacobson recommends to go back to slow-start to restart the ACK clock, just as it does at the beginning of a transfer. Then, when TCP has not received a segment for more than one RTO (retransmission timeout), cwnd is reduced to the value of the restart window (RW) before transmission begins. For this standard, RW must be equal to IW.
Generating ACKs
The delayed ACK algorithm should be used by a TCP receiver, but it must not excessively delay ACKs. Specifically, an ACK should be generated for at least every second full-sized segment (2*RMSS), and must be generated within 500ms of the arrival of the first unacked packet.
In some cases, the sender and receiver may not agree on what constitute a full-sized segment. If the sender is forced to use a segment size less than RMSS (due to the MTU, path MTU discovery or other factor), clearly it will take more than two SMSS to the receiver to acknowledge receiver packets if it uses the RMSS factor. Therefore, while a specific algorithm is not defined, it is desirable for receivers to acknowledge at least every second segment, regardless of size.
 
Out-of-order data segments should be acknowledged immediately, in order to accelerate loss recovery. To trigger the fast retransmit algorithm, the receiver should send an immediate duplicate ACK when it receives a data segment above a gap in the sequence space. To provide feedback to senders recovering from losses, the receiver should send an immediate ACK when it receives a data segment that fills in all or part of a gap in the sequence space.
 
A TCP receiver must not generate more that one ACK for every incoming segment, other than to update the offered window as the receiving application consumes new data.
 
Loss recovery mechanism
 
A number of loss recovery algorithms that augment fast retransmit and fast recovery have been suggested by TCP researchers. These enhanced algorithms are implicity allowed, as long as they follow the general principle of the basic four algorithms outlined in this document.
 
Therefore:
 
  1. When the first loss in a window of data is detected, ssthreh must be set to no more than the value given by equation 3 above.
  2. Until all lost segments in a window of data are repaired, the number of segments transmitted in each RTT must be no more than half the number of outstanding segments when the loss was detected.
  3. After all loss is a given window of segments has been sucessfully retransmitted, cwnd must be set to no more that ssthreh and congestion avoidance must be used to further increase cwnd.
  4. Loss in two succesive windows of data, or the loss of a retransmission, should be taken as two indication of congestion and, therefore, cwnd (and ssthresh) must be lowered twice in this case.
 
   

 


   


Content