|
Content
|
|
|
|
|
RFC 2581 - TCP Congestion Control |
|

|
|
|
RFC 2581: TCP Congestion Control was written by M.
Allman, V. Paxson and W. Stevens in April 1999. Here I'm
trying to make a very brief summary of this document: |
|
|
Definitions |
| |
| SEGMENT : Any TCP/IP data or ACK packet. |
| |
| SENDER MAXIMUM SEGMENT SIZE (SMSS) : size of the largest segment that the
sender can transmit. |
| |
| RECEIVER MAXIMUM SEGMENT SIZE (RMSS) : size of the largest segment that the
receiver is willing to accept. |
| |
| FULL-SIZED SEGMENT : a segment that contains the maximum number of data
bytes permitted. |
| |
| RECEIVER WINDOW (rwnd) : the most recently advertised receiver
window. |
| |
| CONGESTION WINDOW (cwnd) : state variable that limits the amount of
data a TCP sender can send. At any given time, a TCP sender
must not send data with a sequence number higher than the sum of the highest
acknowledged sequence number and the minimum of cwnd and rwnd. |
| |
| INITIAL WINDOW (IW) : size of the cwnd after the three-way
handshake is completed. |
| |
| LOSS WINDOW (LW) : size of the cwnd after a TCP sender
detects loss using its retransmission timer. |
| |
| RESTART WINDOW (RW) : size of the cwnd after a TCP
sender restarts transmission after an idle period (if the slow-start
algorithm is used). |
| |
| FLIGHT SIZE : amount of data that has been sent but not yet acknowledged
(acked). |
| |
|
Congestion Control Algorithm |
| |
|
TCP must not be more aggresive than the algorithm of this
specification. Then, it must not send data when cwnd computed
by the algorithm would not allow the data to be sent. |
|
|
|
|
|
Slow-start and Congestion Avoidance |
|
The cwnd variable is a sender-side limit on the amount of data
the sender can transmit into the network before receiving an ACK.
rwnd state variable is a receiver-side limit on the amount of
outstanding data. The minimun of cwnd and rwnd governs
data transmission. |
|
Slow-start threshold (ssthresh) state variable is used to
determine whether the slow-start or congestion avoidance
algorithm is used to control data transmission. |
|
The slow-start algorithm is used at the beginning of a transfer,
or after repairing a loss detected by the retransmission timer,
to slowly probe the network to determine its available capacity. |
|
IW, the initial cwnd value, must be less than or equal to 2
* SMSS bytes and must not be more than 2 segments. |
|
The initial value of ssthresh may be arbitrarily high (i.e., the size
of the advertised window), but it may be reduced in response to congestion.
When cwnd < ssthresh, the slow-start algorithm is used and
when cwnd > ssthresh, the congestion avoidance algorithm is
used. When cwnd and ssthresh are equal, the sender may use
either of them. |
|
During slow-start, a TCP increments cwnd by at most
SMSS bytes for each ACK received that acknowledges new
data. During congestion avoidance, cwnd is incremented by
1 full-sized segment per round-trip time (RTT).
Slow-start ends when cwnd exceeds ssthresh or when
congestion is detected. Congestion avoidance ends when congestion is
detected. One formula commonly used to update cwnd during congestion
avoidance is: |
| |
cwnd += SMSS * SMSS / cwnd |
(2) |
|
|
This adjustment is executed on every incoming non-duplicated ACK. It
provides an acceptable aproximation to the underlying principle of
increasing cwnd by 1 full-sized segment per RTT. If the
formula yields 0, the result should be rounded up to 1 byte.
Another acceptable way to increase cwnd during congestion
avoidance is to count the number of bytes that have been acked for new
data. When the number of bytes acked reaches cwnd, the cwnd is
incremented by up to SMSS bytes. |
|
When a TCP sender detects segment loss using the retransmission
timer, the value of ssthresh must be set to: |
| |
ssthresh = max ( FlightSize/2,
2*SMSS ) |
(3) |
|
|
Where FlightSize is the amount of outstanding data in the network.
Furthermore, upon a timeout cwnd must be set to no more that the loss
window LW, which equals 1 full-sized segment. Therefore, after
retransmitting the dropped segment the TCP sender uses the
slow-start algorithm to increase the window from 1 full-sized segment
to the new value of ssthresh, at which point congestion avoidance
again takes over. |
|
Fast Retransmit/Fast Recovery |
| |
| A TCP receiver should send an immediate duplicate ACK when an
out-of-order segment arrives; this is to inform that a segment was
received out-of-order and which sequence number is expected (caused
by dropping, reordering or duplication in the network). In addition, a
TCP receiver should send an immediate ACK when the incoming
segment fills in all or part of a gap in the sequence space. This
will generate more timely information for the sender recovery. |
| |
| The TCP sender should use the fast retransmit algorithm to
detect and repair loss based on incoming duplicate ACKs. After the
arrival of 3 duplicate ACKs (4 identical ACKs without the
arrival of any other intervening packet), TCP performs a
retransmission of what appears to be the missing segment, without waiting
for the retransmission timer to expire. |
| |
| After the fast retransmit algorithm sends what apperars to be the
missing segment, the fast recovery algorithm govers the transmission
of new data until a non-duplicate ACK arrives. Duplicate ACKs
indicate that data is flowing, then the slow-start algorithm is not
applied. Since the receiver can only generate a duplicate ACK when a
segment has arrived, that segment has left the network and is in the
receiver's buffer, so we know is no longer consuming network resources.
Furthermore, since the ACK clock is preserved, the TCP sender
can continue to transmit new segments, using a reduced cwnd. |
| |
| The fast retransmit and fast recovery algorithms are
implemented together as follows: |
| |
- When the third duplicate ACK is received, set ssthresh
to no more than the value given in equation (3) above.
- Retransmit the lost segment and set cwnd to ssthresh + 3 *
SMSS. This artificially "inflates" cwnd by three
segments that have left the network and are in the receiver buffer (the
3 duplicate ACK indicates 3 segments have left the network).
- For each additional duplicate ACK received, increment cwnd
by SMSS. This artificially increments cwnd to reflect
additional segments that have left the network.
- Transmit a segment, if allowed by the new value of cwnd and the
receiver's advertised window.
- When the next ACK arrives that acknowledges new data,
set cwnd to ssthresh (value in step 1). This is termed "deflating"
the window. This ACK should be the acknowledgment elicited by
the retransmission from step 1, one RTT after the
retransmission (this conclusion assumes that only one packet has been
lost in the network; this is not always the case, see the note below).
Additionally, this ACK should acknowledge all the intermediate
segments sent between the lost segment and the receipt of the third
duplicate ACK, if none of these were lost.
|
|
|
|
|
|
Note: this algorithm is known to generally not recover very efficiently
from multiple losses in a single flight of packets. |
|
Re-starting Idle Connections |
|
A know problem with the algorithms above is that they allow a potentially
inappropriate burst of traffic to be transmitted after TCP has been
idle for a relative long period of time. After an idle period,
TCP cannot use the ACK clock to strobe new segments into the
network, as all ACKs have been drained from the network. Therefore,
TCP can potentially send a cwnd-size line-rate burst into the
network after an idle period. |
|
Van Jacobson recommends to go back to slow-start to restart
the ACK clock, just as it does at the beginning of a transfer. Then,
when TCP has not received a segment for more than one RTO (retransmission
timeout), cwnd is reduced to the value of the restart window
(RW) before transmission begins. For this standard, RW must be
equal to IW. |
|
Generating ACKs |
|
The delayed ACK algorithm should be used by a TCP receiver,
but it must not excessively delay ACKs. Specifically, an
ACK should be generated for at least every second full-sized segment
(2*RMSS), and must be generated within 500ms of the arrival of
the first unacked packet. |
| In some cases, the sender and receiver may not agree on what constitute a
full-sized segment. If the sender is forced to use a segment size less
than RMSS (due to the MTU, path MTU discovery or other
factor), clearly it will take more than two SMSS to the receiver to
acknowledge receiver packets if it uses the RMSS factor. Therefore,
while a specific algorithm is not defined, it is desirable for receivers to
acknowledge at least every second segment, regardless of size. |
| |
|
Out-of-order data segments should be acknowledged immediately,
in order to accelerate loss recovery. To trigger the fast retransmit
algorithm, the receiver should send an immediate duplicate ACK
when it receives a data segment above a gap in the sequence space. To
provide feedback to senders recovering from losses, the receiver should send
an immediate ACK when it receives a data segment that fills in all
or part of a gap in the sequence space. |
| |
| A TCP receiver must not generate more that one ACK for every
incoming segment, other than to update the offered window as the receiving
application consumes new data. |
| |
|
Loss recovery mechanism |
| |
| A number of loss recovery algorithms that augment fast retransmit and
fast recovery have been suggested by TCP researchers. These
enhanced algorithms are implicity allowed, as long as they follow the
general principle of the basic four algorithms outlined in this document. |
| |
| Therefore: |
| |
- When the first loss in a window of data is detected, ssthreh
must be set to no more than the value given by equation 3 above.
- Until all lost segments in a window of data are repaired, the number
of segments transmitted in each RTT must be no more than half
the number of outstanding segments when the loss was detected.
- After all loss is a given window of segments has been sucessfully
retransmitted, cwnd must be set to no more that ssthreh and
congestion avoidance must be used to further increase cwnd.
- Loss in two succesive windows of data, or the loss of a
retransmission, should be taken as two indication of congestion
and, therefore, cwnd (and ssthresh) must be lowered twice in
this case.
|
|
|
|
|
|
|
|
|
|
Content
|
|
|