9.0.- TCP Selective Acknowledgment Options (SACK)

M. Mathis, J. Mahdavi, S. Floyd and A. Romanow published RFC 2018,
"TCP Selective Acknowledgment Options" (SACK), in October 1996.
What follows is a brief summary of the original document.
|
The problem |
|
Multiple packet losses from a window of data can have a catastrophic effect
on TCP throughput. TCP uses a cumulative acknowledgment scheme
in which received segments that are not at the left edge of the receive
window are not acknowledged. This forces the sender to either wait a
roundtrip time to find out about each lost packet, or to unnecessarily
retransmit segments which have been correctly received. With the cumulative
acknowledgment scheme, multiple dropped segments generally cause TCP
to lose its ACK-based clock, reducing overall throughput. |
|
The solution |
|
Selective Acknowledgment (SACK) is a strategy which corrects
this behavior in the face of multiple dropped segments. With selective
acknowledgments, the data receiver can inform the sender about all segments
that have arrived successfully, so the sender need retransmit only the
segments that have actually been lost. |
|
The selective acknowledgment extension uses two TCP options. The
first is an enabling option, "SACK-permitted", which may be sent in a
SYN segment to indicate that the SACK option can be used once
the connection is established. The other is the SACK option itself,
which may be sent over an established connection once permission has been
given by SACK-permitted. |
|
The SACK option is to be included in a segment sent from a TCP
that is receiving data to the TCP that is sending that data; we will
refer to these TCPs as the data receiver and the data
sender, respectively. We will consider a particular simplex data flow;
any data flowing in the reverse direction over the same connection can be
treated independently. |
|
The specification |
|
The 2-byte TCP Sack-Permitted option may be sent in a SYN by a
TCP that has been extended to receive (and presumably process) the
SACK option once the connection has opened. It MUST NOT be sent
on non-SYN segments. The SACK option is to be used to convey
extended acknowledgment information from the receiver to the sender
over an established TCP connection. |
|

|
|
The SACK option is to be sent by a data receiver to inform the data
sender of non-contiguous blocks of data that have been received and
queued. The data receiver awaits the receipt of data to fill the gaps in
sequence space between received blocks. When missing segments are received,
the data receiver acknowledges the data normally by advancing the left
window edge in the Acknowledgement Number Field of the TCP header.
The SACK option does not change the meaning of the Acknowledgement
Number field. |
|

|
|
This option contains a list of some of the blocks of contiguous sequence
space occupied by data that has been received and queued within the window.
Each contiguous block of data queued at the data receiver is defined in the
SACK option by two 32-bit unsigned integers in network byte
order: |
Left Edge of Block
    This is the first sequence number of this block.
Right Edge of Block
    This is the sequence number immediately following the
    last sequence number of this block.

Each block represents received bytes of data that are contiguous and
isolated; that is, the bytes just below the block, (Left Edge of Block -
1), and just above the block, (Right Edge of Block), have not
been received. |
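The block layout described above can be sketched in Python with the standard `struct` module. This is a minimal illustration, not a complete TCP option parser: the function names are hypothetical, and the option kind values (4 for SACK-permitted, 5 for SACK) are taken from RFC 2018 itself rather than from this summary.

```python
import struct

SACK_PERMITTED_KIND = 4  # kind value from RFC 2018 (not stated in this summary)
SACK_KIND = 5            # kind value from RFC 2018 (not stated in this summary)


def encode_sack_permitted():
    """Return the 2-byte SACK-permitted option: kind, then length."""
    return struct.pack("!BB", SACK_PERMITTED_KIND, 2)


def encode_sack(blocks):
    """Encode a list of (left_edge, right_edge) pairs as one SACK option."""
    if not 1 <= len(blocks) <= 4:
        raise ValueError("a SACK option carries between 1 and 4 blocks")
    option = struct.pack("!BB", SACK_KIND, 8 * len(blocks) + 2)
    for left, right in blocks:
        # Each edge is a 32-bit unsigned integer in network byte order.
        option += struct.pack("!II", left & 0xFFFFFFFF, right & 0xFFFFFFFF)
    return option


def decode_sack(option):
    """Decode a SACK option back into a list of (left_edge, right_edge) pairs."""
    kind, length = struct.unpack_from("!BB", option, 0)
    if kind != SACK_KIND or length != len(option) or (length - 2) % 8:
        raise ValueError("not a well-formed SACK option")
    return [struct.unpack_from("!II", option, off) for off in range(2, length, 8)]


option = encode_sack([(5500, 6000), (7000, 7500)])
print(len(option), decode_sack(option))
```

The right edge is exclusive, so a block of 500 bytes starting at 5500 is written as (5500, 6000).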
|
A SACK option that specifies n blocks will have a length of
8*n+2 bytes, so the 40 bytes available for TCP options can
specify a maximum of 4 blocks. It is expected that SACK will
often be used in conjunction with the Timestamp option used for
RTTM, which takes an additional 10 bytes (plus two bytes of
padding); thus a maximum of 3 SACK blocks will be allowed in this
case. |
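The arithmetic behind these limits is easy to check. The sketch below assumes the 40-byte option space and the 8*n+2 length formula stated above; the helper names are made up for illustration.

```python
TCP_OPTION_SPACE = 40  # bytes available for TCP options, as stated above


def sack_option_length(n_blocks):
    """Length in bytes of a SACK option carrying n_blocks blocks."""
    return 8 * n_blocks + 2


def max_sack_blocks(other_option_bytes=0):
    """Largest number of SACK blocks that fits next to the other options."""
    free = TCP_OPTION_SPACE - other_option_bytes
    return max(0, (free - 2) // 8)


print(max_sack_blocks())    # no other options in the header
print(max_sack_blocks(12))  # Timestamp option: 10 bytes plus 2 bytes of padding
```

With no other options, 4 blocks need 34 bytes and fit, while 5 blocks would need 42. With the 12 bytes used by a padded Timestamp option, 28 bytes remain, which holds 3 blocks (26 bytes).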
|
The SACK option is advisory, in that, while it notifies the data
sender that the data receiver has received the indicated segments, the data
receiver is permitted to later discard data which have been reported in a
SACK option. |
|
|
|
Generating Sack Options: Data Receiver Behavior |
|
If the data receiver has received a SACK-Permitted option on the
SYN for this connection, the data receiver MAY elect to generate SACK
options as described below. If the data receiver generates SACK
options under any circumstance, it SHOULD generate them under all permitted
circumstances. If the data receiver has not received a SACK-Permitted
option for a given connection, it MUST NOT send SACK options on that
connection. |
|
If sent at all, SACK options SHOULD be included in all ACKs
which do not ACK the highest sequence number in the data receiver's
queue. In this situation the network has lost or mis-ordered data,
such that the receiver holds non-contiguous data in its queue. The
receiver SHOULD send an ACK for every valid segment that arrives
containing new data, and each of these "duplicate" ACKs SHOULD
bear a SACK option. |
|
If the data receiver chooses to send a SACK option, the following
rules apply: |
- The first SACK block (i.e., the one immediately following the
kind and length fields in the option) MUST specify the contiguous block of
data containing the segment which triggered this ACK, unless that
segment advanced the Acknowledgment Number field in the header.
This assures that the ACK with the SACK option reflects the
most recent change in the data receiver's buffer queue.
- The data receiver SHOULD include as many distinct SACK blocks
as possible in the SACK option. Note that the maximum available
option space may not be sufficient to report all blocks present in the
receiver's queue.
- The SACK option SHOULD be filled out by repeating the most
recently reported SACK blocks (based on first SACK blocks in
previous SACK options) that are not subsets of a SACK block
already included in the SACK option being constructed. This assures
that in normal operation, any segment remaining part of a non-contiguous
block of data held by the data receiver is reported in at least three
successive SACK options, even for large-window TCP
implementations. After the first SACK block, the following SACK
blocks in the SACK option may be listed in arbitrary order.
|
|
It is very important that the SACK option always reports the block
containing the most recently received segment, because this provides the
sender with the most up-to-date information about the state of the network
and the data receiver's queue. |
|
Interpreting the Sack Option and Retransmission Strategy: Data Sender Behavior

When receiving an ACK containing a SACK option, the data
sender SHOULD record the selective acknowledgment for future reference. The
data sender is assumed to have a retransmission queue that contains the
segments that have been transmitted but not yet acknowledged, in
sequence-number order. If the data sender performs re-packetization before
retransmission, the block boundaries in a SACK option that it
receives may not fall on boundaries of segments in the retransmission queue;
however, this does not pose a serious difficulty for the sender. |
|
|
One possible implementation of the sender's behavior is as follows. Let us
suppose that for each segment in the retransmission queue there is a (new)
flag bit "SACKed", to be used to indicate that this particular
segment has been reported in a SACK option. |
|
When an acknowledgment segment arrives containing a SACK option, the
data sender will turn on the SACKed bits for segments that have been
selectively acknowledged. More specifically, for each block in the SACK
option, the data sender will turn on the SACKed flags for all
segments in the retransmission queue that are wholly contained within that
block. This requires straightforward sequence number comparisons. |
|
After the SACKed bit is turned on (as the result of processing a
received SACK option), the data sender will skip that segment during
any later retransmission. Any segment that has the SACKed bit turned
off and is less than the highest SACKed segment is available for
retransmission. |
|
After a retransmit timeout the data sender SHOULD turn off all of the
SACKed bits, since the timeout might indicate that the data receiver
has reneged. The data sender MUST retransmit the segment at the left edge of
the window after a retransmit timeout, whether or not the SACKed bit
is on for that segment. A segment will not be dequeued and its buffer freed
until the left window edge is advanced over it. |
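The "SACKed" flag scheme described above could look roughly like the following Python sketch. It is one possible reading of the text, not a reference implementation: the class and method names are hypothetical, and sequence-number wraparound and re-packetization are deliberately ignored.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    seq: int              # first sequence number of the segment
    length: int           # payload length in bytes
    sacked: bool = False  # the (new) "SACKed" flag bit

    @property
    def end(self):
        return self.seq + self.length


class RetransmissionQueue:
    def __init__(self, segments):
        self.segments = sorted(segments, key=lambda s: s.seq)

    def on_ack(self, ack, sack_blocks=()):
        # Only the cumulative ACK dequeues segments and frees their buffers.
        self.segments = [s for s in self.segments if s.end > ack]
        for left, right in sack_blocks:
            for s in self.segments:
                # Flag segments that are wholly contained within the block.
                if left <= s.seq and s.end <= right:
                    s.sacked = True

    def retransmit_candidates(self):
        """Segments not SACKed that lie below the highest SACKed segment."""
        sacked = [s.seq for s in self.segments if s.sacked]
        if not sacked:
            return []
        highest = max(sacked)
        return [s for s in self.segments if not s.sacked and s.seq < highest]

    def on_timeout(self):
        """Clear all SACKed flags; return the segment at the left window edge."""
        for s in self.segments:
            s.sacked = False
        return self.segments[0] if self.segments else None


queue = RetransmissionQueue([Segment(seq, 500) for seq in range(5000, 9000, 500)])
queue.on_ack(5500, [(8000, 8500), (7000, 7500), (6000, 6500)])
print([s.seq for s in queue.retransmit_candidates()])
```

Keeping SACKed segments in the queue until the cumulative ACK passes them is what lets the sender recover if the receiver later discards data it had reported.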
|
Congestion Control Issues |
|
The congestion control algorithms present in the de facto standard
TCP implementations MUST be preserved. In particular, to preserve
robustness in the presence of packets reordered by the network, recovery is
not triggered by a single ACK reporting out-of-order packets at the
receiver. Further, during recovery the data sender limits the number of
segments sent in response to each ACK. Existing implementations limit
the data sender to sending one segment during Reno-style fast recovery,
or to two segments during slow-start. Other aspects of congestion
control, such as reducing the congestion window in response to congestion,
must similarly be preserved. |
|
The use of time-outs as a fall-back mechanism for detecting
dropped packets is unchanged by the SACK option. Because the
data receiver is allowed to discard SACKed data, when a retransmit
timeout occurs the data sender MUST ignore prior SACK
information in determining which data to retransmit. |
|
Efficiency and Worst Case Behavior |
|
If the return path carrying ACKs and SACK options were
lossless, one block per SACK option packet would always be
sufficient. Every segment arriving while the data receiver holds
discontinuous data would cause the data receiver to send an ACK with
a SACK option containing the one altered block in the receiver's
queue. The data sender is thus able to construct a precise replica of the
receiver's queue by taking the union of all the first SACK blocks. |
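Taking the union of the first SACK blocks amounts to merging intervals. A minimal sketch follows, assuming blocks are (left, right) pairs with an exclusive right edge and ignoring sequence-number wraparound; `merge_blocks` is a hypothetical helper name.

```python
def merge_blocks(blocks):
    """Return the union of (left, right) blocks as sorted, disjoint blocks."""
    merged = []
    for left, right in sorted(blocks):
        if merged and left <= merged[-1][1]:
            # Overlapping or adjacent to the previous block: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], right))
        else:
            merged.append((left, right))
    return merged


# First blocks of four successive SACK options, in arrival order.
first_blocks = [(6000, 6500), (7000, 7500), (8000, 8500), (6000, 7500)]
print(merge_blocks(first_blocks))
```

A real sender would also drop everything below the cumulative Acknowledgment Number, which this sketch leaves out.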
|
Since the return path is not lossless, the SACK option is
defined to include more than one SACK block in a single packet. The
redundant blocks in the SACK option packet increase the robustness of
SACK delivery in the presence of lost ACKs. For a receiver
that is also using the Timestamp option, the SACK option has room to
include three SACK blocks. Thus each SACK block will generally
be repeated at least three times, if necessary, once in each of three
successive ACK packets. However, if all of the ACK packets
reporting a particular SACK block are dropped, then the sender might
assume that the data in that SACK block has not been received, and
unnecessarily retransmit those segments. |
|
The deployment of other TCP options may reduce the number of
available SACK blocks to 2 or even to 1. This will reduce the
redundancy of SACK delivery in the presence of lost ACKs. Even
so, the exposure of TCP SACK in regard to the unnecessary
retransmission of packets is strictly less than the exposure of current
implementations of TCP. |
|
Sack Option Examples |
|
The following examples attempt to demonstrate the proper behavior of SACK
generation by the data receiver. Assume the left window edge is 5000
and that the data transmitter sends a burst of 8 segments, each
containing 500 data bytes. |
|
Case 1: The first 4 segments are received but the last 4 are dropped. |
|
The data receiver will return a normal TCP ACK segment acknowledging
sequence number 7000, with no SACK option. |
|
Case 2: The first segment is dropped but the remaining 7 are
received. |
|
Upon receiving each of the last seven packets, the data receiver will return
a TCP ACK segment that acknowledges sequence number 5000 and
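contains a SACK option specifying one block of queued data:

    Triggering Segment   ACK      Left Edge   Right Edge
    5000                 (lost)
    5500                 5000     5500        6000
    6000                 5000     5500        6500
    6500                 5000     5500        7000
    7000                 5000     5500        7500
    7500                 5000     5500        8000
    8000                 5000     5500        8500
    8500                 5000     5500        9000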
|
|
Case 3: The 2nd, 4th, 6th, and 8th (last) segments are dropped. |
|
The data receiver ACKs the first packet normally. The third, fifth,
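and seventh packets trigger SACK options as follows:

    Triggering   ACK      First Block     2nd Block       3rd Block
    Segment               Left   Right    Left   Right    Left   Right
    5000         5500
    5500         (lost)
    6000         5500     6000   6500
    6500         (lost)
    7000         5500     7000   7500     6000   6500
    7500         (lost)
    8000         5500     8000   8500     7000   7500     6000   6500
    8500         (lost)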
|
|
Suppose at this point, the 4th packet is received out of order. (This
could either be because the data was badly misordered in the network, or
because the 2nd packet was retransmitted and lost, and then the
4th packet was retransmitted). At this point the data receiver has only
two SACK blocks to report. The data receiver replies with the
following Selective Acknowledgment:

    Triggering   ACK      First Block     2nd Block
    Segment               Left   Right    Left   Right
    6500         5500     6000   7500     8000   8500
|
|
Suppose at this point, the 2nd segment is received. The data receiver
then replies with the following Selective Acknowledgment:

    Triggering   ACK      First Block
    Segment               Left   Right
    5500         7500     8000   8500
|
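The receiver rules from "Generating Sack Options" can be exercised against Case 3 with a small simulation. This is a sketch under simplifying assumptions: `SackReceiver` is a hypothetical class, sequence numbers never wrap, the receiver never discards queued data, and at most three blocks are reported (as when the Timestamp option is in use).

```python
class SackReceiver:
    def __init__(self, rcv_nxt, max_blocks=3):
        self.rcv_nxt = rcv_nxt        # left window edge (next byte expected)
        self.blocks = []              # queued blocks, most recently changed first
        self.max_blocks = max_blocks  # blocks that fit in the option space

    def receive(self, seq, length):
        """Process one segment and return (ack_number, sack_blocks)."""
        left, right = max(seq, self.rcv_nxt), seq + length
        if right > self.rcv_nxt:  # the segment carries data not yet ACKed
            kept = []
            for b_left, b_right in self.blocks:
                if b_left <= right and left <= b_right:
                    # Overlapping or adjacent: absorb into the new block.
                    left, right = min(left, b_left), max(right, b_right)
                else:
                    kept.append((b_left, b_right))
            if left == self.rcv_nxt:
                # The gap at the left edge is filled: advance the ACK number.
                self.rcv_nxt = right
                self.blocks = kept
            else:
                # The block containing the triggering segment goes first.
                self.blocks = [(left, right)] + kept
        return self.rcv_nxt, self.blocks[:self.max_blocks]


rx = SackReceiver(rcv_nxt=5000)
# Case 3: segments 1, 3, 5, 7 arrive, then the 4th, then the 2nd.
for seq in (5000, 6000, 7000, 8000, 6500, 5500):
    print(seq, rx.receive(seq, 500))
```

Keeping the list in most-recently-changed order gives both behaviors the rules ask for: the first block always contains the triggering segment, and the remaining slots repeat the most recently reported blocks.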
|
Data Receiver Reneging |
|
Note that the data receiver is permitted to discard data in its queue that
has not been acknowledged to the data sender, even if the data has already
been reported in a SACK option. Such discarding of SACKed packets
is discouraged, but may be used if the receiver runs out of buffer space. |
|
The data receiver MAY elect not to keep data which it has reported in a
SACK option. In this case, the receiver SACK generation is
additionally qualified: The first SACK block MUST reflect the newest
segment. Even if the newest segment is going to be discarded and the
receiver has already discarded adjacent segments, the first SACK
block MUST report, at a minimum, the left and right edges of the newest
segment. Except for the newest segment, all SACK blocks MUST NOT
report any old data which is no longer actually held by the receiver. |
|
Since the data receiver may later discard data reported in a SACK
option, the sender MUST NOT discard data before it is acknowledged by the
Acknowledgment Number field in the TCP header. |
|