WO2001065394A1 - Commande de trafic de bord-a-bord pour internet - Google Patents
Commande de trafic de bord-a-bord pour internet Download PDFInfo
- Publication number
- WO2001065394A1 WO2001065394A1 PCT/US2001/006263 US0106263W WO0165394A1 WO 2001065394 A1 WO2001065394 A1 WO 2001065394A1 US 0106263 W US0106263 W US 0106263W WO 0165394 A1 WO0165394 A1 WO 0165394A1
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- node
- congestion
- nodes
- edge
- epoch
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Links
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L47/00—Traffic control in data switching networks
- H04L47/10—Flow control; Congestion control
- H04L47/30—Flow control; Congestion control in combination with information about buffer occupancy at either end or at transit nodes
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L47/00—Traffic control in data switching networks
- H04L47/10—Flow control; Congestion control
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L47/00—Traffic control in data switching networks
- H04L47/10—Flow control; Congestion control
- H04L47/12—Avoiding congestion; Recovering from congestion
- H04L47/122—Avoiding congestion; Recovering from congestion by diverting traffic away from congested entities
Definitions
- the present invention generally relates to computer network traffic
- the present invention relates to
- QoS Quality of Service
- tail circuits (to remote locations), international circuits and peering points, may
- QoS Quality-of-Service
- the present invention provides a congestion control
- edge-to-edge traffic control For the Internet, edge-to-edge traffic control.
- edge-to-edge traffic trucking building blocks herein referred to as
- IP network
- TCP Transmission Control Protocol
- t e overlay involves the exchange of control
- the edge-to-edge virtual links thus created can be used as the
- bandwidth- based denial of service attacks can be isolated, simple differentiated services can be offered, and new
- FIG. 1 illustrates a network system for the implementation of the
- FIG. 2 illustrates another view of a network system for the
- FIG. 3 illustrates a detailed network system to for use in describing the
- FIG. 4 shows a chart illustrating dynamic edge-to-edge regulation
- FIG. 5(a) illustrates a queue at a bottleneck
- FIG. 5(b) illustrates a queue distributed across edges.
- FIG. 1 a network node bottleneck 100.
- the present invention is based upon the observation that at all times, the sum of the output rates of flows passing through a particular single network node bottleneck 100 is less than or equal to the capacity of ( ⁇ ) 102 at the bottleneck 100, as illustrated in FIG. 1. Most importantly, this condition holds during periods of congestion called “congestion epochs" .
- a “congestion epoch” is defined as any period when the instantaneous queue length exceeds a queue bound which is larger than the maximum steady state queue fluctuations. Chosen this way, the congestion epoch is the period of full utilization incurred when the mean aggregate load
- ( ⁇ ) at a single bottleneck 100 exceeds mean capacity ( ⁇ ).
- Congestion epoch does not involve packet loss in its definition and is a basis for "early" detection.
- a single bottleneck 100 is used in the following description, although the present invention is applicable to a network of bottlenecks 100 as well.
- the output rates of flows (v,) can be measured at the receiver and fed back to the sender.
- each sender imposes a rate limit r, such that r, ⁇ — min( ⁇ v, , r,) where ⁇ ⁇ 1.
- the increase-decrease policy of the present invention is not the same as the well known additive -increase multiplicative-decrease (AIMD) policy, because the decrease policy of the present invention is based upon tl e output rate (v,) and not the input rate ( ⁇ ,).
- the policy of the present invention is hereto referred to as AIMD-ER (Additive Increase and Multiplicative Decrease using Egress Rate).
- the remaining part of the basic approach is a method to detect the congestion epochs in the system.
- the present invention utilizes two method for this purpose. The first method assumes that the interior routers assist in the determination of the start and duration of a congestion epoch. In the second method, edges detect congestion epochs without the involvement of interior routers.
- the interior router promiscuously marks a bit in packets whenever the instantaneous queue length exceeds a carefully designed threshold.
- the second method does not involve support from interior routers.
- the edges To detect the beginning of a congestion epoch, the edges rely on the observation that each flow's contribution to the queue length (or accumulation), q is equal to the integral J( ⁇ j.(T) - Vj (T))dT. If this accumulation is larger than a predefined threshold, the flow assumes the beginning of a congestion epoch.
- the end of the congestion epoch is detected when a one-way delay sample comes close to the minimum one-way delay.
- the present invention assumes that the network architecture is partitioned into traffic control classes.
- a traffic control class is a set of networks with consistent policies applied by a single administrative entity or cooperating administrative entities or peers. Specifically, it is assumed that edge-to-edge controlled traffic is isolated from other traffic which is not edge-to-edge controlled.
- the architecture has three primary components: the ingress edge 202, the interior router 204, and the egress edge 206. Nodes within a traffic control class that are connected to nodes outside the class and implement edge-to-edge control are known as edge routers 202, 206. Any remaining routers within a class are called interior routers 204.
- the methods of the present invention can be implemented on conventional hardware such as that of FIG. 2, where the ingress edge 202, the interior router 204, and egress edge 206 employ the means for performing the methods herein described.
- the ingress node 202 regulates each edge-to-edge virtual link to a rate limit of r ; .
- the actual input rate i.e. departure rate from the ingress 202, and denoted XT
- the present invention also assumes that the ingress node 202 uses an observation interval T for each edge-to-edge virtual link originating at this ingress 202.
- a congestion epoch begins when an interior router promiscuously marks a congestion bit on all packets once the instantaneous queue exceeds a carefully designed queue threshold parameter. Since interior routers 204 participate explicitly, the present invention refers to this as the Explicit Edge Control (EEC) method.
- EEC Explicit Edge Control
- the egress node 206 declares the beginning of a new congestion epoch upon seeing the first packet with tl e congestion bit set. A new control packet is created and the declaration of congestion along with the measured output rate at the egress 206 is fed back to the ingress 202. The interval used by the egress node 206 to measure and average the output rate is resynchronized with the beginning of this new congestion epoch.
- the congestion epoch continues in every successive observation interval where at least one packet from the edge-to-edge virtual link is seen with the congestion bit set. At the end of such intervals, tl e egress 206 sends a control packet with the latest value of the exponentially averaged output rate.
- the default response of the ingress edge 202 upon receipt of control packets is to reduce the virtual link's rate limit (r ; ) to the smootiied output rate scaled down by a multiplicative factor (v ; ⁇ 3 : 0 ( ⁇ ( 1).
- the congestion epoch ends in the first interval when no packets from the link are marked with the congestion bit.
- the egress 206 merely stops sending control packets and the ingress 202 assumes the end of a congestion epoch when two intervals pass without seeing a control packet.
- the ingress node 202 uses a leaky bucket rate shaper whose rate limit (r) can be varied dynamically based upon feedback.
- the amount of traffic "I" entering the network over any time interval [t, t + T ] after shaping is:
- the queue threshold parameter can be set as N ⁇ where N is the number of virtual links, not end-to-end flows, passing through the bottleneck.
- a rough estimate of N which suffices for this method, can be based upon the number of routing entries, and/or the knowledge of the number of edges whose traffic passes through the node. The objective is to allow at most ⁇ burstiness per active virtual link before signaling congestion.
- edge-to-edge virtual links occurs in a manner similar to TCP slow start, and is defined by the present invention as "rate -based slow start. " As long as there are sufficient packets to send the rate limit doubles each interval,when a rate-decrease occurs (in a congestion epoch), a rate threshold 'rthreshi", is set to the new value of tl e rate limit after the decrease. The function of this variable is similar to tl e "SSTHRESH" variable used in TCP. While the rate-limit r, tracks the actual departure rate ⁇ instruct rthresh, serves as an upper bound for slow start.
- the rate limit r is allowed to increase multiplicatively until it reaches rthresh , or receives a congestion notification.
- tl e is allowed to increase linearly by ⁇ /T once per measurement interval T. The dynamics of these variables are illustrated in FIG. 4.
- the rate-decrease during a congestion epoch is based upon the measured and smoothed egress rate v,.
- the response to congestion is to limit the departure rates ( ⁇ ,) to values smaller than v, consistently during the congestion epoch.
- the measurement interval T used by all edge systems (both ingress 202 and egress 206) is set to the class-wide maximum edge-to-edge round-trip propagation and transmission delays, max_ertt, plus the time to mark all virtual links passing through the bottleneck when congestion occurs.
- the time to mark all virtual links can be roughly estimated as N max ⁇ / ⁇ mm where ⁇ min is the smallest capacity within the class and N max is a reasonable bound on the number of virtual links passing through the bottleneck.
- the method optionally delays backoff through a method known as “delayed feedback response. " Specifically, the feedback received by the ingress node 202 is enforced after a delay of max_ertt
- the ingress 202 backs off by ⁇ 2 when packet loss occurs.
- tl e present invention also provides for edge-to-edge congestion control without interior router 204 involvement, herein referred to as Implicit Edge Control (IEC).
- IEC Implicit Edge Control
- IEC infers the congestion state by estimating the contribution of each virtual link to the queue length q ; by integrating the difference in ingress and egress rates.
- tlie estimate exceeds a threshold, IEC declares congestion.
- IEC ends the congestion epoch when the delay on a control packet drops to within e of the minimum measured delay.
- IEC and Explicit Edge Control (EEC) are identical, as described above.
- each virtual link signals congestion when its (contribution ( "accumulation" ), q i5 to the queue length exceeds ⁇ .
- the total accumulation is N ⁇ which is the congestion epoch detection criterion used in the EEC method.
- the accumulation q i can be calculated using the following observation: Assume a sufficiendy large interval ⁇ . If tl e average input rate during this period ⁇ is ⁇ ; and the average output rate is V; , the accumulation caused by this flow during the period ⁇ is ( ⁇ t - - v : ) x ⁇ . The accumulation measured during this period can be added to a running estimate of accumulation q ; which can then be compared against a maximum accumulation reference parameter. More accurately stated:
- the average interval for v is delayed by the propagation delay so that any fluid entering the virtual link by the time t can leave by time u unless it is backlogged.
- the computation of q excludes packets in the bandwidth-delay product, if the bandwidth-delay product is constant.
- the ingress node 202 sends two control packets in each interval T (but no faster than the real data rate), ⁇ is tl e inter-departure time of control packets at the sender.
- ⁇ is tl e inter-departure time of control packets at the sender.
- the ingress inserts a timestamp and the measured average input rate ( ⁇ ; ).
- the average output rate v is measured over the time interval between arrivals of consecutive control packets at the egress 206.
- the egress node 206 now has all the three quantities required to do the computation: ( ⁇ ; - v ; ) x ⁇ and add it to a running estimate of accumulation.
- the running estimate of accumulation is also reset at the end of a congestion epoch to avoid propagation of measurement errors.
- One way of implementing the control packet flow required for this mechanism without adding extra traffic is for the ingress 202 to piggy-back rate and timestamp information in a shim header on two data packets in each interval T. Interior IP routers ignore the shim headers, while the egress 206 strips them out.
- the detection of the end of a congestion epoch, or in general an un- congested network is based upon samples of one-way delay.
- the egress 202 updates the minimum one-way delay seen so far. Every time a one-way delay sample is within e of the minimum one-way delay, the egress 206 declares that the network is un- congested and stops sending negative feedback.
- the minimum oneway delay captures the fixed components of delay such as transmission, propagation and processing (not queuing delays). The delay over and above this minimum one-way delay is a rough measure of queuing delays.
- edge-to-edge control can be used to distribute backlog across the edges, as illustrated in FIGS. 5(a) and 5(b). This increases the effective number of buffers allowing more TCP connections to obtain large enough windows to survive loss without timing out. This reduces TCP's bias against tiny flows and thus improves fairness.
- goodput is defined herein as the number of transmitted payload bits excluding retransmissions per unit time) when many TCP connections compete for the bottleneck. As expected, this improvement increases as congestion is distributed across more edges.
- Edge-to-edge control can also be combined with Packeteer TCP rate control (TCPR) to provide a low-loss-end-to-end service for TCP connections.
- TCPR Packeteer TCP rate control
- Low-loss it is meant that the method typically does not incur loss in the steady-state.
- IEC Packeteer TCP rate control
- the combined IED + TCPR method does not require upgrading either end-systems or the interior network.
- both the Explicit Edge Control ( EEC ) and the Implicit dge Control (IEC) methods can be deployed one class at a time improving performance as the number of edge controlled class increases. For example, deployment can be piggybacked with the roll-out of services or MPLS, SII C fl these techniques can work with either architecture. Both methods are transparent to end-systems, but require software components to be installed at the edges of the network. Such edge components can be installed as upgrades to routers or stand-alone units.
- the above described apparat us and method proude loi an improved data network by elevating congestion at network bottlenecks.
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US10/204,222 US20040037223A1 (en) | 2001-02-28 | 2001-02-28 | Edge-to-edge traffic control for the internet |
| AU2001245355A AU2001245355A1 (en) | 2000-02-29 | 2001-02-28 | Edge-to-edge traffic control for the internet |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US18579500P | 2000-02-29 | 2000-02-29 | |
| US60/185,795 | 2000-02-29 |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| WO2001065394A1 true WO2001065394A1 (fr) | 2001-09-07 |
| WO2001065394A9 WO2001065394A9 (fr) | 2002-12-27 |
Family
ID=22682474
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2001/006263 Ceased WO2001065394A1 (fr) | 2000-02-29 | 2001-02-28 | Commande de trafic de bord-a-bord pour internet |
Country Status (2)
| Country | Link |
|---|---|
| AU (1) | AU2001245355A1 (fr) |
| WO (1) | WO2001065394A1 (fr) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN100450081C (zh) * | 2005-06-10 | 2009-01-07 | 华为技术有限公司 | 进行流量控制的方法和系统 |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US5818815A (en) * | 1994-09-19 | 1998-10-06 | International Business Machines Corporation | Method and an apparatus for shaping the output traffic in a fixed length cell switching network node |
| US5878228A (en) * | 1996-11-15 | 1999-03-02 | Northern Telecom Limited | Data transfer server with time slots scheduling base on transfer rate and predetermined data |
| US6151300A (en) * | 1996-05-10 | 2000-11-21 | Fujitsu Network Communications, Inc. | Method and apparatus for enabling flow control over multiple networks having disparate flow control capability |
| US6185187B1 (en) * | 1997-12-10 | 2001-02-06 | International Business Machines Corporation | Method and apparatus for relative rate marking switches |
-
2001
- 2001-02-28 AU AU2001245355A patent/AU2001245355A1/en not_active Abandoned
- 2001-02-28 WO PCT/US2001/006263 patent/WO2001065394A1/fr not_active Ceased
Patent Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US5818815A (en) * | 1994-09-19 | 1998-10-06 | International Business Machines Corporation | Method and an apparatus for shaping the output traffic in a fixed length cell switching network node |
| US6151300A (en) * | 1996-05-10 | 2000-11-21 | Fujitsu Network Communications, Inc. | Method and apparatus for enabling flow control over multiple networks having disparate flow control capability |
| US5878228A (en) * | 1996-11-15 | 1999-03-02 | Northern Telecom Limited | Data transfer server with time slots scheduling base on transfer rate and predetermined data |
| US6185187B1 (en) * | 1997-12-10 | 2001-02-06 | International Business Machines Corporation | Method and apparatus for relative rate marking switches |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN100450081C (zh) * | 2005-06-10 | 2009-01-07 | 华为技术有限公司 | 进行流量控制的方法和系统 |
Also Published As
| Publication number | Publication date |
|---|---|
| AU2001245355A1 (en) | 2001-09-12 |
| WO2001065394A9 (fr) | 2002-12-27 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US20040037223A1 (en) | Edge-to-edge traffic control for the internet | |
| Al-Saadi et al. | A survey of delay-based and hybrid TCP congestion control algorithms | |
| Parsa et al. | Improving TCP congestion control over internets with heterogeneous transmission media | |
| Kuzmanovic et al. | TCP-LP: A distributed algorithm for low priority data transfer | |
| US8125910B2 (en) | Communication system | |
| Wang et al. | Adaptive bandwidth share estimation in TCP Westwood | |
| EP1417808B1 (fr) | Procédé et système pour le contrôle de congestion dans un réseau de communications | |
| US6839321B1 (en) | Domain based congestion management | |
| Kalampoukas et al. | Explicit window adaptation: A method to enhance TCP performance | |
| Grieco et al. | TCP westwood and easy RED to improve fairness in high-speed networks | |
| JP2008530841A (ja) | ネットワーク監視 | |
| Wang et al. | A simple refinement of slow-start of TCP congestion control | |
| Jain et al. | The TCP bandwidth-delay product revisited: network buffering, cross traffic, and socket buffer auto-sizing | |
| Man et al. | ImTCP: TCP with an inline measurement mechanism for available bandwidth | |
| Aweya et al. | Enhancing TCP performance with a load‐adaptive RED mechanism | |
| Capone et al. | Bandwidth estimates in the TCP congestion control scheme | |
| Biswal et al. | Comparative analysis of compound tcp with various end-to-end congestion control mechanisms | |
| Zhang et al. | Optimizing TCP start-up performance | |
| Aweya et al. | Enhancing network performance with TCP rate control | |
| WO2001065394A1 (fr) | Commande de trafic de bord-a-bord pour internet | |
| Chan et al. | Performance improvement of congestion avoidance mechanism for TCP Vegas | |
| Shihada et al. | Threshold-based TCP Vegas over optical burst switched networks | |
| Kim et al. | The FB-RED algorithm for TCP over ATM | |
| Lu et al. | EQF: An explicit queue-length feedback for TCP congestion control in datacenter networks | |
| Ho et al. | An enhanced slow-start mechanism for TCP Vegas |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| AK | Designated states |
Kind code of ref document: A1 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW |
|
| AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG |
|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
| DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | ||
| AK | Designated states |
Kind code of ref document: C2 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW |
|
| AL | Designated countries for regional patents |
Kind code of ref document: C2 Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG |
|
| COP | Corrected version of pamphlet |
Free format text: PAGES 1/5-5/5, DRAWINGS, REPLACED BY NEW PAGES 1/5-5/5; DUE TO LATE TRANSMITTAL BY THE RECEIVING OFFICE |
|
| REG | Reference to national code |
Ref country code: DE Ref legal event code: 8642 |
|
| WWE | Wipo information: entry into national phase |
Ref document number: 10204222 Country of ref document: US |
|
| 122 | Ep: pct application non-entry in european phase | ||
| NENP | Non-entry into the national phase |
Ref country code: JP |