CN1694428A

CN1694428A - A Parallel Packet Order Preserving Switching Scheduling Strategy to Guarantee the Due Diligence of Links

Info

Publication number: CN1694428A
Application number: CNA2004100142634A
Authority: CN
Inventors: 田畅; 许波; 彭来献
Original assignee: PLA University of Science and Technology
Current assignee: PLA University of Science and Technology
Priority date: 2004-03-10
Filing date: 2004-03-10
Publication date: 2005-11-09

Abstract

This invention discloses a parallel sequence exchange dispatch strategy for guaranteeing the work of links including an input shunt dispatch algorithm, the storage management of an internal exchange unit and its output control. The method includes: before IP arrives at the shunt, it is divided into cells with definite lengths, the shunt selects the exchange units with the shortest internal queue and sends the cell to the selected exchange unit under the condition of securing number of cells output at the same time does not exceed the limited exchange unit number in the exchange unit with the idle internal input link with definite systematic accelerate factors. In order to limit the number of the output exchange units, the queue date block of the ports in the exchange units are labeled with 'cells' and 'idle', at the time slots, only the exchange units with 'cells' output cells.

Description

A kind of parallel grouping order-preserving exchange scheduling strategy that guarantees the responsible work of link

Technical field the present invention relates to a kind of Parallel packet switch scheduling strategy of a plurality of crosspoints, it not only guarantees the responsible work (work-conserving) of output link, can guarantee simultaneously the exchange according to the order of sequence of dividing into groups mainly to be applicable to high speed, multiport, jumbo ip router/switch.

Background technology is along with transmission technology with based on the development of the multi-service integrated technology of IP, packet transaction ability to the main line high speed router has proposed more and more higher requirement, and the restriction of packet transaction ability obviously becomes the principal element of restriction high speed router performance.High speed router is owing to the requirement of its high performance and system's expansion capacity, and traditional centralized message is handled the requirement that can't satisfy the message high speed processing with the architecture of transmitting.Existing in the world line-speed router mostly adopts the architecture of message distributed treatment at present, the processing of independently finishing message by each interface circuit with table look-up, be sent to corresponding ports by switching network, and system processor only is responsible for system maintenance, the processing of Routing Protocol and the maintenance of routing table.This distributed treatment architecture has alleviated the processing load of system processor, and each interface circuit also only needs to handle the grouping that arrives the port, thereby system can reach the high processing ability.But, when interface rate reaches 10Gbps (STM-192) when above, by present technology and technological level, interface circuit is wanted to finish the packet transaction that adapts with interface rate reliably in real time and is still had certain difficulty [1] [2], and the high speed switching network will realize that the above exchange transmission rate of 10Gbps is to realizing that technology also is a kind of challenge [3].Will further improve the capacity and the disposal ability of router, the disposal ability that employ new technology on the one hand, new technology improves processing module on the other hand by the improved system structure, adopts the disposal ability of the parallel processing raising system of a plurality of processing modules.

Distributed frame has improved the throughput of system and the extensibility of high speed router, and the disposal ability of system depends primarily on the transmission rate of port number, port disposal ability and switching network.Yet when link rate reaches 10Gbps when above, switching network and interface circuit are difficult to satisfy the requirement of the transmission rate and the packet transaction ability of high-speed link.A kind of solution intuitively is exactly that interface circuit adopts the architecture of a plurality of low-speed processing module parallel processing to realize the high speed processing and the forwarding of message.Simultaneously, switching network also adopts the structure of the parallel exchange of a plurality of low speed switching networks to improve the packet switched capabilities of system.

The IP message that the router line interface receives is sent to each processing module respectively according to specific dispatching algorithm, and each processing module is finished the processing and the forwarding of message independently.Like this, each processing module only need be handled the grouping that a part receives from line interface, the requirement of the disposal ability of module is not needed very high, is convenient to realize reliably the processing capacity of various complexity; And can realize the high speed processing of interface IP message by the combination of module, and system has expandability preferably.Shown in Figure 1 is the system architecture schematic diagram of a N port k module combinations.

This multimode parallel processing structure makes the message processing load of each module reduce greatly, can develop big capacity, high performance line-speed router product according to existing technology and technological level.Because each processing module and switching network independently are finished the processing and the forwarding of message, have reduced the packet processing load of individual module and the transmission rate of switching network, help system data and transmit reliably and handle the reliability of raising system.

In order to improve the treatment effeciency of parallel processing system (PPS), designed scheduling controlling strategy should the assurance system be responsible work (work-conserving) [4] [5].So-called responsible work exactly as long as in the system abundant grouping is arranged, just should guarantee that output link exports at full capacity.And dispatching algorithm design is improper, " work holdup " of system will occur.On the one hand, packet awaits is arranged in the system, and on the other hand, the free time of exchange plane but occurs.The dispatching algorithm of the responsible work of therefore, assurance system is to improve the key of parallel processing system (PPS) efficient.But the introducing of parallel processing system (PPS) has brought out-of-sequence [6] of grouping again inevitably, and out-of-sequence can bringing the performance of network has a strong impact on.According to document [6], the responsible work of parallel switching systems is a pair of implacable contradiction with grouping order-preserving exchange.Therefore, be exactly the problem that the high-speed parallel packet-switch technology must solve how in the efficient that guarantees to improve as far as possible under the not out-of-sequence condition of grouping parallel switching systems.

In order to exchange the needs in the control, present high speed router all adopts the fixed length cell exchanging mechanism.The IP message that arrives is split into the grouping of a plurality of regular lengths, and what claim this regular length is grouped into a cell (the not necessarily cell of ATM).Storage management in high speed router, queue scheduling and exchange all are to be to handle unit with the cell.Therefore, the present invention also is the cell scheduling problem of handling in the parallel switching systems.

According to the characteristics of the parallel exchange of high speed router, we think the responsible work that there is no need really to accomplish each crosspoint of parallel switching systems just can reach the identical performance of output work queue's switching fabric fully as long as guarantee the responsible work of output link.Simultaneously, can also reduce requirement to the output buffer capacity.Different with packet switching, the fixed length cell exchange can realize order-preserving and responsible work under certain system's accelerated factor.

Summary of the invention the invention provides the parallel grouping order-preserving exchange scheduling strategy of the responsible work of a kind of new assurance output link---the shortest limited dispatching algorithm RPED (Restricted Practical Earliest Departure) that may leave away the time.

The speed that we establish the inner exchanging unit is that the speed of r, peripheral link is R, and the quantity of inner exchanging unit is k, and the definition accelerated factor is s=kr/R.The time that we can define a cell of peripheral link transmission is a time slot.For the cell that the peripheral link that from speed is R receives, its internal transmission need take Individual time slot.Therefore, any time, the link from splitter to the inner exchanging unit takies at most

Individual, have at least

Individual link idle.And any time, for some output ports, as long as have

Individual crosspoint output cell just can guarantee the output link oepration at full load.If be less than

Individual crosspoint output cell, then Kong Xian inside discharging chain way more than or equal to Therefore, need only Just can guarantee that the cell one that arrives selects a crosspoint surely, the inside input link of its due in and the output link when needing output all be idle, guarantee that cell exports at specific time slot.When That is accelerated factor, Can guarantee the responsible work of output link.

This dispatching algorithm is according to above principle, and the number of links of restriction inner output of any time mostly is most

(cell credit that only arrives certain output port in all crosspoints is less than

The time, the number of links of inner output less than Guarantee the order-preserving exchange of cell.Because the number of links that each grouping that arrives all is selected at inner output is no more than Condition under the shortest crosspoint (accelerated factor of formation Guarantee necessarily can choose this crosspoint), therefore, both can guarantee the responsible work of output link, can guarantee the service earlier first of cell again.

The present invention is further described below in conjunction with drawings and Examples for description of drawings.

Fig. 1 parallel switching architecture model that divides into groups;

The same output port buffer management of Fig. 2 schematic diagram;

Fig. 3 is the implementation (three inner exchanging unit) of the inventive method one embodiment.

Embodiment is convenient for memory management, and we adopt the formation of fixed length number pick piece managed storage, and cell of each data block store is provided with a cell storage queue for each output port in the inner exchanging unit.The qualification maximum queue length of supposing each output port in each crosspoint is no more than n cell, and therefore (numbering 0～n-1) is stored data to an available n data block.

As shown in Figure 2, each output port j is provided with reading and writing pointer p respectively _JrAnd p _JwWherein, read pointer p _JrBe used for mailing to output multiplexer from the buffering area sense data of crosspoint, write pointer p _JwBe used to refer to the internal storage data piece that writes.Any one output port j, all crosspoints are all used same read pointer and write pointer.Obviously, k crosspoint, mountain p _JrAnd p _JwTotal k (one of each unit) of data block pointed.

Can establish p when initial _Jr=0, p _Jw=1.If in k the crosspoint by the read pointer indication cell is arranged, then these cells are exported simultaneously and (are had at most

Individual output simultaneously).Export (the needs that finish

Individual time slot), revise read pointer

p_{jr} &DoubleLeftArrow; p_{jr} + 1 \mod n .

If p _Jr=p _Jw, then revise write pointer

p_{jw} &DoubleLeftArrow; p_{jw} + 1 \mod n .

Arrive input port i for each, destination interface is the cell of j, write pointer p _JwHave at least in the k pointed data block Individual is idle.In the crosspoint at these freed data blocks places, look for the inner input link free time that links to each other with input port i crosspoint (when

The time certainly exist), data are sent to this idle crosspoint.If p _JwThe data block number that has taken in the k pointed data block is

The time, revise write pointer

p_{jw} &DoubleLeftArrow; p_{jw} + 1 \mod n .

If p _Jr=p _Jw, the expression buffering area is full, then abandons grouping according to corresponding congestion control policy.

Because defining any time slot can only have at most

Individual inner exchanging unit output cell, therefore, needing respectively to the internal storage location of each storage cell, mark takies and idle two states.We take with " ● " mark internal storage location.During output, only the crosspoint that is " ● " when current internal storage location output token is just exported cell by inner output link.

Fig. 3 is the specific output port buffer management embodiment of of three inner exchanging unit.For three crosspoints (k=3), its accelerated factor is

Can get the speed r=0.5R of inner exchanging unit thus.No matter be demodulation multiplexer from input to the inner exchanging unit, still crosspoint is to the inner link of output multiplexer internally, time of a cell of its transmission is two time slots.As long as two inner exchanging unit output cells are arranged at any time, can guarantee the output link oepration at full load.Therefore, write pointer p _JwHave two in the buffer location pointed at least for empty, that is to say that the input demodulation multiplexer has at least two inner exchanging unit to select.And simultaneously, three internal transmission links of input to three an inner exchanging unit have two at least for empty arbitrarily.Therefore, must select a crosspoint, its write pointer p _JwBuffer location pointed is empty, and input internal transmission link also is empty simultaneously.Like this, as long as enough cells are arranged, the cell number that any time exports is 2, guarantees that output link is responsible work.Simultaneously, the transmission of cell is service earlier definitely first, guarantees the not out-of-sequence of cell switching.

Those skilled in the art are on the present invention program basis; to choose different parameters (exchange plane is counted k, accelerated factor s etc.) or to be used for other parallel processing system (PPS) (tabling look-up etc.) and other scheme of making, also within the scope of protection of the invention as cell parallel buffer, grouping are parallel.

[list of references]

[1]Kumar?V.，Lakshman?T.，Stiliadis?D.，“Beyond?Best?Effort：Router?Architectures?for?the?Differentiated?Servicesof?Tomorrow’s?Internet”，IEEE?Communications?Magazine，May1998，pp.152-163。

[2]The?Evolution?of?Networks-The?Need?fbr?Routing?Switches，

http：//packetengines.com/education/techpapers/architecture/PR5200/Architecture.pdf

[3]McKeown?Nick，etc.，The?Tiny?Tera：A?Packet?Switch?Core，IEEE?Micro?Jan/Feb?1997，pp?26-33.

[4]Iyer?S.，Awadallah?A.，and?McKeown?N.，Analysis?of?a?packet?switch?with?memories?runnxng?slower?than?theline-rate，IEEE?Infocom?March?2000，Tel-Aviv，Israel.

[5]Krishna?P.，Patel?N.S.，etc.，On?the?Speedup?Required?for?Work-Consering?Crossbar?Switches，IEEE?Journal?ofSelected?Areas?in?Communications，Vol.17，No.6，June?1999.

[6]Bennett?J.C.R.，Partridge?C.，Packet?Reordering?is?not?Pathological?Network?Behavior，IEEE/ACM?Tran.OnNetworking，Vol.7，No.6，Dec.1999，pp789-798.

Claims

1. A parallel packet order-preserving switching scheduling strategy to ensure link due diligence, mainly including input splitter scheduling and multiple switching unit output scheduling strategies, characterized in that:

(1) Parallel switching based on fixed-length cells. IP packets of different lengths are divided into fixed-length "cells" before arriving at the input end, and then sent to the link after being reassembled at the output end; for the cells arriving at the input end, the splitter scheduling algorithm selects a switching unit for it , the switching unit sets an output queue for each output port, and puts the arriving cell into the corresponding output queue of the switching unit according to the output port number of the arriving cell.

(2) The scheduling algorithm of the splitter is called "Restricted Shortest Possible Time Departure Scheduling Algorithm" RPED (Restricted Practical Earliest Departure). The basic idea is to satisfy the number of internal exchange units k≥2

-1 The number of output cells per time slot does not exceed Under the condition of , the splitter selects the switching unit with the shortest output port queue among the switching units for each arriving cell. Here R is the external link rate, and r is the link rate of the internal switching unit.

(3), in order to ensure that the number of output cells per time slot does not exceed , and each storage data block shall be marked as "cell" or "free". When outputting, only those switching units whose current data block "has cells" output data.

(4) The output of all internal exchange units and the adjustment of read and write pointers are performed synchronously.

2. The parallel packet switching scheduling strategy according to claim 1, characterized in that: when the number of internal switching units k≥2 Under the condition of -1, the scheduling algorithm ensures that the number of output cells in each time slot does not exceed , so as to achieve order-preserving exchange.

3. The parallel packet switching scheduling strategy as claimed in claim 1, characterized in that: when ensuring that the number of output cells in each time slot does not exceed Under the condition of , input the cell into the queue as the shortest switching unit, so as to ensure the due diligence of the output link.

4. The parallel packet switching scheduling strategy as claimed in claim 1, characterized in that: in order to ensure that the number of output cells per time slot does not exceed , and each storage data block shall be marked as "cell" or "free". When outputting, only those switching units whose current data block "has cells" output data.