CN118784593A - Multi-level scheduler - Google Patents

Info

Publication number
CN118784593A
Authority
CN
China
Prior art keywords
port
packet
queue
selection
sets
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410410327.XA
Other languages
Chinese (zh)
Inventor
W·B·马修斯
A·阿拉帕蒂
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Marvell Asia Pte Ltd
Original Assignee
Marvell Asia Pte Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US18/227,117 (US20240340250A1)
Application filed by Marvell Asia Pte Ltd
Publication of CN118784593A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/21 Flow control; Congestion control using leaky-bucket
    • H04L47/215 Flow control; Congestion control using token-bucket
    • H04L47/22 Traffic shaping
    • H04L47/50 Queue scheduling
    • H04L47/52 Queue scheduling by attributing bandwidth to queues

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

Embodiments of the present disclosure relate to a multi-level scheduler. Packet metadata for incoming packets is buffered in a queue selection buffer associated with a port of a network node. Packet data for outgoing packets is buffered in a port selection buffer associated with the port. In a selection clock cycle, while a port scheduler of the network node selects, from the port selection buffer, a subset of the packet data for a subset of the outgoing packets, a queue scheduler of the port concurrently selects, from the queue selection buffer, a subset of the packet metadata for a subset of the incoming packets and adds new packet data for new outgoing packets to the port selection buffer of the port. The new packet data is derived based at least in part on the subset of the packet metadata for the subset of the incoming packets.

Description

Multi-level scheduler

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority to U.S. Provisional Patent Application No. 63/457,122, filed on April 4, 2023, which is incorporated herein by reference.

TECHNICAL FIELD

Embodiments relate generally to packet delivery, and more particularly to multi-level schedulers.

BACKGROUND

The approaches described in this section are approaches that could be pursued, but are not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.

Minimizing the latency of next-hop and end-to-end packet delivery is usually an important goal for data communication networks and switching devices used in various computer applications. As more and more high-capacity switching devices are deployed in the field, the time budget available to the processing components along a packet delivery path for handling an ever-growing number of packets becomes shorter and shorter. As a result, some or all of these processing components, including but not limited to network interfaces or ports, may spend a significant amount of time idle, waiting for packets to arrive. Although these network interfaces or ports may nominally or theoretically have high packet delivery capacity or bandwidth, much of that capacity or bandwidth may still be wasted. For example, a network interface or port may sit idle for a long time when upstream processing components cannot process and deliver a sufficient number of packets to it in a timely manner for packet delivery.

SUMMARY

According to one or more embodiments of the present disclosure, a method is disclosed, comprising: buffering a plurality of packet metadata sets for a plurality of incoming packets in a plurality of queue selection buffers associated with a port of a network node; buffering one or more packet metadata sets for one or more outgoing packets in a port selection buffer associated with the port; and, in a selection clock cycle, while a port scheduler of the network node selects, from the port selection buffer, a subset of the one or more packet metadata sets for a subset of the one or more outgoing packets, concurrently performing, by a queue scheduler of the port, for a plurality of packet queues set up for the port: selecting, from among the plurality of packet metadata sets stored in the plurality of queue selection buffers, one or more second packet metadata sets for one or more incoming packets of the plurality of incoming packets; and adding the one or more second packet metadata sets, for one or more second outgoing packets, to the port selection buffer of the port.

According to one or more embodiments of the present disclosure, a system is disclosed, comprising: one or more computing devices; and one or more non-transitory computer-readable media storing instructions which, when executed by the one or more computing devices, cause performance of: buffering a plurality of packet metadata sets for a plurality of incoming packets in a plurality of queue selection buffers associated with a port of a network node; buffering one or more packet metadata sets for one or more outgoing packets in a port selection buffer associated with the port; and, in a selection clock cycle, while a port scheduler of the network node selects, from the port selection buffer, a subset of the one or more packet metadata sets for a subset of the one or more outgoing packets, concurrently performing, by a queue scheduler of the port, for a plurality of packet queues set up for the port: selecting, from among the plurality of packet metadata sets stored in the plurality of queue selection buffers, one or more second packet metadata sets for one or more incoming packets of the plurality of incoming packets; and adding the one or more second packet metadata sets, for one or more second outgoing packets, to the port selection buffer of the port.

According to one or more embodiments of the present disclosure, one or more non-transitory computer-readable media are disclosed, storing instructions which, when executed by one or more computing devices, cause performance of: buffering a plurality of packet metadata sets for a plurality of incoming packets in a plurality of queue selection buffers associated with a port of a network node; buffering one or more packet metadata sets for one or more outgoing packets in a port selection buffer associated with the port; and, in a selection clock cycle, while a port scheduler of the network node selects, from the port selection buffer, a subset of the one or more packet metadata sets for a subset of the one or more outgoing packets, concurrently performing, by a queue scheduler of the port, for a plurality of packet queues set up for the port: selecting, from among the plurality of packet metadata sets stored in the plurality of queue selection buffers, one or more second packet metadata sets for one or more incoming packets of the plurality of incoming packets; and adding the one or more second packet metadata sets, for one or more second outgoing packets, to the port selection buffer of the port.

BRIEF DESCRIPTION OF THE DRAWINGS

The subject matter of the invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings, in which like reference numerals refer to similar elements, and in which:

FIG. 1 illustrates an example networking system;

FIG. 2 illustrates an example network device;

FIG. 3A illustrates an example multi-level scheduler; FIG. 3B illustrates an example data flow; FIG. 3C and FIG. 3D illustrate example operations of queue and port schedulers;

FIG. 4 illustrates an example port bundle scheduling operation;

FIG. 5 illustrates an example process flow;

FIG. 6 is a block diagram of an example computer system.

DETAILED DESCRIPTION

In the following description, for the purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present inventive subject matter. It will be apparent, however, that the present inventive subject matter may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form in order to avoid unnecessarily obscuring the present inventive subject matter.

Embodiments are described herein according to the following outline:

1.0. General Overview

2.0. Structural Overview

2.1. Data Units

2.2. Network Paths

2.3. Network Device

2.4. Ports

2.5. Packet Processors

2.6. Buffers

2.7. Queues

2.8. Traffic Management

2.9. Forwarding Logic

2.10. Multi-Level Scheduler

2.11. Miscellaneous

3.0. Functional Overview

3.1. Port and Queue Scheduling

3.2. Port or Queue Eligibility

3.3. Work Conserving

3.4. Queue Selection Data Flow

3.5. Queue States

3.6. Queue Selection Policies

3.7. Selection Operations

3.8. Port Bundle Scheduling

4.0. Example Embodiments

5.0. Implementation Mechanism - Hardware Overview

6.0. Extensions and Alternatives

1.0. General Overview

As packet transmission or transfer rates between different network nodes or switches, and between different packet processing components within a network node or switch, become higher and higher, the time available between the arrival of a packet and its transmission or transfer becomes shorter and shorter.

The techniques described herein may be implemented or applied to keep ports actively transmitting or transferring packets without starvation. As part of a multi-level scheduler, a port selection buffer may be set up for a (e.g., each, etc.) port, while one or more queue selection buffers may be set up for one or more queues of that port.

A port scheduler of the multi-level scheduler may perform port scheduling operations that arbitrate access to (e.g., egress, etc.) bandwidth or port rate across (e.g., eligible) ports, whereas a (e.g., each, etc.) per-port queue scheduler for a port in the multi-level scheduler may perform queue scheduling operations that arbitrate access to (e.g., egress, etc.) bandwidth or port rate across (e.g., eligible) queues of the port. The same or different eligibility factors, or one or more combinations thereof, may be used to determine whether a given port or queue is eligible for selection or dequeuing by the port or queue scheduler from a port or queue selection buffer.

The port scheduling operations and the queue scheduling operations, which respectively resolve the port states of the ports and the queue states of the queues associated with the ports, may be decoupled or de-serialized from each other in separate timing loops. These port scheduling operations and queue scheduling operations may also be decoupled or de-serialized from the separate operations that transmit or transfer the packets selected from the queues associated with the ports.

Because the operations and timing loops of the port scheduler are decoupled from those of the queue scheduler, the port scheduler or port transfer logic can pick or select packets for an eligible port from the port selection buffer, rather than picking or selecting packets for transmission or transfer directly from the eligible queues associated with the port. The port selection buffer can store or buffer packet data sets for packets so that these packets can be transmitted or transferred with little or no time delay. In other words, the packet data sets buffered in and/or selected from the port selection buffer can be specifically defined, generated, sized, or composed to facilitate transmission or transfer operations at relatively high speed and with minimal latency.

While a port has a port scheduler that performs port scheduling operations, the queues of the port have a separate queue scheduler that performs queue scheduling operations in a separate timing loop. Each of the one or more queues of the port has, or is assigned, a separate (per-queue) queue selection buffer, in which the queue state of the queue can be kept, maintained, or updated.

A queue selection buffer for a queue may store, in queue selection buffer entries, a plurality of packet metadata sets (e.g., of relatively small data size, 50 bits, etc.) for the packets represented in the queue selection buffer, or for cells/transfers divided from those packets. The packet metadata may include, but is not limited to, packet/cell control information that the queue scheduler can use to facilitate or control packet/cell selection operations, packet/cell copy operations, packet/cell enqueue operations into the corresponding port selection buffer, and so on. The port selection buffer and the queue selection buffers may contain only packet metadata. For example, packet metadata may be stored in a queue selection buffer (or per-queue buffer) and then transferred to the port selection buffer upon selection. Packet metadata from one or more queue buffer entries or elements may be transferred when a queue, or the corresponding per-queue buffer, is selected. Likewise, packet metadata from one or more port buffer entries or elements may be selected when a port, or its port selection buffer, is selected. In some operational scenarios, the packet metadata set held for a packet in the associated port or queue selection buffer may have a relatively small data size compared with the packet data or packet payload of the packet.
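To make the metadata-only buffering concrete, the following is a minimal Python sketch of a compact metadata record and of the transfer from a queue selection buffer to a port selection buffer upon selection. The field names (`packet_id`, `length_bytes`, `cell_count`, `eop`), the function name, and the capacity check are illustrative assumptions and are not taken from the disclosure.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class PacketMetadata:
    """Hypothetical compact control record; the payload is stored elsewhere."""
    packet_id: int      # assumed handle used later to locate the payload
    length_bytes: int   # assumed size field for bandwidth accounting
    cell_count: int     # assumed number of cells the packet was divided into
    eop: bool           # assumed end-of-packet marker for this entry


def transfer_on_selection(queue_buf: deque, port_buf: deque, port_capacity: int) -> bool:
    """Move the head metadata entry of a selected queue into the port buffer.

    Returns False when the queue buffer is empty or the port buffer is full.
    """
    if not queue_buf or len(port_buf) >= port_capacity:
        return False
    port_buf.append(queue_buf.popleft())
    return True
```

Only the small record moves between the buffers in this sketch; the payload itself would stay in packet memory until transmission.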

Using the queue states and the relatively small packet metadata sets, the queue scheduler can readily, with little or no time delay and few or no additional operations, select among the eligible queues of the port; generate packet data set(s) for the selected packet(s); and place the packet data set(s), including but not limited to (e.g., additional, etc.) port transmission control information, into the port selection buffer of the port.

Meanwhile, in a separate timing loop, the port scheduler can independently and concurrently pick or select an eligible port (one that is ready to send packets), and use one or more specific packet data sets, or specific port transmission control information, retrieved from the port selection buffer of the eligible port to access or determine (e.g., all, etc.) the packet data to be sent (e.g., transmitted or transferred, etc.).

Control information, such as a queue state stored in a queue selection buffer, may be used to select packets/cells, or the corresponding queue selection buffer entries, from the queue selection buffer. The queue state may include, but is not necessarily limited to, some or all of: an empty state, an empty-on-pick state, an end-of-packet (EOP) or EOP-on-pick state, etc. The queue state in the queue selection buffer may be used by the queue scheduler for the next queue selection cycle. The number of selections needed to dequeue a packet may be based on the cell or transfer count of the packet. Additionally, optionally, or alternatively, a packet may be dequeued multiple times based on a packet copy count of the packet. A queue selection as described herein may dequeue a single cell of a packet, multiple cells of one or more packets, a single packet copy, multiple packet copies, multiple packets/cells represented in multiple queues, etc.

In some operational scenarios, a port or queue selection buffer may be implemented as a FIFO. For example, the first N packets among all packets in a queue of a port may be buffered in the queue selection buffer set up for that queue. If the queue is currently empty, the first packet to arrive may be inserted into the queue selection buffer immediately. In some other operational scenarios, a port or queue selection buffer may be implemented as something other than a FIFO. For example, a priority-based scheme may be used to determine which packets, among all packets in a queue of a port, are to be buffered in the queue selection buffer set up for that queue.
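The FIFO variant described above can be sketched as follows. This is a minimal Python illustration that assumes the queue selection buffer mirrors the first N entries of the queue and that a separate backlog holds the rest; the class and attribute names are hypothetical.

```python
from collections import deque


class QueueWithSelectionBuffer:
    """Keeps the first N entries of a queue in a small FIFO selection buffer."""

    def __init__(self, n: int):
        self.n = n
        self.selection_buffer = deque()  # head-of-queue entries (at most N)
        self.backlog = deque()           # remaining entries, in arrival order

    def enqueue(self, meta) -> None:
        # A new arrival goes straight into the selection buffer only when no
        # older entry is waiting in the backlog, which preserves FIFO order.
        if not self.backlog and len(self.selection_buffer) < self.n:
            self.selection_buffer.append(meta)
        else:
            self.backlog.append(meta)

    def dequeue(self):
        if not self.selection_buffer:
            return None
        head = self.selection_buffer.popleft()
        # Refill the freed slot from the backlog, if anything is waiting.
        if self.backlog:
            self.selection_buffer.append(self.backlog.popleft())
        return head
```

A priority-based variant would replace the arrival-order refill with a choice driven by a priority key; that variant is not shown here.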

In some operational scenarios, rather than being delivered as a whole to the next packet processing component (e.g., within the same network node/switch, a traffic manager, a packet processor, etc.) or to the next hop, a packet is transmitted or transferred as cells or transfers divided from the packet. In these operational scenarios, a selection from a port/cell selection buffer as described herein may be made as a cell/transfer (level) selection rather than as a (whole) packet-level selection.

The queue scheduler may implement or perform its queue scheduling operations using one or more queue selection policies (or queue service rules), such as strict priority (SP), weighted deficit round-robin (WDRR), weighted fair queuing (WFQ), byte-based queue selection policies, packet-based queue selection policies, combinations of two or more different queue selection policies, queue selection policies other than or in addition to those detailed herein, etc. For example, a first queue among a plurality of queues of a port may be allocated ten (10) packets, while a second queue among the plurality of queues may be allocated five (5) packets. A maximum bandwidth may be specified for a queue, or for a traffic flow within it, to serve as a shaping threshold for traffic shaping or rate limiting operations, for example using a token bucket shaper, a leaky bucket shaper, etc. Additionally, optionally, or alternatively, bandwidth allocation and accounting may be set up or performed in a multi-level hierarchy of different queues or different queue groups, rather than at a single or uniform port-level or queue-level bandwidth tier.
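As one illustration of the policies named above, the following Python sketch performs a single round of byte-based deficit round-robin with per-queue quanta acting as weights. It is a textbook-style sketch rather than the scheduler of the disclosure; the function name, the list-based state, and the choice to reset the deficit of an empty queue are assumptions.

```python
from collections import deque


def wdrr_round(queues, quanta, deficits):
    """Run one weighted deficit round-robin round.

    queues:   list of deques holding packet lengths in bytes
    quanta:   per-queue byte quantum added each round (acts as the weight)
    deficits: per-queue deficit counters, updated in place
    Returns the (queue_index, packet_length) pairs selected, in order.
    """
    sent = []
    for i, q in enumerate(queues):
        if not q:
            deficits[i] = 0  # an idle queue does not accumulate credit
            continue
        deficits[i] += quanta[i]
        # Serve head packets while the accumulated deficit covers them.
        while q and q[0] <= deficits[i]:
            length = q.popleft()
            deficits[i] -= length
            sent.append((i, length))
        if not q:
            deficits[i] = 0
    return sent
```

With quanta of 200 and 100 bytes and equal packet sizes, the first queue is served twice as often as the second, which matches the intent of a 2:1 weighting.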

A bandwidth (allocated or consumed) determined or specified in a queue selection policy or queue service rule as described herein may be associated with, measured over, or applied to a given time period. In some operational scenarios, different bandwidths or rates may be set or configured for different time periods for the same packet processing component, port, queue, traffic flow, etc. A bandwidth allocation may be used to determine or set the rate at which tokens are added to, or fill, a corresponding token bucket, whereas bandwidth usage or consumption may be used to remove or drain tokens from the token bucket. A specific queue selection policy, chosen from among a plurality of different queue selection policies, may serve as a fallback for the other policies. The fallback selection policy may be applied in response to determining that the other queue selection policies have already been applied or performed, or after the tokens in the token bucket(s) have been drained and exhausted under those other queue selection policies based on specific allocated and/or consumed rates or bandwidths.
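The relationship between bandwidth allocation (token fill) and bandwidth consumption (token drain) can be sketched with a simple token bucket. This Python sketch assumes discrete refill ticks and byte-granular tokens; the class name, the parameters, and the starting fill level are illustrative assumptions.

```python
class TokenBucket:
    """Byte-granular token bucket refilled once per (assumed) scheduler tick."""

    def __init__(self, rate_bytes_per_tick: int, burst_bytes: int):
        self.rate = rate_bytes_per_tick   # allocation: tokens added per tick
        self.burst = burst_bytes          # bucket depth (maximum stored tokens)
        self.tokens = burst_bytes         # assumption: the bucket starts full

    def refill(self, ticks: int = 1) -> None:
        # Bandwidth allocation sets the fill rate; the depth caps the burst.
        self.tokens = min(self.burst, self.tokens + self.rate * ticks)

    def try_consume(self, nbytes: int) -> bool:
        # Bandwidth consumption removes tokens; a queue whose bucket cannot
        # cover the packet is treated as ineligible for now.
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False
```

A fallback policy, in the sense described above, could be consulted for queues whose `try_consume` call returns False; that step is left out of the sketch.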

The port scheduling operations and queue scheduling operations in a multi-level scheduler as described herein may be work conserving. For example, a queue scheduler as described herein may be configured to keep filling or pushing selections into the port selection buffer of a port until the port selection buffer becomes full, so that the port is idle as little as possible.

In some operational scenarios, a port scheduler as described herein may implement a time-division multiplexing (TDM) scheduling policy or service rule. A port may be subdivided or broken down into sub-ports and allocated time slots within a time interval. The time slots allocated to a port may be interleaved or intermixed with other time slots allocated to other ports in the same time interval, so that the sub-ports of the same port are evenly distributed across all the allocated time slots in the time interval and jitter at the receiving device or packet processing component is minimized. Idle ports that have no packets to send (or transmit/transfer) may be skipped in the time slot allocation.
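One way to picture the interleaving and idle-port skipping is the following Python sketch, which spreads each port's slots evenly across a calendar and then finds the next slot whose port has something to send. The even-spacing heuristic and both function names are assumptions made for illustration, not the allocation method of the disclosure.

```python
def build_tdm_calendar(slots_per_port: dict) -> list:
    """Interleave ports so that each port's slots are spread across the interval."""
    total = sum(slots_per_port.values())
    entries = []
    for port, count in slots_per_port.items():
        for j in range(count):
            # Ideal position of the j-th slot if the port's slots were
            # spaced evenly over the whole interval.
            entries.append(((j + 0.5) * total / count, port))
    entries.sort()
    return [port for _, port in entries]


def next_active_slot(calendar: list, start: int, has_packets):
    """Return the index of the next slot whose port is not idle, or None."""
    n = len(calendar)
    for offset in range(n):
        i = (start + offset) % n
        if has_packets(calendar[i]):
            return i
    return None
```

For two ports with two slots each, the calendar alternates between them instead of granting back-to-back slots to one port, which is the property that keeps jitter low.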

Methods, techniques, and mechanisms are disclosed for scheduling packets for transmission or transfer operations by packet processing components or network nodes/switches within a network. A plurality of packet metadata sets for a plurality of incoming packets is buffered in a plurality of queue selection buffers associated with a port of a network node. One or more packet data sets for one or more outgoing packets are buffered in a port selection buffer associated with the port. In a selection clock cycle, while a port scheduler of the network node selects, from the port selection buffer, a subset of the one or more packet data sets for a subset of the one or more outgoing packets, a queue scheduler of the port concurrently performs the following operations for a plurality of packet queues set up for the port. One or more packet metadata sets for one or more incoming packets of the plurality of incoming packets are selected from among the plurality of packet metadata sets stored in the plurality of queue selection buffers. One or more second packet data sets for one or more second outgoing packets are added to the port selection buffer of the port. The one or more second packet data sets are derived based at least in part on the one or more packet metadata sets for the one or more incoming packets.
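The two scheduling levels summarized above can be simulated with a short Python sketch. It models one selection clock cycle sequentially (port scheduler first, then each port's queue scheduler), whereas hardware would perform both in parallel; the round-robin choice at both levels, the class layout, and all names are assumptions made for illustration.

```python
from collections import deque


class Port:
    def __init__(self, name: str, num_queues: int, port_buf_capacity: int):
        self.name = name
        self.queue_bufs = [deque() for _ in range(num_queues)]  # per-queue buffers
        self.port_buf = deque()                                 # port selection buffer
        self.capacity = port_buf_capacity
        self._next_queue = 0                                    # round-robin pointer

    def queue_schedule(self):
        """Queue scheduler: move one entry from a non-empty queue to the port buffer."""
        if len(self.port_buf) >= self.capacity:
            return None
        n = len(self.queue_bufs)
        for offset in range(n):
            idx = (self._next_queue + offset) % n
            if self.queue_bufs[idx]:
                meta = self.queue_bufs[idx].popleft()
                self.port_buf.append(meta)
                self._next_queue = (idx + 1) % n
                return meta
        return None


def selection_cycle(ports, next_port: int):
    """One modeled selection cycle; returns (sent, updated round-robin pointer)."""
    sent = None
    n = len(ports)
    # Port scheduler: pick the next port whose port selection buffer is non-empty.
    for offset in range(n):
        idx = (next_port + offset) % n
        if ports[idx].port_buf:
            sent = (ports[idx].name, ports[idx].port_buf.popleft())
            next_port = (idx + 1) % n
            break
    # Queue schedulers: refill every port buffer in the same modeled cycle.
    for port in ports:
        port.queue_schedule()
    return sent, next_port
```

After an initial priming cycle, the port buffer in this model always holds an entry when any queue has one, so the port scheduler never waits on queue selection.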

In other aspects, the present inventive subject matter encompasses computer apparatuses and/or computer-readable media configured to carry out the foregoing techniques.

2.0. Structural Overview

FIG. 1 illustrates example aspects of an example networking system 100 (also referred to as a network), according to an embodiment, in which the techniques described herein may be practiced. Networking system 100 includes a plurality of interconnected nodes 110a-110n (collectively, nodes 110), each implemented by a different computing device. For example, a node 110 may be a single networking computing device, such as a router or a switch, in which some or all of the processing components described herein are implemented in application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other integrated circuit(s). As another example, a node 110 may include one or more memories storing instructions for implementing the various components described herein, one or more hardware processors configured to execute the instructions stored in the one or more memories, and various data repositories in the one or more memories for storing data structures used and manipulated by the various components.

Each node 110 is connected to one or more other nodes 110 in network 100 by one or more communication links, depicted as lines between nodes 110. The communication links may be any suitable wired cabling or wireless links. Note that system 100 illustrates only one of many possible arrangements of nodes within a network. Other networks may include fewer or more nodes 110, with any number of links between them.

2.1. Data Units

While each node 110 may or may not have a variety of other functions, in an embodiment, each node 110 is configured to send, receive, and/or relay data to one or more other nodes 110 via these links. In general, data is communicated as a series of discrete units or structures of data represented by signals transmitted over the communication links.

Different nodes 110 within network 100 may send, receive, and/or relay data units at different communication levels, or layers. For example, a first node 110 may send a data unit at the network layer (e.g., a TCP segment, etc.) to a second node 110 over a path that includes an intermediate node 110. Before being transmitted from the first node 110, this data unit may be broken into smaller data units at various sub-levels. These smaller data units may be referred to as "subunits" or "portions" of the larger data unit.

For example, a TCP segment may be broken into packets, then cells, and eventually sent out as a collection of signal-encoded bits to the intermediate device. Depending on the network type and/or the device type of the intermediate node 110, the intermediate node 110 may rebuild the entire original data unit before routing the information to the second node 110, or the intermediate node 110 may simply rebuild certain subunits of the data (e.g., frames and/or cells, etc.) and route those subunits to the second node 110 without ever composing the entire original data unit.

When a node 110 receives a data unit, it typically examines addressing information within the data unit (and/or other information within the data unit) to determine how to process the unit. The addressing information may be, for example, an Internet Protocol (IP) address, an MPLS label, or any other suitable information. If the addressing information indicates that the receiving node 110 is not the destination of the data unit, the receiving node 110 may look up the destination node 110 within its routing information and, based on forwarding instructions associated with the destination node 110 (or with an address group to which the destination node belongs), route the data unit to another node 110 connected to the receiving node 110. The forwarding instructions may indicate, for example, an outgoing port over which to send the data unit, a label to attach to the data unit, etc. Where multiple paths to the destination node 110 are possible (e.g., through the same port, through different ports, etc.), the forwarding instructions may include information indicating a suitable approach for selecting one of those paths, or a path deemed to be the best path may already be defined.
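
The lookup described above can be sketched as follows. This is a minimal illustration only, assuming IP addressing and a longest-prefix-match rule over address groups; the `ForwardingInstruction` type, the `ROUTES` table, and its contents are hypothetical and do not come from the described device.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ForwardingInstruction:
    egress_port: int              # outgoing port over which to send the data unit
    label: Optional[int] = None   # optional label to attach (e.g., an MPLS label)


# Hypothetical routing information: address group (prefix) -> forwarding instruction.
ROUTES = {
    ipaddress.ip_network("10.0.0.0/8"): ForwardingInstruction(egress_port=1),
    ipaddress.ip_network("10.1.0.0/16"): ForwardingInstruction(egress_port=2, label=100),
    ipaddress.ip_network("0.0.0.0/0"): ForwardingInstruction(egress_port=0),  # default
}


def lookup(destination: str) -> ForwardingInstruction:
    """Return the instruction for the most specific prefix containing the destination."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in ROUTES if addr in net]
    best = max(matches, key=lambda net: net.prefixlen)
    return ROUTES[best]
```

A hardware device would typically use dedicated lookup structures rather than a linear scan; the scan here only keeps the sketch short.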

Addressing information, flags, labels, and other metadata used for determining how to process a data unit are typically embedded within a portion of the data unit known as the header. The header is typically at the beginning of the data unit and is followed by the payload of the data unit, which is the information actually being sent in the data unit. A header typically consists of fields of different types, such as a destination address field, a source address field, a destination port field, a source port field, and so forth. In some protocols, the number and arrangement of fields may be fixed. Other protocols allow an arbitrary number of fields, with some or all of the fields preceded by type information that explains to a node the meaning of the field.

A traffic flow is a sequence of data units, such as packets, from a source computer to a destination. In an embodiment, the source of the traffic flow may mark each data unit in the sequence as a member of the flow using a label, tag, or other suitable identifier within the data unit. In another embodiment, the flow is identified by deriving an identifier from other fields in the data unit (e.g., a "five-tuple" combination of source address, source port, destination address, destination port, and protocol, etc.). A flow is usually intended to be sent in sequence, so in many operational scenarios network devices are configured to send all data units within a given flow along the same path to ensure that the flow is received in sequence.
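
As a minimal sketch of deriving a flow identifier from the five-tuple and keeping a flow on one path, the following assumes a CRC-32 hash and a modulo mapping onto the available paths. Both choices, and the names `FiveTuple`, `flow_id`, and `select_path`, are illustrative assumptions rather than anything specified here.

```python
import zlib
from typing import NamedTuple


class FiveTuple(NamedTuple):
    src_addr: str
    src_port: int
    dst_addr: str
    dst_port: int
    protocol: int


def flow_id(t: FiveTuple) -> int:
    """Derive a deterministic flow identifier from the five-tuple fields."""
    key = f"{t.src_addr}|{t.src_port}|{t.dst_addr}|{t.dst_port}|{t.protocol}"
    return zlib.crc32(key.encode("utf-8"))


def select_path(t: FiveTuple, num_paths: int) -> int:
    """Map every data unit of a flow to the same path index."""
    return flow_id(t) % num_paths
```

Because the identifier depends only on header fields, every data unit of a flow selects the same path index, which is what preserves in-order delivery.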

A node 110 may operate on network data at several different layers, and may therefore view the same data as belonging to several different types of data units.

2.2. Network Paths

Any node in the depicted network 100 may communicate with any other node in the network 100 by sending data units through a series of nodes 110 and links, referred to as a path. For example, Node B (110b) may send data units to Node H (110h) via a path from Node B to Node D to Node E to Node H. There may be a large number of valid paths between two nodes. For example, another path from Node B to Node H is from Node B to Node D to Node G to Node H.

In an embodiment, a node 110 does not actually need to specify a full path for the data units that it sends. Rather, the node 110 may simply be configured to calculate the best path for the data unit out of the device (e.g., which egress port it should send the data unit out on, etc.). When a node 110 receives a data unit that is not addressed directly to the node 110, then, based on header information associated with the data unit, such as path and/or destination information, the node 110 relays the data unit either to the destination node 110 or to a "next hop" node 110 that the node 110 calculates is in a better position to relay the data unit to the destination node 110. In this manner, the actual path of a data unit is the product of each node 110 along the path making routing decisions about how best to move the data unit toward the destination node 110 identified by the data unit.

2.3. Network Device

FIG. 2 illustrates example aspects of an example network device 200 in which the techniques described herein may be practiced, according to an embodiment. The network device 200 is a computing device comprising any combination of hardware and software configured to implement the various logical components described herein, including components 210-290. For example, the device may be a single networking computing device, such as a router or a switch, in which some or all of the components 210-290 described herein are implemented using application-specific integrated circuits (ASICs) or field-programmable gate arrays (FPGAs). As another example, an implementing device may include one or more memories storing instructions for implementing the various components described herein, one or more hardware processors configured to execute the instructions stored in the one or more memories, and various data repositories in the one or more memories for storing data structures used and operated on by the various components 210-290.

The device 200 is generally configured to receive data units 205 and forward them to other devices in a network, such as the network 100, by means of a series of operations performed at various components within the device 200. Note that, in an embodiment, some or all of the nodes 110 in a system such as the network 100 may each be or include a separate network device 200. In an embodiment, a node 110 may include more than one device 200. In an embodiment, the device 200 may itself be one of a number of components within a node 110. For example, the network device 200 may be an integrated circuit, or "chip," dedicated to performing switching and/or routing functions within a network switch or router. The network switch or router may further include one or more central processor units, storage units, memories, physical interfaces, LED displays, or other components external to the chip, some or all of which may communicate with the chip.

A non-limiting example flow of a data unit 205 through the various subcomponents of the forwarding logic of the device 200 is as follows. After being received via a port 210, the data unit 205 may be buffered in an ingress buffer 224 and queued in an ingress queue 225 by an ingress arbiter 220 until the data unit 205 can be processed by an ingress packet processor 230, and then delivered to an interconnect (or cross connect) such as a switching fabric. From the interconnect, the data unit 205 may be forwarded to a traffic manager 240. The traffic manager 240 may store the data unit 205 in an egress buffer 244 and assign the data unit 205 to an egress queue 245. The traffic manager 240 manages the flow of the data unit 205 through the egress queue 245 until the data unit 205 is released to an egress packet processor 250. Depending on the processing, the traffic manager 240 may then assign the data unit 205 to another queue so that it may be processed by yet another egress processor 250, or the egress packet processor 250 may send the data unit 205 to an egress arbiter 260, which temporarily stores or buffers the data unit 205 in a transmit buffer and finally forwards the data unit out via another port 290. Of course, depending on the embodiment, the forwarding logic may omit some of these subcomponents and/or include other subcomponents in varying arrangements.

Example components of the device 200 are now described in further detail.

2.4. Ports

The network device 200 includes ports 210/290. Ports 210, including ports 210-1 through 210-N, are inbound ("ingress") ports by which data units, referred to herein as data units 205, are received over a network, such as the network 100. Ports 290, including ports 290-1 through 290-N, are outbound ("egress") ports by which at least some of the data units 205 are sent out to other destinations within the network after having been processed by the network device 200.

Egress ports 290 may operate with corresponding transmit buffers to store data units, or subunits divided therefrom (e.g., packets, cells, frames, transmission units, etc.), that are to be transmitted through the ports 290. Transmit buffers may have a one-to-one correspondence with the ports 290, a many-to-one correspondence with the ports 290, and so on. An egress processor 250, or an egress arbiter 260 operating with the egress processor 250, may output these data units or subunits to the transmit buffers before they are transmitted out of the ports 290.

Data units 205 may be of any suitable PDU type, such as packets, cells, frames, transmission units, etc. In an embodiment, the data units 205 are packets. However, the individual atomic data units upon which the depicted components operate may actually be subunits of the data units 205. For example, data units 205 may be received, acted upon, and transmitted at the cell or frame level. These cells or frames, which may also be referred to as transmissions, may be logically linked together as the data units 205 (e.g., packets, etc.) to which they respectively belong, for purposes of determining how to handle the cells or frames. However, the subunits may never actually be assembled into data units 205 within the device 200, particularly if the subunits are being forwarded to another destination through the device 200.

Ports 210/290 are depicted as separate ports for illustrative purposes, but may actually correspond to the same physical hardware ports (e.g., network jacks or interfaces, etc.) on the network device 200. That is, the network device 200 may both receive data units 205 and send data units 205 over a single physical port, and the single physical port may thus function as both an ingress port 210 and an egress port 290. Nonetheless, for various functional purposes, certain logic of the network device 200 may view a single physical port as a separate ingress port 210 and a separate egress port 290. Moreover, for various functional purposes, certain logic of the network device 200 may subdivide a single physical ingress port or egress port into multiple ingress ports 210 or egress ports 290, or aggregate multiple physical ingress ports or egress ports into a single ingress port 210 or egress port 290. Hence, in some operational scenarios, ports 210 and 290 should be understood as distinct logical constructs that are mapped to physical ports, rather than simply as distinct physical constructs.

In some embodiments, the ports 210/290 of the device 200 may be coupled to one or more transceivers, such as serializer/deserializer ("SerDes") blocks. For example, ports 210 may provide parallel inputs of received data units into a SerDes block, which then outputs the data units serially into an ingress packet processor 230. On the other end, an egress packet processor 250 may input data units serially into another SerDes block, which outputs the data units in parallel to ports 290.

2.5. Packet Processors

The device 200 includes one or more packet processing components that collectively implement forwarding logic by which the device 200 is configured to determine how to handle each data unit 205 that the device 200 receives. These packet processor components may be any suitable combination of fixed circuitry and/or software-based logic, such as specific logic components implemented by one or more field-programmable gate arrays (FPGAs) or application-specific integrated circuits (ASICs), or a general-purpose processor executing software instructions.

Different packet processors 230 and 250 may be configured to perform different packet processing tasks. These tasks may include, for example, identifying paths along which to forward data units 205, forwarding data units 205 to egress ports 290, implementing flow control and/or other policies, manipulating packets, performing statistical or debugging operations, and so forth. The device 200 may include any number of packet processors 230 and 250 configured to perform any number of processing tasks.

In an embodiment, the packet processors 230 and 250 within the device 200 may be arranged such that the output of one packet processor 230 or 250 may eventually be input into another packet processor 230 or 250, so that data units 205 are passed from certain packet processor(s) 230 and/or 250 to other packet processor(s) 230 and/or 250 in a sequence of stages until the data units 205 are finally disposed of (e.g., by sending a data unit 205 out an egress port 290, "dropping" the data unit 205, etc.). In some embodiments, the exact set and/or sequence of packet processors 230 and/or 250 that process a given data unit 205 may vary depending on the attributes of the data unit 205 and/or the state of the device 200. There is no limit to the number of packet processors 230 and/or 250 that may be chained together in this manner.

In some embodiments, and/or for certain processing tasks, a packet processor 230 or 250 may manipulate a data unit 205 directly, based on decisions made while processing the data unit 205. For example, the packet processor 230 or 250 may add, delete, or modify information in a data unit header or payload. In other embodiments, and/or for other processing tasks, a packet processor 230 or 250 may generate control information that accompanies the data unit 205, or is merged with the data unit 205, as the data unit 205 continues through the device 200. Other components of the device 200 may then use this control information to implement the decisions made by the packet processor 230 or 250.

In an embodiment, a packet processor 230 or 250 need not necessarily process an entire data unit 205, but may instead receive and process only a subunit of the data unit 205 that includes the header information for the data unit. For example, if the data unit 205 is a packet comprising multiple cells, the first cell, or a first subset of cells, may be forwarded to the packet processor 230 or 250, while the remaining cells of the packet (and potentially the first cell(s) as well) are forwarded in parallel to a merger component where they await the results of the processing.

Ingress and Egress Processors

In an embodiment, a packet processor may be generally classified as an ingress packet processor 230 or an egress packet processor 250. Generally, an ingress processor 230 resolves destinations for a traffic manager 240 to determine which ports 290 and/or queues a data unit 205 should depart from. There may be any number of ingress processors 230, including just a single ingress processor 230.

In an embodiment, an ingress processor 230 performs certain intake tasks on data units 205 as they arrive. These intake tasks may include, for example and without limitation, parsing data units 205, performing routing-related lookup operations, categorically blocking data units 205 with certain attributes and/or when the device 200 is in a certain state, duplicating certain types of data units 205, making initial categorizations of data units 205, and so forth. Once the appropriate intake task(s) have been performed, the data units 205 are forwarded to an appropriate traffic manager 240, to which the ingress processor 230 may be coupled directly or via various other components, such as an interconnect component.

The egress packet processor(s) 250 of the device 200, by contrast, may be configured to perform the non-intake tasks necessary to implement the forwarding logic of the device 200. These tasks may include, for example, identifying paths along which to forward data units 205, implementing flow control and/or other policies, manipulating data units, performing statistical or debugging operations, and so forth. In an embodiment, there may be different egress packet processor(s) 250 assigned to different flows or other categories of traffic, such that not all data units 205 will be processed by the same egress packet processor 250.

In an embodiment, each egress processor 250 is coupled to a different group of egress ports 290, to which it may send data units 205 that it has processed. In an embodiment, access to a group of ports 290, or to the corresponding transmit buffers 280 for the ports 290, may be regulated via an egress arbiter coupled to the egress packet processor 250. In some embodiments, an egress processor 250 may also or instead be coupled to other potential destinations, such as an internal central processing unit, a storage subsystem, or a traffic manager 240.

2.6. Buffers

Since not all data units 205 received by the device 200 can be processed at the same time by components such as the packet processor(s) 230 and/or 250 and/or the ports 290, various components of the device 200 may temporarily store data units 205 in memory structures referred to as (e.g., ingress, egress, etc.) buffers while the data units 205 are waiting to be processed. For example, a certain packet processor 230 or 250 or port 290 may only be capable of processing a certain amount of data, such as a certain number of data units 205 or portions of data units 205, in a given clock cycle, meaning that other data units 205, or portions of data units 205, destined for that packet processor 230 or 250 or port 290 must either be ignored (e.g., dropped, etc.) or stored. At any given time, a large number of data units 205 may be stored in the buffers of the device 200, depending on network traffic conditions.

The device 200 may include a variety of buffers, each used for a different purpose and/or component. Generally, a data unit 205 awaiting processing by a component is held in a buffer associated with that component until the data unit 205 is "released" to the component for processing.

Buffers may be implemented using any number of distinct banks of memory. Each bank may be a portion of any type of memory, including volatile memory and/or non-volatile memory. In an embodiment, each bank comprises many addressable "entries" (e.g., rows, columns, etc.) in which data units 205, subunits, linking data, or other types of data may be stored. The size of each entry in a given bank is known as the "width" of the bank, while the number of entries in the bank is known as the "depth" of the bank. The number of banks may vary depending on the embodiment.

Each bank may have associated access limitations. For example, a bank may be implemented using single-ported memory, which can be accessed only once in a given time slot (e.g., clock cycle, etc.). Hence, the device 200 may be configured to ensure that no more than one entry needs to be read from or written to the bank in a given time slot. Alternatively, a bank may be implemented in multi-ported memory to support two or more accesses in a given time slot. In many cases, however, single-ported memory may be desirable for higher operating frequency and/or reduced cost.
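
The single-port restriction can be modeled in software roughly as follows. This is only an illustrative sketch: the class name `SinglePortBank`, its methods, and the convention of refusing a second access in the same slot are assumptions made for the example, not a description of the actual memory hardware.

```python
class SinglePortBank:
    """A bank of `depth` entries that allows only one access per time slot."""

    def __init__(self, depth: int) -> None:
        self.entries = [None] * depth
        self._last_slot = None  # time slot of the most recent access

    def _claim(self, slot: int) -> bool:
        """Reserve the bank's single port for `slot`; False if already used."""
        if self._last_slot == slot:
            return False
        self._last_slot = slot
        return True

    def write(self, slot: int, address: int, data: bytes) -> bool:
        """Write an entry; return False if the port is busy in this slot."""
        if not self._claim(slot):
            return False
        self.entries[address] = data
        return True

    def read(self, slot: int, address: int):
        """Return the entry; None if the port is busy in this slot or the entry is empty."""
        if not self._claim(slot):
            return None
        return self.entries[address]
```

A scheduler built on such a model would have to direct a conflicting read or write to another bank or defer it to a later time slot.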

In an embodiment, in addition to the buffer banks, the device may be configured to aggregate certain banks together into logical banks that support additional reads or writes in a time slot and/or higher write bandwidth. In an embodiment, each bank, whether logical or physical or of another (e.g., addressable, hierarchical, multi-level, sub-bank, etc.) organizational structure, is capable of being accessed concurrently with each other bank in the same clock cycle, though full realization of this capability is not necessary.

Some or all of the components in the device 200 that utilize one or more buffers may include a buffer manager configured to manage the use of those buffers. Among other processing tasks, the buffer manager may, for example, maintain a mapping of data units 205 to the buffer entries in which data for those data units 205 is stored, determine when a data unit 205 must be dropped because it cannot be stored in a buffer, perform garbage collection on buffer entries for data units 205 (or portions thereof) that are no longer needed, and so forth.

A buffer manager may include buffer assignment logic. The buffer assignment logic is configured to identify which buffer entry or entries should be used to store a given data unit 205, or a portion thereof. In some embodiments, each data unit 205 is stored in a single entry. In other embodiments, a data unit 205 is received as, or divided into, constituent data unit portions for storage purposes. The buffers may store these constituent portions separately (e.g., not at the same address location or even within the same bank, etc.). The one or more buffer entries in which a data unit 205 is stored are marked as used (e.g., in a "free" list, in which entries not marked as used are free or available, etc.) to prevent newly received data units 205 from overwriting data units 205 that are already buffered. After a data unit 205 is released from the buffer, the one or more entries in which the data unit 205 was buffered may be marked as available for storing new data units 205.
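
The free-list bookkeeping described above can be sketched as follows. The `BufferAllocator` class and its method names are hypothetical, and a dictionary stands in for the buffer memory; this is one plausible illustration under those assumptions, not the buffer manager's actual design.

```python
from collections import deque
from typing import Optional


class BufferAllocator:
    """Tracks which buffer entries are free and which hold data unit portions."""

    def __init__(self, num_entries: int) -> None:
        self.free = deque(range(num_entries))  # addresses available for new data
        self.entries = {}                      # address -> stored portion (used entries)

    def store(self, portion: bytes) -> Optional[int]:
        """Store a portion; return its entry address, or None if the buffer is full."""
        if not self.free:
            return None  # the caller would have to drop the data unit
        address = self.free.popleft()
        self.entries[address] = portion
        return address

    def release(self, address: int) -> bytes:
        """Read out a portion and mark its entry as available again."""
        portion = self.entries.pop(address)
        self.free.append(address)
        return portion
```

Because an address leaves the free list for as long as its entry is in use, a newly received portion can never overwrite one that is still buffered.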

In some embodiments, the buffer assignment logic is relatively simple, in that data units 205 or data unit portions are assigned to banks, and/or to specific entries within those banks, randomly or using a round-robin approach. In some embodiments, data units 205 are assigned to buffers at least partially based on characteristics of those data units 205, such as the corresponding traffic flows, destination addresses, source addresses, ingress ports, and/or other metadata. For example, different banks may be used to store data units 205 received from different ports 210 or sets of ports 210. In an embodiment, the buffer assignment logic also or instead uses buffer state information, such as utilization metrics, to determine which bank and/or buffer entry to assign to a data unit 205 or a portion thereof. Other assignment considerations may include buffer assignment rules (e.g., no writing of two consecutive cells from the same packet to the same bank, etc.) and I/O scheduling conflicts, for example, to avoid assigning a data unit to a bank when no write operations to that bank are available because other components are reading content already in the bank.
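
A minimal sketch of round-robin bank selection that also honors the two example considerations above (no two consecutive cells of a packet in the same bank, and no assignment to a bank with no write available) might look like this. The function name `choose_bank` and its parameters are hypothetical.

```python
def choose_bank(num_banks, next_bank, last_bank_for_packet, busy_banks):
    """Pick a bank round-robin; return (bank, new_next_bank).

    Skips the bank that received the previous cell of the same packet and
    any bank with no write available in this time slot. Returns
    (None, next_bank) when no bank is eligible.
    """
    for offset in range(num_banks):
        bank = (next_bank + offset) % num_banks
        if bank == last_bank_for_packet or bank in busy_banks:
            continue
        return bank, (bank + 1) % num_banks
    return None, next_bank
```

For instance, `choose_bank(4, 3, None, {3, 0})` skips banks 3 and 0, which are busy, and selects bank 1.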

2.7. Queues

In an embodiment, to manage the order in which data units 205 are processed from the buffers, various components of the device 200 may implement queueing logic. For example, the flow of data units through the ingress buffers may be managed using ingress queues, while the flow of data units through the egress buffers may be managed using egress queues.

Each data unit 205, or the buffer location(s) in which the data unit 205 is stored, is said to belong to one or more constructs referred to as queues. Typically, a queue is a set of memory locations (e.g., in the buffers, etc.) arranged in some order by metadata describing the queue. The memory locations may be (and often are) non-contiguous relative to their addressing scheme and/or their physical or logical arrangement. For example, the metadata for one queue may indicate that the queue is composed of, in order, entry addresses 2, 50, 3, and 82 in a certain buffer.
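
To illustrate how metadata alone can impose an order on non-contiguous buffer entries, the following sketch keeps a head pointer, a tail pointer, and per-entry link data. The `LinkedQueue` class is a hypothetical illustration, and a linked list is only one of several ways such queue metadata could be organized.

```python
class LinkedQueue:
    """A queue described purely by metadata over buffer entry addresses."""

    def __init__(self) -> None:
        self.head = None  # address of the oldest entry
        self.tail = None  # address of the newest entry
        self.next = {}    # link metadata: address -> following address

    def enqueue(self, address: int) -> None:
        """Append a buffer entry address to the end of the queue."""
        if self.tail is None:
            self.head = address
        else:
            self.next[self.tail] = address
        self.tail = address

    def dequeue(self):
        """Return the oldest address, or None if the queue is empty."""
        address = self.head
        if address is None:
            return None
        self.head = self.next.pop(address, None)
        if self.head is None:
            self.tail = None
        return address
```

Enqueueing addresses 2, 50, 3, and 82 and then dequeueing four times yields them in that same order, even though the addresses are not contiguous.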

In many embodiments, the sequence in which a queue arranges its constituent data units 205 generally corresponds to the order in which the data units 205 or data unit portions in the queue will be released and processed. Such queues are known as first-in, first-out ("FIFO") queues, though other types of queues may be used in other embodiments. In some embodiments, the number of data units 205 or data unit portions assigned to a given queue at a given time may be limited, either globally or on a per-queue basis, and this limit may change over time.

2.8. Traffic Management

According to an embodiment, the device 200 further includes one or more traffic managers 240 configured to control the flow of data units 205 to one or more packet processors 230 and/or 250. For example, a buffer manager within the device 200 may temporarily store data units 205 in buffers while they wait to be processed by the egress processor(s) 250. A traffic manager 240 may receive data units 205 directly from a port 210, from an ingress processor 230, and/or from other suitable components. In an embodiment, the traffic manager 240 receives one TDU from each possible source (e.g., each port 210, etc.) in each clock cycle or other time slot.

The traffic manager 240 may include or be coupled to egress buffers for buffering data units 205 prior to sending those data units 205 to their respective egress processor(s) 250. A buffer manager within the traffic manager 240 may temporarily store data units 205 in the egress buffers while they wait to be processed by the egress processor(s) 250. The number of egress buffers may vary depending on the embodiment. A data unit 205 or data unit portion in an egress buffer may eventually be "released" to one or more egress processors 250 for processing, by reading the data unit 205 from the (e.g., egress, etc.) buffer and sending it to the egress processor(s) 250. In an embodiment, the traffic manager 240 may release up to a certain number of data units 205 from the buffers to the egress processors 250 in each clock cycle or other defined time slot.

Beyond managing the use of buffers to store data units 205 (or copies thereof), the traffic manager 240 may include queue management logic configured to assign buffer entries to queues and to manage the flow of data units 205 through the queues. For example, upon receiving a data unit 205, the traffic manager 240 may identify a specific queue to which to assign the data unit 205. The traffic manager 240 may further determine when to release (also referred to as "dequeue") data units 205, or portions thereof, from queues and provide those data units 205 to specific packet processor(s) 250. Buffer management logic in the traffic manager 240 may further "deallocate" entries in a buffer that store data units 205 no longer linked to the traffic manager's queues. These entries are then reclaimed, through a garbage collection process, for use in storing new data.

In an embodiment, different queues may exist for different destinations. For example, each port 210 and/or port 290 may have its own set of queues. The queue to which an incoming data unit 205 is assigned and linked may, for instance, be selected based on forwarding information indicating which port 290 the data unit 205 should depart from. In an embodiment, a different egress processor 250 may be associated with each different set of one or more queues. In an embodiment, the current processing context of the data unit 205 may be used to select which queue the data unit 205 should be assigned to.

In an embodiment, there may also or instead be different queues for different flows or sets of flows. That is, each identifiable traffic flow or group of traffic flows is assigned its own set of queues, to which its data units 205 are respectively assigned. In an embodiment, different queues may correspond to different classes of traffic or quality-of-service (QoS) levels. Different queues may also or instead exist for any other suitable distinguishing property of the data units 205, such as source address, destination address, packet type, and so forth.

The device 200 may comprise any number (e.g., one or more, etc.) of packet processors 230 and/or 250 and traffic managers 240. For instance, different sets of ports 210 and/or ports 290 may have their own traffic manager 240 and packet processors 230 and/or 250. As another example, in an embodiment, the traffic manager 240 may be duplicated for some or all of the stages of processing a data unit. For example, the system 200 may include a traffic manager 240 and an egress packet processor 250 for an egress stage performed as the data unit 205 exits the system 200, and/or a traffic manager 240 and a packet processor 230 or 250 for any number of intermediate stages. The data unit 205 may thus pass through any number of traffic managers 240 and/or packet processors 230 and/or 250 prior to exiting the system 200. In other embodiments, only a single traffic manager 240 is needed. If intermediate processing is needed, the flow of a data unit 205 may "loop back" to the traffic manager 240 for buffering and/or queuing after each stage of intermediate processing.

In an embodiment, the traffic manager 240 is coupled to the ingress packet processor(s) 230, such that data units 205 (or portions thereof) are assigned to buffers only upon being initially processed by an ingress packet processor 230. Once in an egress buffer, a data unit 205 (or portion thereof) may be "released" to one or more egress packet processors 250 for processing, either by the traffic manager 240 sending a link or other suitable addressing information for the corresponding buffer to the egress packet processor 250, or by sending the data unit 205 directly.

In the course of processing a data unit 205, the device 200 may replicate the data unit 205 one or more times (for example, based on a copy count specified in control information of the data unit) for multi-destination purposes such as, without limitation, multicasting, mirroring, recirculation, debugging, and so forth. For example, a single data unit 205 may be replicated to multiple egress queues. For instance, a data unit 205 may be linked to separate queues for each of ports 1, 3, and 5. As another example, a data unit 205 may be replicated a number of times after it reaches the head of a queue (e.g., for different egress processors 250, etc.). Hence, though certain techniques described herein may refer to the original data unit 205 that was received by the device 200, it will be understood that those techniques apply equally to copies of the data unit 205 that have been generated for various purposes. A copy of a data unit 205 may be partial or complete. Moreover, there may be an actual physical copy of the data unit 205 in the buffers, or a single copy of the data unit 205 may be linked from a single buffer location to multiple queues at the same time.
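The last option above, a single buffered copy linked to several queues at once, can be sketched with a per-entry link count. This is only an illustrative sketch: the `SharedBuffer` class and the use of a reference count to decide when an entry is freed are assumptions, not details taken from the text.

```python
from collections import deque


class SharedBuffer:
    """Stores one copy of each data unit and tracks how many queues link to it."""

    def __init__(self):
        self.entries = {}      # address -> data unit
        self.link_count = {}   # address -> number of queues still linked to the entry
        self.next_address = 0

    def store(self, data_unit):
        address = self.next_address
        self.next_address += 1
        self.entries[address] = data_unit
        self.link_count[address] = 0
        return address

    def link(self, address, queue):
        # Link the same buffer location to one more queue.
        queue.append(address)
        self.link_count[address] += 1

    def release(self, queue):
        # Dequeue from one queue; free the entry once no queue links to it.
        address = queue.popleft()
        data_unit = self.entries[address]
        self.link_count[address] -= 1
        if self.link_count[address] == 0:
            del self.entries[address]
            del self.link_count[address]
        return data_unit


buf = SharedBuffer()
port_queues = {1: deque(), 3: deque(), 5: deque()}  # one queue per port, as in the example
addr = buf.store("multicast-unit")
for q in port_queues.values():
    buf.link(addr, q)

sent = [buf.release(port_queues[p]) for p in (1, 3)]
still_buffered = addr in buf.entries       # entry survives while port 5 still links to it
last = buf.release(port_queues[5])
freed = addr not in buf.entries            # entry is reclaimed after the final release
```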

2.9. Forwarding Logic

The logic by which the device 200 determines how to handle a data unit 205, such as where and whether to send the data unit 205, whether to perform additional processing on the data unit 205, and so forth, is referred to as the forwarding logic of the device 200. As described above, this forwarding logic is collectively implemented by various components of the device 200. For example, an ingress packet processor 230 may be responsible for resolving the destination of a data unit 205 and determining the set of actions/edits to perform on the data unit 205, and an egress packet processor 250 may perform the edits. Alternatively, in some cases, the egress packet processor 250 may also determine actions and resolve a destination. There may also be embodiments in which the ingress packet processor 230 performs edits as well.

Depending on the embodiment, the forwarding logic may be hard-coded and/or configurable. For instance, in some cases the forwarding logic of the device 200, or portions thereof, may be at least partially hard-coded into one or more ingress processors 230 and/or egress processors 250. As another example, the forwarding logic, or elements thereof, may also be configurable, in that the logic changes over time in response to analyses of state information collected from, or instructions received from, the various components of the device 200 and/or other nodes in the network in which the device 200 is located.

In an embodiment, the device 200 will typically store in its memory one or more forwarding tables (or equivalent structures) that map certain data unit attributes or characteristics to actions to be taken with respect to data units 205 having those attributes or characteristics, such as sending a data unit 205 to a selected path, or processing the data unit 205 using a specified internal component. For instance, such attributes or characteristics may include a quality-of-service level specified by the data unit 205 or associated with another characteristic of the data unit 205, a flow control group, an ingress port 210 through which the data unit 205 was received, a tag or label in a packet header, a source address, a destination address, a packet type, or any other suitable distinguishing property. The traffic manager 240 may, for example, implement logic that reads such a table, determines from the table one or more ports 290 to which to send a data unit 205, and sends the data unit 205 to an egress processor 250 coupled to the one or more ports 290.

According to an embodiment, the forwarding tables describe groups of one or more addresses, such as subnets of IPv4 or IPv6 addresses. Each address is an address of a network device on a network, though a network device may have more than one address. Each group is associated with a potentially different set of one or more actions to execute with respect to data units that resolve to (e.g., are directed to, etc.) an address within the group. Any suitable set of one or more actions may be associated with a group of addresses, including, without limitation, forwarding a message to a specified "next hop," duplicating the message, changing the destination of the message, dropping the message, performing debugging or statistical operations, applying a quality-of-service policy or flow control policy, and so forth.
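A table that maps address groups to action sets might be sketched as below. The subnets, action names, and next-hop labels are hypothetical, and resolving overlapping groups by longest prefix match is an assumption made for this sketch (a common convention, but one the text does not state).

```python
import ipaddress

# Hypothetical forwarding table: each address group (subnet) maps to a list of actions.
FORWARDING_TABLE = [
    (ipaddress.ip_network("10.0.0.0/8"), [("forward", "next-hop-A")]),
    (ipaddress.ip_network("10.1.0.0/16"), [("forward", "next-hop-B"), ("mirror", "debug-port")]),
    (ipaddress.ip_network("0.0.0.0/0"), [("drop", None)]),
]


def lookup_actions(destination):
    """Return the actions of the most specific group containing the destination address."""
    address = ipaddress.ip_address(destination)
    matches = [(net, actions) for net, actions in FORWARDING_TABLE if address in net]
    if not matches:
        return []  # no group covers this address (e.g., an IPv6 address in this IPv4 table)
    # Assumed tie-break: the group with the longest prefix wins.
    _, actions = max(matches, key=lambda match: match[0].prefixlen)
    return actions
```

For example, `lookup_actions("10.1.2.3")` falls in both 10.0.0.0/8 and 10.1.0.0/16 and returns the action set of the more specific /16 group.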

For illustrative purposes, these tables are described as "forwarding tables," though it will be recognized that the extent of the action(s) described by the tables may be much greater than simply where to forward a message. For example, in an embodiment, a table may be a basic forwarding table that simply specifies a next hop for each group. In other embodiments, a table may describe one or more complex policies for each group. Moreover, there may be different types of tables for different purposes. For instance, one table may be a basic forwarding table that is compared against the destination address of each packet, while another table may specify policies to apply to packets upon ingress based on their destination (or source) group, and so forth.

In an embodiment, the forwarding logic may read port state data for the ports 210/290. Port state data may include, for instance, flow control state information describing various traffic flows and associated traffic flow control rules or policies, link status information indicating links that are up or down, and port utilization information indicating how ports are being utilized (e.g., utilization percentages, utilization states, etc.). The forwarding logic may be configured to implement the rules or policies associated with the flow(s) to which a given packet belongs.

As data units 205 are routed through different nodes in a network, the nodes may, on occasion, discard, fail to send, or fail to receive certain data units 205, with the result that those data units 205 fail to reach their intended destination. The act of discarding a data unit 205, or failing to deliver a data unit 205, is typically referred to as "dropping" the data unit. Instances of dropping a data unit 205, referred to herein as "drops" or "packet loss," may occur for a variety of reasons, such as resource limitations, errors, or deliberate policies. Different components of the device 200 may make the decision to drop a data unit 205 for various reasons. For instance, a traffic manager 240 may determine to drop a data unit 205 because, among other reasons, buffers are overutilized, a queue exceeds a certain size, and/or the data unit 205 has a certain characteristic.

2.10. Multi-Level Scheduler

FIG. 3A illustrates an example multi-level scheduler 300 in a network node or switch, used to schedule and select network/data packets for transmission or transfer operations through a plurality of ports of the network node/switch. The multi-level scheduler 300 includes a port scheduler 308; a plurality of port selection buffers 306 for the plurality of ports (denoted "port 0 buffer," "port 1 buffer," ... "port n buffer"); a plurality of queue schedulers 304 for the plurality of ports or port selection buffers (denoted "QS0," "QS1," ... "QSn"); and a plurality of sets of queue selection buffers 302 (denoted "P0, Q0 buffer," ... "P0, Qm buffer," "P1, Q0 buffer," ... "P1, Qm buffer," ... "Pn, Q0 buffer," ... "Pn, Qm buffer"), from which the queue schedulers 304 select or dequeue network/data packets to be further enqueued or buffered into the plurality of port selection buffers 306, and so forth. In some operational scenarios, some or all of the modules, devices, systems, etc., in FIG. 3A may be implemented in part or in whole by a network or packet switch as described herein. Each block, module, device, system, etc., illustrated in the multi-level scheduler 300 may be implemented, collectively or individually, with one or more components, subsystems, or devices comprising any combination of hardware and software configured to implement the various logical components described herein. For example, one or more computing devices may include one or more memories storing instructions for implementing the various components described herein, one or more hardware processors configured to execute the instructions stored in the one or more memories, and various data repositories in the one or more memories for storing data structures utilized and manipulated by the various components.

The multi-level scheduler 300 includes a port scheduler that selects or dequeues specific network/data packets from the plurality of port selection buffers of the plurality of ports, for example in real time or near real time, in each CPU cycle or each reference clock cycle. Each port selection buffer in the plurality of port selection buffers may be set up or maintained for a respective port in the plurality of ports, and is used to select, buffer, store, enqueue, or dequeue packet data sets for a set of network/data packets to be transmitted or transferred through the respective port.

The multi-level scheduler 300 includes a plurality of queue schedulers that select or dequeue specific network/data packets from the plurality of sets of queue selection buffers of the plurality of ports, for example in real time or near real time, in each CPU cycle or each reference clock cycle. Each set of queue selection buffers in the plurality of sets may be set up or maintained by a respective queue scheduler in the plurality of queue schedulers for a respective port in the plurality of ports, and is used to select, buffer, store, enqueue, or dequeue packet metadata sets for a set of network/data packets to be further enqueued, stored, or buffered in the port selection buffer of the respective port.

For example, a first set of queue selection buffers ("P0, Q0 buffer," ... "P0, Qm buffer") may be set up or maintained by a first queue scheduler ("QS0") for a first port ("port 0"), and is used to select, buffer, store, enqueue, or dequeue packet metadata sets for a set of network/data packets to be further enqueued, stored, or buffered in a first port selection buffer ("port 0 buffer") of the first port ("port 0"). A second set of queue selection buffers ("P1, Q0 buffer," ... "P1, Qm buffer") may be set up or maintained by a second queue scheduler ("QS1") for a second port ("port 1"), and is used in the same way to feed a second port selection buffer ("port 1 buffer") of the second port ("port 1"). An (n+1)-th set of queue selection buffers ("Pn, Q0 buffer," ... "Pn, Qm buffer") may be set up or maintained by an (n+1)-th queue scheduler ("QSn") for an (n+1)-th port ("port n"), and is used in the same way to feed an (n+1)-th port selection buffer ("port n buffer") of the (n+1)-th port ("port n").

2.11. Miscellaneous

The device 200 and the multi-level scheduler 300 illustrate only one of many possible arrangements of devices or schedulers configured to provide the functionality described herein. Other arrangements may include fewer, additional, or different components, and the division of work between the components may vary depending on the arrangement. Moreover, in an embodiment, the techniques described herein may be utilized in a variety of computing contexts other than within a network 100.

Furthermore, the figures herein illustrate only some of the various arrangements of memory that may be utilized to implement the described buffering techniques. Other arrangements may include fewer or additional elements in varying arrangements.

3.0. Functional Overview

Described in this section are various example method flows or operations for implementing various features of the systems and system components described herein. The example method flows are non-exhaustive. Alternative method flows, and flows for implementing other features, will be apparent from this disclosure.

The various elements of the process flows or operations described below may be performed in a variety of systems. In an embodiment, each of the processes described in connection with the functional blocks below may be implemented using one or more integrated circuits, logic components, computer programs, other software elements, and/or digital logic in either a general-purpose computer or a special-purpose computer, while performing data retrieval, transformation, and storage operations that involve interacting with and transforming the physical state of the computer's memory.

3.1. Port and Queue Scheduling

Under techniques as described herein, packet metadata sets may be generated for incoming (e.g., received, input, etc.) packets in the queues of a port and buffered/stored in the queue selection buffers associated with that port (e.g., an ingress or egress port, etc.) of the network node/switch. Packet data sets may be generated for outgoing (e.g., selected, output, to be forwarded, etc.) packets in the port selection buffer associated with the port.

In a selection clock cycle, while the port scheduler of the network node/switch selects, from the port selection buffer, a subset of the packet data sets corresponding to a subset of the outgoing packets, the queue scheduler for the packet queues set up for the port may concurrently select, from the packet metadata sets buffered/stored in the queue selection buffers, a subset of the packet metadata sets corresponding to a subset of the incoming packets. For second outgoing packets corresponding to that subset of incoming packets, second packet data sets may be derived, generated, or determined by the queue scheduler based on the selected subset of packet metadata sets. These second packet data sets may be buffered/stored by the queue scheduler in the port selection buffer of the port concurrently, in the same clock cycle.
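The per-cycle interplay of the two scheduling levels can be modelled in software as below. This is a simplified sketch under several assumptions not found in the text: both levels use plain round-robin, each queue scheduler moves at most one entry per cycle, and the packet metadata is passed through unchanged rather than being turned into a separate packet data set. The class and function names are hypothetical.

```python
from collections import deque


class QueueScheduler:
    """Per-port scheduler: moves one selection per cycle into the port selection buffer."""

    def __init__(self, num_queues):
        self.queue_select_buffers = [deque() for _ in range(num_queues)]
        self.port_select_buffer = deque()
        self.next_queue = 0  # round-robin pointer (assumed policy)

    def tick(self):
        n = len(self.queue_select_buffers)
        for offset in range(n):
            index = (self.next_queue + offset) % n
            if self.queue_select_buffers[index]:
                metadata = self.queue_select_buffers[index].popleft()
                self.port_select_buffer.append(metadata)  # enqueue at the tail
                self.next_queue = (index + 1) % n
                return


class PortScheduler:
    """Top-level scheduler: dequeues one entry per cycle from some port selection buffer."""

    def __init__(self, queue_schedulers):
        self.queue_schedulers = queue_schedulers
        self.next_port = 0  # round-robin pointer (assumed policy)

    def tick(self):
        n = len(self.queue_schedulers)
        for offset in range(n):
            port = (self.next_port + offset) % n
            buf = self.queue_schedulers[port].port_select_buffer
            if buf:
                self.next_port = (port + 1) % n
                return port, buf.popleft()  # dequeue from the head
        return None


def run_cycle(port_scheduler):
    # The port scheduler consumes entries buffered in earlier cycles ...
    selected = port_scheduler.tick()
    # ... while every queue scheduler refills its port selection buffer in the same cycle.
    for qs in port_scheduler.queue_schedulers:
        qs.tick()
    return selected


qs0, qs1 = QueueScheduler(2), QueueScheduler(2)
qs0.queue_select_buffers[0].extend(["p0q0-a", "p0q0-b"])
qs0.queue_select_buffers[1].append("p0q1-a")
qs1.queue_select_buffers[0].append("p1q0-a")
ps = PortScheduler([qs0, qs1])
transmitted = [run_cycle(ps) for _ in range(6)]
```

In this model the port scheduler never waits for a queue decision within a cycle; it only reads what the queue schedulers placed in the port selection buffers earlier, which is why the first cycle transmits nothing.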

3.2. Port or Queue Eligibility

A port scheduler as described herein performs port scheduling operations in a network node/switch to arbitrate access to, or allocation of, the overall (e.g., egress, ingress, transfer, transmission, 1 Tbps or greater, 400 Mbps, 100 Mbps, etc.) bandwidth across some or all eligible (e.g., egress, ingress, physical, logical, etc.) ports of the network node/switch. The port scheduler may use a set of scheduling eligibility criteria for a port (also referred to as "port eligibility criteria") to determine whether the port is eligible for having packets selected from its port selection buffer and for transmission or transfer operations to be performed on them. These port eligibility criteria may include, but are not necessarily limited to, any one, some, or all (or any combination) of the following: not being shaped or rate limited below a maximum transmission rate (e.g., measured in bits, bytes, or numbers of packets, etc.), and not being blocked from transmission or transfer, according to the applicable port profile, configuration settings, or operational state; not being flow controlled or paused; having transmission credits, for example as signaled by downstream devices based on the downstream resources available on those devices; having at least one eligible queue among all of the port's queues; and so forth.

A queue scheduler as described herein may be configured for each specific port of some or all ports of the network node/switch, and performs queue scheduling operations in the network node/switch to arbitrate access to, or allocation of, the overall port-specific (e.g., egress, ingress, transfer, transmission, etc.) bandwidth across some or all eligible queues among the multiple (e.g., 2, 4, 8, 12, etc.) queues of the specific port. Different ports of the network node/switch may be configured with the same or different numbers of queues. The queue scheduler may use a second set of scheduling eligibility criteria for a queue of the port (also referred to as "queue eligibility criteria") to determine whether the queue is eligible for having packets selected from its queue selection buffer and further buffered in the port selection buffer of the port. These queue eligibility criteria may include, but are not necessarily limited to, any one, some, or all (or any combination) of the following: not being shaped or rate limited below an applicable or specified maximum transmission rate, and not being blocked from transmission or transfer, according to the applicable queue profile, configuration settings, or operational state; not being flow controlled (e.g., by priority-based flow control, or PFC, while higher-priority traffic is currently being served in the port, etc.) or blocked from transmission or transfer; having a non-negative credit deficit, for example under a weighted deficit round-robin (WDRR) or weighted round-robin (WRR) scheduling policy or implementation; not currently being empty; not being a previously or last selected queue; and so forth. In some operational scenarios, the queue scheduler may be implemented to select a packet and not switch context until all selections related to that packet are completed. In some other operational scenarios, the queue scheduler may be implemented to select multiple packets and switch context without waiting for all selections related to a specific packet to complete.
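The eligibility checks above can be expressed as simple predicates. The sketch below is illustrative only: the field names are hypothetical, it covers only a subset of the listed criteria, and it requires all of them at once, whereas the text allows any one, some, or all of them to apply.

```python
from dataclasses import dataclass


@dataclass
class QueueState:
    depth: int                      # entries currently in the queue selection buffer
    rate_limited: bool = False      # shaped or rate limited
    flow_controlled: bool = False   # e.g., paused by PFC
    deficit: int = 0                # credit balance; negative means ineligible


def queue_is_eligible(q: QueueState) -> bool:
    """A queue is eligible if it is non-empty, unblocked, and has a non-negative deficit."""
    return (
        q.depth > 0
        and not q.rate_limited
        and not q.flow_controlled
        and q.deficit >= 0
    )


def port_is_eligible(*, shaped: bool, paused: bool, tx_credits: int, queues) -> bool:
    """A port is eligible if it is unblocked, has credits, and has an eligible queue."""
    return (
        not shaped
        and not paused
        and tx_credits > 0
        and any(queue_is_eligible(q) for q in queues)
    )
```

Note how the last port criterion depends on the queue-level predicate: a port whose queues are all empty or blocked is itself ineligible.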

Different (e.g., end-user, non-user, etc.) traffic types, such as video, data center traffic, and so forth, may be rate limited or shaped differently. In some operational scenarios, the data or transfer rate to a specific processor, device, or server in a data center may be specifically rate limited or shaped, differently from other data/transfer rates to different processors, devices, or servers in the same data center.

Flow control as described herein may be, but is not necessarily limited to being, priority based, such as strict priority. Flow control (PFC) from a downstream application receiving a traffic flow may depend on the specific scheduling load. Credit deficits may be used to allocate or assign a weighted distribution of bandwidth to different queues or the corresponding traffic flows, for example using tokens and/or thresholds under a WDRR queue scheduling policy or rule. A negative credit deficit for a queue or the corresponding traffic flow causes the queue or traffic flow to be determined as ineligible for queue selection.
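One way to picture how credit deficits produce a weighted bandwidth split is the sketch below. It is one simple credit-based variant written for illustration, not the scheme claimed in the text: the replenishment rule, the `QUANTUM` value, and the class name `WdrrScheduler` are all assumptions. The only behavior taken from the text is that a queue with a negative credit balance is ineligible for selection.

```python
from collections import deque

QUANTUM = 100  # bytes of credit added per unit of weight on each replenish (assumed value)


class WdrrScheduler:
    def __init__(self, weights):
        self.weights = weights                     # positive integer weight per queue
        self.queues = [deque() for _ in weights]   # each queue holds packet sizes in bytes
        self.credit = [0] * len(weights)           # credit balance per queue
        self.pointer = 0                           # round-robin position

    def _eligible(self, i):
        # Non-empty and not in negative credit.
        return bool(self.queues[i]) and self.credit[i] >= 0

    def select(self):
        """Return (queue_index, packet_size) for the next packet, or None if all are empty."""
        if not any(self.queues):
            return None
        n = len(self.queues)
        # When every backlogged queue is in negative credit, replenish by weight.
        while not any(self._eligible(i) for i in range(n)):
            for i, weight in enumerate(self.weights):
                if self.queues[i]:
                    self.credit[i] += weight * QUANTUM
        for offset in range(n):
            i = (self.pointer + offset) % n
            if self._eligible(i):
                size = self.queues[i].popleft()
                self.credit[i] -= size             # charge the queue for the bytes sent
                self.pointer = (i + 1) % n
                return i, size


scheduler = WdrrScheduler(weights=[2, 1])
for q in scheduler.queues:
    q.extend([100] * 300)                          # 300 packets of 100 bytes in each queue

first = scheduler.select()
served = [0, 0]
served[first[0]] += 1
for _ in range(305):
    index, _size = scheduler.select()
    served[index] += 1
```

With weights 2 and 1 and equal packet sizes, the number of packets served from the first queue approaches twice that of the second over a long run.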

3.3. Work Conservation

Some or all of the schedulers described herein (e.g., port or queue schedulers) may be implemented as work-conserving schedulers, which seek to keep the scheduled resource(s), access to which is arbitrated by the scheduler, busy or in use in every clock cycle. Thus, in response to determining that there are eligible packets for packet selection (or dequeuing), such a work-conserving scheduler does not leave the resource(s) unused or idle, but rather selects or dequeues some or all of the eligible packets for further operations that use the resource(s).
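The difference between a work-conserving scheduler and one that can idle is easiest to see side by side. In this hypothetical sketch, both functions are given the queue whose "turn" it is; the function names and the list-based queues are assumptions made for illustration.

```python
def work_conserving_select(queues, turn):
    """Serve the first non-empty queue at or after `turn`; idle only if all are empty."""
    n = len(queues)
    for offset in range(n):
        i = (turn + offset) % n
        if queues[i]:
            return i, queues[i].pop(0)
    return None


def non_work_conserving_select(queues, turn):
    """Serve only the queue whose turn it is; the slot is wasted if that queue is empty."""
    if queues[turn]:
        return turn, queues[turn].pop(0)
    return None


# Same backlog for both policies: only queue 1 has packets.
queues_a = [[], ["a", "b"], []]
queues_b = [[], ["a", "b"], []]
conserving = [work_conserving_select(queues_a, cycle % 3) for cycle in range(3)]
wasteful = [non_work_conserving_select(queues_b, cycle % 3) for cycle in range(3)]
```

Over the same three cycles the work-conserving policy sends both packets, while the other policy sends one and idles twice even though a packet was waiting.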

The techniques described herein may be implemented to ensure that, when the queues of a port contain packets or cells thereof, the port is not idle and is ready to transmit packets with minimized latency, even when the port speed is relatively high, such as one or more hundreds of megabits per second (Mbps), one or more terabits per second (Tbps), etc. For such ports, the time between packet arrival and packet transmission needs to be relatively short, such as a few nanoseconds or even less for small packet sizes.

To minimize time or latency in relatively high speed/bandwidth operations, the scheduler may keep up-to-date state related to packet enqueue/dequeue operations available at all times, or on every clock cycle, at runtime. The state maintained by the scheduler to facilitate high speed/bandwidth operations may include, at any given time or clock cycle: whether dequeuing the currently scheduled cell or transfer of a packet from a queue and/or port selection buffer will cause the queue and/or port selection buffer to become empty; whether the currently scheduled cell or transfer represents the final transfer of the packet (e.g., end of packet or EOP, etc.); and so on.

A multi-level scheduler that includes a port scheduler operating with multiple queue schedulers may be used to support relatively high-speed, low-latency packet transmission or transfer operations. These schedulers may independently fetch or dequeue buffer entries corresponding to network/data packets from their respective selection buffers. For example, while a timing loop of the port scheduler selects, fetches, or dequeues a port or a specific port selection buffer entry from (e.g., the head of, etc.) the corresponding port selection buffer for transmission or transfer, some or all of the queue schedulers, each in its own separate and independent timing loop, may concurrently select, fetch, or dequeue specific queue selection buffer entries from (the heads of) the queue selection buffers for further enqueuing into (e.g., the tail of, etc.) the port selection buffer. Under the techniques described herein, the timing loops associated with resolving port and/or queue state are decoupled from each other. A port selection buffer may be used at any time, without any waiting, to store selections made by a queue scheduler (e.g., at the tail of the port selection buffer, etc.), while the same port selection buffer, or any eligible non-empty port selection buffer, may be used at any time, without any waiting, by the port scheduler to select packets for transmission or transfer operations. The timing loops for port scheduling operations and queue scheduling operations may operate at a single relatively high clock rate or, alternatively, at two or more different clock rates. For example, port scheduling operations may be implemented to support a relatively high clock rate compared with the clock rate of queue scheduling operations.
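The decoupling of the two timing loops can be pictured with a small software model. This is only a sketch under stated assumptions: each loop is modeled as a function called once per tick, the port selection buffer is an unbounded FIFO, and the port loop runs at a configurable multiple of the queue loop's rate. The function names are hypothetical, and real hardware would run the loops concurrently rather than in a single Python loop.

```python
from collections import deque


def queue_scheduler_tick(queue_sel_buf, port_sel_buf):
    """One queue-scheduler cycle: move the head pick to the tail of the port selection buffer."""
    if queue_sel_buf:
        port_sel_buf.append(queue_sel_buf.popleft())


def port_scheduler_tick(port_sel_buf, transmitted):
    """One port-scheduler cycle: transmit the head entry of the port selection buffer, if any."""
    if port_sel_buf:
        transmitted.append(port_sel_buf.popleft())


def simulate(cells, port_ticks_per_queue_tick=2, cycles=10):
    """Run both loops; neither loop inspects or waits on the other's internal state."""
    queue_sel_buf, port_sel_buf, transmitted = deque(cells), deque(), []
    for _ in range(cycles):
        queue_scheduler_tick(queue_sel_buf, port_sel_buf)
        for _ in range(port_ticks_per_queue_tick):
            port_scheduler_tick(port_sel_buf, transmitted)
    return transmitted
```

The only coupling between the two functions is the shared FIFO: the queue side writes at the tail and the port side reads at the head, which is why the loops can run at different rates.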

3.4. Queue Selection Data Flow

FIG. 3B illustrates an example data flow associated with a set of queue selection buffers (e.g., "Port 0, Q0 buffer," "Port 0, Qm buffer," etc.), among multiple sets of queue selection buffers 302, for a port (e.g., "Port 0," etc.) among multiple ports in a network node/switch, and with a queue scheduler (e.g., "QS0," etc.) among multiple queue schedulers 304.

Multiple sets of queues (e.g., egress, ingress, 245 of FIG. 2, 225 of FIG. 2, etc.) may be set up or maintained in the network node/switch for the multiple ports of the network node/switch. Each set of queues in the multiple sets of queues may be set up or maintained for a respective port among the multiple ports. For example, one set of queues may be set up for a port ("Port 0") among the multiple ports, for enqueuing and dequeuing network/data packets awaiting transmission or transfer through that port ("Port 0").

Each queue in the set of queues set up or maintained for the port ("Port 0") may correspond (e.g., 1-1, etc.) to a respective queue selection buffer in the set of queue selection buffers for the port ("Port 0").

For example, a queue in the set of queues may correspond to queue selection buffer 302-0-0 ("Port 0, Q0 buffer") in the set of queue selection buffers for the port ("Port 0"). The queue may include a set of queue entries, each of which may correspond to a respective network/data packet in a set of network/data packets to be transmitted or transferred through the port ("Port 0"); these may be referred to as the network/data packets enqueued in the queue. A subset of the network/data packets (e.g., the first few at the head of the queue, the highest priority, etc.) may be selected from the queue and enqueued into the queue selection buffer 302-0-0 ("Port 0, Q0 buffer") corresponding to the queue.

The queue scheduler ("QS0") selects or dequeues a specific network/data packet from the set of queue selection buffers of the port ("Port 0") on each CPU cycle or each reference clock cycle (e.g., in real time or near real time). The specific network/data packet selected by the queue scheduler ("QS0") from the set of queue selection buffers is in turn buffered, stored, or enqueued by the queue scheduler ("QS0") in the corresponding port selection buffer ("Port 0 buffer"). For a queue in the set of queues for the port ("Port 0"), only a subset of all the packets in the queue may have packet metadata/information stored or buffered in the corresponding queue selection buffer for fast access; for example, selection may be performed over a relatively small subset of packets rather than over all packets in the queue.

Thus, a network node/switch may use the set of queue selection buffers of a port to overcome memory read latency when fetching the next packet(s) for transmission or transfer operations via the port.

For example, the queue scheduler ("QS0") may, on each CPU cycle or each reference clock cycle (e.g., in real time or near real time), select or dequeue one or more network/data packets from those represented or enqueued in queue selection buffer 302-0-0 ("Port 0, Q0 buffer"), and further buffer, store, or enqueue the selected or dequeued network/data packets from the queue selection buffer ("Port 0, Q0 buffer") into the port selection buffer ("Port 0 buffer") of the port ("Port 0").

As shown in FIG. 3B, the queue selection buffer ("P0, Q0 buffer") in the set of queue selection buffers ("P0, Q0 buffer" through "P0, Qm buffer") corresponding to the queue scheduler ("QS0") may include a set of buffer entries 310 (denoted "Port 0, Q0 buffer entries") in a specific order, indicated by the vertical dashed arrows in FIG. 3B. The set of buffer entries 310 may correspond to, or specify, the set of network/data packets currently buffered, stored, or enqueued in the queue selection buffer ("P0, Q0 buffer").

Packet/cell specific queue control information may be included in the packet metadata of a packet, or of cells thereof, stored in a queue selection buffer entry as described herein, to drive or facilitate (e.g., internal, etc.) operations of a queue scheduler. The packet/cell specific queue control information may include, but is not necessarily limited to, one or more data addresses and any additional metadata related to the packet. For example, a multicast packet may be indicated or specified in the packet/cell specific queue control information by a packet copy count value greater than one (1). Additional information or packet metadata, such as a cell/transfer count, packet/cell/transfer size(s), etc., may also be indicated or specified in the packet/cell specific queue control information.

Thus, not all packet data or information of a network/data packet needs to be stored in the buffer entry, in the queue selection buffer ("P0, Q0 buffer") as described herein, that corresponds to the packet. Instead, a set of packet metadata related to the network/data packet, with a relatively small data size compared with the data size of the packet, may be stored or maintained in the buffer entry. The set of packet metadata stored in the buffer entry for a packet may include some or all of the packet control data or packet data related to the packet, such as a packet (pkt) transfer count, a packet copy count, etc. The packet copy count in the set of packet metadata of a network/data packet refers to the number (e.g., total, total remaining, etc.) of packet copies of the network/data packet to be generated and/or transferred. The packet transfer count in the set of packet metadata of a network/data packet refers to the number (e.g., total, total remaining, etc.) of scheduler selections to be performed for each packet copy of the network/data packet.

A network/data packet may be transmitted or transferred in or with one or more cells or transfers derived from the network/data packet; each cell or transfer carries a corresponding portion of the packet data of the network/data packet. The number of scheduler selections stored in the buffer entry corresponds to, or represents, the number of cells or transfers of the network/data packet. Each of the scheduler selections corresponds to, or represents, a respective one of the cells or transfers of the network/data packet.

The set of packet metadata may include additional packet information or data to facilitate selection or dequeue operations. When all scheduler selections for every packet copy of a network/data packet have been made by the queue scheduler ("QS0"), the network/data packet is dequeued from the queue selection buffer ("P0, Q0 buffer") or from the corresponding queue.

As shown in FIG. 3B, the buffer entries 310 in queue selection buffer 302-0-0 ("P0, Q0 buffer") may currently buffer or store four sets of packet metadata for four network/data packets, respectively. The first buffer entry, at the head of the buffer entries in queue selection buffer 302-0-0 ("P0, Q0 buffer"), may store a first set of packet metadata ("Pkt transfer count = 3" and "Pkt copy count = 1") for the first of the four network/data packets. The second buffer entry, the next buffer entry in queue selection buffer 302-0-0 ("P0, Q0 buffer"), may store a second set of packet metadata ("Pkt transfer count = 9" and "Pkt copy count = 1") for the second of the four network/data packets. The third buffer entry in queue selection buffer 302-0-0 ("P0, Q0 buffer") may store a third set of packet metadata ("Pkt transfer count = 4" and "Pkt copy count = 3") for the third of the four network/data packets. The fourth buffer entry, at the tail of the buffer entries in queue selection buffer 302-0-0 ("P0, Q0 buffer"), may store a fourth set of packet metadata ("Pkt transfer count = 1" and "Pkt copy count = 1") for the fourth of the four network/data packets.
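The four example entries can be written down as a small data structure. The sketch below assumes that both counts are totals rather than remaining values, and that a packet needs one scheduler selection per cell/transfer per copy; the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class QueueSelectionEntry:
    transfer_count: int  # scheduler selections (cells/transfers) per packet copy
    copy_count: int      # packet copies; a value greater than 1 indicates multicast

    def is_multicast(self) -> bool:
        return self.copy_count > 1

    def total_selections(self) -> int:
        # Assumption: every copy needs its own full set of cell/transfer selections.
        return self.transfer_count * self.copy_count


# Values from the FIG. 3B example, listed head to tail.
P0_Q0_ENTRIES = [
    QueueSelectionEntry(transfer_count=3, copy_count=1),
    QueueSelectionEntry(transfer_count=9, copy_count=1),
    QueueSelectionEntry(transfer_count=4, copy_count=3),
    QueueSelectionEntry(transfer_count=1, copy_count=1),
]


def selections_to_drain(entries) -> int:
    """Scheduler selections needed before every listed packet can be dequeued."""
    return sum(entry.total_selections() for entry in entries)
```

Under these assumptions the four entries need 3 + 9 + 12 + 1 = 25 selections in total, and only the third entry is multicast.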

3.5. Queue State

As shown in FIG. 3B, in addition to the buffer entries 310 ("Port 0, Q0 buffer entries"), queue selection buffer 302-0-0 ("Port 0, Q0 buffer") may be used to store queue selection information/data 312. The queue selection information/data 312 in queue selection buffer 302-0-0 ("Port 0, Q0 buffer") may include, but is not necessarily limited to, any, some, or all of the following: an empty state, an empty-on-pick state, an EOP-on-pick state, etc.

The empty state may be set and/or used by the queue scheduler ("QS0") to determine (e.g., at a given CPU or clock cycle, etc.) whether the queue is currently empty, that is, the queue from which a subset of network/data packets is selected to have packet metadata stored in the buffer entries 310 ("Port 0, Q0 buffer entries") of queue selection buffer 302-0-0 ("Port 0, Q0 buffer").

The empty-on-pick state may be set and/or used by the queue scheduler ("QS0") to determine (e.g., at a given CPU or clock cycle, etc.) whether the current selection of a network/data packet, or of a cell/transfer of a packet, made by the queue scheduler ("QS0") from queue selection buffer 302-0-0 ("Port 0, Q0 buffer") will cause the queue to become empty, or its empty state to transition to true.

The EOP-on-pick state may be set and/or used by the queue scheduler ("QS0") to determine (e.g., at a given CPU or clock cycle, etc.) whether the current selection of a network/data packet, or of a cell/transfer of a packet, made by the queue scheduler ("QS0") from queue selection buffer 302-0-0 ("Port 0, Q0 buffer") will cause the final or last transfer of the packet to occur, or the end of packet (EOP) of the packet to be reached.

For example, a packet carrying 9 kilobytes in its payload may be divided into 9 separate payload portions carried in 9 cells or transfers, respectively. In scheduling operations performed by a queue scheduler as described herein on a first queue, or a corresponding first queue selection buffer, among multiple queues or queue selection buffers of a port, the queue scheduler may determine whether the currently selected cell or transfer of the packet is the second-to-last cell or transfer of the packet to be selected from the first queue or queue selection buffer and transmitted or transferred by the port. If so, the queue scheduler may set the EOP-on-pick state to true and continue to give selection preference to the packet by selecting the last cell or transfer of the packet in the next clock cycle, rather than a cell or transfer of another packet. Otherwise, the queue scheduler may set or keep the EOP-on-pick state as false and proceed in the next clock cycle to select a cell or transfer of this packet or of another packet in the queues or queue selection buffers, without giving the packet selection preference based on the EOP-on-pick state. As a result, the packet, or its entire payload, may be transmitted or transferred through the port relatively quickly compared with other approaches.

Additionally, optionally, or alternatively, in scheduling operations performed by a queue scheduler as described herein on a first queue, or a corresponding first queue selection buffer, among multiple queues or queue selection buffers of a port, the queue scheduler may determine whether the currently selected packet is the last packet to be selected from the first queue or queue selection buffer and transmitted or transferred by the port. If so, the queue scheduler may set the empty-on-pick state to true. The queue scheduler may then readily (or without waiting) move on to the next eligible queue in the next clock cycle, to avoid inadvertently servicing an empty queue or queue selection buffer and thereby failing to select or produce a packet or cell for the port in that clock cycle (starvation). Otherwise, the queue scheduler may set or keep the empty-on-pick state as false. The queue scheduler may, in the next clock cycle, proceed to the same queue or to any other eligible queue (each non-empty and with an empty-on-pick state of false).
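One way to read the three queue states is as look-ahead flags computed before each pick. The sketch below is an interpretation, not the described implementation: it assumes the flags describe what the next pick from the queue would do, it tracks only the remaining transfer count per packet, and it ignores packet copy counts. The function names are hypothetical.

```python
from collections import deque


def pick_flags(remaining):
    """Look-ahead state for a queue.

    remaining: deque of per-packet remaining transfer counts, head first.
    """
    if not remaining:
        return {"empty": True, "eop_on_pick": False, "empty_on_pick": False}
    eop = remaining[0] == 1  # the next pick would be the final transfer of the head packet
    return {
        "empty": False,
        "eop_on_pick": eop,
        # The queue drains only if that final transfer belongs to the last packet.
        "empty_on_pick": eop and len(remaining) == 1,
    }


def pick(remaining):
    """Pick one cell/transfer from the head packet and return the flags that applied to it."""
    if not remaining:
        raise IndexError("cannot pick from an empty queue")
    flags = pick_flags(remaining)
    remaining[0] -= 1
    if remaining[0] == 0:
        remaining.popleft()  # all transfers selected: the packet leaves the queue
    return flags
```

Because the flags are available before the pick completes, a scheduler using them can decide in the same cycle whether to stay on this queue or move to another eligible queue.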

As a result of using these states in queue scheduling operations, a queue scheduler for a port may be used or implemented to keep port transmissions of the port active (or free of starvation). This can help prevent or significantly reduce time latency in packet transmission/transfer operations, including but not limited to operations related to bulk transfers, multiple-packet grouping, multiple-pick packet operations, etc.

3.6. Queue Selection Policies

Each queue scheduler may implement one or more queue selection policies, for example as set forth in a queue scheduler configuration for the queue scheduler. By way of example and not limitation, each of some or all of the queues in the set of queues for a port ("Port 0") may be assigned a (queue) service rule that defines a scheduling order among the network/data packets represented in the queue. Examples of scheduling orders may include, but are not necessarily limited to, any of the following: strict priority, weighted deficit round robin (WDRR), weighted fair queuing (WFQ), etc.

Additionally, optionally, or alternatively, each of some or all of the queues in the set of queues for the port ("Port 0") may be assigned a respective minimum bandwidth (MinBW) guarantee to ensure that the queue receives at least the allocated minimum amount of bandwidth from the multi-level scheduler (avoiding starvation), provided that the queue supplies or contains a sufficient amount of packet data for transmission or transfer, above what the respective minimum bandwidth amount can support.

Additionally, optionally, or alternatively, each of some or all of the queues in the set of queues for the port ("Port 0") may be assigned a respective maximum bandwidth (MaxBW) limit to ensure that the (total or cumulative) amount of bandwidth provided by the multi-level scheduler for consumption by the queue is limited to, or does not exceed, a specific maximum bandwidth.

In some operational scenarios, a queue scheduler as described herein may maintain or keep track of one or more (queue scheduler) service lists. Separate or different service lists may be used to bin or group different combinations or different subsets of the queues in the set of queues of a port. Some or all of the service lists maintained by the queue scheduler may be set, configured, specified, or defined based on one or more configured attributes of any, some, or all of the queues in the set of queues of the port. The queue scheduler may use some or all of the service lists to establish or determine a specific service order in which the queue scheduler services the queues in the set of queues, or the queue selection buffers in the set of queue selection buffers corresponding to the set of queues.

In some operational scenarios, the queue scheduler service lists may include a minBW service list that specifies the respective minimum bandwidths of the queues. The minBW service list may be serviced first by the queue scheduler, to determine a specific list of eligible queues whose minBW guarantees have not yet been satisfied and to make selections aimed at satisfying those minBW guarantees. The eligible queues in the specific list may be serviced in a (e.g., packet-level rather than cell-level, etc.) round-robin order.

In some operational scenarios, the queue scheduler service lists may include a strict priority (SP) service list that specifies or sets forth strict priority (SP) service/scheduling rules for servicing eligible queues in the set of queues managed by the queue scheduler. The SP service list may be serviced by the queue scheduler to determine a specific priority order among the eligible queues and to select from among the eligible queues in that priority order (e.g., priority (M-1) down to priority 0, etc.). The SP service list may be serviced in response to determining that the minBW service list is empty (or that all queues have satisfied their respective minBW guarantees).

In some operational scenarios, the queue scheduler service lists may include a WDRR service list that specifies or sets forth WDRR service/scheduling rules for servicing eligible queues in the set of queues managed by the queue scheduler. The WDRR service list may be serviced by the queue scheduler to select from among the eligible queues in a (e.g., packet-level, etc.) round-robin order. The WDRR service list may be serviced last, in response to determining that both the minBW service list and the SP service list are empty (or that all queues have satisfied their respective minBW guarantees and all strict priority queues have been serviced).

In some operational scenarios, the same queue may appear on multiple service lists. For example, a queue may appear on both the minBW service list and the WDRR service list.
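The service order across the three lists (minBW first, then SP, then WDRR) can be sketched as a single selection function. This sketch assumes that each list already holds only eligible queues and that the caller removes a queue from a list once it is no longer eligible there; WDRR credit accounting is left out, and the function name and queue identifiers are hypothetical.

```python
from collections import deque


def next_queue(min_bw_list, sp_list, wdrr_list):
    """Return (service list name, queue id) for the next pick, or None if nothing is eligible.

    min_bw_list, wdrr_list: deques of eligible queue ids, kept in round-robin order.
    sp_list: dict mapping priority -> eligible queue id (a higher number is served first).
    """
    if min_bw_list:                      # queues still below their minBW guarantee
        queue_id = min_bw_list[0]
        min_bw_list.rotate(-1)           # round robin: move the served queue to the back
        return "minBW", queue_id
    if sp_list:                          # strict priority: highest priority wins
        return "SP", sp_list[max(sp_list)]
    if wdrr_list:                        # served last, also in round-robin order
        queue_id = wdrr_list[0]
        wdrr_list.rotate(-1)
        return "WDRR", queue_id
    return None
```

A queue such as `"q2"` may sit on both the minBW list and the WDRR list: it is served from the minBW list until its guarantee is met, and afterwards competes on the WDRR list.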

3.7. Selection Operations

FIG. 3C and FIG. 3D illustrate example operations of queue and port schedulers in relation to port selection buffers.

As shown in FIG. 3C, multiple queue schedulers 304 ("QS0" through "QSn") may fetch or dequeue selections (or picks) of packets, or cells/transfers thereof, from the multiple sets of queues of the ports by fetching or dequeuing selections (or picks) of the queue selection buffer entries corresponding to the packets or cells/transfers from multiple sets of queue selection buffers 302 ("P0, Q0 buffer," ... "P0, Qm buffer," "P1, Q0 buffer," ... "P1, Qm buffer," ... "Pn, Q0 buffer," ... "Pn, Qm buffer"). Each selection from a queue, among the selections (or picks) from the multiple sets of queues, results in the selection or pick being dequeued from the queue and/or pushed into the corresponding port selection buffer. A port selection buffer may contain up to N selections, or port selection buffer entries, for a port, made by the queue scheduler from the set of queue selection buffers of the same port.

As shown in FIG. 3D, the queue scheduler of a port may make selections until the port selection buffer becomes full, for example when all N port selection buffer entries have been used for N packet data sets of N packets. Back pressure may then be asserted on the queue scheduler, or flow control operations may be performed with the queue scheduler, to pause or delay entering or pushing packet data sets, corresponding to additional selections from the set of queue selection buffers, into the port selection buffer.
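The back-pressure behavior of a full port selection buffer can be modeled as a bounded FIFO that refuses pushes once it holds N entries. This is a minimal sketch; the class and method names are hypothetical, and a hardware design would typically expose back pressure as a signal rather than a return value.

```python
from collections import deque


class PortSelectionBuffer:
    """Bounded FIFO holding at most n picks made by a port's queue scheduler."""

    def __init__(self, n):
        self.n = n
        self.entries = deque()

    def backpressure(self) -> bool:
        # Asserted while all n entries are in use.
        return len(self.entries) >= self.n

    def try_push(self, pick) -> bool:
        """Queue-scheduler side: returns False (pick not accepted) under back pressure."""
        if self.backpressure():
            return False
        self.entries.append(pick)
        return True

    def pop(self):
        """Port-scheduler side: take the head entry, which frees a slot."""
        return self.entries.popleft()
```

A refused push leaves the pick in its queue selection buffer, so the queue scheduler can retry once the port scheduler has drained an entry.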

The port scheduler 308 and the queue schedulers may perform their respective scheduling (selection and enqueue operations) independently. The port scheduler 308 may (e.g., concurrently, etc.) perform scheduling operations on the port to which a queue scheduler corresponds, as well as on other (e.g., only, etc.) eligible ports that have non-empty port selection buffers and/or non-empty queues.

In some operational scenarios, the port schedulers and queue schedulers described herein may be implemented or configured to perform multi-packet scheduling operations.

In a first example, buffer entries at the queue selection or port selection level may contain multiple packets per buffer entry. Additionally, optionally, or alternatively, the total number of packets per (port or queue) selection buffer entry may vary, or may vary with packet size. Multi-packet scheduling operations may be used to enable or support scheduling operations that dequeue multiple packets per CPU or clock cycle for relatively high throughput.

In a second example, each of some or all of the queue schedulers and port schedulers may be implemented or configured to select multiple (queue or port) selection buffer entries per scheduling event, or per CPU or clock cycle. For example, the port scheduler may be implemented or configured to select up to M packets from M different queues, or M different queue selection buffer entries (in the queue selection buffers associated with the port), in a single CPU or clock cycle.

Multiple packets represented in a single buffer entry, or in multiple buffer entries, of a queue selection buffer or a port selection buffer may be selected or dequeued from the queue selection buffer or port selection buffer in a single CPU or clock cycle based at least in part on priority indications of the packets, sequence indications or indices/numbers of the packets, etc.
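The second example, selecting up to M entries from M different queue selection buffers in one cycle, can be sketched as follows. The sketch assumes at most one pick per buffer per cycle and scans the buffers in a fixed order; a real scheduler would apply its priority or sequence rules instead. The function name `multi_pick` is hypothetical.

```python
from collections import deque


def multi_pick(queue_sel_bufs, m):
    """Select up to m head entries in one cycle, at most one per queue selection buffer.

    queue_sel_bufs: dict mapping queue id -> deque of buffer entries (head first).
    Returns a list of (queue id, entry) picks.
    """
    picks = []
    for queue_id, buf in queue_sel_bufs.items():
        if len(picks) == m:
            break
        if buf:                          # empty buffers contribute nothing this cycle
            picks.append((queue_id, buf.popleft()))
    return picks
```

The fixed scan order favors the buffers listed first, which is acceptable for illustration but would need a fairness mechanism in practice.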

3.8. Port Bundle Scheduling

In some operational scenarios, to support relatively high-speed scheduling operations, a multi-level scheduler as described herein may be implemented or configured to support port bundle scheduling operations. As shown in FIG. 4, the multiple port selection buffers 306 for the multiple ports in the network node/switch may be partitioned (e.g., mutually exclusively, etc.) into multiple different port selection buffer bundles 306-1 corresponding to multiple different (e.g., mutually exclusive, etc.) port bundles 314 partitioned from the multiple ports. Each of the multiple different port selection buffer bundles 306-1 may include one or more port selection buffers for the one or more ports that make up a respective port bundle among the multiple different port bundles 314. For example, a network node/switch may have 128 ports, which may be separated or partitioned into two port bundles of 64 ports each.
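The 128-port example can be expressed as a simple partition. The sketch below assumes contiguous port numbering and equally sized bundles, neither of which is required by the description; the function names are hypothetical.

```python
def partition_ports(num_ports, num_bundles):
    """Split ports 0..num_ports-1 into mutually exclusive, equally sized, contiguous bundles."""
    if num_ports % num_bundles != 0:
        raise ValueError("this sketch assumes the ports divide evenly into bundles")
    size = num_ports // num_bundles
    return [list(range(b * size, (b + 1) * size)) for b in range(num_bundles)]


def bundle_of(port, ports_per_bundle):
    """Index of the bundle (and hence of the port scheduler) responsible for a port."""
    return port // ports_per_bundle


bundles = partition_ports(128, 2)  # two bundles of 64 ports each
```

Here ports 0 through 63 fall in bundle 0 and ports 64 through 127 in bundle 1, so each bundle can be served by its own port scheduler.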

Multiple port schedulers 308-1 may be implemented or configured to perform, concurrently and/or independently, packet-level or cell-level scheduling operations for transmission or transfer operations through the ports in the multiple different port bundles 314. More specifically, each of the multiple port schedulers 308-1 may be implemented or configured to perform packet-level or cell-level scheduling operations for transmission or transfer operations through the ports in a respective port bundle among the multiple different port bundles 314, concurrently with and/or independently of the scheduling operations performed by any other port scheduler(s) among the multiple port schedulers 308-1.

The plurality of different port schedulers 308-1 may be implemented or configured to make respective selections or picks from the plurality of different port selection buffer bundles (e.g., "Port 0 Buffer" to "Port M Buffer", ..., "Port N Buffer" to "Port Z Buffer", etc.) and push them into a plurality of port bundle selection buffers (e.g., "Port Bundle 0" to "Port Bundle X", etc.). More specifically, each of the different port schedulers may be implemented or configured to make selections or picks from its corresponding port selection buffer bundle (e.g., "Port 0 Buffer" to "Port M Buffer", etc.) and push them into the corresponding port bundle selection buffer (e.g., "Port Bundle 0", etc.). A port bundle selection buffer as described herein may contain buffer entries, each of which may have a data structure similar to that of a port selection buffer entry in a port selection buffer.
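A rough software model of one port scheduler per bundle is shown below. The class name `BundlePortScheduler`, the round-robin scan over ports, and the one-pick-per-cycle limit are all assumptions made for illustration; the specification does not fix any of them here.

```python
from collections import deque


class BundlePortScheduler:
    """Hypothetical model of one port scheduler serving one port bundle."""

    def __init__(self, ports):
        # One port selection buffer per port in this bundle.
        self.port_buffers = {p: deque() for p in ports}
        # The port bundle selection buffer that receives this scheduler's picks.
        self.bundle_buffer = deque()
        self._order = list(ports)
        self._next = 0  # round-robin pointer (an assumption of this sketch)

    def tick(self):
        """One scheduling cycle: move at most one entry into the bundle buffer."""
        n = len(self._order)
        for i in range(n):
            port = self._order[(self._next + i) % n]
            buf = self.port_buffers[port]
            if buf:
                self.bundle_buffer.append((port, buf.popleft()))
                self._next = (self._next + i + 1) % n
                return port
        return None  # nothing to pick in this bundle


# Two bundles, each with its own independent scheduler.
sched0 = BundlePortScheduler([0, 1])
sched1 = BundlePortScheduler([2, 3])
sched0.port_buffers[1].append("pkt-a")
sched1.port_buffers[2].append("pkt-b")
sched1.port_buffers[3].append("pkt-c")
for sched in (sched0, sched1):
    sched.tick()
```

Because the two schedulers share no state, their `tick` calls could run in parallel, which is the property the bundle arrangement is meant to provide.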

The techniques described herein, including but not limited to port bundle scheduling techniques, can be used to perform scheduling operations for transmitting or transferring packets at relatively high bandwidths or transmission/transfer rates. Under these techniques, timing risks, including but not limited to starvation, can be prevented or significantly reduced.

Because queues can be activated or scheduled with minimal delay by using the port selection buffers and queue selection buffers to prepare packets for transmission or transfer, relatively low latency in queuing, buffering, scheduling, transmission, and/or transfer can be achieved.

Additionally, optionally or alternatively, these techniques, including but not limited to port bundle scheduling 316 (or a corresponding port bundle scheduler), can scale to relatively high individual and/or aggregate throughputs or port rates of the ports of a network node/switch. The timing loops of the port schedulers and the queue schedulers in the various system configurations of the multi-level scheduler described herein can be separate or independent. Each of some or all of the port schedulers and queue schedulers in the multi-level scheduler can support multiple dequeues per clock cycle, operate at a relatively high frequency, and/or work efficiently at a relatively high port density (e.g., using port bundle scheduling 316, etc.).

4.0. Exemplary Embodiments

FIG. 5 illustrates an example process flow according to an embodiment. The various elements of the flow described below may be performed by one or more network devices implemented with one or more computing devices. In block 502, a queue scheduler of a multi-level scheduler buffers a plurality of packet metadata sets for a plurality of incoming packets in a plurality of queue selection buffers associated with a port of a network node.

In block 504, a port scheduler of the multi-level scheduler buffers one or more packet metadata sets for one or more outgoing packets in a port selection buffer associated with the port.

In block 506, in a selection clock cycle, while the port scheduler of the multi-level scheduler of the network node selects, from the port selection buffer, a subset of the one or more packet metadata sets for a subset of the one or more outgoing packets, the queue scheduler of the port concurrently selects, for a plurality of packet queues set up for the port, one or more second packet metadata sets for one or more of the plurality of incoming packets from among the plurality of packet metadata sets stored in the plurality of queue selection buffers.

In block 508, the queue scheduler adds the one or more second packet metadata sets, for one or more second outgoing packets, to the port selection buffer of the port.
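Blocks 502 through 508 can be approximated in software as a single function call per selection clock cycle. This is only a sequential sketch of behavior that the hardware performs concurrently; the function name `selection_cycle`, the capacity limit, and the use of strict priority among queues are assumptions.

```python
from collections import deque


def selection_cycle(queue_bufs, port_buf, port_buf_capacity=4):
    """Model one selection clock cycle for a single port.

    queue_bufs: list of deques, the queue selection buffers (block 502).
    port_buf:   deque, the port selection buffer (block 504).
    Returns (sent, refilled): the entry picked by the port scheduler and
    the entry the queue scheduler moved into the port selection buffer.
    """
    # Block 506, port scheduler side: pick only from entries that were
    # already in the port selection buffer when the cycle started.
    sent = port_buf.popleft() if port_buf else None

    # Block 506, queue scheduler side: in the same cycle, pick from the
    # queue selection buffers (list order as priority is an assumption).
    refilled = None
    if len(port_buf) < port_buf_capacity:
        for q in queue_bufs:
            if q:
                refilled = q.popleft()
                break

    # Block 508: add the newly selected metadata to the port selection buffer.
    if refilled is not None:
        port_buf.append(refilled)
    return sent, refilled


queue_bufs = [deque(["q0-p1"]), deque(["q1-p1", "q1-p2"])]
port_buf = deque(["p0"])
r1 = selection_cycle(queue_bufs, port_buf)
r2 = selection_cycle(queue_bufs, port_buf)
```

The port-side pick happens before the refill is appended, so an entry added in a given cycle is never the one transmitted in that same cycle; this is how the sketch mimics two schedulers working on the same cycle without depending on each other's result.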

In an embodiment, each queue selection buffer of the plurality of queue selection buffers is associated with a corresponding packet queue of the plurality of packet queues set up for the port.

In an embodiment, each of the one or more packet metadata sets is for a corresponding outgoing packet of the one or more outgoing packets.

In an embodiment, each queue selection buffer of the plurality of queue selection buffers includes a corresponding set of per-queue indicators. The corresponding set of per-queue indicators includes one or more of the following: a queue empty status indicator, a queue-empty-on-pick indicator, an end-of-packet-on-pick indicator, etc.

In an embodiment, each of the plurality of packet metadata sets is for a corresponding incoming packet of the plurality of incoming packets; the packet metadata set of the corresponding incoming packet includes one or more of the following: a packet copy count, a packet transfer count, etc.
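The per-queue indicators and packet metadata fields named in these embodiments might be modeled as below. The field names, their exact semantics, and the way the indicators are updated on a pick are assumptions for illustration; in particular, this sketch uses the buffer's own occupancy as a stand-in for the state of the underlying packet queue.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class PacketMetadata:
    packet_id: int           # hypothetical identifier, not from the source
    copy_count: int = 1      # packet copy count
    transfer_count: int = 1  # packet transfer count


@dataclass
class QueueSelectionBuffer:
    entries: deque = field(default_factory=deque)
    queue_empty: bool = True     # queue empty status indicator
    empty_on_pick: bool = False  # queue-empty-on-pick indicator
    eop_on_pick: bool = False    # end-of-packet-on-pick indicator

    def push(self, md: PacketMetadata) -> None:
        self.entries.append(md)
        self.queue_empty = False

    def pick(self, is_end_of_packet: bool = True) -> PacketMetadata:
        """Remove the oldest entry and refresh the per-queue indicators."""
        md = self.entries.popleft()
        self.eop_on_pick = is_end_of_packet
        self.empty_on_pick = not self.entries
        self.queue_empty = not self.entries
        return md


buf = QueueSelectionBuffer()
buf.push(PacketMetadata(packet_id=1, copy_count=2))
was_empty_before_pick = buf.queue_empty
picked = buf.pick()
```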

In an embodiment, the port belongs to a plurality of ports of the network node; the multi-level scheduler further performs: buffering a second plurality of packet metadata sets for a second plurality of incoming packets in a second plurality of queue selection buffers associated with a second port of the plurality of ports of the network node; buffering one or more third packet metadata sets for one or more second outgoing packets in a second port selection buffer associated with the second port; in the selection clock cycle, while the port scheduler of the network node selects, from the second port selection buffer, a second subset of the one or more third packet metadata sets for a second subset of the one or more second outgoing packets, concurrently performing, by a second queue scheduler of the second port, for a second plurality of packet queues: selecting one or more fourth packet metadata sets for one or more second incoming packets of the second plurality of incoming packets from among the second plurality of packet metadata sets stored in the second plurality of queue selection buffers; and adding the one or more fourth packet metadata sets, for one or more fourth outgoing packets, to the second port selection buffer of the second port.

In an embodiment, the port scheduler performs a work-conserving time-division multiplexing selection method.
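One way to read "work-conserving" time-division multiplexing is that a time slot whose owning port has nothing to send is handed to another port rather than left idle. The sketch below follows that reading; the function name `wc_tdm_pick`, the calendar representation, and the rule of scanning forward through the calendar are assumptions, not the specified method.

```python
def wc_tdm_pick(calendar, slot, has_data):
    """Pick the port to serve in the given TDM slot.

    calendar: list of port ids, one per time slot (the TDM schedule).
    slot:     index of the current slot.
    has_data: dict mapping port id -> True if its selection buffer is non-empty.
    Returns the slot owner if it has data; otherwise the next port in
    calendar order that does; otherwise None.
    """
    n = len(calendar)
    for i in range(n):
        port = calendar[(slot + i) % n]
        if has_data.get(port, False):
            return port
    return None


calendar = [0, 1, 2, 3]
has_data = {0: False, 1: True, 2: False, 3: True}
pick_slot0 = wc_tdm_pick(calendar, 0, has_data)  # port 0 is idle, so port 1 is served
pick_slot2 = wc_tdm_pick(calendar, 2, has_data)  # port 2 is idle, so port 3 is served
```

A non-work-conserving TDM scheduler would instead return `None` whenever the slot owner is idle.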

In an embodiment, the queue scheduler performs one of the following: a strict-priority-based selection method, a weighted dynamic round-robin selection method, a weighted round-robin selection method, a threshold-based selection method, a combination of two or more different selection methods, etc.
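Two of the listed selection methods, strict priority and weighted round-robin, can be sketched in a few lines. These are textbook formulations offered as an illustration; the function names and the convention that a lower index means higher priority are assumptions, and the specification's own variants may differ.

```python
from collections import deque


def strict_priority_pick(queues):
    """Return the index of the first non-empty queue (index 0 = highest priority)."""
    for idx, q in enumerate(queues):
        if q:
            return idx
    return None


def weighted_round_robin(queues, weights):
    """Yield (queue_index, item) pairs; queue i may send up to weights[i] items per round."""
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be positive")
    while any(queues):
        for idx, (q, w) in enumerate(zip(queues, weights)):
            for _ in range(w):
                if not q:
                    break
                yield idx, q.popleft()


sp_choice = strict_priority_pick([deque(), deque("x")])
wrr_order = list(weighted_round_robin([deque("ab"), deque("xyz")], [1, 2]))
```

With weights 1 and 2, the second queue is served twice for every single service of the first queue until one of them runs dry.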

In an embodiment, a computing device such as a switch, a router, a line card in a chassis, a network device, etc. is configured to perform any of the foregoing methods. In an embodiment, an apparatus comprises a processor and is configured to perform any of the foregoing methods. In an embodiment, a non-transitory computer-readable storage medium stores software instructions which, when executed by one or more processors, cause performance of any of the foregoing methods.

In an embodiment, a computing device comprises one or more processors and one or more storage media storing a set of instructions which, when executed by the one or more processors, cause performance of any of the foregoing methods.

Note that, although separate embodiments are discussed herein, any combination of embodiments and/or partial embodiments discussed herein may be combined to form further embodiments.

5.0. Implementation Mechanism - Hardware Overview

According to an embodiment, the techniques described herein are implemented by one or more special-purpose computing devices. The special-purpose computing devices may be desktop computer systems, portable computer systems, handheld devices, networking devices, or any other device that incorporates hard-wired and/or program logic to implement the techniques. The special-purpose computing devices may be hard-wired to perform the techniques, or may include digital electronic devices such as one or more application-specific integrated circuits (ASICs) or field programmable gate arrays (FPGAs) that are persistently programmed to perform the techniques. Such special-purpose computing devices may also combine custom hard-wired logic, ASICs, FPGAs, or other circuitry with custom programming to accomplish the techniques.

Although some of the foregoing techniques are described with respect to a hardware implementation, which provides a number of advantages in certain embodiments, it will also be recognized that, in other embodiments, the foregoing techniques may still provide certain advantages when performed partially or wholly in software. Accordingly, in such an embodiment, a suitable implementing apparatus comprises a general-purpose hardware processor and is configured to perform any of the foregoing methods by executing program instructions in firmware, memory, other storage, or a combination thereof.

FIG. 6 is a block diagram that illustrates an example computer system 600 that may be utilized in implementing the above-described techniques, according to an embodiment. Computer system 600 may be, for example, a desktop computing device, a laptop computing device, a tablet, a smartphone, a server appliance, a computing mainframe, a multimedia device, a handheld device, a networking apparatus, or any other suitable device. In an embodiment, FIG. 6 constitutes a different view of the devices and systems described in the previous sections.

Computer system 600 may include one or more ASICs, FPGAs, or other specialized circuitry 603 for implementing the program logic described herein. For example, circuitry 603 may include fixed and/or configurable hardware logic blocks for implementing some or all of the described techniques, input/output (I/O) blocks, hardware registers or other embedded memory resources such as random access memory (RAM) for storing various data, and so forth. The logic blocks may include, for example, arrangements of logic gates, flip-flops, multiplexers, and so forth, configured to generate output signals based on logic operations performed on input signals.

Additionally and/or instead, computer system 600 may include one or more hardware processors 604 configured to execute software-based instructions. Computer system 600 may also include one or more buses 602 or other communication mechanisms for communicating information. Buses 602 may include various internal and/or external components, including, without limitation, internal processor or memory buses, a Serial ATA bus, a PCI Express bus, a Universal Serial Bus, a HyperTransport bus, an InfiniBand bus, and/or any other suitable wired or wireless communication channel.

Computer system 600 also includes one or more memories 606, such as RAM, hardware registers, or other dynamic or volatile storage devices, for storing data units to be processed by the one or more ASICs, FPGAs, or other specialized circuitry 603. Memory 606 may also or instead be used for storing information and instructions to be executed by processor 604. Memory 606 may be directly connected to or embedded within circuitry 603 or a processor 604. Alternatively, memory 606 may be coupled to and accessed via bus 602. Memory 606 may also be used for storing temporary variables, data units describing rules or policies, or other intermediate information during execution of program logic or instructions.

Computer system 600 further includes one or more read-only memories (ROM) 608 or other static storage devices coupled to bus 602 for storing static information and instructions for processor 604. One or more storage devices 610, such as a solid-state drive (SSD), magnetic disk, optical disk, or other suitable non-volatile storage device, may optionally be provided and coupled to bus 602 for storing information and instructions.

In an embodiment, computer system 600 may also include one or more communication interfaces 618 coupled to bus 602. A communication interface 618 provides a data communication coupling, typically two-way, to a network link 620 that is connected to a local network 622. For example, a communication interface 618 may be an integrated services digital network (ISDN) card, cable modem, satellite modem, or a modem that provides a data communication connection to a corresponding type of telephone line. As another example, the one or more communication interfaces 618 may include a local area network (LAN) card to provide a data communication connection to a compatible LAN. As yet another example, the one or more communication interfaces 618 may include a wireless network interface controller, such as an 802.11-based controller, a Bluetooth controller, a Long Term Evolution (LTE) modem, and/or other types of wireless interfaces. In any such implementation, communication interface 618 sends and receives electrical, electromagnetic, or optical signals that carry digital data streams representing various types of information.

Network link 620 typically provides data communication through one or more networks to other data devices. For example, network link 620 may provide a connection through local network 622 to a host computer 624 or to data equipment operated by a service provider 626. Service provider 626, which may for example be an Internet Service Provider (ISP), in turn provides data communication services through a wide area network, such as the worldwide packet data communication network now commonly referred to as the "Internet" 628. Local network 622 and Internet 628 both use electrical, electromagnetic, or optical signals that carry digital data streams. The signals through the various networks, and the signals on network link 620 and through communication interface 618, which carry the digital data to and from computer system 600, are example forms of transmission media.

In an embodiment, computer system 600 may send and receive data units through the network(s), network link 620, and communication interface 618. In some embodiments, this data may be data units that computer system 600 has been asked to process and, if necessary, redirect to other computer systems via a suitable network link 620. In other embodiments, this data may be instructions for implementing various processes related to the described techniques. For instance, in the Internet example, a server 630 might transmit requested code for an application program through Internet 628, ISP 626, local network 622, and communication interface 618. The received code may be executed by processor 604 as it is received, and/or stored in storage device 610 or other non-volatile storage for later execution. As another example, information received via a network link 620 may be interpreted and/or processed by a software component of computer system 600, such as a web browser, application, or server, which in turn issues instructions based thereon to a processor 604, possibly via an operating system and/or other intermediate layers of software components.

Computer system 600 may optionally be coupled via bus 602 to one or more displays 612 for presenting information to a computer user. For instance, computer system 600 may be connected via a High-Definition Multimedia Interface (HDMI) cable or other suitable cabling to a liquid crystal display (LCD) monitor, and/or via a wireless connection, such as a peer-to-peer Wi-Fi Direct connection, to a light-emitting diode (LED) television. Other examples of suitable types of displays 612 may include, without limitation, plasma display devices, projectors, cathode ray tube (CRT) monitors, electronic paper, virtual reality headsets, braille terminals, and/or any other suitable device for outputting information to a computer user. In an embodiment, any suitable type of output device, such as an audio speaker or printer, may be utilized instead of a display 612.

One or more input devices 614 are optionally coupled to bus 602 for communicating information and command selections to processor 604. One example of an input device 614 is a keyboard, including alphanumeric and other keys. Another type of user input device 614 is cursor control 616, such as a mouse, a trackball, or cursor direction keys, for communicating direction information and command selections to processor 604 and for controlling cursor movement on display 612. This input device typically has two degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), that allow the device to specify positions in a plane. Yet other examples of suitable input devices 614 include a touch-screen panel affixed to a display 612, cameras, microphones, accelerometers, motion detectors, and/or other sensors. In an embodiment, a network-based input device 614 may be utilized. In such an embodiment, user input and/or other information or commands may be relayed via routers and/or switches on a local area network (LAN) or other suitable shared network, or via a peer-to-peer network, from the input device 614 to a network link 620 on computer system 600.

As discussed, computer system 600 may implement the techniques described herein using customized hard-wired logic, one or more ASICs or FPGAs 603, firmware, and/or program logic, which in combination with the computer system causes or programs computer system 600 to be a special-purpose machine. According to one embodiment, however, the techniques herein are performed by computer system 600 in response to processor 604 executing one or more sequences of one or more instructions contained in main memory 606. Such instructions may be read into main memory 606 from another storage medium, such as storage device 610. Execution of the sequences of instructions contained in main memory 606 causes processor 604 to perform the process steps described herein.

The term "storage media" as used herein refers to any non-transitory media that store data and/or instructions that cause a machine to operate in a specific fashion. Such storage media may comprise non-volatile media and/or volatile media. Non-volatile media include, for example, optical or magnetic disks, such as storage device 610. Volatile media include dynamic memory, such as main memory 606. Common forms of storage media include, for example, a floppy disk, a flexible disk, a hard disk, a solid-state drive, magnetic tape or any other magnetic data storage medium, a CD-ROM, any other optical data storage medium, any physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, an NVRAM, and any other memory chip or cartridge.

Storage media are distinct from, but may be used in conjunction with, transmission media. Transmission media participate in transferring information between storage media. For example, transmission media include coaxial cables, copper wire, and fiber optics, including the wires that comprise bus 602. Transmission media can also take the form of acoustic or light waves, such as those generated during radio-wave and infrared data communications.

Various forms of media may be involved in carrying one or more sequences of one or more instructions to processor 604 for execution. For example, the instructions may initially be carried on a magnetic disk or solid-state drive of a remote computer. The remote computer can load the instructions into its dynamic memory and use a modem to send the instructions as modulated signals over a network, such as a cable network or cellular network. A modem local to computer system 600 can receive the data on the network and demodulate the signal to decode the transmitted instructions. Appropriate circuitry can then place the data on bus 602. Bus 602 carries the data to main memory 606, from which processor 604 retrieves and executes the instructions. The instructions received by main memory 606 may optionally be stored on storage device 610 either before or after execution by processor 604.

6.0. Extensions and Alternatives

As used herein, the terms "first," "second," "certain," and "particular" are used as naming conventions to distinguish queries, plans, representations, steps, objects, devices, or other items from each other, so that these items may be referenced after they have been introduced. Unless otherwise specified herein, the use of these terms does not imply an ordering, timing, or any other characteristic of the referenced items.

In the drawings, the various components are depicted as being communicatively coupled to various other components by arrows. These arrows illustrate only certain examples of information flows between the components. Neither the direction of the arrows nor the lack of arrow lines between certain components should be interpreted as indicating the existence or absence of communication between those components themselves. Indeed, each component may feature a suitable communication interface by which the component may become communicatively coupled to other components as needed to accomplish any of the functions described herein.

In the foregoing specification, embodiments of the inventive subject matter have been described with reference to numerous specific details that may vary from implementation to implementation. Thus, the sole and exclusive indicator of what is the inventive subject matter, and is intended by the applicants to be the inventive subject matter, is the set of claims that issue from this application, in the specific form in which such claims issue, including any subsequent correction. In this regard, although specific claim dependencies are set out in the claims of this application, it is to be noted that the features of the dependent claims of this application may be combined as appropriate with the features of other dependent claims and with the features of the independent claims of this application, and not merely according to the specific dependencies recited in the set of claims. Moreover, although separate embodiments are discussed herein, any combination of embodiments and/or partial embodiments discussed herein may be combined to form further embodiments.

Any definitions expressly set forth herein for terms contained in such claims shall govern the meaning of such terms as used in the claims. Hence, no limitation, element, property, feature, advantage or attribute that is not expressly recited in a claim should limit the scope of such claim in any way. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.

Claims (24)

1. A method comprising:
buffering a plurality of sets of packet metadata for a plurality of incoming packets in a plurality of queue selection buffers associated with a port of a network node;
buffering one or more sets of packet metadata for one or more outgoing packets in a port selection buffer associated with the port;
in a selection clock cycle, while a subset of the one or more sets of packet metadata for a subset of the one or more outgoing packets is selected from the port selection buffer by a port scheduler of the network node, concurrently performing, by a queue scheduler of the port, for a plurality of packet queues set up for the port:
selecting one or more second sets of packet metadata for one or more incoming packets of the plurality of incoming packets from among the plurality of sets of packet metadata stored in the plurality of queue selection buffers;
adding the one or more second sets of packet metadata, for one or more second outgoing packets, to the port selection buffer of the port.

2. The method of claim 1, wherein each queue selection buffer of the plurality of queue selection buffers is associated with a corresponding packet queue of the plurality of packet queues set up for the port.

3. The method of claim 1, wherein each of the one or more sets of packet metadata is for a corresponding outgoing packet of the one or more outgoing packets.

4. The method of claim 1, wherein each of the plurality of queue selection buffers comprises a corresponding set of per-queue indicators; wherein the corresponding set of per-queue indicators comprises one or more of: a queue empty status indicator, a queue-empty-on-pick indicator, or an end-of-packet-on-pick indicator.

5. The method of claim 1, wherein each of the plurality of sets of packet metadata is for a corresponding incoming packet of the plurality of incoming packets; wherein the set of packet metadata of the corresponding incoming packet comprises one or more of: a packet copy count or a packet transfer count.

6. The method of claim 1, wherein the port belongs to a plurality of ports of the network node; the method further comprising:
buffering a second plurality of sets of packet metadata for a second plurality of incoming packets in a second plurality of queue selection buffers associated with a second port of the plurality of ports of the network node;
buffering one or more third sets of packet metadata for one or more second outgoing packets in a second port selection buffer associated with the second port;
in the selection clock cycle, while a second subset of the one or more third sets of packet metadata for a second subset of the one or more second outgoing packets is selected from the second port selection buffer by the port scheduler of the network node, concurrently performing, by a second queue scheduler of the second port, for a second plurality of packet queues:
selecting one or more fourth sets of packet metadata for one or more second incoming packets of the second plurality of incoming packets from among the second plurality of sets of packet metadata stored in the second plurality of queue selection buffers;
adding the one or more fourth sets of packet metadata, for one or more fourth outgoing packets, to the second port selection buffer of the second port.

7. The method of claim 1, wherein the port scheduler performs a work-conserving time-division multiplexing selection method.

8. The method of claim 1, wherein the queue scheduler performs one of the following: a strict-priority-based selection method, a weighted dynamic round-robin selection method, a weighted round-robin selection method, a threshold-based selection method, or a combination of two or more different selection methods.

9. A system comprising:
one or more computing devices;
one or more non-transitory computer-readable media storing instructions that, when executed by the one or more computing devices, cause performance of:
buffering a plurality of sets of packet metadata for a plurality of incoming packets in a plurality of queue selection buffers associated with a port of a network node;
buffering one or more sets of packet metadata for one or more outgoing packets in a port selection buffer associated with the port;
in a selection clock cycle, while a subset of the one or more sets of packet metadata for a subset of the one or more outgoing packets is selected from the port selection buffer by a port scheduler of the network node, concurrently performing, by a queue scheduler of the port, for a plurality of packet queues set up for the port:
selecting one or more second sets of packet metadata for one or more incoming packets of the plurality of incoming packets from among the plurality of sets of packet metadata stored in the plurality of queue selection buffers;
adding the one or more second sets of packet metadata, for one or more second outgoing packets, to the port selection buffer of the port.

10. The system of claim 9, wherein each queue selection buffer of the plurality of queue selection buffers is associated with a corresponding packet queue of the plurality of packet queues set up for the port.

11. 
The system of claim 9, wherein each of the one or more sets of packet metadata is for a corresponding outgoing packet of the one or more outgoing packets. 12.根据权利要求9所述的系统,其中所述多个队列选择缓冲器中的每个队列选择缓冲器包括相应的每队列指示符集合;其中所述相应的每队列指示符集合包括以下中的一项或多项:队列空状态指示符、拾取时队列空指示符或拾取时分组结束指示符。12. A system according to claim 9, wherein each queue selection buffer of the multiple queue selection buffers includes a corresponding per-queue indicator set; wherein the corresponding per-queue indicator set includes one or more of the following: a queue empty status indicator, a queue empty indicator at pick-up, or a packet end indicator at pick-up. 13.根据权利要求9所述的系统,其中所述多个分组元数据集合中的每个分组元数据集合用于所述多个传入分组中的相应的传入分组;其中用于所述相应的传入分组的所述分组元数据集合包括以下中的一项或多项:分组复制计数或分组传送计数。13. A system according to claim 9, wherein each of the multiple packet metadata sets is used for a corresponding incoming packet among the multiple incoming packets; wherein the packet metadata set for the corresponding incoming packet includes one or more of the following: a packet copy count or a packet transmission count. 14.根据权利要求9所述的系统,其中所述端口属于所述网络节点的多个端口;其中所述指令在由所述一个或多个计算设备执行时还引起以下的执行:14. 
The system of claim 9, wherein the port belongs to a plurality of ports of the network node; wherein the instructions, when executed by the one or more computing devices, further cause execution of: 将用于第二多个传入分组的第二多个分组元数据集合缓冲在与所述网络节点的所述多个端口中的第二端口相关联的第二多个队列选择缓冲器中;buffering a second plurality of sets of packet metadata for a second plurality of incoming packets in a second plurality of queue selection buffers associated with a second port of the plurality of ports of the network node; 将一个或多个第二传出分组的一个或多个第三分组元数据集合缓冲在与所述第二端口相关联的第二端口选择缓冲器中;buffering one or more third sets of packet metadata for one or more second outgoing packets in a second port selection buffer associated with the second port; 在所述选择时钟周期,在由所述网络节点的所述端口调度器从所述第二端口选择缓冲器选择用于所述一个或多个第二传出分组的第二子集的所述一个或多个第三分组元数据集合的第二子集时,由所述第二端口的第二队列调度器针对第二多个分组队列同时执行:In the selection clock cycle, when the port scheduler of the network node selects a second subset of the one or more third sets of packet metadata for a second subset of the one or more second outgoing packets from the second port selection buffer, the second queue scheduler of the second port simultaneously performs, for a second plurality of packet queues: 从存储在所述第二多个队列选择缓冲器中的所述第二多个分组元数据集合之中选择用于所述第二多个传入分组中的一个或多个第二传入分组的一个或多个第四分组元数据集合;selecting one or more fourth sets of packet metadata for one or more second incoming packets of the second plurality of incoming packets from among the second plurality of sets of packet metadata stored in the second plurality of queue selection buffers; 将用于一个或多个第四传出分组的所述一个或多个第四分组元数据集合添加到所述第二端口的所述第二端口选择缓冲器。The one or more sets of fourth packet metadata for one or more fourth outgoing packets are added to the second port selection buffer of the second port. 15.根据权利要求9所述的系统,其中所述端口调度器执行工作节省的时分复用选择方法。15. The system of claim 9, wherein the port scheduler implements a work-saving time-division multiplexing selection method. 16.根据权利要求9所述的系统,其中所述队列调度器执行以下中的一项:基于严格优先级的选择方法、加权动态轮询选择方法、加权轮询选择方法、基于阈值的选择方法、或两种或更多种不同选择方法的组合。16. 
The system of claim 9, wherein the queue scheduler performs one of the following: a strict priority-based selection method, a weighted dynamic polling selection method, a weighted polling selection method, a threshold-based selection method, or a combination of two or more different selection methods. 17.一个或多个非暂态计算机可读介质,存储有指令,所述指令在由一个或多个计算设备执行时引起以下的执行:17. One or more non-transitory computer-readable media storing instructions that, when executed by one or more computing devices, cause the execution of: 将用于多个传入分组的多个分组元数据集合缓冲在与网络节点的端口相关联的多个队列选择缓冲器中;buffering a plurality of sets of packet metadata for a plurality of incoming packets in a plurality of queue selection buffers associated with ports of a network node; 将用于一个或多个传出分组的一个或多个分组元数据集合缓冲在与所述端口相关联的端口选择缓冲器中;buffering one or more sets of packet metadata for one or more outgoing packets in a port selection buffer associated with the port; 在选择时钟周期,在由所述网络节点的端口调度器从所述端口选择缓冲器选择用于所述一个或多个传出分组的子集的所述一个或多个分组元数据集合的子集时,由所述端口的队列调度器对针对所述端口设置的多个分组队列同时执行:At a selected clock cycle, when a subset of the one or more sets of packet metadata for a subset of the one or more outgoing packets is selected from the port selection buffer by a port scheduler of the network node, a queue scheduler of the port simultaneously performs, for a plurality of packet queues set for the port: 从存储在所述多个队列选择缓冲器中的所述多个分组元数据集合之中选择用于所述多个传入分组中的一个或多个传入分组的一个或多个第二分组元数据集合;selecting one or more second sets of packet metadata for one or more incoming packets of the plurality of incoming packets from among the plurality of sets of packet metadata stored in the plurality of queue selection buffers; 将用于一个或多个第二传出分组的所述一个或多个第二分组元数据集合添加到所述端口的所述端口选择缓冲器。The one or more sets of second packet metadata for one or more second outgoing packets are added to the port selection buffer of the port. 18.根据权利要求17所述的一个或多个非暂态计算机可读介质,其中所述多个队列选择缓冲器中的每个队列选择缓冲器与针对所述端口设置的所述多个分组队列中的相应的分组队列相关联。18. 
The one or more non-transitory computer-readable media of claim 17, wherein each queue selection buffer of the plurality of queue selection buffers is associated with a corresponding packet queue of the plurality of packet queues set for the port. 19.根据权利要求17所述的一个或多个非暂态计算机可读介质,其中所述一个或多个分组元数据集合中的每个分组元数据集合用于所述一个或多个传出分组中的相应的传出分组。19. The one or more non-transitory computer-readable media of claim 17, wherein each of the one or more sets of packet metadata is for a corresponding outgoing packet of the one or more outgoing packets. 20.根据权利要求17所述的一个或多个非暂态计算机可读介质,其中所述多个队列选择缓冲器中的每个队列选择缓冲器包括相应的每队列指示符集合;其中所述相应的每队列指示符集合包括以下中的一项或多项:队列空状态指示符、拾取时队列空指示符或拾取时分组结束指示符。20. One or more non-transitory computer-readable media according to claim 17, wherein each queue selection buffer of the plurality of queue selection buffers comprises a corresponding set of per-queue indicators; wherein the corresponding set of per-queue indicators comprises one or more of: a queue empty status indicator, a queue empty indicator at pick-up, or a packet end indicator at pick-up. 21.根据权利要求17所述的一个或多个非暂态计算机可读介质,其中所述多个分组元数据集合中的每个分组元数据集合用于所述多个传入分组中的相应的传入分组;其中用于所述相应的传入分组的所述分组元数据集合包括以下中的一项或多项:分组复制计数或分组传送计数。21. One or more non-transitory computer-readable media according to claim 17, wherein each of the multiple packet metadata sets is used for a corresponding incoming packet among the multiple incoming packets; wherein the packet metadata set for the corresponding incoming packet includes one or more of the following: a packet copy count or a packet transmission count. 22.根据权利要求17所述的一个或多个非暂态计算机可读介质,其中所述端口属于所述网络节点的多个端口;其中所述指令在由所述一个或多个计算设备执行时还引起以下的执行:22. 
The one or more non-transitory computer-readable media of claim 17, wherein the port belongs to a plurality of ports of the network node; wherein the instructions, when executed by the one or more computing devices, further cause execution of: 将用于第二多个传入分组的第二多个分组元数据集合缓冲在与所述网络节点的所述多个端口中的第二端口相关联的第二多个队列选择缓冲器中;buffering a second plurality of sets of packet metadata for a second plurality of incoming packets in a second plurality of queue selection buffers associated with a second port of the plurality of ports of the network node; 将一个或多个第二传出分组的一个或多个第三分组元数据集合缓冲在与所述第二端口相关联的第二端口选择缓冲器中;buffering one or more third sets of packet metadata for one or more second outgoing packets in a second port selection buffer associated with the second port; 在所述选择时钟周期,在由所述网络节点的所述端口调度器从所述第二端口选择缓冲器选择用于所述一个或多个第二传出分组的第二子集的所述一个或多个第三分组元数据集合的第二子集时,由所述第二端口的第二队列调度器针对第二多个分组队列同时执行:In the selection clock cycle, when the port scheduler of the network node selects a second subset of the one or more third sets of packet metadata for a second subset of the one or more second outgoing packets from the second port selection buffer, the second queue scheduler of the second port simultaneously performs, for a second plurality of packet queues: 从存储在所述第二多个队列选择缓冲器中的所述第二多个分组元数据集合之中选择用于所述第二多个传入分组中的一个或多个第二传入分组的一个或多个第四分组元数据集合;selecting one or more fourth sets of packet metadata for one or more second incoming packets of the second plurality of incoming packets from among the second plurality of sets of packet metadata stored in the second plurality of queue selection buffers; 将用于一个或多个第四传出分组的所述一个或多个第四分组元数据集合添加到所述第二端口的所述第二端口选择缓冲器。The one or more sets of fourth packet metadata for one or more fourth outgoing packets are added to the second port selection buffer of the second port. 23.根据权利要求17所述的一个或多个非暂态计算机可读介质,其中所述端口调度器执行工作节省的时分复用选择方法。23. The one or more non-transitory computer-readable media of claim 17, wherein the port scheduler implements a work-saving time-division multiplexing selection method. 
24.根据权利要求17所述的一个或多个非暂态计算机可读介质,其中所述队列调度器执行以下中的一项:基于严格优先级的选择方法、加权动态轮询选择方法、加权轮询选择方法、基于阈值的选择方法、或两种或更多种不同选择方法的组合。24. One or more non-transitory computer-readable media according to claim 17, wherein the queue scheduler performs one of the following: a strict priority-based selection method, a weighted dynamic polling selection method, a weighted polling selection method, a threshold-based selection method, or a combination of two or more different selection methods.
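The two scheduling stages recited in the claims can be illustrated with a small software model. This is only a sketch under stated assumptions, not the claimed hardware implementation: the names `PacketMeta`, `Port`, `selection_cycle`, `rr_start`, and `capacity` are hypothetical, the queue scheduler is assumed to use strict priority (one of the options listed in claim 8), and the port scheduler is approximated as a work-conserving rotation that skips ports whose port selection buffer is empty. In hardware both stages run in the same selection clock cycle; the model runs them sequentially but keeps newly added metadata invisible to the port scheduler until the next cycle.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class PacketMeta:
    """Hypothetical stand-in for one set of packet metadata."""
    packet_id: int
    queue_id: int


@dataclass
class Port:
    # One queue selection buffer per packet queue; index 0 is assumed
    # to be the highest-priority queue.
    queue_select: list
    # Port selection buffer: metadata already chosen by the queue scheduler.
    port_select: deque = field(default_factory=deque)
    # Assumed depth of the port selection buffer.
    capacity: int = 2

    def queue_schedule(self):
        """Strict-priority pick: the first non-empty buffer wins."""
        if len(self.port_select) >= self.capacity:
            return None
        for buf in self.queue_select:
            if buf:
                return buf.popleft()
        return None


def selection_cycle(ports, rr_start):
    """Model one selection clock cycle across all ports.

    Returns (port_index, metadata) for the packet picked by the port
    scheduler, or None if every port selection buffer was empty.
    """
    # Port scheduler: start at the slot owner and skip ports with nothing
    # to send, so the slot is not wasted (work-conserving).
    sent = None
    n = len(ports)
    for step in range(n):
        idx = (rr_start + step) % n
        if ports[idx].port_select:
            sent = (idx, ports[idx].port_select.popleft())
            break

    # Queue schedulers: each port refills its own port selection buffer.
    # Entries added here become visible to the port scheduler next cycle.
    for port in ports:
        meta = port.queue_schedule()
        if meta is not None:
            port.port_select.append(meta)
    return sent


if __name__ == "__main__":
    p0 = Port([deque([PacketMeta(1, 0)]), deque([PacketMeta(2, 1)])])
    p1 = Port([deque([PacketMeta(3, 0)])])
    for cycle in range(5):
        print(cycle, selection_cycle([p0, p1], rr_start=cycle % 2))
```

Splitting the decision this way means the port scheduler only ever inspects the small port selection buffers, while the per-port queue schedulers do the queue-level arbitration in parallel with it.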
CN202410410327.XA 2023-04-04 2024-04-07 Multi-level scheduler Pending CN118784593A (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US63/457,122 2023-04-04
US18/227,117 US20240340250A1 (en) 2023-04-04 2023-07-27 Multi-stage scheduler
US18/227,117 2023-07-27

Publications (1)

Publication Number Publication Date
CN118784593A true CN118784593A (en) 2024-10-15

Family

ID=92990906

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410410327.XA Pending CN118784593A (en) 2023-04-04 2024-04-07 Multi-level scheduler

Country Status (1)

Country Link
CN (1) CN118784593A (en)

Similar Documents

Publication Publication Date Title
US12101260B1 (en) Multi-destination traffic handling optimizations in a network device
US12068972B1 (en) Shared traffic manager
US10505851B1 (en) Transmission burst control in a network device
US10263919B1 (en) Buffer assignment balancing in a network device
US11652750B2 (en) Automatic flow management
US11489785B1 (en) Processing packets in an electronic device
US11949601B1 (en) Efficient buffer utilization for network data units
CN102047618A (en) Network processor unit and method for network processor unit
US11895015B1 (en) Optimized path selection for multi-path groups
US10884829B1 (en) Shared buffer memory architecture
US10846225B1 (en) Buffer read optimizations in a network device
US8879578B2 (en) Reducing store and forward delay in distributed systems
US10742558B1 (en) Traffic manager resource sharing
US12341711B1 (en) Spatial dispersion buffer
CN114531488A (en) High-efficiency cache management system facing Ethernet exchanger
US12413535B1 (en) Efficient scheduling using adaptive packing mechanism for network apparatuses
US12184492B1 (en) Foldable ingress buffer for network apparatuses
US10999223B1 (en) Instantaneous garbage collection of network data units
US10581759B1 (en) Sharing packet processing resources
US12289256B1 (en) Distributed link descriptor memory
US20240340250A1 (en) Multi-stage scheduler
US12231342B1 (en) Queue pacing in a network device
CN118784593A (en) Multi-level scheduler
US20250267100A1 (en) Minimized latency ingress arbitration
Benet et al. Providing in-network support to coflow scheduling

Legal Events

Date Code Title Description
PB01 Publication