HK1195959B - Processor with efficient work queuing - Google Patents
Description
Cross Reference to Related Applications
This application is a continuation of U.S. application No. 13/274,767, filed on October 17, 2011.
The entire teachings of the above-identified application are incorporated herein by reference.
Background
In most operating systems, input queues are used for operating system scheduling and allocation of processing resources. An input queue typically holds a set of work to be performed and operates as follows: work to be executed is removed from the head of the queue, and any incoming work is added to the tail of the queue. Depending on the operating system, various techniques may be used to process the work stored in the input queue, such as first-come first-served, round-robin scheduling, priority scheduling, custom scheduling, and the like. Regardless of the queuing and scheduling techniques used by the operating system, queuing delays occur while work waits in the queue to be executed.
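The head-of-queue/tail-of-queue discipline described above can be sketched as a minimal singly linked FIFO (an illustrative sketch; the structure and field names are assumptions, not taken from any particular operating system):

```c
#include <stddef.h>

/* Hypothetical work item: one unit of work to be performed. */
struct work {
    int id;            /* illustrative payload */
    struct work *next; /* link to the next queued item */
};

struct input_queue {
    struct work *head; /* work is removed (dequeued) here */
    struct work *tail; /* incoming work is added (enqueued) here */
};

/* Add incoming work to the tail of the queue. */
static void enqueue(struct input_queue *q, struct work *w) {
    w->next = NULL;
    if (q->tail)
        q->tail->next = w;
    else
        q->head = w;   /* queue was empty */
    q->tail = w;
}

/* Remove the oldest work from the head (first-come first-served). */
static struct work *dequeue(struct input_queue *q) {
    struct work *w = q->head;
    if (w) {
        q->head = w->next;
        if (!q->head)
            q->tail = NULL;
    }
    return w;
}
```

With this discipline, work is always executed in arrival order; the scheduling variants named above differ only in which queue (or which entry) is drained next.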
Disclosure of Invention
According to some embodiments, a network services processor includes a plurality of network services processor elements that perform work comprising a plurality of packet processing operations, and a plurality of in-memory linked lists. Each individual packet processing operation may define a piece of work. In response to a lack of processing resources in the network services processor elements, the in-memory linked lists store work to be performed by the network services processor elements. Work is moved from the in-memory linked lists back to the network services processor elements in response to availability of processing resources in the network services processor elements.
In some embodiments, the in-memory linked lists may be formed within a portion of network services processor memory that is independent of the portions describing and processing the work to be performed. The in-memory linked lists may include a dynamic random access memory. The work to be performed may be stored in an input queue of an in-memory linked list. The network services processor may maintain pointers to available storage locations in the in-memory linked lists. The work to be performed may be stored at an available storage location indicated by a pointer. The network services processor stores the work to be performed at the tail of the input queue, at the available storage location. The network services processor may update a second pointer to the tail of the input queue with the pointer.
In some embodiments, in response to the availability of processing resources in the network services processor, the network services processor may retrieve the work to be performed from an available storage location. The network services processor may retrieve the work to be performed from the head of the input queue of the available storage location. The network services processor may release the pointer when the work to be performed is retrieved. The network services processor may update a second pointer to the head of the input queue with a new pointer obtained from the retrieved work.
In some embodiments, the network services processor may maintain pointers to available storage locations within the in-memory linked lists in a free pool allocator. The free pool allocator may be maintained in a dynamic random access memory. In response to a lack of processing resources in the network services processor, the network services processor may obtain a pointer from the free pool allocator to an available storage location within the in-memory linked lists.
In some embodiments, the work to be performed by the network services processor may be packed into a buffer of predetermined size before being stored in an in-memory linked list. When moved from the in-memory linked lists back to the network services processor element, the work to be performed may be decapsulated into individual data packets.
In some embodiments, the network services processor may maintain one work queue entry for each work. The network services processor may maintain a predetermined number of pointers to available storage space in the in-memory linked lists. The predetermined number of pointers may be a subset of a total number of work queue entries maintained by the network services processor.
Drawings
The foregoing will be apparent from the following more particular description of exemplary embodiments of the invention, as illustrated in the accompanying drawings in which like reference characters refer to the same parts throughout the different views. The drawings are not necessarily to scale, emphasis instead being placed upon illustrating embodiments of the invention.
FIG. 1 is a block diagram of a network services processor.
FIG. 2A is a diagram of a Schedule/Synchronize and Sequence (SSO) module.
FIG. 2B is a diagram of the internal architecture of a Schedule/Synchronize and Sequence (SSO) module.
FIG. 3 illustrates the format requirements for the work queue pointer.
FIG. 4 is a diagrammatic representation of a work queue entry buffer that may be used with the present invention.
FIG. 5 is a high-level diagram of one embodiment of the present invention.
Detailed Description
The following is a description of exemplary embodiments of the invention.
Before describing exemplary embodiments of the present invention in detail, an exemplary network services processor in which these embodiments may be implemented is described immediately below to assist the reader in understanding the inventive features of the present invention.
FIG. 1 is a block diagram illustrating a network services processor 100. The network services processor 100 provides high application performance using at least one processor core 120. The elements of network services processor 100 described below are collectively referred to hereinafter as a "network services processor element" or a "processor element".
The network services processor 100 processes Open Systems Interconnection (OSI) network L2-L7 layer protocols encapsulated in received packets. As is well known to those skilled in the art, the OSI reference model defines seven network protocol layers (L1-L7). The physical layer (L1) represents the actual interface, electrical and physical, that connects a device to a transmission medium. The data link layer (L2) performs data framing. The network layer (L3) formats the data into packets. The transport layer (L4) handles end-to-end transport. The session layer (L5) manages communication between devices, e.g., whether the communication is half-duplex or full-duplex. The presentation layer (L6) manages data formatting and presentation, such as syntax, control codes, special graphics, and character sets. The application layer (L7) allows communication between users, such as file transfer and e-mail.
The network services processor 100 may schedule and queue work (packet processing operations) for upper-level network protocols (e.g., L4-L7) and allow the processing of upper-level network protocols in received packets to be performed so as to forward the packets at wire speed. Wire speed is the rate of data transfer of the network over which data is transmitted and received. By processing these protocols so as to forward the packets at wire speed, the network services processor does not slow down the network data transfer rate.
A plurality of interface units 122 receive a packet for processing. A PCIe interface 124 may also receive a packet. The interface units 122 perform pre-processing of the received packet by checking various fields in the L2 network protocol header included in the received packet, and then forward the packet to a packet input unit 126. At least one interface unit 122a may receive packets from a plurality of X Attachment Unit Interfaces (XAUI), Reduced X Attachment Unit Interfaces (RXAUI), or Serial Gigabit Media Independent Interfaces (SGMII). At least one interface unit 122b may receive connections from an Interlaken Interface (ILK).
The packet input unit 126 performs further pre-processing of network protocol headers (e.g., L3 and L4 headers) included in the received packet. This pre-processing includes checksum verification for TCP/User Datagram Protocol (UDP) (L4 network protocols).
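The TCP/UDP checksum verification mentioned above is, in software terms, the standard Internet one's-complement checksum (RFC 1071). The following is a minimal software sketch of the computation the packet input unit performs in hardware; the function and buffer names are illustrative:

```c
#include <stddef.h>
#include <stdint.h>

/* Internet checksum (RFC 1071): sum 16-bit words with end-around
 * carry, then take the one's complement. Used to verify TCP/UDP
 * checksums; this software form is illustrative of the hardware step. */
static uint16_t inet_checksum(const uint8_t *data, size_t len) {
    uint32_t sum = 0;
    while (len > 1) {
        sum += (uint32_t)(data[0] << 8 | data[1]); /* big-endian 16-bit word */
        data += 2;
        len -= 2;
    }
    if (len)                       /* odd trailing byte, pad with zero */
        sum += (uint32_t)data[0] << 8;
    while (sum >> 16)              /* fold carries back into the low 16 bits */
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;
}
```

A received segment verifies correctly when recomputing the checksum over the segment (with the checksum field included) yields zero.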
A free pool allocator 128 maintains pools of pointers to free memory in the level 2 cache memory 130 and the external DRAM 108. The packet input unit 126 uses one of the pools of pointers to store received packet data in the level 2 cache memory 130 or external DRAM 108 and uses another of the pools of pointers to allocate work queue entries for the processor cores 120.
The packet input unit 126 then writes the packet data into the level-2 cache 130 or a buffer in the external DRAM 108. Preferably, the packet data is written into the buffers in a format that is convenient for higher level software executing in at least one of the processor cores 120. Thus, further processing of the higher level network protocol is facilitated.
Network services processor 100 may also include one or more application specific coprocessors. When included, the coprocessors offload some of the processing from the cores 120, thereby enabling the network services processor to achieve high throughput packet processing. For example, a compression/decompression coprocessor 132 is provided, dedicated to performing compression and decompression of received data packets. Other embodiments of the co-processing unit include a RAID/De-Dup unit 162, which speeds up the data chunking and data copying process for disk storage applications.
Another coprocessor is a Hyper Finite Automata (HFA) unit 160 that includes dedicated HFA thread engines adapted to accelerate the pattern and/or signature matching necessary for anti-virus, intrusion detection systems, and other content processing applications. Using the HFA unit 160, pattern and/or signature matching is accelerated, for example, performed at rates up to multiples of tens of gigabits per second. The HFA unit 160 may, in some embodiments, include any of a Deterministic Finite Automata (DFA), a Non-deterministic Finite Automata (NFA), or an HFA algorithm unit.
One I/O interface 136 manages the overall protocol and arbitration and provides coherent I/O partitioning. The I/O interface 136 includes an I/O bridge 138 and a fetch and add unit 140. The I/O bridge includes two bridges, an I/O Packet Bridge (IOBP) 138a and an I/O Bus Bridge (IOBN) 138b. The I/O packet bridge 138a is configured to manage the overall protocol and arbitration and provide coherent I/O partitioning primarily for packet input and output. The I/O bus bridge 138b is configured to manage the overall protocol and arbitration and provide coherent I/O partitioning primarily for the I/O bus. Registers in the fetch and add unit 140 are used to maintain the lengths of the output queues used for forwarding processed packets through a packet output unit 146. The I/O bridge 138 includes buffer queues for storing information to be transferred between a Coherent Memory Interconnect (CMI) 144, an I/O bus 142, the packet input unit 126, and the packet output unit 146.
The various I/O interfaces (MIOs) 116 may include a number of auxiliary interfaces such as general purpose I/O (GPIO), flash memory, IEEE802 two-wire management interface (MDIO), Serial Management Interrupt (SMI), universal asynchronous receiver/transmitter (UART), Reduced Gigabit Media Independent Interface (RGMII), Media Independent Interface (MII), two-wire serial interface (TWSI), and others.
The network services processor 100 may also include a Joint Test Action Group ("JTAG") interface 123 supporting the MIPS EJTAG standard. In accordance with the JTAG and MIPS EJTAG standards, the cores within the network services processor 100 each have an internal Test Access Port ("TAP") controller. This allows multi-core debug support for the network services processor 100.
A Schedule/Synchronize and Sequence (SSO) module 148 queues and schedules work for the processor cores 120. Work is queued by adding a work queue entry to a queue. For example, the packet input unit 126 adds a work queue entry for each packet arrival. A timer unit 150 is used to schedule work for the processor cores 120.
A processor core 120 requests work from the SSO module 148. The SSO module 148 selects (i.e., schedules) work for one of the processor cores 120 and returns a pointer to the work queue entry describing the work to that processor core 120.
Each processor core 120, in turn, includes an instruction cache 152, a level 1 data cache 154, and an encryption accelerator 156. In one embodiment, the network services processor 100 includes 32 superscalar Reduced Instruction Set Computer (RISC)-type processor cores 120. In some embodiments, each of these superscalar RISC-type processor cores 120 comprises an extension of the MIPS64 version 3 processor core. In one embodiment, each of these superscalar RISC-type processor cores 120 comprises a cnMIPS II processor core.
The level 2 cache memory 130 and external DRAM 108 are shared by all processor cores 120 and I/O coprocessor devices. Each processor core 120 is coupled to the level 2 cache memory 130 by the CMI 144. The CMI 144 is the communication channel for all memory and I/O transactions between the processor cores 120, the I/O interface 136, and the level 2 cache memory 130 and its controller. In one embodiment, the CMI 144 is scalable to 32 processor cores 120, supporting fully coherent level 1 data caches 154 with write-through. Preferably, the CMI 144 is highly buffered with the ability to prioritize I/O. The CMI is coupled to a trace control unit 164 configured to capture bus requests so that software can later read the requests and generate a trace of the sequence of events on the CMI.
A level 2 cache memory controller 131 maintains memory reference coherency. It returns the most recent copy of a block for every fill request, whether the block is stored in the level 2 cache memory 130, in external DRAM 108, or "in flight". It also stores a duplicate copy of the tags for the data cache 154 in each processor core 120. It compares the addresses of cache-block-store requests against the data-cache tags, and invalidates (both copies of) a data-cache tag for a processor core 120 whenever a store instruction is from another processor core or from an I/O component via the I/O interface 136.
In some embodiments, multiple DRAM controllers 133 support up to 128 gigabytes of DRAM. In one embodiment, the plurality of DRAM controllers includes four DRAM controllers each supporting 32 gigabytes of DRAM. Preferably, each DRAM controller 133 supports a 64-bit interface to DRAM 108. In addition, DRAM controller 133 may support a preferred protocol, such as the DDR-III protocol.
After a packet has been processed by the processor cores 120, the packet output unit 146 reads the packet data from the level 2 cache memory 130/DRAM 108, performs L4 network protocol post-processing (e.g., generates a TCP/UDP checksum), forwards the packet through the interface units 122 or the PCIe interface 124, and frees the L2 cache memory 130/DRAM 108 used by the packet.
The DRAM controllers 133 manage in-flight transactions (loads/stores) to/from the DRAM 108. In some embodiments, the DRAM controllers 133 comprise four DRAM controllers, the DRAM 108 comprises four DRAM memories, and each DRAM controller is connected to one DRAM memory. The HFA unit 160 is coupled directly to the DRAM controllers 133 on a bypass cache access path 135. The bypass cache access path 135 allows the HFA unit to read directly from DRAM memory 108 without using the level 2 cache memory 130, which may improve the efficiency of HFA operations.
FIG. 2A is a diagram of a Schedule/Synchronize and Sequence (SSO) module 148. The Schedule/Synchronize and Sequence (SSO) module 148 functions as a coprocessor that provides a number of important functions, such as work queuing, work scheduling/descheduling, and ordering and synchronization of work.
Each piece of work is described by an associated work queue entry, which may be created by either a hardware unit (i.e., an on-chip unit of the network services processor 100 shown in FIG. 1) or core software (i.e., instructions executed by the processor cores 120). For example, in some embodiments, the centralized packet input hardware creates a work queue entry and submits work for each arriving packet. Further, core software can create work queue entries and submit work as needed.
The coprocessors (described earlier with reference to FIG. 1) may provide different levels of quality of service (QOS). Specifically, multiple input work queues may be used, and incoming packets may be classified into one of the multiple input work queues using different defaults and priorities. Further, some incoming packets may be discarded before being buffered and submitted to a core in order to provide the desired quality of service. For example, a Random Early Discard (RED) algorithm or a threshold algorithm may be used to decide when or whether to discard an incoming packet. This dropping mechanism may be configured differently for different quality of service classes.
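The drop decision above can be sketched as a simple RED-style check: never drop below a minimum queue depth, always drop above a maximum, and drop probabilistically in between. This is a hedged illustration; the thresholds and the hardware's actual algorithm and parameters are not specified here:

```c
#include <stdint.h>
#include <stdlib.h>

/* Simplified RED-style drop decision. Below min_th never drop, at or
 * above max_th always drop, and in between drop with a probability
 * that grows linearly with queue occupancy. Thresholds are illustrative. */
struct red_params {
    uint32_t min_th;  /* no drops below this queue depth */
    uint32_t max_th;  /* always drop at or above this depth */
};

static int red_should_drop(const struct red_params *p, uint32_t qdepth) {
    if (qdepth < p->min_th)
        return 0;
    if (qdepth >= p->max_th)
        return 1;
    /* linear drop probability between min_th and max_th */
    uint32_t span = p->max_th - p->min_th;
    return (uint32_t)(rand() % span) < (qdepth - p->min_th);
}
```

Configuring a different `red_params` per QOS class gives the per-class dropping behavior described above.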
Each piece of work is queued by adding a work queue entry to a queue. Depending on the desired quality of service, different priorities may be used when draining the work stored in the queues. For example, queuing schemes such as static priority and weighted round-robin may be used.
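A weighted round-robin drain over several input queues can be sketched as follows (the queue count, weights, and names are illustrative assumptions, not the hardware's scheme):

```c
#include <stdint.h>

/* Minimal weighted round-robin selector over a fixed set of queues.
 * Each queue gets 'weight' dequeues per round. */
#define NQUEUES 4

struct wrr {
    uint32_t weight[NQUEUES];  /* configured service share per queue */
    uint32_t credit[NQUEUES];  /* remaining dequeues this round */
    uint32_t depth[NQUEUES];   /* pending work per queue */
    int cur;                   /* queue to resume scanning from */
};

/* Pick the next queue to service, or -1 if all queues are empty.
 * When no queue has both pending work and credit, start a new round
 * by reloading credits from the weights and scan once more. */
static int wrr_next(struct wrr *w) {
    for (int pass = 0; pass < 2; pass++) {
        for (int i = 0; i < NQUEUES; i++) {
            int q = (w->cur + i) % NQUEUES;
            if (w->depth[q] > 0 && w->credit[q] > 0) {
                w->credit[q]--;
                w->depth[q]--;
                w->cur = q;     /* continue from this queue next time */
                return q;
            }
        }
        for (int i = 0; i < NQUEUES; i++)   /* new round */
            w->credit[i] = w->weight[i];
    }
    return -1;  /* nothing pending anywhere */
}
```

A static-priority scheme is the degenerate case: always scan from queue 0 and ignore credits.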
Each piece of work typically flows through the states 210, 220, 230 shown in FIG. 2A. Specifically, each piece of work is first in the input queue 210, then enters the in-flight state 220, and is finally descheduled or completed 230. At any given time, since a piece of work may be scheduled to a particular processor core 120, the number of scheduled items is limited by the number of processor cores. A processor core that is executing descheduled work, or that has completed its scheduled work without yet requesting new work, may be considered a descheduled core. Scheduled work is a subset of the SSO in-flight work 220. Any core can deschedule its scheduled item at any point 225. Any descheduled work remains in flight and is rescheduled later.
Despite the limited size of the SSO unit 148, the SSO unit 148 maintains the appearance of an infinite input work queue by maintaining linked lists in L2/DRAM (in memory) and overflowing to L2/DRAM when needed. Specifically, in the absence of processing space, the SSO unit 148 adds input queue entries to an L2/DRAM list maintained by the network services processor 100. If space is available in the SSO unit 148 when work is added, the SSO unit 148 immediately buffers the work internally and avoids the overhead of the memory lists. If the SSO unit 148 placed the work in a memory list, it later automatically moves the work from L2/DRAM into the SSO unit 148, in the order in which it was originally added, as space becomes available.
As described above, the SSO unit 148 queues each piece of work by adding a work queue entry to a queue. This work queue entry in L2/DRAM is the primary descriptor that describes each piece of work. The SSO unit 148 may read/write the L2/DRAM location containing the work queue entry when a work queue entry is added to the queue or when work is moved from the memory input queue 210 into an SSO entry in the SSO unit 148.
Typically, the SSO unit 148 only needs to maintain one Work Queue Pointer (WQP) 300 (shown later in FIGS. 2B and 3) that points to the work queue entry. The work queue pointer may be a 64-bit aligned pointer into L2/DRAM that points to a legal work queue entry. When a core is available to process new work, the SSO unit 148 supplies the stored WQP, and the core uses this pointer to locate the work. Specifically, when a processor element of the network services processor 100 adds an input queue entry 210 (the ADDWQ operation shown in FIG. 2A) to the existing work queue structure, the SSO unit 148 may read the L2/DRAM location containing the pointed-to work queue entry.
FIG. 2B is a diagram of the internal architecture of the Schedule/Synchronize and Sequence (SSO) unit 148. For purposes of illustration, five hypothetical cores are shown; however, example embodiments of the present invention may utilize a different number of cores.
As explained above, SSO unit 148 entries may be in the input queue 210, in the in-flight state 220, attached to a core in the descheduled state 221, or on the free list 222. Each SSO unit 148 entry contains at least the following information:
a pointer 300 to a work queue entry in L2/DRAM (WQP)
Current tag 301 and tag type
Current group (shown later in FIG. 3)
Pointer 209 linking the entry to a different list
In the example shown in FIG. 2B, the SSO unit 148 architecture includes a plurality of input queues 210. Each input queue includes a memory list and a unit list. When the SSO unit 148 adds new work to the unit, it allocates an internal SSO unit entry 233 and fills the entry 233 with the required information. Once an internal SSO entry 233 is allocated to a piece of work, the work remains within the SSO unit 148, whether in the input queue 210 or in flight 220, until it is completed or descheduled. After that point, the SSO entry 233 cannot overflow to L2/DRAM. Core operations only cause SSO entries 233 to attach to or detach from a particular core and to move between lists.
Once the SSO unit 148 has loaded a copy of the work, it no longer reads or writes the work queue entry location in L2/DRAM. The work queue entry in L2/DRAM is used only while work is in the input queue 210, never while the work is within the SSO unit 148. While a work queue pointer is within the SSO unit 148, the SSO unit 148 carries the work queue pointer at all times, since the work queue pointer indirectly describes the actual work that needs to be performed.
The in-flight entries 220 are organized into first-in-first-out (FIFO) ordered lists 224, with one FIFO associated with each unique in-flight tag and tag-type value combination. Work enters a FIFO list 224 when the SSO hardware either schedules the work from the input queue 210 or switches it from the descheduled core state 221.
Cores can be scheduled by the SSO unit 148 to process in-flight entries 220, can hold entries in the descheduled state 221, or can be allocated SSO entries 233 from the free list 222.
FIG. 3 illustrates the format requirements for a work queue pointer 300. The fields labeled "QOS", "Grp", "TT", and "Tag" indicate the quality of service, group, tag type, and tag, respectively, corresponding to each piece of work. The SSO hardware reads the QOS, Grp, TT, and Tag fields only in the hardware ADDWQ case; software sets those fields as appropriate. As noted above, when a core is available to process work, the SSO hardware supplies the stored work queue pointer 300, and the core uses the pointer 300 to locate the work.
FIG. 4 is a diagrammatic representation of a work queue entry buffer that may be used with the present invention. As described above, work submitted to the coprocessor comes in through the input queue 210 (shown in FIG. 2A) and is described by an associated Work Queue Entry (WQE). These input work queues can be arbitrarily large.
In some embodiments, the network services processor may be initialized with a free list of pointers to empty L2 cache lines. The cache lines may be arranged in bundles having a predetermined size. For example, in one embodiment, software initializes the hardware free list with as many free pointers as the N work queue entries an application can use.
Some of these free pointers may be stored in on-chip registers and function as head and tail pointers to the linked lists in memory. Other pointers may be stored within the free pool allocator (FPA 128, shown in FIG. 1) in L2/DRAM.
When initialized, the in-memory linked list for a given input queue is empty, and the head and tail pointers for each input queue are the same. If on-chip space is available, the hardware immediately buffers the work internally and avoids the overhead of the in-memory linked list. In the event that no on-chip space is available, the SSO unit 148 directs work to an in-memory linked list. Specifically, if on-chip space is not available, the SSO unit 148 prefetches a free pointer from the FPA 128 (FIG. 1) and inserts incoming work queue entries (e.g., WQE0, WQE1, ...) into an internal buffer, bit-packing (compressing) successive incoming work queue entries until there are enough work queue entries to form a bit-packed bundle for storage in an L2/DRAM cache line. The SSO unit 148 stores the bit-packed bundle 410, 420, 430 (plus the free pointer) to L2/DRAM at the tail of the input queue linked list. For faster retrieval, the SSO unit 148 may choose to store the bit-packed bundles 410, 420, 430 only to L2. Once a bit-packed bundle 410, 420, 430 is stored, the tail pointer 440 for the bundles 410, 420, 430 is updated with the free pointer. The entries held in the external input queue form a linked list of work queue entry bundles 410, 420, 430. In some embodiments, at least one external input queue may be included for each class of service. In one embodiment (shown in FIG. 4), each work queue entry bundle may contain 26 work queue entries bit-packed into one 256-byte block of memory.
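The bundling step described above can be sketched as follows. The real hardware bit-packs 26 entries into a 256-byte cache line; this sketch uses byte-aligned fields for clarity, so the structure layout and names are assumptions, not the actual hardware format:

```c
#include <stdint.h>

/* Illustrative sketch of accumulating work queue entries (WQEs) into a
 * fixed-size bundle before spilling it to the in-memory linked list. */
#define ENTRIES_MAX 26   /* entries per bundle, as in the 256-byte block */

struct wqe_bundle {
    uint64_t entries[ENTRIES_MAX]; /* packed work queue pointers (simplified) */
    uint32_t count;                /* how many entries are filled */
    uint64_t next;                 /* free pointer linking to the next bundle */
};

/* Accumulate one WQE pointer; returns 1 when the bundle is full and
 * ready to be stored at the tail of the in-memory linked list. */
static int bundle_add(struct wqe_bundle *b, uint64_t wqp) {
    if (b->count < ENTRIES_MAX)
        b->entries[b->count++] = wqp;
    return b->count == ENTRIES_MAX;
}
```

When `bundle_add` reports full, the bundle (plus the prefetched free pointer in `next`) would be written to the tail of the list and the tail pointer updated, mirroring the sequence in the text.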
Once on-chip processing space is available, the hardware automatically moves work back from L2/DRAM, to be processed in the order in which the work was originally added. Specifically, the hardware reads a bit-packed bundle 410, 420, 430 from the head pointer 450 of the input queue linked list in L2/DRAM and unpacks (decompresses) the bundle 410, 420, 430. Once the bundle is unpacked, the SSO unit 148 releases the head pointer 450 to the FPA 128 (shown in FIG. 1) and updates the head pointer 450 with the next pointer retrieved from the unpacked bundle 410, 420, 430.
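The unbundling step can be sketched in the same illustrative terms (the structure layout and names are assumptions, not the hardware format): drain the packed entries, advance the head with the bundle's stored next pointer, and return the old head pointer to the free pool:

```c
#include <stdint.h>

#define ENTRIES_MAX 26   /* entries per bundle, as in the 256-byte block */

/* Simplified bundle layout; the real format is bit-packed. */
struct wqe_bundle {
    uint64_t entries[ENTRIES_MAX]; /* packed work queue pointers */
    uint32_t count;                /* number of valid entries */
    uint64_t next;                 /* free pointer to the next bundle */
};

/* Unpack (de-encapsulate) a bundle read from the list head: copy the
 * work queue pointers to 'out' and advance *head to the next bundle.
 * Returns the number of entries recovered. In hardware, the old head
 * pointer would then be released back to the free pool allocator. */
static uint32_t bundle_drain(const struct wqe_bundle *b,
                             uint64_t *out, uint64_t *head) {
    for (uint32_t i = 0; i < b->count; i++)
        out[i] = b->entries[i];
    *head = b->next;   /* head updated with the retrieved next pointer */
    return b->count;
}
```

Because bundles are always appended at the tail and drained from the head, work re-enters the SSO unit in the order it was originally added.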
FIG. 5 is a high-level diagram of one embodiment of the present invention. As shown in FIG. 5, the network services processor 100 includes a plurality of independent processor cores 120 that perform work comprising packet processing operations, and a plurality of in-memory linked lists 501 arranged to store work to be performed by the processor cores 120 in response to a lack of processing resources in the processor cores. Work is moved from the in-memory linked lists 501 back to the processor cores 120 in response to the availability of processing resources in the processor cores. The in-memory linked lists 501 may be formed within a portion of processor memory that is independent of the portions describing and processing the work to be performed.
The work to be performed is stored in an input queue (not shown) of an in-memory linked list 501. The in-memory linked list 501 may be stored within a Dynamic Random Access Memory (DRAM)108 or within the L2 cache 130.
As explained earlier, the network services processor 100 maintains a plurality of pointers, allocated by the free pool allocator unit 128, to a plurality of available storage locations in the in-memory linked lists 501. If processing space is not available, the processor 100 prefetches a free pointer from the FPA 128 and inserts work queue entries into an internal buffer (not shown). The processor 100 packs consecutive work queue entries together until there are enough work queue entries to form an encapsulated packet of a predetermined size. Once formed, the encapsulated packet is stored at the tail of an in-memory linked list 501 in the L2 cache 130 (or DRAM 108). Once the encapsulated packet is stored, the tail pointer for the packet is updated with the free pointer.
Once processing resources become available, network services processor 100 retrieves the work to be performed from the head of in-memory linked list 501 and releases the pointer to the retrieved work. The pointer to the head of the in-memory linked list 501 is also updated with the new pointer obtained from the retrieved work.
While this invention has been particularly shown and described with reference to exemplary embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the invention encompassed by the appended claims.
Claims (40)
1. A network services processor comprising:
a plurality of network service processor elements that perform work comprising a plurality of packet processing operations;
a plurality of in-memory linked lists arranged to store entries indicating work to be performed by the network services processor elements; and
a scheduling processor configured to schedule the work for the plurality of network service processor elements, the scheduling processor further configured to 1) detect an availability of a processor to perform the work, and 2) store the entries to the plurality of in-memory linked-lists in response to detecting a lack of available processors to perform the work in the plurality of network service processor elements, and the scheduling processor moves the entries from the plurality of in-memory linked-lists back to a given one of the plurality of network service processor elements in response to detecting an availability of a processor to perform the stored work in the plurality of network service processor elements,
wherein work to be performed by the network services processor is encapsulated in buffers before being stored in an in-memory linked-list, and
wherein the work to be performed by the network services processor element is decapsulated while the work is moved from the in-memory linked-lists back to the network services processor element.
2. The network services processor of claim 1 wherein the in-memory linked-lists are formed within a portion of network services processor memory that is independent of portions describing and processing the work to be performed.
3. The network services processor of claim 1 wherein the work to be performed is stored in an input queue of an in-memory linked list.
4. The network services processor of claim 1 wherein the in-memory linked-lists comprise dynamic random access memory.
5. The network services processor of claim 1, wherein the network services processor maintains pointers to available storage locations in the in-memory linked-lists.
6. The network services processor of claim 5 wherein the network services processor stores the work to be performed in an available storage location indicated by a pointer.
7. The network services processor of claim 6 wherein the network services processor stores the work to be performed at the end of an input queue of the available storage locations.
8. The network services processor of claim 7 wherein the network services processor updates a second pointer to the tail of the input queue with the pointer.
9. The network services processor of claim 5 wherein, in response to availability of a processor for performing the work stored in the plurality of network services processor elements, the network services processor retrieves the work to be performed from an available storage location.
10. The network services processor of claim 9 wherein the network services processor retrieves the work to be performed from a head of an input queue of the available storage location.
11. The network services processor of claim 10 wherein the network services processor releases the pointer when the work to be performed is retrieved.
12. The network services processor of claim 10 wherein the network services processor updates a second pointer to the head of the input queue with a new pointer obtained from the retrieved work.
13. The network services processor of claim 1, wherein the network services processor maintains pointers to available storage locations within the in-memory linked-lists in a free pool allocator.
14. The network services processor of claim 13 wherein the free pool allocator is maintained in a dynamic random access memory.
15. The network services processor of claim 13 wherein, in response to a lack of processors in the network services processor, the network services processor obtains a pointer from the free pool allocator to an available storage location in the in-memory linked-lists.
16. The network services processor of claim 1, wherein the buffer is a buffer of a predetermined size.
17. The network services processor of claim 1 wherein the work to be performed by the network services processor element is decapsulated into separate data packets when moved from the in-memory linked-lists back to the network services processor element.
18. The network services processor of claim 1 wherein each individual packet processing operation defines a job.
19. The network services processor of claim 18 wherein the network services processor maintains one work queue entry for each piece of work.
20. The network services processor of claim 19, wherein the network services processor maintains a predetermined number of pointers to available storage space in the in-memory linked-lists, the predetermined number of pointers being a subset of a total number of work queue entries maintained by the network services processor.
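Claims 1–20 describe the queue machinery in words: an input queue held as an in-memory linked list of predetermined-size buffers (claims 7, 16), with a free pool allocator maintaining pointers to available storage locations (claims 5, 13). The following is a minimal C sketch of such a buffer pool — all identifiers (`wqe_buf`, `free_pool`, `pool_get`, `pool_put`) are invented for illustration, and the LIFO free list is an assumption, not something the claims mandate:

```c
/* Illustrative sketch only, not the patented implementation.  Each
 * fixed-size buffer encapsulates one work-queue entry and carries a
 * link to the next buffer, so an input queue needs only head and tail
 * pointers.  All identifiers here are invented for the example. */
#include <assert.h>
#include <stddef.h>

#define WQE_BUF_SIZE 128              /* hypothetical predetermined size */

typedef struct wqe_buf {
    struct wqe_buf *next;             /* link to the next entry in a list */
    unsigned char data[WQE_BUF_SIZE]; /* encapsulated work-queue entry    */
} wqe_buf;

/* Free pool allocator: a pool of pointers to available buffers. */
typedef struct {
    wqe_buf *free_list;
} free_pool;

/* Obtain a pointer to an available storage location (cf. claim 15). */
static wqe_buf *pool_get(free_pool *p) {
    wqe_buf *b = p->free_list;
    if (b)
        p->free_list = b->next;
    return b;                         /* NULL when the pool is exhausted */
}

/* Release a buffer's pointer back to the pool once its work has been
 * retrieved (cf. claim 11). */
static void pool_put(free_pool *p, wqe_buf *b) {
    b->next = p->free_list;
    p->free_list = b;
}
```

Because the allocator hands out only pointers, the number of outstanding pointers can be kept to a subset of the total number of work queue entries, in the spirit of claim 20.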
21. A method of processing network services, comprising:
scheduling work for a plurality of network services processor elements;
maintaining in-memory linked lists arranged to store entries indicating work to be performed by the plurality of network services processor elements;
detecting availability of a processor to perform the work;
in response to detecting a lack of available processors in the network services processor elements for performing the work, storing the entries in the in-memory linked lists;
in response to detecting availability of a processor in the plurality of network services processor elements for performing the stored work, moving the work to be performed by the network services processor element from the in-memory linked lists back to a given network services processor element of the plurality of network services processor elements;
encapsulating the work to be performed into a buffer before the work is stored in an in-memory linked list; and
decapsulating the work to be performed by the network services processor element when the work is moved back from the in-memory linked-lists to the network services processor element.
22. The method of claim 21, further comprising forming the in-memory linked-lists within a portion of network services processor memory that is independent of portions describing and processing the work to be performed.
23. The method of claim 21, further comprising storing the work to be performed in an input queue of an in-memory linked list.
24. The method of claim 21, wherein the in-memory linked-lists comprise dynamic random access memory.
25. The method of claim 21, further comprising maintaining, in a network services processor, pointers to available storage locations in the in-memory linked-lists.
26. The method of claim 21, further comprising storing the work to be performed at an available storage location indicated by a pointer.
27. The method of claim 26, further comprising storing the work to be performed at the tail of an input queue of the available storage location.
28. The method of claim 27, further comprising updating a second pointer to the tail of the input queue with the pointer.
29. The method of claim 25 further comprising retrieving the work to be performed from an available storage location in response to availability of a processor for performing the stored work in the network services processor element.
30. The method of claim 29, further comprising retrieving the work to be performed from a head of an input queue of the available storage location.
31. The method of claim 30, further comprising releasing the pointer when the work to be performed is retrieved.
32. The method of claim 30, further comprising updating a second pointer to the head of the input queue with a new pointer obtained from the retrieved work.
33. The method of claim 21, further comprising maintaining pointers to available storage locations in the in-memory linked lists in a free pool allocator.
34. The method of claim 33, further comprising maintaining the free pool allocator in a dynamic random access memory.
35. The method of claim 33, further comprising obtaining a pointer from the free pool allocator to an available storage location in the in-memory linked-lists in response to a lack of processors in the network services processor element.
36. The method of claim 21, wherein the buffer is a buffer of a predetermined size.
37. The method of claim 21, further comprising decapsulating the work to be performed into separate data packets when moved from the in-memory linked-lists back to the network services processor element.
38. The method of claim 21, wherein each individual packet processing operation defines a job.
39. The method of claim 38, further comprising maintaining one work queue entry for each piece of work.
40. The method of claim 39, further comprising maintaining a predetermined number of pointers to available storage space in the in-memory linked-lists, the predetermined number of pointers being a subset of a total number of work queue entries maintained by the network services processor.
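Taken together, claims 21–32 describe a FIFO spill/fill discipline: when no processor is free, the encapsulated entry is appended at the tail of the linked list (claims 27–28), and when a processor becomes available the entry at the head is removed, with the head pointer updated from the next-pointer obtained from the retrieved work (claims 30, 32). A hedged C sketch of those two operations, with all identifiers invented and no claim that this matches the patented hardware:

```c
/* Sketch of the claimed enqueue/dequeue steps; not the patented
 * implementation.  `entry` stands in for an encapsulated work-queue
 * entry whose first field is the link used by the list. */
#include <assert.h>
#include <stddef.h>

typedef struct entry {
    struct entry *next;   /* pointer obtained from the retrieved work */
    int           job_id; /* stand-in for the packet processing job   */
} entry;

typedef struct {
    entry *head;          /* work is retrieved here (cf. claim 30) */
    entry *tail;          /* work is stored here (cf. claim 27)    */
} in_queue;

/* Store work at the tail of the input queue when no processor is free. */
static void spill(in_queue *q, entry *e) {
    e->next = NULL;
    if (q->tail)
        q->tail->next = e;   /* link the old tail to the new entry     */
    else
        q->head = e;         /* list was empty                         */
    q->tail = e;             /* update the tail pointer (cf. claim 28) */
}

/* Move work from the head of the queue back to a processor. */
static entry *fill(in_queue *q) {
    entry *e = q->head;
    if (e) {
        q->head = e->next;   /* new head comes from the retrieved
                                work itself (cf. claim 32)        */
        if (!q->head)
            q->tail = NULL;  /* queue drained */
    }
    return e;
}
```

Storing the next-pointer inside the entry itself is what lets the queue live entirely in memory that is separate from the memory describing and processing the work (claim 22): the queue state is just two pointers per list.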
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US13/274,767 | 2011-10-17 | | |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| HK1195959A (en) | 2014-11-28 |
| HK1195959B (en) | 2018-06-15 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN103946803B (en) | Processor with efficient work queuing | |
| US9569366B2 (en) | System and method to provide non-coherent access to a coherent memory system | |
| JP6676027B2 (en) | Multi-core interconnection in network processors | |
| US8935483B2 (en) | Concurrent, coherent cache access for multiple threads in a multi-core, multi-thread network processor | |
| US8321385B2 (en) | Hash processing in a network communications processor architecture | |
| US8537832B2 (en) | Exception detection and thread rescheduling in a multi-core, multi-thread network processor | |
| US8990801B2 (en) | Server switch integration in a virtualized system | |
| US8539199B2 (en) | Hash processing in a network communications processor architecture | |
| US8505013B2 (en) | Reducing data read latency in a network communications processor architecture | |
| US8514874B2 (en) | Thread synchronization in a multi-thread network communications processor architecture | |
| US8943507B2 (en) | Packet assembly module for multi-core, multi-thread network processors | |
| US9059945B2 (en) | Work request processor | |
| US8576864B2 (en) | Host ethernet adapter for handling both endpoint and network node communications | |
| US8910171B2 (en) | Thread synchronization in a multi-thread network communications processor architecture | |
| US8868889B2 (en) | Instruction breakpoints in a multi-core, multi-thread network communications processor architecture | |
| US10146468B2 (en) | Addressless merge command with data item identifier | |
| CN110519180A (en) | Network card virtualization queue scheduling method and system | |
| HK1195959B (en) | Processor with efficient work queuing | |
| HK1195959A (en) | Processor with efficient work queuing | |
| HK1195958B (en) | Multi-core interconnect in a network processor | |
| HK1195958A (en) | Multi-core interconnect in a network processor |