WO2001022667A1 - Method and system for efficient routing table compression and fast routing lookups - Google Patents
- Publication number
- WO2001022667A1 (PCT/SE2000/001841)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- pointer
- bits
- level
- code word
- bit
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- H04Q3/665—Circuit arrangements for traffic distributors
- H04L45/00—Routing or path finding of packets in data switching networks
- H04L45/74591—Address table lookup; Address filtering using content-addressable memories [CAM]
- H04Q2213/13034—A/D conversion, code compression/expansion
- H04Q2213/13056—Routines, finite state machines
- H04Q2213/13097—Numbering, addressing
- H04Q2213/13141—Hunting for free outlet, circuit or channel
- H04Q2213/13196—Connection circuit/link/trunk/junction, bridge, router, gateway
- H04Q2213/13204—Protocols
- H04Q2213/13216—Code signals, frame structure
- H04Q2213/13353—Routing table, map memory
- H04Q2213/13389—LAN, internet
Definitions
- TITLE METHOD AND SYSTEM FOR EFFICIENT ROUTING TABLE COMPRESSION AND FAST ROUTING LOOKUPS
- the present invention relates generally to a method and system of IP routing lookups in a routing table, comprising entries of arbitrary length prefixes with associated next-hop information in a next-hop table, to determine where IP datagrams are to be forwarded.
- the Internet is an interconnected set of networks, wherein each of the constituent networks retains its identity, and special mechanisms are needed for communication across multiple networks.
- the constituent networks are referred to as subnetworks.
- Each subnetwork in the Internet supports communication among the devices connected to that subnetwork.
- IWUs (interworking units)
- a particular IWU is a router, which is used to connect two networks that may or may not be similar.
- the router employs an internet protocol (IP) present in each router and each host of the network.
- IP internet protocol
- IP provides a connectionless or datagram service between stations.
- Routing is generally accomplished by maintaining a routing table in each station and router that indicates, for each possible destination network, the next router to which the IP datagram should be sent.
- IP routers do a routing lookup in a routing table to determine where IP datagrams are to be forwarded.
- the result of the operation is the next-hop on the path towards the destination.
- An entry in a routing table is conceptually an arbitrary length prefix with associated next-hop information. Routing lookups must find the routing entry with the longest matching prefix.
- IP routing lookups have been inherently slow and complex operations; prior art solutions have led to a proliferation of techniques to avoid using them.
- IP router designs can use special-purpose hardware to do IP processing. This can be an inflexible solution: any changes in the IP format or protocol could invalidate such designs. The flexibility of software and the rapid performance increase of general-purpose processors make such solutions preferable.
- a further objective is to increase routing lookup speeds with a conventional microprocessor.
- Another objective is to minimize lookup time in the forwarding table.
- Another further objective of the invention is to provide a data structure which can fit entirely in the cache of a conventional microprocessor.
- an IP routing lookup method and system, in which the system is a data structure that can represent large routing tables in a very compact form and can be searched quickly using few memory references.
- An advantage of the invention is an efficient lookup implementation resulting in better instruction cache behavior and more stable lookup costs.
- FIG 1 is a schematical view of a router design
- FIG 2 is an illustrative view of a direct next-hop encoded in the code word at level 1 in the data structure according to the invention
- FIG 3 is an illustrative view of a direct next-hop encoded in a pointer at level 1 in the data structure according to the invention
- FIG 4 is an illustrative view of a pointer to the level 2 CPA encoded in the code word and pointer at level 1 in the data structure according to the invention
- FIG 5 is an illustrative view of a direct next-hop encoded in the code word at level 2 in the data structure according to the invention
- FIG 6 is an illustrative view of a direct next-hop encoded in the pointer at level 2 in the data structure according to the invention
- FIG 7 is an illustrative view of a pointer to the level 3 CPA encoded in the code word and pointer at level 2 in the data structure according to the invention.
- FIG 8 is an illustrative view of a direct next-hop encoded in the code word at level 3 in the data structure according to the invention
- FIG 9 is an illustrative view of a next-hop index encoded in a pointer at level 3 in the data structure according to the invention
- FIG 10 is a high level view of the method according to the invention for creating data structure means for representing the set of dominating points in the IP address space for each prefix in a routing table;
- FIG 11 is a block diagram of a first embodiment of a hardware implementation layout of the system for IP routing lookup in a routing table according to the invention.
- FIG 12 is a block diagram of a second embodiment of a hardware implementation layout of the system for IP routing lookup in a routing table according to the invention
- FIG 13 is an illustrative view of code words used in a third embodiment of a hardware implementation of the system for IP routing lookup in a routing table according to the invention
- FIG 14 is a block diagram of a third embodiment of a hardware implementation layout of the system for IP routing lookup in a routing table according to the invention.
- FIG 15 is an illustrative view of a 32 bits level 1 code word used in a software implementation of the invention.
Detailed Description of the Invention
- a router design comprises a number of network inbound interfaces 1, network outbound interfaces 2, forwarding engines 3, and a network processor 4, all of which are interconnected by a connection fabric 5.
- Inbound interfaces 1 send packet headers to the forwarding engines 3 through the connection fabric 5.
- the forwarding engines 3 determine which outgoing interface 2 the packet should be sent to. This information is sent back to the inbound interface 1, which forwards the packet to the outbound interface 2.
- the only task of a forwarding engine 3 is to process packet headers. All other tasks, such as participating in routing protocols, resource reservation, handling packets that need extra attention, and other administrative duties, are handled by the network processor 4.
- Each forwarding engine 3 uses a local version of the routing table, a forwarding table, downloaded from the network processor 4 and stored in storage means in the forwarding engine 3, to make its routing decisions. It is not necessary to download a new forwarding table for each routing update. Routing updates can be frequent but since routing protocols need some time to converge, forwarding tables can grow a little stale and need not change more than at most once per second.
- the network processor 4 needs a dynamic routing table designed for fast updates and fast generation of forwarding tables.
- the forwarding tables can be optimized for lookup speed and need not be dynamic.
- the data structure should need few instructions during lookup and keep the entities naturally aligned as much as possible to avoid expensive instructions and cumbersome bit-extraction operations.
- a routing table is a database of prefixes where each prefix is associated with next-hop information in the form of an index into a next-hop table.
- a prefix is essentially a bit string consisting of 0 to 32 bits - the length of the prefix. The first bit is numbered 31, the second is numbered 30, and so on until the bit string ends.
- bits 31, 30, 29, 28, and 27 are specified. This means that such a prefix matches any IP address where bits 31, 30, 29, 28, and 27 of the IP address equal the corresponding bits of the prefix. Since only the five first (most significant) bits of the prefix are specified, it does not matter what bits 0 to 26 of the IP address are.
- a prefix of length zero matches any IP address.
- the result of a routing lookup is always the index associated with the longest matching prefix in the routing table.
- an IP address starting with 101... matches both the prefix 10* and the prefix 101*, where the * denotes the end of the prefix.
- the prefix 101* is longer than 10*. Therefore, the preferred result of a routing lookup among these two prefixes is the index associated with the prefix 101*.
- a prefix can also be represented by a range of size 2^i of the IP address space 0..4294967295, for some non-negative integer i.
- the starting point of such a range must be a multiple of 2^i. Otherwise, it is not a prefix. This is called the power of 2 property.
- the prefix 10* starts at 2147483648 and ends at 3221225471.
- the size of the range is 1073741824, which equals 2^30.
- the starting point equals 2·2^30.
- Every prefix is a range possessing the power of 2 property and vice versa. Therefore, we do not need to distinguish between ranges possessing the power of 2 property and prefixes.
- the length of a prefix is defined as the number of bits specified and the size of a prefix is defined as the size of the range of the IP address space covered by the prefix.
- Computing the binary representation of the starting point of a prefix is achieved by padding the bit string representation of the prefix with zeroes to the right until the bit string becomes 32 bits long.
- the size of a prefix of length i is given by 2^(32-i).
Representing the Routing Table as a Partition of Prefixes
- in a routing table there will typically be a few large prefixes that completely cover smaller prefixes.
- redundant information is removed by creating an auxiliary routing table containing exactly the same routing information but with no covering prefixes - a partition of prefixes. Covering prefixes are recursively split in half until the sizes of the remaining prefixes from the larger prefixes are the same as the smaller prefixes covered, which means that they can be removed. Then, the remaining prefixes from the covering prefix are removed. This method is used repeatedly, starting with the smallest prefixes and continuing to larger and larger prefixes as long as there are larger prefixes left that cover smaller prefixes. The result is a set of prefixes that forms a partition of the IP address space.
- the prefix 001* is covered by the prefix 0*.
- the prefix * still covers all prefixes and is therefore split
- a routing table can also be represented as a binary tree, where each node has zero, one or two children. A node with zero children is called external node or leaf and a node with one or two children is called internal node.
- the next-hop index for the prefix 01* is stored in the right (1st) child of the left (0th) child of the root. The next-hop index for * is stored in the root.
- the tree representation of the final prefix partition resulting from applying the method described above, possesses the following important properties:
- the method is sometimes referred to as tree completion - the tree representation of the original routing table is completed into a full binary tree.
- prefix partitioning yields a set of prefixes, where each prefix is associated with next-hop information, that properly partitions the IP address space; it suffices to represent the partition as the set of the starting points (first IP addresses) of the prefixes.
- the information required for representing the table from the example above is the following.
- the XTC is a data structure for efficiently representing the set of dominating points by using three levels of compressed pointer arrays.
- Each compressed pointer array (CPA) consists of a chunk and an array of associated 20 bits pointers and/or 15 bits next-hop indices.
- the CPA constituting the first level represents an uncompressed pointer array of size 65536 pointers (bits 31..16 of the IP address) and the CPAs constituting levels 2 and 3 represent 256 pointers each.
- a compressed pointer array is represented by a chunk and a list of pointers.
- the chunk consists of code words, where each code word represents 16 bits of an implicit bit array representing the chunk.
- the size of the implicit bit array is the same as the maximum number of pointers in the CPA.
- the bit with index i is set if and only if there is a prefix starting at position i. By starting point, we mean that the bits inspected (31..16 at level 1, 15..8 at level 2, and 7..0 at level 3) of the dominating point, when indexing into the CPA, equal i.
- the implicit bit array is, in turn, represented as 16 implicit bit masks. There are three classes of bit masks:
- next-hop index is encoded directly in the code word.
- Each such code word represents 16 pointers that would be required if the pointer array was not compressed.
- the code words 4..1 will be identical.
- the bit masks with more than two bits set represent prefixes which do not completely cover a bit mask. For each such prefix, the starting point and end point lie in the same bit mask.
- the number of different kinds of bit masks of length 2^x with at least one bit set is given by the following recurrence equation:
- bit masks with two or more bits set.
- Such bit masks and the corresponding pointers are represented by a combination of a code word and a pointer list associated with the code word.
- Each pointer either constitutes a 15 bits next-hop index or a 20 bits pointer to the next level CPA.
- for the prefix with the smallest size (for a prefix with starting point and end point within the same bit, the size is 1).
- Each prefix in the set is either of size 1, 2, 4, or 8. If the size is 16 it covers the bit mask completely and the direct encoding described above is applicable (the bit mask has exactly one bit set). All prefixes that have a larger size than the prefix with the smallest size can be repeatedly split in half until the sizes of all prefixes are equal.
- a part of the IP address will be used to index into the CPA.
- the high 12 (on level 1) or 4 (otherwise) bits are the index of the code word.
- the 4 low (least significant) bits will be used to index into the bit mask of the code word (if the code word does not contain an encoded direct next-hop).
- when indexing into the bit mask we will either index a bit that is set (among the set bits) to retrieve the pointer associated with that bit, or we index a bit which is not set. If the bit is not set, we will need to retrieve the pointer associated with the closest set bit to the left of the indexed bit.
- there is a pointer list associated with each bit mask/code word.
- Such pointer lists will either contain 2, 4, 8, or 16 pointers.
- the index of the closest bit set to the left, among the set bits, (and also the index of the corresponding pointer in the pointer list) can be computed by a simple shift operation. That is, the low 4 bits used to index into the bit mask, in the lookup, are shifted code bits to the right. This is to be compared to the costly map table lookup as is used in the FTC algorithm.
The Chunk
- In the XTC data structure, the chunk consists of an array of code words of fixed size (the size depends on the level). To reduce the number of bits required for representing the index of the first pointer associated with each code word (the beginning of the pointer list associated with the code word), the pointers in the XTC data structure are stored in the same array as the chunks. The pointer array associated with a chunk is stored immediately after the last code word of the chunk.
- the pointer array consists of either 20 bits pointers and/or 15 bits next-hop indices. It is represented by an array of 16 bits integers, where the most significant bit indicates whether the low 15 bits represent a next-hop index or the low 15 bits of a 20 bits pointer.
- To represent a 20 bits pointer to the next level we use 5 bits of the code word for representing the high 5 bits of the 20 bits pointer. These bits are simply appended to the pointer when it is retrieved. This clearly requires that, in any list of pointers to the next level associated with the same bit mask, bits 15..19 must be identical. This is achieved by moving the chunks at the next level during construction if the low 15 bits of the pointers wrap while building the chunks at the next level associated with the bit mask in construction at the current level (see Section 3 for details).
- each code word consists of:
- if the code word directly encodes a next-hop, there are 15 bits representing the next-hop index. Otherwise there are:
- Level 1: a 65536 Elements Compressed Pointer Array
- the first level (level 1) of the XTC data structure is represented by a CPA with 4096 code words and 0..65536 pointers.
- the first step in the lookup is to use bits 31..20 of the IP address as index to the 24 bits code word which is retrieved (in the software implementation, we store the 24 bits code word in a 32 bits integer for alignment reasons). If the most significant bit of the code word is set (represented by white text on black background in Figure 2), the following 15 bits represent a next-hop index which can be retrieved immediately and the lookup is completed. The remaining bits of the code word are unused in this case.
- bits 19..16 of the IP address are used to index into the code word to find the pointer in the pointer list. This is achieved by shifting the 4 bits of the IP address with the 2 bits integer retrieved from the code word (the code), and, thereafter, adding the 16 bits offset also retrieved from the code word (see Figure 3). The resulting number is the index of the pointer stored in the pointer list associated with the chunk. If the most significant bit of the pointer is set, the low 15 bits constitute a next-hop index and the lookup is completed (Figure 3).
- if the pointer does not represent a direct next-hop, its low 15 bits are retrieved and combined with the 5 bits stored in the code word, to form a 20 bits pointer to the level 2 CPA (Figure 4) where the lookup continues.
- the compressed pointer arrays at level 2 are stored in one big array of 16 bits integers, where the size of each CPA depends on the number of pointers in its pointer list, which is stored immediately after the list of code words. Therefore, it is not possible to index the ith CPA at level 2. Instead, the level 2 CPAs have to be referenced by their starting points (the index of the first code word) and that is the reason for needing 20 bits pointers when representing large routing tables.
- Level 2: a 256 Elements Compressed Pointer Array
- the second level of the XTC consists of a big array of level 2 CPAs. Each such CPA consists of 16 code words and 0..256 pointers.
- the third step in the lookup is to use bits 15..12 of the IP address as index to the 16 bits code word which is retrieved. If the most significant bit of the code word is set (represented by white text on black background in Figure 5), the following 15 bits represent a next-hop index which can be retrieved immediately and the lookup is completed.
- otherwise, bits 11..8 of the IP address are used to index into the code word to find the pointer in the pointer list. This is achieved by shifting the 4 bits of the IP address with the 2 bits integer retrieved from the code word (the code), and, thereafter, adding the 8 bits offset also retrieved from the code word (see Figure 6). The resulting number is the index of the pointer stored in the pointer list associated with the chunk. If the most significant bit of the pointer is set, the low 15 bits constitute a next-hop index and the lookup is completed (Figure 6).
- if the pointer does not represent a direct next-hop, its low 15 bits are retrieved and combined with the 5 bits stored in the code word, to form a 20 bits pointer to the level 3 CPA (Figure 7) where the lookup continues.
- the compressed pointer arrays at level 3 are also stored in one big array of 16 bits integers, where the size of each CPA depends on the number of pointers in its pointer list. Hence, also here a 20 bits pointer is needed for representing the location of the next level CPA.
- Level 3: a 256 Elements Compressed Pointer Array
- the third level of the XTC consists of a big array of level 3 CPAs. Each such CPA consists of 16 code words and 0..256 pointers.
- the lookup continues as follows in a similar way as at level 2.
- the fifth step is to use bits 7..4 of the IP address as index to the 16 bits code word which is retrieved. If the most significant bit of the code word is set (represented by white text on black background in Figure 8), the following 15 bits represent a next-hop index which can be retrieved immediately and the lookup is completed.
- otherwise, bits 3..0 of the IP address are used to index into the code word to find the pointer in the pointer list. This is achieved by shifting the 4 bits of the IP address with the 2 bits integer retrieved from the code word (the code), and, thereafter, adding the 8 bits offset also retrieved from the code word (see Figure 9).
- the resulting number is the index of the pointer stored in the pointer list associated with the chunk.
- at this point, all bits of the IP address have been inspected. Hence, the pointer retrieved from the pointer list must represent a next-hop index. Therefore, this step completes the lookup (see Figure 9).
- the next-hop index is computed in 6 memory accesses, accessing a total of 13 (14) bytes, in the worst case.
- the XTC is built by traversing the routing table entries, building the next-hop table, and constructing the compressed XTC representation, in a single pass.
- the only requirement on the construction is that the routing entries are traversed in a certain order, and that a function is called on each entry.
- the representation of the dynamic routing table, from which the XTC forwarding table is built is completely independent from the XTC construction algorithm.
- the complete builder process consists of three steps that are executed simultaneously (see Figure 10).
- the dynamic routing table is traversed in sorted order with respect to the IP address (starting point) and the length of the IP netmask. Entries with smaller values of the IP address are processed before entries with larger IP addresses. For entries with equal IP addresses, the entry with the shortest IP netmask (shortest prefix) is processed first.
- the tree traversal is intimately related with the representation of the dynamic routing table used, which in Effnet's case is a Radix Tree, and is therefore not described.
- Each routing entry is sent to the routing entry conversion and next-hop table generation process as shown in Figure 10.
- This process efficiently splits routing entries into prefixes and discards prefixes that cover other prefixes as described in the introduction. It is also responsible for generating the next-hop table from the next-hop information (interface, MAC address, ATM channel etc.) that is present in the dynamic routing table.
- each prefix resulting from splitting each routing entry is fed into the XTC builder process which is described in detail in the following section.
- the same routing entry conversion and next-hop table generation process can be used both for the FTC builder process and the XTC builder process (and also for other representations). Therefore, a detailed description is not given here. (It will be subject to a separate patent application.)
- each prefix generated immediately propagates to the XTC construction in progress.
- Each prefix has a starting point and an end point as described in the introduction.
- the prefixes are fed to the XTC construction in sorted order.
- the starting point of the next prefix to be processed equals the end point of the current prefix plus one.
- bucket3: Buckets containing 16 pointers associated with the current level 3 code word.
- offset1: Pointer offset associated with current level 1 code word.
- offset2: Pointer offset associated with current level 2 code word.
- offset3: Pointer offset associated with current level 3 code word.
- mask1: Bit mask associated with current level 1 code word.
- mask2: Bit mask associated with current level 2 code word.
- mask3: Bit mask associated with current level 3 code word.
- For each prefix processed, the expect variable is checked against the starting point to ensure that the stream of prefixes is valid. The processing of each prefix consists of an initialization phase, a construction phase, and a completion phase.
- the whole construction is initialized by setting base1, base2, and base3 to the memory addresses of the respective levels, and by zeroing pos2, pos3, from2, and from3.
- Level 1 construction is also initialized by setting chunk1 to base1, and pointer1 to the memory address where the level 1 pointer list is to be stored, i.e. to base1 + 8192 if the pointer list is stored immediately after the chunk.
- offset1 is set to zero.
- the initialization performs the following.
- the depth of construction is 12, 16, 20, 24, 28, or 32
- the following formula is used:
- the initialization continues by processing the particular initialization for the depth computed. However, for one prefix, construction at several depths may need to be initialized, starting at the greatest depth.
- Depth 32 initialization: Determine if the construction of a new level 3 code word begins by using the following formula: !(expect & ((1 << 4) - 1))
- level 3 code word construction is initialized by setting mask3 to 0, bucket3[0], ..., bucket3[15] to EMPTY, and then continuing with the depth 28 initialization.
- bits 15..19 of from3 and pos3 are compared to ensure that they are equal. Recall that from3 and pos3 are the positions of the first level 3 chunk and the current level 3 chunk, respectively, associated with the current level 2 code word in construction. Also, recall that the 20 bits pointers to these chunks will be stored at level 2 with the 5 high bits in the code word and the low 15 bits in the pointers (represented by from3 and pos3) associated with the code word. Therefore, it is required that bits 15..19 of from3 and pos3 are equal.
- Moving the CPAs is then achieved by repeatedly assigning base3[index] to base3[index + delta], where index loops from pos3 - 1 to from3. It remains to update the pointers stored in the buckets associated with the level 2 code word in construction since the locations of the level 3 CPAs associated with these have changed. This is achieved by increasing each pointer stored in the bucket2 array (EMPTY buckets are skipped) by delta and then keeping only the low 15 bits by a simple bit masking operation. Finally, from3 is set to wrap and pos3 is increased by delta, before completing the depth 28 initialization, by setting chunk3 to base3 + pos3, pointer3 to chunk3 + 16, and offset3 to zero, and continuing with the depth 24 initialization.
- level 2 code word construction is initialized by setting mask2 to 0, bucket2[0], ..., bucket2[15] to EMPTY, and then continuing with the depth 20 initialization.
- the depth 20 initialization is completed by setting chunk2 to base2 + pos2, pointer2 to chunk2 + 16, which is the size of the chunk, and offset2 to zero. If the bits are not equal, it is needed to adjust the compressed pointer arrays built at level 2, starting from from2, so that the high 5 bits become equal and the 20 bits pointer encoding is possible. This is achieved by computing pos2 & 0xf8000 and assigning this value to a wrap variable. Then the distance delta of the adjustment (how many positions to move the CPAs) is computed as the difference between wrap and from2.
- Moving the CPAs is then achieved by repeatedly assigning base2[index] to base2[index + delta], where index loops from pos2 - 1 to from2. It remains to update the pointers stored in the buckets associated with the level 1 code word in construction since the locations of the level 2 CPAs associated with these have changed. This is achieved by increasing each pointer stored in the bucket1 array (EMPTY buckets are skipped) by delta and then keeping only the low 15 bits by a simple bit masking operation. Finally, from2 is set to wrap and pos2 is increased by delta, before completing the depth 20 initialization, by setting chunk2 to base2 + pos2, pointer2 to chunk2 + 16, and offset2 to zero, and continuing with the depth 16 initialization.
- Depth 16 initialization: Determine if the construction of a new level 1 code word begins by using the following formula: !(expect & ((1 << 20) - 1))
- level 1 code word construction is initialized by setting mask1 to 0, bucket1[0], ..., bucket1[15] to EMPTY, and then continuing with the depth 12 initialization.
- the expect variable is updated by setting it to the end point of the prefix being processed plus one. Then, the processing continues with the construction phase where construction takes place at the depth computed in the beginning of the processing.
- a direct next-hop stored in a level 3 pointer is built as follows. Compute a bit index bix by extracting bits 0..3 from the starting point of the prefix. Then, bit bix of mask3 is set and the next-hop index of the prefix is stored (without encoding) in bucket3[bix]. In the completion phase, the content of the buckets will be moved to the pointer array associated with the level 3 code word in construction.
- Direct next-hops directly encoded in level 3 code words are built as follows.
- the indices, start and stop, of the first and last code word to be built are computed by extracting bits 4..7 of the starting point and end point of the prefix respectively.
- the next-hop index of the prefix is encoded by setting bit 15 and stored in the chunk by assigning the encoded direct next-hop to chunk3[start], ..., chunk3[stop].
- a direct next-hop stored in a level 2 pointer is built as follows. Compute a bit index bix by extracting bits 8..11 from the starting point of the prefix. Then, bit bix of mask2 is set and the next-hop index of the prefix is encoded by setting bit 15 and is thereafter stored in bucket2[bix]. In the completion phase, the content of the buckets will be moved to the pointer array associated with the level 2 code word in construction.
- Direct next-hops directly encoded in level 2 code words are built as follows.
- the indices, start and stop, of the first and last code word to be built are computed by extracting bits 12..15 of the starting point and end point of the prefix respectively.
- the next-hop index of the prefix is encoded by setting bit 15 and stored in the chunk by assigning the encoded direct next-hop to chunk2[start], ..., chunk2[stop].
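The depth 24 encodings above might be sketched as follows. The names bucket2 and chunk2 follow the text; the helper function names are our own, and a direct next-hop is marked by setting bit 15 of the 16 bits entry as described.

```c
#include <assert.h>
#include <stdint.h>

/* mark a next-hop index as a direct next-hop by setting bit 15 */
static uint16_t encode_direct_nexthop(uint16_t nexthop_index)
{
    return (uint16_t)(nexthop_index | (1u << 15));
}

/* bits 8..11 of the prefix starting point select the level 2 bucket */
static unsigned level2_bucket_index(uint32_t start)
{
    return (start >> 8) & 0xf;
}

/* bits 12..15 of the start and end points select the first and last
 * level 2 code word; the encoded next-hop is stored in all of them */
static void store_in_level2_chunk(uint16_t chunk2[16], uint32_t start,
                                  uint32_t end, uint16_t nexthop_index)
{
    for (unsigned i = (start >> 12) & 0xf; i <= ((end >> 12) & 0xf); i++)
        chunk2[i] = encode_direct_nexthop(nexthop_index);
}
```

The same pattern, with different bit ranges, applies at the other depths described in this section.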
- Depth 16 construction. A direct next-hop stored in a level 1 pointer is built as follows. Compute a bit index bix by extracting bits 16..19 from the starting point of the prefix. Then, bit bix of mask1 is set and the next-hop index of the prefix is encoded by setting bit 15 and is thereafter stored in bucket1[bix]. In the completion phase, the content of the buckets will be moved to the pointer array associated with the level 1 code word in construction.
- Direct next-hops directly encoded in level 1 code words are built as follows.
- the indices, start and stop, of the first and last code word to be built are computed by extracting bits 20..31 of the starting point and end point of the prefix respectively.
- the next-hop index of the prefix is encoded by shifting it 8 bits to the left, setting bit 23, and stored in the chunk by assigning the encoded direct next-hop to chunk1[start], ..., chunk1[stop].
- the level 1 chunk contains 24 bits integers and in this case, the lowest 8 bits are unused. In the software implementation, these 24 bits will be stored in a 32 bits integer for alignment reasons and this requires a slightly changed encoding.
- the processing of the prefix is completed with the completion phase, which starts at the same depth as where the construction took place. For one prefix, completion at several depths may be required starting at the greatest depth. Depth 32 completion:
- the pointers are retrieved from the buckets and stored in the pointer array associated with the code word by looping through the buckets specified in the table above. For each non-EMPTY bucket the content of the bucket is assigned to pointer3[offset3], and for each EMPTY bucket pointer3[offset3 - 1] is assigned to pointer3[offset3]. Thereafter, offset3 is increased by one before proceeding with the next bucket. After processing each bucket listed in the table above, the code word is completed and the processing continues with depth 28 completion.
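The bucket-to-pointer-array loop above can be sketched as follows. An EMPTY bucket repeats the previous pointer, since the prefix covering it continues; the EMPTY sentinel value and the function signature are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define EMPTY 0xffffu   /* assumed sentinel for an empty bucket */

/* Flush nbuckets buckets into the pointer array starting at offset;
 * returns the updated offset. */
static unsigned flush_buckets(const uint16_t *bucket, unsigned nbuckets,
                              uint16_t *pointer, unsigned offset)
{
    for (unsigned i = 0; i < nbuckets; i++) {
        if (bucket[i] != EMPTY)
            pointer[offset] = bucket[i];           /* take the bucket's pointer */
        else
            pointer[offset] = pointer[offset - 1]; /* repeat the previous one */
        offset++;
    }
    return offset;
}
```

The same loop is reused, with pointer2/offset2 and pointer1/offset1, in the depth 24 and depth 20 completions described below.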
- the level 2 bit index bix is computed by extracting bits 8..11 from the starting point of the prefix followed by setting bit bix of mask2.
- the low 15 bits of pos3, which is the position of the level 3 chunk being completed, are stored in bucket2[bix], followed by adding 16 + offset3 to pos3 to prepare for the construction of the next level 3 chunk, before proceeding to the depth 24 completion.
- the index tix of the code word to be completed is computed by extracting bits 12..15 from the starting point of the prefix. Then mask2 is analyzed to compute the 2 bits shift value shift. This is achieved by the same simple procedure as used in the depth 32 completion. Thereafter, the shift value and the encoding are stored in the code word.
- bits 15..19 of the pointers to the level 3 CPAs associated with the code word are extracted from from3 and encoded as bits 10..14 of the code word, followed by setting from3 to pos3 to prepare for the next level 2 code word construction.
- the pointers stored in the buckets must be moved to the pointer list associated with the code word. Depending on the shift value, pointers are either retrieved from 2, 4, 8, or 16 buckets in the same way as in the depth 32 completion.
- the pointers are retrieved from the buckets and stored in the pointer array associated with the code word by looping through the buckets specified in the table above. For each non-EMPTY bucket the content of the bucket is assigned to pointer2[offset2], and for each EMPTY bucket pointer2[offset2 - 1] is assigned to pointer2[offset2]. Thereafter, offset2 is increased by one before proceeding with the next bucket. After processing each bucket listed in the table above, the code word is completed and the processing continues with depth 20 completion.
- the level 1 bit index bix is computed by extracting bits 16..19 from the starting point of the prefix followed by setting bit bix of mask1.
- the low 15 bits of pos2, which is the position of the level 2 chunk being completed, are stored in bucket1[bix], followed by adding 16 + offset2 to pos2 to prepare for the construction of the next level 2 chunk, before proceeding to the depth 16 completion.
- Depth 16 completion. Determine if the construction of a level 1 code word ends by using the following formula (recall that the expect variable has been updated after the initialization phase):
- bits 15..19 of the pointers to the level 2 CPAs associated with the code word are extracted from from2 and encoded as bits 20..23 of the code word, followed by setting from2 to pos2 to prepare for the next level 1 code word construction.
- the pointers stored in the buckets must be moved to the pointer list associated with the code word.
- pointers are either retrieved from 2, 4, 8, or 16 buckets in the same way as in the depth 32 completion.
- the pointers are retrieved from the buckets and stored in the pointer array associated with the code word by looping through the buckets specified in the table above. For each non-EMPTY bucket the content of the bucket is assigned to pointer1[offset1], and for each EMPTY bucket pointer1[offset1 - 1] is assigned to pointer1[offset1]. Thereafter, offset1 is increased by one before proceeding with the next bucket. After processing each bucket listed in the table above, the code word is completed and the processing continues with depth 12 completion. Depth 12 completion:
- the XTC lookup is extremely suitable for hardware implementations.
- In Figure 11, a rough layout of a direct hardware implementation of the algorithm described in the previous sections is shown.
- the IP address is fed into the lookup from the left and progresses through the lookup in six steps until the next-hop index is available to the right. It is assumed that the data structure is stored with each level in a separate memory. That is, one for the one level 1 chunk, one for the array of associated level 1 pointers, one for level 2 chunks and pointers, and one for level 3 chunks and pointers (these can also be mapped into one memory). If the lookup is completed in less than six steps, the result is locked by feeding bit 15 of the result into the chain of multiplexers.
- the code words at level 1 cannot easily be represented by 24 bits without adding an extra memory access. Therefore, these are represented by 32 bits words with the pointer offset stored in the high 16 bits (see Figure 15) . Except for the 8 least significant bits, which are unused if the code word does not represent a direct next-hop, the low 15 bits are used in the same way as at level 2 and 3.
- the performance improvements of the XTC data structure when compared to the FTC data structure come from trading memory usage for memory accesses, simplicity and speed. It is a well known fact that one can easily do an extremely simple routing lookup in one memory access if the memory usage does not matter at all. This is achieved by simply having a 2^32 elements array where the next-hop index for each possible IP address is stored in the slot with the same index as the numerical value of the IP address.
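The one-memory-access scheme just described can be sketched in a few lines, here scaled down to an 8-bit address space for illustration (a real 2^32-entry table of 16 bits next-hop indices would occupy 8 GB):

```c
#include <assert.h>
#include <stdint.h>

/* One memory access per lookup, at an enormous memory cost: the table
 * is indexed directly by the numerical value of the address. */
static uint16_t naive_lookup(const uint16_t table[256], uint8_t addr)
{
    return table[addr];
}
```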
- each code word represents more than 16 bits to reduce the number of code words. For instance, if each code word represents 256 bits instead of 16, 8 bits of the code word must be used for the shift value. In the worst case, the number of added pointers resulting from the simplified XTC encoding then increases by a factor of 256/9. The total number of code words will be 2^24, which is still quite large.
- a better way of increasing the performance without wasting so much memory is to reduce the lookup to four memory accesses. This is easily achieved by building an XTC with only two levels, where for instance level 1 represents the high 20 bits and each level 2 chunk represents the low 12 bits. Thus, each chunk at each level increases by a factor of 16, which also gives a rough estimate of the total growth of the data structure. Possibly, this could be combined with the idea of using "larger" code words at level 1 as described above.
- the most straightforward way of increasing the speed somewhat is to only use XTC techniques at levels 2 and 3 while using direct addressing at level 1. This immediately removes one memory access.
Abstract
A system for IP routing lookup in a routing table, comprising entries of arbitrary length prefixes with associated next-hop information in a next-hop table, for determining where IP datagrams are to be forwarded, and data structure means for representing the set of dominating points in the IP address space for each prefix in the routing table, said means comprising three levels of compressed pointer arrays (CPA), each including a chunk and an array of associated pointers, wherein each pointer comprises a next-hop index or a pointer to the next level of chunk.
Description
APPLICANT: EFFNET GROUP AB
TITLE: METHOD AND SYSTEM FOR EFFICIENT ROUTING TABLE COMPRESSION AND FAST ROUTING LOOKUPS
Field of the Invention
The present invention relates generally to a method and system of IP routing lookups in a routing table, comprising entries of arbitrary length prefixes with associated next-hop information in a next-hop table, to determine where IP datagrams are to be forwarded.
Description of the Prior Art
The Internet is an interconnected set of networks, wherein each of the constituent networks retains its identity, and special mechanisms are needed for communication across multiple networks. The constituent networks are referred to as subnetworks.
Each subnetwork in the Internet supports communication among the devices connected to that subnetwork.
Further, subnetworks are connected by devices referred to as interworking units (IWUs).
A particular IWU is a router, which is used to connect two networks that may or may not be similar. The router employs an internet protocol (IP) present in each router and each host of the network.
IP provides a connectionless or datagram service between stations.
Routing is generally accomplished by maintaining a routing table in each station and router that indicates, for each possible destination network, the next router to which the IP datagram should be sent.
IP routers do a routing lookup in a routing table to determine where IP datagrams are to be forwarded. The result of the operation is the next-hop on the path towards the destination. An entry in a routing table is conceptually an arbitrary length prefix with associated next-hop information. Routing lookups must find the routing entry with the longest matching prefix.
Since IP routing lookups have been inherently slow and complex, operations with prior art solutions have led to a proliferation of techniques to avoid using them.
Most current IP router designs use caching techniques wherein the routing entries of the most recently used destination addresses are kept in a cache. The technique relies on there being enough locality in the traffic so that the cache hit rate is sufficiently high and the cost of a routing lookup is amortized over several packets. These caching methods have worked well in the past. However, as the current rapid growth of the Internet increases the required size of address caches, hardware caches may become uneconomical .
IP router designs can use special-purpose hardware to do IP processing. This can be an inflexible solution. Any changes in the IP format or protocol could invalidate such designs. The flexibility of software and the rapid performance increase of general purpose processors makes such solutions preferable.
Summary of the Invention
It is therefore an objective of the present invention to provide an improved IP routing lookup method and system for performing full routing lookups for each IP packet, which method and system overcome the above mentioned disadvantages.
A further objective is to increase routing lookup speeds with a conventional microprocessor.
Another objective is to minimize lookup time in the forwarding table.
Another further objective of the invention is to provide a data structure which can fit entirely in the cache of a conventional microprocessor.
Consequently, memory accesses will be orders of magnitude faster than if the data structure needs to reside in memory consisting of, for example, relatively slow DRAM. These objectives are obtained with the IP routing lookup method and system according to the invention, which system is a data structure that can represent large routing tables in a very compact form and which can be searched quickly using few memory references.
An advantage of the invention is an efficient lookup implementation resulting in better instruction cache behavior and more stable lookup costs.
Brief Description of the Drawings
In order to explain the invention in more detail and the advantages and features of the invention preferred embodiments will be described in detail below, reference be- ing made to the accompanying drawings, in which
FIG 1 is a schematic view of a router design;
FIG 2 is an illustrative view of a direct next-hop encoded in the code word at level 1 in the data structure according to the invention;
FIG 3 is an illustrative view of a direct next-hop encoded in a pointer at level 1 in the data structure according to the invention;
FIG 4 is an illustrative view of a pointer to the level 2 CPA encoded in the code word and pointer at level 1 in the data structure according to the invention;
FIG 5 is an illustrative view of a direct next-hop encoded in the code word at level 2 in the data structure according to the invention;
FIG 6 is an illustrative view of a direct next-hop encoded in the pointer at level 2 in the data structure according to the invention;
FIG 7 is an illustrative view of a pointer to the level 3 CPA encoded in the code word and pointer at level 2 in the data structure according to the invention;
FIG 8 is an illustrative view of a direct next-hop encoded in the code word at level 3 in the data structure according to the invention;
FIG 9 is an illustrative view of a next-hop index encoded in a pointer at level 3 in the data structure according to the invention;
FIG 10 is a high level view of the method according to the invention for creating data structure means for representing the set of dominating points in the IP address space for each prefix in a routing table;
FIG 11 is a block diagram of a first embodiment of a hardware implementation layout of the system for IP routing lookup in a routing table according to the invention;
FIG 12 is a block diagram of a second embodiment of a hardware implementation layout of the system for IP routing lookup in a routing table according to the invention;
FIG 13 is an illustrative view of code words used in a third embodiment of a hardware implementation of the system for IP routing lookup in a routing table according to the invention;
FIG 14 is a block diagram of a third embodiment of a hardware implementation layout of the system for IP routing lookup in a routing table according to the invention;
FIG 15 is an illustrative view of a 32 bits level 1 code word used in a software implementation of the invention.
Detailed Description of the Invention
With reference to FIG 1, a router design comprises a number of network inbound interfaces 1, network outbound interfaces 2, forwarding engines 3, and a network processor 4, all of which are interconnected by a connection fabric 5. Inbound interfaces 1 send packet headers to the forwarding engines 3 through the connection fabric 5. The forwarding engines 3 in turn determine which outgoing interface 2 the packet should be sent to. This information is sent back to the inbound interface 1, which forwards the packet to the outbound interface 2. The only task of a forwarding engine 3 is to process packet headers. All other tasks, such as participating in routing protocols, resource reservation, handling packets that need extra attention, and other administrative duties, are handled by the network processor 4.
Each forwarding engine 3 uses a local version of the routing table, a forwarding table, downloaded from the network processor 4 and stored in storage means in the forwarding engine 3, to make its routing decisions. It is not necessary to download a new forwarding table for each routing update. Routing updates can be frequent but since routing protocols need some time to converge, forwarding tables can grow a little stale and need not change more than at most once per second. The network processor 4 needs a dynamic routing table designed for fast updates and fast generation of forwarding tables. The forwarding tables, on the other hand, can be optimized for lookup speed and need not be dynamic.
To minimize lookup time, two parameters, the number of memory accesses required during lookup, and the size of the data structure, have to be simultaneously minimized in the data structure in the forwarding table.
Reducing the number of memory accesses required during a lookup is important because memory accesses are relatively slow and usually the bottleneck of lookup procedures. If the data structure can be made small enough, it can fit entirely in the cache of a conventional microprocessor. This means that memory accesses will be orders of magnitude faster than if the data structure needs to reside in a memory consisting of a relatively slow DRAM, as is the case for Patricia trees. If the forwarding table does not fit entirely in the cache, it is still beneficial if a large fraction of the table can reside in the cache. Locality in traffic patterns will keep the most frequently used pieces of the data structure in the cache, so that most lookups will be fast. Moreover, it becomes feasible to use fast SRAM for the small amount of needed external memory. SRAM is expensive, and more expensive the faster it is. For a given cost, the SRAM can be faster if less is needed.
As secondary design goals, the data structure should need few instructions during lookup and keep the entities naturally aligned as much as possible to avoid expensive instructions and cumbersome bit-extraction operations.
The Routing Table Lookup Problem
A routing table is a data base of prefixes where each prefix is associated with next-hop information in the form of an index into a next-hop table. A prefix is essentially a bit string consisting of 0 to 32 bits - the length of the prefix. The first bit is numbered 31, the second is numbered 30, and so on until the bit string ends. In a prefix of length 5, for example, bits 31, 30, 29, 28, and 27 are specified. This means that such a prefix matches any IP address where bits 31, 30, 29, 28, and 27 of the IP address equal the corresponding bits of the prefix. Since only the five first (most significant) bits of the prefix are specified, it does not matter what bits 0 to 26 of the IP address are. A prefix of length zero matches any IP address.
The result of a routing lookup is always the in- dex associated with the longest matching prefix in the routing table.
- For example, an IP address starting with 101..., where bit 31 is the leftmost bit, matches both the prefix 10* and the prefix 101*, where the * denotes the end of the prefix. However, the prefix 101* is longer than 10*. Therefore, the preferred result of a routing lookup among these two prefixes is the index associated with the prefix 101*.
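The matching rule in the example above can be sketched as follows. This is an illustration only: a prefix is given as its significant bits and its length, and it matches an address whose top bits equal those significant bits.

```c
#include <assert.h>
#include <stdint.h>

/* Does the prefix (given as its len significant bits) match addr?
 * A prefix of length zero matches any address. */
static int prefix_matches(uint32_t addr, uint32_t bits, unsigned len)
{
    return len == 0 || (addr >> (32 - len)) == bits;
}
```

An address starting 101... matches both 10* (bits 0b10, length 2) and 101* (bits 0b101, length 3), and the longest-match rule picks the latter.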
- A prefix can also be represented by a range of size 2^i of the IP address space 0..4294967295, for some non-negative integer i. The starting point of such a range must be a multiple of 2^i. Otherwise, it is not a prefix. This is called the power of 2 property. For example, the prefix 10* starts at 2147483648 and ends at 3221225471. The size of the range is 1073741824, which equals 2^30. The starting point equals 2 * 2^30.
- Every prefix is a range possessing the power of 2 property and vice versa. Therefore, we do not need to distinguish between ranges possessing the power of 2 property and prefixes. The length of a prefix is defined as the number of bits specified and the size of a prefix is defined as the size of the range of the IP address space covered by the prefix. Computing the binary representation of the starting point of a prefix is achieved by padding the bit string representation of the prefix with zeroes to the right until the bit string becomes 32 bits long. The size of a prefix of length i is given by 2^(32-i).
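The prefix-to-range arithmetic above can be written out directly: padding the prefix bits with zeroes to the right gives the starting point, and a prefix of length len spans 2^(32-len) addresses.

```c
#include <assert.h>
#include <stdint.h>

/* starting point: the prefix bits padded with zeroes to 32 bits */
static uint32_t prefix_start(uint32_t bits, unsigned len)
{
    return len == 0 ? 0 : bits << (32 - len);
}

/* size of the covered range: 2^(32-len) */
static uint64_t prefix_size(unsigned len)
{
    return 1ull << (32 - len);
}
```

For the prefix 10* from the example (bits 0b10, length 2), this gives start 2 * 2^30 = 2147483648 and size 2^30 = 1073741824, hence end point 3221225471.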
Representing the Routing Table as a Partition of Prefixes
- In a routing table, there will typically be a few large prefixes that completely cover smaller prefixes. To reduce the amount of memory required for representing the routing table, redundant information is removed by creating an auxiliary routing table containing exactly the same routing information but with no covering prefixes - a partition of prefixes. Covering prefixes are recursively split in half until the sizes of the remaining prefixes from the larger prefixes are the same as the smaller prefixes covered, which means that they can be removed. Then, the remaining prefixes from the covering prefix are removed. This method is used repeatedly, starting with the smallest prefixes and continuing to larger and larger prefixes as long as there are larger prefixes left that cover smaller prefixes. The result is a set of prefixes that forms a partition of the IP address space.
- For example, consider the following routing table.
- The prefix 001* is covered by the prefix 0*.
Therefore, the larger prefix is split, which results in
resulting in the routing table
The prefix * still covers all prefixes and is therefore split
By repeatedly splitting 1* we get
- Finally, we have the following routing table where the prefixes properly partition the IP address space.
- A routing table can also be represented as a binary tree, where each node has zero, one or two children. A node with zero children is called an external node or leaf and a node with one or two children is called an internal node. In such a tree, the next-hop index for the prefix 01* is stored in the right (1st) child of the left (0th) child of the root. The next-hop index for * is stored in the root. The tree representation of the final prefix partition, resulting from applying the method described above, possesses the following important properties:
• All internal nodes have exactly two children
- Only external nodes contain next-hop information
- The method is sometimes referred to as tree completion - the tree representation of the original routing table is completed into a full binary tree.
Representing the Partition of Prefixes as the Set of Starting Points of the Prefixes
- Since the prefix partitioning yields a set of prefixes, where each prefix is associated with next-hop information, that properly partitions the IP address space, it suffices to represent the partition as the set of the starting points (first IP addresses) of the prefixes.
Hence, the information required for representing the table from the example above is the following.
Finding the Closest Dominating Points in the Set of Prefixes
- Given a set of points in the set of real numbers, we define the distance between two points p1 and p2 in the norm L∞ (L-infinity) as lim_(k→∞) (|p1 - p2|^k)^(1/k) = |p1 - p2| (in 1 dimension, the L∞ distance equals the Euclidean distance). Moreover, we say that the point p2 dominates the point p1 (p1 is dominated by p2) if and only if p1 < p2. When a routing table is represented as the set P of starting points of the prefixes, as described in the previous section, the routing lookup problem is reduced to the problem of finding the closest dominating point in P. For example, to perform a routing lookup of the IP address 112.16.0.20 in the routing table above we simply need to find the point, in the set of dominating points 0.0.0.0, 32.0.0.0, 64.0.0.0, 128.0.0.0, located at the shortest L∞ distance from 112.16.0.20. Obviously, that point is 128.0.0.0 and the result of the routing lookup is next-hop index 4.
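The closest-dominating-point search can be sketched over a sorted array of starting points: since p2 dominates p1 if and only if p1 < p2, the answer is simply the first point strictly greater than the address. The linear scan here is for clarity only; the XTC data structure described next is precisely a fast way of performing this search.

```c
#include <assert.h>
#include <stdint.h>

/* Return the index of the closest dominating point of addr in the
 * sorted array points, or -1 if no point dominates addr. */
static int closest_dominating(const uint32_t *points, int n, uint32_t addr)
{
    for (int i = 0; i < n; i++)
        if (addr < points[i])
            return i;
    return -1;
}
```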
The XTC Data Structure
- The XTC is a data structure for efficiently representing the set of dominating points by using three levels of compressed pointer arrays. Each compressed pointer array (CPA) consists of a chunk and an array of associated 20 bits pointers and/or 15 bits next-hop indices. The CPA constituting the first level represents an uncompressed pointer array of size 65536 pointers (bits 31..16 of the IP address) and the CPAs constituting levels 2 and 3 represent 256 pointers each.
Compressed Pointer Arrays and Sets of Dominating Points
- A compressed pointer array is represented by a chunk and a list of pointers. The chunk consists of code words, where each code word represents 16 bits of an implicit bit array representing the chunk. The size of the implicit bit array is the same as the maximum number of pointers in the CPA. In the implicit bit array, the bit with index i is set if and only if there is a prefix starting at position i. By starting point, we mean that the bits inspected (31..16 at level 1, 15..8 at level 2, and 7..0 at level 3) of the dominating point, when indexing into the CPA, equal i. The implicit bit array is, in turn, represented as 16 implicit bit masks. There are three classes of bit masks:
- Empty bit mask with no bits set
- Bit mask with exactly one bit set
- Bit mask with two or more bits set
By the power of 2 property, a bit mask with one bit set represents a prefix where the starting point within the chunk is a multiple of 16. It completely covers the whole bit mask, and all the empty bit masks immediately following.
- For example, if there is a prefix with next-hop index 15 starting at position 64 = 4 * 16 (the dominating point representing the prefix equals 64) and ending at position 127 = 7 * 16 + 15, bit mask 4 has only the first bit set while bit masks 5, 6, and 7 are empty.
- For bit masks with one bit set and the empty bit masks immediately following, the next-hop index is encoded directly in the code word. Each such code word represents 16 pointers that would be required if the pointer array was not compressed. In the example above, the code words 4..7 will be identical.
- The bit masks with two or more bits set represent prefixes which do not completely cover a bit mask. For each such prefix, the starting point and end point lie in the same bit mask. By the power of two property, the number of different kinds of bit masks of length 2^i with at least one bit set is given by the following recurrence equation:
- A(0) = 1, A(i) = 1 + A(i - 1)^2
- This means that for i = 1, 2, and 3 the bit masks shown in the table below are possible. For i = 4 there are 677 different cases, which are not listed.
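The recurrence can be evaluated directly to check the counts just stated:

```c
#include <assert.h>

/* Number of bit masks of length 2^i with at least one bit set, under
 * the power of 2 property: A(0) = 1, A(i) = 1 + A(i-1)^2. */
static unsigned num_bitmasks(unsigned i)
{
    if (i == 0)
        return 1;
    unsigned prev = num_bitmasks(i - 1);
    return 1 + prev * prev;
}
```

This gives 2, 5, and 26 for i = 1, 2, and 3, and 677 for i = 4, matching the cases mentioned above.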
- Hence, by the power of 2 property, there are 676 different kinds of 16 bits bit masks with two or more bits set. Such bit masks and the corresponding pointers are represented by a combination of a code word and a pointer list associated with the code word. Each pointer either constitutes a 15 bits next-hop index or a 20 bits pointer to the next level CPA.
Initially, there will be one pointer per bit set in the bit mask. However, in the XTC algorithm, the number of pointers (and bits) will be slightly increased during the encoding in order to reduce the complexity compared to the FTC algorithm.
In the FTC, 10 bits are used to encode the kind of the bit mask as an index into a map table. The map table requires memory accesses, hence slowing down the lookup. To improve the speed, a simpler encoding using only 2 bits instead of 10 is used in the XTC. Moreover, the map table is not used at all, thus reducing the number of memory accesses drastically.
- By inspecting the set of prefixes whose starting points lie in the same bit mask, we can identify the prefix with the smallest size (for a prefix with starting point and end point within the same bit, the size is 1). Each prefix in the set is either of size 1, 2, 4, or 8. If the size is 16 it covers the bit mask completely and the direct encoding described above is applicable (the bit mask has exactly one bit set). All prefixes that have a larger size than the prefix with the smallest size can be repeatedly split in half until the sizes of all prefixes are equal. Each time a prefix is split, a zero bit in the bit mask is set (flipped) and a copy of the pointer associated with the closest set bit to the left of the flipped bit is added directly after the copied pointer in the pointer list. Each split increases the number of pointers by one. After this step we have one of the cases shown in the table below.
- Instead of dealing with 676 different kinds of bit masks as in the FTC algorithm, we now have only four kinds of bit masks. It is easily shown that the number of extra pointers introduced by using this method is 11 if the number of initially set bits is 5. Hence, in the worst case the number of pointers required in the XTC representation compared to the FTC representation is increased by a factor of 16/5. Experiments have shown that the typical increase is a factor of 2.
- In the lookup, a part of the IP address will be used to index into the CPA. The high 12 (on level 1) or 4 (otherwise) bits are the index of the code word. The 4 low (least significant) bits will be used to index into the bit mask of the code word (if the code word does not contain an encoded direct next-hop). By indexing into the bit mask, we will either index a bit that is set (among the set bits) to retrieve the pointer associated with that bit, or we index a bit which is not set. If the bit is not set, we will need to retrieve the pointer associated with the closest set bit to the left of the indexed bit. As mentioned above, there is a pointer list associated with each bit mask/code word. Such pointer lists will either contain 2, 4, 8, or 16 pointers. By encoding bit masks as described in the table above, the index of the closest bit set to the left, among the set bits (and also the index of the corresponding pointer in the pointer list), can be computed by a simple shift operation. That is, the low 4 bits used to index into the bit mask, in the lookup, are shifted code bits to the right. This is to be compared to the costly map table lookup used in the FTC algorithm.
In the XTC data structure, the chunk consists of an array of code words of fixed size (the size depends on the level). To reduce the number of bits required for representing the index of the first pointer associated with each code word (the beginning of the pointer list associated with the code word), the pointers in the XTC data structure are stored in the same array as the chunks. The pointer array associated with a chunk is stored immediately after the last code word of the chunk.
As mentioned above, the pointer array consists of either 20 bits pointers and/or 15 bits next-hop indices. It is represented by an array of 16 bits integers, where the most significant bit indicates whether the low 15 bits represent a next-hop index or the low 15 bits of a 20 bits pointer. To represent a 20 bits pointer to the next level, we use 5 bits of the code word for representing the high 5 bits of the 20 bits pointer. These bits are simply appended to the pointer when it is retrieved. This clearly requires that in any list of pointers to the next level associated with the same bit mask, bits 15..19 must be identical. This is achieved by moving the chunks at the next level during construction if the low 15 bits of the pointers wrap while building the chunks at the next level associated with the bit mask in construction at the current level (see Section 3 for details).
To summarize, each code word consists of:
• 1 bit that indicates whether it is a direct next-hop or not.
If it is a next-hop, there are 15 bits representing the next-hop index. Otherwise there are:
• 2 bits for the code (shift value)
• 5 bits for the high bits of the associated pointers
• 16 (level 1) or 8 (level 2 and 3) bits for representing the pointer offset, i.e. the index of the first pointer associated with the code word, in the pointer array stored directly after the chunk.
Level 1: a 65536 Elements Compressed Pointer Array The first level (level 1) of the XTC data structure is represented by a CPA with 4096 code words and 0..65536 pointers. The first step in the lookup is to use bits 31..20 of the IP address as index to the 24 bits code word, which is retrieved (in the software implementation, we store the 24 bits code word in a 32 bits integer for alignment reasons). If the most significant bit of the code word is set (represented by white text on black background in Figure 2), the following 15 bits represent a next-hop index which can be retrieved immediately and the lookup is completed. Bits 7..0 are unused in this case.
In the second step, bits 19..16 of the IP address are used to index into the code word to find the pointer in the pointer list. This is achieved by shifting the 4 bits of the IP address by the 2 bits integer retrieved from the code word (the code), and thereafter adding the 16 bits offset also retrieved from the code word (see Figure 3). The resulting number is the index of the pointer stored in the pointer list associated with the chunk. If the most significant bit of the pointer is set, the low 15 bits constitute a next-hop index and the lookup is completed (Figure 3).
If the pointer does not represent a direct next-hop, its low 15 bits are retrieved and combined with the 5 bits stored in the code word to form a 20 bits pointer to the level 2 CPA (Figure 4), where the lookup continues.
The compressed pointer arrays at level 2 are stored in one big array of 16 bits integers, where the size of each CPA depends on the number of pointers in its pointer list, which is stored immediately after the list of code words. Therefore, it is not possible to index the ith CPA at level 2. Instead, the level 2 CPAs have to be referenced by their starting points (the index of the first code word), and that is the reason for needing 20 bits pointers when representing large routing tables.
Level 2: a 256 Elements Compressed Pointer Array The second level of the XTC consists of a big array of level 2 CPAs. Each such CPA consists of 16 code words and 0..256 pointers. After the location of the level 2 CPA is computed at level 1, the lookup continues as follows. The third step in the lookup is to use bits 15..12 of the IP address as index to the 16 bits code word, which is retrieved. If the most significant bit of the code word is set (represented by white text on black background in Figure 5), the following 15 bits represent a next-hop index which can be retrieved immediately and the lookup is completed. In the fourth step, bits 11..8 of the IP address are used to index into the code word to find the pointer in the pointer list. This is achieved by shifting the 4 bits of the IP address by the 2 bits integer retrieved from the code word (the code), and thereafter adding the 8 bits offset also retrieved from the code word (see Figure 6). The resulting number is the index of the pointer stored in the pointer list associated with the chunk. If the most significant bit of the pointer is set, the low 15 bits constitute a next-hop index and the lookup is completed (Figure 6).
If the pointer does not represent a direct next-hop, its low 15 bits are retrieved and combined with the 5 bits stored in the code word to form a 20 bits pointer to the level 3 CPA (Figure 7), where the lookup continues.
In the same way as at level 2, the compressed pointer arrays at level 3 are also stored in one big array of 16 bits integers, where the size of each CPA depends on the number of pointers in its pointer list. Hence, also here a 20 bits pointer is needed for representing the location of the next level CPA.
Level 3: a 256 Elements Compressed Pointer Array The third level of the XTC consists of a big array of level 3 CPAs. Each such CPA consists of 16 code words and 0..256 pointers. After the location of the level 3 CPA is computed at level 2, the lookup continues in a similar way as at level 2. The fifth step is to use bits 7..4 of the IP address as index to the 16 bits code word, which is retrieved. If the most significant bit of the code word is set (represented by white text on black background in Figure 8), the following 15 bits represent a next-hop index which can be retrieved immediately and the lookup is completed.
In the sixth step, bits 3..0 of the IP address are used to index into the code word to find the pointer in the pointer list. This is achieved by shifting the 4 bits of the IP address by the 2 bits integer retrieved from the code word (the code), and thereafter adding the 8 bits offset also retrieved from the code word (see Figure 9). The resulting number is the index of the pointer stored in the pointer list associated with the chunk. At this stage, all bits of the IP address have been inspected. Hence, the pointer retrieved from the pointer list must represent a next-hop index. Therefore, this step completes the lookup (see Figure 9).
With these last four sections, we complete the description of the XTC data structure. We have described in great detail how chunks and code words are represented at each level and how the information is encoded in them. Finally, the lookup performed at each level, resulting in either a next-hop index or a pointer to the next level, has been described. The resulting lookup is performed in six steps. In each step, a 16 bits memory access is performed, except for the first access, where 24 (32 in the software implementation) bits are accessed. By using the XTC, the next-hop index is computed in 6 memory accesses, accessing a total of 13 (14) bytes, in the worst case.
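The six lookup steps can be collected into a single routine. The following is a hedged reconstruction from the description above; the struct layout, the array names (chunk1, ptr1, cpa2, cpa3), and the exact bit positions of the code word fields are assumptions consistent with the text, not the authors' implementation:

```c
#include <stdint.h>
#include <assert.h>

/* Assumed memory layout: level 1 chunk and pointer list separate, the
 * level 2 and level 3 CPAs each in one big array of 16-bit integers. */
typedef struct {
    const uint32_t *chunk1;  /* 4096 level 1 code words (24 bits in 32) */
    const uint16_t *ptr1;    /* level 1 pointer list */
    const uint16_t *cpa2;    /* big array holding level 2 CPAs */
    const uint16_t *cpa3;    /* big array holding level 3 CPAs */
} xtc_t;

static uint16_t xtc_lookup(const xtc_t *t, uint32_t ip)
{
    /* steps 1-2: level 1 */
    uint32_t cw = t->chunk1[ip >> 20];
    if (cw & (1u << 23))
        return (cw >> 8) & 0x7fff;              /* direct next-hop */
    uint32_t high5 = (cw >> 3) & 0xf8000;       /* bits 18..22 -> 15..19 */
    uint32_t code  = (cw >> 16) & 3;
    uint16_t p = t->ptr1[(cw & 0xffff) + (((ip >> 16) & 0xf) >> code)];
    if (p & 0x8000)
        return p & 0x7fff;                      /* next-hop index */
    uint32_t cpa = high5 | p;                   /* 20-bit level 2 pointer */

    /* steps 3-4: level 2 */
    uint16_t cw2 = t->cpa2[cpa + ((ip >> 12) & 0xf)];
    if (cw2 & 0x8000)
        return cw2 & 0x7fff;
    high5 = ((uint32_t)(cw2 >> 10) & 0x1f) << 15;
    code  = (cw2 >> 8) & 3;
    p = t->cpa2[cpa + 16 + (cw2 & 0xff) + (((ip >> 8) & 0xf) >> code)];
    if (p & 0x8000)
        return p & 0x7fff;
    cpa = high5 | p;                            /* 20-bit level 3 pointer */

    /* steps 5-6: level 3 */
    uint16_t cw3 = t->cpa3[cpa + ((ip >> 4) & 0xf)];
    if (cw3 & 0x8000)
        return cw3 & 0x7fff;
    code = (cw3 >> 8) & 3;
    p = t->cpa3[cpa + 16 + (cw3 & 0xff) + ((ip & 0xf) >> code)];
    return p & 0x7fff;                          /* must be a next-hop */
}
```

Note how each level performs one code word access and (if no direct next-hop is encoded) one pointer access, matching the six-step, 13-byte worst case stated above.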
XTC Construction
To achieve the fastest possible construction, the XTC is built by traversing the routing table entries, building the next-hop table, and constructing the compressed XTC representation in a single pass. The only requirements on the construction are that the routing entries are traversed in a certain order and that a function is called on each entry. Hence, the representation of the dynamic routing table, from which the XTC forwarding table is built, is completely independent of the XTC construction algorithm.
Routing Entry Conversion and Next-hop Table Construction
The complete builder process consists of three steps that are executed simultaneously (see Figure 10). In the first step, the dynamic routing table is traversed in sorted order with respect to the IP address (starting point) and the length of the IP netmask. Entries with smaller values of the IP address are processed before entries with larger IP addresses. For entries with equal IP addresses, the entry with the shortest IP netmask (shortest prefix) is processed first. The tree traversal is intimately related to the representation of the dynamic routing table used, which in Effnet's case is a Radix Tree, and is therefore not described.
Each routing entry is sent to the routing entry conversion and next-hop table generation process as shown in Figure 10. This process efficiently splits routing entries into prefixes and discards prefixes that cover other prefixes, as described in the introduction. It is also responsible for generating the next-hop table from the next-hop information (interface, MAC address, ATM channel, etc.) present in the dynamic routing table. Finally, each prefix resulting from splitting each routing entry (splitting a routing entry can result in several prefixes) is fed into the XTC builder process, which is described in detail in the following section. The same routing entry conversion and next-hop table generation process can be used both for the FTC builder process and the XTC builder process (and also for other representations). Therefore, its detailed description is not given here. (It will be subject to a separate patent application.)
Depth First, Left to Right XTC Construction When the routing entries are traversed and split into prefixes as described, each prefix generated immediately propagates to the XTC construction in progress. Each prefix has a starting point and an end point as described in the introduction. By the conversion described in the previous section, the prefixes are fed into the XTC construction in sorted order. Moreover, the starting point of the next prefix to be processed equals the end point of the current prefix plus one.
All three levels of the XTC are built simultaneously. Depending on the length of the prefix currently processed, construction takes place at different levels. Sometimes, construction work is required at all levels for one prefix. This requires an immense amount of bookkeeping in the construction. We keep track of the following:
chunk1 Address to the level 1 chunk currently in construction.
chunk2 Address to the level 2 chunk currently in construction.
chunk3 Address to the level 3 chunk currently in construction.
pointer1 Address to the level 1 pointer list currently in construction.
pointer2 Address to the level 2 pointer list currently in construction.
pointer3 Address to the level 3 pointer list currently in construction.
pos2 Position of the level 2 chunk in construction with respect to base2.
pos3 Position of the level 3 chunk in construction with respect to base3.
from2 Position of the first level 2 chunk associated with the current level 1 code word.
from3 Position of the first level 3 chunk associated with the current level 2 code word.
bucket1[] Buckets containing 16 pointers associated with the current level 1 code word.
bucket2[] Buckets containing 16 pointers associated with the current level 2 code word.
bucket3[] Buckets containing 16 pointers associated with the current level 3 code word.
offset1 Pointer offset associated with the current level 1 code word.
offset2 Pointer offset associated with the current level 2 code word.
offset3 Pointer offset associated with the current level 3 code word.
mask1 Bit mask associated with the current level 1 code word.
mask2 Bit mask associated with the current level 2 code word.
mask3 Bit mask associated with the current level 3 code word.
For each prefix processed, the expect variable is checked against the starting point to ensure that the stream of prefixes is valid. The processing of each prefix consists of an initialization phase, a construction phase, and a completion phase. In the initialization phase of the first prefix (starting point zero), the whole construction is initialized by setting base1, base2, and base3 to the memory addresses of the respective levels, and by zeroing pos2, pos3, from2, and from3. Level 1 construction is also initialized by setting chunk1 to base1, and pointer1 to the memory address where the level 1 pointer list is to be stored, i.e. to base1 + 8192 if the pointer list is stored immediately after the chunk. Moreover, offset1 is set to zero.
Then, for all prefixes (including the first), the initialization performs the following. First, the depth of construction (12, 16, 20, 24, 28, or 32) is computed by analyzing the net mask stored in the prefix. To compute the depth from the net mask m, the following formula is used:
32 - ((!(m & 0xf) + !(m & 0xf0) + !(m & 0xf00) + !(m & 0xf000) + !(m & 0xf0000)) << 2),
where & denotes the bitwise and operation, ! denotes the logical not operation, meaning !x = 1 if x = 0, and 0 otherwise, and << denotes the left bit shift operation. Note also that we usually represent bit masks as integers in hexadecimal representation, denoted by 0x in the beginning.
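As a concrete check, the formula can be wrapped in a function (the name is ours) and evaluated for a few net masks:

```c
#include <stdint.h>
#include <assert.h>

/* Depth of construction from the net mask m, per the formula above:
 * each all-zero low nibble of the mask reduces the depth by 4. */
static int depth_from_mask(uint32_t m)
{
    return 32 - ((!(m & 0xf) + !(m & 0xf0) + !(m & 0xf00) +
                  !(m & 0xf000) + !(m & 0xf0000)) << 2);
}
```

For example, a /24 net mask 0xffffff00 has its two lowest nibbles zero, giving a depth of 32 - 8 = 24.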
The initialization continues by processing the particular initialization for the depth computed. However, for one prefix, construction at several depths may need to be initialized, starting at the greatest depth.
Depth 32 initialization: Determine if the construction of a new level 3 code word begins by using the following formula: !(expect & ((1 << 4) - 1))
If the result is false (zero), no level 3 code word initialization is required and the processing continues with the construction phase. Otherwise, level 3 code word construction is initialized by setting mask3 to 0, bucket3[0], ..., bucket3[15] to EMPTY, and then continuing with the depth 28 initialization.
Depth 28 initialization:
Determine if the construction of a new level 3 chunk begins by using the following formula: !(expect & ((1 << 8) - 1)) If the result is false (zero), no level 3 chunk initialization is required and the processing continues with the construction phase. Otherwise, bits 15..19 of from3 and pos3 are compared to ensure that they are equal. Recall that from3 and pos3 are the positions of the first level 3 chunk and the current level 3 chunk, respectively, associated with the current level 2 code word in construction. Also, recall that the 20 bits pointers to these chunks will be stored at level 2 with the 5 high bits in the code word and the low 15 bits in the pointers (represented by from3 and pos3) associated with the code word. Therefore, it is required that bits 15..19 of from3 and pos3 are equal. If the bits are equal, the depth 28 initialization is completed by setting chunk3 to base3 + pos3, pointer3 to chunk3 + 16, which is the size of the chunk, and offset3 to zero. If the bits are not equal, it is necessary to adjust the compressed pointer arrays built at level 3, starting from from3, so that the high 5 bits become equal and the 20 bits pointer encoding is possible. This is achieved by computing pos3 & 0xf8000 and assigning this value to a wrap variable. Then the distance delta of the adjustment (how many positions to move the CPAs) is computed as the difference between wrap and from3. Moving the CPAs is then achieved by repeatedly assigning base3[index] to base3[index + delta], where index loops from pos3 - 1 down to from3. It remains to update the pointers stored in the buckets associated with the level 2 code word in construction, since the locations of the level 3 CPAs associated with these have changed. This is achieved by increasing each pointer stored in the bucket2 array (EMPTY buckets are skipped) by delta and then keeping only the low 15 bits by a simple bit masking operation. Finally, from3 is set to wrap and pos3 is increased by delta, before completing the depth 28 initialization by setting chunk3 to base3 + pos3, pointer3 to chunk3 + 16, and offset3 to zero, and continuing with the depth 24 initialization.
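The adjustment just described can be sketched as follows (a reconstruction under the bookkeeping names above; EMPTY, the exact types, and the in-out parameter style are our assumptions):

```c
#include <stdint.h>
#include <assert.h>

#define EMPTY 0xffffffffu

/* Hedged sketch of the depth 28 wrap adjustment: move the level 3 CPAs
 * built since from3 so that bits 15..19 of from3 and pos3 agree, and
 * re-point the pending level 2 bucket pointers accordingly. */
static void adjust_level3(uint16_t *base3, uint32_t *from3, uint32_t *pos3,
                          uint32_t bucket2[16])
{
    uint32_t wrap  = *pos3 & 0xf8000;       /* target high 5 bits */
    uint32_t delta = wrap - *from3;         /* how far to move the CPAs */

    for (int32_t i = (int32_t)*pos3 - 1; i >= (int32_t)*from3; i--)
        base3[i + delta] = base3[i];        /* shift CPAs upwards */

    for (int b = 0; b < 16; b++)            /* patch pending buckets */
        if (bucket2[b] != EMPTY)
            bucket2[b] = (bucket2[b] + delta) & 0x7fff;

    *from3 = wrap;
    *pos3 += delta;
}
```

After the call, all level 3 CPAs referenced by the current level 2 code word share the same bits 15..19, so the 5-bit/15-bit split of the 20 bits pointer is again encodable.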
Depth 24 initialization:
Determine if the construction of a new level 2 code word begins by using the following formula: !(expect & ((1 << 12) - 1))
If the result is false (zero), no level 2 code word initialization is required and the processing continues with the construction phase. Otherwise, level 2 code word construction is initialized by setting mask2 to 0, bucket2[0], ..., bucket2[15] to EMPTY, and then continuing with the depth 20 initialization.
Depth 20 initialization:
Determine if the construction of a new level 2 chunk begins by using the following formula: !(expect & ((1 << 16) - 1)) If the result is false (zero), no level 2 chunk initialization is required and the processing continues with the construction phase. Otherwise, bits 15..19 of from2 and pos2 are compared to ensure that they are equal. Recall that from2 and pos2 are the positions of the first level 2 chunk and the current level 2 chunk, respectively, associated with the current level 1 code word in construction. Also, recall that the 20 bits pointers to these chunks will be stored at level 1 with the 5 high bits in the code word and the low 15 bits in the pointers (represented by from2 and pos2) associated with the code word. Therefore, it is required that bits 15..19 of from2 and pos2 are equal. If the bits are equal, the depth 20 initialization is completed by setting chunk2 to base2 + pos2, pointer2 to chunk2 + 16, which is the size of the chunk, and offset2 to zero. If the bits are not equal, it is necessary to adjust the compressed pointer arrays built at level 2, starting from from2, so that the high 5 bits become equal and the 20 bits pointer encoding is possible. This is achieved by computing pos2 & 0xf8000 and assigning this value to a wrap variable. Then the distance delta of the adjustment (how many positions to move the CPAs) is computed as the difference between wrap and from2. Moving the CPAs is then achieved by repeatedly assigning base2[index] to base2[index + delta], where index loops from pos2 - 1 down to from2. It remains to update the pointers stored in the buckets associated with the level 1 code word in construction, since the locations of the level 2 CPAs associated with these have changed. This is achieved by increasing each pointer stored in the bucket1 array (EMPTY buckets are skipped) by delta and then keeping only the low 15 bits by a simple bit masking operation. Finally, from2 is set to wrap and pos2 is increased by delta, before completing the depth 20 initialization by setting chunk2 to base2 + pos2, pointer2 to chunk2 + 16, and offset2 to zero, and continuing with the depth 16 initialization.
Depth 16 initialization: Determine if the construction of a new level 1 code word begins by using the following formula: !(expect & ((1 << 20) - 1))
If the result is false (zero), no level 1 code word initialization is required and the processing continues with the construction phase. Otherwise, level 1 code word construction is initialized by setting mask1 to 0, bucket1[0], ..., bucket1[15] to EMPTY, and then continuing with the depth 12 initialization.
Depth 12 initialization:
No initialization is required here; it is taken care of when the whole construction is initialized.
After the initialization phase, the expect variable is updated by setting it to the end point of the prefix being processed plus one. Then the processing continues with the construction phase, where construction takes place at the depth computed at the beginning of the processing.
Depth 32 construction:
A direct next-hop stored in a level 3 pointer is built as follows. Compute a bit index bix by extracting bits 0..3 from the starting point of the prefix. Then, bit bix of mask3 is set and the next-hop index of the prefix is stored (without encoding) in bucket3[bix]. In the completion phase, the contents of the buckets will be moved to the pointer array associated with the level 3 code word in construction.
Depth 28 construction:
Direct next-hops directly encoded in level 3 code words are built as follows. The indices, start and stop, of the first and last code word to be built are computed by extracting bits 4..7 of the starting point and end point of the prefix, respectively. Then the next-hop index of the prefix is encoded by setting bit 15 and stored in the chunk by assigning the encoded direct next-hop to chunk3[start], ..., chunk3[stop].
Depth 24 construction:
A direct next-hop stored in a level 2 pointer is built as follows. Compute a bit index bix by extracting bits 8..11 from the starting point of the prefix. Then, bit bix of mask2 is set and the next-hop index of the prefix is encoded by setting bit 15 and is thereafter stored in bucket2[bix]. In the completion phase, the contents of the buckets will be moved to the pointer array associated with the level 2 code word in construction.
Depth 20 construction:
Direct next-hops directly encoded in level 2 code words are built as follows. The indices, start and stop, of the first and last code word to be built are computed by extracting bits 12..15 of the starting point and end point of the prefix, respectively. Then the next-hop index of the prefix is encoded by setting bit 15 and stored in the chunk by assigning the encoded direct next-hop to chunk2[start], ..., chunk2[stop].
Depth 16 construction: A direct next-hop stored in a level 1 pointer is built as follows. Compute a bit index bix by extracting bits 16..19 from the starting point of the prefix. Then, bit bix of mask1 is set and the next-hop index of the prefix is encoded by setting bit 15 and is thereafter stored in bucket1[bix]. In the completion phase, the contents of the buckets will be moved to the pointer array associated with the level 1 code word in construction.
Depth 12 construction:
Direct next-hops directly encoded in level 1 code words are built as follows. The indices, start and stop, of the first and last code word to be built are computed by extracting bits 20..31 of the starting point and end point of the prefix, respectively. Then the next-hop index of the prefix is encoded by shifting 8 bits to the left, setting bit 23, and stored in the chunk by assigning the encoded direct next-hop to chunk1[start], ..., chunk1[stop]. Recall that the level 1 chunk contains 24 bits integers and, in this case, the lowest 8 bits are unused. In the software implementation, these 24 bits are stored in a 32 bits integer for alignment reasons, and this requires a slightly changed encoding.
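The depth 12 encoding can be sketched as follows (the function name is ours):

```c
#include <stdint.h>
#include <assert.h>

/* Hedged sketch of the depth 12 construction: encode a next-hop index
 * directly into level 1 code words (24 bits, held in 32 bits integers
 * in the software implementation), covering code words start..stop. */
static void build_depth12(uint32_t *chunk1, unsigned start, unsigned stop,
                          uint16_t nhop)
{
    /* shift the next-hop 8 bits left, set bit 23 to mark a direct hop */
    uint32_t enc = ((uint32_t)nhop << 8) | (1u << 23);
    for (unsigned i = start; i <= stop; i++)
        chunk1[i] = enc;
}
```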
The processing of the prefix ends with the completion phase, which starts at the same depth as where the construction took place. For one prefix, completion at several depths may be required, starting at the greatest depth.
Depth 32 completion:
Determine if the construction of a level 3 code word ends by using the following formula (recall that the expect variable has been updated after the initialization phase):
!(expect & ((1 << 4) - 1))
If the result is false (zero), one or more prefixes are needed to complete the level 3 code word and the processing of the current prefix is completed. Otherwise, the index tix of the code word to be completed is computed by extracting bits 4..7 from the starting point of the prefix. Then mask3 is analyzed to compute the 2 bits shift value shift. This is achieved by the following simple procedure:
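The listing of this procedure is not reproduced in the text; the following is a plausible reconstruction (the function name is ours), relying on the fact that after prefix splitting the set bits of a completed mask are evenly spaced, so testing a few low bits suffices:

```c
#include <stdint.h>
#include <assert.h>

/* Hedged reconstruction: derive the 2 bits shift value from a completed
 * bit mask whose set bits have uniform spacing (bit 0 always set). */
static unsigned shift_from_mask(uint16_t mask)
{
    if (mask & 0x0002) return 0;   /* every bit set: 16 pointers */
    if (mask & 0x0004) return 1;   /* every 2nd bit: 8 pointers  */
    if (mask & 0x0010) return 2;   /* every 4th bit: 4 pointers  */
    return 3;                      /* bits 0 and 8:  2 pointers  */
}
```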
Thereafter, the encoding (shift << 8) + offset3 of the code word is computed and stored in chunk3[tix]. To complete the code word, the pointers stored in the buckets must be moved to the pointer list associated with the code word. Depending on the shift value, pointers are retrieved from either 2, 4, 8, or 16 buckets.
Shift value  Buckets
0            0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
1            0, 2, 4, 6, 8, 10, 12, 14
2            0, 4, 8, 12
3            0, 8
The pointers are retrieved from the buckets and stored in the pointer array associated with the code word by looping through the buckets specified in the table above. For each non-EMPTY bucket, the content of the bucket is assigned to pointer3[offset3], and for each EMPTY bucket, pointer3[offset3 - 1] is assigned to pointer3[offset3]. Thereafter, offset3 is increased by one before proceeding with the next bucket. After processing each bucket listed in the table above, the code word is completed and the processing continues with the depth 28 completion.
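A sketch of this bucket flush (the names and the EMPTY sentinel are our assumptions):

```c
#include <stdint.h>
#include <assert.h>

#define EMPTY 0xffffffffu

/* Hedged sketch of moving bucket contents to the pointer list at the
 * depth 32 completion: buckets are visited with stride (1 << shift),
 * and an EMPTY bucket repeats the previously emitted pointer. */
static unsigned flush_buckets(const uint32_t bucket3[16], unsigned shift,
                              uint16_t *pointer3, unsigned offset3)
{
    for (unsigned b = 0; b < 16; b += 1u << shift) {
        pointer3[offset3] = (bucket3[b] != EMPTY)
                          ? (uint16_t)bucket3[b]
                          : pointer3[offset3 - 1];
        offset3++;
    }
    return offset3;               /* updated pointer offset */
}
```

The stride reproduces the bucket selection of the table above: shift 0 visits all 16 buckets, shift 3 visits only buckets 0 and 8.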
Depth 28 completion:
Determine if the construction of a level 3 chunk ends by using the following formula (recall that the expect variable has been updated after the initialization phase): !(expect & ((1 << 8) - 1))
If the result is false (zero), one or more prefixes are needed to complete the level 3 chunk and the processing of the current prefix is completed. Otherwise, the level 2 bit index bix is computed by extracting bits 8..11 from the starting point of the prefix, followed by setting bit bix of mask2. Finally, the low 15 bits of pos3, which is the position of the level 3 chunk being completed, are stored in bucket2[bix], followed by adding 16 + offset3 to pos3 to prepare for the construction of the next level 3 chunk, before proceeding to the depth 24 completion.
Depth 24 completion:
Determine if the construction of a level 2 code word ends by using the following formula (recall that the expect variable has been updated after the initialization phase): !(expect & ((1 << 12) - 1))
If the result is false (zero), one or more prefixes are needed to complete the level 2 code word and the processing of the current prefix is completed. Otherwise, the index tix of the code word to be completed is computed by extracting bits 12..15 from the starting point of the prefix. Then mask2 is analyzed to compute the 2 bits shift value shift. This is achieved by the same simple procedure as used in the depth 32 completion. Thereafter, the encoding
((from3 & 0xf8000) >> 5) + (shift << 8) + offset2 of the code word is computed and stored in chunk2[tix]. In addition to the encoding made in the depth 32 completion, bits 15..19 of the pointers to the level 3 CPAs associated with the code word are extracted from from3 and encoded as bits 10..14 of the code word, followed by setting from3 to pos3 to prepare for the next level 2 code word construction. To complete the code word, the pointers stored in the buckets must be moved to the pointer list associated with the code word. Depending on the shift value, pointers are retrieved from either 2, 4, 8, or 16 buckets in the same way as in the depth 32 completion. The pointers are retrieved from the buckets and stored in the pointer array associated with the code word by looping through the buckets specified in the table above. For each non-EMPTY bucket, the content of the bucket is assigned to pointer2[offset2], and for each EMPTY bucket, pointer2[offset2 - 1] is assigned to pointer2[offset2]. Thereafter, offset2 is increased by one before proceeding with the next bucket. After processing each bucket listed in the table above, the code word is completed and the processing continues with the depth 20 completion.
Depth 20 completion:
Determine if the construction of a level 2 chunk ends by using the following formula (recall that the expect variable has been updated after the initialization phase):
!(expect & ((1 << 16) - 1))
If the result is false (zero), one or more prefixes are needed to complete the level 2 chunk and the processing of the current prefix is completed. Otherwise, the level 1 bit index bix is computed by extracting bits 16..19 from the starting point of the prefix, followed by setting bit bix of mask1. Finally, the low 15 bits of pos2, which is the position of the level 2 chunk being completed, are stored in bucket1[bix], followed by adding 16 + offset2 to pos2 to prepare for the construction of the next level 2 chunk, before proceeding to the depth 16 completion.
Depth 16 completion: Determine if the construction of a level 1 code word ends by using the following formula (recall that the expect variable has been updated after the initialization phase):
!(expect & ((1 << 20) - 1))
If the result is false (zero), one or more prefixes are needed to complete the level 1 code word and the processing of the current prefix is completed. Otherwise, the index tix of the code word to be completed is computed by extracting bits 20..31 from the starting point of the prefix. Then mask1 is analyzed to compute the 2 bits shift value shift. This is achieved by the same simple procedure as used in the depth 32 completion. Thereafter, the encoding
((from2 & 0xf8000) << 3) + (shift << 16) + offset1 of the code word is computed and stored in chunk1[tix]. In addition to the encoding made in the depth 32 completion, bits 15..19 of the pointers to the level 2 CPAs associated with the code word are extracted from from2 and encoded as bits 18..22 of the code word, followed by setting from2 to pos2 to prepare for the next level 1 code word construction. To complete the code word, the pointers stored in the buckets must be moved to the pointer list associated with the code word. Depending on the shift value, pointers are retrieved from either 2, 4, 8, or 16 buckets in the same way as in the depth 32 completion. The pointers are retrieved from the buckets and stored in the pointer array associated with the code word by looping through the buckets specified in the table above. For each non-EMPTY bucket, the content of the bucket is assigned to pointer1[offset1], and for each EMPTY bucket, pointer1[offset1 - 1] is assigned to pointer1[offset1]. Thereafter, offset1 is increased by one before proceeding with the next bucket. After processing each bucket listed in the table above, the code word is completed and the processing continues with the depth 12 completion.
Depth 12 completion:
The XTC construction is completed. In the construction software, some statistics regarding the construction are updated, but no work is really necessary to perform the completion at depth 12.
With this section, the complete method of processing a list of routing entries repeatedly fed into the construction, compressing and converting them into the XTC representation, has been described.
XTC Lookup Implementations In this section we describe a number of possible implementations of the XTC lookup, both in hardware and in software. Some of these are more or less straightforward, but some deviate slightly from the description made above and require minor changes in the representation of the data structure and the builder process. Straight Forward Hardware Implementation
The XTC lookup is extremely suitable for hardware implementations. In Figure 11, a rough layout of a direct hardware implementation of the algorithm described in the previous sections is shown. The IP address is fed into the lookup from the left and progresses through the lookup in six steps until the next-hop index is available to the right. It is assumed that the data structure is stored with each level in a separate memory. That is, one for the one level 1 chunk, one for the array of associated level 1 pointers, one for level 2 chunks and pointers, and one for level 3 chunks and pointers (these can also be mapped into one memory). If the lookup is completed in less than six steps, the result is locked by feeding bit 15 of the result into the chain of multiplexers.
A variation of the design stores the pointers in a separate memory instead of storing the pointer associated with a chunk directly after the chunk.
This gives a slightly simpler design compared to the original design, as shown in Figure 14. The reason is that each chunk occupies exactly 16 24-bit words in the chunk memory of the level it belongs to, which implies that the starting point of a chunk is always a multiple of 16. Hence, 4 bits are saved in the chunk reference, so that the 15 bits of the pointer suffice to refer to the next level chunk. Therefore, it is no longer necessary to encode the high bits of the pointers in the code word, nor to extract these bits to get a "big enough" pointer. The lookup can be pipelined in six steps, which means that each time a memory access is completed, a routing lookup is also completed. With this rather simple lookup design, a performance of over 100 million packets per second can be achieved.
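The address arithmetic behind the saved 4 bits can be sketched as follows; the function name and the word-addressed memory model are assumptions for illustration.

```c
#include <stdint.h>

/* Each chunk is padded to exactly 16 24-bit words, so a chunk always
 * starts at a word address that is a multiple of 16.  A 15-bit pointer
 * therefore suffices to address a chunk: its word address is the
 * pointer value shifted left by 4 (multiplied by 16), with no high
 * bits needed from the code word. */
static uint32_t chunk_word_address(uint32_t pointer15)
{
    return (pointer15 & 0x7FFFu) << 4;  /* 15-bit pointer -> word address */
}
```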
Straight Forward Software Implementation
In the software implementation, the code words at level 1 cannot easily be represented by 24 bits without adding an extra memory access. Therefore, these are represented by 32-bit words with the pointer offset stored in the high 16 bits (see Figure 15). Except for the 8 least significant bits, which are unused if the code word does not represent a direct next-hop, the low 15 bits are used in the same way as at levels 2 and 3.
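Accessing such a 32-bit level 1 code word can be sketched as follows; the exact bit positions (flag bit at bit 15, next-hop index in the low 15 bits) and the helper names are assumptions based on the description of Figure 15, not a definitive layout.

```c
#include <stdint.h>

/* Level 1 code word in the software variant: a 32-bit word with the
 * 16-bit pointer offset in the high half, the low half read like a
 * level 2/3 code word (assumed: flag bit at bit 15, 15-bit next-hop
 * index in bits 14..0 when the flag is set). */
static int      cw1_is_next_hop(uint32_t cw) { return (cw >> 15) & 1u; }
static uint32_t cw1_next_hop(uint32_t cw)    { return cw & 0x7FFFu;    }
static uint32_t cw1_offset(uint32_t cw)      { return cw >> 16;        }
```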
Moreover, the whole data structure is stored in a sufficiently large consecutive block of memory with the space allocated for each of the three levels specified by a constant. Except for these small deviations, the software version of the lookup is a straightforward implementation of the algorithm described.
Performance Improvements
The performance improvements of the XTC data structure when compared to the FTC data structure come from trading memory usage for memory accesses, simplicity and speed. It is a well-known fact that one can easily do an extremely simple routing lookup in one memory access if memory usage does not matter at all. This is achieved by simply having a 2^32-element array where the next-hop index for each possible IP address is stored in the slot with the same index as the numerical value of the IP address.
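The one-access scheme can be sketched directly; with 15-bit next-hop indices stored in 16-bit slots, the table would occupy 2^32 × 2 bytes = 8 GB, which is the memory cost the text alludes to.

```c
#include <stdint.h>
#include <stdlib.h>

/* One-memory-access lookup: a flat table with one slot per possible
 * 32-bit IP address; the numerical value of the address is the index. */
static uint16_t *next_hop_table;  /* 2^32 entries in a real deployment */

static uint16_t lookup_flat(uint32_t ip)
{
    return next_hop_table[ip];
}
```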
To achieve a trade-off, it is possible to do an XTC lookup in only two memory accesses by using only one level with 2^28 code words, which still consumes an extreme amount of memory. A possibility might be to let each code word represent more than 16 bits to reduce the number of code words. For instance, if each code word represents 256 bits instead of 16, 8 bits of the code word must be used for the shift value. In the worst case, the number of added pointers resulting from the simplified XTC encoding then increases by a factor of 256/9. The total number of code words will be 2^24, which is still quite large.
A better way of increasing performance without wasting so much memory is to reduce the lookup to four memory accesses. This is easily achieved by building an XTC with only two levels, where for instance level 1 represents the high 20 bits and each level 2 chunk represents the low 12 bits. Thus, each chunk at each level increases by a factor of 16, which also gives a rough estimate of the total growth of the data structure. Possibly, this could be combined with the idea of using "larger" code words at level 1 as described above.
The most straightforward way of increasing the speed somewhat is to use XTC techniques only at levels 2 and 3 while using direct addressing at level 1. This immediately removes one memory access.
The focus in this section has been on ideas of how to increase performance by wasting more memory. If the goal instead is to compress the representation as much as possible, the opposite of these ideas can be used, i.e. "smaller" code words and more levels.
It should be apparent that the present invention provides an improved IP routing lookup method and system that fully satisfies the aims and advantages set forth above. Although the invention has been described in conjunction with specific embodiments thereof, alternatives, modifications and variations will be apparent to those skilled in the art.
Claims
1. A system for IP routing lookup in a routing table, comprising entries of arbitrary length prefixes with associated next-hop information in a next-hop table, for determining where IP datagrams are to be forwarded, characterized by data structure means for representing the set of dominating points in the IP address space for each prefix in the routing table, said means comprising three levels of compressed pointer arrays (CPA), each including a chunk and an array of associated pointers, wherein each pointer comprises a next-hop index or a pointer to the next level of chunk.
2. A system according to claim 1, characterized in that said pointers comprise 20 bits and said next-hop indices comprise 15 bits.
3. A system according to claim 1 or 2, characterized in that the CPA of the first level represents an uncompressed pointer array of size 65536 pointers, bits 31..16 of the IP address, and the CPAs of levels 2 and 3 represent 256 pointers each, bits 15..8 and 7..0 of the IP address respectively.
4. A system according to any of the preceding claims 1-3, characterized in that the chunk comprises code words, wherein each code word represents 16 bits of an implicit bit array representing the chunk, and in that the size of the implicit bit array is the same as the maximum number of pointers in the CPA.
5. A system according to claim 4, characterized in that in the implicit bit array the bit with index i is set if and only if there is a prefix starting at position i.
6. A system according to claim 5, characterized in that the implicit bit array is represented by implicit 16 bits bit masks, wherein each bit mask corresponds to a code word, including an empty bit mask with no bits set, a bit mask with exactly one bit set, and/or a bit mask with two or more bits set.
7. A system according to claim 6, characterized in that the starting point within the chunk is a multiple of 16, and it completely covers the whole bit mask, and all the empty bit masks immediately following.
8. A system according to claim 7, characterized in that for bit masks with one bit set and the empty bit masks immediately following, the next-hop index is encoded directly in the code word.
9. A system according to claim 8, characterized by 676 different kinds of 16 bits bit masks, having two or more bits set, wherein the bit masks and the corresponding pointers are represented by a combination of a code word and a list of pointers associated with the code word.
10. A system according to claim 9, characterized by means capable of identifying a prefix with the smallest size in a set of prefixes having their starting points in the same bit mask, dividing prefixes having a larger size than the prefix with the smallest size in half until the sizes of all prefixes have an equal smallest size, wherein each time a prefix is split, a zero bit is flipped in the bitmask and a copy of the pointer associated with the closest set bit to the left of the flipped bit is added directly after the copied pointer in the pointer list.
11. A system according to claim 10, characterized in that the pointer list comprises 2, 4, 8 or 16 pointers associated with each code word.
12. A system according to claim 10, characterized in that each code word comprises: 1 bit that indicates if it is a direct next-hop or not; if it is a next-hop, 15 bits representing the next-hop index; and if it is not a next-hop, 2 bits shift value for the code, 5 bits for the high bits of the associated pointers, and 16 bits for level 1 or 8 bits for levels 2 and 3 for representing the pointer offset, i.e. the index of the first pointer associated with the code word, in the pointer array stored directly after the chunk.
13. A method of IP routing lookup in a routing table, comprising entries of arbitrary length routing prefixes with associated next-hop information in a next-hop table, for determining where IP datagrams are to be forwarded, characterized by the steps of: accessing a first level code word (23..0) in a chunk in a first level compressed pointer array at a location corresponding to a first index part (31..20) of the IP address, and if the most significant bit of the code word is set, retrieving the next-hop index bits (22..8) in the code word.
14. A method according to claim 13, characterized by the further steps of: if the most significant bit of the code word is not set, shifting a second index part (19..16) of the IP address with shift bits (17..16) of the code word, adding offset bits (14..0) of the code word to the shifted second index part forming a pointer index, accessing a first level pointer at the pointer index location in a list of pointers of the first level compressed pointer array, and if the most significant bit of the pointer is set, retrieving the next-hop index bits (14..0) in the pointer.
15. A method according to claim 14, characterized by the further step of: if the most significant bit of the pointer is not set, retrieving and combining the remaining bits of the pointer (14..0) with a pointer bits part (22..18) of the code word forming a 20-bit second level starting pointer to the starting point of a second level compressed pointer array of a set of second level compressed pointer arrays.
16. A method according to claim 15, characterized by the further steps of: accessing a second level code word (15..0) in a chunk in the second level compressed pointer array at a location corresponding to a third index part (15..12) of the IP address, and if the most significant bit of the code word is set, retrieving the next-hop index bits (14..0) in the code word.
17. A method according to claim 16, characterized by the further steps of: if the most significant bit of the code word is not set, shifting a fourth index part (11..8) of the IP address with shift bits (9..8) of the code word, adding offset bits (7..0) of the code word to the shifted fourth index part forming a pointer index, accessing a second level pointer at the pointer index location in a list of pointers of the second level compressed pointer array, and if the most significant bit of the pointer is set, retrieving the next-hop index bits (14..0) in the pointer.
18. A method according to claim 17, characterized by the further steps of: if the most significant bit of the second level pointer is not set, retrieving and combining the remaining bits of the pointer with a pointer bits part of the code word forming a 20-bit third level starting pointer to the starting point of a third level compressed pointer array of a set of third level compressed pointer arrays.
19. A method according to claim 18, characterized by the further steps of: accessing a third level code word (15..0) in a chunk in the third level compressed pointer array at a location corresponding to a fifth index part (7..4) of the IP address, and if the most significant bit of the code word is set, retrieving the next-hop index bits (14..0) in the code word.
20. A method according to claim 19, characterized by the further steps of: if the most significant bit of the code word is not set, shifting a sixth index part (3..0) of the IP address with shift bits (9..8) of the code word, adding offset bits (7..0) of the code word to the shifted sixth index part forming a pointer index, accessing a third level pointer at the pointer index location in a list of pointers of the third level compressed pointer array, retrieving the next-hop index bits (14..0) in the pointer.
21. A computer program product directly loadable into the internal memory of a digital computer, characterized in that said product comprises software code means for performing the steps of any of the claims 13-20.
22. A computer program product comprising a computer readable medium, characterized in that said medium stores computer program code means which, when loaded on a computer, cause the computer to perform the steps of any of the claims 13-20.
23. A system for IP routing lookup in a routing table, comprising entries of arbitrary length prefixes with associated next-hop information in a next-hop table, for determining where IP datagrams are to be forwarded, characterized by means for performing the steps of any of the claims 13-20.
24. A method for creating data structure means for use in a system of IP routing lookup in a routing table, comprising entries of arbitrary length routing prefixes with associated next-hop information in a next-hop table, for determining where IP datagrams are to be forwarded, characterized by the step of: creating data structure means for representing the set of dominating points in the IP address space for each prefix in the routing table, comprising three levels of compressed pointer arrays (CPA), each including a chunk and an array of associated pointers, wherein each pointer comprises a next-hop index or a pointer to the next level of chunk.
25. A computer program product directly loadable into the internal memory of a digital computer, characterized in that said product comprises software code means for performing the step of claim 24.
26. A computer program product comprising a computer readable medium, characterized in that said medium stores computer program code means which, when loaded on a computer, cause the computer to perform the step of claim 24.
27. System for creating data structure means for use in a system of IP routing lookup in a routing table, comprising entries of arbitrary length routing prefixes with associated next-hop information in a next-hop table, for determining where IP datagrams are to be forwarded, characterized by means for performing the step of claim 24.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| AU78193/00A AU7819300A (en) | 1999-09-22 | 2000-09-22 | Method and system for efficient routing table compression and fast routing lookups |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| SE9903460-5 | 1999-09-22 | ||
| SE9903460A SE9903460L (en) | 1999-09-22 | 1999-09-22 | Method and system for fast routing lookups |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2001022667A1 true WO2001022667A1 (en) | 2001-03-29 |
Family
ID=20417140
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/SE2000/001841 Ceased WO2001022667A1 (en) | 1999-09-22 | 2000-09-22 | Method and system for efficient routing table compression and fast routing lookups |
Country Status (4)
| Country | Link |
|---|---|
| AU (1) | AU7819300A (en) |
| SE (1) | SE9903460L (en) |
| TW (1) | TW508926B (en) |
| WO (1) | WO2001022667A1 (en) |
Cited By (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2003063427A1 (en) * | 2002-01-21 | 2003-07-31 | Rockstorm Technologies Ab | Method and system for fast ip routing lookup using forwarding tables with guaranteed compression rate and lookup performance |
| CN1330190C (en) * | 2001-08-14 | 2007-08-01 | 中兴通讯股份有限公司 | Method for selecting route based on user' IP address route |
| WO2008119282A1 (en) * | 2007-03-31 | 2008-10-09 | Huawei Technologies Co., Ltd. | Method for routing lookup and system for forwarding |
| US9361404B2 (en) | 2014-05-07 | 2016-06-07 | Red Hat Israel, Ltd. | Offline radix tree compression with key sequence skip |
| US9442927B2 (en) | 2014-05-07 | 2016-09-13 | Red Hat Israel, Ltd. | Offline generation of compressed radix tree with key sequence skip |
| WO2016183944A1 (en) * | 2015-05-19 | 2016-11-24 | 中兴通讯股份有限公司 | Message forwarding configuration method and apparatus for communication device, and message forwarding method |
| CN119766724A (en) * | 2024-11-25 | 2025-04-04 | 天翼云科技有限公司 | Routing table updating method and device, electronic equipment and readable storage medium |
Families Citing this family (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| KR100983150B1 (en) | 2005-11-14 | 2010-09-20 | 미쓰비시덴키 가부시키가이샤 | Network unit and programmable controller using it |
| TWI413910B (en) * | 2008-01-25 | 2013-11-01 | Univ Nat Taiwan | Numerical data range interval query method and system |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO1999013619A2 (en) * | 1997-09-09 | 1999-03-18 | Sics Swedish Institute Of Computer Science | A lookup device and a method for classification and forwarding of packets |
| WO1999014906A1 (en) * | 1997-09-15 | 1999-03-25 | Effnet Group Ab | Method and system for fast routing lookups |
- 1999-09-22: SE SE9903460A patent/SE9903460L/en not_active Application Discontinuation
- 2000-09-21: TW TW089119541A patent/TW508926B/en active
- 2000-09-22: WO PCT/SE2000/001841 patent/WO2001022667A1/en not_active Ceased
- 2000-09-22: AU AU78193/00A patent/AU7819300A/en not_active Abandoned
Patent Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO1999013619A2 (en) * | 1997-09-09 | 1999-03-18 | Sics Swedish Institute Of Computer Science | A lookup device and a method for classification and forwarding of packets |
| WO1999014906A1 (en) * | 1997-09-15 | 1999-03-25 | Effnet Group Ab | Method and system for fast routing lookups |
Non-Patent Citations (4)
| Title |
|---|
| CHEUNG G., MCCANNE S.: "Optimal routing table design for IP address lookup under memory constraints", INFOCOM '99, EIGHTEENTH ANNUAL JOINT CONFERENCE OF THE IEEE COMPUTER AND COMMUNICATIONS SOCIETIES, XP002936425 * |
| CHIUEH T., PRADHAN P.: "High-performance IP routing table lookup using CPU caching", INFOCOM '99, EIGHTEENTH ANNUAL JOINT CONFERENCE OF THE IEEE COMPUTER AND COMMUNICATIONS SOCIETIES, XP002936424 * |
| NEN-FU HUANG, SHI-MING ZHAO, JEN-YI PAN, CHI-AN SU: "A fast IP routing lookup scheme for gigabit switching routers", INFOCOM '99, EIGHTEENTH ANNUAL JOINT CONFERENCE OF THE IEEE COMPUTER AND COMMUNICATIONS SOCIETIES, XP002936423 * |
| TZENG H.H.-Y. AND PRZYGIENDA T.: "On fast address-lookup algorithms", IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, vol. 17, no. 6, June 1999 (1999-06-01), pages 1067 - 1082, ISSN: 0733-8716, XP002936422 * |
Cited By (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN1330190C (en) * | 2001-08-14 | 2007-08-01 | 中兴通讯股份有限公司 | Method for selecting route based on user' IP address route |
| WO2003063427A1 (en) * | 2002-01-21 | 2003-07-31 | Rockstorm Technologies Ab | Method and system for fast ip routing lookup using forwarding tables with guaranteed compression rate and lookup performance |
| WO2008119282A1 (en) * | 2007-03-31 | 2008-10-09 | Huawei Technologies Co., Ltd. | Method for routing lookup and system for forwarding |
| US9361404B2 (en) | 2014-05-07 | 2016-06-07 | Red Hat Israel, Ltd. | Offline radix tree compression with key sequence skip |
| US9442927B2 (en) | 2014-05-07 | 2016-09-13 | Red Hat Israel, Ltd. | Offline generation of compressed radix tree with key sequence skip |
| WO2016183944A1 (en) * | 2015-05-19 | 2016-11-24 | 中兴通讯股份有限公司 | Message forwarding configuration method and apparatus for communication device, and message forwarding method |
| CN106302181A (en) * | 2015-05-19 | 2017-01-04 | 中兴通讯股份有限公司 | The message of communication equipment forwards collocation method, device and message forwarding method |
| CN106302181B (en) * | 2015-05-19 | 2020-06-26 | 中兴通讯股份有限公司 | Message forwarding configuration method, device and message forwarding method of communication equipment |
| CN119766724A (en) * | 2024-11-25 | 2025-04-04 | 天翼云科技有限公司 | Routing table updating method and device, electronic equipment and readable storage medium |
Also Published As
| Publication number | Publication date |
|---|---|
| AU7819300A (en) | 2001-04-24 |
| TW508926B (en) | 2002-11-01 |
| SE9903460D0 (en) | 1999-09-22 |
| SE9903460L (en) | 2001-03-23 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| EP1168723B1 (en) | Method and apparatus for longest matching prefix determination in a communication network | |
| US6266706B1 (en) | Fast routing lookup system using complete prefix tree, bit vector, and pointers in a routing table for determining where to route IP datagrams | |
| US6985483B2 (en) | Methods and systems for fast packet forwarding | |
| US6067574A (en) | High speed routing using compressed tree process | |
| US7899067B2 (en) | Method and apparatus for generating and using enhanced tree bitmap data structures in determining a longest prefix match | |
| US7433871B2 (en) | Efficient ipv4/ipv6 best matching prefix method and apparatus | |
| US7415472B2 (en) | Comparison tree data structures of particular use in performing lookup operations | |
| EP2037381A1 (en) | Database and database processing methods | |
| WO2003079618A2 (en) | System and method for longest prefix match internet protocol lookup | |
| WO2004040400A2 (en) | Methods and systems for fast binary network address lookups using parent node information stored in routing tables entries | |
| US7478109B1 (en) | Identification of a longest matching prefix based on a search of intervals corresponding to the prefixes | |
| US11327974B2 (en) | Field variability based TCAM splitting | |
| WO2001022667A1 (en) | Method and system for efficient routing table compression and fast routing lookups | |
| US6996559B1 (en) | IP address resolution methods and apparatus | |
| WO2002098055A2 (en) | Load balancing in ip address lookup | |
| US20050114393A1 (en) | Dynamic forwarding method using binary search | |
| JP2001517024A (en) | Method and system for fast routing lookup | |
| WO2000077984A1 (en) | Arrangement, system and method relating to datacommunication | |
| Chang et al. | Efficient IP routing table lookup scheme | |
| Wang et al. | A fast table update scheme for high-performance IP forwarding | |
| KR100572693B1 (en) | How to Look Up Internet Protocol Packets | |
| Chan et al. | High-performance IP forwarding with efficient routing-table update | |
| Yazdani et al. | Performing IP lookup on very high line speed | |
| Wang et al. | Routing interval: a new concept for IP lookups | |
| Wang et al. | An efficient IP routing lookup by using routing interval |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| AK | Designated states |
Kind code of ref document: A1 Designated state(s): AE AL AM AT AU AZ BA BB BG BR BY CA CH CN CR CU CZ DE DK DM EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW |
|
| AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG |
|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
| DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | ||
| 122 | Ep: pct application non-entry in european phase | ||
| NENP | Non-entry into the national phase |
Ref country code: JP |