
WO2011078108A1 - Pattern-matching method and device for a multiprocessor environment - Google Patents


Info

Publication number
WO2011078108A1
WO2011078108A1 · PCT/JP2010/072869 (JP2010072869W)
Authority
WO
WIPO (PCT)
Prior art keywords
packet
core
search key
pattern matching
pattern
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/JP2010/072869
Other languages
French (fr)
Japanese (ja)
Inventor
Norio Yamagaki (山垣 則夫)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NEC Corp
Original Assignee
NEC Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NEC Corp filed Critical NEC Corp
Priority to JP2011547531A priority Critical patent/JPWO2011078108A1/en
Publication of WO2011078108A1 publication Critical patent/WO2011078108A1/en
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/903Querying
    • G06F16/90335Query processing
    • G06F16/90344Query processing by using string matching techniques

Definitions

  • The present invention relates to a multiprocessor environment composed of a plurality of processors. In particular, it relates to a system in which a plurality of processors and a shared memory shared by those processors are regarded as one unit, and a plurality of such units exist.
  • The present invention relates to a system that performs pattern matching by dividing a pattern management table across the shared memories of the units and executing pattern matching in parallel using the plurality of processors in the system.
  • Pattern matching, which identifies whether an input character string matches a stored character string (pattern), is an important elemental technology in the network field. For example, it is used in network relay devices such as routers and switches to identify the flow to which an input packet belongs from its header information (hereinafter referred to as flow search processing). It is also used in a technique called Deep Packet Inspection (DPI), which performs pattern matching not only on the packet header but also on the data in the packet payload to identify flows more finely, provide QoS (Quality of Service) per flow, perform load distribution control, and so on.
  • DPI Deep Packet Inspection
  • TCP Transmission Control Protocol
  • UDP User Datagram Protocol
  • IP Internet Protocol
  • A sequence of packets specified using header fields such as the source port number and destination port number is called a flow.
  • A flow is typically identified using a technique called hashing.
  • The set of fields described above that identifies the flow is used as a search key, and the search key is converted into a bit string of fixed length, called a hash value, by applying a hash function to it.
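As an illustration of the step above, the following minimal Python sketch maps a search key (a tuple of header fields) to a fixed-length hash value. The specification does not name a concrete hash function; SHA-256 truncated to the table width is used here purely as a stand-in.

```python
import hashlib

def hash_value(search_key: tuple, table_bits: int = 16) -> int:
    """Convert a search key into a bit string of fixed length (a hash value).

    The hash function is illustrative; any function that spreads keys
    uniformly over 2**table_bits buckets would serve.
    """
    data = "|".join(str(field) for field in search_key).encode()
    digest = hashlib.sha256(data).digest()
    # Truncate the digest to table_bits bits.
    return int.from_bytes(digest, "big") % (1 << table_bits)

# Example 5-tuple: source IP, destination IP, protocol, source port, destination port
key = ("192.0.2.1", "198.51.100.7", 6, 49152, 80)
h = hash_value(key)
```

The same key always yields the same hash value, which is what lets the hash value serve as an index into a flow table.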
  • FIG. 17 shows a configuration example of a hash table.
  • Since the flow table has a large capacity, it is constructed on the main memory 6.
  • When a packet is received by the packet transmission / reception block 10, it is written to the main memory 6, for example using a DMA (Direct Memory Access) function, via the MAC or reception FIFO.
  • The packet transmission / reception block 10 then either interrupts the processor core 11-1 of the CPU 11 or issues an event to an event queue. By interrupt or as a result of polling, the processor core 11-1 detects the arrival of the packet and reads its header information from the main memory 6.
  • DMA Direct Memory Access
  • The processor core 11-1 derives a hash value from the set of header fields serving as the search key, reads the flow-table entry corresponding to the hash value from the main memory 6, and compares the search keys. If the search keys match, the packet is determined to belong to the flow corresponding to that entry. If they do not match, the core follows the pointer stored in the entry, reads the next entry, and repeats the comparison, thereby confirming whether the packet belongs to any registered flow. The processing performed when the packet's search key matches no entry in the flow table, for example registering a new flow-table entry, depends on the individual system specification and is omitted here.
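The lookup loop described above (read entry, compare keys, follow the pointer) can be sketched as follows. This is a minimal model, not the patented apparatus: chains are singly linked for brevity (the patent's table links entries bidirectionally), and Python's built-in hash() stands in for the hash function.

```python
class FlowEntry:
    """One flow-table entry: a search key, a flow ID, and a chain pointer."""
    def __init__(self, search_key, flow_id):
        self.search_key = search_key
        self.flow_id = flow_id
        self.next = None  # next entry with the same hash value, or None

class FlowTable:
    """Hash table whose buckets are chains of FlowEntry objects."""
    def __init__(self, table_bits=8):
        self.mask = (1 << table_bits) - 1
        self.buckets = [None] * (1 << table_bits)

    def _index(self, search_key):
        return hash(search_key) & self.mask  # stand-in hash function

    def insert(self, search_key, flow_id):
        entry = FlowEntry(search_key, flow_id)
        i = self._index(search_key)
        entry.next = self.buckets[i]  # prepend to the chain
        self.buckets[i] = entry

    def lookup(self, search_key):
        """Follow the chain, comparing search keys, until a match or the end."""
        entry = self.buckets[self._index(search_key)]
        while entry is not None:
            if entry.search_key == search_key:
                return entry.flow_id  # packet belongs to this flow
            entry = entry.next
        return None  # unregistered flow
```

A miss (None) corresponds to the unregistered-flow case whose handling the specification leaves to the individual system.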
  • FIG. 19 shows a schematic configuration example of a normal server architecture composed of two CPUs having two processor cores.
  • FIG. 19 shows a configuration in which the CPU 12 and the CPU 13 having two processor cores are provided instead of the CPU 11 in the configuration shown in FIG.
  • the CPU 12 includes two processor cores 12-1 and 12-2 and an L2 cache 12-0
  • the CPU 13 includes two processor cores 13-1 and 13-2 and an L2 cache 13-0.
  • Patent Document 3 discloses replacement control for reducing the miss rate of a shared cache in a multi-core processor environment such as that shown in FIG. 19, and Patent Document 4 discloses exclusive control of a shared memory in a multiprocessor environment.
  • Patent Document 5 discloses a prefetch function for a shared cache in a multi-core processor environment, but it does not solve the problem that arises when different data must be referenced for each process, as in flow search processing.
  • In that approach, longest-match search (LPM: Longest Prefix Match) is targeted, and a part of the data area storing the data serving as the search key is placed in a memory area with high-speed access performance, such as a cache.
  • LPM Longest Prefix Match
  • Patent Document 6, as shown in its FIG. 1, discloses a configuration in which (a plurality of) processors, a cache memory, a memory, peripheral devices (peripherals), and a packet controller are connected via a system bus (see, for example, the seventh paragraph).
  • Pattern matching logic and a hash mechanism configured in the packet receiver compare a pattern defined in advance in a register of the pattern matching logic against the address and other fields of a received packet to determine whether the packet may be received (see, for example, the ninth paragraph).
  • That is, Patent Document 6 performs pattern matching inside the packet controller; pattern matching is not performed in each processor.
  • In Patent Document 6, part of the packet data permitted for reception by pattern matching is written into the cache memory and used for packet processing by the processor.
  • However, the patterns (flow information) used for pattern matching are not written into the cache memory, and the cache memory is not used as a flow management table.
  • FIG. 7 of Patent Document 7 discloses a configuration in which a plurality of micro engines (processors), a memory (SRAM, DRAM) shared by them, a packet receiving unit, and the like are connected by a bus (see, for example, paragraph 50).
  • Each micro engine includes a local memory and a computing core (processor core).
  • The flow state information (patterns and state-transition information used in pattern matching) held in the shared memory (SRAM) is read into the local memory in a micro engine, and the pattern matching process is performed by the processor core in that micro engine (see, for example, paragraph 55).
  • In Patent Document 7, there are main memories such as SRAM and DRAM shared by a plurality of processors, and a local memory per processor, but there is no cache memory that can be shared by a plurality of processors. That is, apart from the main memory (SRAM, DRAM, etc.) shared by all processors, there is no cache memory shared by a specific subset of processors.
  • main memory SRAM, DRAM, etc.
  • As a result, the flow table is built on main memory with low access performance, and the flow table on the main memory must be accessed multiple times.
  • Even if a part of the flow table is cached in a memory, such as a cache memory, that can be accessed faster than the main memory, the flow to which an arriving packet belongs is essentially random: there is little dependency between arriving packets, so a cached subset of the flow table is unlikely to be hit, and the effect of reducing the number of main-memory accesses is small.
  • An object of the present invention is to provide a pattern matching system capable of realizing a pattern matching process having a high-speed search performance corresponding to a network band.
  • The first and second pattern matching apparatuses of the present invention include one or a plurality of multi-core CPUs (1, 2, ..., M in FIGS. 1 and 6), each having a plurality of processor cores and a shared memory shared among those cores; a packet transmission / reception block (5 in FIGS. 1 and 6); a search key extraction block (4 in FIGS. 1 and 6) that extracts from a received packet the header information serving as the search key in the flow search process, generates and issues a packet reception event, and writes the packet to the main memory; a main memory such as a DRAM (6 in FIGS. 1 and 6) into which the packet reception event and, if necessary, the packet itself are written; and a bus interconnecting these components.
  • A pattern table based on a hash table is configured in the shared memory in each multi-core CPU. With this configuration, the search key extraction block extracts in advance the set of fields serving as the search key from the packet header information and notifies a processor core of it, and the pattern matching process is performed using the pattern table in the shared memory, which can be accessed faster than the main memory; the object of the present invention is thereby achieved.
  • the third pattern matching device of the present invention has a final result determination block (7 in FIG. 9).
  • Each multi-core CPU holds, in its shared memory, a sub pattern table obtained by dividing one original pattern table. With this configuration, the search key extraction block extracts in advance the set of fields serving as the search key from the packet header information, notifies one processor core of each multi-core CPU, and thereby distributes the pattern matching; the object of the present invention is achieved by performing the pattern matching process using the pattern tables in the shared memories, which can be accessed faster than the main memory.
  • The fourth pattern matching device of the present invention has a configuration in which the final result determination block (7 in FIG. 9) of the third pattern matching device is removed and a hash value calculation block (9 in FIG. 13) is added.
  • In the fourth device, the search key extraction block (8 in FIG. 13) extracts in advance the set of fields serving as the search key from the header information of the packet, and the hash value calculation block (9 in FIG. 13) computes its hash value.
  • From the hash value, the multi-core CPU whose pattern table holds the group of pattern entries corresponding to that hash value can be identified, and pattern matching is performed in one processor core of that multi-core CPU.
  • The object of the present invention is achieved by performing the pattern matching process using the pattern table in the shared memory, which can be accessed faster than the main memory.
  • Brief description of the drawings: FIG. 2 is a flowchart showing the operation of the apparatus according to the first embodiment of the present invention; further figures show the flowchart of step A4 in FIG. 2, an example of a flow table, the operation of each embodiment, the block diagrams of the apparatus according to the second and third embodiments, and an example of the division of the flow table.
  • FIG. 1 is a block diagram showing the configuration of the first exemplary embodiment of the present invention.
  • The first embodiment includes a multi-core CPU 1; a packet transmission / reception block 5 for transmitting and receiving packets; a search key extraction block 4 that extracts from received packets the header information serving as the search key in flow search processing, generates and issues packet reception events, and writes packets to the main memory; a main memory 6, such as a DRAM, into which packet reception events and the packets themselves are written; and a bus 3 interconnecting the multi-core CPU 1, the search key extraction block 4, and the main memory 6.
  • the multi-core CPU 1 includes N processor cores 1-1, 1-2,..., 1-N, and a shared memory 1-0 shared by the N processor cores.
  • The shared memory 1-0 holds the entire flow table, configured as a hash table. When the shared memory is also used as a data cache, the flow table is held in an area logically separated from the data cache.
  • When the network is Ethernet (registered trademark), the packet transmission / reception block 5 has a MAC function and, after shaping the input bit stream into an Ethernet (registered trademark) frame, outputs it to the search key extraction block 4. Since such a packet transmission / reception block 5 is well known to those skilled in the art, its detailed configuration is omitted.
  • the search key extraction block 4 receives the packet received from the packet transmission / reception block 5 and extracts header information serving as a search key in the flow search process.
  • a packet reception event including the extracted search key information is generated and written to the event queue for the multi-core CPU 1, and the packet after the search key is extracted is written to the main memory.
  • the DMA function is used for the writing process.
  • this block reads packet data from the main memory and outputs it to the packet transmission / reception block 5 as it is.
  • In the present embodiment, the multi-core CPU 1, the search key extraction block 4, and the main memory 6 are connected by a bus 3, but the connection may instead be made through a chipset, such as the North Bridge or South Bridge provided in a server, through a crossbar, or through a ring bus.
  • The present embodiment may be configured on a motherboard, such as that of a server, or everything except the main memory 6 may be configured on one chip as a single SoC (System-on-a-Chip).
  • Packets input from the network are received by the packet transmission / reception block 5.
  • The packet received by the packet transmission / reception block 5 is shaped into an Ethernet (registered trademark) frame (in the present specification, frames and packets are collectively referred to as packets) and is then output to the search key extraction block 4 (step A1).
  • When receiving a packet from the packet transmission / reception block 5, the search key extraction block 4 extracts predetermined fields serving as the search key in the flow search process from the packet header information, assigns an identifier that uniquely identifies the packet within the apparatus, and writes them into the FIFO included in the search key extraction block 4 (step A2).
  • The packet header fields used as the search key include, for example, the source IP address (32 bits), destination IP address (32 bits), and protocol number (8 bits) defined in the IP header, and fields defined in the TCP / UDP header, such as the source and destination port numbers.
  • the search key extraction block 4 generates and issues an event for notifying the multi-core CPU 1 of packet reception, writes it in the event queue, and writes the corresponding packet in the main memory 6 (step A3).
  • the event generated here only needs to include the extracted search key information and the identifier of the corresponding packet.
  • When the event queue is configured in the main memory 6, for example, the event is written into the event queue in the main memory 6.
  • The event queue area for the multi-core CPU 1 and the area for writing packet data are logically separated, and their base addresses are assumed to be known to the multi-core CPU 1.
  • The multi-core CPU 1 monitors the event queue by polling; if an event is present, it reads the leading event and a specific processor core performs the flow search process (step A4).
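One polling round of the scheme above can be sketched as follows. This is a simplified model under stated assumptions: a deque stands in for the event queue in memory, a plain dict stands in for the hash-based flow table, and the event layout (search key plus packet identifier) follows the description in step A2.

```python
from collections import deque

# A packet-reception event carries the extracted search key and an
# identifier that is unique to the packet within the device.
event_queue = deque()

def poll_and_dispatch(flow_table: dict, results: dict) -> bool:
    """One polling round: if an event is queued, consume the leading event
    and perform the flow search for it (step A4). Returns True when an
    event was processed, False when the queue was empty."""
    if not event_queue:
        return False
    search_key, packet_id = event_queue.popleft()
    # dict.get returns None for an unregistered flow.
    results[packet_id] = flow_table.get(search_key)
    return True

# Usage: one event arrives and is resolved against a one-entry flow table.
event_queue.append((("192.0.2.1", 80), "pkt1"))
results = {}
poll_and_dispatch({("192.0.2.1", 80): 7}, results)
```

In the apparatus the queue lives in the main memory 6 and the lookup runs against the hash table in the shared memory 1-0; the control flow, however, is the same.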
  • FIG. 3 is a flowchart for explaining the detailed operation of step A4.
  • the multi-core CPU 1 When there is an event in the event queue, the multi-core CPU 1 reads the top event and assigns it to a specific processor core (step B1).
  • For example, a specific processor core, such as the processor core 1-1, may specialize in polling: after reading an event from the event queue, it assigns the event to another free processor core that is not currently processing.
  • Alternatively, each processor core may collect events by polling the event queue itself. In the latter case, since every processor core accesses the memory area holding the same event queue, arbitration among the processor cores is necessary.
  • the processor core 1-k to which the event is assigned calculates a hash value using a preset hash function for the search key stored in the event (step B2).
  • a hash function and the hash value are well known to those skilled in the art, a detailed description thereof will be omitted.
  • FIG. 4 shows an example of a flow table configured in the shared memory 1-0.
  • the flow table is configured based on a hash table, and a plurality of entries having the same hash value are connected in a bidirectional list.
  • One entry holds a search key, a flow ID for specifying a flow, an address value serving as a pointer to the next entry in the list, and a pointer to the previous entry in the list.
  • When the combined data length of the search key, the flow ID, the pointer to the next entry, and the pointer to the previous entry is longer than the word length of the shared memory 1-0, an entry is composed of a plurality of consecutive words.
  • Here, the address value means a logical address value indicating the first word of each entry, obtained for example by ignoring the lower bits of the physical address.
  • Entries are basically added to the free space and managed by a bidirectional list.
  • When an entry is to be added at an address value at which another entry is already managed, the entry whose hash value equals that address value takes priority, and the already-managed entry is assumed to be moved to another free area.
  • the hash value may be calculated in a range smaller than the logical address space indicating the first word of the entry.
  • the flow table may be configured according to the configuration of the hash table.
  • In that case, the entry's address value can itself be used as the flow ID identifying the flow, so there is no need to hold a flow ID in the entry.
  • the processor core 1-k that has read the entry from the flow table determines whether the search key of the read entry matches the search key of the event (step B5).
  • If they match, the flow ID is acquired (step B6) and the flow search process (step A4) ends. If they do not match, the core checks whether a next entry is connected by the link (step B7). If there is a next entry, its pointer is used as the address value (step B8), the next entry is read (step B4), and the comparison is repeated. If there is no next entry, the corresponding flow is not registered; the result is determined to be an unregistered flow (step B9), and the flow search process (step A4) ends.
  • the corresponding processor core 1-k outputs the result obtained in the flow search process (step A4) (step A5).
  • Flow search processing is generally performed in order to identify the flow to which a packet belongs and then to apply individual processing, such as QoS processing, to that flow. A processing block that performs such processing, for example a CPU, is provided via the bus 3 or by some other communication means. Since this embodiment focuses on the flow search processing, these processing blocks are omitted; the result output destination in step A5 is assumed to be such a per-flow processing block.
  • If no matching search key is found, it is determined that the flow has not been registered, and the process ends.
  • In the above description, the packet itself is received from the network by the packet transmission / reception block 5; however, a configuration in which only the packet header is input to the flow search device of the present embodiment is also possible.
  • the packet transmission / reception block 5 receives a packet header input from the outside and outputs it to the search key extraction block 4.
  • the search key extraction block 4 extracts a set of fields serving as a search key from the packet header, and generates and issues an event. In this case, since the payload data of the packet is not received, it is not necessary to write the packet data to the main memory 6.
  • As shown in FIG. 5, when packets pkt1, pkt2, ... arrive, the search keys Key#1, Key#2, ... are extracted from them.
  • Each search key is assigned to one of the plurality of processor cores in the multi-core CPU, and a flow search process is performed.
  • As a method of assigning search keys to the plurality of processor cores in the multi-core CPU, sequential assignment by a technique such as round-robin is conceivable, but the method is not limited thereto.
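The round-robin assignment named above can be sketched in a few lines. This is only an illustration of the policy the specification mentions as one possibility; the function name and dict-of-lists result shape are not from the specification.

```python
def round_robin_dispatch(search_keys, num_cores):
    """Assign each arriving search key to one of num_cores processor
    cores in turn (round-robin), returning {core index: [keys]}."""
    assignments = {core: [] for core in range(num_cores)}
    for i, key in enumerate(search_keys):
        assignments[i % num_cores].append(key)
    return assignments

# Keys Key#1..Key#3 over 2 cores: core 0 gets Key#1 and Key#3, core 1 gets Key#2.
dispatched = round_robin_dispatch(["Key#1", "Key#2", "Key#3"], 2)
```

Round-robin spreads the load evenly regardless of packet content, which matches the goal stated later of smoothing the flow search load; a content-aware policy would be equally valid under the specification's wording.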
  • As described above, the search key extraction block extracts in advance the set of fields serving as the search key from the packet header information and notifies the processor core of it; since the processor core no longer needs to read the packet header stored in the main memory, the processing performance of the flow search can be improved.
  • Furthermore, since the flow table is configured on a shared memory shared by the plurality of processor cores in the multi-core CPU, entries can be read faster than when the flow table is configured on a main memory such as a DRAM.
  • FIG. 6 is a block diagram showing the configuration of the second exemplary embodiment of the present invention.
  • The second embodiment of the present invention includes a multi-core CPU 1, a multi-core CPU 2, ..., a multi-core CPU M; a packet transmission / reception block 5 for transmitting and receiving packets; a search key extraction block 4 that extracts from received packets the header information serving as the search key in the flow search process, generates and issues packet reception events, and writes packets to the main memory; a main memory 6, such as a DRAM, into which packet reception events and, if necessary, the packets themselves are written; and a bus 3 to which the multi-core CPUs 1 through M, the search key extraction block 4, and the main memory 6 are connected.
  • This embodiment is the same as the first embodiment shown in FIG. 1 except that the multi-core CPUs 2, ..., M are added.
  • The multi-core CPU 2 has the same form as the multi-core CPU 1 and includes N processor cores 2-1, 2-2, ..., 2-N and a shared memory 2-0 shared by the N processor cores.
  • The shared memory 2-0 holds a flow table configured as a hash table. The same configuration applies to the other multi-core CPUs, and the flow tables configured in the shared memories of all multi-core CPUs have the same contents.
  • When the shared memory is also used as a data cache, the flow table is held in an area logically separated from the data cache.
  • the multi-core CPU 1, the multi-core CPU 2,..., The multi-core CPU M, the search key extraction block 4, and the main memory 6 are connected via the bus 3.
  • this connection form may be a form connected through a chip set called North Bridge or South Bridge provided in a server or the like, a form of a crossbar, or a ring bus.
  • the present embodiment may be configured on a motherboard such as a server, or may be configured on one chip as a single SoC except for the main memory 6.
  • the flowchart of FIG. 7 is a modification of the flowchart of the first embodiment, in which step A3 is replaced with step A6 and step A5 is replaced with step A7. Since other steps are the same as those in the first embodiment, detailed description thereof is omitted.
  • the search key extraction block 4 receives a packet input from the network via the packet transmission / reception block 5 (step A1), and performs processing for step A2 such as search key extraction.
  • the search key extraction block 4 generates and issues an event for notifying one of a plurality of multi-core CPUs of packet reception, writes it in the event queue, and writes the corresponding packet in the main memory 6 ( Step A6).
  • the generated event only needs to include the extracted search key information and the identifier of the packet, as in the first embodiment.
  • In the present embodiment, the event queues are configured in the main memory 6; it is assumed that the event queue area of each multi-core CPU and the area for writing packet data are logically separated, and that the base address of its event queue area is known to each multi-core CPU.
  • The event is issued to only one of the plurality of multi-core CPUs. To smooth the flow search load across the multi-core CPUs, the destination may be selected, for example, by the round-robin method, in the order multi-core CPU 1, multi-core CPU 2, ..., multi-core CPU M, multi-core CPU 1, and so on; however, the selection method is not limited thereto.
  • Each of the M multi-core CPUs monitors its own event queue by polling; if an event is present, it reads the first event and one of its own processor cores performs the flow search process (step A4).
  • step A4 is a process performed independently by each multi-core CPU, and the contents of each process are the same as those in the first embodiment, and thus detailed description thereof is omitted here.
  • The processor core in each multi-core CPU that performed the actual flow search process then outputs the result obtained (step A7).
  • In the first embodiment, the processor cores of one multi-core CPU performed this processing; here, among the processor cores of the plurality of multi-core CPUs, the processor core that actually performed the flow search outputs the result, which is the difference from step A5 in the first embodiment.
  • the other processing contents of step A7 are the same as those of step A5, so the details are omitted.
  • The flow tables configured in the shared memories of the multi-core CPUs must have the same contents in all multi-core CPUs. Therefore, if an unregistered flow is registered as a new flow, the update made to the flow table by a processor core in one multi-core CPU must also be applied to the flow tables in the other multi-core CPUs.
  • the packet itself is received from the network by the packet transmission / reception block 5, but the processing when only the packet header is input is the same as in the first embodiment.
  • The possible locations of the event queues (the main memory 6, the shared memory in each multi-core CPU, a specific memory or registers in the search key extraction block 4, etc.) are also the same as in the first embodiment, so details are omitted.
  • a search key is sent to one of a plurality of multi-core CPUs.
  • Each transmitted search key is assigned to one of a plurality of processor cores in each multi-core CPU, and a flow search process is performed by the processor core in each multi-core CPU.
  • As a method of assigning to the plurality of processor cores in each multi-core CPU, for example, round-robin may be used, but the method is not limited to this.
  • the flow search process is distributed to each multi-core CPU.
  • the processing performance of the flow search can be further improved according to the increased number of multi-core CPUs.
  • Since processing is distributed across a plurality of multi-core CPUs, the frequency of contention for access to the shared memory among the processor cores within one multi-core CPU is reduced, enabling more efficient flow search processing.
  • FIG. 9 is a block diagram showing the configuration of the third exemplary embodiment of the present invention.
  • The third embodiment of the present invention includes a multi-core CPU 1, a multi-core CPU 2, ..., a multi-core CPU M; a packet transmission / reception block 5 for transmitting and receiving packets; a search key extraction block 4 that extracts from received packets the header information serving as the search key in the flow search process, generates and issues packet reception events, and writes packets to the main memory; a final result determination block 7 that collects the results of the individual flow search processes in the plurality of multi-core CPUs and selects the definitive flow search result; a main memory 6, such as a DRAM, into which packet reception events and, if necessary, the packets themselves are written; and a bus 3 to which the multi-core CPUs 1 through M, the search key extraction block 4, the main memory 6, and the final result determination block 7 are connected.
  • This embodiment is the same as the second embodiment except that a final result determination block 7 is added to the second embodiment shown in FIG.
  • Each subflow table is basically a flow table based on a hash table, as in the first and second embodiments, so the contents of each entry are not described again; however, each subflow table is a completely independent table. That is, a flow whose hash value is assigned to one subflow table is stored in an empty entry area within that same subflow table. Therefore, when the hash value is calculated over a range smaller than the logical address space indicating the first words of entries in consideration of collisions, this property must hold within each subflow table. This is supplemented as follows.
  • Suppose the hash value is calculated over a range smaller than the logical address space indicating the first words of entries, and the entry areas that cannot be addressed by a hash value are located at large address values. If such a table is divided into subflow tables, the entry areas that cannot be addressed by a hash value become biased toward the Mth subflow table and its neighbors. In that case, collisions occur frequently in the subflow tables that contain only hash-addressable entry areas, while in the subflow tables that contain many, or only, non-hash-addressable entry areas, the entry areas cannot be used effectively, and an efficient flow table cannot be constructed. For this reason, an entry area allowing for collisions must be secured for each subflow table.
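The division of one flow table into M subflow tables can be sketched as a partition of the hash-value range. The even, contiguous split below is one possible partitioning, not the one mandated by the specification; as noted above, each subflow table would additionally need its own collision (overflow) area, which this sketch omits.

```python
def split_table(table_size: int, num_cpus: int):
    """Build the [low, high) hash-value range owned by each of the
    num_cpus subflow tables, splitting table_size buckets evenly and
    giving any remainder to the last subflow table."""
    per_cpu = table_size // num_cpus
    ranges = []
    for m in range(num_cpus):
        low = m * per_cpu
        high = table_size if m == num_cpus - 1 else low + per_cpu
        ranges.append((low, high))
    return ranges

def subtable_for(hash_value: int, table_size: int, num_cpus: int) -> int:
    """Identify which multi-core CPU's subflow table owns a hash value
    under the even split above."""
    per_cpu = table_size // num_cpus
    return min(hash_value // per_cpu, num_cpus - 1)
```

In the second embodiment an event goes to one CPU chosen by load; here the hash value itself determines the owning CPU, so a key can only ever reside in one subflow table.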
  • The flowchart of FIG. 11 is a modification of the flowchart of the second embodiment shown in FIG. 7: step A6 is replaced with step A8, step A7 is replaced with step A10, and a final result determination step (step A9) is added between the flow search processing (step A4) and the result output (step A10). Since the other steps are the same as in the first and second embodiments, their detailed description is omitted.
  • The search key extraction block 4 receives a packet input from the network via the packet transmission/reception block 5 (step A1) and performs the processing of step A2, such as search key extraction.
  • The search key extraction block 4 generates and issues an event notifying all of the plurality of multi-core CPUs of packet reception, writes the event into the event queue of each multi-core CPU, and writes the corresponding packet to the main memory 6 (step A8).
  • the information on the event to be generated and the event queue for each multi-core CPU are the same as those in the second embodiment, and thus the details are omitted.
  • Each of the M multi-core CPUs monitors its own event queue for events by polling. If an event is present in the queue, the multi-core CPU reads the first event and performs a flow search process with one of its own specific processor cores (step A4).
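The per-CPU polling described above can be sketched as follows. The one-queue-per-CPU structure, the event layout, and the `flow_search` stub are illustrative assumptions standing in for the embodiment's event queue and flow search process (step A4).

```python
from collections import deque

# Assumed: one event queue per multi-core CPU (here, 4 CPUs).
event_queues = [deque() for _ in range(4)]

def flow_search(event):
    # Placeholder for the hash-based flow table lookup performed by a
    # specific processor core; returns an illustrative result tuple.
    return ("searched", event["packet_id"])

def poll_once(cpu_id):
    """One polling iteration by the designated processor core of cpu_id."""
    queue = event_queues[cpu_id]
    if not queue:                 # no event present: nothing to do this round
        return None
    event = queue.popleft()       # read the first (oldest) event
    return flow_search(event)     # run the flow search on this CPU
```

In a real system each CPU would run this loop continuously; here `poll_once` shows a single iteration for clarity.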
  • step A4 is a process performed independently by each multi-core CPU, and the contents of each process are the same as those in the first and second embodiments, and thus detailed description thereof is omitted here.
  • The processor core that has performed the flow search process outputs the obtained result, together with an identifier for the processed packet, to the final result determination block 7.
  • The final result determination block 7 checks whether any of the results identifies a flow ID. If one does, that result is output as the final result; if none of the search results identifies a flow ID, the final result is output as an unregistered flow (steps A9 and A10). The output destination of this final result is the same as in the first and second embodiments, so its details are omitted.
  • The final result is confirmed and determined once the search results from all M multi-core CPUs have been collected. Alternatively, by preparing a variable, register, or the like that can hold the final result for the packet being processed, the results may be checked sequentially each time a search result is returned from a multi-core CPU.
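The determination rule of step A9 can be sketched as follows, assuming each multi-core CPU reports either a concrete flow ID or `None` ("flow ID not identified"). The names, `M`, and the `"unregistered-flow"` marker are illustrative assumptions.

```python
M = 4  # assumed number of multi-core CPUs contributing results

def determine_final_result(results):
    """results: list of M per-CPU outcomes for one packet, each a flow ID
    or None. Since the subflow tables are disjoint, at most one CPU can
    identify the flow; if none does, the flow is unregistered."""
    assert len(results) == M          # wait until all M results have arrived
    for flow_id in results:
        if flow_id is not None:
            return flow_id            # some CPU identified the flow
    return "unregistered-flow"        # no CPU could identify a flow ID
```

The sequential-checking variant mentioned above would apply the same rule incrementally, short-circuiting as soon as a non-`None` result arrives.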
  • the packet itself is received from the network by the packet transmission / reception block 5, but the processing when only the packet header is input is the same as in the first embodiment.
  • The event queue may be constructed from the main memory 6, the shared memory in each multi-core CPU, a dedicated memory or registers in the search key extraction block 4, and the like.
  • The flow tables in the plurality of multi-core CPUs all have independent contents. Distributed processing across the multi-core CPUs is therefore not performed; however, in addition to the effects of the first embodiment, the number of flow entries is M times that of the second embodiment, making it possible to register, manage, and search many more flows.
  • FIG. 13 is a block diagram showing the configuration of the fourth exemplary embodiment of the present invention.
  • The fourth embodiment of the present invention comprises: multi-core CPU 1, multi-core CPU 2, ..., multi-core CPU M; a packet transmission/reception block 5 for transmitting and receiving packets; a search key extraction block 8 for extracting, from received packets, the header information serving as the search key for the flow search process; a hash value calculation block 9 for calculating a hash value for the extracted search key, generating and issuing a packet reception event, and writing the packet to the main memory; and a main memory 6, such as a DRAM, into which a packet reception event or the packet itself is written as necessary. The multi-core CPU 1, multi-core CPU 2, ..., multi-core CPU M, the search key extraction block 8, the hash value calculation block 9, and the main memory 6 are connected to one another.
  • This embodiment differs from the third embodiment shown in FIG. 9 in that the final result determination block 7 is removed, the search key extraction block 4 is replaced by the search key extraction block 8, and a hash value calculation block 9 is added; the rest is the same as in the second embodiment.
  • As in the third embodiment, each multi-core CPU holds one of the M subflow tables obtained by dividing the original flow table. Since this flow table is the same as in the third embodiment, its detailed description is omitted.
  • the multi-core CPU 1, the multi-core CPU 2,..., The multi-core CPU M, the hash value calculation block 9, and the main memory 6 are connected by the bus 3.
  • This connection may take the form of a chipset, such as the North Bridge or South Bridge found in servers and the like, a crossbar, or a ring bus.
  • the present embodiment may be configured on a motherboard such as a server, or may be configured on one chip as a single SoC except for the main memory 6.
  • Steps A1 and A2 in the flowchart of FIG. 13 are the same as steps A1 and A2 in the first, second, and third embodiments, so their detailed description is omitted: the search key extraction block 8 receives a packet input from the network via the packet transmission/reception block 5 (step A1) and performs the processing of step A2, such as search key extraction.
  • the search key extraction block 8 outputs the extracted search key and the packet received from the packet transmission / reception block 5 to the hash value calculation block 9, and the hash value calculation block 9 calculates a hash value for the received search key.
  • The hash value calculation performed in the hash value calculation block 9 is the same as the hash value calculation (step B2) within the flow search process (step A4) of the first, second, and third embodiments, so its detailed description is omitted.
  • When the hash value calculation block 9 has calculated the hash value, it generates and issues an event notifying one of the multi-core CPUs of packet reception, writes the event to that CPU's event queue, and writes the corresponding packet to the main memory 6 (step A12).
  • The event to be generated need only include the calculated hash value in addition to the extracted search key information and the corresponding packet identifier.
  • The configuration of the event queue for each multi-core CPU is the same as in the third embodiment.
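The event contents described above (extracted search key, packet identifier, and precomputed hash value) might be represented as a simple record; the field names are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class PacketReceptionEvent:
    """Illustrative layout of the packet reception event issued by the
    hash value calculation block 9 in this embodiment."""
    search_key: bytes     # extracted header fields (e.g. the 5-tuple)
    packet_id: int        # identifier of the packet written to main memory
    hash_value: int       # precomputed by the hash value calculation block
```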
  • The event generated by the hash value calculation block 9 is issued to one of the M multi-core CPUs. The method of selecting the multi-core CPU to which this event is issued is described below.
  • As in the third embodiment, each multi-core CPU in this embodiment holds, on its shared memory, one of the M subflow tables into which the original flow table is divided.
  • The base address of each subflow table is set in advance. Therefore, once the hash value calculation block 9 has calculated the hash value, it can determine in which of the M multi-core CPUs' subflow tables the entry group corresponding to that hash value is registered.
  • According to the calculated hash value, the hash value calculation block 9 issues a packet reception event to the multi-core CPU in which the entry group corresponding to that hash value is registered.
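A minimal sketch of this hash-to-CPU selection follows, assuming each subflow table covers a preset contiguous hash range recorded as a (base, limit) pair; the concrete ranges and all names are illustrative assumptions, not values from the embodiment.

```python
# Assumed: a 16-bit hash space split evenly across 4 multi-core CPUs,
# mirroring the preset base addresses of the subflow tables.
SUBTABLE_RANGES = [(0, 16384), (16384, 32768), (32768, 49152), (49152, 65536)]

def target_cpu(hash_value):
    """Return the index of the multi-core CPU whose subflow table holds
    the entry group for hash_value."""
    for cpu_id, (base, limit) in enumerate(SUBTABLE_RANGES):
        if base <= hash_value < limit:
            return cpu_id
    raise ValueError("hash value outside all subflow table ranges")
```

The hash value calculation block would then write the packet reception event to the event queue of `target_cpu(hash_value)` only.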
  • FIG. 15 is a flowchart for explaining more detailed operation of step A13.
  • This flow search process is a process common to the processor cores that perform the flow search process in the M multi-core CPUs, and will be described below as a process for any processor core in any multi-core CPU.
  • When the flow search process (step A13) ends, the processor core outputs the result (step A7). Since this result output step A7 is the same as in the second embodiment, its detailed description is omitted.
  • The subflow tables configured in the shared memory of each multi-core CPU all have independent contents. Consequently, if an unregistered flow whose flow ID could not be identified is to be registered as a new flow, a flow registration process is required; since the flow search process for one packet is performed by only one processor core of one multi-core CPU, this registration process is independent for each multi-core CPU.
  • the packet itself is received from the network by the packet transmission / reception block 5, but the processing when only the packet header is input is the same as in the first embodiment.
  • The event queue for each multi-core CPU is configured in an area logically separate from the flow table in that CPU's shared memory; events may be written directly from the hash value calculation block 9 into the event queue in the shared memory, or the event queue may be constructed using a dedicated memory or registers inside the hash value calculation block 9.
  • In this embodiment, the hash value is calculated not by each processor core but by the hash value calculation block, and the multi-core CPU holding the flow table to be searched can be identified from the calculated hash value. Therefore, unlike the third embodiment, there is no need to run the flow search process on one processor core of every multi-core CPU; the flow search processing is efficiently distributed across a plurality of processor cores, and flow search performance improves.
  • (Appendix 3) Transmitting and receiving packets; extracting a set of arbitrary fields serving as a search key from the header information of a received packet; generating and issuing a packet reception event; and storing the packet data in the main memory,
  • (Appendix 4) The pattern matching method according to Appendix 3, wherein a packet reception event including the extracted search key is generated and issued, received packet reception events are sequentially assigned to one of a plurality of processor cores of the multi-core CPU, and pattern matching processing is performed by the processor core to which each packet reception event is assigned.
  • (Appendix 6) The pattern matching method according to Appendix 5, wherein a packet reception event including the extracted search key is generated, and the generated packet reception events are sequentially issued to one of a plurality of multi-core CPUs.
  • (Appendix 9) The pattern matching method according to Appendix 8, wherein a packet reception event including the extracted search key is generated, and the generated packet reception event is issued to all of the plurality of multi-core CPUs.
  • (Appendix 11) The pattern matching method according to Appendix 1 or Appendix 2, wherein packets are transmitted and received; a set of arbitrary fields serving as a search key is extracted from the header information of a received packet; a hash value for the extracted search key is calculated, a packet reception event is generated and issued, and the packet data is stored in the main memory; one of the sub-pattern tables, obtained by dividing the pattern table into the same number of parts as there are multi-core CPUs, is configured on the shared memory of each of a plurality of multi-core CPUs; and pattern matching processing is performed by one of the plurality of multi-core CPUs.
  • The shared memory includes a pattern table containing search patterns, and the pattern matching apparatus performs pattern matching processing on a search key, which is the search target, in parallel by the plurality of processor cores.
  • The pattern matching apparatus according to Appendix 14 or Appendix 15, comprising: one multi-core CPU; a packet transmission/reception block for transmitting and receiving packets; a search key extraction block for extracting a set of arbitrary fields serving as a search key from the header information of a received packet, generating and issuing a packet reception event, and storing the packet data in the main memory; a main memory for storing packet data; and a bus connecting the multi-core CPU, the search key extraction block, and the main memory.
  • The pattern matching apparatus according to Appendix 14 or Appendix 15, comprising: a plurality of multi-core CPUs; a packet transmission/reception block for transmitting and receiving packets; a search key extraction block for extracting a set of arbitrary fields serving as a search key from the header information of a received packet, generating and issuing a packet reception event, and storing the packet data in the main memory; a main memory for storing packet data; and a bus connecting the plurality of multi-core CPUs, the search key extraction block, and the main memory; wherein the same pattern table is configured on the shared memory of each of the plurality of multi-core CPUs.
  • The pattern matching apparatus according to Appendix 19, wherein the multi-core CPU that has received a packet reception event sequentially assigns received packet reception events to one of its plurality of processor cores.
  • (Appendix 21) The pattern matching apparatus according to Appendix 14 or Appendix 15, comprising: a plurality of multi-core CPUs; a packet transmission/reception block for transmitting and receiving packets; a search key extraction block for extracting a set of arbitrary fields serving as a search key from the header information of a received packet, generating and issuing a packet reception event, and storing the packet data in the main memory; a main memory for storing packet data; a final result determination block for collecting the pattern matching results of the processor cores in each multi-core CPU and determining the final pattern matching result; and a bus connecting the plurality of multi-core CPUs, the search key extraction block, the main memory, and the final result determination block; wherein one of the sub-pattern tables, obtained by dividing the pattern table into the same number of parts as there are multi-core CPUs, is configured on the shared memory of each of the plurality of multi-core CPUs.
  • (Appendix 22) The pattern matching apparatus according to Appendix 21, wherein a packet reception event including the extracted search key is generated, and the generated packet reception event is issued to all of the plurality of multi-core CPUs.
  • The pattern matching apparatus according to Appendix 22, wherein each of the plurality of multi-core CPUs sequentially assigns received packet reception events to one of its plurality of processor cores.
  • The pattern matching apparatus according to Appendix 14 or Appendix 15, comprising: a plurality of multi-core CPUs; a packet transmission/reception block for transmitting and receiving packets; a search key extraction block for extracting a set of arbitrary fields serving as a search key from the header information of a received packet; a hash value calculation block for calculating a hash value for the extracted search key, generating and issuing a packet reception event, and storing the packet data in the main memory; a main memory for storing packet data; and a bus connecting the plurality of multi-core CPUs, the search key extraction block, the hash value calculation block, and the main memory; wherein one of the sub-pattern tables, obtained by dividing the pattern table into the same number of parts as there are multi-core CPUs, is configured on the shared memory of each of the plurality of multi-core CPUs.
  • the multi-core CPU that has received the packet reception event 26.
  • a network device, such as a switch or router, that identifies the flow to which a packet belongs from a specific set of fields in the packet header and performs flow-specific processing such as QoS processing or load distribution,
  • an appliance device such as a load balancer.
  • Reference numerals: 1, 2, M multi-core CPU; 3 bus; 4, 8 search key extraction block; 5, 10 packet transmission/reception block; 6 main memory; 7 final result determination block; 9 hash value calculation block; 11, 12, 13 CPU; 1-0, 2-0, M-0 shared memory; 11-0, 12-0, 13-0 L2 cache; 1-1, 1-2, ..., 1-N, 2-1, 2-2, ..., 2-N, 11-1, 12-1, 12-2, 13-1, 13-2, M-1, M-2, ..., M-N processor core


Abstract

Disclosed is a pattern-matching device that operates in a multi-core processor environment having a multi-core CPU provided with a plurality of processor cores and a shared memory shared among the processor cores within the same CPU. The shared memory contains a pattern table holding search patterns. The pattern-matching device executes, in parallel across the plurality of processor cores, a pattern-matching process that matches the search target, namely a search key, against those patterns.

Description

Pattern matching method and apparatus in a multiprocessor environment

The present invention relates to a multiprocessor environment composed of a plurality of processors. In particular, it relates to a system in which a plurality of processors and a shared memory shared by those processors form one unit and a plurality of such units exist: a pattern management table is divided across the shared memories of the units, and pattern matching processing is performed in parallel by the plurality of processors in the system.

Pattern matching processing, which determines whether an input character string matches a given string (pattern), is an important elemental technology in the network field. For example, in a network relay device such as a router or switch, the header information of an input packet is used to identify the flow to which the packet belongs (hereinafter, flow search processing). It is also used in a technique called Deep Packet Inspection (DPI), which applies pattern matching not only to packet header information but also to the data in the packet payload, identifying flows in finer detail so as to provide QoS (Quality of Service) for each flow, perform load distribution control, and the like.

In flow search processing, a sequence of packets identified by fields such as the source IP (Internet Protocol) address, destination IP address, and protocol number defined in the IP header of an input packet, together with the source port number and destination port number defined in the TCP (Transmission Control Protocol)/UDP (User Datagram Protocol) header, is generally called a flow. A flow is identified using, for example, a technique called the hash method, or hashing. In the hash method, the set of fields identifying a flow is used as a search key; a computation called a hash function converts the search key into a bit string of a specific length called a hash value, and flow search processing is performed using a table that stores, at the entry position corresponding to the obtained hash value, the search key and a pointer to the next entry. Such a table is called a hash table; in particular, since it stores the search keys that identify flows, it is also called a flow table. The hash method is described in, for example, Patent Document 1, so its details are omitted; FIG. 17 shows a configuration example of a hash table.
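As a concrete illustration of the hash method described above, the sketch below builds a search key from the 5-tuple of header fields and folds it into a fixed-width hash value usable as a flow table index. The use of SHA-1, the key encoding, and all names are illustrative choices, not the hash function of the cited documents.

```python
import hashlib

HASH_BITS = 16  # assumed width of the table index

def make_search_key(src_ip, dst_ip, proto, sport, dport):
    """Concatenate the 5-tuple fields into a single search key."""
    return f"{src_ip}|{dst_ip}|{proto}|{sport}|{dport}".encode()

def hash_value(search_key):
    """Fold the search key into a hash value in [0, 2**HASH_BITS)."""
    digest = hashlib.sha1(search_key).digest()
    return int.from_bytes(digest[:4], "big") % (2 ** HASH_BITS)
```

Since the hash function is deterministic, all packets of the same flow map to the same flow table entry position.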

FIG. 18 shows a schematic configuration example of an ordinary server architecture including a single CPU (Central Processing Unit) with a single processor core; an overview of flow search processing on this server follows. In the example of FIG. 18, a CPU 11 having a single processor core, a main memory 6 such as a DRAM (Dynamic Random Access Memory), and a packet transmission/reception block 10 including a MAC (Media Access Control) and transmission/reception FIFOs (First-In First-Out) are connected through a bus 3. The CPU 11 includes a processor core 11-1 and a level-2 cache (L2 cache) 11-0. The CPU 11, the main memory 6, and the packet transmission/reception block 10 may be connected through a chipset called a North Bridge or South Bridge; here, these chipsets are also represented as the bus 3.

Since the flow table is generally large, it is constructed on the main memory 6. When a packet is received by the packet transmission/reception block 10, it passes through the MAC and reception FIFO and is written to the main memory 6 using, for example, a DMA (Direct Memory Access) function. When a packet is received, the packet transmission/reception block 10 either raises an interrupt to the processor core 11-1 of the CPU 11 or issues an event to an event queue polled by the processor core 11-1; the processor core 11-1 thereby detects the arrival of the packet and reads its header information from the main memory 6. The processor core 11-1 then derives a hash value from the set of header fields serving as the search key, reads the flow table entry corresponding to the hash value from the main memory 6, and compares the search keys. If the keys match, the packet is judged to belong to the flow corresponding to that entry. If they do not match, the pointer stored in the entry is followed, the next entry is read, and the comparison of the search key against that entry's stored key is repeated, thereby checking whether the packet belongs to any flow in the chain. Processing for the case where the packet's search key matches no entry in the flow table to the end, for example registering a table entry as a new flow, depends on individual system specifications and is omitted here.
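The entry-comparison loop described above (compare the stored search key; on a mismatch, follow the pointer to the next entry) can be sketched as follows. The entry layout and names are illustrative assumptions.

```python
class Entry:
    """One flow table entry: a stored search key, its flow ID, and a
    pointer to the next entry in the collision chain."""
    def __init__(self, key, flow_id, next_entry=None):
        self.key = key
        self.flow_id = flow_id
        self.next = next_entry

def lookup(table, h, search_key):
    """table: mapping from hash value to the first Entry of its chain.
    Returns the flow ID if the key matches some entry, else None
    (an unregistered flow)."""
    entry = table.get(h)
    while entry is not None:
        if entry.key == search_key:   # keys match: flow identified
            return entry.flow_id
        entry = entry.next            # follow the pointer to the next entry
    return None
```

Each iteration of the loop corresponds to one additional main-memory access in the hardware description above, which is why long collision chains degrade search throughput.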

Such pattern matching processing in the network field, particularly flow search processing, is performed for every input packet, so it must be realized with a search throughput high enough to match the bandwidth of the network. Therefore, when a flow table is constructed on the main memory and the flow search is executed in software by the CPU as described above, each search requires multiple accesses to the flow table on the main memory, whose access performance is low, and degraded processing performance becomes a problem. Holding part of the main-memory flow table in a cache memory inside the CPU, for example the L2 cache 11-0 in FIG. 18, can reduce the latency of main-memory accesses. However, in a network, packets belonging to many different flows arrive almost at random, so when each operation must reference different data, as with the table entries in flow search processing, the cache does not function effectively, and a dramatic reduction in main-memory accesses cannot be expected. Similarly, even if, as in Patent Document 2, a table caching part of the hash table is prepared separately from the data cache of the CPU, the same problem is considered to exist.

Meanwhile, multi-core CPUs (multi-core processors) incorporating a plurality of processor cores have appeared in recent years, and servers equipped with several of them are also available. FIG. 19 shows a schematic configuration example of an ordinary server architecture composed of two CPUs each having two processor cores: in the configuration of FIG. 18, the CPU 11 is replaced by a CPU 12 and a CPU 13, each with two processor cores. The CPU 12 includes processor cores 12-1 and 12-2 and an L2 cache 12-0; the CPU 13 includes processor cores 13-1 and 13-2 and an L2 cache 13-0. Such a server has four processor cores, and performing flow search processing in parallel across them improves search performance compared with the configuration of FIG. 18. However, since the flow table is constructed in the main memory, accesses to the slow main memory are still required, and arbitration of access contention among the processor cores accessing the same main memory becomes necessary. The processor cores therefore cannot be used effectively, and a performance problem remains. More generally, the same problem exists in any server equipped with a plurality of CPUs each having a plurality of processor cores. Likewise, although such CPUs have cache memories, the problem described above, which arises when each operation must reference different data, remains unsolved.

Patent Document 3 discloses replacement control for reducing the miss rate of a shared cache in a multi-core processor environment such as that of FIG. 19; Patent Document 4 discloses exclusive control over a shared memory in a multiprocessor environment; and Patent Document 5 discloses a prefetch function for a shared cache in a multi-core processor environment. None of these, however, solves the underlying problem of operations, such as flow search processing, that must reference different data each time. Furthermore, Non-Patent Document 1, targeting longest prefix match (LPM) search, discloses a technique that places part of the data area storing search-key data in a memory region with fast access performance, such as a cache, to reduce the number of accesses to slow memory.

Patent Document 6 discloses, as shown in its FIG. 1, a configuration in which one or more processors, a cache memory, a memory, peripherals, and a packet controller are connected by a system bus (see, for example, paragraph 7).

When the packet controller receives a packet, the pattern matching logic and hash mechanism configured in its packet reception unit compare predefined patterns, held in registers within the pattern matching logic, against addresses and the like in the received packet, to determine whether the packet may be received (see, for example, paragraph 9).

Furthermore, part of each packet permitted to be received is written into the cache memory in the system and held for subsequent packet processing by the processor (see, for example, paragraph 11).

However, as described above, Patent Document 6 has a pattern matching function inside the packet controller, and the pattern matching is performed there; pattern matching is never performed in the individual processors.

Also, as described above, the cache memory of Patent Document 6 holds part of the packet data permitted for reception by the pattern matching, for use in packet processing by the processor; the patterns to be matched (flow information) are never written into the cache memory, and it is never used as a flow management table.

 FIG. 7 of Patent Document 7 discloses a configuration in which a plurality of microengines (processors), memory shared among them (SRAM, DRAM), a packet receiving unit, and other components are connected by a bus (see, for example, paragraph 50). Each microengine contains a local memory and a computation core (processor core).

 Flow state information held in the shared memory (SRAM), namely the patterns and the state-transition information used in pattern matching, is read into the local memory inside a microengine, and the pattern matching is performed by the processor core of that microengine (see, for example, paragraph 55).

 However, as described above, Patent Document 7 has main memories such as SRAM and DRAM that are shared among the processors, but the only faster, cache-like memory is the local memory attached to each individual processor; no cache memory shared by a plurality of processors exists. In other words, there is no cache memory shared by a particular number of processors in addition to the main memory (SRAM, DRAM, etc.) shared by all of them.

 Also, as described above, in the pattern matching process of Patent Document 7, pattern information is held in the main memory shared by the processors, and the cache-like memory (the local memory inside a microengine) merely reads that information from main memory and uses it for the duration of the pattern matching of a given packet. The patterns to be matched (flow information) are not written into a cache memory, and no cache memory is used directly as a flow management table.

Patent Document 1: JP-A-11-53199
Patent Document 2: JP-T-2002-511703
Patent Document 3: JP-A-2002-373115
Patent Document 4: JP-A-2008-293368
Patent Document 5: JP-A-2007-207240
Patent Document 6: JP-T-2007-525037
Patent Document 7: JP-T-2008-544728

Non-Patent Document 1: Butler Lampson, Venkatachary Srinivasan, George Varghese, "IP Lookups Using Multiway and Multicolumn Search", IEEE/ACM Transactions on Networking, Vol. 7, No. 3, 1999, pp. 324-334.

 Pattern matching in a multiprocessor environment in the network field, as described above, must be realized with a search throughput high enough to match the network bandwidth; the problem is that existing approaches cannot provide processing performance that satisfies this requirement.

 The reason is that the flow table is built in main memory, whose access performance is low, and each search requires multiple accesses to that table. Moreover, even if part of the flow table is cached in a memory accessible faster than main memory (such as a cache memory) in order to reduce the number of main-memory accesses, when the flows to which arriving packets belong are random there is little correlation between successive packets, so the probability of a packet matching one of the cached entries is small and the reduction in main-memory accesses is minor.

 An object of the present invention is to provide a pattern matching system that realizes pattern matching with search performance fast enough to match the network bandwidth.

 The first and second pattern matching apparatuses of the present invention comprise: one or more multi-core CPUs, each having a plurality of processor cores and a shared memory shared among those cores (1 in FIG. 1; 1, 2, ..., M in FIG. 6); a packet transmission/reception block that transmits and receives packets (5 in FIGS. 1 and 6); a search key extraction block (4 in FIGS. 1 and 6) that extracts, from each received packet, the header information used as the search key in flow search processing, generates and issues a packet reception event, and writes the packet to main memory; a main memory such as a DRAM (6 in FIGS. 1 and 6) into which packet reception events and, if necessary, the packets themselves are written; and a bus (3 in FIGS. 1 and 6) interconnecting the multi-core CPUs, the search key extraction block, and the main memory. A pattern table based on a hash table is constructed in the shared memory inside the multi-core CPU. With this configuration, the search key extraction block extracts the set of header fields forming the search key from the packet in advance and notifies a processor core of it, and the pattern matching is performed using the pattern table in the shared memory, which can be accessed faster than main memory; the object of the present invention is thereby achieved.

 The third pattern matching apparatus of the present invention adds a final result determination block (7 in FIG. 9) to the second apparatus. Each multi-core CPU holds, in its shared memory, a sub-pattern table obtained by partitioning the single original pattern table. With this configuration, the search key extraction block extracts the search key fields from the packet header in advance and notifies one processor core of each multi-core CPU, so that the pattern matching is performed in a distributed manner, again using the pattern tables in the shared memories, which can be accessed faster than main memory; the object of the present invention is thereby achieved.

 The fourth pattern matching apparatus of the present invention removes the final result determination block of the third apparatus (7 in FIG. 9) and adds a hash value calculation block (9 in FIG. 13). With this configuration, the search key extraction block (8 in FIG. 13) extracts the search key fields from the packet header in advance, and the hash value calculation block (9 in FIG. 13) computes a hash value of the search key; this hash value identifies the multi-core CPU whose pattern table holds the group of pattern entries corresponding to that value. The pattern matching is then performed on one processor core of that multi-core CPU, using the pattern table in its shared memory, which can be accessed faster than main memory; the object of the present invention is thereby achieved.
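As an illustration only (not part of the patent text), the way the hash value calculation block could identify the multi-core CPU holding the relevant pattern entries can be sketched as below. The modulo-M partitioning and the CRC-32 hash are assumptions chosen for the sketch; the embodiment only requires that the hash value of the search key determine the owning CPU.

```python
import zlib

NUM_CPUS = 4  # M multi-core CPUs (assumed value for illustration)

def owning_cpu(search_key: bytes) -> int:
    """Return the index of the multi-core CPU whose sub-pattern table
    holds the pattern entry group for this search key's hash value.
    Partitioning by modulo over CRC-32 is an assumption, not the patent's."""
    return zlib.crc32(search_key) % NUM_CPUS
```

The same key always maps to the same CPU, so the event for a given flow is always dispatched to the CPU that holds that flow's sub-pattern table.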

 According to the present invention, pattern matching with search performance fast enough to match the network bandwidth can be realized.

FIG. 1 is a block diagram showing the configuration of an apparatus according to the first embodiment of the present invention.
FIG. 2 is a flowchart showing the operation of the apparatus according to the first embodiment.
FIG. 3 is a flowchart showing step A4 in FIG. 2.
FIG. 4 is a diagram showing an example of a flow table.
FIG. 5 is a diagram outlining the operation of the apparatus according to the first embodiment.
FIG. 6 is a block diagram showing the configuration of an apparatus according to the second embodiment of the present invention.
FIG. 7 is a flowchart showing the operation of the apparatus according to the second embodiment.
FIG. 8 is a diagram outlining the operation of the apparatus according to the second embodiment.
FIG. 9 is a block diagram showing the configuration of an apparatus according to the third embodiment of the present invention.
FIG. 10 is a diagram showing an example of partitioning the flow table in the third embodiment.
FIG. 11 is a flowchart showing the operation of the apparatus according to the third embodiment.
FIG. 12 is a diagram outlining the operation of the apparatus according to the third embodiment.
FIG. 13 is a block diagram showing the configuration of an apparatus according to the fourth embodiment of the present invention.
FIG. 14 is a flowchart showing the operation of the apparatus according to the fourth embodiment.
FIG. 15 is a flowchart showing step A13 in FIG. 14.
FIG. 16 is a diagram outlining the operation of the apparatus according to the fourth embodiment.
FIG. 17 is an example configuration of a hash table.
FIG. 18 is an example architecture of software-based flow search processing on a server with an ordinary single CPU (single core).
FIG. 19 is an example architecture of software-based flow search processing on a server with two ordinary CPUs (dual core).

 Hereinafter, embodiments for carrying out the present invention will be described in detail with reference to the drawings.

<Embodiment 1>

 FIG. 1 is a block diagram showing the configuration of the first embodiment of the present invention. Referring to FIG. 1, the first embodiment comprises: a multi-core CPU 1; a packet transmission/reception block 5 that transmits and receives packets; a search key extraction block 4 that extracts, from each received packet, the header information used as the search key in flow search processing, generates and issues packet reception events, and writes packets to main memory; a main memory 6 such as a DRAM into which packet reception events and, if necessary, the packets themselves are written; and a bus 3 interconnecting the multi-core CPU 1, the search key extraction block 4, and the main memory 6.

 The multi-core CPU 1 includes N processor cores 1-1, 1-2, ..., 1-N and a shared memory 1-0 shared by those N cores. The shared memory 1-0 holds the entire flow table, which is organized as a hash table. When this shared memory is also used as a data cache, the flow table is held in a region logically separated from the data cache.

 When the network is Ethernet (registered trademark), the packet transmission/reception block 5 provides the MAC function: it shapes the input bit stream into Ethernet frames and then outputs them to the search key extraction block 4. Since such a packet transmission/reception block is well known to those skilled in the art, its detailed configuration is omitted.

 The search key extraction block 4 receives packets from the packet transmission/reception block 5 and extracts the header information that serves as the search key in flow search processing. It generates a packet reception event containing the extracted search key, writes it to the event queue for the multi-core CPU 1, and writes the packet (after key extraction) to main memory; a DMA function is generally used for this write. When a packet is transmitted, this block reads the packet data from main memory and outputs it unchanged to the packet transmission/reception block 5.

 Although FIG. 1 shows the multi-core CPU 1, the search key extraction block 4, and the main memory 6 connected by the bus 3, they may instead be connected through a chipset such as the North Bridge or South Bridge provided in servers, or in a crossbar or ring bus topology. The present embodiment may be built on a motherboard such as a server motherboard, or everything except the main memory 6 may be integrated on a single chip as one SoC (System-on-a-Chip).

 Next, the operation of the first embodiment of the present invention will be described in detail with reference to FIG. 1 and the flowchart of FIG. 2.

 A packet arriving from the network is received by the packet transmission/reception block 5. When the network is Ethernet (registered trademark), for example, the received data is shaped into an Ethernet frame (herein, frames and packets are collectively called packets) and then output to the search key extraction block 4 (step A1).

 On receiving a packet from the packet transmission/reception block 5, the search key extraction block 4 extracts predetermined fields of the packet header as the search key for flow search processing, assigns the packet an identifier unique within the apparatus, and writes them to a FIFO contained in the search key extraction block 4 (step A2). As the packet header fields used for the search key, for example, the set of five fields consisting of the source IP address (32 bits), destination IP address (32 bits), and protocol number (8 bits) defined in the IP header together with the source port number (16 bits) and destination port number (16 bits) defined in the TCP/UDP header may be used; in this case the search key is 104 bits (13 bytes) long.
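To make step A2 concrete, the extraction of the five-field, 13-byte search key can be sketched as follows. This is an illustrative sketch, not part of the patent text; it assumes a plain IPv4 packet carrying TCP or UDP, with the header length taken from the IHL field.

```python
import struct

def extract_search_key(packet: bytes) -> bytes:
    """Extract the 104-bit (13-byte) search key from a raw IPv4 packet:
    source IP (32b) + destination IP (32b) + protocol (8b)
    + source port (16b) + destination port (16b)."""
    ihl = (packet[0] & 0x0F) * 4      # IPv4 header length in bytes (IHL field)
    proto = packet[9]                 # protocol number (6 = TCP, 17 = UDP)
    src_ip = packet[12:16]
    dst_ip = packet[16:20]
    # The first four bytes of the TCP/UDP header are the two port numbers.
    src_port, dst_port = struct.unpack("!HH", packet[ihl:ihl + 4])
    return src_ip + dst_ip + bytes([proto]) + struct.pack("!HH", src_port, dst_port)
```

The resulting 13-byte value is what the search key extraction block would place in the packet reception event, alongside the packet identifier.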

 Next, the search key extraction block 4 generates and issues an event notifying the multi-core CPU 1 of the packet reception, writes it to the event queue, and writes the corresponding packet to the main memory 6 (step A3). The generated event need only contain the extracted search key and the identifier of the corresponding packet; if the event queue is located in the main memory 6, for example, the event is written to that queue. When the event queue resides in the main memory 6, the event queue region for the multi-core CPU 1 and the region into which packet data is written are logically separated, and their base addresses are assumed to be known to the multi-core CPU 1.

 Meanwhile, the multi-core CPU 1 monitors the event queue by polling; when an event is present, it reads the event at the head of the queue, and a particular processor core performs the flow search processing (step A4).

 FIG. 3 is a flowchart explaining the operation of step A4 in more detail.

 When an event exists in the event queue, the multi-core CPU 1 reads the head event and assigns it to a particular processor core (step B1). In a CPU with a plurality of processor cores, such as the multi-core CPU 1, one approach is to dedicate a specific core (for example, processor core 1-1) to polling; that core reads events from the event queue and assigns them to other idle cores. Alternatively, each processor core may itself poll the event queue and take events directly; in the latter case, because all cores access the memory region holding the same event queue, arbitration among the cores is required. In this embodiment an existing technique is used for this event dispatch, and it suffices that events are distributed evenly over the processor cores (excluding any core dedicated to polling). In the following, for simplicity, the core to which an event is assigned is denoted processor core 1-k (k = 1, ..., N), and the processing of processor core 1-k is described.

 Processor core 1-k, to which the event has been assigned, computes a hash value of the search key stored in the event using a preset hash function (step B2). Hash functions and hash values are well known to those skilled in the art, so a detailed description is omitted.

 Next, processor core 1-k uses the computed hash value as an address value (step B3) and reads an entry from the flow table (step B4). FIG. 4 shows an example of the flow table constructed in the shared memory 1-0. The flow table is based on a hash table, and multiple entries sharing the same hash value are linked in a doubly-linked list. Each entry holds a search key, a flow ID identifying the flow, an address value serving as a pointer to the next entry in the list, and a pointer to the previous entry. If the combined length of the search key, flow ID, next-entry pointer, and previous-entry pointer shown in FIG. 4 exceeds the word length of the shared memory 1-0, one entry occupies multiple consecutive words; in that case the address value means the logical address of the first word of each entry (for example, the lower bits of the physical address are ignored). Entries are basically added to free space and managed via the doubly-linked list. On the other hand, when an entry is to be added at an address value (that is, a hash value) at which another entry is already stored, the entry whose own hash value equals that address has higher priority, and the entry already stored there is moved to another free area. Note that FIG. 4 is only an example: to allow for collisions, the hash value may be computed over a range smaller than the logical address space of entry head words, and the flow table may follow any other existing hash-table organization. If the flow table is organized so that entries are never relocated, the address value itself can serve as the flow ID, so the flow ID need not be held in the entry.

 Having read an entry from the flow table, processor core 1-k checks whether the search key of the read entry matches the search key in the event (step B5). If they match, it obtains the flow ID held in the entry (step B6) and ends the flow search processing (step A4). If they do not match, it checks whether a next linked entry exists (step B7); if one does, it takes that entry's next pointer as the address value (step B8) and reads the next entry (step B4), repeating this process to carry out the flow search. If no next entry exists, the corresponding flow is not registered, so the result is judged to be "flow unregistered" (step B9) and the flow search processing (step A4) ends.

 Finally, processor core 1-k outputs the result obtained in the flow search processing (step A4) (step A5). In general, flow search is performed in order to identify the flow to which a packet belongs and to apply per-flow processing to it, as in QoS processing; it is therefore assumed that processing blocks, CPUs, or the like performing such processing are attached via the bus 3 or some other communication means. Because this embodiment focuses on the flow search processing itself, those processing blocks are omitted, but the result output in step A5 is assumed to go to such a per-flow processing block. In the description above, if no entry with a matching search key existed, the search ended with the judgment "flow unregistered"; alternatively, processing may be performed to register that flow as a new flow. Similarly, flows already registered as entries can be deleted. Because these operations depend heavily on the organization of the hash table serving as the flow table, their details are omitted.

 Furthermore, as for transmitting a packet after some processing has been applied in such a per-flow processing block, it is assumed that the search key extraction block 4 reads the packet from the main memory 6 and outputs the packet data to the packet transmission/reception block 5, which then transmits it; since this embodiment focuses on flow search processing, the details are omitted.

 Alternatively, although in this embodiment the packet itself is received from the network by the packet transmission/reception block 5, a case is also conceivable in which only packet headers are input to the flow search apparatus of this embodiment. In that case, the packet transmission/reception block 5 receives an externally supplied packet header and outputs it to the search key extraction block 4, which extracts the set of search key fields from the header and generates and issues an event. Since no packet payload data is received in this case, no packet data need be written to the main memory 6.

 Furthermore, although this embodiment assumes that the event queue for the multi-core CPU 1 is located in the main memory 6, the queue may instead be placed in the shared memory 1-0 of the multi-core CPU 1, in a region logically separated from the flow table, with the search key extraction block 4 writing events directly into it; or it may be built from memory or registers inside the search key extraction block. In any case, any region identifiable in the address space visible to the multi-core CPU 1 suffices.

 Finally, the processing outline of this embodiment is explained using FIG. 5. As shown in FIG. 5, when packets pkt1, pkt2, ... arrive, the search keys Key#1, Key#2, ... are extracted in order, and each key is assigned to one of the processor cores in the multi-core CPU for flow search processing. The keys may be assigned to the cores sequentially, for example by a round-robin scheme, but the assignment method is not limited to this.
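As a minimal sketch of the round-robin assignment mentioned above (one possible scheme, not mandated by the embodiment), extracted keys can be mapped to core indices as follows:

```python
from itertools import cycle

def dispatch(keys, num_cores):
    """Assign extracted search keys to processor cores in round-robin order.
    Returns (key, core_index) pairs; core 0 would be skipped if it were
    dedicated to polling, an option this sketch does not model."""
    cores = cycle(range(num_cores))
    return [(key, next(cores)) for key in keys]
```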

 Next, the effects of the first embodiment of the present invention are described.

 As described above, in this embodiment the search key extraction block extracts the set of header fields forming the search key from the packet in advance and notifies the processor core of it; since the core then has no need to read the packet header stored in main memory, the processing performance of the flow search is improved.

 Also, in this embodiment the flow table is constructed in a shared memory shared by the processor cores of the multi-core CPU; the access latency of reading entries is greatly reduced compared with a flow table constructed in main memory such as a DRAM, so the processing performance of the flow search can be improved.

 From the above, it is possible to provide a pattern matching system that realizes pattern matching with search performance fast enough to match the network bandwidth.

 Next, a second embodiment of the present invention will be described in detail with reference to the drawings.

<Embodiment 2>

 FIG. 6 is a block diagram showing the configuration of the second embodiment of the present invention. Referring to FIG. 6, the second embodiment comprises: multi-core CPUs 1, 2, ..., M; a packet transmission/reception block 5 that transmits and receives packets; a search key extraction block 4 that extracts, from each received packet, the header information used as the search key in flow search processing, generates and issues packet reception events, and writes packets to main memory; a main memory 6 such as a DRAM into which packet reception events and, if necessary, the packets themselves are written; and a bus 3 connecting the multi-core CPUs 1 through M, the search key extraction block 4, and the main memory 6. This embodiment adds the multi-core CPUs 2, ..., M to the first embodiment shown in FIG. 1; the rest is the same as the first embodiment.

 The multi-core CPU 2 has the same form as the multi-core CPU 1: it includes N processor cores 2-1, 2-2, ..., 2-N and a shared memory 2-0 shared by the N processor cores, and the shared memory 2-0 holds a flow table configured as a hash table. This form is the same for the other multi-core CPUs, and the flow tables configured in the shared memories of the multi-core CPUs all have the same contents. When the shared memory is used as a data cache, the flow table is held in an area logically separated from the data cache.
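The hash-table flow table held in each shared memory can be modeled roughly as below; the bucket count, the stand-in hash function, and the 5-tuple key layout are assumptions of this sketch, not details fixed by the text:

```python
NUM_BUCKETS = 256  # assumed table size for the sketch

def bucket_index(search_key):
    # Stand-in hash over the header-field tuple; the actual hash
    # function used in the embodiments is not reproduced here.
    return sum(search_key) % NUM_BUCKETS

class FlowTable:
    """Hash-table-based flow table with per-bucket collision chains."""
    def __init__(self):
        self.buckets = [[] for _ in range(NUM_BUCKETS)]

    def register(self, search_key, flow_id):
        self.buckets[bucket_index(search_key)].append((search_key, flow_id))

    def lookup(self, search_key):
        # Compare stored keys against the query key to resolve collisions.
        for key, flow_id in self.buckets[bucket_index(search_key)]:
            if key == search_key:
                return flow_id
        return None  # unregistered flow

table = FlowTable()
table.register((167772161, 167772162, 1234, 80, 6), 7)  # e.g. a TCP 5-tuple
```

A lookup that finds no matching entry returns the unregistered-flow outcome, which the later steps of the embodiments handle separately.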

 As shown in FIG. 6, in this embodiment, as in the first embodiment, multi-core CPU 1, multi-core CPU 2, ..., multi-core CPU M, the search key extraction block 4, and the main memory 6 are connected by the bus 3; however, they may instead be connected through a chipset, called a North Bridge or South Bridge, provided in a server or the like, or in a crossbar or ring-bus form. This embodiment may be built on the motherboard of a server or the like, or everything except the main memory 6 may be integrated on a single chip as one SoC.

 Next, the operation of the second embodiment of the present invention will be described in detail with reference to FIG. 6 and the flowchart of FIG. 7.

 The flowchart of FIG. 7 is a modification of the flowchart of the first embodiment in which step A3 is replaced with step A6 and step A5 with step A7. The other steps are the same as in the first embodiment, so their detailed description is omitted.

 The search key extraction block 4 receives a packet input from the network via the packet transmission/reception block 5 (step A1) and performs the processing of step A2, such as search key extraction.

 Next, the search key extraction block 4 generates and issues an event for notifying one of the multi-core CPUs of packet reception, writes it to an event queue, and writes the corresponding packet to the main memory 6 (step A6). As in the first embodiment, the generated event only needs to include the extracted search key information and an identifier of the packet. When the event queues are configured in the main memory 6, for example, the event is written to the event queue in the main memory 6; the event queue area of each multi-core CPU is logically separated from the area into which packet data is written, and the base address of each multi-core CPU's event queue area is assumed to be known to that multi-core CPU. The event is issued to only one of the multi-core CPUs, and the multi-core CPU to which the event is issued may be selected, for example, in round-robin order (multi-core CPU 1, multi-core CPU 2, ..., multi-core CPU M, multi-core CPU 1, ...) so as to even out the flow search load across the multi-core CPUs, but the selection method is not limited to this.

 Meanwhile, each of the M multi-core CPUs monitors, by polling, whether an event exists in its own event queue; if one does, it reads the event at the head of the queue and performs the flow search process on one of its processor cores (step A4). Step A4 is performed independently by each multi-core CPU, and its content is the same as in the first embodiment, so a detailed description is omitted here.
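The polling loop each multi-core CPU runs can be sketched as follows; the event layout (a dict carrying `search_key` and `packet_id`) and the queue type are assumptions of this sketch:

```python
from collections import deque

def poll_once(event_queue, flow_search):
    """One polling step: if the queue is non-empty, remove the head event
    and run the flow search for its search key; otherwise do nothing."""
    if not event_queue:
        return None  # no event this round; keep polling
    event = event_queue.popleft()  # read the event at the head of the queue
    return flow_search(event["search_key"])

queue = deque([{"search_key": "Key#1", "packet_id": 1}])
result = poll_once(queue, lambda key: "flow-for-" + key)
```

In the embodiment the `flow_search` callback would be the hash-table lookup performed by one of that CPU's processor cores.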

 Finally, the processor core in the multi-core CPU that performed the actual flow search process outputs the result obtained there (step A7). This differs from step A5 in the first embodiment in that, whereas in the first embodiment the processor cores of one multi-core CPU performed this processing, here the result is output by whichever processor core, among those of the multiple multi-core CPUs, actually performed the flow search. The rest of step A7 is the same as step A5, so the details are omitted.

 In this embodiment, the flow tables configured in the shared memories of the multi-core CPUs have the same contents in all multi-core CPUs. Therefore, when an unregistered flow is to be registered as a new flow, an update made to the flow table by a processor core in one multi-core CPU must also be applied to the flow tables in the other multi-core CPUs.

 Also, in this embodiment, the packet itself is received from the network by the packet transmission/reception block 5, but the processing when only the packet header is input is the same as in the first embodiment, and the possible locations of the event queues (the main memory 6, the shared memory in each multi-core CPU, specific memory or registers in the search key extraction block 4, and so on) are also the same as in the first embodiment, so the details are omitted.

 Finally, an overview of the processing in the present embodiment described above will be given with reference to FIG. 8. As shown in FIG. 8, in the flow search processing of this embodiment, when packets pkt1, pkt2, ... arrive, the search keys Key#1, Key#2, ... are extracted in order, and each search key is sent to one of the multiple multi-core CPUs. Each sent search key is assigned to one of the processor cores in that multi-core CPU, and the flow search process is performed by that processor core. The assignment to the processor cores in each multi-core CPU may use a technique such as round-robin, for example, but is not limited to this.

 Next, the operation and effects of the second embodiment of the present invention will be described.

 As described above, in this embodiment, in addition to the effects of the first embodiment, there are multiple multi-core CPUs and the flow search processing is distributed across them, so the processing performance of the flow search improves further in proportion to the added multi-core CPUs. Moreover, since distributed processing across multiple multi-core CPUs is possible, contention for access to the shared memory among the processor cores within one multi-core CPU also becomes less frequent, enabling more efficient flow search processing.

 Next, a third embodiment of the present invention will be described in detail with reference to the drawings.
<Embodiment 3>
 FIG. 9 is a block diagram showing the configuration of the third embodiment of the present invention. Referring to FIG. 9, the third embodiment includes multi-core CPU 1, multi-core CPU 2, ..., multi-core CPU M; a packet transmission/reception block 5 that transmits and receives packets; a search key extraction block 4 that extracts, from a received packet, the header information serving as the search key in the flow search process, generates and issues packet reception events, and writes packets to the main memory; a final result determination block 7 that collects the results of the individual flow search processes in the multiple multi-core CPUs and selects the final flow search result; a main memory 6, such as a DRAM, into which packet reception events and, if necessary, the packets themselves are written; and a bus 3 that interconnects multi-core CPU 1, multi-core CPU 2, ..., multi-core CPU M, the search key extraction block 4, the main memory 6, and the final result determination block 7. This embodiment is the second embodiment shown in FIG. 6 with the final result determination block 7 added, and is otherwise the same as the second embodiment.

 In the second embodiment, the same flow table was configured in the shared memory of every multi-core CPU, but in this embodiment each shared memory holds one of M parts into which the original flow table is divided. When the shared memory is used as a data cache, the flow table is held in an area logically separated from the data cache. FIG. 10 shows an example of the division of the flow table in this embodiment. As shown in FIG. 10, the original flow table is divided into M parts, and each sub-flow table k (k = 1, ..., M) is configured in the shared memory of multi-core CPU k (k = 1, ..., M). Each sub-flow table is basically a hash-table-based flow table, as in the first and second embodiments, so a description of the contents of each entry is omitted; however, each sub-flow table is completely self-contained. That is, a flow whose hash value is assigned to a given sub-flow table is stored in a free entry area within that same sub-flow table. Therefore, if, to allow for collisions, the hash value is computed over a range smaller than the logical address space indicating the first word of an entry, each sub-flow table must be configured so that this property holds, which is explained as follows.

 For example, suppose that in the original flow table, to allow for collisions, the hash value is computed over a range smaller than the logical address space indicating the first word of an entry, and the entry areas that cannot be addressed by a hash value are located at the high end of the address range. If this table is simply divided into sub-flow tables, the entry areas that cannot be addressed by a hash value end up concentrated in, say, the M-th sub-flow table. In that case, collisions occur frequently in a sub-flow table that has only hash-addressable entry areas, while a sub-flow table that has many non-hash-addressable entry areas, or only such areas, cannot use its entry areas effectively, so an efficient flow table cannot be constructed. For this reason, collision-tolerant entry areas must be reserved per sub-flow table.
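The per-sub-table collision headroom described above can be illustrated as follows; the hash range, the value of M, and the spare-entry count are arbitrary example values chosen for the sketch:

```python
HASH_RANGE = 1024     # assumed hash-value range of the original table
M = 4                 # number of multi-core CPUs / sub-flow tables
SPARE_PER_TABLE = 32  # per-sub-table entries reserved for collisions

def subtable_for_hash(h):
    """Sub-flow table (1-based, matching sub-flow table k) owning hash h,
    with the hash range split evenly across the M tables."""
    return h // (HASH_RANGE // M) + 1

def subtable_capacity():
    """Hash-addressable entries plus this table's own collision headroom;
    the spare entries are reserved per sub-table, not pooled globally."""
    return HASH_RANGE // M + SPARE_PER_TABLE
```

Reserving the spare entries per sub-table, rather than leaving them concentrated at the end of the original table, is what keeps every sub-flow table independently able to absorb collisions.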

 As shown in FIG. 9, in this embodiment, as in the first and second embodiments, multi-core CPU 1, multi-core CPU 2, ..., multi-core CPU M, the search key extraction block 4, the main memory 6, and the final result determination block 7 are connected by the bus 3; however, they may instead be connected through a chipset, called a North Bridge or South Bridge, provided in a server or the like, or in a crossbar or ring-bus form. This embodiment may be built on the motherboard of a server or the like, or everything except the main memory 6 may be integrated on a single chip as one SoC.

 Next, the operation of the third embodiment of the present invention will be described in detail with reference to FIG. 9 and the flowchart of FIG. 11.

 The flowchart of FIG. 11 is a modification of the flowchart of the second embodiment shown in FIG. 7, in which step A6 is replaced with step A8, step A7 with step A10, and a final result determination step (step A9) is added between the flow search process (step A4) and the result output step (step A10). The other steps are the same as in the first and second embodiments, so their detailed description is omitted.

 The search key extraction block 4 receives a packet input from the network via the packet transmission/reception block 5 (step A1) and performs the processing of step A2, such as search key extraction.

 Next, the search key extraction block 4 generates and issues an event for notifying all of the multi-core CPUs of packet reception, writes it to the event queues, and writes the corresponding packet to the main memory 6 (step A8). The information contained in the generated event and the event queue of each multi-core CPU are the same as in the second embodiment, so the details are omitted.

 Meanwhile, each of the M multi-core CPUs monitors, by polling, whether an event exists in its own event queue; if one does, it reads the event at the head of the queue and performs the flow search process on one of its processor cores (step A4). Step A4 is performed independently by each multi-core CPU, and its content is the same as in the first and second embodiments, so a detailed description is omitted here.

 Among the processor cores in each multi-core CPU, the core that performed the flow search process outputs the result obtained there, together with an identifier identifying the processed packet, to the final result determination block 7. When all M search results for the same packet, from the M multi-core CPUs, are available, the final result determination block 7 checks whether any of them identified a flow ID; if one did, that result is output as the final result. If none of the search results identified a flow ID, the final result is output as an unregistered flow (steps A9 and A10). The output destination of this final result is the same as in the first and second embodiments, so the details are omitted. Furthermore, while the above confirms and determines the final result once the search results from all M multi-core CPUs are available, it is also possible to prepare a variable, register, or the like that identifies the final result for the packet being processed and to check the results one by one as each multi-core CPU returns its search result.
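The decision rule of step A9 can be sketched as follows; the encoding of a per-CPU miss as `None` and of the final unregistered outcome as a string are assumptions of this sketch:

```python
UNREGISTERED = "unregistered"

def decide_final_result(results):
    """results: the M per-CPU outcomes for one packet, where None means
    that CPU's sub-flow table held no matching entry. Because the
    sub-flow tables are disjoint, at most one CPU can report a flow ID."""
    for flow_id in results:
        if flow_id is not None:
            return flow_id
    return UNREGISTERED
```

The incremental variant mentioned in the text would call the same check each time a result arrives instead of waiting for all M.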

 In this embodiment, the sub-flow tables configured in the shared memories of the multi-core CPUs all have independent contents. Therefore, when the final result determination block 7 registers as a new flow an unregistered flow for which no flow ID could be identified, the flow must be registered in the appropriate sub-flow table among those in the multi-core CPUs.

 Also, in this embodiment, the packet itself is received from the network by the packet transmission/reception block 5, but the processing when only the packet header is input is the same as in the first embodiment, and the possible locations of the event queues (the main memory 6, the shared memory in each multi-core CPU, specific memory or registers in the search key extraction block 4, and so on) are also the same as in the first and second embodiments, so the details are omitted.

 Finally, an overview of the processing in the present embodiment described above will be given with reference to FIG. 12. As shown in FIG. 12, in the flow search processing of this embodiment, when packets pkt1, pkt2, ... arrive, the search keys Key#1, Key#2, ... are extracted in order, and each search key is sent to all of the multi-core CPUs. Each sent search key is assigned to one of the processor cores in each multi-core CPU, the flow search process is performed by that processor core, and finally the final result is determined by the final result determination block. The assignment to the processor cores in each multi-core CPU may use a technique such as round-robin, for example, but is not limited to this.

 Next, the operation and effects of the third embodiment of the present invention will be described.

 As described above, in this embodiment, unlike the second embodiment, the flow tables in the multiple multi-core CPUs all have independent contents. Distributed processing across the multi-core CPUs is therefore not possible, but in addition to the effects of the first embodiment, the number of flow entries is M times that of the second embodiment, so more flows can be registered, managed, and searched.

 Next, a fourth embodiment of the present invention will be described in detail with reference to the drawings.
<Embodiment 4>
 FIG. 13 is a block diagram showing the configuration of the fourth embodiment of the present invention. Referring to FIG. 13, the fourth embodiment includes multi-core CPU 1, multi-core CPU 2, ..., multi-core CPU M; a packet transmission/reception block 5 that transmits and receives packets; a search key extraction block 8 that extracts, from a received packet, the header information serving as the search key in the flow search process; a hash value calculation block 9 that calculates a hash value for the extracted search key, generates and issues packet reception events, and writes packets to the main memory; a main memory 6, such as a DRAM, into which packet reception events and, if necessary, the packets themselves are written; and a bus 3 that interconnects multi-core CPU 1, multi-core CPU 2, ..., multi-core CPU M, the search key extraction block 8, the hash value calculation block 9, and the main memory 6. This embodiment is the third embodiment shown in FIG. 9 with the final result determination block 7 removed, the search key extraction block 4 changed to the search key extraction block 8, and the hash value calculation block 9 added, and is otherwise the same as the second embodiment.

 Also, in the shared memory of each multi-core CPU in this embodiment, one of M parts into which the original flow table is divided is configured, as in the third embodiment. This flow table is the same as in the third embodiment, so its detailed description is omitted.

 In FIG. 13 as well, as in the first, second, and third embodiments, multi-core CPU 1, multi-core CPU 2, ..., multi-core CPU M, the hash value calculation block 9, and the main memory 6 are connected by the bus 3; however, they may instead be connected through a chipset, called a North Bridge or South Bridge, provided in a server or the like, or in a crossbar or ring-bus form. This embodiment may be built on the motherboard of a server or the like, or everything except the main memory 6 may be integrated on a single chip as one SoC.

 Next, the operation of the fourth embodiment of the present invention will be described in detail with reference to FIG. 13 and the flowchart of FIG. 14.

 Steps A1 and A2 in the flowchart of FIG. 14 are the same as steps A1 and A2 in the first, second, and third embodiments, so their detailed description is omitted: the search key extraction block 8 receives a packet input from the network via the packet transmission/reception block 5 (step A1) and performs the processing of step A2, such as search key extraction.

 Next, the search key extraction block 8 outputs the extracted search key and the packet received from the packet transmission/reception block 5 to the hash value calculation block 9, and the hash value calculation block 9 calculates a hash value for the received search key (step A11). The hash value calculated in the hash value calculation block 9 is the same as the hash value calculated in step B2 of the flow search process (step A4) of the first, second, and third embodiments, so its detailed description is omitted.

 After calculating the hash value, the hash value calculation block 9 generates and issues an event for notifying one of the multi-core CPUs of packet reception, writes it to the event queue, and writes the corresponding packet to the main memory 6 (step A12). The generated event only needs to contain, in addition to the extracted search key information and the identifier of the corresponding packet, the calculated hash value; the configuration of the event queue of each multi-core CPU is the same as in the third embodiment.

 Also, the event generated and issued by the hash value calculation block 9 goes to one of the M multi-core CPUs. The method of selecting the multi-core CPU to which the event is issued is described below. As mentioned above, each multi-core CPU in this embodiment, as in the third embodiment, holds in its shared memory one of the M sub-flow tables into which the original flow table is divided. The base address covered by each sub-flow table is set in the hash value calculation block 9 in advance. Therefore, once the hash value calculation block 9 has calculated the hash value, it can determine in which sub-flow table, among the M multi-core CPUs, the entry group corresponding to that hash value is registered. According to the calculated hash value, the hash value calculation block 9 issues the packet reception event to the multi-core CPU in which the entry group corresponding to that hash value is registered.
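The hash-based selection of the target CPU can be sketched as follows; the base-address list stands in for the preconfigured sub-flow-table ranges, and all names and values are illustrative assumptions:

```python
SUBTABLE_BASE = [0, 256, 512, 768]  # assumed first hash value of each CPU's slice

def target_cpu(hash_value):
    """Index of the multi-core CPU whose sub-flow table covers hash_value:
    the highest base address not exceeding the hash wins."""
    for cpu in range(len(SUBTABLE_BASE) - 1, -1, -1):
        if hash_value >= SUBTABLE_BASE[cpu]:
            return cpu
    raise ValueError("hash value below the first base address")

def make_event(search_key, packet_id, hash_value):
    """Reception event carrying the key, the packet identifier, and the
    precomputed hash, as step A12 requires."""
    return {"search_key": search_key, "packet_id": packet_id, "hash": hash_value}

event = make_event("Key#1", packet_id=1, hash_value=300)
```

Because the hash alone identifies the single CPU that can hold the matching entry, the event is issued to that CPU only, with no broadcast and no final-result arbitration.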

 Meanwhile, each of the M multi-core CPUs monitors, by polling, whether an event exists in its own event queue; if one does, it reads the event at the head of the queue and performs the flow search process on one of its processor cores (step A13).

 FIG. 15 is a flowchart for explaining the operation of step A13 in more detail. This flow search process is common to the processor cores that perform the flow search process in the M multi-core CPUs, and is described below as the processing of an arbitrary processor core in an arbitrary multi-core CPU.

 Referring to FIG. 15, the flow search process (step A13) in this embodiment is the processing flow of the flow search process (step A4) of the first, second, and third embodiments with the hash value calculation (step B2) removed; as described above, the received event contains the hash value calculated in the hash value calculation block 9. In the flow search process of this embodiment, the hash value contained in the received event is used in place of the hash value that would have been calculated in step B2 of the first, second, and third embodiments. The other details of the flow search process (step A13) in this embodiment are the same as the flow search process (step A4) of the first, second, and third embodiments, so their detailed description is omitted.

 When the flow search process (step A13) ends, the processor core outputs its result (step A7). This result output step A7 is the same as in the second embodiment, so a detailed description is omitted. In this embodiment, the sub-flow tables held in the shared memory of the multi-core CPUs all have mutually independent contents. Consequently, when an unregistered flow whose flow ID could not be identified is to be registered as a new flow, a flow registration process is required; unlike in the third embodiment, however, the flow search for a given packet is performed by only one processor core of one multi-core CPU, so the registration process is independent for each multi-core CPU.

 In this embodiment the packet itself is received from the network by the packet transmission/reception block 5, but the processing when only the packet header is input is the same as in the first embodiment. Furthermore, the event queue for each multi-core CPU may be placed in that CPU's shared memory, in an area logically separated from the flow table, with the hash value calculation block 9 writing events directly into it, or it may be built from dedicated memory and registers inside the hash value calculation block 9. In either case, there is no problem as long as the area can be identified, in the address space visible to each multi-core CPU, as the event queue addressed to that CPU.

 Finally, an overview of the processing of this embodiment is described with reference to FIG. 16. In the flow search process of this embodiment, as shown in FIG. 16, when packets pkt1, pkt2, ... arrive, the search keys Key#1, Key#2, ... are extracted in order and a hash value is computed for each search key. Next, according to the computed hash value, the search key and the hash value are sent to the multi-core CPU in which the entry group for that hash value is registered. The delivered search key and hash value are assigned to one of the processor cores within that multi-core CPU, and that processor core performs the flow search process. The assignment to the processor cores within each multi-core CPU may use, for example, a round-robin scheme, but is not limited to this.
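The two-stage dispatch described above can be sketched as follows. The hash function (CRC-32), the CPU and core counts, and the round-robin policy are illustrative assumptions; the patent only requires that the hash identify the CPU holding the matching entry group, and names round-robin as one possible core-assignment scheme.

```python
import zlib
from itertools import cycle

M = 4        # number of multi-core CPUs (illustrative)
N_CORES = 2  # processor cores per CPU (illustrative)

# One round-robin selector per CPU; round-robin is only one possible policy.
core_selector = [cycle(range(N_CORES)) for _ in range(M)]

def dispatch(search_key: bytes):
    # 1. Compute a hash over the search key extracted from the packet header.
    h = zlib.crc32(search_key)
    # 2. The hash selects the multi-core CPU whose sub-table registers
    #    the entry group for this hash value.
    cpu = h % M
    # 3. Within that CPU, assign the event to one processor core (round-robin here).
    core = next(core_selector[cpu])
    return cpu, core, h

for key in (b"Key#1", b"Key#2", b"Key#1"):
    print(dispatch(key))
```

Note that the CPU choice is a pure function of the key, so all packets of one flow reach the same sub-flow table, while the core choice may vary per event.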

 Next, the effects of the fourth embodiment of the present invention are described.

 As described above, in this embodiment, as in the third embodiment, the flow tables in the multi-core CPUs all have mutually independent contents, so more flows can be registered, managed, and searched. In addition, in this embodiment the hash value is computed not by each processor core but by the hash value calculation block, and the computed hash value identifies the multi-core CPU holding the flow table in which the flow search should be performed. Therefore, unlike in the third embodiment, it is unnecessary to run the flow search on one processor core of every multi-core CPU; the flow searches are distributed efficiently over the processor cores, which improves flow search performance.

 Part or all of the above embodiments can also be described as in the following supplementary notes, but are not limited to them.

 (Supplementary note 1) A pattern matching method in a multi-core processor environment having a multi-core CPU composed of a plurality of processor cores and a shared memory shared among the processor cores within the same CPU, the method comprising: constructing a pattern table containing search patterns in the shared memory; and executing, in parallel on the plurality of processor cores, pattern matching of a search key to be looked up.

 (Supplementary note 2) The pattern matching method according to supplementary note 1, wherein the pattern table is configured as a hash table.

 (Supplementary note 3) The pattern matching method according to supplementary note 1 or 2, comprising: transmitting and receiving packets; extracting, from the header information of a received packet, a set of arbitrary fields serving as a search key; generating and issuing a packet reception event while storing the packet data in a main memory; and performing the pattern matching process on one multi-core CPU.

 (Supplementary note 4) The pattern matching method according to supplementary note 3, comprising: generating and issuing a packet reception event containing the extracted search key; sequentially assigning received packet reception events to one of the plurality of processor cores of the multi-core CPU; and performing the pattern matching process on the processor core to which the packet reception event was assigned.

 (Supplementary note 5) The pattern matching method according to supplementary note 1 or 2, comprising: transmitting and receiving packets; extracting, from the header information of a received packet, a set of arbitrary fields serving as a search key; generating and issuing a packet reception event while storing the packet data in a main memory; constructing the same pattern table in the shared memory of each of a plurality of multi-core CPUs; and performing the pattern matching process on one of the plurality of multi-core CPUs.

 (Supplementary note 6) The pattern matching method according to supplementary note 5, wherein a packet reception event containing the extracted search key is generated and is issued sequentially to one of the plurality of multi-core CPUs.

 (Supplementary note 7) The pattern matching method according to supplementary note 6, wherein the multi-core CPU that receives the packet reception event sequentially assigns it to one of the plurality of processor cores of that multi-core CPU.

 (Supplementary note 8) The pattern matching method according to supplementary note 1 or 2, comprising: transmitting and receiving packets; extracting, from the header information of a received packet, a set of arbitrary fields serving as a search key; generating and issuing a packet reception event while storing the packet data in a main memory; constructing, in the shared memory of each of a plurality of multi-core CPUs, one of the sub-pattern tables obtained by dividing the pattern table into as many parts as there are multi-core CPUs; performing the pattern matching process on all of the plurality of multi-core CPUs; and collecting the pattern matching results produced by the processor cores of each multi-core CPU to determine the final pattern matching result.

 (Supplementary note 9) The pattern matching method according to supplementary note 8, wherein a packet reception event containing the extracted search key is generated and is issued to all of the plurality of multi-core CPUs.

 (Supplementary note 10) The pattern matching method according to supplementary note 9, wherein each of the plurality of multi-core CPUs sequentially assigns a received packet reception event to one of the plurality of processor cores of that multi-core CPU.

 (Supplementary note 11) The pattern matching method according to supplementary note 1 or 2, comprising: transmitting and receiving packets; extracting, from the header information of a received packet, a set of arbitrary fields serving as a search key; calculating a hash value for the extracted search key, generating and issuing a packet reception event, and storing the packet data in a main memory; constructing, in the shared memory of each of a plurality of multi-core CPUs, one of the sub-pattern tables obtained by dividing the pattern table into as many parts as there are multi-core CPUs; and performing the pattern matching process on one of the plurality of multi-core CPUs.

 (Supplementary note 12) The pattern matching method according to supplementary note 11, wherein a packet reception event containing the extracted search key and the calculated hash value is generated and is issued to the multi-core CPU whose sub-pattern table holds the entry group corresponding to the calculated hash value.

 (Supplementary note 13) The pattern matching method according to supplementary note 12, wherein the multi-core CPU that receives the packet reception event sequentially assigns it to one of the plurality of processor cores of that multi-core CPU.

 (Supplementary note 14) A pattern matching device in a multi-core processor environment having a multi-core CPU comprising a plurality of processor cores and a shared memory shared among the processor cores within the same CPU, wherein the shared memory contains a pattern table containing search patterns, and pattern matching of a search key to be looked up is executed in parallel by the plurality of processor cores.

 (Supplementary note 15) The pattern matching device according to supplementary note 14, wherein the pattern table is configured as a hash table.

 (Supplementary note 16) The pattern matching device according to supplementary note 14 or 15, comprising: one multi-core CPU; a packet transmission/reception block that transmits and receives packets; a search key extraction block that extracts, from the header information of a received packet, a set of arbitrary fields serving as a search key, generates and issues a packet reception event, and stores the packet data in a main memory; a main memory that stores packet data; and a bus connecting the multi-core CPU, the search key extraction block, and the main memory.

 (Supplementary note 17) The pattern matching device according to supplementary note 16, wherein a packet reception event containing the extracted search key is generated and issued, and packet reception events are sequentially assigned to one of the plurality of processor cores of the multi-core CPU.

 (Supplementary note 18) The pattern matching device according to supplementary note 14 or 15, comprising: a plurality of multi-core CPUs; a packet transmission/reception block that transmits and receives packets; a search key extraction block that extracts, from the header information of a received packet, a set of arbitrary fields serving as a search key, generates and issues a packet reception event, and stores the packet data in a main memory; a main memory that stores packet data; and a bus connecting the plurality of multi-core CPUs, the search key extraction block, and the main memory, wherein the same pattern table is constructed in the shared memory of each of the plurality of multi-core CPUs.

 (Supplementary note 19) The pattern matching device according to supplementary note 18, wherein a packet reception event containing the extracted search key is generated and issued to one of the plurality of multi-core CPUs.

 (Supplementary note 20) The pattern matching device according to supplementary note 19, wherein the multi-core CPU that receives the packet reception event sequentially assigns it to one of the plurality of processor cores of that multi-core CPU.

 (Supplementary note 21) The pattern matching device according to supplementary note 14 or 15, comprising: a plurality of multi-core CPUs; a packet transmission/reception block that transmits and receives packets; a search key extraction block that extracts, from the header information of a received packet, a set of arbitrary fields serving as a search key, generates and issues a packet reception event, and stores the packet data in a main memory; a main memory that stores packet data; a final result determination block that collects the pattern matching results from the processor cores of each multi-core CPU and determines the final pattern matching result; and a bus connecting the plurality of multi-core CPUs, the search key extraction block, the main memory, and the final result determination block, wherein one of the sub-pattern tables obtained by dividing the pattern table into as many parts as there are multi-core CPUs is constructed in the shared memory of each multi-core CPU.

 (Supplementary note 22) The pattern matching device according to supplementary note 21, wherein a packet reception event containing the extracted search key is generated and issued to all of the plurality of multi-core CPUs.

 (Supplementary note 23) The pattern matching device according to supplementary note 22, wherein each of the plurality of multi-core CPUs sequentially assigns a received packet reception event to one of the plurality of processor cores of that multi-core CPU.

 (Supplementary note 24) The pattern matching device according to supplementary note 14 or 15, comprising: a plurality of multi-core CPUs; a packet transmission/reception block that transmits and receives packets; a search key extraction block that extracts, from the header information of a received packet, a set of arbitrary fields serving as a search key; a hash value calculation block that calculates a hash value for the extracted search key, generates and issues a packet reception event, and stores the packet data in a main memory; a main memory that stores packet data; and a bus connecting the plurality of multi-core CPUs, the search key extraction block, the hash value calculation block, and the main memory, wherein one of the sub-pattern tables obtained by dividing the pattern table into as many parts as there are multi-core CPUs is constructed in the shared memory of each multi-core CPU.

 (Supplementary note 25) The pattern matching device according to supplementary note 24, wherein a packet reception event containing the extracted search key and the calculated hash value is generated and issued to the multi-core CPU whose sub-pattern table holds the entry group corresponding to the calculated hash value.

 (Supplementary note 26) The pattern matching device according to supplementary note 25, wherein the multi-core CPU that receives the packet reception event sequentially assigns it to one of the plurality of processor cores of that multi-core CPU.

 Although representative embodiments of the present invention have been described in detail, it should be understood that various changes, substitutions, and alternatives may be made without departing from the spirit and scope of the invention as defined in the claims. Moreover, the inventor intends that the equivalent scope of the claimed invention be maintained even if the claims are amended during prosecution.

 This application claims priority based on Japanese Patent Application No. 2009-289011 filed on December 21, 2009, the entire disclosure of which is incorporated herein.

 The present invention can be applied to network devices, such as switches and routers, that identify the flow to which a packet belongs from a specific set of fields in the packet's header information and perform flow-specific processing such as QoS processing or load distribution, and to appliance devices such as load balancers.

1 Multi-core CPU
2 Multi-core CPU
3 Bus
4 Search key extraction block
5 Packet transmission/reception block
6 Main memory
7 Final result determination block
8 Search key extraction block
9 Hash value calculation block
10 Packet transmission/reception block
11 CPU
12 Multi-core CPU
13 Multi-core CPU
M Multi-core CPU
1-0 Shared memory
1-1, 1-2, 1-N Processor cores
2-0 Shared memory
2-1, 2-2, 2-N Processor cores
11-0 L2 cache
11-1 Processor core
12-0 L2 cache
12-1, 12-2 Processor cores
13-0 L2 cache
13-1, 13-2 Processor cores
M-0 Shared memory
M-1, M-2, M-N Processor cores

Claims (14)

1. A pattern matching device in a multi-core processor environment having a multi-core CPU comprising a plurality of processor cores and a shared memory shared among the processor cores within the same CPU, wherein the shared memory contains a pattern table containing search patterns, and pattern matching of a search key to be looked up is executed in parallel by the plurality of processor cores.

2. The pattern matching device according to claim 1, wherein the pattern table is configured as a hash table.

3. The pattern matching device according to claim 1 or 2, comprising: one multi-core CPU; a packet transmission/reception block that transmits and receives packets; a search key extraction block that extracts, from the header information of a received packet, a set of arbitrary fields serving as a search key, generates and issues a packet reception event, and stores the packet data in a main memory; a main memory that stores packet data; and a bus connecting the multi-core CPU, the search key extraction block, and the main memory.

4. The pattern matching device according to claim 3, wherein a packet reception event containing the extracted search key is generated and issued, and packet reception events are sequentially assigned to one of the plurality of processor cores of the multi-core CPU.

5. The pattern matching device according to claim 1 or 2, comprising: a plurality of multi-core CPUs; a packet transmission/reception block that transmits and receives packets; a search key extraction block that extracts, from the header information of a received packet, a set of arbitrary fields serving as a search key, generates and issues a packet reception event, and stores the packet data in a main memory; a main memory that stores packet data; and a bus connecting the plurality of multi-core CPUs, the search key extraction block, and the main memory, wherein the same pattern table is constructed in the shared memory of each of the plurality of multi-core CPUs.

6. The pattern matching device according to claim 5, wherein a packet reception event containing the extracted search key is generated and issued to one of the plurality of multi-core CPUs.

7. The pattern matching device according to claim 6, wherein the multi-core CPU that receives the packet reception event sequentially assigns it to one of the plurality of processor cores of that multi-core CPU.

8. The pattern matching device according to claim 1 or 2, comprising: a plurality of multi-core CPUs; a packet transmission/reception block that transmits and receives packets; a search key extraction block that extracts, from the header information of a received packet, a set of arbitrary fields serving as a search key, generates and issues a packet reception event, and stores the packet data in a main memory; a main memory that stores packet data; a final result determination block that collects the pattern matching results from the processor cores of each multi-core CPU and determines the final pattern matching result; and a bus connecting the plurality of multi-core CPUs, the search key extraction block, the main memory, and the final result determination block, wherein one of the sub-pattern tables obtained by dividing the pattern table into as many parts as there are multi-core CPUs is constructed in the shared memory of each multi-core CPU.

9. The pattern matching device according to claim 8, wherein a packet reception event containing the extracted search key is generated and issued to all of the plurality of multi-core CPUs.

10. The pattern matching device according to claim 9, wherein each of the plurality of multi-core CPUs sequentially assigns a received packet reception event to one of the plurality of processor cores of that multi-core CPU.

11. The pattern matching device according to claim 1 or 2, comprising: a plurality of multi-core CPUs; a packet transmission/reception block that transmits and receives packets; a search key extraction block that extracts, from the header information of a received packet, a set of arbitrary fields serving as a search key; a hash value calculation block that calculates a hash value for the extracted search key, generates and issues a packet reception event, and stores the packet data in a main memory; a main memory that stores packet data; and a bus connecting the plurality of multi-core CPUs, the search key extraction block, the hash value calculation block, and the main memory, wherein one of the sub-pattern tables obtained by dividing the pattern table into as many parts as there are multi-core CPUs is constructed in the shared memory of each multi-core CPU.

12. The pattern matching device according to claim 11, wherein a packet reception event containing the extracted search key and the calculated hash value is generated and issued to the multi-core CPU whose sub-pattern table holds the entry group corresponding to the calculated hash value.

13. The pattern matching device according to claim 12, wherein the multi-core CPU that receives the packet reception event sequentially assigns it to one of the plurality of processor cores of that multi-core CPU.

14. A pattern matching method in a multi-core processor environment having a multi-core CPU composed of a plurality of processor cores and a shared memory shared among the processor cores within the same CPU, the method comprising: constructing a pattern table containing search patterns in the shared memory; and executing, in parallel on the plurality of processor cores, pattern matching of a search key to be looked up.
PCT/JP2010/072869 2009-12-21 2010-12-20 Pattern-matching method and device for a multiprocessor environment Ceased WO2011078108A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2011547531A JPWO2011078108A1 (en) 2009-12-21 2010-12-20 Pattern matching method and apparatus in multiprocessor environment

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2009289011 2009-12-21
JP2009-289011 2009-12-21

Publications (1)

Publication Number Publication Date
WO2011078108A1 true WO2011078108A1 (en) 2011-06-30

Family

ID=44195634

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2010/072869 Ceased WO2011078108A1 (en) 2009-12-21 2010-12-20 Pattern-matching method and device for a multiprocessor environment

Country Status (2)

Country Link
JP (1) JPWO2011078108A1 (en)
WO (1) WO2011078108A1 (en)

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014030300A1 (en) * 2012-08-23 2014-02-27 日本電気株式会社 Matching system, matching method, and matching program
KR20150077372A (en) * 2013-12-27 2015-07-07 엑스플라이언트 인코포레이션 Method and system for reconfigurable parallel lookups using multiple shared memories
JP2015171089A (en) * 2014-03-10 2015-09-28 日本電気株式会社 Switching device, data transmission control method and data transmission control program
JP2015179970A (en) * 2014-03-19 2015-10-08 アンリツネットワークス株式会社 packet relay device
JP2016005282A (en) * 2014-06-19 2016-01-12 エックスプライアント, インコーポレイテッド Method and apparatus for identifying packet structure using unique identifier of packet
JP2016515367A (en) * 2013-03-13 2016-05-26 クゥアルコム・インコーポレイテッドQualcomm Incorporated Network element with distributed flow table
CN108462715A (en) * 2018-04-24 2018-08-28 王颖 Network information filtering system based on MPI WM string matching parallel algorithm
WO2018173799A1 (en) * 2017-03-24 2018-09-27 住友電気工業株式会社 Switch device, communication control method, and communication control program
US10397113B2 (en) 2014-06-19 2019-08-27 Cavium, Llc Method of identifying internal destinations of network packets and an apparatus thereof
US10560399B2 (en) 2014-06-19 2020-02-11 Cavium, Llc Method of dynamically renumbering ports and an apparatus thereof
US10616380B2 (en) 2014-06-19 2020-04-07 Cavium, Llc Method of handling large protocol layers for configurable extraction of layer information and an apparatus thereof
US10785169B2 (en) 2013-12-30 2020-09-22 Marvell Asia Pte, Ltd. Protocol independent programmable switch (PIPS) for software defined data center networks
US11050859B2 (en) 2014-06-19 2021-06-29 Marvell Asia Pte, Ltd. Method of using bit vectors to allow expansion and collapse of header layers within packets for enabling flexible modifications and an apparatus thereof
CN113595822A (en) * 2021-07-26 2021-11-02 北京恒光信息技术股份有限公司 Data packet management method, system and device
JP2022021779A (en) * 2020-07-22 2022-02-03 富士通株式会社 Packet transmission device and packet transmission method
US11756818B2 (en) 2010-12-20 2023-09-12 Ev Group E. Thallner Gmbh Accommodating device for retaining wafers
WO2025141729A1 (en) * 2023-12-26 2025-07-03 日本電気株式会社 Parallel processing system, parallel processing method, and parallel processing device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008204292A (en) * 2007-02-21 2008-09-04 Toshiba Corp Memory management system
JP2008210027A (en) * 2007-02-23 2008-09-11 Toshiba Corp Computer system and parallelizing compiler
JP2009037403A (en) * 2007-08-01 2009-02-19 Fujitsu Ltd Effective use of core memory in multi-core processors
WO2009148021A1 (en) * 2008-06-03 2009-12-10 株式会社日立製作所 Packet analysis apparatus

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SHINJI SHIOTA: "[Gairon] Microprocessor no Genjo" [Overview: The Current State of Microprocessors], UNIX MAGAZINE, vol. 22, no. 4, 1 October 2007 (2007-10-01), pages 26 - 27 *

Cited By (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11756818B2 (en) 2010-12-20 2023-09-12 Ev Group E. Thallner Gmbh Accommodating device for retaining wafers
WO2014030300A1 (en) * 2012-08-23 2014-02-27 日本電気株式会社 Matching system, matching method, and matching program
JP2016515367A (en) * 2013-03-13 2016-05-26 Qualcomm Incorporated Network element with distributed flow table
KR20150077372A (en) * 2013-12-27 2015-07-07 엑스플라이언트 인코포레이션 Method and system for reconfigurable parallel lookups using multiple shared memories
JP2015172925A (en) * 2013-12-27 2015-10-01 エックスプライアント, インコーポレイテッド Method and system for reconfigurable parallel lookups using multiple shared memories
KR102391602B1 (en) 2013-12-27 2022-04-27 마벨 아시아 피티이 엘티디. Method and system for reconfigurable parallel lookups using multiple shared memories
US10785169B2 (en) 2013-12-30 2020-09-22 Marvell Asia Pte, Ltd. Protocol independent programmable switch (PIPS) for software defined data center networks
US12301456B2 (en) 2013-12-30 2025-05-13 Marvell Asia Pte, LTD Protocol independent programmable switch (PIPS) for software defined data center networks
US11824796B2 (en) 2013-12-30 2023-11-21 Marvell Asia Pte, Ltd. Protocol independent programmable switch (PIPS) for software defined data center networks
JP2015171089A (en) * 2014-03-10 2015-09-28 日本電気株式会社 Switching device, data transmission control method and data transmission control program
JP2015179970A (en) * 2014-03-19 2015-10-08 アンリツネットワークス株式会社 packet relay device
US11799989B2 (en) 2014-06-19 2023-10-24 Marvell Asia Pte, Ltd. Method of using bit vectors to allow expansion and collapse of header layers within packets for enabling flexible modifications and an apparatus thereof
US10560399B2 (en) 2014-06-19 2020-02-11 Cavium, Llc Method of dynamically renumbering ports and an apparatus thereof
US12381963B2 (en) 2014-06-19 2025-08-05 Marvell Asia Pte, LTD Method of using bit vectors to allow expansion and collapse of header layers within packets for enabling flexible modifications and an apparatus thereof
JP2016005282A (en) * 2014-06-19 2016-01-12 エックスプライアント, インコーポレイテッド Method and apparatus for identifying packet structure using unique identifier of packet
US10397113B2 (en) 2014-06-19 2019-08-27 Cavium, Llc Method of identifying internal destinations of network packets and an apparatus thereof
US11050859B2 (en) 2014-06-19 2021-06-29 Marvell Asia Pte, Ltd. Method of using bit vectors to allow expansion and collapse of header layers within packets for enabling flexible modifications and an apparatus thereof
US10616380B2 (en) 2014-06-19 2020-04-07 Cavium, Llc Method of handling large protocol layers for configurable extraction of layer information and an apparatus thereof
US11637803B2 (en) 2017-03-24 2023-04-25 Sumitomo Electric Industries, Ltd. Switch device and communication control method
JP2018164164A (en) * 2017-03-24 2018-10-18 住友電気工業株式会社 Switch device, communication control method, and communication control program
WO2018173799A1 (en) * 2017-03-24 2018-09-27 住友電気工業株式会社 Switch device, communication control method, and communication control program
CN111373699A (en) * 2017-03-24 2020-07-03 住友电气工业株式会社 Switching device, communication control method, and communication control program
CN108462715B (en) * 2018-04-24 2021-03-12 王颖 Network information filtering method based on MPI WM (pulse Width modulation) string matching parallel algorithm
CN108462715A (en) * 2018-04-24 2018-08-28 王颖 Network information filtering system based on MPI WM string matching parallel algorithm
JP2022021779A (en) * 2020-07-22 2022-02-03 富士通株式会社 Packet transmission device and packet transmission method
JP7468219B2 (en) 2020-07-22 2024-04-16 富士通株式会社 Packet transmission device and packet transmission method
CN113595822A (en) * 2021-07-26 2021-11-02 北京恒光信息技术股份有限公司 Data packet management method, system and device
CN113595822B (en) * 2021-07-26 2024-03-22 北京恒光信息技术股份有限公司 Data packet management method, system and device
WO2025141729A1 (en) * 2023-12-26 2025-07-03 日本電気株式会社 Parallel processing system, parallel processing method, and parallel processing device

Also Published As

Publication number Publication date
JPWO2011078108A1 (en) 2013-05-09

Similar Documents

Publication Publication Date Title
WO2011078108A1 (en) Pattern-matching method and device for a multiprocessor environment
US9154442B2 (en) Concurrent linked-list traversal for real-time hash processing in multi-core, multi-thread network processors
Varvello et al. Multi-layer packet classification with graphics processing units
US8539199B2 (en) Hash processing in a network communications processor architecture
US8321385B2 (en) Hash processing in a network communications processor architecture
US8407707B2 (en) Task queuing in a network communications processor architecture
US7240141B2 (en) Programmable inter-virtual channel and intra-virtual channel instructions issuing rules for an I/O bus of a system-on-a-chip processor
US8761204B2 (en) Packet assembly module for multi-core, multi-thread network processors
US20120002546A1 (en) Multicasting traffic manager in a network communications processor architecture
Paul et al. MG-Join: A scalable join for massively parallel multi-GPU architectures
US20050078601A1 (en) Hash and route hardware with parallel routing scheme
US7768907B2 (en) System and method for improved Ethernet load balancing
US20110254590A1 (en) Mapping address bits to improve spread of banks
CN115917520A (en) System for providing LPM implementations for programmable data planes via distributed algorithms
KR102461039B1 (en) Gpu-native packet i/o method and apparatus for gpu application on commodity ethernet
US10616116B1 (en) Network traffic load balancing using rotating hash
US9515929B2 (en) Traffic data pre-filtering
Watanabe et al. Accelerating NFV application using CPU-FPGA tightly coupled architecture
US7293158B2 (en) Systems and methods for implementing counters in a network processor with cost effective memory
Hong et al. Kafe: Can os kernels forward packets fast enough for software routers?
Zhu et al. Hermes: an integrated CPU/GPU microarchitecture for IP routing
US20240297846A1 (en) Network-on-chip packetization and routing method and apparatus for scalable high-performance networking on and off chip
US7340570B2 (en) Engine for comparing a key with rules having high and low values defining a range
CN101645852B (en) Equipment and method for classifying network packet
CN106302259B (en) Method and router for processing message in network on chip

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 10839338

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 2011547531

Country of ref document: JP

122 Ep: pct application non-entry in european phase

Ref document number: 10839338

Country of ref document: EP

Kind code of ref document: A1