KR102880085B1

KR102880085B1 - Network Interface Device

Info

Publication number: KR102880085B1
Application number: KR1020217017269A
Authority: KR
Inventors: Steven Leslie Pope; Neil Turton; David James Riddoch; Dmitri Kitariev; Ripduman Sohan; Derek Edward Roberts
Original assignee: Xilinx, Inc.
Priority date: 2018-11-05
Filing date: 2019-11-05
Publication date: 2025-10-31
Anticipated expiration: 2039-11-05
Also published as: JP2022512879A; EP3877851A1; WO2020094664A1; CN113272793B; CN113272793A; JP2024116163A; KR20210088652A

Abstract

A network interface device has a hardware module comprising a plurality of processing units. Each of the plurality of processing units is associated with its own at least one predefined operation. At compile time, the hardware module is configured by arranging at least some of the plurality of processing units to perform their respective at least one operation with respect to a data packet in a certain order, so as to perform a function with respect to that data packet. A compiler is provided to assign different processing stages to each processing unit. A controller is provided to switch between different processing circuitry on the fly, so that one processing circuitry may be used while another processing circuitry is being compiled.

Description

Network Interface Device

The present application relates to a network interface device for performing a function with respect to data packets.

Network interface devices are known and are typically used to provide an interface between a computing device and a network. A network interface device can be configured to process data received from the network and/or to process data to be placed on the network.

According to one aspect, there is provided a network interface device for interfacing a host device to a network, the network interface device comprising: a first interface configured to receive a plurality of data packets; and a configurable hardware module comprising a plurality of processing units, each processing unit being associated with a predefined type of operation executable in a single step, wherein at least some of the plurality of processing units are associated with different predefined types of operation, and wherein the hardware module is configurable to interconnect at least some of the plurality of processing units to provide a first data processing pipeline for processing one or more of the plurality of data packets, so as to perform a first function with respect to the one or more of the plurality of data packets.
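
The arrangement described in this aspect can be illustrated with a minimal software sketch. It is a model for intuition only, not the hardware described in the application: the names `ProcessingUnit`, `interconnect`, and the port-23 filter are hypothetical, and each unit's single-step operation is modelled as a plain Python function.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical model: a packet is a dict of header fields and scratch values.
Packet = Dict[str, object]


@dataclass(frozen=True)
class ProcessingUnit:
    """One unit bound to one predefined, single-step type of operation."""
    op_type: str                       # e.g. "load", "compare", "action"
    step: Callable[[Packet], Packet]   # the single-step operation itself


def interconnect(units: List[ProcessingUnit]) -> Callable[[Packet], Packet]:
    """Chain the selected units, in order, into one data processing pipeline."""
    def pipeline(packet: Packet) -> Packet:
        for unit in units:
            packet = unit.step(packet)
        return packet
    return pipeline


# Illustrative first function (an assumption): drop packets for TCP port 23.
load_port = ProcessingUnit("load", lambda p: {**p, "reg0": p["dst_port"]})
compare = ProcessingUnit("compare", lambda p: {**p, "match": p["reg0"] == 23})
decide = ProcessingUnit(
    "action", lambda p: {**p, "verdict": "drop" if p["match"] else "pass"}
)

first_pipeline = interconnect([load_port, compare, decide])
```

Calling `first_pipeline({"dst_port": 23})` yields a packet whose `verdict` is `"drop"`. Interconnecting a different selection of units would provide a different function from the same pool.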

In some embodiments, the first function comprises a filtering function. In some embodiments, the function comprises at least one of a tunneling, an encapsulation, and a routing function. In some embodiments, the first function comprises an extended Berkeley Packet Filter (eBPF) function.

In some embodiments, the first function comprises a distributed denial of service (DDoS) scrubbing operation.

In some embodiments, the first function comprises a firewall operation.

In some embodiments, the first interface is configured to receive the first data packet from the network.

In some embodiments, the first interface is configured to receive the first data packet from the host device.

In some embodiments, two or more of the at least some of the plurality of processing units are configured to perform their associated at least one predefined operation in parallel.

In some embodiments, two or more of the at least some of the plurality of processing units are configured to perform their associated predefined type of operation according to a common clock signal of the hardware module.

In some embodiments, each of two or more of the at least some of the plurality of processing units is configured to perform its associated predefined type of operation within a predefined length of time defined by a clock signal.

In some embodiments, two or more of the at least some of the plurality of processing units are configured to: access the first data packet within a time period of the predefined length of time; and, in response to the end of the predefined length of time, transfer the result of the respective at least one operation to a next processing unit.
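
The clocked hand-off between units can be sketched as a lock-step simulation. This is an illustrative model under stated assumptions, not the claimed circuitry: `run_clocked` and the two example stages are hypothetical, each stage is assumed to finish within one clock period, and results move to the next stage only at the tick boundary.

```python
from typing import Callable, List, Optional

Stage = Callable[[int], int]


def run_clocked(stages: List[Stage], inputs: List[int]) -> List[int]:
    """Advance all stages in lock step; results move one stage per clock tick."""
    regs: List[Optional[int]] = [None] * len(stages)  # output register per stage
    pending = list(inputs)
    outputs: List[int] = []
    # One tick per input plus the pipeline depth is enough to drain everything.
    for _tick in range(len(inputs) + len(stages)):
        # At the tick boundary the last stage's result leaves the pipeline.
        if regs[-1] is not None:
            outputs.append(regs[-1])
        # All stages work in parallel on values latched at the previous tick.
        new_regs: List[Optional[int]] = [None] * len(stages)
        for i, stage in enumerate(stages):
            if i == 0:
                src = pending.pop(0) if pending else None
            else:
                src = regs[i - 1]
            new_regs[i] = stage(src) if src is not None else None
        regs = new_regs
    return outputs


# Hypothetical two-stage pipeline: add one, then double.
results = run_clocked([lambda v: v + 1, lambda v: v * 2], [1, 2, 3])
```

Because every stage reads only values latched at the previous tick, several packets are in flight at once, one per stage.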

In some embodiments, the result comprises one or more of: at least one value from the one or more of the plurality of data packets; an update to map state; and metadata.

In some embodiments, each of the plurality of processing units comprises an application-specific integrated circuit configured to perform the at least one operation associated with the respective processing unit.

In some embodiments, each of the processing units comprises a field programmable gate array. In some embodiments, each of the processing units comprises any other type of soft logic.

In some embodiments, at least one of the plurality of processing units comprises a digital circuit and a memory storing state related to processing carried out by the digital circuit, wherein the digital circuit is configured to perform, in communication with the memory, the predefined type of operation associated with the respective processing unit.

In some embodiments, the network interface device comprises a memory accessible to two or more of the plurality of processing units, wherein the memory is configured to store state associated with the first data packet, and wherein, during performance of the first function by the hardware module, the two or more of the plurality of processing units are configured to access and modify the state.

In some embodiments, a first of the at least some of the plurality of processing units is configured to stall during access of a value of the state by a second of the plurality of processing units.

In some embodiments, one or more of the plurality of processing units are individually configurable to perform, based on their associated predefined type of operation, an operation specific to a respective pipeline.

In some embodiments, the hardware module is configured to receive an instruction and, in response to the instruction, to do at least one of: interconnect at least some of the plurality of processing units to provide a data processing pipeline for processing one or more of the plurality of data packets; cause one or more of the plurality of processing units to perform their associated predefined type of operation with respect to the one or more data packets; add one or more of the plurality of processing units to a data processing pipeline; and remove one or more of the plurality of processing units from a data processing pipeline.
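
A minimal sketch of how such instructions might be dispatched follows. The instruction encoding (a dict with a `kind` field), the class name `HardwareModule`, and the unit names are all assumptions made for illustration; the application does not specify an instruction format.

```python
from typing import Dict, List


class HardwareModule:
    """Toy model: tracks which units are interconnected, and in what order."""

    def __init__(self, available: List[str]) -> None:
        self.available = set(available)   # pool of processing units
        self.pipeline: List[str] = []     # currently interconnected units

    def _check(self, units: List[str]) -> None:
        unknown = [u for u in units if u not in self.available]
        if unknown:
            raise ValueError(f"unknown processing units: {unknown}")

    def handle(self, instruction: Dict[str, object]) -> None:
        """Apply one instruction (hypothetical encoding) to the pipeline."""
        kind = instruction["kind"]
        if kind == "interconnect":        # build a pipeline from scratch
            units = list(instruction["units"])
            self._check(units)
            self.pipeline = units
        elif kind == "add":               # insert a unit into the pipeline
            unit = instruction["unit"]
            self._check([unit])
            position = instruction.get("position", len(self.pipeline))
            self.pipeline.insert(position, unit)
        elif kind == "remove":            # take a unit out of the pipeline
            self.pipeline.remove(instruction["unit"])
        else:
            raise ValueError(f"unsupported instruction kind: {kind}")


hw = HardwareModule(["load", "lookup", "store"])
hw.handle({"kind": "interconnect", "units": ["load", "store"]})
hw.handle({"kind": "add", "unit": "lookup", "position": 1})
```

After these two instructions `hw.pipeline` is `["load", "lookup", "store"]`; a later `{"kind": "remove", "unit": "load"}` would shorten it again without rebuilding the rest.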

In some embodiments, the predefined operation comprises at least one of: loading at least one value of the first data packet from a memory; storing at least one value of the data packet in a memory; and performing a lookup into a lookup table to determine an action to be carried out with respect to the data packet.

In some embodiments, the hardware module is configured to receive an instruction, wherein the hardware module is configurable, in response to the instruction, to interconnect at least some of the plurality of processing units to provide a data processing pipeline for processing one or more of the plurality of data packets, and wherein the instruction comprises a data packet sent through a third processing pipeline.

In some embodiments, one or more of the at least some of the plurality of processing units are configurable, in response to the instruction, to perform a selected operation of their associated predefined type of operation with respect to the data of the one or more of the plurality of data packets.

In some embodiments, the plurality of components comprises a second one of the plurality of components configured to provide the first function in circuitry different from the hardware module, wherein the network interface device comprises at least one controller configured to cause data packets passing through the processing pipeline to be processed by one of: the first one of the plurality of components and the second one of the plurality of components.

In some embodiments, the network interface device comprises at least one controller configured to issue an instruction to cause the hardware module to begin performing the first function with respect to the data packets, wherein the instruction is configured to cause the first one of the plurality of components to be inserted into the processing pipeline.

In some embodiments, the network interface device comprises at least one controller configured to issue an instruction to cause the hardware module to begin performing the first function with respect to the data packets, wherein the instruction comprises a control message that is sent through the processing pipeline and is configured to cause the first one of the plurality of components to be activated.

In some embodiments, for one or more of the at least some of the plurality of processing units, the associated at least one operation comprises at least one of: loading at least one value of the first data packet from a memory of the network interface device; storing at least one value of the first data packet in a memory of the network interface device; and performing a lookup into a lookup table to determine an action to be carried out with respect to the first data packet.

In some embodiments, one or more of the at least some of the plurality of processing units are configured to pass at least one result of their associated at least one predefined operation to a next processing unit in the first processing pipeline, wherein the next processing unit is configured to perform a next predefined operation in dependence upon the at least one result.

In some embodiments, each of the different predefined types of operation is defined by a different template.

In some embodiments, the predefined types of operation comprise at least one of: accessing a data packet; accessing a lookup table stored in a memory of the hardware module; performing a logic operation on data loaded from a data packet; and performing a logic operation on data loaded from the lookup table.

In some embodiments, the hardware module comprises routing hardware, wherein the hardware module is configurable to interconnect at least some of the plurality of processing units to provide the first data processing pipeline by configuring the routing hardware to route data packets between the plurality of processing units in a particular order defined by the first data processing pipeline.
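
One way to picture the routing hardware is as a next-hop table that is rewritten whenever a pipeline is configured. The sketch below is an assumption-laden illustration: `configure_routing`, `trace`, the `ingress`/`egress` markers, and the unit names are hypothetical, and each unit is assumed to appear at most once in a pipeline.

```python
from typing import Dict, List


def configure_routing(order: List[str]) -> Dict[str, str]:
    """Build a next-hop table routing packets through the units in `order`."""
    hops = ["ingress", *order, "egress"]
    return dict(zip(hops, hops[1:]))


def trace(table: Dict[str, str]) -> List[str]:
    """Follow the table from ingress to egress, listing the units visited."""
    path: List[str] = []
    node = table["ingress"]
    while node != "egress":
        path.append(node)
        node = table[node]
    return path


# Hypothetical unit names. The same pool of units is routed in two different
# orders to provide a first and a second data processing pipeline.
first = configure_routing(["parse", "lookup", "rewrite"])
second = configure_routing(["parse", "rewrite"])
```

Here `trace(first)` visits `parse`, `lookup`, `rewrite`, while `trace(second)` skips `lookup`. Only the table changes between the two pipelines, not the units themselves.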

In some embodiments, the hardware module is configurable to interconnect at least some of the plurality of processing units to provide a second data processing pipeline for processing one or more of the plurality of data packets, so as to perform a second function different from the first function.

In some embodiments, the hardware module is configurable to interconnect at least some of the plurality of processing units to provide the second data processing pipeline after interconnecting at least some of the plurality of processing units to provide the first data processing pipeline.

In some embodiments, the network interface device comprises further circuitry that is separate from the hardware module and is configured to perform the first function on one or more of the plurality of data packets.

In some embodiments, the further circuitry comprises at least one of: a field programmable gate array; and a plurality of central processing units.

In some embodiments, the network interface device comprises at least one controller, wherein the further circuitry is configured to perform the first function with respect to data packets during a compilation process for the first function to be performed in the hardware module, and wherein the at least one controller is configured to, in response to completion of the compilation process, control the hardware module to begin performing the first function with respect to data packets.

In some embodiments, the further circuitry comprises a plurality of central processing units.

In some embodiments, the at least one controller is configured to, in response to said determination that the compilation process for the first function to be performed in the hardware module has completed, control the further circuitry to cease performing the first function with respect to data packets.
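
The hand-over from the further circuitry to the hardware module can be sketched as follows. This is a simplified software analogy, not the controller described in the application: the `Controller` class, its method names, and the string verdicts are hypothetical, and the compilation itself is represented only by the call that reports its completion.

```python
from typing import Callable, Dict, Optional

Handler = Callable[[Dict[str, int]], str]


class Controller:
    """Steers packets to whichever circuitry currently provides the function."""

    def __init__(self, further_circuitry: Handler) -> None:
        self._further = further_circuitry          # e.g. CPUs, usable at once
        self._hardware: Optional[Handler] = None   # not yet compiled

    def compilation_complete(self, hardware_module: Handler) -> None:
        """Called when the function has been compiled for the hardware module."""
        self._hardware = hardware_module

    def process(self, packet: Dict[str, int]) -> str:
        # Before completion the further circuitry performs the function;
        # afterwards the hardware module takes over and the fallback stops.
        active = self._hardware if self._hardware is not None else self._further
        return active(packet)


def on_cpus(packet: Dict[str, int]) -> str:
    return "cpu:" + ("drop" if packet["dst_port"] == 23 else "pass")


def on_hardware(packet: Dict[str, int]) -> str:
    return "hw:" + ("drop" if packet["dst_port"] == 23 else "pass")


controller = Controller(on_cpus)
before = controller.process({"dst_port": 23})
controller.compilation_complete(on_hardware)
after = controller.process({"dst_port": 23})
```

Both handlers return the same verdict for a given packet, which reflects the idea that the same first function is provided by different circuitry before and after compilation completes.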

In some embodiments, the network interface device comprises at least one controller, wherein the hardware module is configured to perform the first function with respect to data packets during a compilation process for the first function to be performed in the further circuitry, and wherein the at least one controller is configured to determine that the compilation process for the first function to be performed in the further circuitry has completed and, in response to said determination, control the further circuitry to begin performing the first function with respect to data packets.

In some embodiments, the further circuitry comprises a field programmable gate array.

In some embodiments, the at least one controller is configured to, in response to said determination that the compilation process for the first function to be performed in the further circuitry has completed, control the hardware module to cease performing the first function with respect to data packets.

In some embodiments, the network interface device comprises at least one controller configured to perform a compilation process to provide the first function to be performed in the hardware module.

In some embodiments, the compilation process comprises providing instructions for providing, in the hardware module, a control plane interface that is responsive to control messages.

According to another aspect, there is provided a data processing system comprising the network interface device according to the first aspect and a host device, wherein the data processing system comprises at least one controller configured to perform a compilation process to provide the first function to be performed in the hardware module.

In some embodiments, the at least one controller is provided by one or more of: the network interface device; and the host device.

In some embodiments, the compilation process is performed in response to a determination by the at least one controller that a computer program expressing the first function is safe for execution in kernel mode of the host device.

In some embodiments, the at least one controller is configured to perform the compilation process by assigning each of the at least some of the plurality of processing units to perform, in a particular order of the first data processing pipeline, at least one operation from a plurality of operations expressed by a sequence of computer code instructions, wherein the plurality of operations provides the first function with respect to the one or more of the plurality of data packets.
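
The assignment step of such a compilation process can be sketched as a simple allocator. It is a deliberately naive illustration: the function `compile_to_stages`, the `(op_type, operand)` instruction form, and the unit identifiers are hypothetical, and a real compiler would also have to handle dependencies, timing, and resource limits that this sketch ignores.

```python
from typing import Dict, List, Tuple

Instruction = Tuple[str, str]   # (operation type, operand), hypothetical form
Stage = Tuple[str, str]         # (unit id, operation configured on that unit)


def compile_to_stages(
    program: List[Instruction], free_units: Dict[str, List[str]]
) -> List[Stage]:
    """Assign each instruction, in program order, to a free unit of its type."""
    pool = {op_type: list(ids) for op_type, ids in free_units.items()}
    pipeline: List[Stage] = []
    for op_type, operand in program:
        if not pool.get(op_type):
            raise RuntimeError(f"no free processing unit of type {op_type!r}")
        unit_id = pool[op_type].pop(0)   # take the next free unit of that type
        pipeline.append((unit_id, f"{op_type} {operand}"))
    return pipeline


# Hypothetical three-instruction program and unit pool.
program = [
    ("load", "pkt[12:14]"),
    ("lookup", "ethertype_table"),
    ("load", "pkt[23]"),
]
units = {"load": ["L0", "L1"], "lookup": ["T0"]}
stages = compile_to_stages(program, units)
```

The resulting list preserves program order, so it doubles as the particular order in which the selected units would be interconnected.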

In some embodiments, the at least one controller is configured to: prior to completion of the compilation process, send a first instruction to cause further circuitry of the network interface device to perform the first function with respect to data packets; and, subsequent to completion of the compilation process, send a second instruction to cause the hardware module to begin performing the first function with respect to data packets.

According to another aspect, there is provided a method for implementation in a network interface device, the method comprising: receiving, at a first interface, a plurality of data packets; and configuring a hardware module to interconnect at least some of a plurality of processing units of the hardware module to provide a first data processing pipeline for processing one or more of the plurality of data packets, so as to perform a first function with respect to the one or more of the plurality of data packets, wherein each processing unit is associated with a predefined type of operation executable in a single step, and wherein at least some of the plurality of processing units are associated with different predefined types of operation.

According to another aspect, there is provided a non-transitory computer-readable medium comprising program instructions for causing a network interface device to perform a method comprising: receiving, at a first interface, a plurality of data packets; and configuring a hardware module to interconnect at least some of a plurality of processing units of the hardware module to provide a first data processing pipeline for processing one or more of the plurality of data packets, so as to perform a first function with respect to the one or more of the plurality of data packets, wherein each processing unit is associated with a predefined type of operation executable in a single step, and wherein at least some of the plurality of processing units are associated with different predefined types of operation.

According to another aspect, there is provided a processing unit configured to: perform at least one predefined operation with respect to a first data packet received at a network interface device; be connected to a first further processing unit configured to perform a first further at least one predefined operation with respect to the first data packet; be connected to a second further processing unit configured to perform a second further at least one predefined operation with respect to the first data packet; receive, from the first further processing unit, a result of the first further at least one predefined operation; perform the at least one predefined operation in dependence upon the result of the first further at least one predefined operation; and send a result of the at least one predefined operation to the second further processing unit for processing in the second further at least one predefined operation.

In some embodiments, the processing unit is configured to receive a clock signal for timing the at least one predefined operation, wherein the processing unit is configured to perform the at least one predefined operation in at least one cycle of the clock signal.

In some embodiments, the processing unit is configured to perform the at least one predefined operation in a single cycle of the clock signal.

In some embodiments, the at least one predefined operation, the first further at least one predefined operation, and the second further at least one predefined operation form part of a function performed with respect to the first data packet received at the network interface device.

In some embodiments, the first data packet is received from a host device, wherein the network interface device is configured to interface the host device to a network.

In some embodiments, the first data packet is received from a network, wherein the network interface device is configured to interface a host device to the network.

In some embodiments, the function is a filtering function.

In some embodiments, the filtering function is an extended Berkeley Packet Filter (eBPF) function.

In some embodiments, the processing unit comprises an application-specific integrated circuit configured to perform the at least one predefined operation.

In some embodiments, the processing unit comprises: a digital circuit configured to perform the at least one predefined operation; and a memory storing state related to the at least one predefined operation carried out.

In some embodiments, the processing unit is configured to access a memory accessible to the first further processing unit and the second further processing unit, wherein the memory is configured to store state associated with the first data packet, and wherein the at least one predefined operation comprises modifying the state stored in the memory.

In some embodiments, the processing unit is configured to, during a first clock cycle, read a value of said state from the memory and provide said value to the second further processing unit for modification by the second further processing unit, wherein the processing unit is configured to stall during a second clock cycle following the first clock cycle.
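
Why a stall can be needed is easiest to see with a shared counter. The sketch below is a hypothetical cycle-level model, not the claimed circuit: `run_counter` simulates a reader unit that reads a state value in one cycle and a modifier unit that writes back the incremented value in the following cycle, with and without the reader stalling for that following cycle.

```python
def run_counter(num_packets: int, stall: bool) -> int:
    """Count packets with a two-unit read/modify pipeline over shared state."""
    state = {"count": 0}
    in_flight = None          # value handed from the reader to the modifier
    remaining = num_packets
    reader_stalled = False
    while remaining or in_flight is not None:
        write = in_flight     # modifier works on the value read last cycle
        if remaining and not reader_stalled:
            # Reader samples the state before this cycle's write lands.
            next_in_flight = state["count"]
            remaining -= 1
            reader_stalled = stall   # optionally sit out the next cycle
        else:
            next_in_flight = None
            reader_stalled = False
        if write is not None:
            state["count"] = write + 1   # modifier's write-back
        in_flight = next_in_flight
    return state["count"]


with_stall = run_counter(3, stall=True)
without_stall = run_counter(3, stall=False)
```

With the stall, each read sees the previous write and three packets give a count of 3. Without it, the second read overlaps the first write and an update is lost, so the count comes out as 2.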

In some embodiments, the at least one predefined operation comprises at least one of: loading the first data packet from a memory of the network interface device; storing the first data packet in a memory of the network interface device; and performing a lookup into a lookup table to determine an action to be carried out with respect to the first data packet.

According to another aspect, there is provided a method implemented in a processing unit, the method comprising: performing at least one predefined operation with respect to a first data packet received at a network interface device; connecting to a first further processing unit configured to perform a first further at least one predefined operation with respect to the first data packet; connecting to a second further processing unit configured to perform a second further at least one predefined operation with respect to the first data packet; receiving, from the first further processing unit, a result of the first further at least one predefined operation; performing the at least one predefined operation in dependence upon the result of the first further at least one predefined operation; and sending a result of the at least one predefined operation to the second further processing unit for processing in the second further at least one predefined operation.

According to another aspect, there is provided a computer-readable non-transitory storage device storing instructions that, when executed by a processing unit, cause the processing unit to perform a method comprising: performing at least one predefined operation with respect to a first data packet received at a network interface device; connecting to a first further processing unit configured to perform a first further at least one predefined operation with respect to the first data packet; connecting to a second further processing unit configured to perform a second further at least one predefined operation with respect to the first data packet; receiving, from the first further processing unit, a result of the first further at least one predefined operation; performing the at least one predefined operation in dependence upon the result of the first further at least one predefined operation; and sending a result of the at least one predefined operation to the second further processing unit for processing in the second further at least one predefined operation.

According to another aspect, there is provided a network interface device for interfacing a host device to a network, the network interface device comprising: at least one controller; a first interface configured to receive data packets; first circuitry configured to perform a first function with respect to data packets received at the first interface; and second circuitry, wherein the first circuitry is configured to perform the first function with respect to data packets received at the first interface during a compilation process for the first function to be performed at the second circuitry, and wherein the at least one controller is configured to determine whether the compilation process for the first function to be performed at the second circuitry is complete and, in response to determining that it is complete, control the second circuitry to begin performing the first function with respect to data packets received at the first interface.
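The handover described in this aspect can be illustrated with a minimal software sketch. It is not taken from the specification: the names `Circuitry`, `Controller`, `notify_compile_complete` and the port-23 filter are hypothetical, and real first and second circuitry would be hardware (for example a CPU and an FPGA) rather than Python objects. The sketch only shows the ordering: the first circuitry serves packets while compilation is pending, and the controller redirects packets to the second circuitry once it learns that compilation is complete.

```python
from dataclasses import dataclass
from typing import Callable, Optional


def filter_packet(packet: dict) -> bool:
    """Hypothetical 'first function': pass every packet except those to port 23."""
    return packet.get("dst_port") != 23


@dataclass
class Circuitry:
    """Stand-in for first or second circuitry; 'function' is None until compiled."""
    name: str
    function: Optional[Callable[[dict], bool]] = None
    handled: int = 0

    def process(self, packet: dict) -> bool:
        self.handled += 1
        return self.function(packet)


class Controller:
    """Uses the first circuitry until told that compilation for the second is done."""

    def __init__(self, first: Circuitry, second: Circuitry) -> None:
        self.first = first
        self.second = second
        self.active = first  # the first circuitry serves packets during compilation

    def notify_compile_complete(self, compiled: Callable[[dict], bool]) -> None:
        # Assumed completion signal (e.g. from a host); switch to the second circuitry.
        self.second.function = compiled
        self.active = self.second

    def on_packet(self, packet: dict) -> bool:
        return self.active.process(packet)
```

In this sketch the switch is a single assignment; an actual device would also need to decide what happens to packets already in flight, which the sketch does not model.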

In some embodiments, the at least one controller is configured to, in response to said determination that the compilation process for the first function to be performed at the second circuitry is complete, control the first circuitry to cease performing the first function with respect to data packets received at the first interface.

In some embodiments, the at least one controller is configured to, in response to said determination that the compilation process for the first function to be performed at the second circuitry is complete: control the second circuitry to begin performing the first function with respect to data packets of a first data flow received at the first interface; and control the first circuitry to cease performing the first function with respect to the data packets of the first data flow.

In some embodiments, the first circuitry comprises at least one central processing unit, each of the at least one central processing unit being configured to perform the first function with respect to at least one data packet received at the first interface.

In some embodiments, the second circuitry comprises a field programmable gate array configured to begin performing the first function with respect to data packets received at the first interface.

In some embodiments, the second circuitry comprises a hardware module comprising a plurality of processing units, each processing unit being associated with at least one predefined operation, wherein the first interface is configured to receive a first data packet, and wherein the hardware module is configured to, subsequent to the compilation process for the first function to be performed at the second circuitry, cause at least some of the plurality of processing units to perform their associated at least one predefined operation in a particular order so as to perform the first function with respect to the first data packet.

In some embodiments, the first circuitry comprises a hardware module comprising a plurality of processing units, each processing unit being associated with at least one predefined operation, wherein the first interface is configured to receive a first data packet, and wherein the hardware module is configured to, during the compilation process for the first function to be performed at the second circuitry, cause at least some of the plurality of processing units to perform their associated at least one predefined operation in a particular order so as to perform the first function with respect to the first data packet.

In some embodiments, the at least one controller is configured to perform the compilation process for compiling the first function to be performed by the second circuitry.

In some embodiments, the at least one controller is configured to, prior to completion of the compilation process, instruct the first circuitry to perform the first function with respect to data packets received at the first interface.

In some embodiments, the compilation process for compiling the first function to be performed by the second circuitry is performed by the host device, and the at least one controller is configured to determine that the compilation process is complete in response to receiving, from the host device, an indication of the completion of the compilation process.

In some embodiments, the network interface device comprises a processing pipeline for processing data packets received at the first interface, wherein the processing pipeline comprises a plurality of components, each configured to perform one of a plurality of functions with respect to data packets received at the first interface, wherein a first of the plurality of components is configured to provide the first function when provided by the first circuitry, and a second of the plurality of components is configured to provide the first function when provided by the second at least one processing unit.

In some embodiments, the at least one controller is configured to control the second circuitry to begin performing the first function with respect to data packets received at the first interface by inserting the second of the plurality of components into the processing pipeline.

In some embodiments, the at least one controller is configured to, in response to said determination that the compilation process for the first function to be performed at the second circuitry is complete, control the first circuitry to cease performing the first function with respect to data packets received at the first interface by removing the first of the plurality of components from the processing pipeline.

In some embodiments, the at least one controller is configured to control the second circuitry to begin performing the first function with respect to data packets received at the first interface by sending a control message through the processing pipeline to activate the second of the plurality of components.

In some embodiments, the at least one controller is configured to, in response to said determination that the compilation process for the first function to be performed at the second circuitry is complete, control the first circuitry to cease performing the first function with respect to data packets received at the first interface by sending a control message through the processing pipeline to deactivate the second of the plurality of components.
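Activation and deactivation by in-band control message can be sketched as follows. This is an illustrative model only: the `Component` class, the message fields (`control`, `target`, `activate`) and the component names are assumptions, not terms from the specification. The sketch also assumes that the component being deactivated is the one provided by the first circuitry. Because control messages travel through the same pipeline as data packets, every packet ahead of the message is handled under the old configuration and every packet behind it under the new one.

```python
class Component:
    """A pipeline slot that applies its function only while active."""

    def __init__(self, name: str, active: bool) -> None:
        self.name = name
        self.active = active
        self.seen = []  # ids of the packets this component actually processed

    def handle(self, item: dict) -> dict:
        if item.get("control"):
            # Control messages toggle the addressed component and are
            # forwarded downstream so later components can also see them.
            if item["target"] == self.name:
                self.active = item["activate"]
            return item
        if self.active:
            self.seen.append(item["id"])
        return item


def run_pipeline(components: list, items: list) -> None:
    """Pass each item (data packet or control message) through every component in order."""
    for item in items:
        for component in components:
            item = component.handle(item)
```

A usage example under the same assumptions: starting with a CPU-backed component active and an FPGA-backed component inactive, sending one packet, then an activate message for the FPGA component and a deactivate message for the CPU component, then a second packet, leaves the first packet recorded by the CPU component and the second by the FPGA component.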

In some embodiments, the first of the plurality of components is configured to provide the first function with respect to data packets of a first data flow passing through the processing pipeline, and the second of the plurality of components is configured to provide the first function with respect to data packets of a second data flow passing through the processing pipeline.
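Serving different data flows from different components can be pictured as a per-flow steering table. The sketch below is a hypothetical illustration: the flow key (source, destination, protocol), the `FlowSteering` class and the target labels are assumptions, and the specification does not state how flows are identified or steered.

```python
def flow_key(packet: dict) -> tuple:
    """Assumed flow identifier: source, destination and protocol of the packet."""
    return (packet["src"], packet["dst"], packet["proto"])


class FlowSteering:
    """Maps each flow to the component that provides the first function for it."""

    def __init__(self, default_target: str) -> None:
        self.default_target = default_target
        self.table = {}

    def migrate(self, key: tuple, target: str) -> None:
        # Move one flow to another component without disturbing other flows.
        self.table[key] = target

    def target_for(self, packet: dict) -> str:
        return self.table.get(flow_key(packet), self.default_target)
```

With such a table, flows can be moved one at a time from the component backed by the first circuitry to the component backed by the second circuitry, while flows that have not been migrated continue to use the default.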

In some embodiments, the first function comprises filtering data packets.

In some embodiments, the first interface is configured to receive data packets from the network.

In some embodiments, the first interface is configured to receive data packets from the host device.

In some embodiments, the time taken to compile the first function for the second circuitry is greater than the time taken to compile the first function for the first circuitry.

According to another aspect, there is provided a method comprising: receiving data packets at a first interface of a network interface device; and performing, at first circuitry of the network interface device, a first function with respect to the data packets received at the first interface, wherein the first circuitry is configured to perform the first function with respect to data packets received at the first interface during a compilation process for the first function to be performed at second circuitry, the method further comprising: determining that the compilation process for the first function to be performed at the second circuitry is complete; and, in response to said determination, controlling the second circuitry of the network interface device to begin performing the first function with respect to data packets received at the first interface.

According to another aspect, there is provided a non-transitory computer-readable medium comprising program instructions for causing a data processing system to perform a method, the method comprising: receiving data packets at a first interface of a network interface device; and performing, at first circuitry of the network interface device, a first function with respect to the data packets received at the first interface, wherein the first circuitry is configured to perform the first function with respect to data packets received at the first interface during a compilation process for the first function to be performed at second circuitry, the method further comprising: determining that the compilation process for the first function to be performed at the second circuitry is complete; and, in response to said determination, controlling the second circuitry of the network interface device to begin performing the first function with respect to data packets received at the first interface.

According to another aspect, there is provided a non-transitory computer-readable medium comprising program instructions for causing a data processing system to: perform a compilation process to compile a first function to be performed by second circuitry of a network interface device; prior to completion of the compilation process, transmit a first instruction to cause first circuitry of the network interface device to perform the first function with respect to data packets received at a first interface of the network interface device; and, subsequent to completion of the compilation process, transmit a second instruction to cause the second circuitry to begin performing the first function with respect to data packets received at the first interface.

In some embodiments, the non-transitory computer-readable medium comprises program instructions for causing the data processing system to perform an additional compilation process to compile the first function to be performed by the first circuitry, wherein the compilation process takes longer than the additional compilation process.

In some embodiments, the data processing system comprises a host device, wherein the network interface device is configured to interface the host device with a network.

In some embodiments, the data processing system comprises the network interface device, wherein the network interface device is configured to interface a host device with a network.

In some embodiments, the data processing system comprises a host device and the network interface device, wherein the network interface device is configured to interface the host device with a network.

In some embodiments, the first function comprises filtering data packets received at the first interface from the network.

In some embodiments, the non-transitory computer-readable medium comprises program instructions for causing the data processing system to: subsequent to completion of the compilation process, transmit a third instruction to cause the first circuitry to cease performing the function with respect to data packets received at the first interface.

In some embodiments, the non-transitory computer-readable medium comprises program instructions for causing the data processing system to: transmit an instruction to cause the second circuitry to perform the first function with respect to data packets of a first data flow; and transmit an instruction to cause the first circuitry to cease performing the first function with respect to the data packets of the first data flow.

In some embodiments, the first circuitry comprises at least one central processing unit, wherein, prior to completion of the second compilation process, each of the at least one central processing unit is configured to perform the first function with respect to at least one data packet received at the first interface.

In some embodiments, the second circuitry comprises a hardware module comprising a plurality of processing units, each processing unit being associated with at least one predefined operation, wherein the data packets received at the first interface comprise a first data packet, and wherein the hardware module is configured to, subsequent to completion of the second compilation process, perform the first function with respect to the first data packet by each processing unit of at least some of the plurality of processing units performing its respective at least one operation with respect to the first data packet.

In some embodiments, the first circuitry comprises a hardware module comprising a plurality of processing units configured to provide the first function with respect to data packets, each processing unit being associated with at least one predefined operation, wherein the data packets received at the first interface comprise a first data packet, and wherein the hardware module is configured to, prior to completion of the second compilation process, perform the first function with respect to the first data packet by each processing unit of at least some of the plurality of processing units performing its respective at least one operation with respect to the first data packet.

In some embodiments, the compilation process comprises assigning each of a plurality of processing units of the second circuitry to perform, in a particular order, at least one operation associated with one of a plurality of processing stages of a sequence of computer code instructions.

In some embodiments, the first function provided by the first circuitry is provided as a component of a processing pipeline for processing data packets received at the first interface, and the first function provided by the second circuitry is provided as a component of the processing pipeline.

In some embodiments, the first instruction comprises an instruction configured to cause the first of the plurality of components to be inserted into the processing pipeline.

In some embodiments, the second instruction comprises an instruction configured to cause the second of the plurality of components to be inserted into the processing pipeline.

In some embodiments, the non-transitory computer-readable medium comprises program instructions for causing the data processing system to: subsequent to completion of the compilation process, transmit a third instruction to cause the first circuitry to cease performing the first function with respect to data packets received at the first interface, wherein the third instruction comprises an instruction configured to cause the first of the plurality of components to be removed from the processing pipeline.

In some embodiments, the first instruction comprises a control message to be sent through the processing pipeline to activate the second of the plurality of components.

In some embodiments, the second instruction comprises a control message to be sent through the processing pipeline to activate the second of the plurality of components.

In some embodiments, the non-transitory computer-readable medium comprises program instructions for causing the data processing system to: subsequent to completion of the compilation process, transmit a third instruction to cause the first circuitry to cease performing the function with respect to data packets received at the first interface, wherein the third instruction comprises a control message to be sent through the processing pipeline to deactivate the first of the plurality of components.

According to another aspect, there is provided a data processing system comprising at least one processor and at least one memory including computer program code, wherein the at least one memory and the computer program code are configured to, with the at least one processor, cause the data processing system to: perform a compilation process to compile a function to be performed by second circuitry of a network interface device; prior to completion of the compilation process, instruct first circuitry of the network interface device to perform the function with respect to data packets received at a first interface of the network interface device; and, subsequent to completion of the second compilation process, instruct the second at least one processing unit to begin performing the function with respect to data packets received at the first interface.

According to another aspect, there is provided a method for implementation in a data processing system, the method comprising: performing a compilation process to compile a function to be performed by second circuitry of a network interface device; prior to completion of the compilation process, transmitting a first instruction to cause first circuitry of the network interface device to perform the function with respect to data packets received at a first interface of the network interface device; and, subsequent to completion of the compilation process, transmitting a second instruction to cause the second circuitry to begin performing the function with respect to data packets received at the first interface.

According to another aspect, there is provided a non-transitory computer-readable medium comprising program instructions for causing a data processing system to assign each of a plurality of processing units to perform, in a particular order, at least one operation associated with a respective one of a plurality of processing stages of a sequence of computer code instructions, wherein the plurality of processing stages provides a first function with respect to a first data packet received at a first interface of a network interface device, wherein each of the plurality of processing units is configured to perform one of a plurality of types of processing, and at least some of the plurality of processing units are configured to perform different types of processing, and wherein, for each of the plurality of processing units, the assigning is performed in dependence upon determining that the processing unit is configured to perform a type of processing suitable for performing the respective at least one operation.

In some embodiments, each of the types of processing is defined by one of a plurality of templates.

In some embodiments, the types of processing include at least one of: accessing a data packet received at the network interface device; accessing a lookup table stored in a memory of the hardware module; performing a logic operation on data loaded from a data packet; and performing a logic operation on data loaded from a lookup table.
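The type-dependent assignment described above can be sketched as a simple matching step. This is an assumption-laden illustration rather than the compiler of the specification: the `Unit` and `Stage` records, the type labels (`packet_access`, `table_lookup`, `logic`) and the first-fit strategy are all hypothetical. The sketch walks the stages in program order and gives each one a free processing unit whose type of processing suits the stage's operation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    uid: int
    kind: str  # type of processing the unit is configured for (its template)


@dataclass(frozen=True)
class Stage:
    name: str
    kind: str  # type of processing the stage's operation requires


def assign(stages: list, units: list) -> list:
    """Return (stage name, unit id) pairs in stage order, first fit by type."""
    free = list(units)
    plan = []
    for stage in stages:
        match = next((u for u in free if u.kind == stage.kind), None)
        if match is None:
            raise ValueError(f"no free unit of type {stage.kind!r} for {stage.name!r}")
        free.remove(match)  # each unit is given to at most one stage
        plan.append((stage.name, match.uid))
    return plan
```

A real compiler would presumably also account for routing and timing constraints between units; the sketch checks only that the unit's type matches the operation.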

In some embodiments, two or more of the at least some of the plurality of processing units are configured to perform their associated at least one operation according to a common clock signal of the hardware module.

In some embodiments, the assigning comprises assigning each of two or more of the at least some of the plurality of processing units to perform its associated at least one operation within a predefined length of time defined by the clock signal.

In some embodiments, the assigning comprises assigning two or more of the at least some of the plurality of processing units to access the first data packet within a time period of the predefined length of time.

In some embodiments, the assigning comprises assigning each of two or more of the at least some of the plurality of processing units to transfer, in response to the end of a time period of the predefined length of time, the result of its respective at least one operation to a next processing unit.
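The clocked behaviour described in the preceding embodiments, in which each processing unit completes its operation within a fixed time period and hands its result to the next unit when the period ends, can be modelled with a small lockstep simulation. The function `simulate` and the use of `None` to mark an idle unit are illustrative assumptions; actual units would be hardware stages separated by registers.

```python
def simulate(stages: list, inputs: list) -> tuple:
    """Run inputs through the stages in lockstep; return (outputs, periods used).

    regs[i] holds the value waiting at the input of stage i; None means idle.
    """
    n = len(stages)
    regs = [None] * n
    pending = list(inputs)
    outputs = []
    periods = 0
    while pending or any(r is not None for r in regs):
        # Within one period every busy stage performs its operation;
        # conceptually the stages work in parallel on different packets.
        results = [stages[i](regs[i]) if regs[i] is not None else None
                   for i in range(n)]
        # At the end of the period each result moves to the next stage,
        # the last stage emits its result, and a new input enters stage 0.
        if results[-1] is not None:
            outputs.append(results[-1])
        regs = [pending.pop(0) if pending else None] + results[:-1]
        periods += 1
    return outputs, periods
```

In this model a new input can enter the first stage in every period once the pipeline is loaded, so throughput is one item per period even though each item takes several periods to traverse all stages.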

In some embodiments, the non-transitory computer-readable medium comprises program instructions for causing the data processing system to assign at least some of the plurality of stages to occupy a single clock cycle.

In some embodiments, the non-transitory computer-readable medium comprises program instructions for causing the data processing system to assign two or more of the plurality of processing units to execute their assigned at least one operation in parallel.

In some embodiments, the network interface device comprises a hardware module comprising the plurality of processing units.

In some embodiments, the non-transitory computer-readable medium comprises computer program instructions for causing the data processing system to: perform a compilation process comprising the assigning; prior to completion of the compilation process, transmit a first instruction to cause circuitry of the network interface device to perform the first function with respect to data packets received at the first interface; and, subsequent to completion of the compilation process, transmit a second instruction to cause the plurality of processing units to begin performing the first function with respect to data packets received at the first interface.

In some embodiments, for one or more of the at least some of the plurality of processing units, the assigned at least one operation comprises at least one of: loading at least one value of the first data packet from a memory of the network interface device; storing at least one value of the first data packet in a memory of the network interface device; and performing a lookup in a lookup table to determine an action to be carried out with respect to the first data packet.

In some embodiments, the non-transitory computer-readable medium comprises computer program instructions for causing the data processing system to issue an instruction to configure routing hardware of the network interface device to route the first data packet between the plurality of processing units in the particular order, so as to perform the first function with respect to the first data packet.

In some embodiments, the first function provided by the plurality of processing units is provided as a component of a processing pipeline for processing data packets received at the first interface.

In some embodiments, the non-transitory computer-readable medium comprises computer program instructions for causing the plurality of processing units to begin performing the first function with respect to data packets received at the first interface, by causing the data processing system to issue an instruction that causes the component to be inserted into the processing pipeline.

In some embodiments, the non-transitory computer-readable medium comprises computer program instructions for causing the plurality of processing units to begin performing the first function with respect to data packets received at the first interface, by causing the data processing system to issue an instruction that causes the component to be activated in the processing pipeline.

In some embodiments, the data processing system comprises the network interface device.

In some embodiments, the data processing system comprises: the network interface device; and a host device, wherein the network interface device is configured to interface the host device with a network.

According to another aspect, there is provided a data processing system comprising at least one processor and at least one memory including computer program code, the at least one memory and the computer program code being configured, with the at least one processor, to cause the data processing system to assign each of a plurality of processing units to perform, in a particular order, at least one operation associated with one of a plurality of processing stages of a sequence of computer code instructions, wherein the plurality of processing stages provides a first function with respect to a first data packet received at a first interface of a network interface device, wherein each of the plurality of processing units is configured to perform one of a plurality of types of processing, wherein at least some of the plurality of processing units are configured to perform different types of processing, and wherein, for each of the plurality of processing units, the assigning is performed in dependence upon determining that the processing unit is configured to perform a type of processing suitable for performing the respective at least one operation.

According to another aspect, there is provided a method comprising assigning each of a plurality of processing units to perform, in a particular order, at least one operation associated with one of a plurality of processing stages of a sequence of computer code instructions, wherein the plurality of processing stages provides a first function with respect to a first data packet received at a first interface of a network interface device, wherein each of the plurality of processing units is configured to perform one of a plurality of types of processing, wherein at least some of the plurality of processing units are configured to perform different types of processing, and wherein, for each of the plurality of processing units, the assigning is performed in dependence upon determining that the processing unit is configured to perform a type of processing suitable for performing the respective at least one operation.
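The type-dependent assignment described in this aspect can be illustrated with a minimal sketch. It is not the compiler of the embodiments: the `Stage` and `ProcessingUnit` classes, the operation type names, and the first-fit selection policy are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Stage:
    """One processing stage of the program (hypothetical model)."""
    name: str
    op_type: str  # e.g. "load", "store", "lookup"


@dataclass
class ProcessingUnit:
    """A hardware unit able to perform exactly one type of processing."""
    unit_id: int
    op_type: str
    stage: Optional[Stage] = None


def assign_stages(stages: List[Stage], units: List[ProcessingUnit]) -> List[int]:
    """Assign each stage, in program order, to a free unit of a suitable type.

    Returns the unit ids in the order in which a packet should visit them.
    """
    order = []
    for stage in stages:
        unit = next(
            (u for u in units if u.stage is None and u.op_type == stage.op_type),
            None,
        )
        if unit is None:
            raise ValueError(f"no free unit of type {stage.op_type!r} for {stage.name!r}")
        unit.stage = stage
        order.append(unit.unit_id)
    return order


units = [
    ProcessingUnit(0, "alu"),
    ProcessingUnit(1, "load"),
    ProcessingUnit(2, "lookup"),
    ProcessingUnit(3, "store"),
]
stages = [
    Stage("read_dst_ip", "load"),
    Stage("match_acl", "lookup"),
    Stage("write_verdict", "store"),
]
order = assign_stages(stages, units)
print(order)
```

Here the unit of type "alu" stays unassigned because no stage needs it, which mirrors the statement that only at least some of the processing units are arranged to provide the function.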

The processing units of the hardware module have been described as executing their types of operation in a single step. However, the skilled person will recognise that this is a preferred feature only and is not essential or indispensable to the function of the invention.

According to an aspect, there is provided a method comprising: receiving, at a compiler, a bit file description of a circuit and a program, the bit file description comprising a description of routing of a portion of the circuit; and compiling the program using the bit file description to output a bit file for the program.

The method may comprise using the bit file to configure at least part of the portion of the circuit to perform a function associated with the program.

The bit file description may comprise information about routing between a plurality of processing units of the portion of the circuit.

The bit file description may comprise, for at least one of the plurality of processing units, routing information indicating at least one of: to which one or more other processing units data can be output; and from which one or more other processing units data can be received.

The bit file description may comprise routing information indicating one or more routes between two or more respective processing units.

The bit file description may comprise information indicating only the routes that are usable by the compiler when compiling the program to provide the bit file for the program.

The bit file may comprise, for a respective processing unit, information indicating at least one of: from which one or more of the other processing units in the bit file description input is to be provided for the respective processing unit; and to which one or more of the other processing units in the bit file description output is to be provided for the respective processing unit.
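The relationship between the bit file description (which routes exist) and the bit file (which routes a compiled program actually selects) can be sketched as follows. This is only an illustrative model: a real bit file is a device-specific configuration image, and the dictionary layout, unit names, and `build_bit_file` helper are assumptions rather than anything defined in the text.

```python
# Hypothetical bit file description: for each processing unit, the set of
# other units to which it is able to output data.
DESCRIPTION = {
    "pu0": {"pu1", "pu2"},
    "pu1": {"pu3"},
    "pu2": {"pu3"},
    "pu3": set(),
}


def build_bit_file(pipeline, description):
    """Select, per unit, the input source and output destination for a pipeline.

    Only routes listed in the description may be used; a missing route is
    reported as an error instead of being silently invented.
    """
    bit_file = {unit: {"input_from": None, "output_to": None} for unit in pipeline}
    for src, dst in zip(pipeline, pipeline[1:]):
        if dst not in description.get(src, set()):
            raise ValueError(f"no route {src} -> {dst} in the description")
        bit_file[src]["output_to"] = dst
        bit_file[dst]["input_from"] = src
    return bit_file


bit_file = build_bit_file(["pu0", "pu2", "pu3"], DESCRIPTION)
print(bit_file["pu2"])
```

Restricting the compiler to the routes listed in the description is what keeps its search space small: it chooses among known connections and does not need to perform general place-and-route.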

The portion of the circuit may comprise at least a portion of a configurable hardware module comprising a plurality of processing units, each processing unit being associated with a predefined type of operation executable in a single step, at least some of the plurality of processing units being associated with different predefined types of operation, and the bit file description comprising information about routing between at least some of the plurality of processing units. The method may comprise using the bit file to cause hardware to interconnect at least some of the plurality of processing units so as to provide a first data processing pipeline for processing one or more of a plurality of data packets, in order to perform a first function with respect to the one or more data packets.

The bit file description may be of at least a portion of an FPGA.

The bit file description may be of a portion of an FPGA that is dynamically programmable.

The program may comprise one of an eBPF program and a P4 program.

The compiler and the FPGA may be provided in a network interface device.

According to another aspect, there is provided an apparatus comprising at least one processor and at least one memory including computer code for one or more programs, the at least one memory and the computer code being configured, with the at least one processor, to cause the apparatus at least to: receive a bit file description and a program, the bit file description comprising a description of routing of a portion of a circuit; and compile the program using the bit file description to output a bit file for the program.

The at least one memory and the computer code may be configured, with the at least one processor, to cause the apparatus to use the bit file to configure at least part of the portion of the circuit to perform a function associated with the program.

The bit file description may comprise routing information indicating one or more routes between two or more respective processing units.

The portion of the circuit may comprise at least a portion of a configurable hardware module comprising a plurality of processing units, each processing unit being associated with a predefined type of operation executable in a single step, at least some of the plurality of processing units being associated with different predefined types of operation, and the bit file description comprising information about routing between at least some of the plurality of processing units. The at least one memory and the computer code may be configured, with the at least one processor, to cause the apparatus to use the bit file to cause hardware to interconnect at least some of the plurality of processing units so as to provide a first data processing pipeline for processing one or more of a plurality of data packets, in order to perform a first function with respect to the one or more data packets.

According to another aspect, there is provided a network interface device comprising: a first interface configured to receive a plurality of data packets; a configurable hardware module comprising a plurality of processing units, each processing unit being associated with a predefined type of operation executable in a single step; and a compiler configured to receive a bit file description and a program, the bit file description comprising a description of routing of at least a portion of the configurable hardware module, and to compile the program using the bit file description to output a bit file for the program, wherein the hardware module is configurable using the bit file to perform a first function associated with the program.

The network interface device may be for interfacing a host device to a network.

At least some of the plurality of processing units may be associated with different predefined types of operation.

The hardware module may be configurable to interconnect at least some of the plurality of processing units so as to provide a first data processing pipeline for processing one or more of the plurality of data packets, in order to perform a first function with respect to the one or more data packets.

In some embodiments, the first function comprises a filtering function. In some embodiments, the function comprises at least one of a tunnelling, an encapsulation, and a routing function. In some embodiments, the first function comprises an extended Berkeley packet filter (eBPF) function.

In some embodiments, the first function comprises a distributed denial-of-service (DDOS) scrubbing operation.

In some embodiments, each of two or more of the at least some of the plurality of processing units is configured to perform its associated predefined type of operation within a predefined length of time defined by a clock signal.

In some embodiments, a first one of the at least some of the plurality of processing units is configured to stall while a second one of the plurality of processing units accesses a value of state.

In some embodiments, the plurality of components comprises a second one of the plurality of components configured to provide the first function in circuitry different from the hardware module, wherein the network interface device comprises at least one controller configured to cause data packets passing through the processing pipeline to be processed by one of: the first one of the plurality of components and the second one of the plurality of components.

In some embodiments, the network interface device comprises at least one controller, wherein further circuitry is configured to perform the first function with respect to data packets during a compilation process for the first function to be performed in the hardware module, and wherein the at least one controller is configured to, in response to completion of the compilation process, control the hardware module to begin performing the first function with respect to data packets.
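The hand-over from the further circuitry to the hardware module once compilation completes can be modelled in software as a controller that swaps the active handler. The sketch below is an analogy only: the `Controller` class, the two filter functions, and the use of a thread to stand in for the compilation process are assumptions, not the mechanism of the embodiments.

```python
import threading


class Controller:
    """Directs each packet to whichever implementation is currently active."""

    def __init__(self, fallback):
        self._active = fallback
        self._lock = threading.Lock()

    def process(self, packet: bytes):
        with self._lock:
            handler = self._active
        return handler(packet)

    def switch_to(self, handler):
        with self._lock:
            self._active = handler


def fallback_filter(packet: bytes):
    """Stands in for the further circuitry used while compilation runs."""
    return ("fallback", len(packet) <= 64)


def compile_for_hardware():
    """Stands in for the (slow) compilation; returns the compiled function."""
    def hardware_filter(packet: bytes):
        return ("hardware", len(packet) <= 64)
    return hardware_filter


controller = Controller(fallback_filter)
before = controller.process(b"x" * 10)   # served before compilation finishes


def compile_and_switch():
    controller.switch_to(compile_for_hardware())


worker = threading.Thread(target=compile_and_switch)
worker.start()
worker.join()                            # completion of the compilation process
after = controller.process(b"x" * 10)    # now served by the compiled version
print(before, after)
```

Both implementations return the same verdict for the same packet; only the place where the function is performed changes, so traffic does not need to wait for compilation.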

In some embodiments, the compilation process comprises providing instructions for providing, in the hardware module, a control plane interface that is responsive to control messages.

According to another aspect, there is provided a computer-implemented method comprising: determining routing information for at least a portion of a configurable hardware module comprising a plurality of processing units, each processing unit being associated with a predefined type of operation executable in a single step, at least some of the plurality of processing units being associated with different predefined types of operation, wherein the routing information provides information about available routes at least between the plurality of processing units.

The configurable hardware module may comprise a substantially static part and a substantially dynamic part, and the determining may comprise determining routing information for the substantially dynamic part.

Determining routing information for the substantially dynamic part may comprise determining routing, within the substantially dynamic part, that is used by one or more of the processing units in the substantially static part.

The determining may comprise analysing a bit file description of at least a portion of the configurable hardware module to determine the routing information.

According to another aspect, there is provided a non-transitory computer-readable medium comprising program instructions for: determining routing information for at least a portion of a configurable hardware module comprising a plurality of processing units, each processing unit being associated with a predefined type of operation executable in a single step, at least some of the plurality of processing units being associated with different predefined types of operation, wherein the routing information provides information about available routes at least between the plurality of processing units.

A computer program comprising program code means adapted to perform the method(s) may also be provided. The computer program may be stored on and/or otherwise embodied by means of a carrier medium.

In the above, many different embodiments have been described. It should be appreciated that further embodiments may be provided by combining any two or more of the embodiments described above.

Various other aspects and further embodiments are also described in the following detailed description and in the attached claims.

Some embodiments will now be described, by way of example only, with reference to the accompanying drawings, in which:
FIG. 1 shows a schematic view of a data processing system coupled to a network;
FIG. 2 shows a schematic view of a data processing system comprising a filtering operation application configured to run in user mode on a host computing device;
FIG. 3 shows a schematic view of a data processing system comprising a filtering operation configured to run in kernel mode on a host computing device;
FIG. 4 shows a schematic view of a network interface device comprising a plurality of CPUs for performing a function with respect to data packets;
FIG. 5 shows a schematic view of a network interface device comprising a field programmable gate array running an application for performing a function with respect to data packets;
FIG. 6 shows a schematic view of a network interface device comprising a hardware module for performing a function with respect to data packets;
FIG. 7 shows a schematic view of a network interface device comprising a field programmable gate array and at least one processing unit for performing a function with respect to data packets;
FIG. 8 illustrates a method implemented in a network interface device according to some embodiments;
FIG. 9 illustrates a method implemented in a network interface device according to some embodiments;
FIG. 10 illustrates an example of processing a data packet by a series of programs;
FIG. 11 illustrates an example of processing a data packet by a plurality of processing units;
FIG. 12 illustrates an example of processing a data packet by a plurality of processing units;
FIG. 13 illustrates an example of a pipeline of processing stages for processing a data packet;
FIG. 14 illustrates an example of a slice architecture having a plurality of pluggable components;
FIG. 15 illustrates an example representation of the arrangement and order of processing of a plurality of processing units;
FIG. 16 illustrates an example method of compiling a function;
FIG. 17 illustrates an example of a stateful processing unit;
FIG. 18 illustrates an example of a stateless processing unit;
FIG. 19 shows a method of some embodiments;
FIGS. 20a and 20b illustrate routing between slices in an FPGA; and
FIG. 21 schematically illustrates a partition on an FPGA.

The following description is presented to enable any person skilled in the art to make and use the invention, and is provided in the context of a particular application. Various modifications to the disclosed embodiments will be readily apparent to those skilled in the art.

The general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the present invention. Thus, the present invention is not intended to be limited to the embodiments shown, but is to be accorded the widest scope consistent with the principles and features disclosed herein.

When data is to be transferred between two data processing systems over a data channel, such as a network, each of the data processing systems has a suitable network interface to allow it to communicate across the channel. Often the network is based on Ethernet technology. Data processing systems that are to communicate over a network are equipped with network interfaces that are capable of supporting the physical and logical requirements of the network protocol. The physical hardware component of a network interface is referred to as a network interface device or a network interface card (NIC).

Most computer systems include an operating system (OS) through which user-level applications communicate with the network. A portion of the operating system, known as the kernel, includes protocol stacks for translating commands and data between the applications and a device driver specific to the network interface device. The device driver may directly control the network interface device. By providing these functions in the operating system kernel, the complexities of and differences among network interface devices can be hidden from the user-level application. The network hardware and other system resources (such as memory) may be safely shared by many applications, and the system can be secured against faulty or malicious applications.

A typical data processing system (100) for carrying out transmission across a network is shown in FIG. 1. The data processing system (100) comprises a host computing device (101) coupled to a network interface device (102) that is arranged to interface the host to a network (103). The host computing device (101) includes an operating system (104) supporting one or more user-level applications (105). The host computing device (101) may also include a network protocol stack (not shown). For example, the protocol stack may be a component of the application, a library with which the application is linked, or be provided by the operating system. In some embodiments, more than one protocol stack may be provided.

The network protocol stack may be a Transmission Control Protocol (TCP) stack. The application (105) can send and receive TCP/IP messages by opening a socket and reading data from and writing data to the socket, and the operating system (104) causes the messages to be transported across the network. For example, the application can invoke a system call (syscall) for transmission of data through the socket and then, via the operating system (104), to the network (103). This interface for transmitting messages may be known as the message passing interface.
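The socket-based path described above can be illustrated with a short, self-contained example using the standard Python `socket` module. The loopback address, the ephemeral port, and the one-shot echo server are choices made purely so that the example can run on a single machine; they are not part of the described system.

```python
import socket
import threading


def echo_once(server_sock: socket.socket) -> None:
    """Accept one connection and send back whatever was received."""
    conn, _ = server_sock.accept()
    with conn:
        conn.sendall(conn.recv(1024))


# Listening socket on the loopback interface; port 0 lets the OS pick a port.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]

thread = threading.Thread(target=echo_once, args=(server,))
thread.start()

# The application opens a socket, writes data to it and reads data from it;
# each call is handed to the operating system's protocol stack.
with socket.create_connection(("127.0.0.1", port)) as client:
    client.sendall(b"hello")
    reply = client.recv(1024)

thread.join()
server.close()
print(reply)
```

The application never touches the network interface device directly: `sendall` and `recv` are system calls, and the kernel protocol stack and device driver do the rest.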

Instead of implementing the stack in the host (101), some systems offload the protocol stack to the network interface device (102). For example, where the stack is a TCP stack, the network interface device (102) may comprise a TCP Offload Engine (TOE) for performing the TCP protocol processing. By performing the protocol processing in the network interface device (102) instead of in the host computing device (101), the demand on the processor of the host system (101) may be reduced. Data to be transmitted over the network may be sent by an application (105) via a TOE-enabled virtual interface driver, bypassing the kernel TCP/IP stack partially or entirely. Data sent along this fast path therefore need only be formatted to meet the requirements of the TOE driver.

The host computing device (101) may comprise one or more processors and one or more memories. In some embodiments, the host computing device (101) and the network interface device (102) may communicate via a bus, for example a peripheral component interconnect express (PCIe) bus.

During operation of the data processing system, data to be transmitted onto the network may be transferred from the host computing device (101) to the network interface device (102) for transmission. In one example, data packets may be transferred from the host to the network interface device directly by the host processor. The host may provide data to one or more buffers (106) located on the network interface device (102). The network interface device (102) may then prepare the data packets and transmit them over the network (103).

Alternatively, the data may be written to a buffer (107) in the host system (101). The data may then be retrieved from the buffer (107) by the network interface device and transmitted over the network (103).

In both of these cases, the data is temporarily stored in one or more buffers prior to transmission over the network. Data sent over the network could be returned to the host (in a loopback).

When data packets are sent and received over the network (103), there are many processing tasks that can be expressed as operations on a data packet, whether the packet is one to be transmitted over the network or one received from the network. For example, filtering processes may be carried out on received data packets so as to protect the host system (101) from distributed denial of service (DDOS) attacks. Such filtering processes may be carried out by a simple packet examination or by an extended Berkeley packet filter (eBPF). As another example, encapsulation and forwarding may be carried out for data packets to be transmitted over the network (103). These processes may consume many CPU cycles and be burdensome for a conventional OS architecture.
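As a rough illustration of a simple packet examination, the sketch below inspects the source address of an untagged Ethernet/IPv4 frame and rejects frames from a blocked address range. The blocklist (a documentation-only range), the `accept` policy, and the `make_frame` helper are assumptions made for illustration; a real DDOS scrubber would apply far richer rules.

```python
import ipaddress
import struct

# Hypothetical blocklist; 203.0.113.0/24 is reserved for documentation.
BLOCKED = ipaddress.ip_network("203.0.113.0/24")

ETH_HEADER_LEN = 14      # destination MAC + source MAC + EtherType
ETHERTYPE_IPV4 = 0x0800
IPV4_SRC_OFFSET = 12     # offset of the source address in the IPv4 header


def accept(frame: bytes) -> bool:
    """Return False for malformed frames and for IPv4 frames from BLOCKED."""
    if len(frame) < ETH_HEADER_LEN + 20:
        return False     # too short for Ethernet plus a minimal IPv4 header
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETHERTYPE_IPV4:
        return True      # not IPv4: left to other processing in this sketch
    start = ETH_HEADER_LEN + IPV4_SRC_OFFSET
    src = ipaddress.ip_address(frame[start:start + 4])
    return src not in BLOCKED


def make_frame(src_ip: str) -> bytes:
    """Build a minimal test frame (zeroed MACs, no payload, no checksum)."""
    eth = bytes(12) + struct.pack("!H", ETHERTYPE_IPV4)
    ip = bytearray(20)
    ip[0] = 0x45         # IPv4, header length of 5 x 32-bit words
    ip[12:16] = ipaddress.ip_address(src_ip).packed
    return eth + bytes(ip)


print(accept(make_frame("198.51.100.7")), accept(make_frame("203.0.113.9")))
```

Even this small check runs once per packet, which hints at why performing such filtering on host CPUs can become burdensome at high packet rates.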

Reference is made to FIG. 2, which illustrates one way in which a filtering operation or other packet processing operation may be implemented in a host system (220). The processes performed by the host system (220) are shown as being performed in either user space or kernel space. A receive path exists in kernel space for delivering data packets received from the network at the network interface device (210) to a terminating application (250). This receive path includes a driver (235), a protocol stack (240), and a socket (245). The filtering operation (230) is implemented in user space. Incoming packets provided to the host system (220) by the network interface device (210) bypass the kernel (where protocol processing takes place) and are provided directly to the filtering operation (230).

The filtering operation (230) may be provided with a virtual interface (which may be an ether fabric virtual interface (EFVI), a data plane development kit (DPDK) interface, or any other suitable interface) for exchanging data packets with other elements in the host system (220). The filtering operation (230) may perform DDOS scrubbing and/or other forms of filtering. A DDOS scrubbing process may be run on all packets that are easily recognized as DDOS candidates, for example sample packets, copies of packets, and packets that have not yet been classified. Packets that are not delivered to the filtering operation (230) may be passed directly from the network interface to the driver (235). The operation (230) may provide an extended Berkeley packet filter (eBPF) for performing the filtering. If a received packet passes the filtering provided by the operation (230), the operation (230) is configured to re-inject the packet into the receive path in the kernel so that the received packet can be processed. Specifically, the packet is provided to the driver (235) or to the stack (240). The packet is then protocol processed by the protocol stack (240). The packet is then delivered to the socket (245) associated with the terminating application (250). The terminating application (250) issues a recv() call to retrieve the data packet from the buffer of the associated socket.

However, this approach has several problems. First, the filtering operation (230) runs on the host CPU. To run the filtering (230), the host CPU must process data packets at the rate at which they are received from the network. Where data is sent to and received from the network at a high rate, this can constitute a large drain on the processing resources of the host CPU. A high data flow rate to the filtering operation (230) may also result in heavy consumption of other limited resources, such as I/O bandwidth and internal memory/cache bandwidth.

To perform re-injection of data packets into the kernel, it is necessary to provide the filtering operation (230) with a privileged API for performing the re-injection. The re-injection process may be cumbersome because it requires attention to packet ordering. To perform re-injection, the operation (230) may in many cases require a dedicated CPU core.

The steps of providing data to the operation and of re-injecting it require the data to be copied into and out of memory. This copying is a resource burden on the system.

Similar problems may arise when providing types of operation other than filtering on data to be sent or received over the network.

Some operations (e.g., DPDK-type operations) may require the processed packets to be forwarded back to the network.

Reference is made to FIG. 3, which illustrates another approach. Like elements are referred to using like reference numerals. In this example, an additional layer, known as the express data path (XDP) (310), is inserted into the transmit and receive paths in the kernel. An extension to XDP (310) allows insertion into the transmit path. An XDP helper allows packets to be transmitted (as a result of a receive operation). XDP (310) is inserted at the driver level of the operating system and allows programs to be executed at this level so as to perform operations on data packets received from the network before they are protocol processed by the stack (240). XDP (310) also allows programs to be executed at this level so as to perform operations on data packets to be sent over the network. eBPF programs and other programs can therefore operate in both the transmit and receive paths.

As illustrated in FIG. 3, a filtering operation (320) may be inserted from user space into XDP to form a program (330) that is part of XDP (310). The operation (320) is inserted using the XDP control plane so as to provide a program (330) that runs on the data receive path and performs a filtering operation (e.g., DDOS scrubbing) on packets on that path. Such a program (330) may be an eBPF program.

The program (330) is shown inserted into the kernel between the driver (235) and the protocol stack (240). In other examples, however, the program (330) may be inserted at other points in the receive path in the kernel. The program (330) may be part of a separate control path that receives data packets. The program (330) may be provided by an application by providing an extension to the application programming interface (API) of the socket (245) for that application.

The program (330) may additionally or alternatively perform one or more operations on data being sent on the transmit path. XDP (310) then invokes the transmit function of the driver (235) to send the data over the network via the network interface device (210). In this case, the program (330) may provide load balancing or routing operations with respect to data packets to be sent over the network. The program (330) may provide segment re-encapsulation and forwarding operations with respect to data packets to be sent over the network.

The program (330) may be used for firewalling and virtual switching, or for other operations that do not require protocol termination or application processing.

One advantage of using XDP (310) in this way is that the program (330) can directly access the memory buffers handled by the driver, without intermediate copies.

In order to insert the program (330) for operation in the kernel in this way, it is necessary to ensure that the program (330) is safe. If an unsafe program is inserted into the kernel, this presents certain risks, such as: infinite loops that could crash the kernel, buffer overflows, uninitialized variables, compiler errors, and performance issues caused by large programs.

To ensure that the program (330) is safe prior to insertion into XDP (310) in this way, a verifier may run on the host system (220) to verify the safety of the program (330). The verifier may be configured to ensure that no loops exist. Backward jump operations may be permitted provided that they do not cause loops. The verifier may be configured to ensure that the program (330) has no more than a predefined number (e.g., 4000) of instructions. The verifier may check the validity of register usage by traversing the data paths through the program (330). If there are too many possible paths, the program (330) will be rejected as unsafe to run in kernel mode. For example, if there are more than 1000 branches, the program (330) may be rejected.
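The checks described above can be illustrated with a minimal sketch. It is not the kernel verifier: the `Insn` representation, the `verify` function, and the way branches are counted are assumptions made for illustration only, and register-validity checking is omitted. Only the two example limits (4000 instructions, 1000 branches) are taken from the text. Loops are detected as cycles in the control-flow graph, so a backward jump that does not form a cycle is accepted.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_INSTRUCTIONS = 4000  # example limit quoted in the text
MAX_BRANCHES = 1000      # example limit quoted in the text


@dataclass
class Insn:
    """Hypothetical instruction; `jump` is an absolute target index or None."""
    op: str
    jump: Optional[int] = None
    conditional: bool = False


def successors(program: List[Insn], i: int) -> List[int]:
    """Indices that control may reach directly after instruction i."""
    insn = program[i]
    if insn.op == "exit":
        return []
    nxt = []
    if insn.jump is not None:
        nxt.append(insn.jump)
    # Fall through unless this is an unconditional jump.
    if (insn.jump is None or insn.conditional) and i + 1 < len(program):
        nxt.append(i + 1)
    return nxt


def has_cycle(program: List[Insn]) -> bool:
    """Iterative depth-first search; an edge to a GREY node closes a loop."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = [WHITE] * len(program)
    for start in range(len(program)):
        if colour[start] != WHITE:
            continue
        colour[start] = GREY
        stack = [(start, iter(successors(program, start)))]
        while stack:
            node, it = stack[-1]
            for succ in it:
                if colour[succ] == GREY:
                    return True
                if colour[succ] == WHITE:
                    colour[succ] = GREY
                    stack.append((succ, iter(successors(program, succ))))
                    break
            else:
                colour[node] = BLACK
                stack.pop()
    return False


def verify(program: List[Insn]) -> List[str]:
    """Return the reasons for rejection; an empty list means 'accepted'."""
    for i, insn in enumerate(program):
        if insn.jump is not None and not 0 <= insn.jump < len(program):
            return [f"jump target out of range at {i}"]
    errors = []
    if len(program) > MAX_INSTRUCTIONS:
        errors.append("too many instructions")
    if sum(1 for insn in program if insn.conditional) > MAX_BRANCHES:
        errors.append("too many branches")
    if has_cycle(program):
        errors.append("loop detected")
    return errors
```

In this sketch, the program `[jmp 2, exit, jmp 1]` contains a backward jump but no cycle and is accepted, whereas a conditional jump back to an earlier instruction on the same path is rejected.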

It will be appreciated by those skilled in the art that XDP is one example of a mechanism by which a safe program (330) may be installed in the kernel, and that there are other ways in which this may be accomplished.

The approach discussed above with respect to FIG. 3 may be as efficient as the approach discussed above with respect to FIG. 2 if, for example, the operation can be expressed in the safe (or sandboxed) language required for executing code in the kernel. The eBPF language can be executed efficiently on x86 processors, and just-in-time (JIT) compilation techniques enable eBPF programs to be compiled to native machine code. The language is designed to be safe: for example, state is limited to map-only constructs, which are shared data structures (such as hash tables). Only limited looping is allowed; instead, one eBPF program is permitted to tail-call another. The state space is constrained.

However, in some implementations this approach may still place a large drain on the resources of the host system (220) (e.g., I/O bandwidth, internal memory/cache bandwidth, and the host CPU). The operations on the data packets are still performed by the host CPU, which is required to perform them at the rate at which data is being sent and received.

Another proposal is to perform the operations discussed above in the network interface device instead of in the host system. Doing so may free up the CPU cycles that the host CPU would use when executing the operations, in addition to the I/O bandwidth, memory bandwidth, and cache bandwidth that would be consumed. Moving the execution of the processing operations from the host to the hardware of the network interface device may present certain challenges.

One proposal for implementing the processing in network hardware is to provide, in the network interface device, a network processing unit (NPU) comprising a plurality of CPUs that are specialized for packet processing and/or manipulation operations.

Reference is made to FIG. 4, which illustrates an example of a network interface device (400) comprising an array (410) of central processing units (CPUs), e.g., CPU (420). The CPUs are configured to perform functions such as filtering data packets sent to and received from the network. Each CPU of the array (410) of CPUs may be an NPU. Although not shown in FIG. 4, the CPUs may additionally or alternatively be configured to perform operations such as load balancing on data packets received from the host for transmission over the network. These CPUs are specialized for such packet processing/manipulation operations and execute an instruction set that is optimized for them.

The network interface device (400) additionally comprises memory (not shown) that is shared among, and accessible to, the array (410) of CPUs.

The network interface device (400) comprises a network medium access control (MAC) layer (430) for interfacing the network interface device (400) with the network. The MAC layer (430) is configured to receive data packets from the network and to send data packets over the network.

Operations on packets received at the network interface device (400) are parallelized across the CPUs. As shown, when a data flow is received at the MAC layer (430), it is passed to a spreading function (440), which is configured to extract data packets from the flow and distribute them across the plurality of CPUs in the NPU (410) so that the CPUs can process, e.g., filter, those data packets. The spreading function (440) may parse the received data packets to identify the data flows to which they belong. The spreading function (440) generates, for each packet, an indication of the position of that packet within the data flow to which it belongs. The indication may be, for example, a tag. The spreading function (440) adds the respective indication to the associated metadata of each packet. The associated metadata for each data packet may be appended to the data packet. The associated metadata may be passed to the spreading function (440) as side-band control information. The indication is added in dependence on the flow to which the data packet belongs, so that the order of the data packets of any particular flow can be reconstructed.

After processing by the plurality of CPUs (410), the data packets are passed to a re-ordering function (450), which restores the packets of each data flow to their correct order before passing them to the host interface layer (460). The re-ordering function (450) may re-order the data packets within a flow by comparing the indications (e.g., tags) in the data packets of the flow to reconstruct their order. The re-ordered data packets then pass through the host interface (460) and are delivered to the host system (220).
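The spreading and re-ordering steps can be sketched as follows. This is an illustrative software model only: the round-robin distribution policy, the dictionary packet layout, and the names `spread` and `Reorderer` are assumptions, not details taken from the description. The spreading step tags each packet with its position in its flow, and the re-ordering step holds back packets that arrive early and releases them once the per-flow sequence is contiguous.

```python
import heapq
from collections import defaultdict


def spread(packets, n_cpus):
    """Tag each (flow_id, payload) pair with its position in its flow and
    distribute the tagged packets round-robin over n_cpus work queues."""
    next_seq = defaultdict(int)
    queues = [[] for _ in range(n_cpus)]
    for i, (flow, payload) in enumerate(packets):
        tag = next_seq[flow]          # position of this packet within its flow
        next_seq[flow] += 1
        queues[i % n_cpus].append({"flow": flow, "seq": tag, "payload": payload})
    return queues


class Reorderer:
    """Releases packets of each flow strictly in tag order."""

    def __init__(self):
        self.expected = defaultdict(int)   # next tag to release, per flow
        self.pending = defaultdict(list)   # per-flow min-heap of (tag, payload)

    def push(self, pkt):
        """Accept one processed packet; return the packets now releasable."""
        flow = pkt["flow"]
        heap = self.pending[flow]
        heapq.heappush(heap, (pkt["seq"], pkt["payload"]))
        released = []
        while heap and heap[0][0] == self.expected[flow]:
            _, payload = heapq.heappop(heap)
            released.append((flow, payload))
            self.expected[flow] += 1
        return released
```

The `pending` heaps in this sketch correspond to the buffering that a re-ordering stage needs while it waits for a late packet, which is one way to see why such a stage adds memory overhead and latency variation.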

Although FIG. 4 illustrates the array (410) of CPUs operating only on data packets received from the network, similar principles (including spreading and re-ordering) may be applied to data packets received from the host for transmission over the network, with the array (410) of CPUs performing functions (e.g., load balancing) on those data packets.

The program executed by the CPUs may be a compiled or transcoded version of the program that would execute on the host CPU in the example described above with respect to FIG. 3. In other words, the instruction set that would execute on the host CPU to perform the operations is translated for execution on each CPU of the array of specialized CPUs in the network interface device (400).

To achieve parallelization across the CPUs, multiple instances of the program are compiled and executed in parallel on multiple CPUs. Each instance of the program may be responsible for processing a different set of the data packets received at the network interface device. However, each individual data packet is processed by a single CPU when providing the function of the program with respect to that data packet. The overall effect of the parallel execution may be the same as that of executing a single program (e.g., the program (330)) on the host CPU.

One of the specialized CPUs may process data packets at a rate on the order of 50 million packets per second. This operating speed may be lower than that of the host CPU. Parallelization may therefore be used to achieve the same performance as would be achieved by running an equivalent program on the host CPU. To perform the parallelization, the data packets are spread across the CPUs and then re-ordered after processing by the CPUs. The requirement to process the data packets of each flow in order, together with the re-ordering step (450), may introduce a bottleneck, increase the memory resource overhead, and limit the available throughput of the device. This requirement and the re-ordering step (450) may also increase the jitter of the device, since the processing throughput may vary with the content of the network traffic and with the degree to which parallelism can be applied.

One advantage of using such specialized CPUs may be a short compile time. For example, it may be possible to compile a filtering application to run on such CPUs in less than one second.

There may be problems with the use of an array of CPUs when this approach is scaled to higher link speeds. Host network interfaces may be required to reach terabit/s speeds in the near future. When scaling such an array (410) of CPUs to these higher speeds, the amount of power required can become a problem.
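A rough sense of the scaling problem can be obtained from a back-of-envelope calculation. The sketch below takes the figure of roughly 50 million packets per second per specialized CPU quoted above and assumes, for illustration only, worst-case traffic of minimum-size 64-byte Ethernet frames, each of which occupies a further 20 bytes on the wire (preamble, start-of-frame delimiter, and inter-frame gap). The function names are hypothetical, and the result ignores any spreading or re-ordering overhead.

```python
import math

MIN_FRAME_BYTES = 64       # minimum Ethernet frame size, including FCS
WIRE_OVERHEAD_BYTES = 20   # 8 bytes preamble/SFD + 12 bytes inter-frame gap


def max_packet_rate(link_bps, frame_bytes=MIN_FRAME_BYTES):
    """Upper bound on packets per second for a link carrying equal-size frames."""
    return link_bps / ((frame_bytes + WIRE_OVERHEAD_BYTES) * 8)


def cpus_needed(link_bps, per_cpu_pps=50e6, frame_bytes=MIN_FRAME_BYTES):
    """Number of CPUs needed to keep up with the worst-case packet rate."""
    return math.ceil(max_packet_rate(link_bps, frame_bytes) / per_cpu_pps)
```

Under these assumptions a 100 Gb/s link carries at most about 148.8 million packets per second and needs 3 such CPUs, while a 1 Tb/s link needs 30, each of which adds to the power budget.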

Another proposal is to include a field programmable gate array (FPGA) in the network interface device and to use the FPGA to perform operations on data packets received from the network.

Reference is made to FIG. 5, which illustrates an example of the use, in a network interface device (500), of an FPGA (510) having an FPGA application (515) for performing operations on data packets received at the network interface device (500). Elements that are the same as those in FIG. 4 are referred to using the same reference numerals.

Although FIG. 5 illustrates the FPGA application (515) operating only on data packets received from the network, such an FPGA application (515) may also be used to perform functions (e.g., load balancing and/or firewall functions) on data packets received from the host, whether for transmission over the network or for transmission back to the host or to another network interface on the system.

The FPGA application (515) may be provided by compiling a program written in a common system-level language, such as C, C++, or Scala, to run on the FPGA (510).

The FPGA (510) may have network interface functionality and FPGA functionality. The FPGA functionality may provide an FPGA application (515), which may be programmed into the FPGA (510) according to the needs of the user of the network interface device. The FPGA application (515) may, for example, provide filtering of messages on the receive path from the network (230) to the host. The FPGA application (515) may provide a firewall.

The FPGA (510) may be programmable to provide the FPGA application (515). Some of the network interface device functionality may be implemented as "hard" logic within the FPGA (510). For example, the hard logic may be application specific integrated circuit (ASIC) gates. The FPGA application (515) may be implemented as "soft" logic. The soft logic may be provided by programming the FPGA's look-up tables (LUTs). The hard logic may be capable of being clocked at a higher rate than the soft logic.

The network interface device (500) comprises a host interface (505) configured to send data to and receive data from the host. The network interface device (500) comprises a network medium access control (MAC) interface (520) configured to send data to and receive data from the network.

When a data packet is received from the network at the MAC interface (520), it is passed to the FPGA application (515), which is configured to perform a function, such as filtering, with respect to the data packet. The data packet (if it passes any filtering) is then passed to the host interface (505), from where it is passed to the host. Alternatively, the FPGA application (515) may determine that the data packet is to be dropped or re-transmitted.

One problem with this approach of using an FPGA to perform functions with respect to data packets is that relatively long compile times are required. An FPGA is composed of many logic elements (e.g., logic cells), which individually represent primitive logic operations such as AND, OR, NOT, and so on. These logic elements are arranged in a matrix with a programmable interconnect. To provide a function, the logic cells may need to operate together to implement the circuit definition and meet synchronous clock timing constraints. Placing each logic cell and routing between the cells may be an algorithmically difficult challenge. When compiling onto an FPGA with a low level of utilization, the compile time may be less than 10 minutes. However, as an FPGA device becomes more heavily utilized by various applications, the place-and-route challenge may increase, so that the time to compile a given function onto the FPGA increases. As such, adding further logic to an FPGA in which most of the routing resources have already been consumed may take hours of compile time.

One approach is to design the hardware using specific processing primitives, such as parse, match, and action primitives. These may be used to construct a processing pipeline in which every packet undergoes each of the three processes. First, the packet is parsed to construct a metadata representation of the protocol headers. Second, the packet is flexibly matched against rules held in a table. Finally, when a match is found, the packet is actioned in dependence on the entry from the table that was selected in the match operation.
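The three stages can be modelled in software as a minimal sketch. Everything here is an illustrative assumption rather than part of the description: only the 14-byte Ethernet header is parsed, the table is a list of rules searched in order, a field absent from a rule acts as a wildcard, and a packet that matches no rule is forwarded unchanged.

```python
import struct


def parse(frame: bytes) -> dict:
    """Parse stage: build a metadata view of the Ethernet header.
    Assumes the frame is at least 14 bytes long."""
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    return {"dst": dst, "src": src, "ethertype": ethertype}


def match(meta: dict, table: list):
    """Match stage: return the first rule whose fields all equal the metadata."""
    for rule in table:
        if all(meta.get(key) == value for key, value in rule["match"].items()):
            return rule
    return None


def action(frame: bytes, rule):
    """Action stage: apply the action named by the selected table entry."""
    if rule is None:
        return frame                    # no match: forward unchanged (assumption)
    if rule["action"] == "drop":
        return None
    if rule["action"] == "set_dst":
        return rule["arg"] + frame[6:]  # rewrite the destination MAC address
    return frame                        # "forward"


def pipeline(frame: bytes, table: list):
    """Every packet passes through parse, then match, then action."""
    return action(frame, match(parse(frame), table))
```

For example, a table containing `{"match": {"ethertype": 0x0806}, "action": "drop"}` drops ARP frames while leaving other traffic untouched.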

To implement a function using the parse/match/action model, the P4 programming language (or a similar language) may be used. The P4 programming language is target independent, meaning that a program written in P4 can be compiled to run on different types of hardware, such as CPUs, FPGAs, ASICs, NPUs, and so on. Each different type of target is provided with its own compiler, which maps the P4 source code onto the appropriate target switch model.

P4 may be used to provide a programming model that allows a high-level program to express packet processing operations for a packet processing pipeline. This approach works well for operations that express themselves naturally in a declarative style. In the P4 language, the programmer expresses the parsing, matching, and action stages as operations to be performed on received data packets. These operations are gathered together so that dedicated hardware can perform them efficiently. However, this declarative style may not be suitable for expressing programs of an imperative nature, such as eBPF programs.

In a network interface device, a sequence of eBPF programs may be required to execute serially. In this case, a chain of eBPF programs is generated, one calling another. Each program can modify state, and the output is as if the entire chain of programs had been executed serially. It may be difficult for a compiler to gather together all of the parsing, matching, and action steps. Moreover, even where a chain of eBPF programs is already installed, it may be necessary to install, remove, or modify the chain, which may present further challenges.

To give an example of such programs requiring iterative execution, reference is made to FIG. 10, which illustrates an example of a sequence of programs (e₁, e₂, e₃) configured to process a data packet. For example, each of the programs may be an eBPF program. Each of the programs is configured to parse the received data packet, to perform a lookup into a table (1010) so as to determine an action from the matching entry in the table (1010), and then to perform that action with respect to the data packet. The action may include modifying the packet. Each of the eBPF programs may also perform its action in dependence on local and shared state. The data packet (P₀) is initially processed by the eBPF program (e₁) before being passed, in modified form, to the next program (e₂) in the pipeline. The output of the sequence of programs is the output of the final program in the pipeline, i.e., e₃.
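
The chaining shown in FIG. 10 may be illustrated in software along the following lines. This is a minimal sketch only, not eBPF code and not a description of the device itself. It assumes a hypothetical packet represented as a dictionary, and the names `make_program` and `run_chain`, the field names, and the table contents are all invented for illustration.

```python
from typing import Callable, Dict, List

Packet = Dict[str, int]
State = Dict[str, int]
Program = Callable[[Packet, State], Packet]


def make_program(name: str, table: Dict[int, str]) -> Program:
    """Build a hypothetical program that parses, looks up, then acts."""

    def program(packet: Packet, shared_state: State) -> Packet:
        key = packet["dst_port"]                 # parse: extract one field
        action = table.get(key, "pass")          # match: look up the action
        # Each program may also update shared state (here, a hit counter).
        shared_state[name] = shared_state.get(name, 0) + 1
        out = dict(packet)                       # act on a copy of the packet
        if action == "decrement_ttl":
            out["ttl"] -= 1
        elif action == "mark":
            out["mark"] = 1
        return out

    return program


def run_chain(programs: List[Program], packet: Packet, state: State) -> Packet:
    """Pass the packet through each program in order, as in e1 -> e2 -> e3."""
    for program in programs:
        packet = program(packet, state)
    return packet


e1 = make_program("e1", {80: "decrement_ttl"})
e2 = make_program("e2", {80: "mark"})
e3 = make_program("e3", {})

state: State = {}
p0 = {"dst_port": 80, "ttl": 64}
p3 = run_chain([e1, e2, e3], p0, state)
```

In this sketch the output of the chain is simply the output of the last program, and inserting or removing a program amounts to editing the list passed to `run_chain`.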

n 개의 그러한 프로그램의 각각의 효과를 단일의 P4 프로그램으로 결합하는 것은 컴파일러에 복잡할 수도 있다. 추가적으로, 소정의 프로그래밍 모델(예컨대 XDP)은, 변화하는 상황에 응답하여 프로그램의 시퀀스의 임의의 지점에서, 프로그램이 재빨리 동적으로 삽입되고 제거되는 것을 필요로 할 수도 있다.Combining the individual effects of n such programs into a single P4 program can be complex for a compiler. Additionally, certain programming models (e.g., XDP) may require programs to be dynamically inserted and removed quickly at arbitrary points in the program sequence in response to changing circumstances.

According to some embodiments of the application, a network interface device comprising a plurality of processing units is provided. Each processing unit is configured to perform at least one predefined operation in hardware. Each processing unit comprises a memory storing its own local state. Each processing unit comprises digital circuitry that modifies this state. The digital circuitry may be an application-specific integrated circuit. Each processing unit is configured to run a program comprising configurable parameters so as to perform its respective plurality of operations. Each processing unit may be an atom. An atom is defined by the specific programming and routing of a predefined template. This defines its specific operational behavior and its logical place in the flow provided by the connected plurality of processing units. Where the term 'atom' is used in this specification, it may be understood to refer to a data processing unit that is configured to execute its operations in a single step. In other words, the atom executes its operations as an atomic operation.

An atom may be regarded as a collection of hardware structures that can be configured to repeatedly perform one of a range of computations, taking one or more inputs and producing one or more outputs.

An atom is provided by hardware. An atom may be configured by the compiler. An atom may be configured to perform a computation.

컴파일 동안, 복수의 프로세싱 유닛 중 적어도 일부는, 복수의 프로세싱 유닛 중 적어도 일부에 의해 네트워크 인터페이스 디바이스에서 수신되는 데이터 패킷과 관련하여 기능이 수행되도록 동작을 수행하도록 배열된다. 복수의 프로세싱 유닛 중 적어도 일부의 각각은, 데이터 패킷과 관련하여 기능을 수행하기 위해, 자신의 각각의 적어도 하나의 미리 정의된 동작을 수행하도록 구성된다. 다시 말하면, 연결된 프로세싱 유닛이 수행하도록 구성되는 동작은 수신된 데이터 패킷과 관련하여 수행된다. 동작은 복수의 프로세싱 유닛 중 적어도 일부에 의해 순차적으로 수행된다. 집합적으로, 복수의 동작의 각각의 성능은 수신된 패킷과 관련하여, 기능, 예를 들면, 필터링을 제공한다.During compilation, at least some of the plurality of processing units are arranged to perform operations such that a function is performed in relation to a data packet received at the network interface device by at least some of the plurality of processing units. Each of the at least some of the plurality of processing units is configured to perform at least one predefined operation of its own to perform the function in relation to the data packet. In other words, the operation that the connected processing unit is configured to perform is performed in relation to the received data packet. The operations are performed sequentially by at least some of the plurality of processing units. Collectively, the performance of each of the plurality of operations provides a function, for example, filtering, in relation to the received packet.

By arranging each of the atoms to execute its respective at least one predefined operation so as to perform the function, the compile time may be reduced compared to the FPGA application example described above with respect to FIG. 5. Furthermore, by performing the function using processing units that are specifically dedicated to performing particular operations in hardware, the speed at which the function can be performed may be improved relative to using a CPU executing software in the network interface device to perform the function for each data packet, as discussed above with respect to FIG. 4.

본 출원의 실시형태에 따른 네트워크 인터페이스 디바이스(600)의 예를 예시하는 도 6에 대한 참조가 이루어진다. 네트워크 인터페이스 디바이스는, 네트워크 인터페이스 디바이스(600)의 인터페이스에서 수신되는 데이터 패킷의 프로세싱을 수행하도록 구성되는 하드웨어 모듈(610)을 포함한다. 도 6이 수신 경로 상의 데이터 패킷에 대한 기능(예를 들면, 필터링)을 수행하는 하드웨어 모듈(610)을 예시하지만, 하드웨어 모듈(610)은, 호스트로부터 수신되는 송신 경로 상의 데이터 패킷에 대한 기능(예를 들면, 부하 밸런싱 또는 방화벽)을 수행하기 위해 또한 사용될 수도 있다.Reference is made to FIG. 6, which illustrates an example of a network interface device (600) according to an embodiment of the present application. The network interface device includes a hardware module (610) configured to perform processing of data packets received at an interface of the network interface device (600). While FIG. 6 illustrates the hardware module (610) performing a function (e.g., filtering) on data packets on a receive path, the hardware module (610) may also be used to perform a function (e.g., load balancing or firewall) on data packets on a transmit path received from a host.

네트워크 인터페이스 디바이스(600)는, 호스트와 데이터 패킷을 전송 및 수신하기 위한 호스트 인터페이스(620) 및 네트워크와 데이터 패킷을 전송 및 수신하기 위한 네트워크 MAC 인터페이스(630)를 포함한다.The network interface device (600) includes a host interface (620) for transmitting and receiving data packets with a host and a network MAC interface (630) for transmitting and receiving data packets with a network.

The network interface device (600) comprises a hardware module (610) comprising a plurality of processing units (640a, 640b, 640c, 640d). Each of the processing units may be an atomic processing unit. The term atom is used in this description to refer to such a processing unit. Each of the processing units is configured to perform at least one operation in hardware. Each of the processing units comprises a digital circuit (645) configured to perform the at least one operation. The digital circuit (645) may be an application-specific integrated circuit. Each of the processing units additionally comprises a memory (650) storing state information. The digital circuit (645) updates the state information when carrying out its respective plurality of operations. In addition to the local memory, each of the processing units has access to a shared memory (660), which may also store state information accessible to each of the plurality of processing units.

공유 메모리(660) 내의 상태 정보 및/또는 프로세싱 유닛의 메모리(650) 내의 상태 정보는 다음의 것 중 적어도 하나를 포함할 수도 있다: 프로세싱 유닛 사이에 전달되는 메타데이터, 임시 변수, 데이터 패킷의 콘텐츠, 하나 이상의 공유된 맵의 콘텐츠.State information within the shared memory (660) and/or state information within the memory (650) of the processing units may include at least one of the following: metadata passed between processing units, temporary variables, contents of data packets, contents of one or more shared maps.

정리하면, 복수의 프로세싱 유닛은 네트워크 인터페이스 디바이스(600)에서 수신되는 데이터 패킷과 관련하여 수행될 기능을 제공할 수 있다. 컴파일러는, 각각의 유입하는 데이터 패킷과 관련하여 그들 각각의 적어도 하나의 미리 정의된 동작을 수행하도록 복수의 프로세싱 유닛 중 적어도 일부를 배열하는 것에 의해 유입하는 데이터 패킷과 관련하여 기능을 수행하도록 하드웨어 모듈(610)을 구성하기 위한 명령어를 출력한다. 이것은, 연결된 프로세싱 유닛의 각각이 각각의 유입하는 데이터 패킷과 관련하여 그들 각각의 적어도 하나의 동작을 수행하도록, 프로세싱 유닛(640a, 640b, 640c, 640d) 중 적어도 일부를 함께 체인화(즉, 연결)하는 것에 의해 달성될 수도 있다. 프로세싱 유닛의 각각은 기능을 수행하기 위해 그들 각각의 적어도 하나의 동작을 특정한 순서로 수행한다. 순서는 프로세싱 유닛 중 두 개 이상이 서로 병렬로, 즉, 동시에 실행되도록 하는 그러한 것일 수도 있다. 예를 들면, 하나의 프로세싱 유닛은, 제2 프로세싱 유닛이 동일한 데이터 패킷 내의 상이한 위치로부터 또한 판독하는 시간 기간(하드웨어 모듈(610)의 주기적 신호(예를 들면, 클록 신호)에 의해 정의됨) 동안 데이터 패킷으로부터 판독할 수도 있다.In summary, a plurality of processing units may provide a function to be performed in relation to data packets received at the network interface device (600). The compiler outputs instructions for configuring the hardware module (610) to perform the function in relation to the incoming data packets by arranging at least some of the plurality of processing units to perform at least one predefined operation, respectively, in relation to each incoming data packet. This may be achieved by chaining (i.e., connecting) at least some of the processing units (640a, 640b, 640c, 640d) together such that each of the connected processing units performs at least one operation, respectively, in relation to each incoming data packet. Each of the processing units performs at least one of its operations in a specific order to perform the function. The order may be such that two or more of the processing units are executed in parallel, i.e., simultaneously. For example, one processing unit may read from a data packet during a time period (defined by a periodic signal (e.g., a clock signal) of the hardware module (610)) during which a second processing unit also reads from a different location within the same data packet.

몇몇 실시형태에서, 데이터 패킷은 시퀀스에서 프로세싱 유닛에 의해 표현되는 각각의 스테이지를 통과한다. 이 경우, 각각의 프로세싱 유닛은, 데이터 패킷을 다음 번 프로세싱 유닛의 프로세싱을 수행하기 위해 다음 번 프로세싱 유닛으로 전달하기 이전에, 자신의 프로세싱을 완료한다.In some embodiments, a data packet passes through each stage represented by a processing unit in a sequence. In this case, each processing unit completes its processing before passing the data packet to the next processing unit for processing.

In the example shown in FIG. 6, the processing units (640a, 640b, and 640d) are connected together at compile time such that each of them performs its respective at least one operation so as to perform a function, e.g., filtering, with respect to a received data packet. The processing units (640a, 640b, 640d) form a pipeline for processing a data packet. The data packet may move along this pipeline in stages, each stage having an equal time period. The time period may be defined according to a periodic signal or beat. The time period may be defined by a clock signal. Several periods of the clock may define one time period for each stage of the pipeline. The data packet moves along the pipeline by one stage at the end of each occurrence of the repeating time period. The time period may be a fixed interval. Alternatively, each time period for a stage in the pipeline may take a variable amount of time. A signal indicating the next stage in the pipeline may be generated when the previous processing stage has finished its operation, which may take a variable amount of time. A stall may be introduced at any stage in the pipeline by delaying the signal for some predetermined amount of time.
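
The fixed-interval case described above, in which a data packet advances by one stage at the end of each time period, may be pictured with the following sketch. It is an illustration under stated assumptions rather than a model of the hardware: the function name `tick`, the three-element list standing in for the stages formed by units 640a, 640b, and 640d, and the packet labels are all hypothetical.

```python
from typing import List, Optional


def tick(stages: List[Optional[str]], incoming: Optional[str]) -> Optional[str]:
    """Advance every packet by one stage and return the packet that leaves.

    `stages[0]` is the first stage of the pipeline; `None` marks an empty stage.
    """
    leaving = stages[-1]
    # Shift from the back so that no packet is overwritten before it moves.
    for i in range(len(stages) - 1, 0, -1):
        stages[i] = stages[i - 1]
    stages[0] = incoming
    return leaving


# Three stages, all initially empty.
pipeline: List[Optional[str]] = [None, None, None]

# Two packets arrive on consecutive time periods, followed by idle periods.
arrivals = ["P0", "P1", None, None, None]
departures = [tick(pipeline, packet) for packet in arrivals]
```

With three stages, a packet that enters on one time period leaves three periods later, and two packets can occupy different stages during the same period.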

프로세싱 유닛(640a, 640b, 640d)의 각각은, 그들 각각의 적어도 하나의 동작의 일부로서 공유 메모리(660)에 액세스하도록 구성될 수도 있다. 프로세싱 유닛(640a, 640b, 640d)의 각각은, 그들 각각의 적어도 하나의 동작의 일부로서 서로 사이에서 메타데이터를 전달하도록 구성될 수도 있다. 프로세싱 유닛(640a, 640b, 640d)의 각각은, 그들 각각의 적어도 하나의 동작의 일부로서 네트워크로부터 수신되는 데이터 패킷에 액세스하도록 구성될 수도 있다.Each of the processing units (640a, 640b, 640d) may be configured to access shared memory (660) as part of at least one of their respective operations. Each of the processing units (640a, 640b, 640d) may be configured to transfer metadata between themselves as part of at least one of their respective operations. Each of the processing units (640a, 640b, 640d) may be configured to access data packets received from a network as part of at least one of their respective operations.

이 예에서, 프로세싱 유닛(640c)은, 기능을 제공하기 위해 수신된 데이터 패킷의 프로세싱을 수행하도록 사용되는 것이 아리나, 파이프라인으로부터 생략된다.In this example, the processing unit (640c) is omitted from the pipeline, as it is not used to perform processing of received data packets to provide functionality.

네트워크 MAC 계층(630)에서 수신되는 데이터 패킷은 프로세싱을 위해 하드웨어 모듈(610)로 전달될 수도 있다. 도 6에서 도시되지는 않지만, 하드웨어 모듈(610)에 의해 수행되는 프로세싱은, 하드웨어 모듈(610)에 의해 제공되는 기능 외에 데이터 패킷과 관련하여 추가적인 기능을 제공하는 더 큰 프로세싱 파이프라인의 일부일 수도 있다. 이것은 도 14와 관련하여 예시되며, 하기에서 더욱 상세하게 설명될 것이다.Data packets received at the network MAC layer (630) may be passed to a hardware module (610) for processing. Although not illustrated in FIG. 6 , the processing performed by the hardware module (610) may be part of a larger processing pipeline that provides additional functionality related to the data packets beyond the functionality provided by the hardware module (610). This is exemplified in connection with FIG. 14 and will be described in more detail below.

The first processing unit (640a) is configured to perform a first at least one operation with respect to the data packet. This first at least one operation may comprise at least one of: reading from the data packet, reading and writing shared state in the memory (660), and/or performing a lookup into a table to determine an action. The first processing unit (640a) is then configured to produce a result from its at least one operation. The result may be in the form of metadata. The result may comprise a modification to the data packet. The result may comprise a modification to the shared state in the memory (660). The second processing unit (640b) is configured to perform its at least one operation with respect to the first data packet in dependence on the result of the operation carried out by the first processing unit (640a). The second processing unit (640b) produces a result from its at least one operation and passes that result to a third processing unit (640d), which is configured to perform its at least one operation with respect to the first data packet. Together, the first processing unit (640a), the second processing unit (640b), and the third processing unit (640d) are configured to provide a function with respect to the data packet. The data packet may then be passed to the host interface (620), from where it is passed to the host system.

It can thus be seen that the connected processing units form a pipeline for processing data packets received at the network interface device. This pipeline may provide the processing of an eBPF program. The pipeline may provide the processing of a plurality of eBPF programs. The pipeline may provide the processing of a plurality of modules that execute in sequence.

하드웨어 모듈(610)에서 프로세싱 유닛을 함께 연결하는 것은, 하드웨어 모듈(610)의 미리 합성된 상호 접속 패브릭(interconnection fabric)의 라우팅 기능을 프로그래밍하는 것에 의해 수행될 수도 있다. 이 상호 접속 패브릭은 하드웨어 모듈(610)의 다양한 프로세싱 유닛 사이의 연결을 제공한다. 상호 접속 패브릭은 패브릭에 의해 지원되는 토폴로지(topology)에 따라 프로그래밍된다. 가능한 예시적인 토폴로지가 도 15를 참조하여 하기에서 논의된다.Connecting processing units together in a hardware module (610) may also be accomplished by programming the routing functions of a pre-synthesized interconnection fabric of the hardware module (610). This interconnection fabric provides connections between the various processing units of the hardware module (610). The interconnection fabric is programmed according to a topology supported by the fabric. Possible exemplary topologies are discussed below with reference to FIG. 15 .

The hardware module (610) supports at least one bus interface. The at least one bus interface receives data packets at the hardware module (610) (e.g., from a host or a network). The at least one bus interface outputs data packets from the hardware module (610) (e.g., to a host or a network). The at least one bus interface receives control messages at the hardware module (610). The control messages may be for configuring the hardware module (610).

도 6에서 도시되는 예는 도 5에서 도시되는 FPGA 애플리케이션(515)과 관련하여 감소된 컴파일 시간의 이점을 갖는다. 예를 들면, 도 6의 하드웨어 모듈(610)은 필터링 기능을 컴파일하는 데 10 초 미만이 필요할 수도 있다. 도 6에서 도시되는 예는 도 4에서 도시되는 CPU의 어레이의 예와 관련하여 향상된 프로세싱 속도의 이점을 갖는다.The example illustrated in FIG. 6 has the advantage of reduced compilation time relative to the FPGA application (515) illustrated in FIG. 5. For example, the hardware module (610) of FIG. 6 may require less than 10 seconds to compile a filtering function. The example illustrated in FIG. 6 has the advantage of improved processing speed relative to the example of the array of CPUs illustrated in FIG. 4.

An application may be compiled for execution in such a hardware module (610) by mapping a generic program (or multiple programs) to a pre-synthesized data path. The compiler builds the data path by linking an arbitrary number of processing stage instances, where each instance is built from one of the pre-synthesized processing stage atoms.

Each of the atoms is built from a circuit. Each circuit may be defined using RTL (register transfer language) or a high-level language. Each circuit is synthesized using a compiler or tool chain. The atoms may be synthesized into hard logic and so be available as a hard (ASIC) resource in the hardware module of the network interface device. The atoms may be synthesized into soft logic. Atoms in soft logic may be provided with constraints that allocate and maintain the place and route information of the synthesized logic on the physical device. An atom may be designed with configurable parameters that specify the atom's behavior. Each parameter may be a variable, or even a sequence of operations (a micro-program), that may specify at least one operation to be performed by the processing unit during a clock cycle of the processing pipeline. The logic implementing an atom may be clocked synchronously or asynchronously.

The processing pipeline of atoms may itself be configured to operate according to a periodic signal. In this case, each data packet and its metadata move one stage along the pipeline in response to each occurrence of the signal. The processing pipeline may alternatively operate in an asynchronous manner. In this case, back pressure at higher levels of the pipeline will cause each downstream stage to start processing only when data from an upstream stage has been presented to it.

When compiling a function to be executed by a plurality of such atoms, the sequence of computer code instructions is separated into a plurality of operations, each of which is mapped to a single atom. Each operation may represent a single line of disassembled instruction in the computer code instructions. Each operation is assigned to one of the atoms to be executed by that atom. There may be one atom per expression in the computer code instructions. Each atom is associated with a type of operation and is selected to execute at least one operation in the computer code instructions based on its associated type of operation. For example, an atom may be preconfigured to perform a load operation from a data packet. Such an atom is therefore assigned to execute the instruction in the computer code that represents a load from the data packet.

One atom may be selected per line in the computer code instructions. Therefore, when implementing a function in a hardware module containing such atoms, there may be 100 such atoms, each performing its respective operation so as to perform the function with respect to the data packet.

Each atom may be configured according to one of a set of processing stage templates that determines its associated type of operation. The compilation process is configured to generate instructions for controlling each atom, based on its associated type, to perform a specific at least one operation. For example, if an atom is preconfigured to perform packet access operations, the compilation process may assign to that atom an operation to load certain information (e.g., the source ID of the packet) from the header of the packet. The compilation process is configured to send the instructions to the hardware module, where the atoms are configured to perform the operations assigned to them by the compilation process.
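
The type-based assignment described above may be sketched as follows. This is a simplified illustration under stated assumptions, not the compiler of the embodiments: the `Atom` class, the `assign` function, the template names, and the operation descriptions are hypothetical, and a real compiler would also have to respect routing and ordering constraints that are ignored here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Atom:
    atom_id: str
    template: str          # "packet_access", "logic" or "map_access"
    operation: str = ""    # filled in by the hypothetical compile step


def assign(operations: List[Dict[str, str]], atoms: List[Atom]) -> List[Atom]:
    """Give each operation to the first free atom of the matching type."""
    free = list(atoms)
    pipeline: List[Atom] = []
    for op in operations:
        atom = next((a for a in free if a.template == op["type"]), None)
        if atom is None:
            raise ValueError(f"no free atom for operation type {op['type']}")
        free.remove(atom)
        atom.operation = op["what"]
        pipeline.append(atom)
    return pipeline


atoms = [
    Atom("a0", "packet_access"),
    Atom("a1", "logic"),
    Atom("a2", "map_access"),
    Atom("a3", "logic"),
]
operations = [
    {"type": "packet_access", "what": "load source ID from header"},
    {"type": "logic", "what": "compare source ID with a constant"},
    {"type": "map_access", "what": "look up source ID in map"},
]
pipeline = assign(operations, atoms)
```

Atoms that receive no operation, such as `a3` here, are simply left out of the resulting pipeline, in the same way that processing unit (640c) is omitted in FIG. 6.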

The processing stage templates that specify an atom's behavior are logic stage templates (e.g., providing operations over registers, scratch pad memory, and stack, as well as branches), packet access stage templates (e.g., providing packet data loads and/or packet data stores), and map access stage templates (e.g., map lookup algorithms, map table sizes).

패킷 액세스 스테이지는 다음의 것 중 적어도 하나를 포함할 수 있다: 데이터 패킷으로부터 바이트의 시퀀스를 판독하는 것; 데이터 패킷에서 바이트의 하나의 시퀀스를 바이트의 상이한 시퀀스로 대체하는 것; 데이터 패킷에 바이트를 삽입하는 것; 및 데이터 패킷에서 바이트를 삭제하는 것.The packet access stage may include at least one of the following: reading a sequence of bytes from a data packet; replacing one sequence of bytes in the data packet with a different sequence of bytes; inserting a byte into the data packet; and deleting a byte from the data packet.
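
The four packet access operations listed above may be illustrated on an immutable byte string as follows. This is a sketch only, and the function names are hypothetical; a hardware packet access stage would operate on the packet as it streams through the pipeline rather than on a complete buffer.

```python
def read_bytes(packet: bytes, offset: int, length: int) -> bytes:
    """Read a sequence of bytes from the packet."""
    return packet[offset:offset + length]


def replace_bytes(packet: bytes, offset: int, old_length: int, new: bytes) -> bytes:
    """Replace one sequence of bytes with a different sequence."""
    return packet[:offset] + new + packet[offset + old_length:]


def insert_bytes(packet: bytes, offset: int, new: bytes) -> bytes:
    """Insert bytes into the packet at the given offset."""
    return packet[:offset] + new + packet[offset:]


def delete_bytes(packet: bytes, offset: int, length: int) -> bytes:
    """Delete bytes from the packet."""
    return packet[:offset] + packet[offset + length:]


pkt = bytes([0x01, 0x02, 0x03, 0x04])
```

Note that the replacement sequence need not have the same length as the sequence it replaces, so `replace_bytes` can grow or shrink the packet.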

맵 액세스 스테이지는, 직접 색인 어레이(direct indexed array) 및 연상 어레이(associative array)를 비롯한, 상이한 타입의 맵(예를 들면, 룩업 테이블)에 액세스하기 위해 사용될 수 있다. 맵 액세스 스테이지는 다음의 것 중 적어도 하나를 포함할 수도 있다: 위치로부터 값을 판독하는 것; 위치에 값을 기록하는 것; 맵 내의 한 위치에서의 값을 상이한 값으로 대체하는 것. 맵 액세스 스테이지는, 값이 맵 내의 한 위치로부터 판독되고 상이한 값과 비교되는 비교 동작을 포함할 수도 있다. 위치로부터 판독되는 값이 상이한 값보다 더 작으면, 그러면 제1 액션(예를 들면, 아무것도 하지 않음, 그 위치에서의 값을 상이한 값과 교환함, 또는 값을 함께 더함)이 수행될 수도 있다. 그렇지 않으면, 제2 액션(예를 들면, 아무것도 하지 않음, 값을 교환함, 또는 값을 추가함)이 수행될 수도 있다. 어느 경우든, 위치로부터 판독되는 값은 다음 번 프로세싱 스테이지로 제공될 수도 있다.The map access stage can be used to access different types of maps (e.g., lookup tables), including direct indexed arrays and associative arrays. The map access stage may include at least one of the following: reading a value from a location; writing a value to a location; or replacing a value at a location in the map with a different value. The map access stage may also include a comparison operation in which a value is read from a location in the map and compared to the different value. If the value read from the location is less than the different value, then a first action (e.g., doing nothing, swapping the value at that location with the different value, or adding the values together) may be performed. Otherwise, a second action (e.g., doing nothing, swapping the values, or adding the values) may be performed. In either case, the value read from the location may be provided to the next processing stage.
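
The comparison operation described above may be sketched as follows for a directly indexed array. The function name `compare_and_act`, the action labels, and the wrap-around to 8 bits when adding are assumptions made for illustration and are not taken from the embodiments.

```python
from typing import List


def compare_and_act(table: List[int], index: int, operand: int,
                    if_less: str, otherwise: str) -> int:
    """Read table[index], compare it with operand, and apply one action.

    The first action is applied when the stored value is less than the
    operand, the second otherwise. The value read is always returned so
    that it can be provided to the next processing stage.
    """
    value = table[index]
    action = if_less if value < operand else otherwise
    if action == "exchange":
        table[index] = operand
    elif action == "add":
        table[index] = (value + operand) & 0xFF   # assumed 8-bit entries
    elif action != "nothing":
        raise ValueError(f"unknown action {action}")
    return value


table = [0] * 256
table[7] = 10
first = compare_and_act(table, 7, 25, "exchange", "nothing")   # 10 < 25: exchange
second = compare_and_act(table, 7, 5, "nothing", "add")        # 25 >= 5: add
```

Choosing "exchange" as the first action and "nothing" as the second makes the location track a running maximum, which is one way such a stage could be put to use.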

Each map access stage may be implemented in a stateful processing unit. Reference is made to FIG. 17, which illustrates an example of circuitry (1700) that may be included in an atom configured to perform the processing of a map access stage. The circuitry (1700) may comprise a hash function (1710) configured to perform a hash of an input value that is used as an input into a lookup table. The circuitry (1700) comprises a memory (1720) configured to store state relevant to the operation of the atom. The circuitry (1700) comprises an arithmetic logic unit (1730) configured to perform arithmetic operations.

로직 스테이지(logic stage)가 이전 스테이지에 의해 제공되는 값에 대한 계산을 수행할 수도 있다. 로직 스테이지를 구현하도록 구성되는 프로세싱 유닛은 상태 비보존형 프로세싱 유닛일 수도 있다. 각각의 상태 비보존형 프로세싱 유닛은 간단한 산술 연산을 수행할 수 있다. 각각의 프로세싱 유닛은, 예를 들면, 8 비트 연산을 수행할 수도 있다.A logic stage may perform a calculation on a value provided by a previous stage. The processing unit configured to implement the logic stage may be a stateless processing unit. Each stateless processing unit may perform simple arithmetic operations. Each processing unit may, for example, perform 8-bit operations.

Each logic stage may be implemented in a stateless processing unit. Reference is made to FIG. 18, which illustrates an example of circuitry (1800) that may be included in an atom configured to perform the processing of a logic stage. The circuitry (1800) comprises an array of arithmetic logic units (ALUs) and multiplexers. The ALUs and multiplexers are arranged in layers, with the outputs of one layer of processing by the ALUs being used by the multiplexers to provide the inputs to the next layer of ALUs.
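
The layered arrangement of ALUs and multiplexers may be modeled in software along the following lines. This is a loose sketch, not a description of the circuitry of FIG. 18: the set of ALU operations, the encoding of each ALU as an `(operation, select_a, select_b)` tuple, and the 8-bit width are assumptions made for illustration.

```python
import operator
from typing import Callable, Dict, List, Tuple

ALU_OPS: Dict[str, Callable[[int, int], int]] = {
    "add": operator.add,
    "and": operator.and_,
    "xor": operator.xor,
}

# One ALU: (operation, index of first input, index of second input).
# The two indices play the role of the multiplexer selects that pick
# which outputs of the previous layer feed this ALU.
AluConfig = Tuple[str, int, int]


def run_layers(inputs: List[int], layers: List[List[AluConfig]]) -> List[int]:
    """Evaluate the layers in order; each layer sees the previous layer's outputs."""
    values = list(inputs)
    for layer in layers:
        values = [ALU_OPS[op](values[a], values[b]) & 0xFF for op, a, b in layer]
    return values


layers = [
    [("add", 0, 1), ("xor", 1, 2)],   # first layer: two ALUs
    [("and", 0, 1)],                  # second layer: one ALU
]
result = run_layers([12, 10, 6], layers)
```

Because no value is retained between calls to `run_layers`, the sketch is stateless in the same sense as the logic stage it illustrates.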

하드웨어 모듈에서 구현되는 스테이지의 파이프라인은, 제1 패킷 액세스 스테이지(pkt0), 후속되는 제1 로직 스테이지(logic0), 후속되는 제1 맵 액세스 스테이지(map0), 후속되는 제2 로직 스테이지(logic1), 후속되는 제2 패킷 액세스 스테이지(pkt1), 및 등등을 포함할 수도 있다. 따라서, 그것은 다음의 형태를 취할 수도 있다:A pipeline of stages implemented in a hardware module may include a first packet access stage (pkt0), a subsequent first logic stage (logic0), a subsequent first map access stage (map0), a subsequent second logic stage (logic1), a subsequent second packet access stage (pkt1), and so on. Thus, it may take the following form:

pkt0 -> logic0 -> map0 -> logic1 -> pkt1

몇몇 예에서, 스테이지(pkt0)는 패킷으로부터 필요한 정보를 추출한다. 스테이지(pkt0)는 이 정보를 스테이지(logic0)로 전달한다. 스테이지(logic0)는 패킷이 유효한 IP 패킷인지의 여부를 결정한다. 몇몇 경우에, logic0이 맵 요청을 형성하고, 맵 동작을 실행하는 map0에 맵 요청을 전송한다. 스테이지(map0)는 룩업 테이블에 대한 업데이트를 수행할 수도 있다. 그 다음, 스테이지(logic1)는 맵 동작으로부터 결과를 수집하고 결과로서 패킷을 드랍할지의 여부를 결정한다.In some examples, stage (pkt0) extracts the necessary information from the packet. Stage (pkt0) passes this information to stage (logic0). Stage (logic0) determines whether the packet is a valid IP packet. In some cases, logic0 forms a map request and sends it to map0, which executes a map operation. Stage (map0) may also perform updates to a lookup table. Stage (logic1) then collects the results from the map operation and determines whether to drop the packet as a result.

몇몇 경우에, 맵 요청은, 이 패킷에 대해 맵 동작을 수행되지 않아야 하는 경우를 커버하기 위해 디스에이블된다. 맵 동작이 수행되지 않는 경우, logic0은, logic1에, 패킷이 유효한 IP 패킷인지 또는 아닌지의 여부에 의존하여 패킷이 드랍되어야 하는지 또는 아닌지의 여부를 나타낸다. 몇몇 예에서, 룩업 테이블은 256 개의 엔트리를 포함하는데, 여기서 각각의 엔트리는 8 비트 값이다.In some cases, the map request is disabled to cover cases where a map operation should not be performed on this packet. When a map operation is not performed, logic0 indicates to logic1 whether the packet should be dropped or not, depending on whether the packet is a valid IP packet or not. In some examples, the lookup table contains 256 entries, where each entry is an 8-bit value.
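
The example pipeline may be sketched as four small functions, one per stage. This is an illustration under several assumptions that are not taken from the embodiments: the packet is assumed to begin at its IP header, the map key is assumed to be the last byte of the source address, the map operation is assumed to be a per-key counter, and `logic1` is assumed to drop a packet once its counter exceeds a hypothetical `limit`. The stage `pkt1` is omitted because its role is not detailed above.

```python
from typing import List, Optional


def pkt0(packet: bytes) -> dict:
    """Extract the fields needed by later stages."""
    if len(packet) < 20:
        return {"version": 0, "key": 0}
    return {"version": packet[0] >> 4, "key": packet[15]}


def logic0(meta: dict) -> dict:
    """Decide whether the packet is valid IP and form a map request if so."""
    valid = meta["version"] == 4
    return {"valid": valid, "map_request": meta["key"] if valid else None}


def map0(table: List[int], request: Optional[int]) -> Optional[int]:
    """Update the lookup table, unless the map request is disabled."""
    if request is None:
        return None
    table[request] = (table[request] + 1) & 0xFF   # 8-bit entries
    return table[request]


def logic1(valid: bool, count: Optional[int], limit: int = 3) -> str:
    """Collect the map result and decide whether to drop the packet."""
    if not valid:
        return "drop"
    return "drop" if count > limit else "pass"


def process(packet: bytes, table: List[int]) -> str:
    meta = pkt0(packet)
    decision = logic0(meta)
    count = map0(table, decision["map_request"])
    return logic1(decision["valid"], count)


table = [0] * 256                                   # 256 entries of 8 bits
ipv4 = bytes([0x45]) + bytes(14) + bytes([9]) + bytes(4)
verdicts = [process(ipv4, table) for _ in range(5)]
invalid_verdict = process(bytes(20), table)
```

For the invalid packet no map operation is performed, so the table is left untouched and the drop decision rests on the validity check alone.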

The example described includes only five stages. However, as noted, more may be used. Furthermore, the operations need not all be executed sequentially: some operations relating to the same data packet may be executed concurrently by different processing units.

The hardware module (610) shown in FIG. 6 illustrates a single pipeline of atomic units for performing a function with respect to data packets. However, the hardware module (610) may include a plurality of pipelines for processing data packets. Each of the plurality of pipelines may perform a different function with respect to data packets. The hardware module (610) is configurable to interconnect a first set of its atomic units to form a first data processing pipeline. The hardware module (610) is also configurable to interconnect a second set of its atomic units to form a second data processing pipeline.

To compile a function to be implemented in a hardware module comprising a plurality of processing units, a series of steps may be carried out, starting from a sequence of computer code. The compiler, which may run on a processor of the host device or of the network interface device, has access to the decomposed sequence of computer code.

First, the compiler is configured to divide the sequence of computer code instructions into distinct stages. Each stage may comprise operations according to one of the processing stage templates described above. For example, one stage may provide a read from the data packet. Another stage may provide an update of map data. Another stage may make a pass/drop decision. The compiler assigns each of the plurality of operations expressed by the code to one of the plurality of stages.

Second, the compiler is configured to assign each of the processing stages determined from the code to be performed by a different processing unit. This means that the at least one operation of each processing stage is executed by a different processing unit. The output of the compiler can then be used to cause the processing units to perform the operations of each stage in a particular order so as to perform the function.

The output of the compiler includes generated instructions that are used to cause the processing units of the hardware module to execute the operations associated with each processing stage.
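The two compiler steps described above (dividing the operations into stages, then assigning each stage to a different processing unit) can be sketched as follows. The function name `assign_stages`, the operation and unit names, and the rule that consecutive operations of the same kind share a stage are assumptions made for illustration; the source does not specify how the compiler groups operations.

```python
from typing import Dict, List, Tuple


def assign_stages(
    ops: List[Tuple[str, str]], available: Dict[str, List[str]]
) -> List[dict]:
    """ops: (kind, operation name) in program order.
    available: processing unit identifiers of each kind that are still free."""
    free = {kind: list(units) for kind, units in available.items()}
    stages: List[dict] = []
    for kind, name in ops:
        if stages and stages[-1]["kind"] == kind:
            # Assumption: consecutive operations of one kind share a stage.
            stages[-1]["ops"].append(name)
            continue
        if not free.get(kind):
            # No unit of this type is left, so the function cannot be placed.
            raise RuntimeError(f"no free processing unit of type {kind!r}")
        stages.append({"kind": kind, "unit": free[kind].pop(0), "ops": [name]})
    return stages


ops = [
    ("packet", "read_headers"),
    ("logic", "check_valid_ip"),
    ("logic", "build_map_request"),
    ("map", "lookup"),
    ("logic", "pass_or_drop"),
]
units = {"packet": ["pkt0", "pkt1"], "logic": ["logic0", "logic1"], "map": ["map0"]}
stages = assign_stages(ops, units)
```

Each resulting stage is bound to a distinct unit, and the order of the list gives the order in which the units perform their operations.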

The output of the compiler may also be used to generate, in the hardware module, logic that responds to control messages for configuring the hardware module (610). Such control messages are described in more detail below with respect to FIG. 14.

The compilation process for compiling a function to be executed on the network interface device (600) may be performed in response to determining that the process providing the function is safe for execution in the kernel of the host device. The determination of the safety of the program may be carried out by a suitable verifier, as described above with respect to FIG. 3. Once the process has been determined to be safe for execution in the kernel, it may be compiled for execution in the network interface device.

Reference is made to FIG. 15, which illustrates a representation of at least some of the plurality of processing units performing their respective at least one operation so as to perform a function with respect to a data packet. Such a representation may be generated by the compiler and may be used to configure the hardware module to perform the function. The representation indicates the order in which the operations may be executed and how some of the processing units perform their operations in parallel.

The representation (1500) takes the form of a table with rows and columns. Some of the entries of the table represent atomic units configured to perform their respective operations, for example, atomic unit (1510a). The row to which a processing unit belongs indicates the timing of the operation performed by that processing unit with respect to a particular data packet. Each row may correspond to a single time period represented by one or more cycles of a clock signal. Processing units belonging to the same row perform their operations in parallel.

Inputs to a logic stage are provided in row 0, and computation flows forward toward later rows. By default, an atomic unit receives the results of processing by the atomic unit in the same column as itself but in the previous row. For example, atomic unit (1510b) receives the results of processing by atomic unit (1510a) and performs its own processing in dependence upon those results.

When using local routing resources, an atomic unit may also access the output of an atomic unit in the previous row whose column number differs by no more than two. For example, atomic unit (1510d) may receive the results of processing performed by atomic unit (1510c).

When using global routing resources, an atomic unit may also access the output of an atomic unit in either of the previous two rows and in any column. For example, atomic unit (1510f) may receive the results of processing performed by atomic unit (1510e).

These constraints on routing between atomic units are given as examples, and other constraints may be applied. Applying more restrictive constraints may make the routing of information between atomic units easier. Applying less restrictive constraints may make scheduling easier. If the atomic units of a given type (e.g., map, logic, or packet access) are exhausted, or if routing between atomic units cannot be achieved, compilation of the function into the hardware module will fail.
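The example routing constraints can be written as a small check. This is a sketch of the example rules only (same column in the previous row by default; up to two columns away in the previous row with local routing; any column within the previous two rows with global routing). The function name `route_kind` and the `(row, column)` coordinate convention are assumptions.

```python
from typing import Optional, Tuple

Position = Tuple[int, int]  # (row, column) of an atomic unit in the table


def route_kind(src: Position, dst: Position) -> Optional[str]:
    """Return the cheapest routing resource that lets dst read the output of
    src, or None if the connection violates the example constraints."""
    (src_row, src_col), (dst_row, dst_col) = src, dst
    rows_back = dst_row - src_row          # computation only flows forward
    col_distance = abs(dst_col - src_col)
    if rows_back == 1 and col_distance == 0:
        return "default"
    if rows_back == 1 and col_distance <= 2:
        return "local"
    if rows_back in (1, 2):
        return "global"
    return None  # a compiler would report a routing failure here
```

A placement pass could call such a check for every producer and consumer pair and fail the compilation when it returns `None`, matching the failure condition stated above.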

The particular constraints are determined by the topology supported by the interconnect fabric of the hardware module. The interconnect fabric is programmed to cause the atomic units of the hardware module to execute their operations in a particular order and to provide data to one another within the constraints. FIG. 15 shows one particular example of how the interconnect fabric may be so programmed.

A place-and-route algorithm, of the kind used during synthesis of an FPGA application (515) onto an FPGA (as illustrated in FIG. 5), is used. In this case, however, the solution space is constrained, so the algorithm has a short, bounded execution time.

There is a trade-off between processing speed or efficiency and compilation time. According to embodiments of the present application, it may be desirable to initially compile and run a program on a first at least one processing unit (which may be atomic units or CPUs, as described above with respect to FIG. 6) to provide a function with respect to received data packets. The first at least one processing unit may then perform the function with respect to received data packets during a first time period. During operation of the network interface device, a second at least one processing unit (which may be an FPGA application or template-type processing units, as described above with respect to FIG. 6) may be configured to perform the function with respect to data packets. The function may then be migrated from the first at least one processing unit to the second at least one processing unit, so that the second at least one processing unit performs the function for data packets subsequently received at the network interface device. Thus, the longer compilation time of the second at least one processing unit does not prevent the network interface device from performing the function while the function is still being compiled for the second at least one processing unit, because the function can be compiled more quickly for the first at least one processing unit, which can perform the function in the meantime. Since the second at least one processing unit typically has a faster processing time, moving to the second at least one processing unit once compilation is complete allows faster processing of the data packets received at the network interface device.

According to embodiments of the present application, the compilation process may be configured to run on at least one processor of a data processing system, the at least one processor being configured to send instructions to the first at least one processing unit and to the second at least one processing unit to perform the at least one function with respect to data packets at the appropriate times. The at least one processor may comprise a host CPU. The at least one processor may comprise a control processor on the network interface device. The at least one processor may comprise a combination of one or more processors on the host system and one or more processors on the network interface device.

Accordingly, the at least one processor is configured to perform a first compilation process to compile the function to be performed by the first at least one processing unit of the network interface device. The at least one processor is also configured to perform a second compilation process to compile the function to be performed by the second at least one processing unit of the network interface device. Prior to completion of the second compilation process, the at least one processor instructs the first at least one processing unit to perform the function with respect to data packets received from the network. Following completion of the second compilation process, the at least one processor instructs the second at least one processing unit to begin performing the function with respect to data packets received from the network.

Performing these steps enables the network interface device to perform the function using the first at least one processing unit (which may have a shorter compilation time but slower and/or less efficient processing) while waiting for the second compilation process to complete. Once the second compilation process is complete, the network interface device may then perform the function using the second at least one processing unit (which may have a longer compilation time but faster and/or more efficient processing), in addition to or instead of the first at least one processing unit.
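The switch-over between the two targets can be illustrated with a small host-side model. This is a sketch under assumptions, not the device's control path: `Controller`, `fast_compile`, `slow_compile`, and `count_bytes` are hypothetical names, the "compiled" artefact is just a Python callable, and a `threading.Event` stands in for the long-running second compilation.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def count_bytes(packet: bytes) -> int:
    """Stand-in for the function performed with respect to each packet."""
    return len(packet)


class Controller:
    """Directs packets to whichever processing target is currently active."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._name = None
        self._fn = None

    def switch_to(self, name, fn) -> None:
        with self._lock:
            self._name, self._fn = name, fn

    def process(self, packet: bytes):
        with self._lock:
            return self._name, self._fn(packet)


def fast_compile(function):
    return function  # first compilation process: quick to complete


def slow_compile(function, finished: threading.Event):
    finished.wait()  # second compilation process: modelled as a long wait
    return function


def demo():
    ctrl, finished, log = Controller(), threading.Event(), []
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(slow_compile, count_bytes, finished)
        ctrl.switch_to("first", fast_compile(count_bytes))
        log.append(ctrl.process(b"abc"))   # served while the second compile runs
        finished.set()                     # the second compilation completes
        ctrl.switch_to("second", pending.result())
        log.append(ctrl.process(b"abcd"))  # served by the second target
    return log
```

Packets keep being processed throughout; only the target that serves them changes once the second compilation has finished.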

Reference is made to FIG. 7, which illustrates an example network interface device (700) according to embodiments of the present application. Elements that are the same as those shown in the previous figures are indicated using the same reference numerals.

The network interface device comprises a first at least one processing unit (710). The first at least one processing unit (710) may comprise the hardware module (610) shown in FIG. 6, which comprises a plurality of processing units. The first at least one processing unit (710) may comprise one or more CPUs, as shown in FIG. 4.

The function is compiled to run on the first at least one processing unit (710), such that, during a first time period, the function is performed by the first at least one processing unit (710) with respect to data packets received from the network. The first at least one processing unit (710) is instructed by the at least one processor to perform the function with respect to data packets received from the network prior to completion of the second compilation process for the second at least one processing unit.

The network interface device comprises a second at least one processing unit (720). The second at least one processing unit (720) may comprise an FPGA having an FPGA application (as illustrated in FIG. 5), or may comprise the hardware module (610) shown in FIG. 6, which comprises a plurality of processing units.

During the first time period, the second compilation process is carried out to compile the function for execution on the second at least one processing unit. That is, the network interface device is configured to compile the FPGA application (515) on the fly.

Following the first time period (i.e., following completion of the second compilation process), the second at least one processing unit (720) is configured to begin performing the function with respect to data packets received from the network.

Following the first time period, the first at least one processing unit (710) may cease performing the function with respect to data packets received from the network. In some embodiments, the first at least one processing unit (710) may cease performing the function only in part. For example, if the first at least one processing unit comprises a plurality of CPUs, then, following the first time period, one or more of the CPUs may cease performing processing with respect to data packets received from the network, while the remaining CPUs continue to perform the processing.

The first at least one processing unit (710) may be configured to perform the function with respect to data packets of a first data flow. When the second compilation process is complete, the second at least one processing unit (720) may begin performing the function with respect to data packets of the first data flow, and the first at least one processing unit may cease performing the function with respect to data packets of the first data flow.

Different combinations are possible for the first at least one processing unit and the second at least one processing unit. For example, in some embodiments, the first at least one processing unit (710) comprises a plurality of CPUs (as illustrated in FIG. 4), while the second at least one processing unit (720) comprises a hardware module having a plurality of processing units (as illustrated in FIG. 6). In some embodiments, the first at least one processing unit (710) comprises a plurality of CPUs (as illustrated in FIG. 4), while the second at least one processing unit (720) comprises an FPGA (as illustrated in FIG. 5). In some embodiments, the first at least one processing unit (710) comprises a hardware module having a plurality of processing units (as illustrated in FIG. 6), while the second at least one processing unit (720) comprises an FPGA (as illustrated in FIG. 5).

Reference is made to FIG. 11, which illustrates how a plurality of connected processing units (640a, 640b, 640d) may perform their respective at least one operation with respect to a data packet. Each of the processing units is configured to perform its respective at least one operation with respect to a received data packet.

The at least one operation of each processing unit may represent a logic stage in a function (e.g., a function of an eBPF program). The at least one operation of each processing unit may be expressible by an instruction executed by the processing unit. The instruction may determine the behavior of the atomic unit.

FIG. 11 illustrates how a packet (P₀) progresses along the processing stages implemented by each of the processing units.

Each processing unit performs processing with respect to the packet in a particular order specified by the compiler. The order may be such that some of the processing units are configured to perform their processing in parallel. This processing may comprise accessing at least part of the packet held in a memory. Additionally or alternatively, this processing may comprise performing a lookup in a lookup table to determine an action to be carried out for the packet. Additionally or alternatively, this processing may comprise modifying state (1110).

The processing units exchange metadata (M₀, M₁, M₂, M₃) with one another. The first processing unit (640a) is configured to perform its respective at least one predefined operation and to generate metadata (M₁) in response. The first processing unit (640a) is configured to pass the metadata (M₁) to the second processing unit (640b).

At least some of the processing units perform their respective at least one operation in dependence upon at least one of: the content of the data packet, their own stored state, the global shared state, and metadata associated with the data packet (e.g., M₀, M₁, M₂, M₃). Some of the processing units may be stateless.

Each of the processing units may perform its associated type of operation on the data packet (P₀) during at least one clock cycle. In some embodiments, each of the processing units may perform its associated type of operation during a single clock cycle. Each of the processing units may be individually clocked to perform its operations. This clocking may be in addition to the clocking of the processing pipeline of the processing units.

Considering the operation of the second processing unit (640b) in more detail, the second processing unit (640b) is configured to be connected to the first processing unit (640a), which is configured to perform a first at least one predefined operation with respect to a first data packet. The second processing unit (640b) is configured to receive, from the first processing unit (640a), the result of the first at least one predefined operation. The second processing unit (640b) is configured to perform a second at least one predefined operation in dependence upon the result of the first at least one predefined operation. The second processing unit (640b) is configured to be connected to the third processing unit (640d), which is configured to perform a third at least one predefined operation with respect to the first data packet. The second processing unit (640b) is configured to send the result of the second at least one predefined operation to the third processing unit (640d) for processing in the third at least one predefined operation.

The processing units may operate similarly to provide the function with respect to each of a plurality of data packets.

Embodiments of the present application are such that multiple packets may be pipelined simultaneously, where the function permits.

Reference is made to FIG. 12, which illustrates pipelining of data packets. As shown, different packets may be processed at the same time by different processing units. At a first time (t₀), the first processing unit (640a) is executing its respective at least one operation with respect to a third data packet (P₂), the second processing unit (640b) is executing its respective at least one operation with respect to a second data packet (P₁), and the third processing unit (640d) is executing its respective at least one operation with respect to a first data packet (P₀).

After the respective at least one operation has been executed by each of the processing units, each of the packets moves one stage along the sequence. For example, at a subsequent second time (t₁), the first processing unit (640a) is executing its respective at least one operation with respect to a fourth data packet (P₃), the second processing unit (640b) is executing its respective at least one operation with respect to the third data packet (P₂), and the third processing unit (640d) is executing its respective at least one operation with respect to the second data packet (P₁).
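The lockstep case of FIG. 12 can be captured in a few lines. The sketch below assumes one packet enters the pipeline per time slot and that every packet advances exactly one stage per slot; `packet_in_stage` and `snapshot` are hypothetical helper names, and slot 0 corresponds to the first time (t₀).

```python
from typing import Dict, Sequence


def packet_in_stage(stage: int, slot: int, depth: int = 3) -> int:
    """Index of the packet held by pipeline stage `stage` in time slot `slot`.

    With depth stages and slot 0 as the reference time, the last stage
    holds packet 0 and each earlier stage holds the next packet."""
    return slot + (depth - 1) - stage


def snapshot(slot: int, units: Sequence[str] = ("640a", "640b", "640d")) -> Dict[str, str]:
    """Map each processing unit to the packet it is working on in this slot."""
    return {
        unit: f"P{packet_in_stage(stage, slot, len(units))}"
        for stage, unit in enumerate(units)
    }
```

Calling `snapshot(0)` and `snapshot(1)` lists which packet each of the three units holds in two consecutive slots, showing every packet shifted one stage along.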

It should be appreciated that, in some embodiments, there may be a plurality of packets present in a given stage.

In some embodiments, packets may move from one stage to the next, but not necessarily in lockstep.

Provided that there are no pipeline hazards, such a pipeline operating at a fixed clock may have a constant bandwidth. This may reduce jitter in the system.

To avoid hazards when executing instructions (e.g., conflicts when accessing shared state), each of the processing units may be configured to execute a no-operation (no-op) instruction (i.e., the processing unit stalls) when required.

In some embodiments, an operation (e.g., simple arithmetic, an increment, addition or subtraction of a constant value, a shift, or addition or subtraction of a value from the data packet or from metadata) requires one clock cycle to be executed by a processing unit. This may mean that a value in shared state that is needed by one processing unit has not yet been updated by another processing unit. Out-of-date values in the shared state (1110) may therefore be read by the processing unit that needs them, so hazards may occur when reading and writing values in shared state. Operations on intermediate values, on the other hand, may pass those values as metadata without hazards.

An example of a hazard in reading and writing the shared state (1110) that may be avoided can be given in the context of an increment operation. Such an increment operation may be an operation that increments a packet counter in the shared state (1110). In one implementation of the increment operation, during a first time slot of the pipeline, the second processing unit (640b) is configured to read the value of the counter from the shared state (1110) and to provide the output of this read operation (e.g., as metadata (M₂)) to the third processing unit (640d). The third processing unit (640d) is configured to receive the value of the counter from the second processing unit (640b). During a second time slot, the third processing unit (640d) increments this value and writes the newly incremented value to the shared state (1110).

그러한 증분 동작을 수행할 때 문제가 발생할 수도 있는데, 그 문제는, 제2 시간 슬롯 동안, 제2 프로세싱 유닛(640b)이 공유된 상태(1110)에서 저장되는 카운터에 액세스 하려고 시도하는 경우, 제2 프로세싱 유닛(640b)은, 공유된 상태(1110)에 있는 카운터 값이 제3 프로세싱 유닛(640d)에 의해 업데이트되기 이전에, 카운터의 이전 값을 판독할 수도 있다는 것이다.A problem may arise when performing such an increment operation: if, during the second time slot, the second processing unit (640b) attempts to access the counter stored in the shared state (1110), the second processing unit (640b) may read the previous value of the counter before the counter value in the shared state (1110) has been updated by the third processing unit (640d).

따라서, 이러한 문제를 해결하기 위해, 제2 프로세싱 유닛(640b)은 (동작 없음 명령어 또는 파이프라인 버블의 제2 프로세싱 유닛(640b)에 의한 실행을 통해) 제2 시간 슬롯 동안 스톨될 수도 있다. 스톨은 다음 번 명령어의 실행에서 지연인 것으로 이해될 수도 있다. 이 지연은 다음 번 명령어 대신 "동작 없음" 명령어의 실행에 의해 구현될 수도 있다. 그 다음, 제2 프로세싱 유닛(640b)은, 후속하는 제3 시간 슬롯 동안 공유된 상태(1110)로부터 카운터 값을 판독한다. 제3 시간 슬롯 동안, 공유된 상태(1110)에서의 카운터는 업데이트되었고, 따라서, 제2 프로세싱 유닛(640b)이 업데이트된 값을 판독한다는 것이 보장된다.Therefore, to address this problem, the second processing unit (640b) may be stalled during the second time slot (through execution by the second processing unit (640b) of a no-operation instruction or a pipeline bubble). A stall may be understood as a delay in the execution of the next instruction. This delay may be implemented by executing a "no-operation" instruction in place of the next instruction. The second processing unit (640b) then reads the counter value from the shared state (1110) during the subsequent third time slot. By the third time slot, the counter in the shared state (1110) has been updated, so the second processing unit (640b) is guaranteed to read the updated value.
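By way of illustration only, the hazard and the stall described above may be sketched as a small time-slot simulation. The function name `run_counter_pipeline` and the slot-by-slot model are assumptions made for this sketch and are not taken from the described hardware; the read unit stands in for the second processing unit (640b) and the write unit for the third processing unit (640d).

```python
def run_counter_pipeline(num_packets, stall):
    """Simulate a read unit and an increment/write unit sharing one counter.

    Slot t:   the read unit reads the counter and forwards it as metadata.
    Slot t+1: the write unit increments the forwarded value and writes it back.
    With stall=True the read unit issues a no-operation in any slot where a
    write-back is still pending, so it never observes a stale value.
    """
    shared_counter = 0
    in_flight = None   # metadata handed from the read unit to the write unit
    issued = 0         # number of packets whose read has been issued
    slots = 0
    while issued < num_packets or in_flight is not None:
        slots += 1
        pending_write = in_flight   # value that was read in the previous slot
        in_flight = None
        # Read unit: the read happens before this slot's write becomes visible.
        if issued < num_packets and not (stall and pending_write is not None):
            in_flight = shared_counter
            issued += 1
        # Write unit: increment and write back at the end of the slot.
        if pending_write is not None:
            shared_counter = pending_write + 1
    return shared_counter, slots
```

In this model, three packets processed without stalling leave the counter at 2 after 4 slots, because one read observed the stale value. With stalling the counter correctly reaches 3, at the cost of 6 slots.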

몇몇 실시형태에서, 각각의 최소 단위는, 단일의 파이프라인 시간 슬롯 동안, 상태로부터 판독하도록, 상태를 업데이트하도록 그리고 업데이트된 상태를 기록하도록 구성된다. 이 경우, 상기에서 설명되는 프로세싱 유닛의 스톨링(stalling)은 사용되지 않을 수도 있다. 그러나, 프로세싱 유닛을 스톨링하는 것은 요구되는 메모리 인터페이스 비용을 감소시킬 수도 있다.In some embodiments, each minimal unit is configured to read from a state, update the state, and write the updated state during a single pipeline time slot. In this case, the stalling of processing units described above may not be used. However, stalling processing units may reduce the required memory interface cost.

몇몇 실시형태에서, 위험을 방지하기 위해, 파이프라인에서의 프로세싱 유닛은 그들 자신의 동작을 수행하기 이전에, 파이프라인에서의 다른 프로세싱 유닛이 그들의 프로세싱을 완료할 때까지 대기할 수도 있다.In some embodiments, to avoid hazards, processing units in a pipeline may wait until other processing units in the pipeline have completed their processing before performing their own operations.

언급되는 바와 같이, 컴파일러는 임의적인 수의 프로세싱 스테이지 인스턴스를 링크하는 것에 의해 데이터 경로를 구축하는데, 여기서 각각의 인스턴스는 미리 정의된 수(주어진 예에서는 세 개)의 미리 합성된 프로세싱 스테이지 템플릿 중 하나로부터 구축된다. 프로세싱 스테이지 템플릿은 로직 스테이지 템플릿(예를 들면, 레지스터, 스크래치 패드 메모리, 및 메타데이터에 걸친 산술 연산을 제공함), 패킷 액세스 스테이지 템플릿(예를 들면, 패킷 데이터 로드 및/또는 패킷 데이터 저장소를 제공함), 및 맵 액세스 스테이지 템플릿(예를 들면, 맵 룩업 알고리즘, 맵 테이블 사이즈)이다.As mentioned, the compiler builds the data path by linking an arbitrary number of processing stage instances, each of which is built from one of a predefined number (three in the given example) of pre-synthesized processing stage templates. The processing stage templates are a logic stage template (e.g., providing arithmetic operations across registers, scratch pad memory, and metadata), a packet access stage template (e.g., providing packet data loads and/or packet data stores), and a map access stage template (e.g., providing map lookup algorithms and map table sizes).

각각의 프로세싱 스테이지 인스턴스는 프로세싱 유닛 중 단일의 하나에 의해 구현될 수도 있다. 즉, 각각의 프로세싱 스테이지는 프로세싱 유닛에 의해 실행되는 각각의 적어도 하나의 동작을 포함한다.Each processing stage instance may be implemented by a single one of the processing units. That is, each processing stage comprises the respective at least one operation executed by a processing unit.

도 13은 수신된 데이터 패킷을 프로세싱하기 위해 프로세싱 스테이지가 파이프라인(1300)에서 함께 연결될 수도 있는 방법의 예를 예시한다. 도 13에서 도시되는 바와 같이, 제1 데이터 패킷이 수신되어 FIFO(1305)에서 저장된다. 하나 이상의 호출 인수(calling argument)가 제1 로직 스테이지(1310)에서 수신된다. 호출 인수는 수신된 데이터 패킷에 대해 실행될 기능을 식별하는 프로그램 선택기(program selector)를 포함할 수도 있다. 호출 인수는 수신된 데이터 패킷의 패킷 길이의 표시를 포함할 수도 있다. 제1 로직 스테이지(1310)는 호출 인수를 프로세싱하도록 그리고 제1 패킷 액세스 스테이지(1315)에 출력을 제공하도록 구성된다.Figure 13 illustrates an example of how processing stages may be linked together in a pipeline (1300) to process a received data packet. As illustrated in Figure 13, a first data packet is received and stored in a FIFO (1305). One or more calling arguments are received by a first logic stage (1310). The calling arguments may include a program selector that identifies a function to be executed on the received data packet. The calling arguments may also include an indication of the packet length of the received data packet. The first logic stage (1310) is configured to process the calling arguments and provide output to the first packet access stage (1315).

제1 패킷 액세스 스테이지(1315)는 네트워크 탭(network tap; 1320)에서 제1 패킷으로부터 데이터를 로딩한다. 제1 패킷 액세스 스테이지(1315)는 또한 제1 로직 스테이지(1310)의 출력에 의존하여 데이터를 제1 패킷에 기록할 수도 있다. 제1 패킷 액세스 스테이지(1315)는 제1 데이터 패킷의 전방(front)에 데이터를 기록할 수도 있다. 제1 패킷 액세스 스테이지(1315)는 데이터 패킷 내의 데이터를 덮어쓸 수도 있다.The first packet access stage (1315) loads data from the first packet at a network tap (1320). The first packet access stage (1315) may also write data to the first packet in dependence upon the output of the first logic stage (1310). The first packet access stage (1315) may write data to the front of the first data packet. The first packet access stage (1315) may overwrite data within the data packet.

로딩된 데이터 및 임의의 다른 메타데이터 및/또는 인수는, 그 다음, 제2 로직 스테이지(1325)로 제공되는데, 제2 로직 스테이지(1325)는 제1 데이터 패킷과 관련하여 프로세싱을 수행하고 출력 인수를 제1 맵 액세스 스테이지(1330)로 제공한다. 제1 맵 액세스 스테이지(1330)는 제2 로직 스테이지(1325)로부터의 출력을 사용하여 룩업 테이블에 대한 룩업을 수행하여 제1 데이터 패킷과 관련하여 수행될 액션을 결정한다. 그 다음, 출력은 제3 로직 스테이지(1335)로 전달되는데, 제3 로직 스테이지(1335)는 이 출력을 프로세싱하고 결과를 제2 패킷 액세스 스테이지(1340)로 전달한다.The loaded data and any other metadata and/or arguments are then provided to a second logic stage (1325), which performs processing with respect to the first data packet and provides output arguments to a first map access stage (1330). The first map access stage (1330) uses the output from the second logic stage (1325) to perform a lookup on a lookup table to determine an action to be performed with respect to the first data packet. The output is then passed to a third logic stage (1335), which processes the output and passes the result to the second packet access stage (1340).

제2 패킷 액세스 스테이지(1340)는 제3 로직 스테이지(1335)의 출력에 의존하여 제1 데이터 패킷으로부터 데이터를 판독하고 및/또는 제1 데이터 패킷에 데이터를 기록할 수도 있다. 그 다음, 제2 패킷 액세스 스테이지(1340)의 결과는, 자신이 수신하는 입력과 관련하여 프로세싱을 수행하도록 구성되는 제4 로직 스테이지(1345)로 전달된다.The second packet access stage (1340) may read data from the first data packet and/or write data to the first data packet based on the output of the third logic stage (1335). The result of the second packet access stage (1340) is then passed to the fourth logic stage (1345), which is configured to perform processing with respect to the input it receives.

파이프라인은 복수의 패킷 액세스 스테이지, 로직 스테이지, 및 맵 액세스 스테이지를 포함할 수도 있다. 최종 로직 스테이지(1350)가 반환 인수를 출력하도록 구성될 수도 있다. 반환 인수는 데이터 패킷의 시작을 식별하는 포인터를 포함할 수도 있다. 반환 인수는 데이터 패킷과 관련하여 수행될 액션의 표시를 포함할 수도 있다. 액션의 표시는 패킷이 드랍되어야 하는지 또는 그렇지 않은지의 여부를 나타낼 수도 있다. 액션의 표시는 패킷이 호스트 시스템으로 포워딩되어야 하는지 또는 그렇지 않은지의 여부를 나타낼 수도 있다. 네트워크 인터페이스 디바이스는 패킷이 드랍되어야 한다는 표시에 응답하여 각각의 데이터 패킷을 드랍하도록 구성되는 적어도 하나의 프로세싱 유닛을 포함할 수도 있다.The pipeline may include a plurality of packet access stages, logic stages, and map access stages. The final logic stage (1350) may be configured to output a return argument. The return argument may include a pointer identifying the start of a data packet. The return argument may include an indication of an action to be performed with respect to the data packet. The indication of the action may indicate whether the packet should be dropped or not. The indication of the action may indicate whether the packet should be forwarded to the host system or not. The network interface device may include at least one processing unit configured to drop each data packet in response to an indication that the packet should be dropped.

파이프라인(1300)은 하나 이상의 바이패스 FIFO(1355a, 1355b, 1355c)를 추가적으로 포함할 수도 있다. 바이패스 FIFO는 프로세싱 데이터, 예를 들면, 맵 액세스 스테이지 및/또는 패킷 액세스 스테이지 주변의 제1 데이터 패킷으로부터의 데이터를 전달하기 위해 사용될 수도 있다. 몇몇 실시형태에서, 맵 액세스 스테이지 및/또는 패킷 액세스 스테이지는, 그들 각각의 적어도 하나의 동작을 수행하기 위해 제1 데이터 패킷으로부터의 데이터를 필요로 하지 않는다. 맵 액세스 스테이지 및/또는 패킷 액세스 스테이지는 입력 인수에 의존하여 그들 각각의 적어도 하나의 동작을 수행할 수도 있다.The pipeline (1300) may additionally include one or more bypass FIFOs (1355a, 1355b, 1355c). The bypass FIFOs may be used to pass processing data, for example, data from the first data packet, around the map access stages and/or the packet access stages. In some embodiments, the map access stages and/or the packet access stages do not require data from the first data packet in order to perform their respective at least one operation. The map access stages and/or the packet access stages may perform their respective at least one operation in dependence upon input arguments.
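By way of illustration only, the chaining of logic, packet access and map access stages described with reference to Figure 13 may be sketched in software as follows. All function names, the single-byte load, the four-bit lookup key and the default "drop" action are assumptions made for this sketch; the described stages are hardware templates, not Python functions.

```python
from collections import deque

def first_logic_stage(calling_args):
    # Processes the calling arguments and tells the packet access stage what to load.
    return {"offset": 0, "packet_length": calling_args["packet_length"]}

def packet_access_stage(packet, meta):
    # Loads one byte of packet data at the requested offset.
    return {**meta, "loaded": packet[meta["offset"]]}

def second_logic_stage(meta):
    # Derives a lookup key from the loaded data (low four bits, chosen arbitrarily).
    return {**meta, "key": meta["loaded"] & 0x0F}

def map_access_stage(table, key):
    # Looks up the action for the key; unknown keys default to "drop".
    return table.get(key, "drop")

def final_logic_stage(action, packet_length):
    # Emits the return arguments: a packet start pointer and the action indication.
    return {"packet_start": 0, "packet_length": packet_length, "action": action}

def run_pipeline(packet, table):
    fifo = deque([packet])   # stands in for the input FIFO
    bypass = deque()         # stands in for a bypass FIFO
    pkt = fifo.popleft()
    meta = first_logic_stage({"program": 0, "packet_length": len(pkt)})
    meta = packet_access_stage(pkt, meta)
    meta = second_logic_stage(meta)
    # The map access stage only needs the key, so the length travels around it.
    bypass.append(meta["packet_length"])
    action = map_access_stage(table, meta["key"])
    return final_logic_stage(action, bypass.popleft())
```

For example, `run_pipeline(bytes([0x45, 0x00]), {5: "forward"})` yields an action of `"forward"`, while a packet whose key is absent from the table yields `"drop"`.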

본 출원의 실시형태에 따른 네트워크 인터페이스 디바이스(600, 700)에 의해 수행되는 방법(800)을 예시하는 도 8에 대한 참조가 이루어진다.Reference is made to FIG. 8, which illustrates a method (800) performed by a network interface device (600, 700) according to an embodiment of the present application.

S810에서, 기능을 수행하기 위해 네트워크 인터페이스 디바이스의 하드웨어 모듈이 배치된다. 하드웨어 모듈은 데이터 패킷과 관련하여 하드웨어에서 한 타입의 동작을 수행하도록 각각 구성되는 복수의 프로세싱 유닛을 포함한다. S810은, 각각의 수신된 데이터 패킷과 관련하여 기능을 제공하기 위해 특정한 순서로 그들의 각각의 미리 정의된 타입의 동작을 수행하도록 복수의 프로세싱 유닛 중 적어도 일부를 배열하는 것을 포함한다. 그와 같이 하드웨어 모듈을 배열하는 것은, 수신된 데이터 패킷이 복수의 프로세싱 유닛 중 적어도 일부의 복수의 동작의 각각에 의한 프로세싱을 거치도록 복수의 프로세싱 유닛 중 적어도 일부를 연결하는 것을 포함한다. 연결은 프로세싱 유닛 사이에서 데이터 패킷 및 관련된 메타데이터를 라우팅하도록 하드웨어 모듈의 라우팅 하드웨어를 구성하는 것에 의해 달성될 수도 있다.In S810, a hardware module of a network interface device is arranged to perform a function. The hardware module includes a plurality of processing units, each of which is configured to perform a type of operation in hardware with respect to a data packet. S810 includes arranging at least some of the plurality of processing units to perform their respective predefined types of operation in a specific order so as to provide the function with respect to each received data packet. Arranging the hardware module in this manner includes connecting at least some of the plurality of processing units such that a received data packet undergoes processing by each of the operations of those processing units. The connecting may be achieved by configuring routing hardware of the hardware module to route data packets and associated metadata between the processing units.

S820에서, 네트워크 인터페이스 디바이스의 제1 인터페이스에서 네트워크로부터 제1 데이터 패킷이 수신된다.In S820, a first data packet is received from a network on a first interface of a network interface device.

S830에서, 제1 데이터 패킷은 S810의 컴파일 프로세스 동안 연결되었던 적어도 일부 프로세싱 유닛의 각각에 의해 프로세싱된다. 적어도 일부 프로세싱 유닛의 각각은 적어도 하나의 데이터 패킷과 관련하여 수행하도록 미리 구성되는 동작의 타입을 수행한다. 그러므로, 기능은 제1 데이터 패킷과 관련하여 수행된다.In S830, the first data packet is processed by each of at least some processing units connected during the compilation process of S810. Each of at least some processing units performs a type of operation pre-configured to be performed in relation to at least one data packet. Therefore, the function is performed in relation to the first data packet.

S840에서, 프로세싱된 제1 데이터 패킷은 자신의 목적지로 계속 전송된다. 이것은 데이터 패킷을 호스트에도 역시 전송하는 것을 포함할 수도 있다. 이것은 네트워크를 통해 데이터 패킷을 전송하는 것을 포함할 수도 있다.In S840, the processed first data packet is then transmitted to its destination. This may also include transmitting the data packet to the host. This may also include transmitting the data packet over a network.

본 출원의 실시형태에 따른 네트워크 인터페이스 디바이스(700)에서 수행될 수도 있는 방법(900)을 예시하는 도 9에 대한 참조가 이루어진다.Reference is made to FIG. 9, which illustrates a method (900) that may be performed in a network interface device (700) according to an embodiment of the present application.

S910에서, 네트워크 인터페이스 디바이스의 제1의 적어도 하나의 프로세싱 유닛(즉, 제1 회로부)은 네트워크를 통해 수신되는 데이터 패킷을 수신 및 프로세싱하도록 구성된다. 이 프로세싱은 데이터 패킷과 관련하여 기능을 수행하는 것을 포함한다. 프로세싱은 제1 시간 기간 동안 수행된다.In S910, at least one first processing unit (i.e., the first circuit unit) of the network interface device is configured to receive and process a data packet received via a network. This processing includes performing a function with respect to the data packet. The processing is performed during a first time period.

S920에서, 제2 컴파일 프로세스는 제2의 적어도 하나의 프로세싱 유닛(즉, 제2 회로부)에 대한 수행을 위한 기능을 컴파일하기 위해 제1 시간 기간 동안 수행된다.In S920, a second compilation process is performed during a first time period to compile a function for execution on a second at least one processing unit (i.e., a second circuit unit).

S930에서, 제2 컴파일 프로세스가 완료되었는지 또는 아닌지의 여부가 결정된다. 만약 그렇지 않으면, S910 및 S920으로 다시 복귀하는데, 여기서 제1의 적어도 하나의 프로세싱 유닛은 네트워크로부터 수신되는 데이터 패킷과 관련하여 프로세싱을 계속 수행하고, 제2 컴파일 프로세스는 계속된다.At S930, it is determined whether or not the second compilation process is complete. If it is not, the method returns to S910 and S920, where the first at least one processing unit continues to process data packets received from the network and the second compilation process continues.

S940에서, 제2 컴파일이 완료되었다는 것을 결정하는 것에 응답하여, 제1의 적어도 하나의 프로세싱 유닛은 수신된 데이터 패킷과 관련한 기능의 수행을 중지한다. 몇몇 실시형태에서, 제1의 적어도 하나의 프로세싱 유닛은 소정의 데이터 플로우에 관해서만 기능의 수행을 중지할 수도 있다. 그 다음, 제2의 적어도 하나의 프로세싱 유닛은 그들 소정의 데이터 플로우와 관련하여 (S950에서) 기능을 대신 수행할 수도 있다.At S940, in response to determining that the second compilation is complete, the first at least one processing unit stops performing the function with respect to received data packets. In some embodiments, the first at least one processing unit may stop performing the function only with respect to certain data flows. The second at least one processing unit may then perform the function (at S950) with respect to those data flows instead.

S950에서, 제2 컴파일 프로세스가 완료되면, 제2의 적어도 하나의 프로세싱 유닛은 네트워크로부터 수신되는 데이터 패킷과 관련한 기능의 수행을 시작하도록 구성된다.In S950, when the second compilation process is completed, the second at least one processing unit is configured to start performing a function related to a data packet received from the network.
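By way of illustration only, the hand-over of method 900 may be sketched as a deterministic loop in which the first circuitry keeps processing packets until the second compilation has finished. The function name `process_with_handover` and the assumption that compilation advances by exactly one step per received packet are hypothetical simplifications; in practice the compilation runs independently of packet arrival.

```python
def process_with_handover(packets, compile_steps_needed):
    """Return (packet, circuitry) pairs showing which circuitry handled each packet."""
    handled_by = []
    compile_progress = 0
    active = "first"
    for packet in packets:
        # S930: check whether the second compilation process has completed.
        if active == "first" and compile_progress >= compile_steps_needed:
            active = "second"   # S940 / S950: the second circuitry takes over
        # S910 (or S950 after the switch): the active circuitry handles the packet.
        handled_by.append((packet, active))
        # S920: the compilation advances while packets are being processed.
        if compile_progress < compile_steps_needed:
            compile_progress += 1
    return handled_by
```

With five packets and a compilation that needs three steps, the first three packets are handled by the first circuitry and the remaining two by the second.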

본 출원의 실시형태에 따른 방법(1600)을 예시하는 도 16에 대한 참조가 이루어진다. 방법(1600)은 네트워크 인터페이스 디바이스 또는 호스트 디바이스에서 수행될 수 있다.Reference is made to FIG. 16, which illustrates a method (1600) according to an embodiment of the present application. The method (1600) may be performed in a network interface device or a host device.

S1610에서, 제1의 적어도 하나의 프로세싱 유닛이 수행할 기능을 컴파일하기 위한 컴파일 프로세스가 수행된다.In S1610, a compilation process is performed to compile a function to be performed by at least one first processing unit.

S1620에서, 제2의 적어도 하나의 프로세싱 유닛에 의해 수행될 기능을 컴파일하기 위해 컴파일 프로세스가 수행된다. 이 프로세스는 제1 기능을 제공하기 위해 데이터 패킷을 프로세싱하기 위한 복수의 스테이지의 스테이지와 관련되는 적어도 하나의 동작을 수행할 것을 제2의 적어도 하나의 프로세싱 유닛의 복수의 프로세싱 유닛의 각각에 할당하는 것을 포함한다. 복수의 프로세싱 유닛의 각각은 한 타입의 프로세싱을 수행하도록 구성되고, 할당하는 것은, 프로세싱 유닛이 각각의 적어도 하나의 동작을 수행하기에 적절한 타입의 프로세싱을 수행하도록 구성된다는 결정에 의존하여 수행된다. 다시 말하면, 프로세싱 유닛은 그들의 템플릿에 따라 선택된다.In S1620, a compilation process is performed to compile a function to be performed by at least one second processing unit. This process includes assigning to each of the plurality of processing units of the at least one second processing unit at least one operation to be performed, wherein the at least one operation is associated with a stage of the plurality of stages for processing a data packet to provide the first function. Each of the plurality of processing units is configured to perform a type of processing, and the assignment is performed based on a determination that the processing unit is configured to perform a type of processing appropriate for performing each of the at least one operation. In other words, the processing units are selected according to their templates.

S1630에서, S1620의 컴파일 프로세스의 완료 이전에, 제1의 적어도 하나의 프로세싱 유닛으로 하여금 기능을 수행하게 하기 위한 명령어가 전송된다. 이 명령어는 S1620의 컴파일 프로세스가 시작되기 이전에 전송될 수도 있다.At S1630, prior to completion of the compilation process of S1620, an instruction is transmitted to cause the first at least one processing unit to perform the function. This instruction may be transmitted before the compilation process of S1620 begins.

S1640에서, S1620에서의 컴파일 프로세스의 완료에 후속하여, 제2 회로부로 하여금 데이터 패킷과 관련하여 기능을 수행하게 하기 위한 명령어가 제2 회로부로 전송된다. 이 명령어는 S1620에서 생성되는 컴파일된 명령어를 포함할 수도 있다.At S1640, following completion of the compilation process at S1620, a command is transmitted to the second circuit unit to cause the second circuit unit to perform a function related to the data packet. This command may include a compiled command generated at S1620.
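By way of illustration only, the type-dependent assignment performed in S1620 may be sketched as follows. The function name `assign_operations`, the string type labels and the first-fit selection policy are assumptions made for this sketch; the description states only that a processing unit is chosen on the basis that it is configured to perform the appropriate type of processing.

```python
def assign_operations(operations, units):
    """Assign each stage operation to a free processing unit of the matching type.

    operations: list of (operation_name, required_type) in pipeline order.
    units:      dict mapping unit_id -> type the unit is pre-configured to perform.
    Returns a list of (operation_name, unit_id) in pipeline order.
    """
    free = dict(units)   # copy, so the caller's inventory is left untouched
    assignment = []
    for op_name, required_type in operations:
        # First-fit: take the first free unit whose template type matches.
        unit_id = next((u for u, t in free.items() if t == required_type), None)
        if unit_id is None:
            raise ValueError(f"no free {required_type} unit for {op_name}")
        del free[unit_id]
        assignment.append((op_name, unit_id))
    return assignment
```

A compiler following this sketch would fail early if the function needs more units of a given type than the hardware module provides.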

본 출원의 실시형태에 따른 기능은, 네트워크 인터페이스에서 프로세싱 슬라이스의 플러그형 컴포넌트로서 제공될 수도 있다. 슬라이스(1425)가 네트워크 인터페이스 디바이스(600)에서 어떻게 사용될 수도 있는지의 예를 예시하는 도 14에 대한 참조가 이루어진다. 슬라이스(1425)는 프로세싱 파이프라인으로서 지칭될 수도 있다.The functionality according to embodiments of the present application may be provided as a pluggable component of a processing slice in a network interface. Reference is made to FIG. 14, which illustrates an example of how a slice (1425) may be used in a network interface device (600). The slice (1425) may also be referred to as a processing pipeline.

네트워크 인터페이스 디바이스(600)는, 슬라이스(1425)에 의해 프로세싱될 그리고 그 다음 네트워크를 통해 송신될 호스트로부터의 데이터 패킷을 수신 및 저장하기 위한 송신 큐(1405)를 포함한다. 네트워크 인터페이스 디바이스(600)는, 슬라이스(1425)에 의해 프로세싱될 그리고 그 다음 호스트로 전달될 네트워크(1410)로부터 수신되는 데이터 패킷을 저장하기 위한 수신 큐(1410)를 포함한다. 네트워크 인터페이스 디바이스(600)는 슬라이스(1425)에 의해 프로세싱된 그리고 호스트로의 전달을 위한 것인 네트워크로부터 수신되는 데이터 패킷을 저장하기 위한 수신 큐(1415)를 포함한다. 네트워크 인터페이스 디바이스(600)는 슬라이스(1425)에 의해 프로세싱된 그리고 네트워크로의 전달을 위한 것인 호스트로부터 수신되는 데이터 패킷을 저장하기 위한 송신 큐를 포함한다.The network interface device (600) includes a transmit queue (1405) for receiving and storing data packets from a host to be processed by the slice (1425) and then transmitted over a network. The network interface device (600) includes a receive queue (1410) for storing data packets received from a network (1410) to be processed by the slice (1425) and then transmitted to the host. The network interface device (600) includes a receive queue (1415) for storing data packets received from a network that have been processed by the slice (1425) and are intended for transmission to the host. The network interface device (600) includes a transmit queue for storing data packets received from a host that have been processed by the slice (1425) and are intended for transmission to the network.

네트워크 인터페이스 디바이스(600)의 슬라이스(1425)는 수신 경로 및 송신 경로 상의 데이터 패킷을 프로세싱하기 위한 복수의 프로세싱 기능을 포함한다. 슬라이스(1425)는 수신 경로 및 송신 경로 상의 데이터 패킷의 프로토콜 프로세싱을 수행하도록 구성되는 프로토콜 스택을 포함할 수도 있다. 몇몇 실시형태에서, 네트워크 인터페이스 디바이스(600)에는 복수의 슬라이스가 있을 수도 있다. 복수의 슬라이스 중 적어도 하나는 네트워크로부터 수신되는 수신 데이터 패킷을 프로세싱하도록 구성될 수도 있다. 복수의 슬라이스 중 적어도 하나는 네트워크를 통한 송신을 위해 송신 데이터 패킷을 프로세싱하도록 구성될 수도 있다. 슬라이스는, 적어도 하나의 FPGA 및/또는 적어도 하나의 ASIC와 같은 하드웨어 프로세싱 장치에 의해 구현될 수도 있다.A slice (1425) of a network interface device (600) includes a plurality of processing functions for processing data packets on a receive path and a transmit path. The slice (1425) may include a protocol stack configured to perform protocol processing of data packets on the receive path and the transmit path. In some embodiments, the network interface device (600) may have a plurality of slices. At least one of the plurality of slices may be configured to process a receive data packet received from a network. At least one of the plurality of slices may be configured to process a transmit data packet for transmission over the network. The slices may be implemented by a hardware processing device, such as at least one FPGA and/or at least one ASIC.

가속기 컴포넌트(1430a, 1430b, 1430c, 1430d)는 도시되는 바와 같이 슬라이스의 상이한 스테이지에서 삽입될 수도 있다. 가속기 컴포넌트 각각은 슬라이스를 통과하는 데이터 패킷과 관련하여 기능을 제공한다. 가속기 컴포넌트는, 즉석에서, 즉, 네트워크 인터페이스 디바이스의 동작 동안 삽입될 수도 있거나 또는 제거될 수도 있다. 따라서, 가속기 컴포넌트는 플러그형 컴포넌트이다. 가속기 컴포넌트는 슬라이스(1425)에 대해 할당되는 로직 영역이다. 그들의 각각은, 슬라이스를 통과하는 패킷이 컴포넌트 안팎으로 스트리밍되는 것을 허용하는 스트리밍 패킷 인터페이스를 지원한다.Accelerator components (1430a, 1430b, 1430c, 1430d) may be inserted at different stages of the slice as illustrated. Each accelerator component provides a function in relation to data packets passing through the slice. The accelerator components may be inserted or removed on the fly, i.e., during operation of the network interface device. The accelerator components are therefore pluggable components. The accelerator components are logic regions allocated to the slice (1425). Each of them supports a streaming packet interface that allows packets passing through the slice to be streamed into and out of the component.

예를 들면, 한 타입의 가속기 컴포넌트는 수신 또는 송신 경로 상의 데이터 패킷의 암호화를 제공하도록 구성될 수도 있다. 다른 타입의 가속기 컴포넌트는 수신 또는 송신 경로 상의 데이터 패킷의 복호화를 제공하도록 구성될 수도 있다.For example, one type of accelerator component may be configured to provide encryption of data packets along a receive or transmit path. Another type of accelerator component may be configured to provide decryption of data packets along a receive or transmit path.

(도 6을 참조하여 상기에서 논의되는 바와 같은) 복수의 연결된 프로세싱 유닛에 의해 수행되는 동작을 실행하는 것에 의해 제공되는 상기에서 논의되는 기능은 가속기 컴포넌트에 의해 제공될 수도 있다. 유사하게, (도 4를 참조하여 상기에서 논의되는 바와 같은) 네트워크 프로세싱 CPU의 어레이 및/또는 (도 5를 참조하여 상기에서 논의되는 바와 같은) FPGA 애플리케이션에 의해 제공되는 기능은 가속기 컴포넌트에 의해 제공될 수도 있다.The functionality discussed above, which is provided by executing operations performed by a plurality of connected processing units (as discussed above with reference to FIG. 6), may also be provided by an accelerator component. Similarly, functionality provided by an array of network processing CPUs (as discussed above with reference to FIG. 4) and/or an FPGA application (as discussed above with reference to FIG. 5) may also be provided by an accelerator component.

설명되는 바와 같이, 네트워크 인터페이스 디바이스의 동작 동안, 제1의 적어도 하나의 프로세싱 유닛(예컨대, 복수의 연결된 프로세싱 유닛)에 의해 수행되는 프로세싱은 제2의 적어도 하나의 프로세싱 유닛으로 마이그레이션될 수도 있다. 이 마이그레이션을 구현하기 위해, 슬라이스(1425)의 컴포넌트 중 제1의 적어도 하나의 프로세싱 유닛에 의한 프로세싱을 위한 컴포넌트는 제2의 적어도 하나의 프로세싱 유닛에 의한 프로세싱을 위한 컴포넌트에 의해 대체될 수도 있다.As described, during operation of the network interface device, processing performed by a first at least one processing unit (e.g., a plurality of connected processing units) may be migrated to a second at least one processing unit. To implement this migration, the component of the slice (1425) that provides processing by the first at least one processing unit may be replaced by a component that provides processing by the second at least one processing unit.

네트워크 인터페이스 디바이스는 슬라이스(1425)로부터 컴포넌트를 삽입 및 제거하도록 구성되는 제어 프로세서를 포함할 수도 있다. 상기에서 논의되는 제1 시간 기간 동안, 제1의 적어도 하나의 프로세싱 유닛에 의한 기능을 수행하기 위한 컴포넌트가 슬라이스(1425)에 존재할 수도 있다. 제어 프로세서는, 제1 시간 기간에 후속하여: 제1의 적어도 하나의 프로세싱 유닛에 의한 기능을 제공하는 플러그형 컴포넌트를 슬라이스(1425)로부터 제거하도록 그리고 제2의 적어도 하나의 프로세싱 유닛에 의한 기능을 제공하는 플러그형 컴포넌트를 슬라이스(1425)에 삽입하도록 구성될 수도 있다.The network interface device may include a control processor configured to insert and remove components from a slice (1425). During the first time period discussed above, a component for performing a function by a first at least one processing unit may be present in the slice (1425). The control processor may be configured, subsequent to the first time period: to remove a pluggable component providing a function by the first at least one processing unit from the slice (1425) and to insert a pluggable component providing a function by the second at least one processing unit into the slice (1425).

슬라이스로부터 컴포넌트를 삽입 및 제거하는 것 외에도 또는 그 대신, 제어 프로세서는 컴포넌트에 프로그램을 로딩할 수도 있고 컴포넌트로의 프레임의 플로우를 제어하기 위한 제어 평면 커맨드(control-plane command)를 발행할 수도 있다. 이 경우, 컴포넌트는 파이프라인으로부터 삽입 또는 제거되지 않고도 동작하게 되거나 또는 동작하지 않게 될지도 모른다.In addition to or instead of inserting and removing components from a slice, the control processor may load programs into the component and issue control-plane commands to control the flow of frames to the component. In this case, the component may be enabled or disabled without being inserted or removed from the pipeline.

몇몇 실시형태에서, 제어 평면 또는 구성 정보는, 별개의 제어 버스를 필요로 하기보다는, 데이터 경로를 통해 전달된다. 몇몇 실시형태에서, 데이터 경로 컴포넌트의 구성을 업데이트하기 위한 요청은 네트워크 패킷과 동일한 버스를 통해 전달되는 메시지로서 인코딩된다. 따라서, 데이터 경로는 두 가지 타입의 패킷: 네트워크 패킷 및 제어 패킷을 전달할 수도 있다.In some embodiments, control plane or configuration information is conveyed via the data path, rather than requiring a separate control bus. In some embodiments, requests to update the configuration of data path components are encoded as messages conveyed via the same bus as network packets. Thus, the data path may convey two types of packets: network packets and control packets.

제어 패킷은 제어 프로세서에 의해 형성되고, 슬라이스(1425)를 사용하여 데이터 패킷을 전송 또는 수신하기 위해 사용되는 동일한 메커니즘을 사용하여 슬라이스(1425)에 주입된다. 이 동일한 메커니즘은 송신 큐 또는 수신 큐일 수도 있다. 제어 패킷은 임의의 적절한 방식으로 네트워크 패킷과 구별될 수도 있다. 몇몇 실시형태에서, 상이한 타입의 패킷은 메타데이터 워드 내의 비트 또는 비트들에 의해 구별될 수도 있다.Control packets are formed by the control processor and injected into the slice (1425) using the same mechanism used to transmit or receive data packets using the slice (1425). This same mechanism may be a transmit queue or a receive queue. Control packets may be distinguished from network packets in any suitable manner. In some embodiments, different types of packets may be distinguished by a bit or bits within a metadata word.

몇몇 실시형태에서, 제어 패킷은, 제어 패킷이 취하는 슬라이스(1425)를 통과하는 경로를 결정하는 라우팅 필드를 메타데이터 워드에서 포함한다. 제어 패킷은 제어 커맨드의 시퀀스를 전달할 수도 있다. 각각의 제어 커맨드는 슬라이스(1425)의 하나 이상의 컴포넌트를 타겟으로 할 수도 있다. 각각의 데이터 경로 컴포넌트는 컴포넌트 ID 필드에 의해 식별된다. 각각의 제어 커맨드는 각각의 식별된 컴포넌트에 대한 요청을 인코딩한다. 요청은 그 컴포넌트의 구성에 대해 변경을 행하는 것일 수도 있다. 요청은, 컴포넌트가 활성화되었는지 또는 그렇지 않은지의 여부, 즉, 컴포넌트가 슬라이스를 통과하는 데이터 패킷과 관련하여 자신의 기능을 수행하는지 또는 그렇지 않은지의 여부를 제어할 수도 있다.In some embodiments, a control packet includes a routing field in its metadata word that determines the path that the control packet takes through the slice (1425). The control packet may carry a sequence of control commands. Each control command may target one or more components of the slice (1425). Each data path component is identified by a component ID field. Each control command encodes a request for each identified component. The request may be to make a change to the configuration of the component. The request may control whether the component is enabled or disabled, i.e., whether the component performs its function with respect to data packets passing through the slice or not.
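By way of illustration only, the in-band control scheme described above may be sketched as follows. The choice of bit 0 of the metadata word as the control flag, the `Component` and `Packet` classes, and the request names `"enable"` and `"configure"` are all hypothetical; the description says only that a bit or bits of the metadata word may distinguish the packet types and that each command identifies its target by a component ID. The routing field is not modelled.

```python
from dataclasses import dataclass, field

CONTROL_BIT = 0x1   # hypothetical: bit 0 of the metadata word marks a control packet

@dataclass
class Component:
    component_id: int
    enabled: bool = False
    config: dict = field(default_factory=dict)

@dataclass
class Packet:
    metadata: int
    payload: object   # bytes for a network packet, a command list for a control packet

def apply_packet(components, packet):
    """Pass one packet along the slice.

    Returns the IDs of the components that processed a network packet;
    a control packet updates component state and is processed by none.
    """
    if packet.metadata & CONTROL_BIT:
        # Each command is (component_id, request, value) and targets one component.
        for component_id, request, value in packet.payload:
            target = components[component_id]
            if request == "enable":
                target.enabled = bool(value)
            elif request == "configure":
                target.config.update(value)
        return []
    return [c.component_id for c in components.values() if c.enabled]
```

Because control packets travel on the same path and in the same order as network packets, every network packet that follows a control packet sees the updated configuration, and every packet that precedes it sees the old one.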

따라서, 몇몇 실시형태에서, 네트워크 인터페이스 디바이스(600)의 제어 프로세서는, 슬라이스의 컴포넌트 중 하나로 하여금 네트워크 인터페이스 디바이스에서 수신되는 데이터 패킷과 관련한 기능의 수행을 시작하게 하기 위해 메시지를 전송하도록 구성된다. 이 메시지는 플러그형 컴포넌트를 통해 전송되는 그리고 기능을 수행하기 위해 컴포넌트로의 프레임의 최소 단위의 스위치 오버(atomic switch over)를 야기하는 제어 평면 메시지이다. 그 다음, 이 컴포넌트는, 슬라이스가 스위치 아웃될 때까지, 슬라이스를 통과하는 모든 수신된 데이터 패킷에 대해 실행된다. 제어 프로세서는, 슬라이스의 컴포넌트 중 다른 것으로 하여금, 이 컴포넌트가 네트워크 인터페이스 디바이스(600)에서 수신되는 데이터 패킷과 관련한 기능의 수행을 중지하게 하기 위한 메시지를 전송하도록 구성된다.Accordingly, in some embodiments, the control processor of the network interface device (600) is configured to transmit a message to cause one of the components of the slice to begin performing a function associated with a data packet received at the network interface device. This message is a control plane message transmitted through the pluggable component and causes an atomic switch over of a frame to the component to perform the function. This component is then executed for all received data packets passing through the slice until the slice is switched out. The control processor is configured to transmit a message to another of the components of the slice to cause that component to stop performing a function associated with a data packet received at the network interface device (600).

컴포넌트를 데이터 슬라이스(1425) 안팎으로 스위칭하기 위해, 입구 및 출구 데이터 경로의 다양한 지점에서 소켓이 존재할 수도 있다. 제어 프로세서는 추가적인 로직을 슬라이스(1425) 안팎으로 연결할 수도 있다. 이 추가적인 로직은 컴포넌트 사이에서 배치되는 FIFO의 형태를 취할 수도 있다.To switch components into and out of the slice (1425), sockets may be present at various points along the ingress and egress data paths. The control processor may also connect additional logic into and out of the slice (1425). This additional logic may take the form of FIFOs placed between components.

제어 프로세서는 슬라이스(1425)를 통해 슬라이스(1425)의 구성된 컴포넌트로 제어 평면 메시지를 전송할 수도 있다. 구성은 슬라이스(1425)의 컴포넌트에 의해 수행되는 기능을 결정할 수도 있다. 예를 들면, 슬라이스(1425)를 통해 전송되는 제어 메시지는 하드웨어 모듈로 하여금 데이터 패킷과 관련하여 기능을 수행하도록 구성되게 할 수도 있다. 그러한 제어 메시지는, 소정의 기능을 제공하기 위해, 하드웨어 모듈의 최소 단위로 하여금, 하드웨어 모듈의 파이프라인으로 인터커넥트되게 할 수도 있다. 그러한 제어 메시지는, 하드웨어 모듈의 개개의 최소 단위로 하여금, 개별적으로 선택된 최소 단위에 의해 수행될 동작을 선택하도록 구성되게 할 수도 있다. 각각의 최소 단위가 한 타입의 동작을 수행하도록 미리 구성되기 때문에, 각각의 최소 단위에 대한 동작의 선택은, 각각의 최소 단위가 수행하도록 미리 구성되는 동작의 타입에 의존하여 이루어진다.The control processor may transmit control plane messages through the slice (1425) to configure components of the slice (1425). The configuration may determine the functions performed by the components of the slice (1425). For example, a control message transmitted through the slice (1425) may cause a hardware module to be configured to perform a function in relation to data packets. Such a control message may cause minimal units of the hardware module to be interconnected into a pipeline of the hardware module so as to provide a given function. Such a control message may cause individual minimal units of the hardware module to be configured so as to select the operation that each selected minimal unit is to perform. Since each minimal unit is pre-configured to perform one type of operation, the operation selected for each minimal unit depends on the type of operation that the unit is pre-configured to perform.

이제, 몇몇 추가적인 실시형태가 도 19 내지 도 21을 참조하여 설명될 것이다. 이 실시형태에서, 패킷 프로세싱 프로그램 또는 피드포워드 파이프라인이 FPGA에서 실행된다. FPGA의 서브유닛으로 하여금 패킷 프로세싱 프로그램 또는 피드포워드 파이프라인(feedforward pipeline)을 구현하게 하기 위한 방법이 설명될 것이다. 패킷 프로세싱 프로그램 또는 피드포워드 파이프라인은 eBPF 프로그램 또는 P4 프로그램 또는 임의의 다른 적절한 프로그램일 수도 있다.Now, several additional embodiments will be described with reference to FIGS. 19 to 21. In these embodiments, a packet processing program or feedforward pipeline is executed in an FPGA. A method for causing a subunit of an FPGA to implement a packet processing program or feedforward pipeline will be described. The packet processing program or feedforward pipeline may be an eBPF program, a P4 program, or any other suitable program.

이 FPGA는 네트워크 인터페이스 디바이스에서 제공될 수도 있다. 몇몇 실시형태에서, 패킷 프로세싱 프로그램은, 네트워크 인터페이스 디바이스가 자신의 호스트와 관련하여 설치된 이후에만 배치되거나 또는 실행된다.This FPGA may be provided in a network interface device. In some embodiments, the packet processing program is deployed or executed only after the network interface device is installed in association with its host.

패킷 프로세싱 프로그램 또는 피드포워드 파이프라인은 루프가 없는 로직 플로우를 구현할 수도 있다.A packet processing program or feedforward pipeline may also implement a loop-free logic flow.

몇몇 실시형태에서, 프로그램은 예컨대 유저 레벨에서 더 낮은 특권이 있는 도메인 또는 특권이 없는 도메인에서 작성될 수도 있다. 프로그램은 커널과 같은 특권이 있는 또는 더 높은 특권이 있는 도메인 상에서 실행될 수도 있다. 프로그램을 실행하는 하드웨어는 어떠한 루프도 없다는 것을 요구할 수도 있다.In some embodiments, a program may be written in a less privileged domain, such as a user-level domain, or in an unprivileged domain. The program may also run on a privileged domain, such as the kernel, or on a more privileged domain. The hardware executing the program may require that it not contain any loops.
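By way of illustration only, a conservative loop-freedom check of the kind that such hardware might rely on may be sketched as follows. The function name `is_feedforward` and the successor-list representation of a program are assumptions made for this sketch. Requiring every branch to point forward is a sufficient condition for the absence of loops, not a necessary one, and it is not presented here as the check used by any particular eBPF or P4 toolchain.

```python
def is_feedforward(program):
    """Return True if every control-flow edge points strictly forward.

    program: list in which entry i holds the successor instruction indices
             of instruction i; the index len(program) denotes program exit.
    """
    return all(
        i < succ <= len(program)
        for i, succs in enumerate(program)
        for succ in succs
    )
```

A program that passes this check can be laid out as a feedforward pipeline in which instruction `i` never needs a result from a later stage.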

다음의 실시형태에서, eBPF 프로그램 예에 대한 참조가 이루어진다. 그러나, 다른 실시형태는 임의의 다른 적절한 프로그램과 함께 사용될 수도 있다는 것이 인식되어야 한다.In the following embodiments, reference is made to examples of eBPF programs. However, it should be recognized that other embodiments may be used with any other suitable program.

하기의 실시형태 중 하나 이상은 이전 실시형태 중 하나 이상과 연계하여 사용될 수도 있다는 것이 인식되어야 한다.It should be recognized that one or more of the following embodiments may be used in conjunction with one or more of the previous embodiments.

Some embodiments may be provided in the context of an FPGA, an ASIC, or any other suitable hardware device. Some embodiments use subunits of the FPGA, ASIC, or the like. The following examples are described with reference to an FPGA. It should be appreciated that a similar process may be performed using an ASIC or any other suitable hardware device.

A subunit may be a minimal unit. Some examples of minimal units have been described previously. It should be appreciated that any of those previously described examples of minimal units may alternatively or additionally be used as subunits. Alternatively or additionally, these subunits may be referred to as "slices" or configurable logic blocks.

Each of these subunits may be configured to perform a single instruction or a plurality of related instructions. In the latter case, the related instructions may provide a single output (which may be defined by one or more bits).

A subunit can be regarded as a computational unit. The subunits may be arranged in a pipeline in which packets are processed in order. In some embodiments, the subunits may be dynamically assigned to execute a respective instruction (or instructions) in a program.

In some embodiments, a subunit may be all or part of a unit used to define a block of, for example, an FPGA. In some FPGAs, a block of the FPGA is referred to as a slice. In some embodiments, the subunit or minimal unit is equated to a slice.

By mapping each minimal unit or subunit to a respective block or slice of the FPGA, improved resource utilization may be achieved compared with an approach that maps RTL minimal units to FPGA resources. The latter approach may result in an RTL minimal unit requiring a relatively large number of individual blocks or slices of the FPGA.

In some embodiments, compilation may be to the minimal-unit level. This may have the advantage that processing is pipelined. Packets may be processed in order. The compilation process may be performed relatively quickly.

In some embodiments, an arithmetic operation may require one slice per byte. A logical operation may require half a slice per byte. A shift operation may require a collection of slices, depending on the width of the shift. A compare operation may require one slice per byte. A select operation may require half a slice per byte.
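The per-byte slice costs above can be turned into a rough resource estimate. The following is a minimal sketch only: the table name `SLICES_PER_BYTE`, the function `slices_needed`, and the choice to round half slices up are assumptions made for illustration. Shift operations are left out because the text states only that their cost depends on the shift width.

```python
from fractions import Fraction

# Slices per byte for each operation class, taken from the figures in the text.
# Shift operations are omitted: their cost depends on the shift width.
SLICES_PER_BYTE = {
    "arithmetic": Fraction(1),
    "logical": Fraction(1, 2),
    "compare": Fraction(1),
    "select": Fraction(1, 2),
}


def slices_needed(op_class: str, width_bits: int) -> int:
    """Estimate the whole slices needed for one operation of the given width."""
    if width_bits <= 0 or width_bits % 8 != 0:
        raise ValueError("width_bits must be a positive multiple of 8")
    cost = SLICES_PER_BYTE[op_class] * (width_bits // 8)
    # Assumption: a fractional slice is rounded up to a whole slice.
    return -(-cost.numerator // cost.denominator)
```

Under these assumptions, a 32-bit arithmetic operation is estimated at four slices and a 32-bit logical operation at two.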

As part of the compilation process, placement and routing are performed. Placement is the assignment of a particular physical subunit to perform a particular instruction or instructions. Routing ensures that the output or outputs of a particular subunit are routed to the correct destination, which may be, for example, another subunit or subunits.

Placement and routing may use a process in which operations are assigned to particular subunits, starting from one end of the pipeline. In some embodiments, the most important operations may be placed before less important operations. In some embodiments, a route may be assigned at the same time as a particular operation is placed. In some embodiments, routes may be selected from a limited set of precomputed routes. This will be described in more detail later.

In some embodiments, if a route cannot be assigned, the operation will be held back for later.

In some embodiments, the precomputed routes may be byte-wide routes. However, this is by way of example only and, in other embodiments, routes of a different width may be defined. In some embodiments, routes of a plurality of different sizes may be provided.

In some embodiments, routing may be restricted to routing between nearby subunits.

In some embodiments, the subunits may be physically arranged in a regular structure on the FPGA.

In some embodiments, to facilitate routing, rules may be made as to how subunits may communicate. For example, a subunit may provide an output only to a subunit that is next to it, above it, or below it.

Alternatively or additionally, a limit may be placed on how far away the next subunit may be for routing purposes. For example, a subunit may output data only to an adjacent subunit or to a subunit within a defined distance (for example, with no more than one intervening subunit).
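One possible reading of these communication rules can be expressed as a simple predicate. This is a sketch under stated assumptions: positions are hypothetical (column, row) grid coordinates, the function name `may_route` and the parameter `max_gap` (the number of intervening subunits permitted) are illustrative, and the two rules are combined here although an embodiment may apply either on its own.

```python
def may_route(src: tuple, dst: tuple, max_gap: int = 0) -> bool:
    """Return True if a subunit at src may drive a subunit at dst.

    src and dst are hypothetical (column, row) positions. max_gap is the
    number of intervening subunits allowed between source and destination.
    """
    dx = abs(dst[0] - src[0])
    dy = abs(dst[1] - src[1])
    if dx == 0 and dy == 0:
        return False  # a subunit is not treated as its own neighbour
    if dx != 0 and dy != 0:
        return False  # only beside, above, or below: no diagonal outputs
    return max(dx, dy) - 1 <= max_gap
```

With `max_gap=0` only directly adjacent subunits qualify. With `max_gap=1` a single intervening subunit is tolerated, matching the example given in the text.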

Reference is made to FIG. 19, which shows a method of some embodiments.

In some embodiments, the FPGA may have one or more "static" regions and one or more "dynamic" regions. The static regions provide a standard configuration, and the dynamic regions may provide functionality according to the requirements of the end user. The static part may be defined, for example, before the end user receives the network interface device, for example before the network interface device is installed with respect to a host. For example, a static region may be configured to cause the network interface device to provide certain functions. The static region will be provided with precomputed routes between minimal units. As will be discussed in more detail later, there may be routing between one or more static regions that passes through one or more dynamic regions. The dynamic regions may be configured by the end user according to their requirements when the network interface device is deployed with respect to the host. The dynamic regions may be configured to perform different functions for the end user over time.

In step S1, a first compilation process is performed to provide a first bit file, referred to as the main bit file (50), and a tool checkpoint (52). This is, in some embodiments, a bit file for at least part of the static region. When downloaded to the FPGA, a bit file causes the FPGA to function as specified in the program from which the bit file was compiled. In some embodiments, the program used in the first compilation process may be any one or more programs, or may be a test program specifically designed to assist in determining the routing within a part of the FPGA. In some embodiments, a series of simple programs may alternatively or additionally be used.

The program may be modified, or may have a reconfigurable partition that can be used by the compiler. The program may be modified to make the compiler's work easier by moving nets out of the reconfigurable partition.

Step S1 may be performed in a design tool. By way of example only, the Vivado tool may be used with Xilinx FPGAs. The checkpoint file may be provided by the design tool. The checkpoint file represents a snapshot of the design at the point at which the bit file is generated. The checkpoint file may include one or more of a synthesized netlist, design constraints, placement information, and routing information.

In step S2, the bit file is analyzed, taking the checkpoint file into account, to provide a bit file description (54). The analysis may be one or more of detecting resources, generating routes, checking timing, generating one or more partial bit files, and generating the bit file description.

The analysis may be configured to extract routing information from the bit file. The analysis may be configured to determine on which wires or routes a signal has propagated.

The analysis phase may be performed at least partly in a synthesis or design tool. In some embodiments, the scripting tool of Vivado may be used. The scripting tool may be TCL (tool command language). TCL can be used to add to or modify the capabilities of Vivado. The functions of Vivado may be invoked and controlled by TCL scripts.

The bit file description (54) defines how a given part of the FPGA can be used. For example, the bit file description will indicate which minimal units can be routed to which other minimal units, and one or more routes that make routing between those minimal units possible. For example, for each minimal unit, the bit file description will indicate where the inputs to that minimal unit can come from and where the outputs from that minimal unit can be routed to, together with one or more routes for outputting the data. The bit file description is independent of any program.

The bit file description may include one or more of route information, an indication of which pairs of routes conflict, and a description of how to generate a bit file from the required configuration of the minimal units.

The bit file description may provide a set of routes that are available between a set of minimal units, before any particular instruction is performed by a given minimal unit.

The bit file description may be for a part of the FPGA. The bit file description may be for a dynamic part of the FPGA. The bit file description will include which routes are available and/or which routes are not available. For example, the bit file may indicate, for a dynamic part of the FPGA, which routes are available, taking into account any routing across the dynamic part of the FPGA that is required by, for example, the static part(s) of the FPGA.
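The kind of information carried by a bit file description can be pictured with a small data structure. The sketch below is hypothetical: the class names `Route` and `BitFileDescription`, their fields, and the modelling of routing resources as sets of wire names are assumptions made for illustration, not a format defined in this document.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Route:
    """A hypothetical precomputed route between two minimal units."""
    src: str            # source minimal unit, e.g. "X6Y0"
    dst: str            # destination minimal unit, e.g. "X7Y0"
    wires: frozenset    # routing resources the route occupies


@dataclass
class BitFileDescription:
    """Program-independent view of the routes in part of the FPGA."""
    routes: list = field(default_factory=list)
    # Wires already taken, for example by static routing across a dynamic part.
    reserved_wires: frozenset = frozenset()

    def available_routes(self, src: str, dst: str) -> list:
        """Routes from src to dst that avoid every reserved wire."""
        return [r for r in self.routes
                if r.src == src and r.dst == dst
                and r.wires.isdisjoint(self.reserved_wires)]

    @staticmethod
    def conflict(a: Route, b: Route) -> bool:
        """Two routes conflict if they share any routing resource."""
        return not a.wires.isdisjoint(b.wires)
```

Nothing in this structure refers to a particular program, which reflects the statement that the bit file description is independent of any program.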

It should be appreciated that, in some embodiments, the bit file description may be obtained in any suitable way. For example, the bit file description may be provided by the provider of the FPGA or ASIC.

In some embodiments, the bit file description may be provided by a design tool. In this embodiment, the analysis step may be omitted. The design tool may output the bit file description. The bit file description may be for the static part of the FPGA, including any required routing across the dynamic part of the FPGA.

It should be appreciated that any other suitable technique may be used to generate the bit file description. In the example described above, the tool used to design the FPGA is used to provide the analysis that is used to generate the bit file.

It should be appreciated that, in other embodiments, different tools may be used. The tools may, in some embodiments, be specific to a product or to a range of products. For example, an FPGA provider may provide an associated tool for managing that FPGA.

In other embodiments, a generic scripting tool may be used.

In some embodiments, a different tool or a different technique may be used to determine the partial bit files. For example, the main bit file may be analyzed to determine which features correspond to which features. This may require a plurality of partial bit files to be generated.

It should be appreciated that step S3 is performed when the network interface device is installed with respect to a host, and is run on the physical FPGA device. Steps S1 and S2 may be performed as part of a design synthesis process to generate the bit file image that implements the network interface device. In some embodiments, step S1 and/or step S2 are used to characterize the behavior of the FPGA. Once the FPGA has been characterized, the bit file description is stored in memory for all physical network interface devices that are to operate in the given defined manner.

In step S3, compilation is performed using the bit file description and an eBPF program. The output of the compilation is a partial bit file for the eBPF program. The compilation will add to the partial bit file the routes and the programming to be performed by individual ones of the slices.

It should be appreciated that the bit file description may be provided in the deployed system. The bit file description may be stored in memory. The bit file description may be stored on the FPGA, on the network interface device, or on the host device. In some embodiments, the bit file description is stored in flash memory or the like that is connected to the FPGA on the network interface device. The flash memory may also contain the main bit file.

The eBPF program may be stored with the bit file description or separately. The eBPF program may be stored on the FPGA, on the network interface device, or on the host. In the case of eBPF, the program may be transferred from a user-mode program to the kernel, both of which run on the host. The kernel will transfer the program to the device driver, which will then transfer it to the compiler, which runs on either the host or the network interface device. In some embodiments, the eBPF program may be stored on the network interface device so that it can be run before the host OS has booted.

The compiler may be provided at any suitable location on the network interface device, the FPGA, or the host. By way of example only, the compiler may be run on a CPU on the network interface device.

The compiler flow will now be described. The front end of the compiler takes the eBPF program. The eBPF program may be written in any suitable language. For example, the eBPF program may be written in a C-type language. The compiler is configured, at the front end, to convert the program to an intermediate representation (IR). In some embodiments, the IR may be LLVM-IR or any other suitable IR.

In some embodiments, pointer analysis may be performed to generate packet/map access primitives.

It should be appreciated that, in some embodiments, optimization of the IR may be performed by the compiler. This may be optional in some embodiments.

The high-level synthesis (HLS) back end of the compiler is configured to split the program pipeline into stages, to generate packet access taps, and to emit C code. In some embodiments, the HLS part of the design tool and/or the design tool in use may be invoked to synthesize the output of the HLS phase.

The compiler back end for FPGA minimal units splits the pipeline into stages and generates the packet access taps. If-conversion may be performed to convert control dependencies into data dependencies. The design is placed and routed. The partial bit file for the eBPF program is emitted.
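The effect of if-conversion can be illustrated on a small fragment. The sketch below is illustrative only and does not show the compiler's internal representation: the functions `with_branch` and `if_converted` are hypothetical. The first has a control dependency on `cond`; the second computes both candidate results unconditionally and picks one with a mask, which corresponds to a select operation in hardware.

```python
MASK32 = 0xFFFFFFFF


def with_branch(cond: bool, a: int, b: int) -> int:
    """Branching form: which operation runs depends on control flow."""
    if cond:
        return (a + b) & MASK32
    return (a - b) & MASK32


def if_converted(cond: bool, a: int, b: int) -> int:
    """If-converted form: both results are computed, then one is selected."""
    total = (a + b) & MASK32
    diff = (a - b) & MASK32
    # cond is now an ordinary data input to a select (multiplexer).
    mask = -int(cond) & MASK32          # all ones if cond is true, else zero
    return (total & mask) | (diff & ~mask & MASK32)
```

Because the converted form has no branch, each operation can be placed on its own subunit and evaluated for every packet, which suits a pipeline that may not contain loops or jumps.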

Routing problems may occur, as shown in FIG. 20a, where there is a routing conflict. For example, slice A may communicate with slice C, and slice B may communicate with slice D. In the arrangement of FIG. 20a, a common routing portion (60) has been assigned both to the communication between slice A and slice C and to the communication between slice B and slice D. In some embodiments, such routing conflicts may be avoided. In this regard, reference is made to FIG. 20b. As can be seen, a separate route (62) is provided between slice A and slice C, distinct from the route (64) between slice B and slice D.

In some embodiments, the bit file description may include a plurality of different routes for at least some pairs of subunits. The compilation process will check for routing conflicts such as that shown in FIG. 20a. In the event of a routing conflict, the compiler can resolve or avoid the conflict by selecting an appropriate alternative one of the routes.
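Selecting an alternative route to avoid a conflict can be sketched as a greedy pass over the required connections. This is a simplified illustration, not the compiler's actual algorithm: the function `assign_routes`, the representation of each candidate route as a set of wire names, and the wire names themselves are all hypothetical. A connection with no conflict-free candidate is held back rather than assigned.

```python
def assign_routes(connections, candidates):
    """Greedily pick one conflict-free route per connection.

    connections: ordered list of (src, dst) pairs to connect.
    candidates:  dict mapping (src, dst) to a list of alternative routes,
                 each route being a frozenset of wire names it occupies.
    Returns (chosen, deferred): the routes picked, and the connections for
    which every alternative clashed with an earlier choice.
    """
    used = set()
    chosen = {}
    deferred = []
    for conn in connections:
        for wires in candidates.get(conn, []):
            if used.isdisjoint(wires):
                chosen[conn] = wires
                used |= wires
                break
        else:
            deferred.append(conn)   # hold this connection back for later
    return chosen, deferred


# Slices A->C and B->D both prefer a shared routing portion; B->D falls
# back to its alternative route because A->C was assigned first.
example = {
    ("A", "C"): [frozenset({"shared"}), frozenset({"alt_ac"})],
    ("B", "D"): [frozenset({"shared"}), frozenset({"alt_bd"})],
}
chosen, deferred = assign_routes([("A", "C"), ("B", "D")], example)
```

A greedy pass can miss solutions that a search with backtracking would find, so a practical implementation might revisit earlier choices for deferred connections.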

FIG. 21 shows a partition (66) of an FPGA for performing an eBPF program. The partition interfaces with the static part of the FPGA via, for example, a series of input flip-flops (68) and a series of output flip-flops. In some embodiments, there may be routing (70) across the design, as discussed previously.

The compiler may need to deal with routing across the region of the FPGA that the compiler is configuring. The compiler needs to generate a partial bit file that fits into the reconfigurable partition within the main bit file. When the main bit file is generated with a reconfigurable partition, the design tool will avoid using logic resources within the reconfigurable partition so that those resources can be used by the partial bit file. However, the design tool may not be able to avoid using routing resources within the reconfigurable partition.

Consequently, the analysis tool will need to avoid using the routing resources that have been used by the design tool in the main bit file. The analysis tool may need to ensure that its list of available routes in the bit file description does not include any that use resources already used by the main bit file. The available routes may be defined in terms of route templates that can be used at a large number of places within the FPGA, because the FPGA is highly regular. The routing resources used by the main bit file break that regularity, which means that the analysis tool must avoid using those templates at the places where they would conflict with the main bit file. The analysis tool may need to generate new route templates that can be used at those places and/or prevent certain route templates from being used at particular locations.
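The idea of a route template that is valid at most, but not all, locations can be sketched as follows. The representation is an assumption made for illustration: a template is a list of hypothetical `(dx, dy, wire)` entries relative to an origin slice, and `used_by_main` is the set of absolute `(x, y, wire)` resources already taken by the main bit file.

```python
def usable_origins(template, origins, used_by_main):
    """Return the origins at which a route template can be instantiated.

    template:     list of (dx, dy, wire) resources relative to an origin.
    origins:      list of (x, y) slice positions to try.
    used_by_main: set of absolute (x, y, wire) resources that the main
                  bit file already occupies.
    """
    usable = []
    for x, y in origins:
        needed = {(x + dx, y + dy, wire) for dx, dy, wire in template}
        if needed.isdisjoint(used_by_main):
            usable.append((x, y))
    return usable


# A template that leaves a slice eastwards and enters input 0 of the slice
# in the next column. The main bit file occupies input 0 of the slice at
# column 7, row 1, so the template cannot be used from origin (6, 1).
template = [(0, 0, "east_out"), (1, 0, "in0")]
blocked = {(7, 1, "in0")}
result = usable_origins(template, [(6, 0), (6, 1), (6, 2)], blocked)
```

For the excluded origin, an analysis tool might record a different template that reaches the same destination through other resources.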

Some examples of the functions provided by the compiler in converting some example eBPF program fragments into instructions to be performed by minimal units will now be described.

Some embodiments may use any suitable synthesis tool to generate the bit file description. By way of example only, some embodiments may use the Bluespec tool, which is based on a mode that uses atomic transactions for hardware.

In a first example, the eBPF program fragment has two instructions:

Instruction 1: r1 += r2

Instruction 2: r1 += r3

The first instruction adds the number in register 1 (r1) to the number in register 2 (r2) and places the result in r1. The second instruction adds r1 to r3 and places the result in r1. Both instructions in this example use 64-bit registers but use only the lowest 32 bits. The upper 32 bits of the result are filled with zeros.
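The behavior of this two-instruction fragment can be checked with a short software model. This is only a reference model of the arithmetic described above, with hypothetical function names; it says nothing about how the hardware is arranged.

```python
MASK32 = 0xFFFFFFFF


def add32(dst: int, src: int) -> int:
    """32-bit add: use only the low 32 bits and zero the upper 32 bits."""
    return (dst + src) & MASK32


def run_fragment(r1: int, r2: int, r3: int) -> int:
    """Execute 'r1 += r2' then 'r1 += r3' and return the final r1."""
    r1 = add32(r1, r2)   # instruction 1
    r1 = add32(r1, r3)   # instruction 2
    return r1
```

Any bits above bit 31 in the inputs do not influence the result, and a carry out of bit 31 is discarded.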

The compiler will convert these into instructions to be performed by minimal units. A 32-bit add instruction requires 32 pairs of lookup tables (LUTs), a 32-bit carry chain, and 32 flip-flops.

Each pair of lookup tables adds two bits to produce a 2-bit result. The carry chain is a structure that allows a bit to be carried from one digit column to the next during addition, and allows a bit to be borrowed from the next column during subtraction.

The 32 flip-flops are storage elements that accept a value on one clock cycle and reproduce it on the next clock cycle. They may be used to limit the amount of work done per clock cycle and to simplify timing analysis.

In some embodiments, the FPGA may comprise a number of slices. In some example slices, the carry chain propagates from the bottom of the slice (CIN) to the top of the slice (COUT), which is then connected to the CIN input of the next slice.

In an example where each slice has a 4-bit carry chain, eight slices are used to perform a 32-bit add. In this embodiment, a minimal unit may be regarded as being provided by a pair of slices. This is because, in some embodiments, it may be convenient for a minimal unit to operate on 8-bit values.

In an example where each slice has an 8-bit carry chain, four slices are used to perform a 32-bit add. In this embodiment, a minimal unit may be regarded as being provided by a slice.
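The way a wide add is split across slices, with the carry out (COUT) of one slice feeding the carry in (CIN) of the next, can be modelled in software. The sketch below is a behavioral model only; the function names and the parameter `chain_bits` (the carry-chain width per slice) are assumptions made for illustration.

```python
def slice_add(a_part: int, b_part: int, cin: int, chain_bits: int):
    """Model one slice: add two chain_bits-wide values plus the carry in.

    Returns (sum_part, cout).
    """
    total = a_part + b_part + cin
    return total & ((1 << chain_bits) - 1), total >> chain_bits


def chained_add(a: int, b: int, width_bits: int = 32, chain_bits: int = 8):
    """Add a and b using a column of slices linked by their carry chains.

    Returns (result, number_of_slices). The final carry out is dropped.
    """
    n_slices = width_bits // chain_bits
    part_mask = (1 << chain_bits) - 1
    result, carry = 0, 0
    for i in range(n_slices):
        shift = i * chain_bits
        part, carry = slice_add((a >> shift) & part_mask,
                                (b >> shift) & part_mask,
                                carry, chain_bits)
        result |= part << shift
    return result, n_slices
```

With `chain_bits=8` the model uses four slices for a 32-bit add, and with `chain_bits=4` it uses eight, in line with the two examples above.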

It should be appreciated that this is by way of example only and that, as discussed previously, a minimal unit may be defined in any suitable way.

In this example, the case where the FPGA has slices that support an 8-bit carry chain will now be used in compiling the first example eBPF program fragment.

There are three input values, each 32 bits wide, and one output value, 32 bits wide. There may have been other, earlier instructions that produced those three input values. In the following, some arbitrary locations of the slices (minimal units) will be assumed.

The following numbering convention will be used. The slices (minimal units) are arranged in a regular array of rows and columns. XnYm denotes the position of a minimal unit in the array, where Xn denotes the column and Ym denotes the row. X6Y0 denotes a slice in column 6 and row 0. It should be appreciated that any other suitable numbering scheme may be used in other embodiments.
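For tooling that works with this XnYm convention, converting between the textual name and a (column, row) pair is straightforward. The helper names below are hypothetical.

```python
import re

_POSITION = re.compile(r"X(\d+)Y(\d+)")


def parse_position(name: str) -> tuple:
    """Convert a name such as 'X6Y0' into a (column, row) pair."""
    match = _POSITION.fullmatch(name)
    if match is None:
        raise ValueError(f"not an XnYm position: {name!r}")
    return int(match.group(1)), int(match.group(2))


def format_position(column: int, row: int) -> str:
    """Convert a (column, row) pair back into its XnYm name."""
    return f"X{column}Y{row}"
```

For example, `parse_position("X6Y0")` yields column 6 and row 0.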

Assume that the initial values are produced simultaneously at the following locations:

r1: slices X6Y0, X6Y1, X6Y2 and X6Y3

r2: slices X6Y4, X6Y5, X6Y6 and X6Y7

r3: slices X6Y8, X6Y9, X6Y10 and X6Y11

The result of the first instruction needs to be computed by four adjacent slices in the same column so that the carry chain connects up correctly. The compiler may choose to compute the result in slices X7Y0, X7Y1, X7Y2 and X7Y3. For that to work, the inputs need to be connected up. There will be a connection from X6Y0 to X7Y0, another from X6Y1 to X7Y1, another from X6Y2 to X7Y2, and another from X6Y3 to X7Y3. There also need to be corresponding connections from X6Y4-X6Y7 to X7Y0-X7Y3.

These will be full-byte connections, meaning that each of the eight input bits is connected to the corresponding output bit. For example:

The output from flip-flop 0 of slice X6Y0 is connected to input 0 of LUT 0 of slice X7Y0.

The output from flip-flop 1 of slice X6Y0 is connected to input 0 of LUT 1 of slice X7Y0.

and so on, up to

The output from flip-flop 7 of slice X6Y0 is connected to input 0 of LUT 7 of slice X7Y0.

During the first clock cycle, the values of r1 and r2 from slices X6Y0-X6Y7 will be passed to the inputs of slices X7Y0-X7Y3, processed by the LUTs and the carry chain, and the result will be stored in the flip-flops of those slices X7Y0-X7Y3, ready to be used on the next cycle.

Moving on to instruction 2, the compiler needs to choose a place to compute the result of instruction 2. It may choose slices X7Y4 to X7Y7. Again, there will be full-byte connections from the result of instruction 1 (X7Y0 to X7Y3) to the inputs for instruction 2 (X7Y4 to X7Y7).

The value of r3 is also required. If r1, r2 and r3 were produced in cycle 0, then r1 + r2 will be produced in cycle 1. The value of r3 needs to be delayed by a clock cycle so that it too is produced in cycle 1. The compiler may choose to use slices X7Y8 to X7Y11 to produce r3 in cycle 1. There will then need to be connections from the original slices that produced r3 in cycle 0 (X6Y8 to X6Y11) to the new slices that produce the same value in cycle 1 (X7Y8 to X7Y11). Having done that, there now need to be connections from those new slices to the slices for instruction 2. So the output of slice X7Y8 is connected to an input of slice X7Y4, and so on.
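The need to delay r3 by one clock cycle can be seen in a cycle-by-cycle model of the two-stage pipeline. The sketch below is a behavioral illustration with hypothetical names: `stage1` stands for the flip-flops holding r1 + r2 together with the delayed copy of r3, and `stage2` stands for the flip-flops holding the result of instruction 2.

```python
MASK32 = 0xFFFFFFFF


def run_pipeline(inputs):
    """Clock a two-stage pipeline, accepting one (r1, r2, r3) per cycle.

    Returns the value held in the final stage at the start of each cycle,
    with None while the pipeline is still filling.
    """
    stage1 = None   # (r1 + r2, delayed r3)
    stage2 = None   # (r1 + r2) + r3
    outputs = []
    # Two extra idle cycles drain the pipeline after the last input.
    for item in list(inputs) + [None, None]:
        outputs.append(stage2)
        # Both stages update on the same clock edge, so stage 2 is
        # computed from the old stage 1 contents before stage 1 changes.
        stage2 = None if stage1 is None else (stage1[0] + stage1[1]) & MASK32
        stage1 = None if item is None else ((item[0] + item[1]) & MASK32, item[2])
    return outputs
```

Because r3 travels through `stage1` alongside r1 + r2, each result combines values from the same input, and a new input can be accepted on every cycle.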

그러면, FPGA 비트 파일은 다음의 피쳐를 포함할 것이다:Then, the FPGA bit file will contain the following features:

- Full-byte connection from X6Y0 to X7Y0 input 0 (initial r1 byte 0)

- Full-byte connection from X6Y1 to X7Y1 input 0 (initial r1 byte 1)

- Full-byte connection from X6Y2 to X7Y2 input 0 (initial r1 byte 2)

- Full-byte connection from X6Y3 to X7Y3 input 0 (initial r1 byte 3)

- Full-byte connection from X6Y4 to X7Y0 input 1 (initial r2 byte 0)

- Full-byte connection from X6Y5 to X7Y1 input 1 (initial r2 byte 1)

- Full-byte connection from X6Y6 to X7Y2 input 1 (initial r2 byte 2)

- Full-byte connection from X6Y7 to X7Y3 input 1 (initial r2 byte 3)

- Full-byte connection from X6Y8 to X7Y8 input 0 (initial r3 byte 0)

- Full-byte connection from X6Y9 to X7Y9 input 0 (initial r3 byte 1)

- Full-byte connection from X6Y10 to X7Y10 input 0 (initial r3 byte 2)

- Full-byte connection from X6Y11 to X7Y11 input 0 (initial r3 byte 3)

- Slice X7Y0 configured to add input 0 to input 1 (instruction 1 byte 0)

- Slice X7Y1 configured to add input 0 to input 1 (instruction 1 byte 1)

- Slice X7Y2 configured to add input 0 to input 1 (instruction 1 byte 2)

- Slice X7Y3 configured to add input 0 to input 1 (instruction 1 byte 3)

- Slice X7Y8 configured to copy input 0 to its output (r3 delay byte 0)

- Slice X7Y9 configured to copy input 0 to its output (r3 delay byte 1)

- Slice X7Y10 configured to copy input 0 to its output (r3 delay byte 2)

- Slice X7Y11 configured to copy input 0 to its output (r3 delay byte 3)

- Full-byte connection from X7Y0 to X7Y4 input 0 (instruction 1 byte 0)

- Full-byte connection from X7Y1 to X7Y5 input 0 (instruction 1 byte 1)

- Full-byte connection from X7Y2 to X7Y6 input 0 (instruction 1 byte 2)

- Full-byte connection from X7Y3 to X7Y7 input 0 (instruction 1 byte 3)

- Full-byte connection from X7Y8 to X7Y4 input 1 (r3 delay byte 0)

- Full-byte connection from X7Y9 to X7Y5 input 1 (r3 delay byte 1)

- Full-byte connection from X7Y10 to X7Y6 input 1 (r3 delay byte 2)

- Full-byte connection from X7Y11 to X7Y7 input 1 (r3 delay byte 3)

- Slice X7Y4 configured to add input 0 to input 1 (instruction 2 byte 0)

- Slice X7Y5 configured to add input 0 to input 1 (instruction 2 byte 1)

- Slice X7Y6 configured to add input 0 to input 1 (instruction 2 byte 2)

- Slice X7Y7 configured to add input 0 to input 1 (instruction 2 byte 3)
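
The data flow implied by the feature list above can be modelled in software. The following Python sketch is illustrative only: it assumes that instruction 1 is a 32-bit addition of r1 and r2 and that instruction 2 adds r3 to that result with wrap-around, which is inferred from the slice configurations listed rather than stated in this passage. The function names are hypothetical.

```python
MASK8 = 0xFF

def to_bytes(value, n=4):
    """Split a value into n little-endian bytes, one byte per slice."""
    return [(value >> (8 * i)) & MASK8 for i in range(n)]

def from_bytes(byte_list):
    """Reassemble little-endian bytes into a single integer."""
    return sum(b << (8 * i) for i, b in enumerate(byte_list))

def add_slices(a, b):
    """Byte-wide adders joined by a carry chain; the final carry is dropped (assumption)."""
    out, carry = [], 0
    for x, y in zip(a, b):
        total = x + y + carry
        out.append(total & MASK8)
        carry = total >> 8
    return out

def run_pipeline(r1, r2, r3):
    # Cycle 0: column X6 holds r1 (Y0-Y3), r2 (Y4-Y7) and r3 (Y8-Y11).
    col6 = {"r1": to_bytes(r1), "r2": to_bytes(r2), "r3": to_bytes(r3)}
    # Cycle 1: X7Y0-X7Y3 add r1 and r2, while X7Y8-X7Y11 copy r3 (one-cycle delay).
    instr1 = add_slices(col6["r1"], col6["r2"])
    r3_delayed = list(col6["r3"])
    # Cycle 2: X7Y4-X7Y7 add the instruction 1 result to the delayed r3.
    instr2 = add_slices(instr1, r3_delayed)
    return from_bytes(instr2)
```

The copy of r3 stands in for the delay slices: without it, instruction 2 would combine values produced in different cycles.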

The compiler does not need to generate the upper 32 bits of the result of instruction 2, because they are known to be zero. It can simply note that fact and use zeros wherever those bits are used.

A second example of compiling an eBPF fragment will now be described.

Instruction 1: r1 &= 0xff

Instruction 2: r2 &= 0xff

Instruction 3: if r1 < r2, go to L1

Instruction 4: r1 = r2

Label L1.

The first instruction performs a bitwise AND of r1 with the constant 0xff and places the result in r1. A given bit of the result is set to 1 if the corresponding bit was set to 1 in the original r1 and the corresponding bit is set to 1 in the constant; otherwise it is set to zero. The constant 0xff has bits 0 to 7 set and bits 8 to 63 clear, so bits 0 to 7 of r1 are unchanged and bits 8 to 63 are set to zero. This simplifies matters for the compiler, which understands that bits 8 to 63 are zero and need not be generated. The second instruction does the same for r2.

Instruction 3 checks whether r1 is less than r2 and, if so, jumps to label L1, skipping instruction 4. Instruction 4 simply copies the value of r2 into r1. This sequence of instructions finds the minimum of r1 byte 0 and r2 byte 0 and places the result in r1 byte 0.

The compiler may use a technique known as "if conversion" to convert the conditional jump into a select instruction:

Instruction 1: r1 &= 0xff

Instruction 2: r2 &= 0xff

Instruction 5: c1 = (r1 < r2)

Instruction 6: r1 = c1 ? r1 : r2

Instruction 5 compares r1 with r2 and sets c1 to 1 if r1 is less than r2, and to zero otherwise. Instruction 6 is a select instruction that copies r1 to r1 if c1 is set (which has no effect) and otherwise copies r2 to r1. If c1 equals 1, instruction 3 would have skipped instruction 4, so r1 would keep its value from instruction 1; the select instruction likewise leaves r1 unchanged. If c1 equals zero, instruction 3 would not have skipped instruction 4, so instruction 4 would have copied r2 to r1; the select instruction likewise copies r2 to r1. The new sequence therefore has the same effect as the original sequence.
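
The equivalence argued above can be checked exhaustively over all byte values. The following Python sketch is a model of the two instruction sequences, not compiler code, and the function names are hypothetical.

```python
def original_sequence(r1, r2):
    """Instructions 1 to 4, with the conditional jump modelled as an if."""
    r1 &= 0xFF              # instruction 1
    r2 &= 0xFF              # instruction 2
    if not (r1 < r2):       # instruction 3 jumps over instruction 4 when r1 < r2
        r1 = r2             # instruction 4
    return r1

def if_converted_sequence(r1, r2):
    """Instructions 1, 2, 5 and 6, with the jump replaced by a select."""
    r1 &= 0xFF
    r2 &= 0xFF
    c1 = 1 if r1 < r2 else 0    # instruction 5
    r1 = r1 if c1 else r2       # instruction 6 (select)
    return r1

def sequences_agree():
    """Compare both sequences, and the byte minimum, for every pair of byte inputs."""
    return all(
        original_sequence(a, b) == if_converted_sequence(a, b) == min(a, b)
        for a in range(256)
        for b in range(256)
    )
```

Because instructions 1 and 2 mask both registers to one byte, checking the 65,536 byte pairs covers every distinct behaviour of the fragment.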

Instruction 6 is not a valid eBPF instruction. However, while the compiler is working, the instructions are represented in LLVM-IR, in which instruction 6 is a valid instruction.

These instructions now need to be assigned to minimum units. Assume that input r1 is available in slices X0Y0 to X0Y7 and that r2 is available in slices X0Y8 to X0Y15. Instructions 1 and 2 simply cause the compiler to note that the upper 7 bytes of r1 and r2 are zero.

The compiler may then choose to compute the result of instruction 5 in slice X1Y0. This requires a full-byte connection from the output of slice X0Y0 to input 0 of slice X1Y0 and a full-byte connection from the output of slice X0Y8 to input 1 of slice X1Y0. The two values are compared by subtracting one from the other and checking whether the calculation overflows by attempting to borrow from the next higher bit. The result of the comparison is then stored in flip-flop 7 of slice X1Y0.
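
The comparison by subtraction can be illustrated with a bit-serial model of a borrow chain. This is a minimal Python sketch assuming unsigned operands; the function name and the `width` parameter are hypothetical, and the hardware would evaluate the chain with the slice's LUTs and carry logic rather than a loop.

```python
def less_than_via_borrow(a, b, width=8):
    """Return 1 if a < b (unsigned), using only the borrow out of a - b."""
    borrow = 0
    for i in range(width):
        abit = (a >> i) & 1
        bbit = (b >> i) & 1
        not_a = 1 - abit
        # Borrow out of a one-bit full subtractor computing abit - bbit - borrow.
        borrow = (not_a & bbit) | (not_a & borrow) | (bbit & borrow)
    # A borrow out of the top bit means the subtraction underflowed, so a < b.
    return borrow
```

The final borrow is the single bit that the description stores as the comparison result c1.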

As in the first example, r1 and r2 need to be delayed by one cycle so that their values reach instruction 6 at the correct time. The compiler may use slices X1Y1 and X1Y2 for r1 and r2 respectively.

The select instruction requires three inputs: c1, r1 and r2. Note that r1 and r2 are one byte wide, whereas c1 is only one bit wide. Assume that the compiler chooses to compute the result of the select instruction in slice X2Y0. The selection is performed bit by bit, with each LUT in slice X2Y0 handling one bit:

Bit 0 of the result is r1 bit 0 if c1 is set, and r2 bit 0 otherwise

Bit 1 of the result is r1 bit 1 if c1 is set, and r2 bit 1 otherwise

... and so on up to

Bit 7 of the result is r1 bit 7 if c1 is set, and r2 bit 7 otherwise.

Each LUT needs access only to the corresponding bit of r1 and the corresponding bit of r2, but every LUT needs access to c1. This means that c1 needs to be replicated across all the bits of input 0 of the slice. The connections to the inputs of instruction 6 are therefore:

Bit 7 of the output of slice X1Y0 replicated to input 0 of slice X2Y0.

Full-byte connection from the output of slice X1Y1 to input 1 of slice X2Y0.

Full-byte connection from the output of slice X1Y2 to input 2 of slice X2Y0.
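
The bitwise select and the replication of c1 can be sketched as follows. This Python model is illustrative only; the function names are hypothetical, and each call to `select_lut` stands in for one LUT of slice X2Y0.

```python
def replicate_bit(byte, bit):
    """Replicate one bit of a byte across all eight bits of a slice input."""
    return 0xFF if (byte >> bit) & 1 else 0x00

def select_lut(c, a, b):
    """One LUT: a three-input function returning a if c is 1, otherwise b."""
    return (c & a) | ((1 - c) & b)

def select_slice(in0, in1, in2):
    """Slice-level select: input 0 is replicated c1, input 1 is r1, input 2 is r2."""
    result = 0
    for i in range(8):
        c = (in0 >> i) & 1
        a = (in1 >> i) & 1
        b = (in2 >> i) & 1
        result |= select_lut(c, a, b) << i
    return result
```

For example, if slice X1Y0 outputs 0x80 (c1 set in bit 7), `replicate_bit(0x80, 7)` gives 0xFF, and `select_slice(0xFF, r1, r2)` returns r1.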

A further issue to be addressed concerns shift instructions. Consider the following example.

A left shift of a 16-bit value by 5 bits needs to do the following:

Set output bit 0 to zero

Set output bit 1 to zero

Set output bit 2 to zero

Set output bit 3 to zero

Set output bit 4 to zero

Copy input bit 0 to output bit 5

Copy input bit 1 to output bit 6

...

Copy input bit 10 to output bit 15

Note that the inputs and outputs here are those of the connection. The input of the connection comes from the output of the first slice. The output of the connection goes to the input of the second slice.

It may not be possible to make this kind of connection within a slice, but it may be possible using the interconnect between slices. The compiler can assume that the 16-bit input value is produced by two adjacent slices in the same column, because the compiler can ensure that the value is produced there.

As an example, assume that the input is produced by slices X0Y4 and X0Y5 and that the output goes to slices X1Y4 and X1Y5. In that case, the following connections are required:

Slice X1Y4 bit 0 is known to be zero, so no connection is needed

Slice X1Y4 bit 1 is known to be zero, so no connection is needed

Slice X1Y4 bit 2 is known to be zero, so no connection is needed

Slice X1Y4 bit 3 is known to be zero, so no connection is needed

Slice X1Y4 bit 4 is known to be zero, so no connection is needed

Slice X1Y4 bit 5 comes from slice X0Y4 bit 0

Slice X1Y4 bit 6 comes from slice X0Y4 bit 1

Slice X1Y4 bit 7 comes from slice X0Y4 bit 2

Slice X1Y5 bit 0 comes from slice X0Y4 bit 3

Slice X1Y5 bit 1 comes from slice X0Y4 bit 4

Slice X1Y5 bit 2 comes from slice X0Y4 bit 5

Slice X1Y5 bit 3 comes from slice X0Y4 bit 6

Slice X1Y5 bit 4 comes from slice X0Y4 bit 7

Slice X1Y5 bit 5 comes from slice X0Y5 bit 0

Slice X1Y5 bit 6 comes from slice X0Y5 bit 1

Slice X1Y5 bit 7 comes from slice X0Y5 bit 2
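
The connection list above follows mechanically from the shift amount. As a rough sketch, the Python function below derives the same table for a constant left shift; the function name, its parameters and the use of `None` to mark a known-zero bit are illustrative assumptions.

```python
def shifted_route(shift, width=16, src=("X0Y4", "X0Y5"), dst=("X1Y4", "X1Y5")):
    """Map each destination (slice, bit) to its source (slice, bit).

    A value of None marks a destination bit that is known to be zero and
    therefore needs no connection.
    """
    route = {}
    for out_bit in range(width):
        in_bit = out_bit - shift
        key = (dst[out_bit // 8], out_bit % 8)
        route[key] = None if in_bit < 0 else (src[in_bit // 8], in_bit % 8)
    return route
```

Calling `shifted_route(5)` reproduces the sixteen entries listed for slices X1Y4 and X1Y5.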

The eight connections to the inputs of slice X1Y5 can be regarded as a shifted connection or shifted route. The same structure can be used for slice X1Y4, but with inputs from X0Y3 and X0Y4, because bits 5 to 7 match and the slice can ignore bits 0 to 4, so it does not matter which inputs are presented there.

It may be necessary to be able to shift by any amount between 1 and 7 bits. A connection that shifts by 0 or 8 bits is the same as a full-byte connection, because in that case each bit is connected to the corresponding bit of another slice.

Shifting by a variable amount may be done in two or three stages, depending on the width of the value being shifted. The stages are as follows:

Stage 1: shift by 0, 1, 2 or 3.

Stage 2: shift by 0, 4, 8 or 12.

Stage 3: shift by 0, 16, 32 or 48 (32-bit or 64-bit values only).
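
The staged decomposition works because any shift amount splits into two-bit fields, each of which picks one of four constant shifts. The Python sketch below assumes a left shift and assumes that stage 1 is controlled by bits 0 and 1 of the shift amount, stage 2 by bits 2 and 3, and stage 3 by bits 4 and 5; the function names are hypothetical.

```python
def stage(value, select, step, mask):
    """Choose one of four constant shifts: 0, step, 2 * step or 3 * step."""
    candidates = [(value << (k * step)) & mask for k in range(4)]
    return candidates[select]

def staged_shift_left(value, amount, width=32):
    """Variable left shift built from two or three four-way stages."""
    if width not in (16, 32, 64):
        raise ValueError("width must be 16, 32 or 64")
    if not 0 <= amount < width:
        raise ValueError("shift amount out of range")
    mask = (1 << width) - 1
    value &= mask
    value = stage(value, amount & 3, 1, mask)              # stage 1: 0, 1, 2, 3
    value = stage(value, (amount >> 2) & 3, 4, mask)       # stage 2: 0, 4, 8, 12
    if width > 16:
        value = stage(value, (amount >> 4) & 3, 16, mask)  # stage 3: 0, 16, 32, 48
    return value
```

Each stage needs only constant shifted connections and a selector, which is why the variable shift maps onto fixed routing plus LUTs.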

As another example, assume an arithmetic right shift of a byte by a variable amount, where the value to be shifted is produced by slice X3Y2 and the shift amount is produced by slice X3Y3.

An arithmetic right shift requires connections of the "arithmetic right shift" type. This type of connection takes the outputs of one slice and connects them to the inputs of another slice, shifting them right by a constant amount in the process and replicating the sign bit as needed.

For example, the "arithmetic right shift by 3" connection would be as follows:

Output bit 0 comes from input bit 3

Output bit 1 comes from input bit 4

Output bit 2 comes from input bit 5

Output bit 3 comes from input bit 6

Output bit 4 comes from input bit 7

Output bit 5 comes from input bit 7 (sign bit)

Output bit 6 comes from input bit 7 (sign bit)

Output bit 7 comes from input bit 7 (sign bit)
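
An "arithmetic right shift" connection is just a fixed mapping from output bits to input bits. The following Python sketch, with hypothetical function names, builds that mapping for any constant shift and applies it to a value.

```python
def asr_connection(shift, width=8):
    """For each output bit, give the index of the input bit that drives it."""
    sign = width - 1
    # Bits shifted in from above the top of the value all come from the sign bit.
    return [min(out_bit + shift, sign) for out_bit in range(width)]

def apply_connection(connection, value):
    """Route the bits of value through a connection described by asr_connection."""
    return sum(((value >> src) & 1) << out_bit for out_bit, src in enumerate(connection))
```

`asr_connection(3)` yields `[3, 4, 5, 6, 7, 7, 7, 7]`, matching the eight lines above.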

Stage 1 may be computed in slice X4Y2, in which case it requires the following connections:

Full byte from slice X3Y2 to slice X4Y2 input 0

Arithmetic right shift by 1 from slice X3Y2 to slice X4Y2 input 1

Arithmetic right shift by 2 from slice X3Y2 to slice X4Y2 input 2

Arithmetic right shift by 3 from slice X3Y2 to slice X4Y2 input 3

Slice X3Y3 bit 0 replicated to slice X4Y2 input 4

Slice X3Y3 bit 1 replicated to slice X4Y2 input 5

Slice X4Y2 is then configured to select one of the first four inputs based on input 4 and input 5, as follows:

Input 4 is 0 and input 5 is 0: select input 0

Input 4 is 1 and input 5 is 0: select input 1

Input 4 is 0 and input 5 is 1: select input 2

Input 4 is 1 and input 5 is 1: select input 3

The shift amount may be copied from slice X3Y3 to slice X4Y3 to provide a delayed version.

Stage 2 may be computed in slice X5Y2, in which case it requires the following connections:

Full byte from slice X4Y2 to slice X5Y2 input 0

Arithmetic right shift by 4 from slice X4Y2 to slice X5Y2 input 1

Slice X4Y3 bit 2 replicated to slice X5Y2 input 2

Slice X5Y2 is then configured to select input 0 or input 1 based on input 2, as follows:

Input 2 is 0: select input 0

Input 2 is 1: select input 1

The output of slice X5Y2 is the result of the variable arithmetic right shift operation.
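
The two stages described above can be modelled end to end. The Python sketch below is a software model under the assumption that the shift amount is in the range 0 to 7; the function names are hypothetical, and the slice names in the comments refer to the example placement.

```python
def asr_byte(value, shift):
    """Constant arithmetic right shift of one byte, replicating bit 7."""
    result = 0
    for out_bit in range(8):
        src = min(out_bit + shift, 7)
        result |= ((value >> src) & 1) << out_bit
    return result

def variable_asr_byte(value, amount):
    """Two-stage variable arithmetic right shift of a byte (amount 0 to 7)."""
    value &= 0xFF
    # Stage 1 (slice X4Y2): inputs 0 to 3 are the value shifted by 0, 1, 2 and 3;
    # inputs 4 and 5 carry bits 0 and 1 of the shift amount.
    inputs = [asr_byte(value, k) for k in range(4)]
    in4 = amount & 1
    in5 = (amount >> 1) & 1
    stage1 = inputs[in4 + 2 * in5]
    # Stage 2 (slice X5Y2): input 0 is the stage 1 result, input 1 is that result
    # shifted by 4, and input 2 carries bit 2 of the (delayed) shift amount.
    in2 = (amount >> 2) & 1
    return asr_byte(stage1, 4) if in2 else stage1
```

The model ignores timing; in the hardware the shift amount is delayed by one cycle so that bit 2 arrives at stage 2 together with the stage 1 result.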

A bit file for a given minimum unit may be as follows:

Identity information for the minimum unit

A list of the other minimum units from which the given minimum unit can receive inputs, and the routes available for those inputs.

A list of the other minimum units to which the given minimum unit can provide outputs, and the routes available for those outputs.

It should be appreciated that, because the FPGA is a regular structure, there may be a common template that can be used for a plurality of minimum units, with modifications for individual minimum units as required.

By way of example, the bit file description for slice X7Y1 may specify the following possible inputs and outputs:

Input from X6Y1 via route A or route B

Input from X6Y5 via route C or route D

Input from X7Y0 via route E or route F

Output to X8Y1 via route G or route H

Output to X7Y2 via route I or route J

Output to X7Y5 via route K or route L.

The compiler would use this bit file description to provide a partial bit file for the inputs and outputs of slice X7Y1 in the first eBPF example described above, as follows:

Input from X6Y1 via route A

Input from X6Y5 via route C

By way of example, the bit file description for a slice XnYm may specify the following possible inputs and outputs:

Input from Xn-1Ym via route A or route B

Input from Xn-1Ym+4 via route C or route D

Input from XnYm-1 via route E or route F

Output to Xn+1Ym via route G or route H

Output to XnYm+1 via route I or route J

Output to XnYm+4 via route K or route L.

This bit file description may be modified to remove one or more routes that are not available for use by the compiler, as discussed previously. This may be because the route is used by another minimum unit or is used for routing across the partition.
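
One way to picture the common template is as a list of relative offsets that is instantiated for a concrete slice, with unavailable routes filtered out. The Python sketch below is an assumption-laden illustration: the `Port` class, `TEMPLATE` constant and `describe` function are hypothetical names, and a real bit file description would carry device-specific routing data rather than route letters.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Port:
    direction: str   # "in" or "out"
    dx: int          # column offset relative to slice XnYm
    dy: int          # row offset relative to slice XnYm
    routes: tuple    # candidate route names

# Template for a generic slice XnYm, following the example list above.
TEMPLATE = (
    Port("in", -1, 0, ("A", "B")),
    Port("in", -1, 4, ("C", "D")),
    Port("in", 0, -1, ("E", "F")),
    Port("out", 1, 0, ("G", "H")),
    Port("out", 0, 1, ("I", "J")),
    Port("out", 0, 4, ("K", "L")),
)

def describe(n, m, unavailable=frozenset()):
    """Instantiate the template for slice XnYm, dropping unavailable routes."""
    entries = []
    for port in TEMPLATE:
        routes = tuple(r for r in port.routes if r not in unavailable)
        if routes:  # omit a neighbour entirely if no route to it remains
            entries.append((port.direction, f"X{n + port.dx}Y{m + port.dy}", routes))
    return entries
```

Under these assumptions, `describe(7, 1)` reproduces the six entries given earlier for slice X7Y1, and passing `unavailable={"B"}` leaves only route A for the input from X6Y1.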

It should be appreciated that the compiler may be implemented by a computer program comprising computer-executable instructions that may be executed by one or more computer processors. The compiler may run on hardware such as at least one processor operating in conjunction with one or more memories.

It is noted that, while the above describes example embodiments, there are several variations and modifications that may be made to the disclosed solution without departing from the scope of the present invention.

The embodiments may thus vary within the scope of the appended claims. In general, some embodiments may be implemented in hardware or special-purpose circuits, software, logic or any combination thereof. For example, some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software that may be executed by a controller, microprocessor or other computing device, although embodiments are not limited thereto.

The embodiments may be implemented by computer software stored in a memory and executable by at least one data processor of the entities involved, or by hardware, or by a combination of software and hardware.

The software may be stored on physical media such as memory chips or memory blocks implemented within a processor, magnetic media such as a hard disk or floppy disk, and optical media such as, for example, DVD and the data variants thereof, and CD.

The memory may be of any type suitable to the local technical environment and may be implemented using any suitable data storage technology, such as semiconductor-based memory devices, magnetic memory devices and systems, optical memory devices and systems, fixed memory and removable memory.

The data processors may be of any type suitable to the local technical environment and may include, as non-limiting examples, one or more of general-purpose computers, special-purpose computers, microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), gate-level circuits and processors based on a multi-core processor architecture.

Various modifications and adaptations may become apparent to those skilled in the relevant arts in view of the foregoing description, when read in conjunction with the accompanying drawings and the appended claims. However, all such and similar modifications of the present teachings will still fall within the scope defined in the appended claims.

Claims

A network interface device for interfacing a host device to a network, the network interface device comprising:
a first interface configured to receive a plurality of data packets; and
a configurable hardware module comprising a plurality of processing units, each processing unit being associated with a predefined type of operation executable in a single step, at least some of the plurality of processing units being associated with different predefined types of operation,
wherein the hardware module is configured to interconnect at least some of the plurality of processing units to provide a first data processing pipeline for processing one or more of the plurality of data packets to perform a first function with respect to the one or more data packets, and
wherein at least two of the plurality of processing units within the first data processing pipeline are configured to perform their respective predefined types of operation in parallel.

In claim 1,
A network interface device, wherein the at least two of the plurality of processing units are configured to:
perform their associated predefined types of operation within a predefined length of time defined by a clock signal; and
in response to the expiry of the predefined length of time, transmit the results of their respective operations to a next processing unit.

In claim 1 or 2,
A network interface device, wherein each of said plurality of processing units includes an application specific integrated circuit configured to perform at least one operation associated with each of said processing units.

In claim 1 or 2,
At least one of the plurality of processing units includes a digital circuit and a memory that stores a state related to processing performed by the digital circuit,
A network interface device, wherein the digital circuit is configured to communicate with the memory and perform the predefined type of operation associated with each of the processing units.

In claim 1 or 2,
Two or more of the above plurality of processing units include accessible memory,
The above memory is configured to store a state associated with the first data packet,
A network interface device, wherein during performance of the first function by the hardware module, two or more of the plurality of processing units are configured to access and modify the state.

In claim 5,
A network interface device, wherein a first processing unit of the plurality of processing units is configured to stall during access of a value of the state by a second processing unit of the plurality of processing units.

In claim 1 or 2,
A network interface device, wherein one or more of the plurality of processing units are individually configurable to perform operations specific to each pipeline based on their associated predefined types of operations.

In claim 1 or 2,
A network interface device, wherein the hardware module is configured to receive a command and, in response to the command, perform at least one of:
interconnecting at least some of the plurality of processing units to provide a data processing pipeline for processing one or more of the plurality of data packets;
causing one or more of the plurality of processing units to perform its associated predefined type of operation with respect to one or more data packets;
adding one or more of the plurality of processing units to a data processing pipeline; and
removing one or more of the plurality of processing units from a data processing pipeline.

In claim 1 or 2,
A network interface device, wherein the predefined operations comprise at least one of:
loading at least one value of a first data packet from memory;
storing at least one value of a data packet in memory; and
performing a lookup in a lookup table to determine an action to be performed with respect to a data packet.

In claim 1 or 2,
At least one of said plurality of processing units is configured to transmit at least one result of its associated at least one predefined operation to a subsequent processing unit in said first data processing pipeline,
A network interface device, wherein the above-described subsequent processing unit is configured to perform a next predefined action depending on the at least one result.

In claim 1 or 2,
A network interface device, wherein each of the operations of the above different predefined types is defined by a different template.

In claim 1 or 2,
A network interface device, wherein the predefined types of operation comprise at least one of:
accessing a data packet;
accessing a lookup table stored in a memory of the hardware module;
performing a logical operation on data loaded from a data packet; and
performing a logical operation on data loaded from the lookup table.

In claim 1 or 2,
The above hardware module includes routing hardware,
A network interface device, wherein the hardware module is configured to interconnect at least some of the plurality of processing units to provide the first data processing pipeline by configuring the routing hardware to route data packets among the plurality of processing units in a specific order defined by the first data processing pipeline.

In claim 1 or 2,
A network interface device, wherein the hardware module is configured to interconnect at least some of the plurality of processing units to provide a second data processing pipeline for processing one or more of the plurality of data packets to perform a second function different from the first function.

In claim 1 or 2,
A network interface device, wherein the hardware module is configured to interconnect at least some of the plurality of processing units to provide a second data processing pipeline after interconnecting at least some of the plurality of processing units to provide a first data processing pipeline.

In claim 1 or 2,
A network interface device comprising additional circuitry separate from the hardware module and configured to perform the first function for one or more of the plurality of data packets.

In claim 16,
A network interface device, wherein the additional circuitry comprises at least one of:
a field programmable gate array; and
a plurality of central processing units.

In claim 16,
The above network interface device comprises at least one controller,
The additional circuitry is configured to perform the first function on a data packet during a compilation process for the first function to be performed in the hardware module,
A network interface device, wherein said at least one controller is configured to control said hardware module to start performing said first function on a data packet in response to completion of said compilation process.

In claim 18,
A network interface device, wherein said at least one controller is configured to control said additional circuitry to stop performing said first function for a data packet in response to a determination that said compilation process for said first function to be performed in said hardware module is complete.
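
The handover described in the two preceding claims can be sketched as a controller that steers packets to the additional circuitry while compilation is in progress and switches to the hardware module once compilation completes. The sketch below is a simplified model under stated assumptions: `Engine`, `Controller`, `on_compile_complete` and `dispatch` are hypothetical names, and the actual signalling between the components is not modelled.

```python
class Engine:
    """Hypothetical model of a block that can perform the first function."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.active = False
        self.handled = []

    def handle(self, packet: bytes) -> None:
        self.handled.append(packet)

class Controller:
    """Hypothetical controller switching between the two engines."""

    def __init__(self, hardware_module: Engine, additional_circuitry: Engine) -> None:
        self.hw = hardware_module
        self.extra = additional_circuitry
        # While the hardware-module pipeline is being compiled,
        # the additional circuitry performs the function.
        self.extra.active = True

    def on_compile_complete(self) -> None:
        # Start the hardware module, then stop the additional circuitry.
        self.hw.active = True
        self.extra.active = False

    def dispatch(self, packet: bytes) -> str:
        target = self.hw if self.hw.active else self.extra
        target.handle(packet)
        return target.name

hw = Engine("hardware_module")
extra = Engine("additional_circuitry")
ctrl = Controller(hw, extra)

before = ctrl.dispatch(b"pkt-1")  # handled by the additional circuitry
ctrl.on_compile_complete()
after = ctrl.dispatch(b"pkt-2")   # handled by the hardware module
```

The mirrored arrangement, in which the hardware module serves traffic while a function is compiled for the additional circuitry, would swap the roles of the two engines.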

In claim 16,
The network interface device comprises at least one controller,
The hardware module is configured to perform the first function on a data packet during a compilation process for the first function to be performed in the additional circuitry,
A network interface device, wherein said at least one controller is configured to determine that said compilation process for said first function to be performed in said additional circuitry is complete, and in response to said determination, control said additional circuitry to start performing said first function for a data packet.

In claim 20,
A network interface device, wherein said at least one controller is configured to control said hardware module to stop performing said first function for a data packet in response to a determination that said compilation process for said first function to be performed in said additional circuitry has been completed.

In claim 1 or 2,
A network interface device comprising at least one controller configured to perform a compilation process to provide the first function to be performed in the hardware module.

A data processing system comprising a network interface device according to claim 1 or claim 2 and a host device,
A data processing system comprising at least one controller configured to perform a compilation process to provide the first function to be performed in the hardware module.

In claim 23,
A data processing system, wherein the at least one controller is provided by one or more of:
The network interface device; and
The host device.

In claim 23,
A data processing system, wherein the compilation process is performed in response to a determination by the at least one controller that a computer program expressing the first function is safe for execution in a kernel mode of the host device.

In claim 23,
The at least one controller is configured to perform the compilation process by assigning each of the plurality of processing units to perform, in a specific order in the first data processing pipeline, at least one operation from a plurality of operations expressed as a sequence of computer code instructions,
A data processing system, wherein the plurality of operations provide the first function for at least one of the plurality of data packets.
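
The assignment step in the claim above can be illustrated with a toy compiler that walks a sequence of instructions in program order and binds each one to an unused processing unit of the matching predefined type. This is a sketch under simplifying assumptions (one operation per unit, first-fit selection); `compile_pipeline`, the unit names and the opcode strings are hypothetical.

```python
from typing import List, Tuple

UnitSpec = Tuple[str, str]  # (unit name, predefined operation type)

def compile_pipeline(
    instructions: List[str], units: List[UnitSpec]
) -> List[UnitSpec]:
    """Assign each instruction, in program order, to a free matching unit."""
    free = list(units)
    pipeline: List[UnitSpec] = []
    for opcode in instructions:
        match = next((u for u in free if u[1] == opcode), None)
        if match is None:
            raise ValueError(f"no free processing unit for {opcode!r}")
        free.remove(match)      # a unit is used at most once in this sketch
        pipeline.append(match)  # pipeline order follows instruction order
    return pipeline

units = [("u0", "load"), ("u1", "lookup"), ("u2", "and"), ("u3", "load")]
program = ["load", "and", "load", "lookup"]
pipeline = compile_pipeline(program, units)
# pipeline == [("u0", "load"), ("u2", "and"), ("u3", "load"), ("u1", "lookup")]
```

A real compiler would also have to respect routing and timing constraints, which this first-fit sketch ignores.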

In claim 25,
A data processing system, wherein the at least one controller is configured to:
Prior to completion of the compilation process, transmit a first command to cause additional circuitry of the network interface device to perform the first function on a data packet; and
Following completion of the compilation process, transmit a second command to cause the hardware module to start performing the first function on a data packet.

A method for implementation in a network interface device, the method comprising:
A step of receiving a plurality of data packets at a first interface; and
A step of configuring a hardware module to interconnect at least some of a plurality of processing units of the hardware module to provide a first data processing pipeline for processing one or more of the plurality of data packets to perform a first function on the one or more of the plurality of data packets,
Wherein each processing unit is associated with a predefined type of operation that can be executed in a single step,
At least some of the plurality of processing units are associated with operations of different predefined types, and
At least two of the plurality of processing units within the first data processing pipeline are configured to perform their respective predefined types of operations in parallel.
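
One possible reading of the parallel operation in the claim above is ordinary pipelining: on each clock tick, every occupied stage works on a different packet. The simulation below assumes that reading; the claim itself does not limit how the parallelism is achieved, and `run_pipeline` and the two toy stages are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

def run_pipeline(
    stages: List[Callable[[int], int]], packets: List[int]
) -> Tuple[List[int], List[int]]:
    """Clocked simulation: each tick, every occupied stage processes its packet."""
    slots: List[Optional[int]] = [None] * len(stages)
    pending = list(packets)
    done: List[int] = []
    busy_per_tick: List[int] = []
    while pending or any(p is not None for p in slots):
        if pending:
            slots[0] = pending.pop(0)  # slot 0 is free after the shift below
        # All occupied stages perform their single-step operation in the same tick.
        slots = [s(p) if p is not None else None for s, p in zip(stages, slots)]
        busy_per_tick.append(sum(p is not None for p in slots))
        if slots[-1] is not None:
            done.append(slots[-1])     # packet leaves the pipeline
        slots = [None] + slots[:-1]    # remaining packets advance one stage
    return done, busy_per_tick

stages = [lambda x: x + 1, lambda x: x * 2]
done, busy = run_pipeline(stages, [1, 2, 3])
# done == [4, 6, 8]; busy == [1, 2, 2, 1]
```

In the middle ticks both stages are busy at once, which is the sense in which two processing units operate in parallel in this sketch.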

A non-transitory computer-readable medium comprising program instructions for causing a network interface device to perform a method,
The method comprising:
A step of receiving a plurality of data packets at a first interface; and
A step of configuring a hardware module to interconnect at least some of a plurality of processing units of the hardware module to provide a first data processing pipeline for processing one or more of the plurality of data packets to perform a first function on the one or more of the plurality of data packets,
Wherein each processing unit is associated with a predefined type of operation that can be executed in a single step,
At least some of the plurality of processing units are associated with operations of different predefined types, and
At least two of the plurality of processing units within the first data processing pipeline are configured to perform their respective predefined types of operations in parallel.

(Deleted)