Berkeley Packet Filter Instruction Set Architecture
Since the publication of The BSD Packet Filter: A New Architecture for User-level Packet Capture ("the 1993 specification" hereinafter) both the reference userspace implementation (libpcap) and various kernel implementations have developed and at times diverged from the specification and one another in various ways. This page compares libpcap, other BPF implementations and the 1993 specification, and clarifies various points of the specification where required.
Mind that in BPF a "byte" is an 8-bit unsigned integer, a "halfword" is a 16-bit unsigned integer, a "word" is a 32-bit unsigned integer and an instruction is a 64-bit structure. Loading a byte or a halfword into a register implicitly resets all bits that are not present in the loaded value. The BPF machine performs all initialization before every start of the program.
Registers, Memory and Data
| name | program access mode, type, size | description | initialization |
| increment-only, register, unspecified-size integer |
The implicit program counter. To execute a step of the program, the
BPF machine fetches the instruction this register points to,
increments the register by one, executes the instruction, and, if the
instruction is a branch instruction, again increments the register by
the immediate value of L, Lt or
Lf.
|
to point to the first statement of the program | |
| a | read-write, register, word | The accumulator register. | to 0 |
| x | read-write, register, word | The index register. | to 0 |
| M[] | read-write, memory, array of words | The scratch memory store. A program that attempts accessing beyond the end of the array is invalid. In libpcap an attempt to use such a program fails, the scratch memory store comprises 16 words. | to 0 in every element |
| pktlen[1] | read-only, register, word | The packet length register. | to the wire length |
| [] | read-only, data, array of bytes |
The packet data. The number of elements in the array is unknown, but
typically is greater than zero and different from one packet to
another; the number does not exceed the value of pktlen.
Attempting a load from beyond the end of the array immediately
terminates the program with the effect of ret #0.
|
to the captured length worth of the packet bytes |
| random[7] | read-only, register, word | The random value register. Each load from this register produces a random word. |
Instruction Set
| group | mnemonic | addressing mode(s) | |||||||||||||||||||||
| load | ldb |
|
|||||||||||||||||||||
| ldh | |||||||||||||||||||||||
| ld | |||||||||||||||||||||||
| ldx | |||||||||||||||||||||||
| ldxb[2] | 4*([k]&0xf) | ||||||||||||||||||||||
| store | st | M[k] | |||||||||||||||||||||
| stx | M[k] | ||||||||||||||||||||||
| branch | ja[3] | L | |||||||||||||||||||||
| jeq | #k, Lt, Lf | x, Lt, Lf[4] | |||||||||||||||||||||
| jgt | #k, Lt, Lf | x, Lt, Lf[4] | |||||||||||||||||||||
| jge | #k, Lt, Lf | x, Lt, Lf[4] | |||||||||||||||||||||
| jset | #k, Lt, Lf | x, Lt, Lf[4] | |||||||||||||||||||||
| ALU | neg[5] | ||||||||||||||||||||||
| add | #k | x | |||||||||||||||||||||
| sub | #k | x | |||||||||||||||||||||
| mul | #k | x | |||||||||||||||||||||
| div | #k | x | |||||||||||||||||||||
| mod[6] | #k | x | |||||||||||||||||||||
| xor[6] | #k | x | |||||||||||||||||||||
| and | #k | x | |||||||||||||||||||||
| or | #k | x | |||||||||||||||||||||
| lsh | #k | x | |||||||||||||||||||||
| rsh | #k | x | |||||||||||||||||||||
| return | ret | #k | a | ||||||||||||||||||||
| miscellaneous | tax | ||||||||||||||||||||||
| txa | |||||||||||||||||||||||
| cop[8] | #k, a | ||||||||||||||||||||||
| copx[8] | x, a | ||||||||||||||||||||||
Addressing Modes
| mode | description |
| #k |
The immediate value k, that is, the word in the k
field of the instruction.
|
|
a x |
The current value in the register. |
|
#pktlen[1] #random[7] |
The value in the register. Note that although this address mode
traditionally prefixes a register name with #, the value
comes from the register rather than from the instruction.
|
| M[k] |
The word at the word offset k in the scratch memory store:
M[0] is the first word, M[1] is the second
word and so on.
|
| [k] | The byte, halfword, or word at the byte offset k in the packet. Offset 0 points to the first byte of the packet. Offset 1 points to the second byte of the packet regardless if the instruction loads a byte, a halfword or a word, and so on. The offset does not have to be a multiple of the data size. |
| [x+k] | Same as the above, but the offset is the sum of the current value in the index register and the immediate value k. |
| L |
The offset from the current instruction to L. Zero means the next
instruction after the current. Note that although the traditional
notation for this address mode is not #k, the offset is
the immediate value in the k field of the instruction.
|
| #k, Lt, Lf |
The offset, in the same sense as the above, to Lt if the
predicate is true, otherwise the offset to Lf. The
first (implicit) operand of the predicate is the current value in the
accumulator and the second (explicit) operand is the immediate value
k.
|
| x, Lt, Lf[4] | Same as the above, but the second operand is the current value in the index register. |
| 4*([k]&0xf) | Four times the value of the low four bits of the byte at offset k in the packet. |
Footnotes
-
In libpcap
pktlen is the same as len in the
1993 specification.
-
For the instruction that loads a 4-bit value from the packet and stores
a 6-bit value into the index register the mnemonic is
ldxb
in libpcap and ldx in the 1993 specification.
-
In libpcap
ja L ("jump always") is the same as
jmp L in the 1993 specification.
-
In libpcap all conditional branch instructions can also take the second
operand from the index register. The 1993 specification says this in
Section 3.3, but not in tables 1 and 2.
-
The
neg (negation) ALU instruction is not a part of the
1993 specification. libpcap implements it, kernel BPF implementations
typically implement it. This instruction does the same as
tax; ld #0; sub x, except it does not change the
x register.
-
The
mod (modulo) and xor (bitwise XOR) ALU
instructions are not a part of the 1993 specification. These
instructions have the same semantics as the ALU instructions defined
in the 1993 specification. Only FreeBSD ≥ 12.0, Linux ≥ 3.7,
NetBSD ≥ 8.0, and Npcap ≥ 1.81 support these instructions in
their kernel-mode BPF implementations.
-
The
random register is not a part of the 1993
specification, it is an extension that only OpenBSD kernel and OpenBSD
libpcap implement.
-
The
cop and copx instructions are not a part
of the 1993 specification, these are an extension that only NetBSD
kernel implements. Note that NetBSD defines these as two distinct
instructions rather than one instruction with two addressing modes.
pktlen is the same as len in the
1993 specification.
ldxb
in libpcap and ldx in the 1993 specification.
ja L ("jump always") is the same as
jmp L in the 1993 specification.
neg (negation) ALU instruction is not a part of the
1993 specification. libpcap implements it, kernel BPF implementations
typically implement it. This instruction does the same as
tax; ld #0; sub x, except it does not change the
x register.
mod (modulo) and xor (bitwise XOR) ALU
instructions are not a part of the 1993 specification. These
instructions have the same semantics as the ALU instructions defined
in the 1993 specification. Only FreeBSD ≥ 12.0, Linux ≥ 3.7,
NetBSD ≥ 8.0, and Npcap ≥ 1.81 support these instructions in
their kernel-mode BPF implementations.
random register is not a part of the 1993
specification, it is an extension that only OpenBSD kernel and OpenBSD
libpcap implement.
cop and copx instructions are not a part
of the 1993 specification, these are an extension that only NetBSD
kernel implements. Note that NetBSD defines these as two distinct
instructions rather than one instruction with two addressing modes.