Crate nom
Nom, eating data byte by byte
The goal is a parser combinator library that is safe, supports streaming (push and pull), and is zero-copy as far as possible.
The code is available on GitHub.
Example
    #[macro_use]
    extern crate nom;

    use nom::{Consumer,ConsumerState,MemProducer,IResult};
    use nom::IResult::*;

    // Parser definition

    named!( om_parser, tag!( "om" ) );
    named!( nomnom_parser< &[u8], Vec<&[u8]> >, many1!( tag!( "nom" ) ) );
    named!( end_parser, tag!( "kthxbye") );

    // Streaming parsing and state machine

    #[derive(PartialEq,Eq,Debug)]
    enum State {
        Beginning,
        Middle,
        End,
        Done
    }

    struct TestConsumer {
        state: State,
        counter: usize
    }

    impl Consumer for TestConsumer {
        fn consume(&mut self, input: &[u8]) -> ConsumerState {
            match self.state {
                State::Beginning => {
                    match om_parser(input) {
                        Error(_) => ConsumerState::ConsumerError(0),
                        Incomplete(_) => ConsumerState::Await(0, 2),
                        Done(_,_) => {
                            // "om" was recognized, get to the next state
                            self.state = State::Middle;
                            ConsumerState::Await(2, 3)
                        }
                    }
                },
                State::Middle => {
                    match nomnom_parser(input) {
                        Error(a) => {
                            // the "nom" parser failed, let's get to the next state
                            self.state = State::End;
                            ConsumerState::Await(0, 7)
                        },
                        Incomplete(_) => ConsumerState::Await(0, 3),
                        Done(i,noms_vec) => {
                            // we got a few noms, let's count them and continue
                            self.counter = self.counter + noms_vec.len();
                            ConsumerState::Await(input.len() - i.len(), 3)
                        }
                    }
                },
                State::End => {
                    match end_parser(input) {
                        Error(_) => ConsumerState::ConsumerError(0),
                        Incomplete(_) => ConsumerState::Await(0, 7),
                        Done(_,_) => {
                            // we recognized the suffix, everything was parsed correctly
                            self.state = State::Done;
                            ConsumerState::ConsumerDone
                        }
                    }
                },
                State::Done => {
                    // this should not be called
                    ConsumerState::ConsumerError(42)
                }
            }
        }

        fn failed(&mut self, error_code: u32) {
            println!("failed with error code: {}", error_code);
        }

        fn end(&mut self) {
            println!("we counted {} noms", self.counter);
        }
    }

    fn main() {
        let mut p = MemProducer::new(b"omnomnomnomkthxbye", 4);
        let mut c = TestConsumer{state: State::Beginning, counter: 0};
        c.run(&mut p);

        assert_eq!(c.counter, 3);
        assert_eq!(c.state, State::Done);
    }
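The example matches every parser result against three outcomes: `Done`, `Error` and `Incomplete`. The sketch below illustrates that model with plain Rust and no dependency on nom. `ParseResult` and this `tag` function are hypothetical stand-ins written for illustration, not the crate's actual definitions; the point is only that a successful parse returns slices borrowed from the input (zero copy) and that a short input is reported separately from a mismatch.

```rust
// Hypothetical stand-in for nom's IResult. The variant names mirror the
// example above, but this is NOT the crate's actual definition.
#[derive(Debug, PartialEq)]
enum ParseResult<'a> {
    /// Parsing succeeded: (remaining input, recognized bytes).
    Done(&'a [u8], &'a [u8]),
    /// The input does not match.
    Error,
    /// The input matches so far, but this many more bytes are needed to decide.
    Incomplete(usize),
}

/// Recognizes `expected` at the start of `input` without copying any bytes.
fn tag<'a>(expected: &[u8], input: &'a [u8]) -> ParseResult<'a> {
    // Compare only the bytes that are available in both slices.
    let n = expected.len().min(input.len());
    if input[..n] != expected[..n] {
        return ParseResult::Error;
    }
    // Everything seen so far matches, but the input ended too early.
    if input.len() < expected.len() {
        return ParseResult::Incomplete(expected.len() - input.len());
    }
    // Both returned slices borrow from `input`; nothing is allocated.
    ParseResult::Done(&input[expected.len()..], &input[..expected.len()])
}
```

Under these assumptions, `tag(b"om", b"omnom")` yields `Done(b"nom", b"om")`, `tag(b"kthxbye", b"kthx")` yields `Incomplete(3)`, and `tag(b"om", b"nom")` yields `Error`. A streaming consumer can react to the `Incomplete` case by asking for more data instead of failing.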
Macros
| alt! |
tries a list of parsers and returns the result of the first successful one |
| alt_parser! |
Internal parser, do not use directly |
| apply! | |
| call! |
Used to wrap common expressions and function as macros |
| chain! |
chains parsers and assembles the results through a closure |
| chaining_parser! |
Internal parser, do not use directly |
| closure! |
Wraps a parser in a closure |
| cond! |
Conditional combinator |
| count! |
Applies the child parser a specified number of times |
| count_fixed! |
Applies the child parser a fixed number of times and returns a fixed size array |
| dbg! |
Prints a message if the parser fails |
| dbg_dmp! |
Prints a message and the input if the parser fails |
| delimited! |
delimited(opening, X, closing) returns X |
| error! |
Prevents backtracking if the child parser fails |
| expr_opt! |
evaluates an expression that returns an Option |
| expr_res! |
evaluates an expression that returns a Result |
| filter! |
returns the longest list of bytes until the provided parser fails |
| flat_map! |
flat_map! combines a parser R -> IResult |
| is_a! |
returns the longest list of bytes that appear in the provided array |
| is_not! |
returns the longest list of bytes that do not appear in the provided array |
| length_value! |
returns |
| many0! |
Applies the parser 0 or more times and returns the list of results in a Vec |
| many1! |
Applies the parser 1 or more times and returns the list of results in a Vec |
| map! |
maps a function on the result of a parser |
| map_opt! |
maps a function returning an Option on the output of a parser |
| map_res! |
maps a function returning a Result on the output of a parser |
| named! |
Makes a function from a parser combination |
| opt! |
makes the underlying parser optional |
| pair! |
pair(X,Y), returns (x,y) |
| peek! |
returns a result without consuming the input |
| preceded! |
preceded(opening, X) returns X |
| pusher! |
Prepares a parser function for a push pipeline |
| separated_list! |
separated_list(sep, X) returns Vec |
| separated_nonempty_list! |
separated_nonempty_list(sep, X) returns Vec |
| separated_pair! |
separated_pair(X,sep,Y) returns (x,y) |
| tag! |
declares a byte array as a sequence to recognize |
| take! |
generates a parser consuming the specified number of bytes |
| take_str! |
same as take! but returning a &str |
| take_until! | |
| take_until_and_consume! |
generates a parser consuming bytes until the specified byte sequence is found |
| take_until_either! | |
| take_until_either_and_consume! | |
| terminated! |
terminated(X, closing) returns X |
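To make the semantics of a few of the combinators above concrete, here is a minimal sketch of `tag!`, `alt!`, `many1!` and `delimited!` written as plain functions. It rests on a simplifying assumption: a parser is any function from input to `Option<(remaining input, output)>`, so errors and incomplete input are collapsed into `None`. The `Res` alias and the function signatures are hypothetical and do not reflect how the macros are implemented in the crate.

```rust
// Assumption for illustration: a parser returns Some((remaining input, output))
// on success and None otherwise. nom's IResult is richer than this.
type Res<'a, O> = Option<(&'a [u8], O)>;

/// Like tag!: recognizes a fixed byte sequence at the start of the input.
fn tag<'a>(expected: &[u8], input: &'a [u8]) -> Res<'a, &'a [u8]> {
    if input.starts_with(expected) {
        let (matched, rest) = input.split_at(expected.len());
        Some((rest, matched))
    } else {
        None
    }
}

/// Like alt!: tries `first`, then `second`, and returns the first success.
fn alt<'a, O>(
    first: impl Fn(&'a [u8]) -> Res<'a, O>,
    second: impl Fn(&'a [u8]) -> Res<'a, O>,
    input: &'a [u8],
) -> Res<'a, O> {
    first(input).or_else(|| second(input))
}

/// Like many1!: applies `parser` one or more times and collects the outputs.
fn many1<'a, O>(parser: impl Fn(&'a [u8]) -> Res<'a, O>, input: &'a [u8]) -> Res<'a, Vec<O>> {
    let mut outputs = Vec::new();
    let mut rest = input;
    while let Some((next, out)) = parser(rest) {
        // Stop if the parser succeeded without consuming anything,
        // otherwise the loop would never terminate.
        if next.len() == rest.len() {
            break;
        }
        outputs.push(out);
        rest = next;
    }
    if outputs.is_empty() { None } else { Some((rest, outputs)) }
}

/// Like delimited!: runs opening, inner, closing and keeps only inner's output.
fn delimited<'a, A, O, B>(
    opening: impl Fn(&'a [u8]) -> Res<'a, A>,
    inner: impl Fn(&'a [u8]) -> Res<'a, O>,
    closing: impl Fn(&'a [u8]) -> Res<'a, B>,
    input: &'a [u8],
) -> Res<'a, O> {
    let (rest, _) = opening(input)?;
    let (rest, out) = inner(rest)?;
    let (rest, _) = closing(rest)?;
    Some((rest, out))
}
```

With these definitions, `many1(|i| tag(b"nom", i), b"nomnomkthx")` returns the two recognized `nom` slices together with the remaining `kthx`, and `delimited` over `(`, `nom`, `)` applied to `(nom)!` returns `nom` with `!` left over.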
Structs
| FileProducer |
Can produce data from a file |
| MemProducer |
Can produce data from a byte array that is already in memory |
Enums
| ConsumerState |
Holds the current state of the consumer |
| Err | |
| ErrorCode | |
| IResult |
Holds the result of parsing functions |
| Needed | |
| ProducerState |
Holds the data producer's current state |
Traits
| AsBytes | |
| Consumer |
Implement the consume method, taking a byte array as input and returning a consumer state |
| GetInput | |
| GetOutput | |
| HexDisplay | |
| Producer |
A producer implements the produce method, currently working with u8 arrays |
Functions
| add_error_pattern | |
| alpha |
Recognizes lowercase and uppercase alphabetic characters: a-zA-Z |
| alphanumeric |
Recognizes numerical and alphabetic characters: 0-9a-zA-Z |
| be_f32 |
Recognizes a big-endian 4-byte floating point number |
| be_f64 |
Recognizes a big-endian 8-byte floating point number |
| be_i16 |
Recognizes a big-endian signed 2-byte integer |
| be_i32 |
Recognizes a big-endian signed 4-byte integer |
| be_i64 |
Recognizes a big-endian signed 8-byte integer |
| be_i8 |
Recognizes a big-endian signed 1-byte integer |
| be_u16 |
Recognizes a big-endian unsigned 2-byte integer |
| be_u32 |
Recognizes a big-endian unsigned 4-byte integer |
| be_u64 |
Recognizes a big-endian unsigned 8-byte integer |
| be_u8 |
Recognizes a big-endian unsigned 1-byte integer |
| begin | |
| code_from_offset | |
| compare_error_paths | |
| digit |
Recognizes numerical characters: 0-9 |
| eof |
Recognizes empty input buffers |
| error_to_list | |
| generate_colors | |
| is_alphabetic | |
| is_alphanumeric | |
| is_digit | |
| is_space | |
| le_i16 |
Recognizes a little-endian signed 2-byte integer |
| le_i32 |
Recognizes a little-endian signed 4-byte integer |
| le_i64 |
Recognizes a little-endian signed 8-byte integer |
| le_i8 |
Recognizes a little-endian signed 1-byte integer |
| le_u16 |
Recognizes a little-endian unsigned 2-byte integer |
| le_u32 |
Recognizes a little-endian unsigned 4-byte integer |
| le_u64 |
Recognizes a little-endian unsigned 8-byte integer |
| le_u8 |
Recognizes a little-endian unsigned 1-byte integer |
| length_value | |
| line_ending |
Recognizes a line feed |
| multispace |
Recognizes spaces, tabs, carriage returns and line feeds |
| not_line_ending | |
| prepare_errors | |
| print_codes | |
| print_error | |
| print_offsets | |
| reset_color | |
| sized_buffer | |
| slice_to_offsets | |
| space |
Recognizes spaces and tabs |
| tag_cl | |
| write_color | |
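The `be_*` and `le_*` functions listed above differ only in the byte order they assume. The following sketch shows that difference using the standard library's `from_be_bytes` and `from_le_bytes`. The function names ending in `_sketch` and the `Option`-based return type are hypothetical, chosen for illustration; they are not the signatures of the functions in this crate.

```rust
/// Reads a big-endian u16 from the front of `input`.
/// Returns (remaining input, value), or None if fewer than 2 bytes are available.
fn be_u16_sketch(input: &[u8]) -> Option<(&[u8], u16)> {
    if input.len() < 2 {
        return None;
    }
    // Most significant byte first.
    let value = u16::from_be_bytes([input[0], input[1]]);
    Some((&input[2..], value))
}

/// Reads a little-endian u16 from the front of `input`.
fn le_u16_sketch(input: &[u8]) -> Option<(&[u8], u16)> {
    if input.len() < 2 {
        return None;
    }
    // Least significant byte first.
    let value = u16::from_le_bytes([input[0], input[1]]);
    Some((&input[2..], value))
}

/// Reads a big-endian signed 16-bit integer (two's complement).
fn be_i16_sketch(input: &[u8]) -> Option<(&[u8], i16)> {
    if input.len() < 2 {
        return None;
    }
    let value = i16::from_be_bytes([input[0], input[1]]);
    Some((&input[2..], value))
}
```

For the bytes `[0x01, 0x02, 0xff]`, the big-endian reader produces `0x0102` and the little-endian reader produces `0x0201`; both leave `[0xff]` as the remaining input.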