arrow-avro
Transfer data between the Apache Arrow memory format and Apache Avro.
This crate provides:
- a reader that decodes Avro
  - Object Container Files (OCF),
  - Avro Single‑Object Encoding (SOE), and
  - Confluent Schema Registry wire format

  into Arrow RecordBatches; and
- a writer that encodes Arrow RecordBatches into Avro (OCF or SOE).
The latest API docs for main (unreleased) are published on the Arrow website: arrow_avro.
Install
[dependencies]
arrow-avro = "57.0.0"
Disable defaults and pick only what you need (see Feature Flags):
[dependencies]
arrow-avro = { version = "57.0.0", default-features = false, features = ["deflate", "snappy"] }
Quick start
Read an Avro OCF file into Arrow
use std::fs::File;
use std::io::BufReader;
use arrow_avro::reader::ReaderBuilder;
use arrow_array::RecordBatch;
Write Arrow to Avro OCF (in‑memory)
use std::sync::Arc;
use arrow_avro::writer::AvroWriter;
use ;
use ;
See the crate docs for runnable SOE and Confluent round‑trip examples.
Feature Flags (what they do and when to use them)
Compression codecs (OCF block compression)
arrow-avro supports the Avro‑standard OCF codecs. The defaults include all five: deflate, snappy, zstd, bzip2, and xz.
| Feature | Default | What it enables | When to use |
|---|---|---|---|
| deflate | ✅ | DEFLATE compression via flate2 (pure‑Rust backend) | Most compatible; widely supported; good compression, slower than Snappy. |
| snappy | ✅ | Snappy block compression via snap with CRC‑32 as required by Avro | Fastest decode/encode; common in streaming/data‑lake pipelines. (Avro requires a 4‑byte big‑endian CRC of the uncompressed block.) |
| zstd | ✅ | Zstandard block compression via zstd | Great compression/speed trade‑off on modern systems. May pull in a native library. |
| bzip2 | ✅ | BZip2 block compression | For compatibility with older datasets that used BZip2. Slower; larger deps. |
| xz | ✅ | XZ/LZMA block compression | Highest compression for archival data; slowest; larger deps. |
Avro defines these codecs for OCF: null (no compression), deflate, snappy, bzip2, xz, and zstandard (recent spec versions).
Notes
- Only OCF uses these codecs (they compress per‑block). They do not apply to raw Avro frames used by Confluent wire format or SOE. The crate’s compression module is specifically for OCF blocks.
- deflate uses flate2 with the rust_backend (no system zlib required).
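To make the Snappy row concrete, here is a minimal, dependency‑free sketch of the checksum trailer that Avro places after each Snappy‑compressed block: the 4‑byte big‑endian CRC‑32 of the uncompressed data. The function names are hypothetical and this is not arrow-avro's internal code; the bytes passed as `compressed` are assumed to have been produced by a Snappy encoder elsewhere.

```rust
/// Bitwise CRC-32 (reflected polynomial 0xEDB88320), the same checksum
/// that zlib computes. Slow but short; real code would use a table or a crate.
fn crc32(data: &[u8]) -> u32 {
    let mut crc: u32 = 0xFFFF_FFFF;
    for &byte in data {
        crc ^= u32::from(byte);
        for _ in 0..8 {
            // All-ones mask when the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Appends the 4-byte big-endian CRC-32 of `uncompressed` to `compressed`.
/// Assumption: `compressed` already holds the Snappy-compressed block bytes.
fn append_snappy_trailer(mut compressed: Vec<u8>, uncompressed: &[u8]) -> Vec<u8> {
    compressed.extend_from_slice(&crc32(uncompressed).to_be_bytes());
    compressed
}

/// Splits a stored block into (compressed bytes, expected CRC).
/// Returns None when the block is too short to contain a trailer.
fn split_snappy_trailer(block: &[u8]) -> Option<(&[u8], u32)> {
    if block.len() < 4 {
        return None;
    }
    let (payload, t) = block.split_at(block.len() - 4);
    let crc = u32::from_be_bytes([t[0], t[1], t[2], t[3]]);
    Some((payload, crc))
}
```

A reader would decompress the payload, recompute `crc32` over the result, and reject the block if it differs from the stored value.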
Schema fingerprints & custom logical type helpers
| Feature | Default | What it enables | When to use |
|---|---|---|---|
| md5 | ⬜ | md5 dep for optional MD5 schema fingerprints | If you want to compute MD5 fingerprints of writer schemas (e.g. for custom prefixing/validation). |
| sha256 | ⬜ | sha2 dep for optional SHA‑256 schema fingerprints | If you prefer longer fingerprints; affects max prefix length (e.g. when framing). |
| small_decimals | ⬜ | Extra handling for small decimal logical types (Decimal32 and Decimal64) | If your Avro decimal values are small and you want more compact Arrow representations. |
| avro_custom_types | ⬜ | Annotates Avro values using Arrow‑specific custom logical types | Enable when you need arrow-avro to reinterpret certain Avro fields as Arrow types that Avro doesn’t natively model. |
| canonical_extension_types | ⬜ | Re‑exports Arrow’s canonical extension types support from arrow-schema | Enable if your workflow uses Arrow canonical extension types and you want arrow-avro to respect them. |
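For readers unfamiliar with schema fingerprints: besides the optional MD5 and SHA‑256 variants in the table, the Avro specification defines a 64‑bit Rabin fingerprint, CRC‑64‑AVRO, which is the one SOE framing carries. The sketch below follows the pseudocode in the Avro specification. The function names are hypothetical and this is not arrow-avro's API; the input is assumed to be the schema text already in Parsing Canonical Form.

```rust
/// Seed constant from the Avro specification. It is also the fingerprint
/// of an empty input, since the loop below never runs in that case.
const EMPTY: u64 = 0xc15d_213a_a4d7_a795;

/// Builds the 256-entry lookup table described in the Avro specification.
fn fingerprint_table() -> [u64; 256] {
    let mut table = [0u64; 256];
    for i in 0..256 {
        let mut fp = i as u64;
        for _ in 0..8 {
            // XOR in EMPTY whenever the bit shifted out is 1.
            fp = (fp >> 1) ^ (EMPTY & (fp & 1).wrapping_neg());
        }
        table[i] = fp;
    }
    table
}

/// CRC-64-AVRO over `canonical_schema`.
/// Assumption: the caller has already canonicalized the schema JSON.
fn crc64_avro(canonical_schema: &[u8]) -> u64 {
    // Rebuilt on every call for brevity; cache it in real code.
    let table = fingerprint_table();
    let mut fp = EMPTY;
    for &byte in canonical_schema {
        let index = ((fp ^ u64::from(byte)) & 0xff) as usize;
        fp = (fp >> 8) ^ table[index];
    }
    fp
}
```

When the value is written into an SOE header it is serialized little‑endian, for example with `crc64_avro(schema).to_le_bytes()`.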
Lower‑level/internal toggles (rarely used directly)
flate2, snap, crc, zstd, bzip2, xz are optional dependencies wired to the user‑facing features above. You normally enable deflate/snappy/zstd/bzip2/xz, not these directly.
Feature snippets
- Minimal, fast build (common pipelines):
  arrow-avro = { version = "56", default-features = false, features = ["deflate", "snappy"] }
- Include Zstandard too (modern data lakes):
  arrow-avro = { version = "56", default-features = false, features = ["deflate", "snappy", "zstd"] }
- Fingerprint helpers:
  arrow-avro = { version = "56", features = ["md5", "sha256"] }
What formats are supported?
- OCF (Object Container Files): self‑describing Avro files with header, optional compression, sync markers; reader and writer supported.
- Confluent Schema Registry wire format: 1‑byte magic 0x00 + 4‑byte BE schema ID + Avro body; supports decode + encode helpers.
- Avro Single‑Object Encoding (SOE): 2‑byte magic 0xC3 0x01 + 8‑byte LE CRC‑64‑AVRO fingerprint + Avro body; supports decode + encode helpers.
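The byte layouts in this list can be sketched with the standard library alone. The example below builds Confluent and SOE prefixes and classifies a buffer by its leading bytes. All names (`Frame`, `classify`, `frame_confluent`, `frame_soe`) are hypothetical illustrations, not arrow-avro's API; the OCF magic `Obj\x01` comes from the Avro specification rather than from this README. Checking magic bytes is only a heuristic, since an arbitrary payload can start with the same bytes.

```rust
/// The three framings, as recognized from the start of a buffer.
#[derive(Debug, PartialEq)]
enum Frame<'a> {
    /// OCF files start with the 4-byte magic `Obj\x01`.
    Ocf,
    /// 0x00 + 4-byte big-endian schema ID + Avro body.
    Confluent { schema_id: u32, body: &'a [u8] },
    /// 0xC3 0x01 + 8-byte little-endian fingerprint + Avro body.
    Soe { fingerprint: u64, body: &'a [u8] },
}

/// Heuristically classifies `buf` by its magic bytes.
fn classify<'a>(buf: &'a [u8]) -> Option<Frame<'a>> {
    if buf.starts_with(b"Obj\x01") {
        return Some(Frame::Ocf);
    }
    if buf.starts_with(&[0xC3, 0x01]) && buf.len() >= 10 {
        let mut fp = [0u8; 8];
        fp.copy_from_slice(&buf[2..10]);
        let fingerprint = u64::from_le_bytes(fp);
        return Some(Frame::Soe { fingerprint, body: &buf[10..] });
    }
    if buf.first() == Some(&0x00) && buf.len() >= 5 {
        let mut id = [0u8; 4];
        id.copy_from_slice(&buf[1..5]);
        let schema_id = u32::from_be_bytes(id);
        return Some(Frame::Confluent { schema_id, body: &buf[5..] });
    }
    None
}

/// Prepends the Confluent prefix to an already-encoded Avro body.
fn frame_confluent(schema_id: u32, body: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(5 + body.len());
    out.push(0x00);
    out.extend_from_slice(&schema_id.to_be_bytes());
    out.extend_from_slice(body);
    out
}

/// Prepends the SOE prefix to an already-encoded Avro body.
fn frame_soe(fingerprint: u64, body: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(10 + body.len());
    out.extend_from_slice(&[0xC3, 0x01]);
    out.extend_from_slice(&fingerprint.to_le_bytes());
    out.extend_from_slice(body);
    out
}
```

Note the differing byte orders: the Confluent schema ID is big‑endian, while the SOE fingerprint is little‑endian. In both cases the prefix only identifies the writer schema; decoding the body still requires resolving that schema, for instance from a registry or a local fingerprint‑to‑schema map.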
Examples
- Read/write OCF in memory and from files (see crate docs “OCF round‑trip”).
- Confluent wire‑format and SOE quickstarts are provided as runnable snippets in docs.
There are additional examples under arrow-avro/examples/ in the repository.