The central type in Apache Arrow is the array, a known-length sequence of values
that all have the same type. This crate provides a concrete implementation for each
data type, as well as an [Array] trait that can be used for type erasure.
Downcasting an Array
Arrays are often passed around as a dynamically typed &dyn Array or [ArrayRef].
For example, RecordBatch stores columns as [ArrayRef].
Whilst these arrays can be passed directly to the compute, csv, json, and similar APIs,
you will often want to interact with the data directly.
This requires downcasting to the concrete type of the array:
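use arrow_array::{Array, Float32Array};
// Note: the values for positions corresponding to nulls will be arbitrary
fn as_f32_slice(array: &dyn Array) -> &[f32] {
    array.as_any().downcast_ref::<Float32Array>().unwrap().values()
}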
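The downcast itself relies on Rust's `std::any::Any`. As a rough illustration, the following std-only sketch models the pattern with hypothetical `ToyArray`, `ToyInt32Array`, and `ToyStringArray` types (none of these are part of this crate). It also shows one way to handle a type mismatch without panicking, assuming the caller prefers an `Option` over `unwrap`:

```rust
use std::any::Any;

/// Hypothetical stand-in for a type-erased array trait.
trait ToyArray {
    /// Exposes the concrete value as `&dyn Any` so callers can downcast it.
    fn as_any(&self) -> &dyn Any;
    fn len(&self) -> usize;
}

/// Hypothetical concrete array types.
struct ToyInt32Array(Vec<i32>);
struct ToyStringArray(Vec<String>);

impl ToyArray for ToyInt32Array {
    fn as_any(&self) -> &dyn Any {
        self
    }
    fn len(&self) -> usize {
        self.0.len()
    }
}

impl ToyArray for ToyStringArray {
    fn as_any(&self) -> &dyn Any {
        self
    }
    fn len(&self) -> usize {
        self.0.len()
    }
}

/// Sums the array if it holds 32-bit integers, otherwise returns `None`
/// instead of panicking.
fn try_sum_int32(array: &dyn ToyArray) -> Option<i32> {
    let ints = array.as_any().downcast_ref::<ToyInt32Array>()?;
    Some(ints.0.iter().sum())
}
```

Calling `try_sum_int32` on a `ToyStringArray` yields `None`, whereas an `unwrap` on the failed downcast would panic.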
Additionally, there are convenience functions that perform this cast,
such as [cast::as_primitive_array<T>] and [cast::as_string_array]:
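use arrow_array::Array;
use arrow_array::cast::as_primitive_array;
use arrow_array::types::Float32Type;
fn as_f32_slice(array: &dyn Array) -> &[f32] {
    as_primitive_array::<Float32Type>(array).values()
}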
Building an Array
Most [Array] implementations can be constructed directly from iterators or a [Vec]:
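use arrow_array::{Int32Array, ListArray, StringArray};
use arrow_array::types::Int32Type;
Int32Array::from(vec![1, 2]);
Int32Array::from(vec![Some(1), None]);
Int32Array::from_iter([1, 2, 3, 4]);
Int32Array::from_iter([Some(1), Some(2), None, Some(4)]);
StringArray::from(vec!["foo", "bar"]);
StringArray::from(vec![Some("foo"), None]);
StringArray::from_iter([Some("foo"), None]);
StringArray::from_iter_values(["foo", "bar"]);
ListArray::from_iter_primitive::<Int32Type, _, _>([
    Some(vec![Some(1), None, Some(3)]),
    None,
    Some(vec![]),
]);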
Additionally, ArrayBuilder implementations can be
used to construct arrays with a push-based interface:
use arrow_array::Int16Array;

// Create a new builder with a capacity of 100
let mut builder = Int16Array::builder(100);
// Append a single primitive value
builder.append_value(1);
// Append a null value
builder.append_null();
// Append a slice of primitive values
builder.append_slice(&[2, 3, 4]);
// Build the array
let array = builder.finish();
assert_eq!(5, array.len());
assert_eq!(2, array.value(2));
assert_eq!(&array.values()[3..5], &[3, 4]);
Zero-Copy Slicing
Given an [Array] of arbitrary length, it is possible to create an owned slice of this
data. Internally, this only increments some reference counts, so it is very cheap:
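use std::sync::Arc;
use arrow_array::{Array, ArrayRef, Int32Array};
let array = Arc::new(Int32Array::from_iter([1, 2, 3])) as ArrayRef;
// Slice with offset 1 and length 2
let sliced = array.slice(1, 2);
let ints = sliced.as_any().downcast_ref::<Int32Array>().unwrap();
assert_eq!(ints.values(), &[2, 3]);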
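To make the "increments some ref-counts" point concrete, here is a minimal std-only sketch of how an owned slice can share its parent's allocation. `ToySlice` and its fields are hypothetical and only model the idea; the crate's real buffer types are more involved:

```rust
use std::sync::Arc;

/// Hypothetical model of a sliceable array: a shared buffer plus a window into it.
struct ToySlice {
    buffer: Arc<Vec<i32>>,
    offset: usize,
    len: usize,
}

impl ToySlice {
    fn new(values: Vec<i32>) -> Self {
        let len = values.len();
        Self { buffer: Arc::new(values), offset: 0, len }
    }

    /// Returns an owned slice. Only the reference count of the shared
    /// buffer changes; no element is copied.
    fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        Self {
            buffer: Arc::clone(&self.buffer),
            offset: self.offset + offset,
            len,
        }
    }

    /// The elements visible through this window.
    fn values(&self) -> &[i32] {
        &self.buffer[self.offset..self.offset + self.len]
    }
}
```

In this model, the slice and its parent point at the same memory, and `Arc::strong_count` on the buffer goes from 1 to 2 after a single call to `slice`.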
Internal Representation
Internally, arrays are represented by one or more Buffers, the number and meaning of
which depend on the array’s data type, as documented in the Arrow specification.
For example, the type [Int16Array] represents an array of 16-bit integers and consists of:
- An optional NullBuffer identifying any null values
- A contiguous Buffer of 16-bit integers
Similarly, the type [StringArray] represents an array of UTF-8 strings and consists of:
- An optional NullBuffer identifying any null values
- An offsets Buffer of 32-bit integers identifying valid UTF-8 sequences within the values buffer
- A values Buffer of UTF-8 encoded string data
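The string layout above can be sketched with plain standard-library types. This is a simplified model, not the crate's implementation: `ToyUtf8Array` is a hypothetical name, and a `Vec<bool>` stands in for the null buffer, which the Arrow specification defines as a packed bitmap:

```rust
/// Hypothetical model of a string array: validity, offsets, and values.
struct ToyUtf8Array {
    /// `None` means every slot is valid (the null buffer is optional).
    validity: Option<Vec<bool>>,
    /// `offsets[i]..offsets[i + 1]` is the byte range of element `i`.
    offsets: Vec<i32>,
    /// All string data, concatenated.
    values: Vec<u8>,
}

impl ToyUtf8Array {
    fn from_options(items: &[Option<&str>]) -> Self {
        let mut offsets = vec![0i32];
        let mut values = Vec::new();
        let mut validity = Vec::with_capacity(items.len());
        for item in items {
            if let Some(s) = item {
                values.extend_from_slice(s.as_bytes());
            }
            validity.push(item.is_some());
            // A null slot contributes no bytes, so its range is empty.
            offsets.push(values.len() as i32);
        }
        // Drop the validity mask entirely when there are no nulls.
        let validity = if validity.iter().all(|v| *v) {
            None
        } else {
            Some(validity)
        };
        Self { validity, offsets, values }
    }

    fn len(&self) -> usize {
        self.offsets.len() - 1
    }

    /// Returns element `i`, or `None` if the slot is null.
    fn value(&self, i: usize) -> Option<&str> {
        if let Some(validity) = &self.validity {
            if !validity[i] {
                return None;
            }
        }
        let start = self.offsets[i] as usize;
        let end = self.offsets[i + 1] as usize;
        std::str::from_utf8(&self.values[start..end]).ok()
    }
}
```

Building this model from `[Some("foo"), None, Some("ba")]` gives the offsets `[0, 3, 3, 5]` and the values bytes `fooba`. An array of n elements always carries n + 1 offsets.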