A comparable row-oriented representation of a collection of [Array].
[Row]s are normalized for sorting, and can therefore be very efficiently compared,
using memcmp under the hood, or used in non-comparison sorts such as radix sort.
This makes the row format ideal for implementing efficient multi-column sorting,
grouping, aggregation, windowing and more, as described in more detail
in this blog post.
For example, given three input [Array]s, [RowConverter] creates byte
sequences that compare the same as when using lexsort.
┌─────┐   ┌─────┐   ┌─────┐
│     │   │     │   │     │
├─────┤ ┌ ┼─────┼ ─ ┼─────┼ ┐              ┏━━━━━━━━━━━━━┓
│     │   │     │   │     │  ─────────────▶┃             ┃
├─────┤ └ ┼─────┼ ─ ┼─────┼ ┘              ┗━━━━━━━━━━━━━┛
│     │   │     │   │     │
└─────┘   └─────┘   └─────┘
             ...
┌─────┐ ┌ ┬─────┬ ─ ┬─────┬ ┐              ┏━━━━━━━━┓
│     │   │     │   │     │  ─────────────▶┃        ┃
└─────┘ └ ┴─────┴ ─ ┴─────┴ ┘              ┗━━━━━━━━┛
 UInt64     Utf8      F64
        Input Arrays                         Row Format
          (Columns)
[Rows] must be generated by the same [RowConverter] for the comparison
to be meaningful.
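To illustrate why a normalized row can be compared with a plain byte comparison, the following is a minimal, standard-library-only sketch of an order-preserving encoding for the three column types in the diagram above. It is a simplified stand-in, not the actual encoding used by [RowConverter]: the function names are hypothetical, nulls and sort options are ignored, and strings are assumed to contain no NUL bytes.

```rust
/// Hypothetical helper: big-endian bytes of a `u64` compare byte-wise
/// in the same order as the integers themselves.
fn encode_u64(v: u64, out: &mut Vec<u8>) {
    out.extend_from_slice(&v.to_be_bytes());
}

/// Hypothetical helper: string bytes followed by a 0x00 terminator, so a
/// string sorts before any longer string it is a prefix of.
/// Assumption: the input contains no NUL bytes.
fn encode_str(s: &str, out: &mut Vec<u8>) {
    assert!(!s.as_bytes().contains(&0), "sketch assumes no NUL bytes");
    out.extend_from_slice(s.as_bytes());
    out.push(0);
}

/// Hypothetical helper: flip all bits of negative floats and only the sign
/// bit of non-negative floats, so the big-endian bytes compare byte-wise
/// in the same order as `f64::total_cmp`.
fn encode_f64(v: f64, out: &mut Vec<u8>) {
    let bits = v.to_bits();
    let flipped = if bits >> 63 == 1 { !bits } else { bits ^ (1 << 63) };
    out.extend_from_slice(&flipped.to_be_bytes());
}

/// Concatenate the encoded columns of one (UInt64, Utf8, F64) row.
fn encode_row(id: u64, name: &str, score: f64) -> Vec<u8> {
    let mut row = Vec::new();
    encode_u64(id, &mut row);
    encode_str(name, &mut row);
    encode_f64(score, &mut row);
    row
}
```

Comparing two such `Vec<u8>` values with `<` is a lexicographic byte comparison, so `encode_row(1, "a", 9.0) < encode_row(1, "ab", -1.0)` holds for the same reason that the tuple `(1, "a")` sorts before `(1, "ab")`.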
Basic Example
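# use std::sync::Arc;
# use arrow_row::{RowConverter, SortField};
# use arrow_array::{ArrayRef, Int32Array, StringArray};
# use arrow_array::cast::AsArray;
# use arrow_array::types::Int32Type;
# use arrow_schema::DataType;
// Illustrative inputs: five rows, already in sorted order, with rows 3 and 4 equal
let a1 = Arc::new(Int32Array::from_iter_values([-1, -1, 0, 3, 3])) as ArrayRef;
let a2 = Arc::new(StringArray::from_iter_values(["a", "b", "c", "d", "d"])) as ArrayRef;
let arrays = vec![a1, a2];
// Convert arrays to rows
let converter = RowConverter::new(vec![
    SortField::new(DataType::Int32),
    SortField::new(DataType::Utf8),
]).unwrap();
let rows = converter.convert_columns(&arrays).unwrap();
// Compare rows
for i in 0..4 {
    assert!(rows.row(i) <= rows.row(i + 1));
}
assert_eq!(rows.row(3), rows.row(4));
// Convert rows back to arrays
let converted = converter.convert_rows(&rows).unwrap();
assert_eq!(arrays, converted);
// Compare rows from different arrays
let a1 = Arc::new(Int32Array::from_iter_values([3, 4])) as ArrayRef;
let a2 = Arc::new(StringArray::from_iter_values(["e", "f"])) as ArrayRef;
let arrays = vec![a1, a2];
let rows2 = converter.convert_columns(&arrays).unwrap();
assert!(rows.row(4) < rows2.row(0));
assert!(rows.row(4) < rows2.row(1));
// Convert selection of rows back to arrays
let selection = [rows.row(0), rows2.row(1), rows.row(2), rows2.row(0)];
let converted = converter.convert_rows(selection).unwrap();
let c1 = converted[0].as_primitive::<Int32Type>();
assert_eq!(c1.values(), &[-1, 4, 0, 3]);
let c2 = converted[1].as_string::<i32>();
let c2_values: Vec<_> = c2.iter().flatten().collect();
assert_eq!(&c2_values, &["a", "f", "c", "e"]);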
Lexicographic Sorts (lexsort)
The row format can also be used to implement a fast multi-column (lexicographic) sort.
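As a rough illustration of the idea, the sketch below sorts row indices by comparing per-row byte keys, using only the standard library. It assumes a hypothetical two-column input (`u32`, string) and a hypothetical `encode_key` helper in place of [RowConverter]; it does not reproduce the real row encoding and ignores nulls and sort options.

```rust
/// Hypothetical stand-in for a row: big-endian integer bytes followed by
/// the string bytes. The string is the last column, so no terminator is
/// needed: slice comparison already orders a prefix before its extensions.
fn encode_key(num: u32, text: &str) -> Vec<u8> {
    let mut key = num.to_be_bytes().to_vec();
    key.extend_from_slice(text.as_bytes());
    key
}

/// Returns the indices that sort the rows by `nums`, then by `texts`.
fn lexsort_to_indices(nums: &[u32], texts: &[&str]) -> Vec<u32> {
    assert_eq!(nums.len(), texts.len(), "columns must have equal length");

    // Step 1: convert the columns into one byte-comparable key per row.
    let rows: Vec<Vec<u8>> = nums
        .iter()
        .zip(texts)
        .map(|(&n, t)| encode_key(n, t))
        .collect();

    // Step 2: sort the row indices by comparing the keys byte-wise.
    // The sort is unstable, so the relative order of equal rows is unspecified.
    let mut indices: Vec<u32> = (0..rows.len() as u32).collect();
    indices.sort_unstable_by(|&a, &b| rows[a as usize].cmp(&rows[b as usize]));
    indices
}
```

For the rows `(2, "b")`, `(1, "z")`, `(2, "a")`, `(0, "m")`, this returns `[3, 1, 2, 0]`. The multi-column comparison is reduced to a single byte comparison per pair of rows, which is where the speedup over comparing column by column comes from.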
Flattening Dictionaries
For performance reasons, dictionary arrays are flattened ("hydrated") to their underlying values during row conversion. See the issue for more details.
This means that the arrays that come out of [RowConverter::convert_rows]
may not have the same data types as the input arrays. For example, encoding
a Dictionary<Int8, Utf8> array and then decoding it produces a Utf8 array.
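# use arrow_array::{Array, ArrayRef, DictionaryArray};
# use arrow_array::types::Int8Type;
# use arrow_row::{RowConverter, SortField};
# use arrow_schema::DataType;
# use std::sync::Arc;
// Input is a Dictionary array (illustrative values)
let dict: DictionaryArray::<Int8Type> = ["a", "b", "c", "a", "b"].into_iter().collect();
let sort_fields = vec![SortField::new(dict.data_type().clone())];
let arrays = vec![Arc::new(dict) as ArrayRef];
let converter = RowConverter::new(sort_fields).unwrap();
// Convert to rows
let rows = converter.convert_columns(&arrays).unwrap();
let converted = converter.convert_rows(&rows).unwrap();
// result was a Utf8 array, not a Dictionary array
assert_eq!(converted[0].data_type(), &DataType::Utf8);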
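To make "flattening" concrete, here is a small standard-library-only sketch of what hydrating a dictionary means conceptually. The `DictColumn` type and `flatten` function are hypothetical and are not part of the arrow API; the sketch assumes every non-null key is a valid, non-negative index into `values`.

```rust
/// Hypothetical, simplified dictionary-encoded string column: each key is
/// an index into `values`, and `None` marks a null entry.
struct DictColumn<'a> {
    keys: Vec<Option<i8>>,
    values: Vec<&'a str>,
}

/// "Hydrate" the dictionary by replacing every key with the value it
/// refers to. The result has one plain string slot per row and no keys.
/// Panics if a key is negative or out of range (see the assumption above).
fn flatten<'a>(dict: &DictColumn<'a>) -> Vec<Option<&'a str>> {
    dict.keys
        .iter()
        .map(|key| key.map(|k| dict.values[k as usize]))
        .collect()
}
```

With keys `[0, null, 1, 0]` and values `["Hello", "World"]`, `flatten` yields `["Hello", null, "World", "Hello"]`. Once the values are materialized like this, the original key type is no longer represented, which is consistent with the round trip above producing a Utf8 array.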