
ORC Files

Owen O’Malley
owen@hortonworks.com


December 2012




© Hortonworks Inc. 2012   Page 1
Top Level




File Structure




Stripe Structure




File Layout




Integer Column Serialization




String Column Serialization




Compression




Projection and Predicate Filtering




Example File Sizes




Final notes




Comparison

                               RC File   Trevni   ORC File
 Hive Type Model               N         N        Y
 Separate complex columns      N         Y        Y
 Splits found quickly          N         Y        Y
 Default column group size     4MB       64MB*    250MB
 Files per bucket              1         >1       1
 Store min, max, sum, count    N         N        Y
 Versioned metadata            N         Y        Y
 Run length data encoding      N         N        Y
 Store strings in dictionary   N         N        Y
 Store row count               N         Y        Y
 Skip compressed blocks        N         N        Y
 Store internal indexes        N         N        Y


ORC File Introduction