Apache XTable (Incubating)

Data Infrastructure and Analytics

Menlo Park, CA 6,503 followers

Seamless cross-table interop between Apache Hudi, Delta Lake, and Apache Iceberg

About us

Apache XTable (Incubating) provides cross-table, omni-directional interoperability between the lakehouse table formats Apache Hudi, Apache Iceberg, and Delta Lake. XTable was recently renamed from its former name, OneTable. XTable is NOT a new or separate format; it provides abstractions and tools for translating lakehouse table format metadata. Choosing a table format is a costly evaluation: each project has rich features that may fit different use cases, and some vendors use a table format as a point of lock-in. Your data should be UNIVERSAL! https://github.com/apache/incubator-xtable

Website
https://xtable.apache.org
Industry
Data Infrastructure and Analytics
Company size
11-50 employees
Headquarters
Menlo Park, CA
Type
Partnership
Founded
2023
Specialties
Data Lakehouse, Data Engineering, Lakehouse, Apache Iceberg, Apache Hudi, Delta Lake, Apache Spark, Trino, Apache Flink, and Presto

Updates

  • Apache XTable (Incubating) reposted this

    Shiyan Xu

    Data Architect | O'Reilly Author | Creator of Hudi-rs | PMC member of Apache Hudi

    [Blog] Struggling with Apache Iceberg performance when your data dimensions get too hot? 🔥🌡️ Frequent updates and deletes in Iceberg can lead to a "chilly meltdown," forcing a tough choice between fast writes and efficient reads. 🥶 But what if you didn't have to compromise? 🤔 In this recent blog, I explored how you can get the best of both worlds by combining the power of Apache Hudi with Apache XTable (Incubating) to serve fast, native Iceberg tables (a hedged sync sketch follows below the post). Read the full post to learn how to get fast writes and reads with Iceberg: 👉 https://lnkd.in/gtEvysrs #ApacheIceberg #ApacheHudi #ApacheXTable #DataLakehouse

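    For a rough sense of what such a Hudi-to-Iceberg sync looks like in code, here is a minimal sketch based on the builder-style Java API in XTable's documentation. Class and method names (ConversionController, SourceTable, TargetTable, HudiConversionSourceProvider) may vary across releases, and the table path is hypothetical.

    ```java
    import java.util.List;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.xtable.conversion.ConversionConfig;
    import org.apache.xtable.conversion.ConversionController;
    import org.apache.xtable.conversion.SourceTable;
    import org.apache.xtable.conversion.TargetTable;
    import org.apache.xtable.hudi.HudiConversionSourceProvider;
    import org.apache.xtable.model.storage.TableFormat;

    public class HudiToIcebergSync {
      public static void main(String[] args) {
        String basePath = "s3://my-bucket/warehouse/my_table"; // hypothetical table location

        // The existing Hudi table is the source of truth; no data files are rewritten.
        SourceTable source = SourceTable.builder()
            .name("my_table").formatName(TableFormat.HUDI).basePath(basePath).build();

        // Iceberg metadata is generated alongside the same data files.
        TargetTable target = TargetTable.builder()
            .name("my_table").formatName(TableFormat.ICEBERG).basePath(basePath).build();

        ConversionConfig config = ConversionConfig.builder()
            .sourceTable(source).targetTables(List.of(target)).build();

        Configuration hadoopConf = new Configuration();
        HudiConversionSourceProvider provider = new HudiConversionSourceProvider();
        provider.init(hadoopConf);

        // Translate Hudi commit metadata into Iceberg metadata for the same data.
        new ConversionController(hadoopConf).sync(config, provider);
      }
    }
    ```
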
  • Apache XTable (Incubating) reposted this

    Struggling with slow updates in your Apache Iceberg 🧊 tables? When dimensions change too fast, Iceberg writers can face chilly performance bottlenecks. What if there were another way to write Iceberg tables with blistering throughput and no compromises? Read our latest engineering blog to learn how your Iceberg tables can leverage techniques that standard OSS might not get even by the v6 spec 😉. 👉 https://lnkd.in/gS5nPPai #ApacheIceberg #ApacheXTable #ApacheHudi #DataLakehouse

  • Kudos to the cross community collaborations in these Apache Software Foundation projects 🎉

    Onehouse

    Last week Snowflake and Onehouse presented the new Generic Tables API in Apache Polaris (Incubating) and how Apache XTable (Incubating) enables Polaris to work with non-Iceberg tables. Running XTable via REST API is as simple as: POST /v1/conversion/table/ {"source-format": "HUDI | DELTA", ...} (a hedged client sketch follows below).

    This active work across the communities introduces a TableConverter interface powered by XTable, allowing on-demand or policy-driven conversions to vend Iceberg metadata from "Generic Tables", reducing client lock-in and enabling Apache Hudi and Delta Lake reads from Apache Iceberg-only engines. Decoupling catalog specifications from table formats will ultimately future-proof and standardize the APIs to be resilient to future table format innovations. (Apache Paimon, LanceDB, or others: do you want to join? 🤝)

    Kudos to Rahil Chertara and Yufei Gu! #datalakehouse #apacheiceberg #apachepolaris #apachehudi #deltalake #datacatalog

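    The conversion endpoint above is active, in-progress work across the Polaris and XTable communities, so the exact path and payload may change. Purely as a sketch of what a client call could look like, assuming a hypothetical Polaris host and an assumed "table-uri" field (only "source-format" appears in the post):

    ```java
    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class XTableConversionRequest {
      public static void main(String[] args) throws Exception {
        // Hypothetical payload; field names beyond "source-format" are assumptions.
        String body = """
            {"source-format": "HUDI",
             "table-uri": "s3://my-bucket/warehouse/my_table"}""";

        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("https://polaris.example.com/v1/conversion/table/")) // hypothetical host
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();

        // Per the proposal, the response would vend Iceberg metadata for the converted table.
        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
      }
    }
    ```
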
  • More progress and updates from Microsoft OneLake leveraging XTable to seamlessly support multiple table formats! "Behind the scenes, this feature utilizes Apache XTable for table format metadata conversion. XTable provides cross-table omni-directional interop between the open table formats. We have also enhanced XTable functionality – for example, by converting Delta deletion vectors into Iceberg positional delete files. We look forward to contributing these features upstream to the open-source community." The blog link 👉 https://lnkd.in/gJu-Eadp

  • Multi-Catalog Sync with Apache XTable.

    Open table formats like Apache Hudi, Apache Iceberg & Delta Lake have redefined how organizations manage and store data - offering flexibility, openness, and engine-agnostic access. But adopting an open table format is just the beginning. Achieving a truly open data architecture requires seamless interoperability across not only table formats, but also catalogs and compute engines.

    That’s where XTable comes in - it tackles interoperability at the table format layer by enabling translation between formats, without rewriting data! But there are challenges at the catalog layer:
    - Many vendor platforms today require users to adopt proprietary catalogs in order to fully support open table formats
    - Different teams may rely on distinct catalogs as part of the ecosystem they belong to (hence the fragmentation)

    This is where the new Multi-Catalog Sync capability comes in. Now, you can:
    ✅ Sync table metadata from one catalog (like Hive Metastore) to others (like AWS Glue, etc.)
    ✅ Avoid rewriting data or recreating tables
    ✅ Share the same table across platforms and engines, without fragmentation or vendor lock-in

    A rough sketch of the underlying idea follows below. Read the full blog: https://lnkd.in/eDYmceRW

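    XTable ships its own catalog-sync tooling for this (see the blog for the supported configuration). Purely to illustrate the underlying idea for Iceberg targets, the sketch below uses Iceberg's public catalog API to register one existing metadata pointer in both a Hive Metastore and an AWS Glue catalog. It is not XTable's implementation, and the paths and connection properties are hypothetical.

    ```java
    import java.util.List;
    import java.util.Map;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.iceberg.CatalogUtil;
    import org.apache.iceberg.catalog.Catalog;
    import org.apache.iceberg.catalog.TableIdentifier;

    public class MultiCatalogRegister {
      public static void main(String[] args) {
        Configuration conf = new Configuration();

        // The table's current metadata file is the single source of truth.
        String metadataLocation =
            "s3://my-bucket/warehouse/db/my_table/metadata/v42.metadata.json"; // hypothetical
        TableIdentifier id = TableIdentifier.of("db", "my_table");

        Catalog hms = CatalogUtil.loadCatalog(
            "org.apache.iceberg.hive.HiveCatalog", "hms",
            Map.of("uri", "thrift://metastore:9083"), conf); // hypothetical endpoint
        Catalog glue = CatalogUtil.loadCatalog(
            "org.apache.iceberg.aws.glue.GlueCatalog", "glue",
            Map.of("warehouse", "s3://my-bucket/warehouse"), conf);

        // Register the same metadata pointer in each target catalog: no data or
        // metadata files are rewritten, only catalog entries are created.
        for (Catalog target : List.of(hms, glue)) {
          target.registerTable(id, metadataLocation);
        }
      }
    }
    ```
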
  • Apache XTable (Incubating) reposted this

    Dipankar Mazumdar

    Director-Data+GenAI @Cloudera | Apache Iceberg, Hudi Contributor | Author of “Engineering Lakehouses”

    Deletion Vector Support in Apache XTable (Incubating)

    We have made significant progress in interoperability at the metadata layer, i.e. syncing schemas, commit logs, partition specs, stats, etc. across Apache Iceberg, Apache Hudi & Delta Lake. And now there's a growing effort to extend that interoperability into other layers as well. Take Apache XTable for example:
    - It began with metadata translation between formats like Delta, Hudi, and Iceberg (table format layer)
    - Then it expanded to multi-catalog sync across Hive, Glue & other catalogs (catalog layer)
    - And now, the latest RFC brings support for deletion vector conversion, i.e. enabling logical deletes to be translated from Delta Lake into Apache Iceberg (data layer)

    What are Deletion Vectors? Deletion vectors are a way to track deleted rows without modifying the original data files. Think of them as sidecar files that store which rows to ignore during reads. Delta Lake introduced Deletion Vectors (DVs) to support row-level delete operations using compressed bitmaps (e.g., RoaringBitmap). Apache Iceberg, on the other hand, tracks deletes via position delete files (v2), stored in Parquet with file_path and pos columns. The latest RFC proposes native support for converting DVs from Delta Lake to Iceberg format, preserving semantics and correctness in converted tables.

    What’s being proposed (a rough sketch of the core mapping follows below):
    ✅ Parse DVs from Delta’s commit logs or separate files (inline or referenced).
    ✅ Stream deleted row ordinals efficiently using a new InternalDeletionVector abstraction.
    ✅ Convert them into Iceberg-compatible position delete files.
    ✅ Write them to a dedicated deletion-vectors/ directory to avoid polluting partition data dirs.
    ✅ Add them to Iceberg manifests via a row-level transaction.

    This is a great step towards the next phase - where we focus on interoperability at various layers of the #lakehouse stack. Especially at the data layer, where deletions need to be preserved for correctness when interoperating between formats. Looking forward! #dataengineering #softwareengineering

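    The RFC's own machinery (InternalDeletionVector, Parquet position delete writers, manifest commits) is more involved; the sketch below only shows the core mapping it describes, streaming deleted row ordinals out of a RoaringBitmap and emitting Iceberg-style (file_path, pos) pairs. The PositionDelete record and the file path are hypothetical stand-ins, not XTable or Iceberg classes.

    ```java
    import org.roaringbitmap.RoaringBitmap;

    public class DeletionVectorTranslation {
      // Stand-in for one row of an Iceberg position delete file, which stores
      // the data file path and the 0-based position of the deleted row.
      record PositionDelete(String filePath, long pos) {}

      public static void main(String[] args) {
        // A Delta-style deletion vector: a compressed bitmap of deleted row ordinals.
        RoaringBitmap deletionVector = RoaringBitmap.bitmapOf(3, 7, 42);
        String dataFile = "s3://my-bucket/tbl/part-00000.parquet"; // hypothetical data file

        // Stream ordinals out of the bitmap and emit (file_path, pos) pairs;
        // a real writer would append these rows to a Parquet position delete file.
        deletionVector.forEach((int pos) -> {
          System.out.println(new PositionDelete(dataFile, pos));
        });
      }
    }
    ```
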
  • Apache XTable (Incubating) reposted this

    Dipankar Mazumdar

    Director-Data+GenAI @Cloudera | Apache Iceberg, Hudi Contributor | Author of “Engineering Lakehouses”

    My talk from Netflix's Data Engineering Forum is now available on YouTube 🎉 I presented Apache XTable (Incubating) and how it enables:
    ✅ interoperability between open table formats
    ✅ multi-catalog syncing capabilities

    The talk starts with a deep dive on open table formats - Apache Hudi, Apache Iceberg & Delta Lake - and the need for openness & interoperability. So, if you are curious about the lakehouse space in general, this should give you a good idea (+ a quick demo). Once again, I really appreciate Xinran Waibel & the rest of the data team at Netflix for having me to share our work with such a vibrant community. PS: The entire playlist is a gem, so it should be a good weekend watch. #dataengineering #softwareengineering

  • Apache XTable (Incubating) reposted this

    Dipankar Mazumdar

    Director-Data+GenAI @Cloudera | Apache Iceberg, Hudi Contributor | Author of “Engineering Lakehouses”

    Sync Open Table Formats to Multiple Catalogs at once!

    Open table formats like Apache Hudi, Apache Iceberg & Delta Lake have fundamentally shifted how organizations approach data storage and management. These formats set an open and flexible data foundation, allowing enterprises to select the compute engines best suited to their workloads, and free them from the limitations of proprietary storage formats (vendor lock-in).

    Yet, achieving a truly open data architecture goes beyond simply adopting open table formats. It requires seamless "interoperability" across open table formats, catalogs, and compute engines.
    - Apache XTable (Incubating)™ takes a major step toward this goal by addressing interoperability challenges at the "table format" layer.
    - It enables users to translate from one table format to another (e.g. Hudi to Iceberg).

    While solutions like XTable have enabled storage format interoperability, the "catalog layer" is quickly emerging as a new potential bottleneck in achieving a truly open lakehouse architecture. Many vendor platforms today require users to adopt proprietary catalogs in order to fully support open table formats. This creates a significant limitation! True interoperability is compromised, forcing organizations to remain within a single vendor’s ecosystem and constraining their ability to access & manage data freely across different engines.

    Beyond vendor lock-in, another growing operational challenge is the fragmentation of catalog usage within organizations. Different teams may rely on distinct catalogs as part of the ecosystem they belong to, sometimes even different implementations of the same specification, such as the Iceberg REST Catalog. While these catalogs may adhere to common APIs or standards, there is no straightforward method to synchronize tables across them without manually recreating or migrating metadata.

    Introducing "Multi-Catalog Sync":
    ✅ Automatically sync metadata for a given table from a source catalog to one or more target catalogs.
    ✅ There is no need to recreate table definitions, copy metadata manually, or modify the underlying data files.
    ✅ A lakehouse table is written once and exposed safely across multiple catalogs and platforms.
    ✅ For example, a table registered in Hive Metastore (HMS) can now be made available in AWS Glue Data Catalog with a single configuration & execution step.

    Read the blog link in comments! #dataengineering #softwareengineering

  • Apache XTable (Incubating) reposted this

    Dipankar Mazumdar

    Director-Data+GenAI @Cloudera | Apache Iceberg, Hudi Contributor | Author of “Engineering Lakehouses”

    Research Papers on Lakehouse Systems. If you want to go beyond the jargon & understand some of the intricate details of the #lakehouse architecture and open table formats, here are 4 research papers to bookmark for the weekend. This is a mix of both theoretical concepts and applied use cases, touching on technologies like Apache XTable (Incubating), Apache Iceberg, Apache Hudi & Delta Lake.
    ✅ Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics - https://lnkd.in/dMy_rRcg
    ✅ The Data Lakehouse: Data Warehousing and More - https://lnkd.in/dMGJuNJe
    ✅ Analyzing and Comparing Lakehouse Storage Systems - https://lnkd.in/dnT6G5RF
    ✅ XTable in Action: Seamless Interoperability in Data Lakes - https://lnkd.in/dHmAqrxM

    Happy reading! #dataengineering #softwareengineering

