Apache XTable (Incubating)

Data Infrastructure and Analytics

Menlo Park, CA 6,503 followers

Seamless cross-table interop between Apache Hudi, Delta Lake, and Apache Iceberg

About us

Apache XTable (Incubating) provides cross-table, omni-directional interoperability between the lakehouse table formats Apache Hudi, Apache Iceberg, and Delta Lake. XTable was recently renamed from its former name, OneTable. XTable is NOT a new or separate format; it provides abstractions and tools for translating lakehouse table format metadata. Choosing a table format is a costly evaluation: each project has rich features that may fit different use cases, and some vendors use a table format as a point of lock-in. Your data should be UNIVERSAL! https://github.com/apache/incubator-xtable

Website
https://xtable.apache.org
Industry
Data Infrastructure and Analytics
Company size
11-50 employees
Headquarters
Menlo Park, CA
Type
Partnership
Founded
2023
Specialties
Data Lakehouse, Data Engineering, Lakehouse, Apache Iceberg, Apache Hudi, Delta Lake, Apache Spark, Trino, Apache Flink, and Presto

Updates

  • Apache XTable (Incubating) reposted this

    Shiyan Xu

    Data Architect | O'Reilly Author | Creator of Hudi-rs | PMC member of Apache Hudi

    [Blog] Struggling with Apache Iceberg performance when your data dimensions get too hot? 🔥🌡️ Frequent updates and deletes in Iceberg can lead to a "chilly meltdown," forcing a tough choice between fast writes and efficient reads. 🥶 But what if you didn't have to compromise? 🤔 In this recent blog, I explored how you can get the best of both worlds by combining the power of Apache Hudi with Apache XTable (Incubating) to serve fast, native Iceberg tables (a hedged sync sketch follows below the post). Read the full post to learn how to get fast writes and reads with Iceberg: 👉 https://lnkd.in/gtEvysrs #ApacheIceberg #ApacheHudi #ApacheXTable #DataLakehouse

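    For a rough sense of what such a Hudi-to-Iceberg sync looks like in code, here is a minimal sketch based on the builder-style Java API in XTable's documentation. Class and method names (ConversionController, SourceTable, TargetTable, HudiConversionSourceProvider) may vary across releases, and the table path is hypothetical.

    ```java
    import java.util.List;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.xtable.conversion.ConversionConfig;
    import org.apache.xtable.conversion.ConversionController;
    import org.apache.xtable.conversion.SourceTable;
    import org.apache.xtable.conversion.TargetTable;
    import org.apache.xtable.hudi.HudiConversionSourceProvider;
    import org.apache.xtable.model.storage.TableFormat;

    public class HudiToIcebergSync {
      public static void main(String[] args) {
        String basePath = "s3://my-bucket/warehouse/my_table"; // hypothetical table location

        // The existing Hudi table is the source of truth; no data files are rewritten.
        SourceTable source = SourceTable.builder()
            .name("my_table").formatName(TableFormat.HUDI).basePath(basePath).build();

        // Iceberg metadata is generated alongside the same data files.
        TargetTable target = TargetTable.builder()
            .name("my_table").formatName(TableFormat.ICEBERG).basePath(basePath).build();

        ConversionConfig config = ConversionConfig.builder()
            .sourceTable(source).targetTables(List.of(target)).build();

        Configuration hadoopConf = new Configuration();
        HudiConversionSourceProvider provider = new HudiConversionSourceProvider();
        provider.init(hadoopConf);

        // Translate Hudi commit metadata into Iceberg metadata for the same data.
        new ConversionController(hadoopConf).sync(config, provider);
      }
    }
    ```
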
  • Apache XTable (Incubating) reposted this

    Struggling with slow updates in your Apache Iceberg 🧊 tables? When dimensions change too fast, Iceberg writers can face chilly performance bottlenecks. What if there were another way to write Iceberg tables with blistering throughput and no compromises? Read our latest engineering blog to learn how your Iceberg tables can leverage techniques that standard OSS might not get even by the v6 spec 😉. 👉 https://lnkd.in/gS5nPPai #ApacheIceberg #ApacheXTable #ApacheHudi #DataLakehouse

  • Kudos to the cross community collaborations in these Apache Software Foundation projects 🎉

    Onehouse

    Last week Snowflake and Onehouse presented the new Generic Tables API in Apache Polaris (Incubating) and how Apache XTable (Incubating) enables Polaris to work with non-Iceberg tables. Running XTable via REST API is as simple as: POST /v1/conversion/table/ {"source-format": "HUDI | DELTA", ...} (a hedged client sketch follows below).

    This active work across the communities introduces a TableConverter interface powered by XTable, allowing on-demand or policy-driven conversions to vend Iceberg metadata from "Generic Tables", reducing client lock-in and enabling Apache Hudi and Delta Lake reads from Apache Iceberg-only engines. Decoupling catalog specifications from table formats will ultimately future-proof and standardize the APIs to be resilient to future table format innovations. (Apache Paimon, LanceDB, or others: do you want to join? 🤝)

    Kudos to Rahil Chertara and Yufei Gu! #datalakehouse #apacheiceberg #apachepolaris #apachehudi #deltalake #datacatalog

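    The conversion endpoint above is active, in-progress work across the Polaris and XTable communities, so the exact path and payload may change. Purely as a sketch of what a client call could look like, assuming a hypothetical Polaris host and an assumed "table-uri" field (only "source-format" appears in the post):

    ```java
    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class XTableConversionRequest {
      public static void main(String[] args) throws Exception {
        // Hypothetical payload; field names beyond "source-format" are assumptions.
        String body = """
            {"source-format": "HUDI",
             "table-uri": "s3://my-bucket/warehouse/my_table"}""";

        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("https://polaris.example.com/v1/conversion/table/")) // hypothetical host
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();

        // Per the proposal, the response would vend Iceberg metadata for the converted table.
        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
      }
    }
    ```
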
  • More progress and updates from Microsoft OneLake leveraging XTable to seamlessly support multiple table formats! "Behind the scenes, this feature utilizes Apache XTable for table format metadata conversion. XTable provides cross-table omni-directional interop between the open table formats. We have also enhanced XTable functionality – for example, by converting Delta deletion vectors into Iceberg positional delete files. We look forward to contributing these features upstream to the open-source community." The blog link 👉 https://lnkd.in/gJu-Eadp

  • Multi-Catalog Sync with Apache XTable.

    Open table formats like Apache Hudi, Apache Iceberg & Delta Lake have redefined how organizations manage and store data - offering flexibility, openness, and engine-agnostic access. But adopting an open table format is just the beginning. Achieving a truly open data architecture requires seamless interoperability across not only table formats, but also catalogs and compute engines.

    That’s where XTable comes in - it tackles interoperability at the table format layer by enabling translation between formats, without rewriting data! But there are challenges at the catalog layer:
    - Many vendor platforms today require users to adopt proprietary catalogs in order to fully support open table formats
    - Different teams may rely on distinct catalogs as part of the ecosystem they belong to (hence the fragmentation)

    This is where the new Multi-Catalog Sync capability comes in. Now, you can:
    ✅ Sync table metadata from one catalog (like Hive Metastore) to others (like AWS Glue, etc.)
    ✅ Avoid rewriting data or recreating tables
    ✅ Share the same table across platforms and engines, without fragmentation or vendor lock-in

    A rough sketch of the underlying idea follows below. Read the full blog: https://lnkd.in/eDYmceRW

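    XTable ships its own catalog-sync tooling for this (see the blog for the supported configuration). Purely to illustrate the underlying idea for Iceberg targets, the sketch below uses Iceberg's public catalog API to register one existing metadata pointer in both a Hive Metastore and an AWS Glue catalog. It is not XTable's implementation, and the paths and connection properties are hypothetical.

    ```java
    import java.util.List;
    import java.util.Map;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.iceberg.CatalogUtil;
    import org.apache.iceberg.catalog.Catalog;
    import org.apache.iceberg.catalog.TableIdentifier;

    public class MultiCatalogRegister {
      public static void main(String[] args) {
        Configuration conf = new Configuration();

        // The table's current metadata file is the single source of truth.
        String metadataLocation =
            "s3://my-bucket/warehouse/db/my_table/metadata/v42.metadata.json"; // hypothetical
        TableIdentifier id = TableIdentifier.of("db", "my_table");

        Catalog hms = CatalogUtil.loadCatalog(
            "org.apache.iceberg.hive.HiveCatalog", "hms",
            Map.of("uri", "thrift://metastore:9083"), conf); // hypothetical endpoint
        Catalog glue = CatalogUtil.loadCatalog(
            "org.apache.iceberg.aws.glue.GlueCatalog", "glue",
            Map.of("warehouse", "s3://my-bucket/warehouse"), conf);

        // Register the same metadata pointer in each target catalog: no data or
        // metadata files are rewritten, only catalog entries are created.
        for (Catalog target : List.of(hms, glue)) {
          target.registerTable(id, metadataLocation);
        }
      }
    }
    ```
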
  • Apache XTable (Incubating) reposted this

    Dipankar Mazumdar

    Director-Data+GenAI @Cloudera | Apache Iceberg, Hudi Contributor | Author of “Engineering Lakehouses”

    Deletion Vector Support in Apache XTable (Incubating)

    We have made significant progress in interoperability at the metadata layer, i.e. syncing schemas, commit logs, partition specs, stats, etc. across Apache Iceberg, Apache Hudi & Delta Lake. And now there's a growing effort to extend that interoperability into other layers as well. Take Apache XTable for example:
    - It began with metadata translation between formats like Delta, Hudi, and Iceberg (table format layer)
    - Then it expanded to multi-catalog sync across Hive, Glue & other catalogs (catalog layer)
    - And now, the latest RFC brings support for deletion vector conversion, i.e. enabling logical deletes to be translated from Delta Lake into Apache Iceberg (data layer)

    What are Deletion Vectors? Deletion vectors are a way to track deleted rows without modifying the original data files. Think of them as sidecar files that store which rows to ignore during reads. Delta Lake introduced Deletion Vectors (DVs) to support row-level delete operations using compressed bitmaps (e.g., RoaringBitmap). Apache Iceberg, on the other hand, tracks deletes via position delete files (v2), stored in Parquet with file_path and pos columns. The latest RFC proposes native support for converting DVs from Delta Lake to Iceberg format, preserving semantics and correctness in converted tables.

    What’s being proposed (a rough sketch of the core mapping follows below):
    ✅ Parse DVs from Delta’s commit logs or separate files (inline or referenced).
    ✅ Stream deleted row ordinals efficiently using a new InternalDeletionVector abstraction.
    ✅ Convert them into Iceberg-compatible position delete files.
    ✅ Write them to a dedicated deletion-vectors/ directory to avoid polluting partition data dirs.
    ✅ Add them to Iceberg manifests via a row-level transaction.

    This is a great step towards the next phase - where we focus on interoperability at various layers of the #lakehouse stack. Especially at the data layer, where deletions need to be preserved for correctness when interoperating between formats. Looking forward! #dataengineering #softwareengineering

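    The RFC's own machinery (InternalDeletionVector, Parquet position delete writers, manifest commits) is more involved; the sketch below only shows the core mapping it describes, streaming deleted row ordinals out of a RoaringBitmap and emitting Iceberg-style (file_path, pos) pairs. The PositionDelete record and the file path are hypothetical stand-ins, not XTable or Iceberg classes.

    ```java
    import org.roaringbitmap.RoaringBitmap;

    public class DeletionVectorTranslation {
      // Stand-in for one row of an Iceberg position delete file, which stores
      // the data file path and the 0-based position of the deleted row.
      record PositionDelete(String filePath, long pos) {}

      public static void main(String[] args) {
        // A Delta-style deletion vector: a compressed bitmap of deleted row ordinals.
        RoaringBitmap deletionVector = RoaringBitmap.bitmapOf(3, 7, 42);
        String dataFile = "s3://my-bucket/tbl/part-00000.parquet"; // hypothetical data file

        // Stream ordinals out of the bitmap and emit (file_path, pos) pairs;
        // a real writer would append these rows to a Parquet position delete file.
        deletionVector.forEach((int pos) -> {
          System.out.println(new PositionDelete(dataFile, pos));
        });
      }
    }
    ```
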
  • Apache XTable (Incubating) reposted this

    Dipankar Mazumdar

    Director-Data+GenAI @Cloudera | Apache Iceberg, Hudi Contributor | Author of “Engineering Lakehouses”

    My talk from Netflix's Data Engineering Forum is now available on YouTube 🎉 I presented Apache XTable (Incubating) and how it enables:
    ✅ interoperability between open table formats
    ✅ multi-catalog syncing capabilities

    The talk starts with a deep dive on open table formats - Apache Hudi, Apache Iceberg & Delta Lake - and the need for openness & interoperability. So, if you are curious about the lakehouse space in general, this should give you a good idea (+ a quick demo). Once again, I really appreciate Xinran Waibel & the rest of the data team at Netflix for having me to share our work with such a vibrant community. PS: The entire playlist is a gem, so it should be a good weekend watch. #dataengineering #softwareengineering

  • Apache XTable (Incubating) reposted this

    Dipankar Mazumdar

    Director-Data+GenAI @Cloudera | Apache Iceberg, Hudi Contributor | Author of “Engineering Lakehouses”

    Sync Open Table Formats to Multiple Catalogs at once!

    Open table formats like Apache Hudi, Apache Iceberg & Delta Lake have fundamentally shifted how organizations approach data storage and management. These formats set an open and flexible data foundation, allowing enterprises to select the compute engines best suited to their workloads, and free them from the limitations of proprietary storage formats (vendor lock-in).

    Yet, achieving a truly open data architecture goes beyond simply adopting open table formats. It requires seamless "interoperability" across open table formats, catalogs, and compute engines.
    - Apache XTable (Incubating)™ takes a major step toward this goal by addressing interoperability challenges at the "table format" layer.
    - It enables users to translate from one table format to another (e.g. Hudi to Iceberg).

    While solutions like XTable have enabled storage format interoperability, the "catalog layer" is quickly emerging as a new potential bottleneck in achieving a truly open lakehouse architecture. Many vendor platforms today require users to adopt proprietary catalogs in order to fully support open table formats. This creates a significant limitation! True interoperability is compromised, forcing organizations to remain within a single vendor’s ecosystem and constraining their ability to access & manage data freely across different engines.

    Beyond vendor lock-in, another growing operational challenge is the fragmentation of catalog usage within organizations. Different teams may rely on distinct catalogs as part of the ecosystem they belong to, sometimes even different implementations of the same specification, such as the Iceberg REST Catalog. While these catalogs may adhere to common APIs or standards, there is no straightforward method to synchronize tables across them without manually recreating or migrating metadata.

    Introducing "Multi-Catalog Sync":
    ✅ Automatically sync metadata for a given table from a source catalog to one or more target catalogs.
    ✅ There is no need to recreate table definitions, copy metadata manually, or modify the underlying data files.
    ✅ A lakehouse table is written once and exposed safely across multiple catalogs and platforms.
    ✅ For example, a table registered in Hive Metastore (HMS) can now be made available in AWS Glue Data Catalog with a single configuration & execution step.

    Read the blog link in comments! #dataengineering #softwareengineering

  • Apache XTable (Incubating) reposted this

    Dipankar Mazumdar

    Director-Data+GenAI @Cloudera | Apache Iceberg, Hudi Contributor | Author of “Engineering Lakehouses”

    Research Papers on Lakehouse Systems. If you want to go beyond the jargon & understand some of the intricate details of the #lakehouse architecture and open table formats, here are 4 research papers to bookmark for the weekend. This is a mix of both theoretical concepts and applied use cases, touching on technologies like Apache XTable (Incubating), Apache Iceberg, Apache Hudi & Delta Lake.
    ✅ Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics - https://lnkd.in/dMy_rRcg
    ✅ The Data Lakehouse: Data Warehousing and More - https://lnkd.in/dMGJuNJe
    ✅ Analyzing and Comparing Lakehouse Storage Systems - https://lnkd.in/dnT6G5RF
    ✅ XTable in Action: Seamless Interoperability in Data Lakes - https://lnkd.in/dHmAqrxM

    Happy reading! #dataengineering #softwareengineering

