Apache Ozone

Technology, Information and Internet

Scalable, distributed object store designed for lakehouse workloads, AI/ML, and cloud-native applications.

About us

Website
https://ozone.apache.org/
Industry
Technology, Information and Internet
Company size
51-200 employees
Type
Nonprofit

Updates

  • Apache Ozone reposted this

    View profile for Uma Maheswara Rao Gangumalla

    Director of Engineering @ Cloudera, Apache Software Foundation Member, Apache {Hadoop, Ozone, BookKeeper, Ratis...} PMC Member, Served as Community Over Code/ApacheCon Conference Tracks Chair

    Excited to invite you all to be part of the upcoming storage meetup — “Next-Generation Storage for the Modern Data Lakehouse and AI” — happening on November 20, 2025, at the Cloudera office in Santa Clara! If you are around, let's catch up in person.
    👉 Event details & RSVP: https://lnkd.in/gD8-XHkv
    This session brings together open-source enthusiasts, data engineers, and storage system experts to discuss how the Apache Ozone ecosystem continues to evolve for large-scale, AI-ready data platforms. Here’s the exciting lineup:
    🕓 4:30 PM – 5:00 PM | Check-in, Welcome & Networking
    🗣 5:00 PM – 5:10 PM | Apache Ozone Adoption Growth — Shiv Moorthy 📈 Insights into Ozone’s expanding adoption and community momentum.
    💡 5:10 PM – 5:40 PM | Apache Ozone Best Practices at Shopee — YIYANG ZHOU How Shopee manages billions of objects in production using lifecycle and storage-class optimization.
    ⚙️ 5:40 PM – 6:00 PM | Event Notifications for Distributed Storage (Ozone) — Paul Scott-Murphy (Cirata CTO) Exploring design trade-offs, performance optimizations, and downstream integrations for event-driven systems.
    🔐 6:00 PM – 6:20 PM | AWS STS Design for Ozone S3 — Fabian Morgan & Madhan Neethiraj (Cloudera) Building scalable, secure, token-based access for Ozone’s S3 workloads.
    🤖 6:20 PM – 6:40 PM | GenAI: S3Vector Buckets API Support in Ozone — Swaminathan Balachandran & saketa chalamchala (Cloudera) Enabling fast vector-store backends with S3Vector compatibility for GenAI workloads.
    🚀 6:40 PM – 7:00 PM | Apache Ozone State of the Union — Rishabh Patel & Sammi Chen Updates on new features, releases, and what’s next for the Ozone roadmap.
    🍻 7:00 PM – 8:00 PM | Networking, Q&A, and Refreshments
    If you’re passionate about scalable storage, data lakehouse design, or AI infrastructure, this event is for you. Looking forward to connecting with the community and exchanging ideas!
#ApacheOzone #DataEngineering #StorageSystems #AIInfrastructure #OpenSource #FutureOfData #Meetup #ClouderaObjectStorage #AIStore #ObjectStorage #S3Store #PureOpenSource #datalakes Sergio Gago Arpit Agarwal Shiv Moorthy Karthik Krishnamoorthy Sunitha Velpula Dipankar Mazumdar Samriddhi Bhatnagar

  • Join us for a full evening of deep tech talks, live demos, and real-world insights from Cirata, Cloudera, and Shopee — plus food, drinks, and networking with the builders shaping the future of scalable data infrastructure!

    View profile for Samriddhi Bhatnagar

    Open Source Community Leader (APAC & AMER & EMEA)

    🚀 Exciting Announcement! We’re hosting our next Future of Data – Silicon Valley meetup at the Cloudera Santa Clara office on November 20, 2025! Join us for an evening of deep-dive sessions on Next-Generation Storage for the Modern Data Lakehouse and AI, featuring experts from the Apache Ozone community. Get insights into Ozone S3 design on AWS, data tiering, disk balancing, and best practices from Shopee — plus plenty of networking opportunities with data and AI enthusiasts.
    📅 Date: November 20, 2025
    🕟 Time: 4:30 PM – 8:00 PM PST
    📍 Venue: Cloudera Office, Santa Clara
    👉 Click here to learn more and RSVP: https://lnkd.in/gYWXMQir
    Come connect, learn, and shape the future of Data + AI with us! 💡
    Uma Maheswara Rao Gangumalla | Arpit Agarwal | Shiv Moorthy | Paul Scott-Murphy | Swaminathan Balachandran | saketa chalamchala | Sammi Chen | YIYANG ZHOU | Dipankar Mazumdar | Laura Hughes | Donna Beasley | Sergio Gago | Leo Brunnick | David Dichmann | Diego Mastroianni, PhD | Jeff Healey | Wim Stoop
    #FutureOfData #ApacheOzone #AI #DataLakehouse #OpenSource #Cloudera #Meetup #SantaClara

  • Apache Ozone reposted this

    Wow! Congrats!

    View profile for Venkateswara Varma Srivatsavaya

    Principal Solutions Architect @ Cloudera

    I am happy to share that my paper “Tackling Power Consumption Challenges in the Tech Industry: Apache Ozone’s Role in Greener Data Centers” was published in the International Journal of Emerging Research in Engineering and Technology. In this paper, I dive into the power challenges the tech industry is facing as AI workloads drive an explosion in compute needs, and how Ozone can help reduce data center power consumption by significantly cutting your storage footprint (almost half) compared to traditional 3-way replication while maintaining the same reliability. The paper includes a sample case study on how large organizations can achieve significant energy savings while lowering hardware procurement costs and power/cooling/data center space needs, all while contributing to meaningful carbon savings. With the help of Ozone, organizations can realize substantial Total Cost of Ownership savings with a favorable Return on Investment while aligning with their ESG goals. Full paper details are below: https://lnkd.in/eFhHpdhk
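
    The “almost half” claim follows from simple overhead arithmetic, sketched below in Python. The numbers are illustrative only; actual power and cost savings depend on hardware, utilization, and cluster layout — the simplifying assumption here is that drive count (and hence drive power) scales with raw capacity.

    ```python
    def overhead(data_units: int, parity_units: int) -> float:
        """Storage overhead factor: raw bytes stored per logical byte
        under an erasure-coding scheme of data_units + parity_units."""
        return (data_units + parity_units) / data_units

    replication_3x = 3.0      # 3-way replication: 3 raw bytes per logical byte
    ec_6_3 = overhead(6, 3)   # EC 6+3: 1.5 raw bytes per logical byte
    ec_3_2 = overhead(3, 2)   # EC 3+2: ~1.67 raw bytes per logical byte

    # Raw-footprint reduction relative to 3-way replication.
    saving_6_3 = 1 - ec_6_3 / replication_3x   # 0.50 -> 50% fewer raw bytes
    saving_3_2 = 1 - ec_3_2 / replication_3x   # ~0.44 -> ~44% fewer raw bytes

    # For a hypothetical 10 PB logical dataset:
    logical_pb = 10
    print(f"3x replication: {logical_pb * replication_3x:.1f} PB raw")
    print(f"EC 6+3:         {logical_pb * ec_6_3:.1f} PB raw ({saving_6_3:.0%} less)")
    ```

    Under the drive-count assumption, halving raw capacity roughly halves storage power, cooling, and rack space — which is the mechanism the paper builds its case study on.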

  • Apache Ozone reposted this

    View profile for Souvik Pratiher

    Solutions Architect at Databricks UK&I

    Great overview — from a lakehouse architecture lens this is especially compelling:
    • By combining a true object store with file-system semantics, Apache Ozone offers the atomic rename, consistent listing and strong metadata guarantees that open-table formats like Apache Iceberg rely on for reliable merges, deletes and time-travel.
    • The dual S3 + Hadoop-FS interfaces mean you can standardise on one storage layer for your lake (ingest, archive, raw) and analytics tier (Spark, Trino, Presto), avoiding “object store semantics” surprise behaviours.
    • The decoupled metadata plane (OM) and data plane (SCM), together with Raft-based replication, addresses a classic scaling-and-consistency pain point in large lakehouse deployments — especially at PB+ scale, where small-file and rename behaviours bite you.
    • And for organisations in hybrid / on-prem / edge scenarios, the architecture clearly shows promise for an enterprise lakehouse strategy that isn’t 100% cloud-native.
    If you’re building or migrating a lakehouse platform (e.g., via Databricks, Spark + Iceberg, or similar), I’d bookmark Ozone’s role as “the storage foundation that ticks both performance & semantics boxes” — then verify how your table format, data engine and governance layer align end-to-end.
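
    The atomic-rename point is the crux: table formats commit by writing new metadata to a temporary name and then renaming it into place, so readers see either the old or the new table state, never a partial write. A minimal local-filesystem sketch of that commit pattern (the `metadata.json` name and helper are hypothetical, and `os.replace` is only atomic on a local POSIX filesystem — a filesystem-semantics layer like Ozone's Hadoop-FS interface is what supplies the equivalent guarantee at object-store scale):

    ```python
    import os
    import tempfile

    def commit_metadata(table_dir: str, new_metadata: str) -> None:
        """Write new table metadata, then atomically swap it into place.
        A concurrent reader of metadata.json never observes a half-written file."""
        fd, tmp_path = tempfile.mkstemp(dir=table_dir, suffix=".tmp")
        with os.fdopen(fd, "w") as f:
            f.write(new_metadata)
            f.flush()
            os.fsync(f.fileno())  # ensure bytes are durable before the swap
        # os.replace is an atomic rename on POSIX: readers see old or new, never both.
        os.replace(tmp_path, os.path.join(table_dir, "metadata.json"))

    # Usage: two successive commits; a reader in between sees version 1 or 2, never a mix.
    with tempfile.TemporaryDirectory() as d:
        commit_metadata(d, '{"version": 1}')
        commit_metadata(d, '{"version": 2}')
        with open(os.path.join(d, "metadata.json")) as f:
            print(f.read())  # {"version": 2}
    ```

    On a plain S3-style store, "rename" is copy-then-delete and is not atomic, which is exactly the surprise behaviour the comment above warns about.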

    View profile for Dipankar Mazumdar

    Director-Data+GenAI @Cloudera | Apache Iceberg, Hudi Contributor | Author of “Engineering Lakehouses”

    What is Apache Ozone? This is my first exploration post of Apache Ozone - an open, distributed storage system designed for lakehouse + AI workloads. Sounds familiar? Well, yes of course we have distributed object stores like Amazon S3, Azure Blob, MinIO, Ceph, etc. Ozone reimagines object storage for large-scale, on-prem and hybrid environments!
    What does it bring to the table?
    - Open-source, cloud-native object + file store designed for billions of objects and hundreds of PB
    - Dual-access semantics: S3 API for modern data platforms + Hadoop FS semantics
    - Strong consistency with no NameNode bottleneck, thanks to a fully decoupled metadata (OM) and storage (SCM) plane coordinated via Apache Ratis (Raft)
    - Proven in production at PB+ scale
    What's under the hood?
    ✅ Containerized block storage: data is stored as batches of blocks inside “containers” managed by RocksDB, enabling 400 TB+ dense nodes and predictable IO
    ✅ Snapshots + SnapshotDiff: versioned, point-in-time views for incremental replication, rollback, or reproducible AI corpora
    ✅ Erasure Coding: 3+2 or 6+3 schemes cut storage cost by ~50% while maintaining durability
    ✅ Multi-OM/SCM HA: horizontal metadata scaling + fault isolation via Ratis quorums
    Use cases:
    - Open lakehouse storage: serves as the data lake storage for open table formats (Apache Iceberg, Hudi, Delta Lake). Provides the atomic rename + consistent listing semantics required by these formats, while exposing S3 APIs
    - RAG pipelines: Ozone Snapshots let LLMs retrieve from a consistent corpus version, pinning datasets for traceability and repeatability across experiments
    - Hybrid cloud + edge: decoupled OM/SCM layers and Ratis replication enable geo-distributed deployments and rack-aware placement
    What I see is that other open-source object stores stop at S3 compatibility. Ozone goes further by bringing file-system semantics, snapshots, and strong consistency to the object-storage world.
    Also, its decoupled OM/SCM architecture & Ratis replication make it resilient across racks, data centers, and even clouds, enabling "hybrid" deployments that stay online when a single cloud region goes dark. Go explore (link in comments) #dataengineering #softwareengineering
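
    To get a feel for the dual-access model described above, here is a command sketch against a local single-container Ozone. This is an ops sketch, not a verified recipe: the image tag, container hostname, bucket name, and `ofs://` path are illustrative — check the Ozone docs for the exact quickstart, ports, and any credential setup the S3 gateway expects.

    ```shell
    # Spin up an all-in-one Ozone for experimentation (not for production).
    docker run -d -p 9878:9878 --name ozone apache/ozone

    # Path 1: the S3 API via the S3 gateway (default port 9878).
    aws s3api create-bucket --bucket demo --endpoint-url http://localhost:9878
    aws s3 cp ./data.parquet s3://demo/ --endpoint-url http://localhost:9878

    # Path 2: the same bucket through Hadoop-FS semantics (ofs://), e.g. for
    # Spark or Hive jobs that expect rename and listing guarantees.
    # S3 buckets surface under the "s3v" volume.
    docker exec ozone ozone fs -ls ofs://om/s3v/demo/
    ```

    The point of the sketch is the last two sections touching the same data: one client sees an S3 bucket, the other a filesystem path.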

  • Apache Ozone reposted this

    View profile for Daoud AbdelMonem Faleh

    An Architect who Codes BUGS

    MinIO has recently and silently decided to stop releasing binaries for its community edition. This has left thousands of teams in the dark as their image pulls fail, their CI/CD pipelines turn red, and their quickly-spun temporary environments miss a critical component. While this decision may make business sense for the company, its abrupt nature, coupled with a lack of communication and effectively no notice, has left the user community with a sense of betrayal. This situation has once again raised questions about the sustainability of open-source projects. Once again, we are learning the lesson the hard way: open source isn't truly open unless it comes with open governance (period). This sheds light on the importance of open-source foundations and the immense role they play within the ecosystem. In this particular case, while many alternatives exist and you may be tempted to quickly settle for one, do yourself a favor by carefully choosing a project that is backed by an established foundation with clear and transparent governance. Apache Ozone comes to mind, but I am sure there are many viable options. Also, take the time to establish an open-source strategy or committee within your organization that can set clear guidelines for open-source project adoption.

  • Apache Ozone reposted this

    View organization page for Apache Ozone


    If you’re looking for an S3-compatible object store that’s still 100% open source — no license surprises — check out Apache Ozone.
    ✅ Apache License 2.0
    ✅ Vendor-neutral
    ✅ Community-driven
    As a Top-Level Apache project, Ozone will stay Apache-licensed forever. Try it on Docker today: https://lnkd.in/gPrsY7vQ And check out our Apache Ozone 2.0 release: https://lnkd.in/gMY9Mivi
