Apache Ozone

Technology, Information and Internet

Scalable, distributed object store designed for lakehouse workloads, AI/ML, and cloud-native applications.

About us

Website
https://ozone.apache.org/
Industry
Technology, Information and Internet
Company size
51-200 employees
Type
Nonprofit

Updates

  • Apache Ozone reposted this

    View profile for Uma Maheswara Rao Gangumalla

    Director of Engineering @ Cloudera, Apache Software Foundation Member, Apache {Hadoop, Ozone, BookKeeper, Ratis...} PMC Member, Served as Community Over Code/ApacheCon Conference Tracks Chair

    Excited to invite you all to be part of the upcoming storage meetup — “Next-Generation Storage for the Modern Data Lakehouse and AI” — happening on November 20, 2025, at the Cloudera office in Santa Clara! If you are around, let's catch up in person.
    👉 Event details & RSVP: https://lnkd.in/gD8-XHkv
    This session brings together open-source enthusiasts, data engineers, and storage system experts to discuss how the Apache Ozone ecosystem continues to evolve for large-scale, AI-ready data platforms. Here’s the exciting lineup:
    🕓 4:30 PM – 5:00 PM | Check-in, Welcome & Networking
    🗣 5:00 PM – 5:10 PM | Apache Ozone Adoption Growth — Shiv Moorthy 📈 Insights into Ozone’s expanding adoption and community momentum.
    💡 5:10 PM – 5:40 PM | Apache Ozone Best Practices at Shopee — YIYANG ZHOU How Shopee manages billions of objects in production using lifecycle and storage-class optimization.
    ⚙️ 5:40 PM – 6:00 PM | Event Notifications for Distributed Storage (Ozone) — Paul Scott-Murphy (Cirata CTO) Exploring design trade-offs, performance optimizations, and downstream integrations for event-driven systems.
    🔐 6:00 PM – 6:20 PM | AWS STS Design for Ozone S3 — Fabian Morgan & Madhan Neethiraj (Cloudera) Building scalable, secure, token-based access for Ozone’s S3 workloads.
    🤖 6:20 PM – 6:40 PM | GenAI: S3Vector Buckets API Support in Ozone — Swaminathan Balachandran & saketa chalamchala (Cloudera) Enabling fast vector-store backends with S3Vector compatibility for GenAI workloads.
    🚀 6:40 PM – 7:00 PM | Apache Ozone State of the Union — Rishabh Patel & Sammi Chen Updates on new features, releases, and what’s next for the Ozone roadmap.
    🍻 7:00 PM – 8:00 PM | Networking, Q&A, and Refreshments
    If you’re passionate about scalable storage, data lakehouse design, or AI infrastructure, this event is for you. Looking forward to connecting with the community and exchanging ideas!
#ApacheOzone #DataEngineering #StorageSystems #AIInfrastructure #OpenSource #FutureOfData #Meetup #ClouderaObjectStorage #AIStore #ObjectStorage #S3Store #PureOpenSource #datalakes Sergio Gago Arpit Agarwal Shiv Moorthy Karthik Krishnamoorthy Sunitha Velpula Dipankar Mazumdar Samriddhi Bhatnagar

  • Join us for a full evening of deep tech talks, live demos, and real-world insights from Cirata, Cloudera, and Shopee — plus food, drinks, and networking with the builders shaping the future of scalable data infrastructure!

    View profile for Samriddhi Bhatnagar

    Open Source Community Leader (APAC & AMER & EMEA)

    🚀 Exciting Announcement! We’re hosting our next Future of Data – Silicon Valley meetup at the Cloudera Santa Clara office on November 20, 2025! Join us for an evening of deep-dive sessions on Next-Generation Storage for the Modern Data Lakehouse and AI, featuring experts from the Apache Ozone community. Get insights into Ozone S3 design on AWS, data tiering, disk balancing, and best practices from Shopee — plus plenty of networking opportunities with data and AI enthusiasts.
    📅 Date: November 20, 2025
    🕟 Time: 4:30 PM – 8:00 PM PST
    📍 Venue: Cloudera Office, Santa Clara
    👉 Click here to learn more and RSVP: https://lnkd.in/gYWXMQir
    Come connect, learn, and shape the future of Data + AI with us! 💡
    Uma Maheswara Rao Gangumalla | Arpit Agarwal | Shiv Moorthy | Paul Scott-Murphy | Swaminathan Balachandran | saketa chalamchala | Sammi Chen | YIYANG ZHOU | Dipankar Mazumdar | Laura Hughes | Donna Beasley | Sergio Gago | Leo Brunnick | David Dichmann | Diego Mastroianni, PhD | Jeff Healey | Wim Stoop
    #FutureOfData #ApacheOzone #AI #DataLakehouse #OpenSource #Cloudera #Meetup #SantaClara

  • Apache Ozone reposted this

    Wow! Congrats!

    View profile for Venkateswara Varma Srivatsavaya

    Principal Solutions Architect @ Cloudera

    I am happy to share that my paper “Tackling Power Consumption Challenges in the Tech Industry: Apache Ozone’s Role in Greener Data Centers” was published in the International Journal of Emerging Research in Engineering and Technology. In this paper, I dive into the power challenges the tech industry is facing as AI workloads drive an explosion in compute needs, and how Ozone can help reduce data center power consumption by significantly cutting your storage footprint (almost half) compared to traditional 3-way replication while maintaining the same reliability. The paper includes a sample case study on how large organizations can achieve significant energy savings while lowering hardware procurement costs and power/cooling/data center space needs, all while contributing to meaningful carbon savings. With the help of Ozone, organizations can realize substantial Total Cost of Ownership savings with a favorable Return on Investment while aligning with their ESG goals. Full paper details are below: https://lnkd.in/eFhHpdhk
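
    The “almost half” claim follows from simple overhead arithmetic, sketched below in Python. The numbers are illustrative only; actual power and cost savings depend on hardware, utilization, and cluster layout — the simplifying assumption here is that drive count (and hence drive power) scales with raw capacity.

    ```python
    def overhead(data_units: int, parity_units: int) -> float:
        """Storage overhead factor: raw bytes stored per logical byte
        under an erasure-coding scheme of data_units + parity_units."""
        return (data_units + parity_units) / data_units

    replication_3x = 3.0      # 3-way replication: 3 raw bytes per logical byte
    ec_6_3 = overhead(6, 3)   # EC 6+3: 1.5 raw bytes per logical byte
    ec_3_2 = overhead(3, 2)   # EC 3+2: ~1.67 raw bytes per logical byte

    # Raw-footprint reduction relative to 3-way replication.
    saving_6_3 = 1 - ec_6_3 / replication_3x   # 0.50 -> 50% fewer raw bytes
    saving_3_2 = 1 - ec_3_2 / replication_3x   # ~0.44 -> ~44% fewer raw bytes

    # For a hypothetical 10 PB logical dataset:
    logical_pb = 10
    print(f"3x replication: {logical_pb * replication_3x:.1f} PB raw")
    print(f"EC 6+3:         {logical_pb * ec_6_3:.1f} PB raw ({saving_6_3:.0%} less)")
    ```

    Under the drive-count assumption, halving raw capacity roughly halves storage power, cooling, and rack space — which is the mechanism the paper builds its case study on.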

  • Apache Ozone reposted this

    View profile for Souvik Pratiher

    Solutions Architect at Databricks UK&I

    Great overview — from a lakehouse architecture lens this is especially compelling:
    • By combining a true object store with file-system semantics, Apache Ozone offers the atomic rename, consistent listing and strong metadata guarantees that open-table formats like Apache Iceberg rely on for reliable merges, deletes and time-travel.
    • The dual S3 + Hadoop-FS interfaces mean you can standardise on one storage layer for your lake (ingest, archive, raw) and analytics tier (Spark, Trino, Presto), avoiding “object store semantics” surprise behaviours.
    • The decoupled metadata plane (OM) and data plane (SCM), together with Raft-based replication, addresses a classic scaling-and-consistency pain point in large lakehouse deployments — especially at PB+ scale, where small-file and rename behaviours bite you.
    • And for organisations in hybrid / on-prem / edge scenarios, the architecture clearly shows promise for an enterprise lakehouse strategy that isn’t 100% cloud-native.
    If you’re building or migrating a lakehouse platform (e.g., via Databricks, Spark + Iceberg, or similar), I’d bookmark Ozone’s role as “the storage foundation that ticks both performance & semantics boxes” — then verify how your table format, data engine and governance layer align end-to-end.
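
    The atomic-rename point is the crux: table formats commit by writing new metadata to a temporary name and then renaming it into place, so readers see either the old or the new table state, never a partial write. A minimal local-filesystem sketch of that commit pattern (the `metadata.json` name and helper are hypothetical, and `os.replace` is only atomic on a local POSIX filesystem — a filesystem-semantics layer like Ozone's Hadoop-FS interface is what supplies the equivalent guarantee at object-store scale):

    ```python
    import os
    import tempfile

    def commit_metadata(table_dir: str, new_metadata: str) -> None:
        """Write new table metadata, then atomically swap it into place.
        A concurrent reader of metadata.json never observes a half-written file."""
        fd, tmp_path = tempfile.mkstemp(dir=table_dir, suffix=".tmp")
        with os.fdopen(fd, "w") as f:
            f.write(new_metadata)
            f.flush()
            os.fsync(f.fileno())  # ensure bytes are durable before the swap
        # os.replace is an atomic rename on POSIX: readers see old or new, never both.
        os.replace(tmp_path, os.path.join(table_dir, "metadata.json"))

    # Usage: two successive commits; a reader in between sees version 1 or 2, never a mix.
    with tempfile.TemporaryDirectory() as d:
        commit_metadata(d, '{"version": 1}')
        commit_metadata(d, '{"version": 2}')
        with open(os.path.join(d, "metadata.json")) as f:
            print(f.read())  # {"version": 2}
    ```

    On a plain S3-style store, "rename" is copy-then-delete and is not atomic, which is exactly the surprise behaviour the comment above warns about.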

    View profile for Dipankar Mazumdar

    Director-Data+GenAI @Cloudera | Apache Iceberg, Hudi Contributor | Author of “Engineering Lakehouses”

    What is Apache Ozone? This is my first exploration post of Apache Ozone - an open, distributed storage system designed for lakehouse + AI workloads. Sounds familiar? Well, yes of course we have distributed object stores like Amazon S3, Azure Blob, MinIO, Ceph, etc. Ozone reimagines object storage for large-scale, on-prem and hybrid environments!
    What does it bring to the table?
    - Open-source, cloud-native object + file store designed for billions of objects and hundreds of PB
    - Dual-access semantics: S3 API for modern data platforms + Hadoop FS semantics
    - Strong consistency with no NameNode bottleneck, thanks to a fully decoupled metadata (OM) and storage (SCM) plane coordinated via Apache Ratis (Raft)
    - Proven in production at PB+ scale
    What's under the hood?
    ✅ Containerized block storage: data is stored as batches of blocks inside “containers” managed by RocksDB, enabling 400 TB+ dense nodes and predictable IO
    ✅ Snapshots + SnapshotDiff: versioned, point-in-time views for incremental replication, rollback, or reproducible AI corpora
    ✅ Erasure Coding: 3+2 or 6+3 schemes cut storage cost by ~50% while maintaining durability
    ✅ Multi-OM/SCM HA: horizontal metadata scaling + fault isolation via Ratis quorums
    Use cases:
    - Open lakehouse storage: serves as the data lake storage for open table formats (Apache Iceberg, Hudi, Delta Lake). Provides the atomic rename + consistent listing semantics required by these formats, while exposing S3 APIs
    - RAG pipelines: Ozone Snapshots let LLMs retrieve from a consistent corpus version, pinning datasets for traceability and repeatability across experiments
    - Hybrid cloud + edge: decoupled OM/SCM layers and Ratis replication enable geo-distributed deployments and rack-aware placement
    What I see is that other open-source object stores stop at S3 compatibility. Ozone goes further by bringing file-system semantics, snapshots, and strong consistency to the object-storage world.
    Also, its decoupled OM/SCM architecture & Ratis replication make it resilient across racks, data centers, and even clouds, enabling "hybrid" deployments that stay online when a single cloud region goes dark. Go explore (link in comments) #dataengineering #softwareengineering
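
    To get a feel for the dual-access model described above, here is a command sketch against a local single-container Ozone. This is an ops sketch, not a verified recipe: the image tag, container hostname, bucket name, and `ofs://` path are illustrative — check the Ozone docs for the exact quickstart, ports, and any credential setup the S3 gateway expects.

    ```shell
    # Spin up an all-in-one Ozone for experimentation (not for production).
    docker run -d -p 9878:9878 --name ozone apache/ozone

    # Path 1: the S3 API via the S3 gateway (default port 9878).
    aws s3api create-bucket --bucket demo --endpoint-url http://localhost:9878
    aws s3 cp ./data.parquet s3://demo/ --endpoint-url http://localhost:9878

    # Path 2: the same bucket through Hadoop-FS semantics (ofs://), e.g. for
    # Spark or Hive jobs that expect rename and listing guarantees.
    # S3 buckets surface under the "s3v" volume.
    docker exec ozone ozone fs -ls ofs://om/s3v/demo/
    ```

    The point of the sketch is the last two sections touching the same data: one client sees an S3 bucket, the other a filesystem path.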

  • Apache Ozone reposted this

    View profile for Daoud AbdelMonem Faleh

    An Architect who Codes BUGS

    MinIO has recently and silently decided to stop releasing binaries for its community edition. This has left thousands of teams in the dark as their image pulls fail, their CI/CD pipelines turn red, and their quickly-spun temporary environments miss a critical component. While this decision may make business sense for the company, its abrupt nature, coupled with a lack of communication and effectively no notice, has left the user community with a sense of betrayal. This situation has once again raised questions about the sustainability of open-source projects. Once again, we are learning the lesson the hard way: open source isn't truly open unless it comes with open governance (period). This sheds light on the importance of open-source foundations and the immense role they play within the ecosystem. In this particular case, while many alternatives exist and you may be tempted to quickly settle for one, do yourself a favor by carefully choosing a project that is backed by an established foundation with clear and transparent governance. Apache Ozone comes to mind, but I am sure there are many viable options. Also, take the time to establish an open-source strategy or committee within your organization that can set clear guidelines for open-source project adoption.

  • Apache Ozone reposted this

    View organization page for Apache Ozone


    If you’re looking for an S3-compatible object store that’s still 100% open source — no license surprises — check out Apache Ozone.
    ✅ Apache License 2.0
    ✅ Vendor-neutral
    ✅ Community-driven
    As a Top-Level Apache project, Ozone will stay Apache-licensed forever. Try it on Docker today: https://lnkd.in/gPrsY7vQ And check out our Apache Ozone 2.0 release: https://lnkd.in/gMY9Mivi
