big data free download

Showing 315 open source projects for "big data"

1

Big-AGI

AI suite powered by state-of-the-art models and providing advanced AI

...The workspace includes advanced features like Beam, which enables multi-model consensus and comparative responses to improve reliability and reduce hallucination, and robust persona management to tailor responses to specific roles or workflows. Big-AGI can be self-hosted or deployed in cloud environments, giving users full control over data and model access limits and avoiding vendor lock-in.

Downloads: 3 This Week

Last Update: 2026-02-04
2

data.table

Extends base R’s data.frame for high-performance data manipulation

data.table is an R package that extends base R’s data.frame for high-performance data manipulation. It offers concise syntax, high speed, and memory-efficient operations, and supports fast file reading and writing, joins, grouping, reshaping, and updates by reference. It is heavily used in large data workflows, big data work in R, and production pipelines. Grouping, aggregation, and summarization are extremely efficient, and it can handle very large datasets (hundreds of millions to billions of rows) in memory, if enough memory is available. ...

Downloads: 1 This Week

Last Update: 2026-01-27
3

Genie

Distributed Big Data Orchestration Service

Genie is a completely open source distributed job orchestration engine developed by Netflix. Genie provides RESTful APIs to run a variety of big data jobs such as Hadoop, Pig, Hive, Spark, Presto, Sqoop, and more. It also provides APIs for managing the metadata of many distributed processing clusters and the commands and applications that run on them.

Downloads: 0 This Week

Last Update: 2025-08-05
4

Querybook

Big Data Querying UI, combining collocated table metadata

Querybook is Pinterest’s open-source big data IDE with a notebook interface. Querybook’s core focus is to make composing queries, creating analyses, and collaborating with others as simple as possible. Organize rich text, queries, and charts into a notebook to easily document your analyses. Work collaboratively with others in a DataDoc and get real-time updates. The Query Editor is aware of your tables and their columns, so it provides autocompletion, syntax highlighting, and the ability to hover over or click on a table to view its information. ...

Downloads: 3 This Week

Last Update: 2025-04-22
5

pandas

Fast, flexible and powerful Python data analysis toolkit

pandas is a Python data analysis library that provides high-performance, user-friendly data structures and data analysis tools for the Python programming language. It enables you to carry out entire data analysis workflows in Python without having to switch to a more domain-specific language. With pandas, the performance, productivity, and collaboration of data analysis work in Python can increase significantly. pandas is continuously being developed to be a fundamental high-level building...

Downloads: 99 This Week

Last Update: 2026-01-21
6

Apache HBase

Get random, realtime read/write access to your Big Data

Use Apache HBase™ when you need random, realtime read/write access to your Big Data. This project's goal is the hosting of very large tables, billions of rows by millions of columns, atop clusters of commodity hardware. Apache HBase is an open-source, distributed, versioned, non-relational database modeled after Google's "Bigtable: A Distributed Storage System for Structured Data" by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, Apache HBase provides Bigtable-like capabilities on top of Hadoop and HDFS. ...

Downloads: 5 This Week

Last Update: 2025-11-14
7

Logan

Logan is a lightweight case logging system for mobile platforms

...To put it simply, the traditional approach is to piece together the problems that appear in the logs of each system, whereas the new approach is to aggregate and analyze all the logs generated by a user to find the scenarios where problems occur. In the future, we will provide a data platform based on Logan big data, including advanced functions such as machine learning, log-based troubleshooting solutions, and big data feature analysis.

Downloads: 7 This Week

Last Update: 2025-08-05
8

Fluid

Fluid, elastic data abstraction and acceleration for BigData/AI apps

Fluid provides elastic data abstraction and acceleration for BigData/AI applications in the cloud. It provides a DataSet abstraction for underlying heterogeneous data sources, with multidimensional management in a cloud environment. It enables dataset warmup and acceleration for data-intensive applications by using a distributed cache in Kubernetes, with observability, portability, and scalability. Taking characteristics of application and data into consideration for cloud application/dataset scheduling to...

Downloads: 9 This Week

Last Update: 2025-10-31
9

FinMind

Open data covering more than 50 kinds of financial data

In the era of big data, data is the foundation of everything. We collect more than 50 kinds of Taiwan stock-related information and provide downloads, online analysis, and backtesting. Regardless of the program you use, you can download data through the API provided by FinMind, or you can download data directly from the website. Once the data is available, statistical analysis, regression analysis, time series analysis, machine learning, and deep learning can be performed. ...

Downloads: 2 This Week

Last Update: 2026-02-03
10

GridDB

GridDB is a next-generation open source database

A cyber-physical system is a system that collects a variety of data in physical space (the real world), analyzes and converts it into knowledge in cyberspace, and feeds the knowledge back to the real world to revitalize industry and solve social problems. GridDB is an open database that enables real-time processing of vast amounts of time-series data in physical space, which is necessary to realize a cyber-physical system. Multi-model architecture capable of supporting various data stores...

Downloads: 2 This Week

Last Update: 2025-06-03
11

XCharts

A charting and data visualization library for Unity

A powerful, easy-to-use, parameter-configurable charting and data visualization plug-in for Unity, based on UGUI. It offers visual configuration of parameters, real-time preview of effects, and pure code drawing without additional resources. It supports ten built-in charts such as line chart, column chart, pie chart, radar...

Downloads: 6 This Week

Last Update: 2025-03-16
12

OnlineStats.jl

Single-pass algorithms for statistics

OnlineStats does statistics and data visualization for big/streaming data via online algorithms. It provides high-performance single-pass algorithms for statistics and data visualization. Statistics are updated one observation at a time, and the algorithms use O(1) memory.

Downloads: 4 This Week

Last Update: 2025-12-01
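
The single-pass, O(1)-memory idea in the OnlineStats.jl entry above can be illustrated with a minimal sketch. It is written in Python rather than Julia and does not use the OnlineStats.jl API: the `RunningStats` class and its `fit` method are hypothetical names, and Welford's update is assumed here as one common online algorithm for mean and variance.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Hypothetical illustration of an online statistic (not the OnlineStats.jl API)."""

    n: int = 0          # number of observations seen so far
    mean: float = 0.0   # running mean
    m2: float = 0.0     # running sum of squared deviations from the mean

    def fit(self, x: float) -> None:
        """Update the state with one observation (Welford's algorithm)."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        """Sample variance; defined as 0.0 until two observations exist."""
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0


# The state is three numbers regardless of how many values stream through.
stats = RunningStats()
for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.fit(value)
```

Because only `n`, `mean`, and `m2` are stored, the same loop would work unchanged on a data source too large to hold in memory.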
13

Arroyo

Distributed stream processing engine in Rust

Arroyo is a distributed stream processing engine written in Rust, designed to efficiently perform stateful computations on streams of data. Unlike traditional batch processing, streaming engines can operate on both bounded and unbounded sources, emitting results as soon as they are available.

Downloads: 2 This Week

Last Update: 2025-12-01
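
The contrast the Arroyo entry draws between batch and streaming computation can be sketched as follows. This is a plain-Python illustration under stated assumptions, not Arroyo's API or its Rust implementation: `running_counts` is a hypothetical function that keeps per-key state and emits an updated result for each event as it arrives, so it works on bounded and unbounded sources alike.

```python
from collections import defaultdict
from itertools import cycle, islice
from typing import Dict, Iterable, Iterator, Tuple


def running_counts(events: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Hypothetical stateful operator: emit (key, count so far) per event."""
    counts: Dict[str, int] = defaultdict(int)  # operator state, kept across events
    for key in events:
        counts[key] += 1
        # Emit immediately instead of waiting for the input to end.
        yield key, counts[key]


# Bounded source: the full result can be collected.
bounded = list(running_counts(["a", "b", "a"]))

# Unbounded source: cycle() never ends, so only a prefix of the output is taken.
unbounded_prefix = list(islice(running_counts(cycle(["x", "y"])), 4))
```

A batch job would need the whole input before producing any counts; the generator form produces a result after every event, which is the property the description highlights.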
14

Apache Hudi

Upserts, Deletes And Incremental Processing on Big Data

Apache Hudi (pronounced Hoodie) stands for Hadoop Upserts Deletes and Incrementals. Hudi manages the storage of large analytical datasets on DFS (Cloud stores, HDFS or any Hadoop FileSystem compatible storage). Apache Hudi is a transactional data lake platform that brings database and data warehouse capabilities to the data lake. Hudi reimagines slow old-school batch data processing with a powerful new incremental processing framework for low latency minute-level analytics. Hudi provides...

Downloads: 1 This Week

Last Update: 2025-12-18
15

HugeGraph

A graph database that supports more than 100 billion data entries

...HugeGraph supports fast import for graphs with more than 10 billion vertices and edges, offers millisecond-level OLTP query capability, and can be integrated into big data platforms like Hadoop or Spark for OLAP analysis. The main scenarios of HugeGraph include correlation search, fraud detection, and knowledge graphs. It not only supports the Gremlin graph query language and a RESTful API but also provides commonly used graph algorithm APIs. To help users easily implement various queries and analyses, HugeGraph has a full range of accessory tools, and it supports distributed storage, data replication, horizontal scaling, and many built-in storage engine backends.

Downloads: 0 This Week

Last Update: 2025-11-28
16

Apache InLong

Apache InLong - a one-stop integration framework for massive data

...InLong was originally built at Tencent, where it has served online businesses for more than 8 years, supporting massive data reporting services (more than 80 trillion pieces of data per day) in big data scenarios. The platform integrates 5 modules: Ingestion, Convergence, Caching, Sorting, and Management, so that the business only needs to provide data sources, data service quality, data landing clusters, and data landing formats.

Downloads: 0 This Week

Last Update: 2025-11-13
17

Apache Doris

MPP-based interactive SQL data warehousing for reporting and analysis

Apache Doris is a modern MPP analytical database product. It can provide sub-second queries and efficient real-time data analysis. With its distributed architecture, datasets up to the 10 PB level are well supported and easy to operate. Apache Doris can meet various data analysis demands, including historical data reports, real-time data analysis, interactive data analysis, and exploratory data analysis. Make your data analysis easier! Support standard SQL language, compatible with MySQL...

Downloads: 0 This Week

Last Update: 2026-01-30
18

Apache Bigtop

Bigtop is an Apache Foundation project for Infrastructure Engineers

Apache Bigtop is a project focused on building and packaging the Hadoop ecosystem and related big data components. It provides a consistent framework for testing, packaging, and deploying Hadoop distributions, including tools like HDFS, YARN, Spark, Hive, HBase, and more. By maintaining cross-platform builds (RPMs, DEBs, Docker images, and Kubernetes support), Bigtop makes it easier for organizations to deploy big data stacks in different environments. ...

Downloads: 0 This Week

Last Update: 2025-09-03
19

Blue Whale Configuration Platform

Blue Whale smart cloud configuration platform

The platform has accumulated experience in supporting hundreds of Tencent businesses and is compatible with various complex system architectures. It was born in operations and maintenance and is built for that work. From configuration management to job execution, task scheduling, and monitoring with self-healing, through to big data analysis of operations and maintenance data that assists operational decision-making, it covers the full-cycle assurance management of business operations. The open PaaS has a powerful development framework and scheduling engine, as well as a complete operations and maintenance development training system, which helps the rapid transformation and upgrading of operations and maintenance. ...

Downloads: 1 This Week

Last Update: 2025-05-30
20

JuiceFS

JuiceFS is a distributed POSIX file system built on top of Redis

...Whether it's a public cloud, private cloud, or hybrid cloud, JuiceFS is available on any cloud of your choice and delivers flexibility, availability, scalability, and strong consistency for your data-intensive applications. Purpose-built to serve big data scenarios such as self-driving model training, recommendation engines, and next-generation gene sequencing, JuiceFS specializes in high performance and easier management of tens of billions of files. We bring JuiceFS to developers with the hope that it will be easy to use, reliable, high-performance, and solve all your file storage problems in a cloud environment.

Downloads: 2 This Week

Last Update: 2025-12-02
21

Apache RocketMQ

Distributed messaging and streaming platform with low latency

...A variety of cross-language clients, such as Java, C/C++, Python, Go. Pluggable transport protocols, such as TCP, SSL, AIO. Built-in message tracing capability, which also supports OpenTracing. Versatile big-data and streaming ecosystem integration. Message retroactivity by time or offset. Reliable FIFO and strict ordered messaging in the same queue. Efficient pull and push consumption model. Million-level message accumulation capacity in a single queue. Multiple messaging protocols like JMS and OpenMessaging. Flexible distributed scale-out deployment architecture. ...

Downloads: 3 This Week

Last Update: 2025-12-24
22

ODD Platform

First open-source data discovery and observability platform

Unlock the power of big data with OpenDataDiscovery Platform. Experience seamless end-to-end insights, powered by unprecedented observability and trust - from ingestion to production - while building your ideal tech stack! Democratize data and accelerate insights. Find data that fits your use case and discover hints left by your peers to leverage existing knowledge.

Downloads: 0 This Week

Last Update: 2025-02-19
23

.NET for Apache Spark

A free, open-source, and cross-platform big data analytics framework

.NET for Apache Spark provides high-performance APIs for using Apache Spark from C# and F#. With these .NET APIs, you can access the most popular Dataframe and SparkSQL aspects of Apache Spark, for working with structured data, and Spark Structured Streaming, for working with streaming data. .NET for Apache Spark is compliant with .NET Standard - a formal specification of .NET APIs that are common across .NET implementations. This means you can use .NET for Apache Spark anywhere you write...

Downloads: 2 This Week

Last Update: 2 days ago
24

ElasticJob

Distributed scheduled job framework

ElasticJob is a distributed scheduling solution consisting of two separate projects, ElasticJob-Lite and ElasticJob-Cloud. ElasticJob-Lite is a lightweight, decentralized solution that provides distributed task sharding services. ElasticJob-Cloud uses Mesos to manage and isolate resources. Both projects use a unified job API, so developers only need to write code once and can deploy at will. It supports job sharding and high availability in distributed systems. Scale out for throughput and...

Downloads: 1 This Week

Last Update: 2026-01-31
25

Curve

Curve is a sandbox project hosted by the CNCF

Curve is a cloud-native distributed storage system developed by NetEase, currently supporting file storage (CurveFS) and block storage (CurveBS). It is now hosted at CNCF as a sandbox project. The performance, mixed, capacity cloud disk or persistent volume of virtual machine/container, and remote disks of physical machines. High-performance separation of storage and computation architecture: high-performance and low...

Downloads: 6 This Week

Last Update: 2024-03-13