Showing 174 open source projects for "etl"

  • 1
    Ethereum ETL

    Python scripts for ETL (extract, transform and load) jobs for Ethereum

    Python scripts for ETL (extract, transform and load) jobs for Ethereum blocks, transactions, ERC20 / ERC721 tokens, transfers, receipts, logs, contracts, internal transactions. Data is available in Google BigQuery. Ethereum ETL lets you convert blockchain data into convenient formats like CSVs and relational databases.
    Downloads: 1 This Week
    See Project
  • 2
    Embedded Template Library (ETL)

    Embedded Template Library

    C++ is a great language to use for embedded applications and templates are a powerful aspect. The standard library can offer a great deal of well-tested functionality, but there are some parts of the standard library that do not fit well with deterministic behavior and limited resource requirements. These limitations usually preclude the use of dynamically allocated memory and containers with open-ended sizes. What is needed is a template library where the user can declare the size, or...
    Downloads: 0 This Week
    See Project
  • 3
    Addax

    Addax is a versatile open-source ETL tool

    Addax is a data integration and ETL (Extract, Transform, Load) tool designed for high-performance data migration tasks. It simplifies the process of moving data between different systems and formats.
    Downloads: 2 This Week
    See Project
  • 4
    omniparser

    Native Golang ETL streaming parser and transform library

    Omniparser is a native Golang ETL parser that ingests input data of various formats (CSV, txt, fixed length/width, XML, EDI/X12/EDIFACT, JSON, and custom formats) in streaming fashion and transforms data into desired JSON output based on a schema written in JSON.
    Downloads: 5 This Week
    See Project
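The schema-driven streaming transform described for omniparser can be illustrated with a minimal Python sketch. This is not omniparser's API or schema format (omniparser is a Go library with JSON schemas); the `SCHEMA` mapping, column names, and `transform_stream` function are hypothetical and only show the general idea of reading input row by row and emitting JSON shaped by a declarative schema.

```python
import csv
import io
import json
from typing import Iterator, TextIO

# Hypothetical schema: output field -> (input column, converter).
# This layout is an assumption for illustration, not omniparser's schema format.
SCHEMA = {
    "id": ("order_id", int),
    "customer": ("name", str.strip),
    "total": ("amount", float),
}


def transform_stream(lines: TextIO, schema: dict) -> Iterator[str]:
    """Yield one JSON document per input row without loading the whole input."""
    for row in csv.DictReader(lines):
        record = {out: conv(row[col]) for out, (col, conv) in schema.items()}
        yield json.dumps(record)


# Made-up sample input, standing in for a file or network stream.
raw = io.StringIO("order_id,name,amount\n1, Ada ,9.50\n2,Grace,12\n")
results = list(transform_stream(raw, SCHEMA))
```

Because `transform_stream` is a generator, rows are converted one at a time, which is the property that matters for large inputs.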
  • 5
    Steampipe

    Zero-ETL, infinite possibilities. Live query APIs, code & more

    ...Your cloud is a live database that changes fast. Don't wait on ETL to sync, or rely on old data. Crunch it where it's born, fueling new use cases and swift decisions.
    Downloads: 2 This Week
    See Project
  • 6
    Logstash

    Centralize, transform and stash your data

    Logstash is a server-side data processing pipeline that dynamically ingests data from numerous sources, transforms it, and ships it to your favorite “stash” regardless of format or complexity. It supports and ingests data of all shapes, sizes and sources, dynamically transforms and prepares this data, and transports it to the output of your choice. Logstash is extensible, with over 200 plugins available to let you create and configure your pipeline how you choose.
    Downloads: 7 This Week
    See Project
  • 7
    AWS Data Wrangler

    Pandas on AWS, easy integration with Athena, Glue, Redshift, etc.

    ...Easy integration with Athena, Glue, Redshift, Timestream, OpenSearch, Neptune, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON, and EXCEL). Built on top of other open-source projects like Pandas, Apache Arrow and Boto3, it offers abstracted functions to execute usual ETL tasks like loading and unloading data from Data Lakes, Data Warehouses, and Databases. Convert column names to be compatible with Amazon Athena and the AWS Glue Catalog. Run a query against AWS CloudWatchLogs Insights and convert the results to a Pandas DataFrame. Get a QuickSight dashboard ID given a name, and fail if more than one ID is associated with that name. ...
    Downloads: 4 This Week
    See Project
  • 8
    SyncLite

    Build Anything Sync Anywhere

    ...SyncLite enables real-time, transactional data replication and consolidation from various sources, including edge/desktop applications using popular embedded databases (SQLite, DuckDB, Apache Derby, H2, HyperSQL), data streaming applications, IoT message brokers, traditional database systems (ETL), and more, into a diverse array of databases, data warehouses, and data lakes, enabling AI and ML use cases at all three levels: Edge, Fog, and Cloud. SyncLite's novel CDC replication framework for embedded databases is designed to help developers rapidly build general-purpose data-intensive applications and Gen AI Search/RAG applications for edge, desktop, and mobile environments. ...
    Downloads: 1 This Week
    See Project
  • 9
    InfluxDB

    The open source time series database

    ...InfluxDB provides infrastructure and application monitoring, IoT monitoring and analytics and more. It has APIs for storing and querying data, processing it in the background for ETL or monitoring and alerting purposes. This data can also be visualized, explored and more to help businesses seize opportunities and make the best decisions. InfluxDB is easy to start and easy to scale. Learn more about it on https://www.influxdata.com/
    Downloads: 35 This Week
    See Project
  • 10
    Rubix ML

    A high-level machine learning and deep learning library for PHP

    Rubix ML is a free open-source machine learning (ML) library that allows you to build programs that learn from your data using the PHP language. We provide tools for the entire machine learning life cycle from ETL to training, cross-validation, and production with over 40 supervised and unsupervised learning algorithms. In addition, we provide tutorials and other educational content to help you get started using ML in your projects. Our intuitive interface is quick to grasp while hiding a lot of power and complexity. Write less code and iterate faster, leaving the hard stuff to us. ...
    Downloads: 3 This Week
    See Project
  • 11
    NVIDIA Merlin

    Library providing end-to-end GPU-accelerated recommender systems

    ...Each stage of the Merlin pipeline is optimized to support hundreds of terabytes of data, which is all accessible through easy-to-use APIs. For more information, see NVIDIA Merlin on the NVIDIA developer website. Transform data (ETL) for preprocessing and engineering features. Accelerate your existing training pipelines in TensorFlow, PyTorch, or FastAI by leveraging optimized, custom-built data loaders. Scale large deep learning recommender models by distributing large embedding tables that exceed available GPU and CPU memory. Deploy data transformations and trained models to production with only a few lines of code.
    Downloads: 2 This Week
    See Project
  • 12
    Dungbeetle

    A distributed job server

    Dungbeetle is a lightweight, single-binary distributed job server developed by Zerodha. It is designed for queuing and asynchronously executing heavy SQL read jobs, such as report generation, against SQL databases, with the results written to separate results tables that applications can read later.
    Downloads: 2 This Week
    See Project
  • 13
    TiDB

    Open Source NewSQL Database

    TiDB is an open source distributed SQL database that supports Hybrid Transactional and Analytical Processing (HTAP) workloads. It is currently the most actively developed open source NewSQL database, and has a rich set of features including horizontal scalability, strong consistency, and high availability.
    Downloads: 0 This Week
    See Project
  • 14
    Prefect

    Prefect is a workflow orchestration framework

    Prefect is an open-source modern workflow orchestration tool for scheduling, monitoring, and managing data workflows and tasks. It enables Python-native pipeline definitions with robust retries, caching, observability, and a powerful UI—ideal for data engineering and ETL processes.
    Downloads: 2 This Week
    See Project
  • 15
    Pyper

    Concurrent Python made simple

    Pyper is a Python-native orchestration and scheduling framework designed for modern data workflows, machine learning pipelines, and any task that benefits from a lightweight DAG-based execution engine. Unlike heavier platforms like Airflow, Pyper aims to remain lean, modular, and developer-friendly, embracing Pythonic conventions and minimizing boilerplate. It focuses on local development ergonomics and seamless transition to production environments, making it ideal for small teams and...
    Downloads: 4 This Week
    See Project
  • 16
    CocoIndex

    ETL framework to index data for AI, such as RAG

    CocoIndex is an open-source framework designed for building powerful, local-first semantic search systems. It lets users index and retrieve content based on meaning rather than keywords, making it ideal for modern AI-based search applications. CocoIndex leverages vector embeddings and integrates with various models and frameworks, including OpenAI and Hugging Face, to provide high-quality semantic understanding. It’s built for transparency, ease of use, and local control over your search...
    Downloads: 4 This Week
    See Project
  • 17
    ImportExcel

    PowerShell module to import/export Excel spreadsheets, without Excel

    ...Advanced features include adding and formatting tables, setting number/date formats, creating charts, and applying styling or conditional formatting programmatically. The module is optimized for performance (streaming where possible) and supports large datasets, making it useful for ETL tasks, automated reporting, and data analysis in pure PowerShell environments. It integrates well with scheduled jobs and CI pipelines where generating or consuming spreadsheets is part of an automated workflow.
    Downloads: 9 This Week
    See Project
  • 18
    Superduper

    Superduper: Integrate AI models and machine learning workflows

    ...Developers may leverage Superduper by building compositional and declarative objects that outsource the details of deployment, orchestration, versioning, and more to the Superduper engine. This allows developers to completely avoid implementing MLOps, ETL pipelines, model deployment, data migration, and synchronization. Using Superduper is simply "CAPE": Connect to your data, apply arbitrary AI to that data, package and reuse the application on arbitrary data, and execute AI-database queries and predictions on the resulting AI outputs and data.
    Downloads: 5 This Week
    See Project
  • 19
    Daft

    Distributed DataFrame for Python designed for the cloud

    Daft is a framework for ETL, analytics and ML/AI at scale. Its familiar Python DataFrame API is built to outperform Spark in performance and ease of use. Daft plugs directly into your ML/AI stack through efficient zero-copy integrations with essential Python libraries such as PyTorch and Ray. It also allows requesting GPUs as a resource for running models. Daft runs locally with a lightweight multithreaded backend.
    Downloads: 1 This Week
    See Project
  • 20
    Hamilton DAGWorks

    Helps scientists define testable, modular, self-documenting dataflow

    ...As shown below, it results in readable code that can always be visualized. Hamilton loads that definition and automatically builds the DAG for you. Hamilton brings modularity and structure to any Python application moving data: ETL pipelines, ML workflows, LLM applications, RAG systems, and BI dashboards. The Hamilton UI allows you to automatically visualize, catalog, and monitor execution.
    Downloads: 1 This Week
    See Project
  • 21
    Trellis AI

    All-in-one AI framework & toolkit for Claude Code & Cursor

    ...Trellis also includes tooling for monitoring, scheduling, and tracing the execution of complex multi-step jobs, helping teams maintain visibility into how work progresses and where bottlenecks emerge. The platform can integrate with external services, databases, and model endpoints, making it suitable for automation, ETL pipelines, AI-driven processes, and business logic orchestration.
    Downloads: 2 This Week
    See Project
  • 22
    Pathway

    Python ETL framework for stream processing, real-time analytics, LLM

    Pathway is an open-source framework designed for building real-time data applications using reactive and declarative paradigms. It enables seamless integration of live data streams and structured data into analytical pipelines with minimal latency. Pathway is especially well-suited for scenarios like financial analytics, IoT, fraud detection, and logistics, where high-velocity and continuously changing data is the norm. Unlike traditional batch processing frameworks, Pathway continuously...
    Downloads: 2 This Week
    See Project
  • 23
    AlaSQL

    JavaScript SQL database for browser and Node.js for relational tables

    AlaSQL.js - JavaScript SQL database for browser and Node.js. Handles both traditional relational tables and nested JSON data (NoSQL). Export, store, and import data from localStorage, IndexedDB, or Excel. We focus on speed by taking advantage of the dynamic nature of JavaScript when building up queries. Real-world solutions demand flexibility regarding where data comes from and where it is to be stored. We focus on flexibility by making sure you can import/export and query directly on data...
    Downloads: 1 This Week
    See Project
  • 24
    Datapipe

    Real-time, incremental ETL library for ML with record-level dependency tracking

    Datapipe is a real-time, incremental ETL library for Python with record-level dependency tracking. Datapipe is designed to streamline the creation of data processing pipelines. It excels in scenarios where data is continuously changing, requiring pipelines to adapt and process only the modified data efficiently. This library tracks dependencies for each record in the pipeline, ensuring minimal and efficient data processing.
    Downloads: 123 This Week
    See Project
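The idea of processing only modified records, as described for Datapipe, can be sketched in plain Python. This does not use Datapipe's API; `record_hash`, `incremental_run`, and the sample records are hypothetical, and content hashing is just one assumed way to detect per-record changes.

```python
import hashlib
import json


def record_hash(record: dict) -> str:
    """Stable content hash of one record (keys sorted for determinism)."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def incremental_run(source: dict, seen_hashes: dict, output: dict, transform) -> list:
    """Process only records that are new, changed, or deleted since the last run.

    source:      record id -> input record
    seen_hashes: record id -> hash of the input processed last time (mutated)
    output:      record id -> transformed record (mutated)
    Returns the sorted ids that were touched in this run.
    """
    touched = []
    for rec_id, record in source.items():
        digest = record_hash(record)
        if seen_hashes.get(rec_id) != digest:
            output[rec_id] = transform(record)
            seen_hashes[rec_id] = digest
            touched.append(rec_id)
    # Drop outputs whose input record disappeared from the source.
    for rec_id in list(seen_hashes):
        if rec_id not in source:
            del seen_hashes[rec_id]
            output.pop(rec_id, None)
            touched.append(rec_id)
    return sorted(touched)


def to_upper_name(record: dict) -> dict:
    """Example transform: upper-case the (made-up) name field."""
    return {**record, "name": record["name"].upper()}


# Made-up sample data for illustration.
source = {1: {"name": "ada"}, 2: {"name": "grace"}, 3: {"name": "alan"}}
seen, out = {}, {}
first = incremental_run(source, seen, out, to_upper_name)   # everything is new

source[2] = {"name": "hopper"}   # changed
source[4] = {"name": "linus"}    # added
del source[1]                    # removed
second = incremental_run(source, seen, out, to_upper_name)  # record 3 is skipped
```

On the second run the unchanged record 3 is never passed to `transform`, which is the saving that record-level tracking is meant to provide.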
  • 25
    Erigon

    Ethereum implementation on the efficiency frontier

    Erigon is an implementation of Ethereum (execution client) on the efficiency frontier, written in Go. For an archive node of Ethereum Mainnet, we recommend >=3TB of storage space: 1.8TB state (as of March 2022) and 200GB temp files (you can symlink or mount the folder <datadir>/etl-tmp to another disk). Ethereum Mainnet full node (see --prune* flags): 400GB. Erigon is by default an "all in one binary" solution, but it is possible to start TxPool as a separate process. The same is true of the JSON RPC layer (RPCDaemon), p2p layer (Sentry), history download layer (Downloader), and consensus. Don't start services as separate processes unless you have a clear reason for it: resource limiting, scale, replacement with your own implementation, or security. ...
    Downloads: 1 This Week
    See Project