sparklyr

sparklyr is an R package that provides an interface to Apache Spark clusters, either local or remote, while letting users write code in familiar R idioms. It supplies a dplyr-compatible backend, access to Spark machine learning pipelines, SQL integration, and I/O utilities for manipulating and analyzing large datasets distributed across a cluster.

Features

Connects to Spark via YARN, Mesos, Kubernetes, Livy, or local mode
Enables dplyr-style data transformation on Spark DataFrames
Supports SQL queries and ML pipelines (ml_* API)
Includes tools for distributed computing, window functions, and streaming
Extensible with packages like sparkxgb, graphframes, H2O
Reads and writes CSV, Parquet, and JSON data, and supports caching tables in Spark memory
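
The features above can be illustrated with a minimal sketch. It assumes a local Spark installation is available (sparklyr's spark_install() can download one) and uses R's built-in mtcars data; the table name mtcars_spark and the temporary output path are hypothetical choices made for this example, not anything prescribed by the project.

```r
library(sparklyr)
library(dplyr)

# Connect to a local Spark instance (assumes Spark is installed locally;
# run spark_install() once beforehand if it is not).
sc <- spark_connect(master = "local")

# Copy a small built-in R data frame into Spark for illustration.
# "mtcars_spark" is a hypothetical table name chosen for this sketch.
cars_tbl <- copy_to(sc, mtcars, name = "mtcars_spark", overwrite = TRUE)

# dplyr verbs are translated to Spark SQL and executed by Spark.
mpg_by_cyl <- cars_tbl %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg, na.rm = TRUE), n = n()) %>%
  arrange(cyl)

# collect() brings the (small) aggregated result back into an R data frame.
print(collect(mpg_by_cyl))

# The same table can be queried with plain SQL through the DBI interface.
print(DBI::dbGetQuery(
  sc, "SELECT cyl, COUNT(*) AS n FROM mtcars_spark GROUP BY cyl"
))

# Fit a model with the ml_* API, here a linear regression on two predictors.
fit <- ml_linear_regression(cars_tbl, mpg ~ wt + cyl)
print(fit)

# Write the Spark table out as Parquet (the output path is hypothetical).
out_path <- file.path(tempdir(), "mtcars_parquet")
spark_write_parquet(cars_tbl, path = out_path, mode = "overwrite")

# Release the connection when finished.
spark_disconnect(sc)
```

For a remote cluster, the master argument would change (for example to a YARN or Livy endpoint); the rest of the code should stay largely the same, which is the main appeal of the dplyr backend.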

License

Apache License V2.0

Additional Project Details

Operating Systems

Linux, Mac, Windows

Programming Language

R

Related Categories

R Data Management System

Registered

2025-07-30