spark

Projects with this topic


Jan Berkel / dīgerō

Process MediaWiki/Wiktionary database dumps

spark mediawiki

5

Updated Dec 07, 2025

5 3 0 7


DataRider v2 / DataRider ETL with Spark

DataRider block for streaming ETL with Spark and Scala

ETL spark

0

Updated Nov 26, 2025

0 0 1 12


Cristian Vasu Data Portfolio / Scalable Machine Learning with SparkML - Census Income Classification

Built a complete machine learning pipeline in SparkML using the Adult Census dataset (~48k rows, 14 features). Implemented data preprocessing, feature encoding, cross-validation, and model training with Logistic Regression and Random Forest. Evaluated models with metrics such as AUC and F1-score. Reflected on scalability trade-offs and optimizations in distributed ML.

spark sparkML machine-lear... classification pyspark big-data data-science

1

Updated Sep 04, 2025

1 0 0 0


Cristian Vasu Data Portfolio / HealthTrend Innovations Big Data Architecture

End-to-end design of a Hadoop-based ecosystem for healthcare data at scale (50 TB, IoT streams, medical imaging). Proposed a 10-node cluster architecture integrating HDFS, Spark, Hive, NiFi, Kafka, and Docker with HIPAA-compliant security (Kerberos, TLS, Apache Ranger). Delivered a proof-of-concept Docker deployment and professional proposal document.

big-data hadoop spark hive kafka nifi Docker data-enginee... healthcare hipaa architecture

1

Updated Sep 04, 2025

1 0 0 0


Cristian Vasu Data Portfolio / Spark for Batch + Streaming - Market Analysis Kafka Pipeline

Unified project demonstrating both batch analytics and real-time streaming pipelines with Apache Spark:

Batch (PySpark/Jupyter): Processed S&P 500 stock data, applied transformations, and ran distributed computations.

Streaming (Spark + Kafka): Built a streaming pipeline to consume Kafka topics, process messages in real-time, and visualize outputs.

Deployed using Docker and Jupyter for reproducibility.

spark pyspark kafka streaming ETL real-time batch-proces... Docker data-pipeline

1

Updated Sep 04, 2025

1 0 0 0


Emerson Yudi Nakashima / twitter-java-spark-structured-streaming

Process Twitter messages using Spark Structured Streaming integrated with Kafka

Java spark structured-s...

0

Updated Feb 23, 2025

0 0 0


Philipp / SparkDatabase

Test case that uses the Stackable Spark operator to run any SQL query (on a PostgreSQL database), e.g. for copying data. A Micronaut service creates a gRPC service.

spark Java Kubernetes operator micronaut PostgreSQL grpc

1

Updated Jul 29, 2024

1 0 0


codeigniterpower / codeigniterpower

Mirror of https://codeberg.org/codeigniter/codeigniterpower. CI2 and CI3 kept up to date with PHP 8, PHP 7, and PHP 5.

venenux PHP spark codeigniter framework php7 php8 codeigniter3 codeigniter2 web-framework php framework development

0

Updated Jun 12, 2024

0


Sav Maksim / VKR

Diploma project that builds a dataset and uses it for machine learning, with the goal of clustering.

Scala spark Python parquet kmeans dataset

0

Updated Apr 24, 2024

0 0 0 0


catvayor / kipinä

compiler synchronous lustre spark Ada

0

Updated Jan 19, 2024

0 0 0 0


TheRack.io / Big Data / Binaries / apache-spark-bin-arm64

spark ARM64 apache big data analytics

0

Updated Aug 28, 2023

0 0 0 0


ufscar / hpc / Exemplos / spark

Apache Spark with a master-slave setup that works out of the box using OpenHPC and Slurm

spark apache-spark slurm HPC openhpc

1

Updated Jun 24, 2023

1 1 0 0


Eduard Mielieshkin / SparkPlugin

The "Stage Metrics" plugin for Apache Spark, which creates metrics by stage status

spark Apache Spark plugin plugins

0

Updated Jul 05, 2022

0 0 0 0


Emerson Yudi Nakashima / twitter-scala-spark-structured-streaming

Process Twitter messages using Spark Structured Streaming integrated with Kafka

Scala spark structured-s...

0

Updated Jun 09, 2022

0 0 0 0


containalytics / containalytics

"Cloud container data analytics, statistical modeling, and machine learning on distributed databases". "A free open-source alternative to SPSS, SAS, MATLAB, PowerBI, Tableau and Alteryx". Runs on Linux, Windows, macOS, and in the cloud via containers.

LaTeX statistics sas spss matlab Python R spark cloud gcp Oracle azure Amazon Web S... Kubernetes containers Docker ML machine lear... regression clustering TiDB Yugabyte MySQL MariaDB SQL sparkr pyspark RStudio - KNIME Anal... Apache Spark... PyTorch MXNet Chainer keras gluon Scikit-learn... ONNX MLOps - Anac... NumPy Ipython) StatsModels pytest dask Koalas API -... Tornado - Py... Altair Bokeh Jupyter Voila Plotly/Dash matplotlib Seaborn - C#... SASPy - R: T... ggplot2 shiny dash Sparklyr BlueSky Stat... Jamovi - Int... vs code Vim - Tableau TabPy Tableau Buil... Python) - PL... SQL Developer PostgreSQL MySQL/MariaDB pgAdmin4 dbeaver MySQL Workbench Spark SQL Delta Lake Angular 2+ React .NET Core JavaScript (JS) Typescript (TS) Blazor Razor html5 CSS3 AWS EC2 Servers docker-compose podman Red Hat Ente... Oracle Linux fedora centos Ubuntu (WSL 2) debian Kestrel nginx Apache web s... jira Git Gitlab CI/CD... Code Climate... Ansible helm Terraform Cloudera Dat... nifi blender godot MS Office

2

Updated May 11, 2022

2 0 1


Matheus Kempa Severino / Spark - Google Collab

My first interaction with Spark.

google colab spark

0

Updated Oct 07, 2021

0 0 0 0


MSEM / Hadoop

Strategic Information Management - Hadoop with Spark

hadoop spark study master msem

0

Updated Jul 05, 2021

0 0 0 0


Matteo / sparkconf-app

This web app finds the best configuration for a Spark application given the cluster's hardware.

spark hadoop

0

Updated Oct 22, 2020

0 0 0 0


Roffild / RoffildLibrary

Library for MQL5 (MetaTrader) with Python, Java, Apache Spark, AWS https://roffild.com/

mq5 spark library random-forest neural-network forex mql mql5

1

Updated May 04, 2020

1 0 0 0


DataHack Formation / Community ❤️ / DataLive / workshop_2_bigdata_hadoop

Big Data workshop led by Jimmy Farfán, instructor of the online course "Desarrollo de Aplicaciones de Big Data en Hadoop" (Big Data Application Development in Hadoop). For more information or any questions, you can find us on Facebook as Data Hack Formation.

hadoop big data cloudera hive spark

1

Updated Apr 28, 2020

1 0 0 0
