Projects with this topic
DataRider block for ETL streaming with Spark and Scala
Updated -
Cristian Vasu Data Portfolio / Scalable Machine Learning with SparkML - Census Income Classification
Built a complete machine learning pipeline in SparkML using the Adult Census dataset (~48k rows, 14 features). Implemented data preprocessing, feature encoding, cross-validation, and model training with Logistic Regression and Random Forest. Evaluated models with metrics such as AUC and F1-score. Reflected on scalability trade-offs and optimizations in distributed ML.
Updated -
End-to-end design of a Hadoop-based ecosystem for healthcare data at scale (50 TB, IoT streams, medical imaging). Proposed a 10-node cluster architecture integrating HDFS, Spark, Hive, NiFi, Kafka, and Docker with HIPAA-compliant security (Kerberos, TLS, Apache Ranger). Delivered a proof-of-concept Docker deployment and professional proposal document.
Updated -
Unified project demonstrating both batch analytics and real-time streaming pipelines with Apache Spark:
Batch (PySpark/Jupyter): Processed S&P 500 stock data, applied transformations, and ran distributed computations.
Streaming (Spark + Kafka): Built a streaming pipeline to consume Kafka topics, process messages in real-time, and visualize outputs.
Deployed using Docker and Jupyter for reproducibility.
Updated -
Process Twitter messages using Spark Structured Streaming integrated with Kafka
Updated -
Test case that uses the Stackable Spark operator to run an arbitrary SQL query against a PostgreSQL database, e.g. for copying data. A Micronaut service creates a gRPC service.
Updated -
Mirror of https://codeberg.org/codeigniter/codeigniterpower: CI2 and CI3 kept up to date with PHP 8, PHP 7, and PHP 5
Updated -
Apache Spark with a master-slave setup that works out of the box using OpenHPC and Slurm
Updated -
The "Stage Metrics" plugin for Apache Spark, for creating metrics by stage status
Updated -
Process Twitter messages using Spark Structured Streaming integrated with Kafka
Updated -
"Cloud container data analytics, statistical modeling, and machine learning on distributed databases". "A free opensource alternative to SPSS, SAS, MATLAB, PowerBI, Tableau and Alteryx". Runs on Linux, Windows, MacOS, and in the cloud via containers.
LaTeX statistics sas spss matlab Python R spark cloud gcp Oracle azure Amazon Web S... Kubernetes containers Docker ML machine lear... regression clustering TiDB Yugabyte MySQL MariaDB SQL sparkr pyspark RStudio - KNIME Anal... Apache Spark... PyTorch MXNet Chainer keras gluon Scikit-learn... ONNX MLOps - Anac... NumPy Ipython) StatsModels pytest dask Koalas API -... Tornado - Py... Altair Bokeh Jupyter Voila Plotly/Dash matplotlib Seaborn - C#... SASPy - R: T... ggplot2 shiny dash Sparklyr BlueSky Stat... Jamovi - Int... vs code Vim - Tableau TabPy Tableau Buil... Python) - PL... SQL Developer PostgreSQL MySQL/MariaDB pgAdmin4 dbeaver MySQL Workbench Spark SQL Delta Lake Angular 2+ React .NET Core JavaScript (JS) Typescript (TS) Blazor Razor html5 CSS3 AWS EC2 Servers docker-compose podman Red Hat Ente... Oracle Linux fedora centos Ubuntu (WSL 2) debian Kestrel nginx Apache web s... jira Git Gitlab CI/CD... Code Climate... Ansible helm Terraform Cloudera Dat... nifi blender godot MS Office
Updated -
My first interaction with Spark.
Updated -
This web app finds the best configuration of a Spark Application given the hardware of the cluster
Updated -
Library for MQL5 (MetaTrader) with Python, Java, Apache Spark, AWS https://roffild.com/
Updated -
Big Data workshop led by Jimmy Farfán, instructor of the online course "Desarrollo de Aplicaciones de Big Data en Hadoop". For more information or any questions, you can find us on Facebook as Data Hack Formation.
Updated