Projects with this topic
Unified project demonstrating both batch analytics and real-time streaming pipelines with Apache Spark:
Batch (PySpark/Jupyter): Processed S&P 500 stock data, applied transformations, and ran distributed computations.
Streaming (Spark + Kafka): Built a streaming pipeline that consumes Kafka topics, processes messages in real time, and visualizes the outputs.
Deployed using Docker and Jupyter for reproducibility.
An Airflow pipeline, built mainly for exploration and self-learning, that scrapes LINE Webtoon data and stores it in an external MySQL database. The ingested data feed a dashboard at https://ammarchalifah.com/webtoon-insights