Import public NYC taxi and for-hire vehicle (Uber, Lyft)
An AI-powered data science team of agents
Links to everything you'd ever want to learn about data engineering
Simple tools for data cleaning in R
An end-to-end Data Scientist
Analytics for developers: set up analytics in 30 seconds
Basic to intermediate Python data science guide
ExtractThinker is a Document Intelligence library for LLMs
CSV Lint plug-in for Notepad++ for syntax highlighting
Data and tools for generating and inspecting OLMo pre-training data
The open source mesh processing system
Clean Jupyter notebooks of outputs, metadata, and empty cells
FDUPES is a program for identifying or deleting duplicate files
PandasAI is a Python library that integrates generative AI
Java dataframe and visualization library
AI agent that streamlines the entire process of data analysis
Miller is like awk, sed, cut, join, and sort for name-indexed data
A utility to improve performance and help manage storage on Steam Deck
Converts books written in Markdown to HTML, LaTeX/PDF and EPUB
Scan and remove junk files, caches, logs, and more
Automated Tool for Optimized Modelling
Scalable data preprocessing and curation toolkit for LLMs
Haskell code prettifier
A natural language interface for computers
Cleans HTML to avoid XSS attacks