15 projects for "data cleaning" with 1 filter applied:

  • 1
    NYC Taxi Data

    Import public NYC taxi and for-hire vehicle (Uber, Lyft) trip data

    The nyc-taxi-data repository is a rich dataset and exploratory project around New York City taxi trip records. It collects and preprocesses large-scale trip datasets (fares, pickup/dropoff, timestamps, locations, passenger counts) to enable data analysis, modeling, and visualization efforts. The project includes scripts and notebooks for cleaning and filtering the raw data, memory-efficient processing for large CSV/Parquet files, and aggregation workflows (e.g. trips per hour, heatmaps of pickups/dropoffs). ...
    Downloads: 3 This Week
  • 2
    AI Data Science Team

    An AI-powered data science team of agents

    AI Data Science Team is a Python library and agent ecosystem designed to accelerate and automate common data science workflows by modeling them as specialized AI “agents” that can be orchestrated to perform tasks like data cleaning, transformation, analysis, visualization, and machine learning. It provides a modular agent framework where each agent focuses on a step in the typical data science pipeline — for example, loading data from CSV/Excel files, cleaning and wrangling messy datasets, engineering predictive features, building models with AutoML, connecting to SQL databases, and producing visual outputs — all driven by natural language or programmatic instructions. ...
    Downloads: 1 This Week
  • 3
    The Data Engineering Handbook

    Links to everything you'd ever want to learn about data engineering

    ...It includes beginner and intermediate boot camps, interview guides, data cleaning and transformation resources, and curated lists of newsletters and industry communities, making it useful both for self-study and technical interview preparation. The repository is actively maintained and widely starred, reflecting its role as a go-to reference for newcomers and experienced practitioners alike.
    Downloads: 3 This Week
  • 4
    Agentic Data Scientist

    An end-to-end Data Scientist

    Agentic Data Scientist is an experimental AI-driven research framework that orchestrates data science workflows through autonomous agents that can reason, plan, and execute complex analytics tasks. Unlike traditional scripted pipelines, this project lets AI agents break down high-level research goals into sub-tasks such as data acquisition, cleaning, modeling, evaluation, and reporting, with minimal human direction.
    Downloads: 2 This Week
  • 5
    Perfect Roadmap To Learn Data Science

    Basic To Intermediate Python data science guide

    ...What makes it particularly valuable is its holistic nature: rather than focusing only on modeling or theory, it also covers the broader data-science lifecycle (data ingestion, cleaning, EDA, feature engineering, model building, validation, deployment, etc.).
    Downloads: 0 This Week
  • 6
    FDUPES

    FDUPES is a program for identifying or deleting duplicate files

    ...Because it operates directly on file content rather than just filenames, fdupes can accurately detect true copies and guide cleanup during deduplication or migration tasks. It’s a simple, efficient, and widely used utility on Unix-like systems, appreciated by administrators, developers, and power users.
    Downloads: 13 This Week
  • 7
    LLM Datasets

    Curated list of datasets and tools for post-training

    ...Licensing and provenance are surfaced to encourage compliant usage and to guide dataset selection in commercial settings. For practitioners, the repo is a practical “starting pantry” that accelerates experimentation and helps keep data wrangling from dominating the project timeline.
    Downloads: 2 This Week
  • 8
    Data Preprocessing Automate

    Data Preprocessing Automation: A GUI for easy data cleaning & visualization

    Data Preprocessing Automation is a Python-based GUI application designed to simplify and automate data preprocessing tasks. It allows users to upload Excel files, automatically handle missing values, remove duplicates, and detect and remove outliers using statistical methods. The application provides data visualization tools, including box plots for distribution analysis and scatter plots for exploring relationships between variables. Users can download the processed data for further...
    Downloads: 0 This Week
  • 9
    funNLP

    Resources, corpora, and tools for Chinese natural language processing

    ...The repository is organized into categories such as sentiment analysis, text classification, named entity recognition, knowledge graphs, and various lexicons (e.g. sensitive words, emotion dictionaries, stopwords). It also includes links to academic papers, open-source model implementations, and practical utilities like word segmentation or text cleaning scripts. The project is highly community-oriented, frequently updated with contributions and new resources, and it’s widely used in both academic and applied NLP research. Its value lies in providing not just tools but also curated, domain-specific data, which can be hard to find elsewhere.
    Downloads: 0 This Week
  • 10
    R4DS (R for Data Science)

    R for data science: a book

    ...Includes many example datasets, diagrams, code samples, and “hands-on” exercises. Comprehensive coverage of the data-science workflow: data import, cleaning, transformation, exploration, modelling, etc. Includes topics beyond the basics: relational data (joins), dates and times, strings, working with missing values, visualizing data, etc.
    Downloads: 4 This Week
  • 11
    queXC

    Web based system for cleaning and classifying open text fields

    An open-source, web-based data cleaning and coding system. queXC takes a data file (such as questionnaire data) and cleans its text input fields by correcting spacing and spell-checking them. Operators then code the text fields using new or existing coding schemes.
    Downloads: 0 This Week
  • 12
    cocoNLP

    A Chinese information extraction tool

    cocoNLP is a lightweight natural-language processing toolkit geared toward practical information extraction from raw text, especially for Chinese and mixed Chinese–English content. Instead of requiring a heavy pipeline, it focuses on quick wins such as extracting names, places, organizations, emails, phone numbers, and dates directly from unstructured sentences. The project blends pattern-based methods with NLP heuristics, giving developers dependable results for real-world texts like chats,...
    Downloads: 0 This Week
  • 13
    DataScienceR

    a curated list of R tutorials for Data Science, NLP

    The DataScienceR repository is a curated collection of tutorials, sample code, and project templates for learning data science using the R programming language. It includes an assortment of exercises, sample datasets, and instructional code that cover the core steps of a data science project: data ingestion, cleaning, exploratory analysis, modeling, evaluation, and visualization. Many of the modules demonstrate best practices in R, such as using the tidyverse, R Markdown, modular scripting, and reproducible workflows. ...
    Downloads: 0 This Week
  • 14
    Data Science Specialization

    Course materials for the Data Science Specialization on Coursera

    ...It spans essential topics such as R programming, data cleaning, exploratory data analysis, statistical inference, regression models, machine learning, and practical data science projects. By providing centralized resources, the repo makes it easier for students to practice concepts and replicate examples from the curriculum. It also offers a structured view of how multiple disciplines—programming, statistics, and applied data analysis—come together in a professional workflow.
    Downloads: 5 This Week
  • 15
    Clear4Amiga

    Joomla! plugin for cleaning HTML code and converting it to HTML 3.2 or 4

    Clear4Amiga is a plugin that cleans HTML code generated by Joomla!, stripping markup that is not compatible with HTML 3.2 or HTML 4.0. The plugin cleans code only for selected web browsers, so content is altered only for, e.g., older or mobile browsers. In combination with a compatible template, the same site and articles can be formatted with CSS for modern browsers and converted to HTML 3 for ancient browsers. The plugin should work with Joomla! 2.5.x (1.6, 1.7) up to version 3.x.
    Downloads: 0 This Week
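The FDUPES entry in this list says the tool detects duplicates by comparing file content rather than filenames. The following is a minimal Python sketch of that general idea, not FDUPES's own code: files are bucketed by size first, and only same-size files are hashed with SHA-256. The function names `file_digest` and `find_duplicates` are hypothetical, and this sketch treats a matching hash as equality instead of doing a final byte-by-byte check.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the SHA-256 hex digest of a file's contents, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root: str) -> list[list[str]]:
    """Group regular files under `root` whose contents are identical.

    Files are first bucketed by size (cheap, no reads), and only files
    sharing a size are hashed, so most unique files are never read.
    """
    by_size: dict[int, list[str]] = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks so a link and its target are not reported.
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    groups: list[list[str]] = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have a duplicate
        by_hash: dict[str, list[str]] = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return sorted(groups)
```

Calling `find_duplicates(".")` returns one list of paths per set of identical files; keeping the first path of each group and deleting the rest would correspond to the deletion use the FDUPES entry mentions.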
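The Data Preprocessing Automate entry lists three automatic cleaning steps: handling missing values, removing duplicates, and removing outliers "using statistical methods". The sketch below assumes concrete choices the entry does not name: median imputation and the 1.5 * IQR rule (Tukey's fences), applied to one numeric column of records stored as plain dicts. The function `preprocess` is hypothetical and is not the application's code.

```python
import statistics


def preprocess(rows: list[dict], column: str) -> list[dict]:
    """Hypothetical three-step cleaning pass over one numeric column."""
    # 1. Impute missing values (None) with the median of the observed values.
    observed = [r[column] for r in rows if r.get(column) is not None]
    fill = statistics.median(observed)
    filled = [{**r, column: fill if r.get(column) is None else r[column]} for r in rows]

    # 2. Drop exact duplicate rows, keeping the first occurrence.
    seen: set[tuple] = set()
    unique: list[dict] = []
    for r in filled:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            unique.append(r)

    # 3. Remove outliers outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
    q1, _median, q3 = statistics.quantiles([r[column] for r in unique], n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [r for r in unique if low <= r[column] <= high]
```

Imputing before deduplicating means a row that was missing a value can become a duplicate of another row and be dropped; a real tool might order the steps differently.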
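The cocoNLP entry describes pattern-based extraction of items such as emails and phone numbers from Chinese or mixed Chinese-English text. A minimal Python sketch of the pattern-based part of that idea follows. The regular expressions are simplified assumptions (the phone pattern covers only 11-digit mainland China mobile numbers), `extract_contacts` is a hypothetical name, and none of this is taken from cocoNLP's actual API.

```python
import re

# Simplified email pattern: ASCII local part, domain, and a 2+ letter TLD.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Simplified mainland China mobile number: "1", then 3-9, then 9 more digits,
# not embedded in a longer run of digits.
CN_MOBILE_RE = re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)")


def extract_contacts(text: str) -> dict[str, list[str]]:
    """Return emails and mobile numbers found in free text, in order of appearance."""
    return {
        "emails": EMAIL_RE.findall(text),
        "phones": CN_MOBILE_RE.findall(text),
    }
```

Because the character classes are ASCII-only, surrounding Chinese characters act as natural boundaries, which is why no word segmentation is needed for these two entity types.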