scraping free download

Showing 167 open source projects for "scraping"

1

Web Scraping for Laravel

Laravel adapter for Roach, the complete web scraping toolkit for PHP

This is the Laravel adapter for Roach, the complete web scraping toolkit for PHP. Easily integrate Roach into any Laravel application. The Laravel adapter mostly provides the necessary container bindings for the various services Roach uses and makes certain configuration options available via a config file. It also registers a few Artisan commands to make our development experience as pleasant as possible. Roach ships with an interactive shell (often called Read...

Downloads: 3 This Week

Last Update: 2025-03-21
See Project
2

Jackett

API Support for your favorite torrent trackers

Jackett works as a proxy server: it translates queries from apps (Sonarr, Radarr, SickRage, CouchPotato, Mylar3, Lidarr, DuckieTV, qBittorrent, Nefarious, etc.) into tracker-site-specific HTTP queries, parses the HTML or JSON response, and then sends the results back to the requesting software. This allows for getting recent uploads (like RSS) and performing searches. Jackett is a single repository of maintained indexer scraping and translation logic, removing the burden from other apps. Trackers...

Downloads: 177 This Week

Last Update: 6 hours ago
See Project
3

Kazumi

Flutter-based, rule-driven anime collection

Kazumi is a cross-platform “anime (番剧)” fetching and streaming application built with Flutter. It allows users to define custom scraping rules using XPath-style selectors (up to five lines) to collect anime metadata and streaming sources. The app supports streaming with real-time super resolution (via Anime4K), danmaku (on-screen comments), multiple video sources, offline caching, and even collaborative watching modes. It targets many platforms (Android, iOS, Windows, macOS, Linux) and supports...

Downloads: 42 This Week

Last Update: 2025-09-25
See Project
4

Automa

A chrome extension for automating your browser by connecting blocks

Automa is a browser extension for browser automation. From auto-filling forms and doing repetitive tasks to taking screenshots and scraping data from websites, it's up to you what you want to do with this extension. Automa provides various kinds of blocks to help you automate, and all you need to do is connect them. Want your workflow to run every day or every time you visit a specific website? You can set the workflow trigger on the trigger block. Try a workflow from the marketplace...

Downloads: 46 This Week

Last Update: 2025-08-11
See Project
5

Ulixee Hero

The web browser built for scraping

It's the first modern headless browser designed specifically for scraping instead of just automated testing. Hero provides access to the W3C DOM specification without the need for Puppeteer's complicated evaluate callbacks and multi-context switching. We've recreated a fully compliant DOM directly in NodeJS, allowing you to bypass the headaches of previous scraper tools. The powerful Chrome engine sits under the hood, allowing for lightning-fast rendering. Emulators make it easy to disguise your...

Downloads: 8 This Week

Last Update: 2025-09-08
See Project
6

X-Crawl

Flexible Node.js AI-assisted crawler library

A high-performance web crawling and scraping framework for Node.js, designed for large-scale data extraction.

Downloads: 4 This Week

Last Update: 2025-04-06
See Project
7

UI.Vision RPA

Open-Source RPA Software (formerly Kantu)

The UI Vision RPA software is the tool for visual process automation, codeless UI test automation, web scraping and screen scraping. Automate tasks on Windows, Mac and Linux. The UI Vision RPA core is open-source with enterprise security. The free and open-source browser extension can be extended with local apps for desktop UI automation. UI.Vision RPA's computer-vision visual UI testing commands allow you to write automated visual tests with UI.Vision RPA - this makes UI.Vision RPA the first...

Downloads: 17 This Week

Last Update: 2025-03-22
See Project
8

Elasticsearch Exporter

Elasticsearch stats exporter for Prometheus

Prometheus exporter for various metrics about Elasticsearch, written in Go. The exporter fetches information from an Elasticsearch cluster on every scrape, so a scrape interval that is too short can impose load on ES master nodes, particularly if you run with --es.all and --es.indices. We suggest you measure how long fetching /_nodes/stats and /_all/_stats takes for your ES cluster to determine whether your scraping interval is too short. As a last resort, you can scrape this exporter...

Downloads: 16 This Week

Last Update: 2025-03-03
See Project
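
The advice in the Elasticsearch Exporter entry (measure how long the stats fetch takes, then judge the scrape interval) can be sketched in Python. This is only an illustration under stated assumptions: `measure_fetch_seconds`, `interval_is_too_short`, and the `safety_factor` of 2 are hypothetical names and values, not part of the exporter, and `fetch` stands in for whatever call retrieves /_nodes/stats or /_all/_stats in your setup.

```python
import time


def measure_fetch_seconds(fetch, repeats=3):
    """Call fetch() several times and return the slowest observed duration."""
    worst = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        fetch()  # hypothetical callable, e.g. an HTTP GET of /_nodes/stats
        worst = max(worst, time.perf_counter() - start)
    return worst


def interval_is_too_short(scrape_interval_s, fetch_seconds, safety_factor=2.0):
    """Flag intervals that leave less headroom than safety_factor x fetch time.

    The factor of 2 is an assumed rule of thumb, not a documented threshold.
    """
    return scrape_interval_s < fetch_seconds * safety_factor


if __name__ == "__main__":
    # Stand-in for a real stats request: pretend the fetch takes ~50 ms.
    observed = measure_fetch_seconds(lambda: time.sleep(0.05))
    print(f"slowest fetch: {observed:.3f}s")
    print("15s interval too short?", interval_is_too_short(15, observed))
```

Passing the fetch as a callable keeps the timing logic independent of any particular HTTP client.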
9

Parsera

Lightweight library for scraping web-sites with LLMs

Scrape data from any website with only a link and column descriptions. Parsera is a tool designed to scrape web content, specifically handling poorly structured or messy websites.

Downloads: 6 This Week

Last Update: 5 days ago
See Project
10

Scrapy

A fast, high-level web crawling and web scraping framework

Scrapy is a fast, open source, high-level framework for crawling websites and extracting structured data from these websites. Portable and written in Python, it can run on Windows, Linux, macOS and BSD. Scrapy is powerful, fast and simple, and also easily extensible. Simply write the rules to extract the data, and add new functionality if you wish without having to touch the core. Scrapy does the rest, and can be used in a number of applications. It can be used for data mining, monitoring...

Downloads: 21 This Week

Last Update: 2025-07-02
See Project
11

ScrapeGraphAI

Python scraper based on AI

Extract content from websites and local documents using LLMs. ScrapeGraphAI is a web scraping Python library that uses LLMs and direct graph logic to create scraping pipelines for websites and local documents (XML, HTML, JSON, Markdown, etc.). Just say which information you want to extract and the library will do it for you.

Downloads: 5 This Week

Last Update: 2025-08-13
See Project
12

rvest

Simple web scraping for R

rvest helps you scrape (or harvest) data from web pages. It is designed to work with magrittr to make it easy to express common web scraping tasks, inspired by libraries like beautiful soup and RoboBrowser. If you’re scraping multiple pages, I highly recommend using rvest in concert with polite. The polite package ensures that you’re respecting the robots.txt and not hammering the site with too many requests.

Downloads: 2 This Week

Last Update: 2025-08-29
See Project
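
The courtesy described in the rvest entry (respect robots.txt and avoid hammering the site) is not specific to R. As a rough sketch, the same checks can be made with Python's standard-library `urllib.robotparser`. The robots.txt content, the `demo-bot` user agent, and the helper names below are made up for illustration; this is not how the polite package is implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would download it instead.
ROBOTS_TXT = """User-agent: *
Disallow: /private/
Crawl-delay: 2
""".splitlines()


def build_parser(robots_lines):
    """Parse robots.txt lines that have already been fetched."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser


def allowed_urls(parser, user_agent, urls):
    """Keep only the URLs that robots.txt permits for this user agent."""
    return [url for url in urls if parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    candidates = [
        "https://example.com/index.html",
        "https://example.com/private/data.html",
    ]
    print(allowed_urls(parser, "demo-bot", candidates))
    # Seconds to pause between requests, if the site declares a Crawl-delay.
    print(parser.crawl_delay("demo-bot"))
```

A crawler would sleep for the reported crawl delay between requests to the same host.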
13

crawlee

A web scraping and browser automation library for Node.js

Crawlee is a web scraping and browser automation library. It helps you build reliable crawlers. Fast. Crawlee won't fix broken selectors for you (yet), but it helps you build and maintain your crawlers faster. When a website adds JavaScript rendering, you don't have to rewrite everything, only switch to one of the browser crawlers. When you later find a great API to speed up your crawls, flip the switch back. It keeps your proxies healthy by rotating them smartly with good fingerprints...

Downloads: 8 This Week

Last Update: 2025-09-26
See Project
14

saml2aws

CLI tool which enables you to log in and retrieve AWS credentials

CLI tool which enables you to log in and retrieve AWS temporary credentials using ADFS or PingFederate Identity Providers. Aside from Okta, most of the providers in this project use screen scraping to log users into SAML. This isn't ideal, and hopefully vendors will make this easier in the future.

Downloads: 10 This Week

Last Update: 2025-03-13
See Project
15

Actors MCP Server

Model Context Protocol (MCP) Server for Apify's Actors

The Apify Actors MCP Server is a Model Context Protocol (MCP) server that enables AI assistants to interact with Apify Actors. This integration allows AI models to utilize various web scraping and automation tools provided by Apify, facilitating tasks such as data extraction and web automation.

Downloads: 5 This Week

Last Update: 3 days ago
See Project
16

mtail

Extract internal monitoring data from application logs

... to instrument them or writing custom extraction code for every such application. The extraction is controlled by mtail programs which define patterns and actions. Metrics are exported for scraping by a collector in JSON or Prometheus format over HTTP, or can be periodically sent to a collectd, StatsD, or Graphite collector socket. Precompiled binaries for released versions are available on the Releases page on GitHub. Using the latest production release binary is the recommended way of installing mtail.

Downloads: 7 This Week

Last Update: 2024-08-08
See Project
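
The pattern-and-action idea in the mtail entry can be illustrated with a small Python analogy. This is not mtail's own program language: the rule names, the regular expressions, and the sample log lines are all invented for the sketch. It counts matching log lines and renders the counters in the Prometheus text exposition format.

```python
import re
from collections import Counter

# Hypothetical rules: (metric name, pattern). Each match increments a counter.
RULES = [
    ("http_requests_total", re.compile(r'"[A-Z]+ \S+ HTTP/[\d.]+" \d{3}')),
    ("http_errors_total", re.compile(r'HTTP/[\d.]+" 5\d\d')),
]


def extract_metrics(lines, rules=RULES):
    """Apply every rule to every log line and count the matches."""
    counts = Counter({name: 0 for name, _ in rules})
    for line in lines:
        for name, pattern in rules:
            if pattern.search(line):
                counts[name] += 1
    return counts


def to_prometheus_text(counts):
    """Render counters as Prometheus text exposition lines."""
    out = []
    for name in sorted(counts):
        out.append(f"# TYPE {name} counter")
        out.append(f"{name} {counts[name]}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample_log = [
        '127.0.0.1 - - "GET /index.html HTTP/1.1" 200 512',
        '127.0.0.1 - - "POST /api HTTP/1.1" 500 0',
        "service started",
    ]
    print(to_prometheus_text(extract_metrics(sample_log)), end="")
```

In a real deployment the rendered text would be served over HTTP for a collector to scrape.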
17

Firecrawl

Turn entire websites into LLM-ready markdown or structured data

Crawl and convert any website into LLM-ready markdown or structured data. Built by Mendable.ai and the Firecrawl community. Includes powerful scraping, crawling, and data extraction capabilities. Firecrawl is an API service that takes a URL, crawls it, and converts it into clean markdown or structured data. We crawl all accessible subpages and give you clean data for each. No sitemap is required.

Downloads: 6 This Week

Last Update: 2025-09-19
See Project
18

Scrapling

An undetectable, powerful, flexible, high-performance Python library

Scrapling is a Python scraping framework built for the modern web, combining high-performance fetchers with a rapid parsing engine to handle dynamic sites and anti-bot countermeasures. It emphasizes being “undetectable,” flexible, and fast, offering an approachable API for both experienced scrapers and newcomers. The library targets the full scraping pipeline: session handling, fetching, rendering when needed, parsing, and export—while keeping ergonomics front and center. Community posts...

Downloads: 4 This Week

Last Update: 1 day ago
See Project
19

EMAGNET

Automated hacking tool to find leaked databases with 97.1% accuracy

Automated hacking tool that finds leaked databases with 97.1% accuracy to grab mail + password. Before using Emagnet, please remember that with great power comes great responsibility. Pastebin patched the vulnerability I previously used to get recent uploads, so at the moment it is not possible to get recently uploaded files; you are now limited to all syntaxes except the default one (95% gets uploaded as 'text', and this is removed from all recent upload lists). Bruteforce...

Downloads: 10 This Week

Last Update: 2025-03-30
See Project
20

Yahoo! Finance market data downloader

Yahoo! Finance market data downloader

Ever since Yahoo! Finance decommissioned their historical data API, many programs that relied on it stopped working. yfinance aims to solve this problem by offering a reliable, threaded, and Pythonic way to download historical market data from Yahoo! Finance. yfinance aimed to offer a temporary fix to the problem by scraping the data from Yahoo! Finance and returning the data in the same format as pandas_datareader's get_data_yahoo(), thus keeping the code changes in existing software...

Downloads: 10 This Week

Last Update: 2025-09-17
See Project
21

FreshRSS

A free, self-hostable news aggregator

FreshRSS is a self-hosted RSS and Atom feed aggregator. It is lightweight, easy to work with, powerful, and customizable. Follow websites, podcasts, and video channels in a single place. Read your articles directly in FreshRSS. Search and save queries for quick access. Generate feeds by scraping external websites. Generate new feeds based on your filters. Import and export your feeds with OPML. Stay connected to your feeds in real time. Adapt to your needs thanks to a lot of options. Follow...

Downloads: 6 This Week

Last Update: 2025-09-27
See Project
22

Colly

Elegant Scraper and Crawler Framework for Golang

Colly provides a clean interface to write any kind of crawler/scraper/spider. With Colly you can easily extract structured data from websites, which can be used for a wide range of applications, like data mining, data processing or archiving. Clean API. Fast (>1k requests/sec on a single core). Manages request delays and maximum concurrency per domain. Automatic cookie and session handling. Sync/async/parallel scraping. Distributed scraping. Caching, automatic encoding of non-unicode responses...

Downloads: 1 This Week

Last Update: 2025-03-27
See Project
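
To make the "request delays per domain" feature in the Colly entry concrete, here is a minimal Python sketch of the general idea. It is not Colly's API (Colly is a Go framework): the `DomainThrottle` class and its `wait_time` method are hypothetical, and the clock value is passed in explicitly so the behaviour is easy to follow.

```python
from urllib.parse import urlparse


class DomainThrottle:
    """Track, per domain, the earliest time the next request may start."""

    def __init__(self, delay_s):
        self.delay_s = delay_s
        self._next_allowed = {}  # domain -> timestamp of the next free slot

    def wait_time(self, url, now):
        """Return how long to wait before requesting url, and reserve the slot."""
        domain = urlparse(url).netloc
        ready_at = self._next_allowed.get(domain, now)
        start = max(now, ready_at)
        self._next_allowed[domain] = start + self.delay_s
        return start - now


if __name__ == "__main__":
    throttle = DomainThrottle(delay_s=2.0)
    # Two requests to the same (made-up) domain, then one to another domain.
    print(throttle.wait_time("https://a.example/1", now=0.0))
    print(throttle.wait_time("https://a.example/2", now=0.5))
    print(throttle.wait_time("https://b.example/1", now=0.5))
```

A caller would sleep for the returned number of seconds before issuing the request; requests to different domains do not delay each other.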
23

CyberScraper 2077

A Powerful web scraper powered by LLM | OpenAI, Gemini & Ollama

CyberScraper 2077 is not just another web scraping tool – it's a glimpse into the future of data extraction. Born from the neon-lit streets of a cyberpunk world, this AI-powered scraper uses OpenAI, Gemini and LocalLLM Models to slice through the web's defenses, extracting the data you need with unparalleled precision and style.

Downloads: 3 This Week

Last Update: 2024-11-08
See Project
24

DrissionPage

Python based web automation tool. Powerful and elegant

DrissionPage is a Python-based automation framework that blends the capabilities of Selenium for browser automation with Requests-HTML for fast, headless web data extraction. It enables seamless switching between browser-controlled and headless HTTP sessions within the same interface. Ideal for web scraping, testing, and automation, DrissionPage is lightweight and highly efficient, offering more flexibility than standard Selenium or Requests usage alone.

Downloads: 6 This Week

Last Update: 2025-07-01
See Project
25

JobFunnel

Scrape job websites into a single spreadsheet with no duplicates.

Scrape job websites into a single spreadsheet with no duplicates. Automated tool for scraping job postings into a .csv file. You can search for jobs with YAML configuration files or by passing command arguments. By performing regular scraping and reviewing, you can cut through the noise of even the busiest job markets. Run funnel with your settings YAML to populate your master CSV file with jobs from available providers. JobFunnel can be easily automated to run nightly with crontab. If you have...

Downloads: 2 This Week

Last Update: 2024-09-29
See Project