crawler free download

Showing 288 open source projects for "crawler"

View related business solutions

La version gratuite d'Auth0 s'enrichit !
Gratuit pour 25 000 utilisateurs avec intégration Okta illimitée : concentrez-vous sur le développement de vos applications.

Vous l'avez demandé, nous l'avons fait ! Les versions gratuite et payante d'Auth0 incluent des options qui vous permettent de développer, déployer et faire évoluer vos applications en toute sécurité. Utilisez Auth0 dès maintenant pour découvrir tous ses avantages.

Essayez Auth0 gratuitement
MongoDB Atlas runs apps anywhere
Deploy in 115+ regions with the modern database for every enterprise.

MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.

Start Free
1

Crawler Detect

CrawlerDetect is a PHP class for detecting bots/crawlers/spiders

Crawler Detect is a PHP library that detects bots, crawlers, and spiders by analyzing user-agent headers and comparing them against a constantly updated list of known crawlers. It's useful for analytics, rate-limiting, or displaying alternative content for automated tools. It is fast, lightweight, and easy to integrate into any PHP application.

Downloads: 3 This Week

Last Update: 2025-09-30
See Project
2

Spatie Crawler

An easy to use, powerful crawler implemented in PHP

Spatie Crawler is a PHP library that allows developers to crawl websites and extract information efficiently. It can be used for web scraping, link checking, or automated testing of web pages. The library is simple to use and supports customizable crawling strategies, including controlling crawl depth and handling redirects. It’s suitable for building crawlers that navigate large or dynamically generated websites.

Downloads: 1 This Week

Last Update: 2025-05-20
See Project
3

SiteOne Crawler

SiteOne Crawler is a website analyzer and exporter

SiteOne Crawler is a very useful and easy-to-use tool you'll ♥ as a Dev/DevOps, website owner or consultant. Works on all popular platforms - Windows, macOS, and Linux (x64 and arm64 too). It will crawl your entire website in depth, analyze and report problems, show useful statistics and reports, generate an offline version of the website, generate sitemaps, or send reports via email. Watch a detailed video with a sample report for Astro. build website. This crawler can be used as a command...

1 Review

Downloads: 1 This Week

Last Update: 2025-06-08
See Project
4

EasySpider

A visual no-code/code-free web crawler/spider

A visual code-free/no-code web crawler/spider, supporting both Chinese and English.

Downloads: 16 This Week

Last Update: 2025-01-01
See Project
Monitoring, Securing, Optimizing 3rd party scripts
For developers looking for a solution to monitor, script, and optimize 3rd party scripts

c/side is crawling many sites to get ahead of new attacks. c/side is the only fully autonomous detection tool for assessing 3rd party scripts. We do not rely purely on threat feed intel or easy to circumvent detections. We also use historical context and AI to review the payload and behavior of scripts.

Learn More
5

Heritrix

Internet Archive's open-source, web-scale, web crawler project

Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project. Heritrix (sometimes spelled heretrix, or misspelled or missaid as heratrix/heritix/heretix/heratix) is an archaic word for heiress (woman who inherits). Since our crawler seeks to collect and preserve the digital artifacts of our culture for the benefit of future researchers and generations, this name seemed apt. Heritrix is designed to respect the robots.txt exclusion directives...

Downloads: 8 This Week

Last Update: 2025-09-22
See Project
6

TorBot

Dark Web OSINT Tool

Contributions to this project are always welcome. To add a new feature fork the dev branch and give a pull request when your new feature is tested and complete. If its a new module, it should be put inside the modules directory. The branch name should be your new feature name in the format <Feature_featurename_version(optional)>. On Linux platforms, you can make an executable for TorBot by using the install.sh script. You will need to give the script the correct permissions using chmod +x...

Downloads: 14 This Week

Last Update: 2023-10-12
See Project
7

WebMagic

A scalable web crawler framework for Java

WebMagic is a scalable crawler framework. It covers the whole lifecycle of crawler, downloading, url management, content extraction and persistent. It can simplify the development of a specific crawler. WebMagic is a simple but scalable crawler framework. You can develop a crawler easily based on it. WebMagic has a simple core with high flexibility, a simple API for html extracting. It also provides annotation with POJO to customize a crawler, and no configuration is needed. Some other features...

Downloads: 0 This Week

Last Update: 2025-02-10
See Project
8

miniblink49

Lighter, faster browser kernel of blink to integrate HTML UI in apps

... electron). Customize as you wish, simulate another browser environment. Perfect HTML5 support, friendly to various front-end libraries (support HTML5, and friendly to front framework). After turning off the cross-domain switch, you can use various cross-domain functions (support cross-domain). Headless mode, which greatly saves resources and is suitable for crawlers (headless mode, be suitable for Web Crawler).

Downloads: 10 This Week

Last Update: 2025-09-28
See Project
9

SiteOne Crawler (desktop app)

A free, feature-rich web analyzer and exporter/cloner you will love!

A free in-depth website analyzer providing audits of security, performance, SEO, accessibility and other technical aspects. Available as a desktop application for Windows/macOS/Linux and as a CLI tool for advanced users and CI/CD processes. It also includes an offline web page exporter (website clone, mirror).

Downloads: 2 This Week

Last Update: 2024-10-02
See Project
AI-based, Comprehensive Service Management for Businesses and IT Providers
Modular solutions for change management, asset management and more

ChangeGear provides IT staff with the functions required to manage everything from ticketing to incident, change and asset management and more. ChangeGear includes a virtual agent, self-service portals and AI-based features to support analyst and end user productivity.

Learn More
10

crwlr

Library for Rapid (Web) Crawler and Scraper Development

This library provides kind of a framework and a lot of ready-to-use, so-called steps, that you can use as building blocks, to build your own crawlers and scrapers with. Before diving into the library, let's have a look at the terms crawling and scraping. For most real-world use cases, those two things go hand in hand, which is why this library helps with and combines both. A (web) crawler is a program that (down)loads documents and follows the links in it to load them as well. A crawler could...

Downloads: 2 This Week

Last Update: 2025-08-05
See Project
11

DocSearch

The easiest way to add search to your documentation

Initially created to fulfill our own developers' needs, DocSearch quickly evolved into a successful community project. Over the years, we've explored new ways to address the complexities of search for the open-source community. DocSearch understands how the user input fits into the context of your project and instantly presents the most relevant content with fewer interactions than any other method. With a design very close to the native experience on mobile, we leverage users acquaintance...

Downloads: 6 This Week

Last Update: 4 days ago
See Project
12

crawley

The unix-way web crawler

Crawls web pages and prints any link it can find. Fast HTML SAX-parser (powered by golang.org/x/net/html) Small (below 1500 SLOC), idiomatic, 100% test-covered codebase. Grabs most of useful resources URLs (pics, videos, audios, forms, etc...) Found URLs are streamed to stdout and guaranteed to be unique (with fragments omitted) Scan depth (limited by starting host and path, by default - 0) can be configured. Can crawl rules and sitemaps from robots.txt. Brute mode - scan HTML comments for...

Downloads: 4 This Week

Last Update: 2025-08-15
See Project
13

Nebula libp2p DHT

A libp2p DHT crawler, monitor, and measurement tool

A libp2p DHT crawler and monitor that tracks the liveness of peers. The crawler connects to DHT bootstrap peers and then recursively follows all entries in their k-buckets until all peers have been visited. The crawler supports the IPFS, Filecoin, Polkadot, Kusama, Rococo, Westend networks and more. The crawler can store its results as JSON documents or in a postgres database - the --dry-run flag prevents it from doing either. Nebula will print a summary of the crawl at the end instead. A crawl...

Downloads: 1 This Week

Last Update: 2025-03-25
See Project
14

Modlishka

Powerful and flexible HTTP reverse proxy

Modlishka is a powerful and flexible HTTP reverse proxy. It implements an entirely new and interesting approach of handling browser-based HTTP traffic flow, which allows to transparently proxy of multi-domain destination traffic, both TLS and non-TLS, over a single domain, without the requirement of installing any additional certificate on the client. What exactly does this mean? In short, it simply has a lot of potential, that can be used in many use case scenarios. Modlishka was written as...

Downloads: 4 This Week

Last Update: 2025-05-18
See Project
15

X-Crawl

Flexible Node.js AI-assisted crawler library

A high-performance web crawling and scraping framework for Node.js, designed for large-scale data extraction.

Downloads: 2 This Week

Last Update: 2025-04-06
See Project
16

Snap Lens Web Crawler

Crawl and download Snap Lenses from lens.snapchat.com with ease.

Crawl and download Snap Lenses from lens.snapchat.com with ease. This crawler is a dependency of Snap Camera Server https://snap-camera-server.sourceforge.io

Downloads: 0 This Week

Last Update: 2025-07-18
See Project
17

Proxy_Pool

Python crawler proxy IP pool (proxy pool)

The main function of the crawler agent IP pool project is to regularly collect free agents published on the Internet for verification and storage, and to regularly verify and store agents to ensure the availability of agents, and to provide API and CLI. At the same time, you can also expand the proxy source to increase the quality and quantity of the proxy pool IP.

Downloads: 0 This Week

Last Update: 2024-01-08
See Project
18

ngx_waf

Handy, High performance, ModSecurity compatible Nginx firewall module

Handy, High-performance Nginx firewall module. Such as black and white list of IPs or IP range, uri black and white list, and request body black list, etc. Directives and rules are easy to write and readable. The IP detection is a constant-time operation. Most of the remaining inspections use caching to improve performance. Compatible with ModSecurity's rules, you can use OWASP ModSecurity Core Rule Set. Supports verifying Google, Bing, Baidu and Yandex crawlers and allowing them...

Downloads: 1 This Week

Last Update: 2025-01-25
See Project
19

Crawl4AI

Open-source LLM Friendly Web Crawler & Scraper

Crawl4AI is a high-performance, AI‑ready web crawler tailored for LLM data ingestion and RAG pipelines. It supports adaptive crawling heuristics (stopping when enough info is gathered), structured markdown output, and high-speed parallel execution. Designed to operate at scale with optional Docker deployment and framework integrations.

Downloads: 0 This Week

Last Update: 2025-08-17
See Project
20

Colly

Elegant Scraper and Crawler Framework for Golang

Colly provides a clean interface to write any kind of crawler/scraper/spider. With Colly you can easily extract structured data from websites, which can be used for a wide range of applications, like data mining, data processing or archiving. Clean API. Fast (>1k request/sec on a single core) Manages request delays and maximum concurrency per domain. Automatic cookie and session handling. Sync/async/parallel scraping. Distributed scraping. Caching, automatic encoding of non-unicode responses...

Downloads: 0 This Week

Last Update: 2025-03-27
See Project
21

RobotsTxt

The repository contains Google's robots.txt parser

This is a high-performance, production-tested library for parsing and evaluating robots.txt rules against crawler user agents. It implements the core semantics of the Robots Exclusion Protocol: user-agent sections, Allow/Disallow directives, wildcard handling, and precedence rules. The code is optimized for speed and low memory so large crawls can evaluate millions of URLs quickly. It also focuses on correctness—edge cases like overlapping patterns and longest-match resolution are handled...

Downloads: 1 This Week

Last Update: 3 days ago
See Project
22

LambdaHack

Haskell game engine library for roguelike dungeon crawlers

Haskell game engine library for roguelike dungeon crawlers. LambdaHack is a Haskell game engine library for ASCII roguelike games of arbitrary theme, size and complexity, with optional tactical squad combat. It's packaged together with a sample dungeon crawler in a quirky fantasy setting. To use the engine, you need to specify the content to be procedurally generated. You declare what the game world is made of (entities, their relations, physics and lore) and the engine builds the world...

Downloads: 1 This Week

Last Update: 2023-10-29
See Project
23

Roach

The complete web scraping toolkit for PHP

Roach is a complete web scraping toolkit for PHP. It is a shameless clone heavily inspired by the popular Scrapy package for Python. Roach allows us to define spiders that crawl and scrape web documents. But wait, there’s more. Roach isn’t just a simple crawler, but includes an entire pipeline to clean, persist and otherwise process extracted data as well. It’s your all-in-one resource for web scraping in PHP. Roach doesn’t depend on a specific framework. Instead, you can use the core package...

Downloads: 1 This Week

Last Update: 2025-03-21
See Project
24

Laravel Sitemap

Create and generate sitemaps with ease

... it in the callable you pass to hasCrawled. You can also instruct the underlying crawler to not crawl some pages by passing a callable to shouldCrawl. You can configure the crawler used by the sitemap generator. The sitemap generator can execute JavaScript on each page so it will discover links that are generated by your JS scripts. You can enable this feature by setting execute_javascript in the config file to true.

Downloads: 0 This Week

Last Update: 2025-08-25
See Project
25

DNS Crawler

A Bulk Domain Assessment Tool

DNS Crawler is a lightweight, Python-based utility designed for efficient batch processing and assessment of internet domain names. It reads from a list of domains formatted as: domain_name <tab> or ; optional_comment and generates a detailed, Excel-compatible CSV report with columns including: DOMAIN: Domain name REG: Registrar SOA, NS, MX, TXT, SPF, DMARC, MS, A, PTR: Common DNS records for comprehensive domain analysis NOTE: Optional comments from the original input file...

Downloads: 0 This Week

Last Update: 2024-11-01
See Project