[go: up one dir, main page]

Browse free open source Web Scrapers and projects below. Use the toggles on the left to filter open source Web Scrapers by OS, license, language, programming language, and project status.

  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • La version gratuite d'Auth0 s'enrichit ! Icon
    La version gratuite d'Auth0 s'enrichit !

    Gratuit pour 25 000 utilisateurs avec intégration Okta illimitée : concentrez-vous sur le développement de vos applications.

    Vous l'avez demandé, nous l'avons fait ! Les versions gratuite et payante d'Auth0 incluent des options qui vous permettent de développer, déployer et faire évoluer vos applications en toute sécurité. Utilisez Auth0 dès maintenant pour découvrir tous ses avantages.
    Essayez Auth0 gratuitement
  • 1
    WebHarvest - web data extraction tool
    Web data extraction (web data mining, web scraping) tool. It leverages well proved XML and text processing techologies in order to easely extract useful data from arbitrary web pages.
    Downloads: 19 This Week
    Last Update:
    See Project
  • 2
    X-RAY

    X-RAY

    The next web scraper, see through the <html> noise

    Supports strings, arrays, arrays of objects, and nested object structures. The schema is not tied to the structure of the page you're scraping, allowing you to pull the data in the structure of your choosing. The API is entirely composable, giving you great flexibility in how you scrape each page. Paginate through websites, scraping each page. X-ray also supports a request delay and a pagination limit. Scraped pages can be streamed to a file, so if there's an error on one page, you won't lose what you've already scraped. Start on one page and move to the next easily. The flow is predictable, following a breadth-first crawl through each of the pages. X-ray has support for concurrency, throttles, delays, timeouts and limits to help you scrape any page responsibly. Swap in different scrapers depending on your needs. Currently supports HTTP and PhantomJS driver drivers. In the future, I'd like to see a Tor driver for requesting pages through the Tor network.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 3
    OpenWebSpider
    OpenWebSpider is an Open Source multi-threaded Web Spider (robot, crawler) and search engine with a lot of interesting features!
    Downloads: 8 This Week
    Last Update:
    See Project
  • 4
    A function-testing, performance-measuring, site-mirroring, web spider that is widely portable and capable of using scenarios to process a wide range of web transactions, including ssl and forms.
    Downloads: 3 This Week
    Last Update:
    See Project
  • Teradata VantageCloud Enterprise is a data analytics platform for performing advanced analytics on AWS, Azure, and Google Cloud. Icon
    Teradata VantageCloud Enterprise is a data analytics platform for performing advanced analytics on AWS, Azure, and Google Cloud.

    Power faster innovation with Teradata VantageCloud

    VantageCloud is the complete cloud analytics and data platform, delivering harmonized data and Trusted AI for all. Built for performance, flexibility, and openness, VantageCloud enables organizations to unify diverse data sources, run complex analytics, and deploy AI models—all within a single, scalable platform.
    Learn More
  • 5
    Web Spider, Web Crawler, Email Extractor

    Web Spider, Web Crawler, Email Extractor

    Free Extracts Emails, Phones and custom text from Web using JAVA Regex

    In Files there is WebCrawlerMySQL.jar which supports MySql Connection Please follow this link to get latest version https://sourceforge.net/projects/web-spider-web-crawler-extract/ Free Web Spider & Crawler. Extracts Information from Web by parsing millions of pages. Store data into Derby OR MySQL Database and data are not being lost after force closing the spider. - Free Web Spider , Parser, Extractor, Crawler - Extraction of Emails , Phones and Custom Text from Web - Export to Excel File - Data Saved into Derby Database - Written in Java Cross Platform See also Free Email Sender in this link: https://sourceforge.net/projects/gitst-free-email-ender/ Please install Microsoft OpenJDK to start the application https://www.microsoft.com/openjdk
    Downloads: 2 This Week
    Last Update:
    See Project
  • 6
    Web Spider, Web Crawler, Email Extractor

    Web Spider, Web Crawler, Email Extractor

    Free Extracts Emails, Phones and custom text from Web using JAVA Regex

    In Files there is WebCrawlerMySQL.jar which supports MySql Connection Free Web Spider & Crawler. Extracts Information from Web by parsing millions of pages. Store data into Derby Database and data are not being lost after force closing the spider. - Free Web Spider , Parser, Extractor, Crawler - Extraction of Emails , Phones and Custom Text from Web - Export to Excel File - Data Saved into Derby and MySQL Database - Written in Java Cross Platform Also See Free email Sender : https://sourceforge.net/projects/gitst-free-email-ender/ Please install Microsoft OpenJDK to start the application https://www.microsoft.com/openjdk
    Downloads: 7 This Week
    Last Update:
    See Project
  • 7
    WallPaper (alias crawlpaper)
    WallPaper (alias crawlpaper) is a desktop changer (NOT a screensaver) which includes a web crawler for picture download, an audio stream ripper, an audio player, a mini mp3 tag editor,etc. Also included support for .zip and .rar files and an interface to the BerkleyDB code for small databases.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 8
    A simple to set up web scraper written in Java. It uses modified regEx to quickly write complex patterns to parse data out of a website. It contains a GUI tool for testing your configuration scripts and is fully automated through the command line
    Downloads: 2 This Week
    Last Update:
    See Project
  • 9
    A Java implementation of a flexible and extensible web spider engine. Optional modules allow functionality to be added (searching dead links, testing the performance and scalability of a site, creating a sitemap, etc ..
    Downloads: 2 This Week
    Last Update:
    See Project
  • Cynet All-in-One Cybersecurity Platform Icon
    Cynet All-in-One Cybersecurity Platform

    All-in-One Managed Cybersecurity for MSPs

    Cynet empowers MSPs and MSSPs with a comprehensive, fully managed cybersecurity platform that consolidates essential security functions into a single, easy-to-use solution. Cynet simplifies cybersecurity management, reduces operational overhead, and lowers costs by eliminating the need for multiple vendors and complex integrations.
    Learn More
  • 10
    Robust featureful multi-threaded CLI web spider using apache commons httpclient v3.0 written in java. ASpider downloads any files matching your given mime-types from a website. Tries to reg.exp. match emails by default, logging all results using log4j.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 11
    Other spiders has a limited link depth, follows links not randomized or are combined with heavy indexing machines. This spider will has not link depth limits, randomize next url, that will be checked for new urls.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 12
    Es un software diseñado para suplir la necesidad de algunas personas de tener un Web Crawler o Spider duro, navega de forma automática por los diferentes sitios o paginas Web, extrayendo los enlaces a otras paginas.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 13

    dorker-py

    Descubre archivos, rutas escondidas realizando busquedas avanzadas

    Dorking Google - Dorker Py Descubre archivos, rutas escondidas realizando busquedas avanzadas (ES) Discover files, hidden paths by performing advanced searches (EN)
    Downloads: 1 This Week
    Last Update:
    See Project
  • 14
    Arachnid is a Java-based web spider framework. It includes a simple HTML parser object that parses an input stream containing HTML content. Simple Web spiders can be created by sub-classing Arachnid and adding a few lines of code called after each page
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    Aracnis is a Java based framework for building distributed web spiders. These spiders can be used to accomplish a variety of tasks, for example, screen-scraping and link integrity checking.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    A new Web Crawler including sophisticated searching process especialized by language !
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    BTV Rename
    The goal of this project is 100% TVDB recognition and SxxExx renaming of all files generated by BeyondTV while maintaining the BeyondTV database. Currently the project relies on TVrage and scans the entire folder which is selected by the user in the conf
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    arachne is a C++ library for HTTP crawling, link, text and metadata extraction designed to run in a distributed environment.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    Crawler.NET is a component-based distributed framework for web traversal intended for the .NET platform. It comprises of loosely coupled units each realizing a specific web crawler task. The main design goals are efficiency and flexibility.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    Webhunter is a distributed, multi-threaded web crawler designed for both general indexing and crawling the web for focused content.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    Easyspider - Distributed Web Crawler

    Easyspider - Distributed Web Crawler

    Easy Spider is a distributed Perl Web Crawler Project from 2006

    Easy Spider is a distributed Perl Web Crawler Project from 2006. It features code from crawling webpages, distributing it to a server and generating xml files from it. The client site can be any computer (Windows or Linux) and the Server stores all data. Websites that use EasySpider Crawling for Article Writing Software: https://www.artikelschreiber.com/en/ https://www.unaique.net/en/ https://www.unaique.com/ https://www.artikelschreiben.com/ https://www.buzzerstar.com/ https://easyperlspider.sourceforge.io/ https://www.sebastianenger.com/ https://www.artikelschreiber.com/opensource/ It is fun to look at some code that is few years ago and to see how one has improved himself. If you want to write text automatically try https://www.artikelschreiber.com/en/ or https://www.unaique.net/en/!
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    A multi-threaded web spider that finds free porn thumbnail galleries by visiting a list of known TGPs (Thumbnail Gallery Posts). It optionally downloads the located pictures and movies. TGP list is included. Public domain perl script running on Linux.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    Funnel is a project for use on intranets, or selected sites on the Internet to gather together and index information from several different sources and make it available through a sane, usable interface.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    A toolkit for crawling information from web pages by combining different kinds of "actions". Actions are simple operations such as navigation to a specified url or extraction of text from the html. Also available is a graphic user interface.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    Harvest is a web indexing package, originally disigned for distributed indexing, it can form a powerful system for indexing both large and small web sites. Also now includes Harvest-NG a highly efficient, modular, perl-based web crawler.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • Next