This is a high-performance, production-tested library for parsing and evaluating robots.txt rules against crawler user agents. It implements the core semantics of the Robots Exclusion Protocol: user-agent sections, Allow/Disallow directives, wildcard handling, and precedence rules. The code is optimized for speed and low memory use, so large crawls can evaluate millions of URLs quickly. It also focuses on correctness: edge cases such as overlapping patterns and longest-match resolution are handled consistently. Consumers integrate it to decide whether a particular bot may fetch a specific URL and to respect Crawl-delay and Sitemap hints where applicable. The library serves both search-scale crawlers and smaller tools that need a reliable decision engine for polite crawling.
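
As a concrete illustration of the parsing step described above, the sketch below groups Allow/Disallow lines under the User-agent lines that precede them. It is a minimal example written for this description, not this library's actual API; the names (Rule, Parse, examplebot) are invented.

    #include <cctype>
    #include <iostream>
    #include <map>
    #include <sstream>
    #include <string>
    #include <vector>

    // One Allow/Disallow line: allow is true for Allow, pattern is the path pattern.
    struct Rule {
      bool allow;
      std::string pattern;
    };

    // Groups rules by user agent. Consecutive User-agent lines share the rules
    // that follow them; a User-agent line appearing after rules starts a new group.
    std::map<std::string, std::vector<Rule>> Parse(const std::string& body) {
      auto trim = [](std::string s) {
        s.erase(0, s.find_first_not_of(" \t\r"));
        s.erase(s.find_last_not_of(" \t\r") + 1);
        return s;
      };
      std::map<std::string, std::vector<Rule>> groups;
      std::vector<std::string> agents;  // user agents of the current group
      bool group_closed = true;         // true once a rule line has been seen
      std::istringstream in(body);
      std::string line;
      while (std::getline(in, line)) {
        if (auto hash = line.find('#'); hash != std::string::npos)
          line.erase(hash);                        // strip comments
        auto colon = line.find(':');
        if (colon == std::string::npos) continue;  // skip malformed lines
        std::string key = trim(line.substr(0, colon));
        std::string value = trim(line.substr(colon + 1));
        for (char& c : key) c = std::tolower(static_cast<unsigned char>(c));
        if (key == "user-agent") {
          if (group_closed) agents.clear();        // a new group begins
          agents.push_back(value);
          group_closed = false;
        } else if (key == "allow" || key == "disallow") {
          group_closed = true;
          for (const auto& agent : agents)
            groups[agent].push_back({key == "allow", value});
        }
      }
      return groups;
    }

    int main() {
      auto groups = Parse(
          "User-agent: examplebot\n"
          "Disallow: /private/\n"
          "Allow: /private/public*\n");
      for (const auto& [agent, rules] : groups)
        for (const auto& rule : rules)
          std::cout << agent << ": " << (rule.allow ? "Allow " : "Disallow ")
                    << rule.pattern << "\n";
    }

A real parser additionally handles records such as Sitemap and Crawl-delay, byte-order marks, and percent-encoding; those details are omitted here for brevity.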

Features

  • Fast parser and matcher for Allow/Disallow rules
  • Correct handling of wildcards and longest-match precedence (see the sketch after this list)
  • User-agent specific rule sections with sensible fallbacks
  • Low-overhead evaluation for high-throughput crawlers
  • Support for common extensions like Sitemap hints
  • Clear API to check URL fetch permissions per bot name
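
To make longest-match precedence concrete, the self-contained sketch below applies the rule from RFC 9309: among all patterns that match a URL path, the longest one decides, and a tie between Allow and Disallow goes to Allow. The names (Rule, Matches, Allowed) are invented for this example and do not reflect this library's API.

    #include <iostream>
    #include <string>
    #include <vector>

    struct Rule {
      bool allow;
      std::string pattern;
    };

    // True if `pattern` matches a prefix of `path`. '*' matches any run of
    // characters; '$' at the end of the pattern anchors the match to the end.
    bool Matches(const char* pattern, const char* path) {
      if (pattern[0] == '\0') return true;  // pattern exhausted: prefix matched
      if (pattern[0] == '$' && pattern[1] == '\0') return path[0] == '\0';
      if (pattern[0] == '*') {
        for (const char* p = path; ; ++p) {  // try every possible split
          if (Matches(pattern + 1, p)) return true;
          if (*p == '\0') return false;
        }
      }
      return path[0] == pattern[0] && Matches(pattern + 1, path + 1);
    }

    // Longest-match precedence: the longest matching pattern decides the
    // verdict; on a tie, Allow wins. No matching rule means the URL is allowed.
    bool Allowed(const std::vector<Rule>& rules, const std::string& path) {
      bool allowed = true;
      size_t best = 0;
      bool matched = false;
      for (const auto& rule : rules) {
        if (!Matches(rule.pattern.c_str(), path.c_str())) continue;
        size_t len = rule.pattern.size();
        if (!matched || len > best || (len == best && rule.allow)) {
          best = len;
          allowed = rule.allow;
          matched = true;
        }
      }
      return allowed;
    }

    int main() {
      std::vector<Rule> rules = {
          {false, "/private/"},         // Disallow: /private/
          {true,  "/private/public*"},  // Allow: /private/public*
      };
      std::cout << Allowed(rules, "/private/data.html") << "\n";      // 0
      std::cout << Allowed(rules, "/private/public/a.html") << "\n";  // 1
    }

A production matcher would avoid this naive backtracking and normalize the path first; the sketch only demonstrates the precedence rule itself.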

Categories

Robotics

License

Apache License V2.0

Additional Project Details

Operating Systems

Linux, Mac, Windows

Programming Language

C++

Related Categories

C++ Robotics Software

Registered

2025-10-09