Free Robots.txt Generator

Create robots.txt files to control how search engines crawl your website. Configure user-agent rules, set crawl delays, and reference your sitemap.

100% Private & Secure: All robots.txt generation happens in your browser. No data is sent to our servers.

Completely Free - All Features Included

The generator lets you start from common templates, build user-agent rule blocks (for example Allow: /, Disallow: /admin/, or Disallow: /*.pdf$), set an optional crawl delay (not supported by all bots), and reference your sitemap so search engines can find it. The default output looks like this:

# robots.txt generated by ZipConvert
# https://zipconvert.com

User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml

Quick Guide

Basic Syntax:

  • Allow: / - Allow crawling all pages
  • Disallow: /admin/ - Block admin section
  • Disallow: /*.pdf$ - Block PDF files

Common Patterns:

  • * - Wildcard (any characters)
  • $ - End of URL
  • ? - Literal character used to target query strings (e.g. Disallow: /*?)

What is a Robots.txt File?

A robots.txt file is a text file that webmasters create to instruct web robots (typically search engine crawlers) how to crawl and index pages on their website. It's part of the Robots Exclusion Protocol (REP), a group of web standards that regulate how robots crawl the web.

Why Do You Need a Robots.txt File?

  • Control Crawling: Prevent search engines from crawling sensitive or duplicate pages
  • Save Crawl Budget: Direct crawlers to important content and avoid wasting resources on unimportant pages
  • Limit Exposure of Private Areas: Discourage crawlers from fetching admin panels and similar pages (combine with noindex or authentication, since robots.txt alone does not prevent indexing)
  • Manage Server Load: Use crawl-delay to prevent aggressive crawlers from overloading your server
  • Guide Crawlers: Point search engines to your sitemap for efficient discovery of content
  • SEO Best Practice: A well-configured robots.txt is essential for proper SEO

Where Should You Place Robots.txt?

Important Location Requirements:

  • Must be in the root directory of your website
  • Must be accessible at: https://yoursite.com/robots.txt
  • Must be named exactly robots.txt (lowercase)
  • Must be a plain text file with UTF-8 encoding
  • Cannot be placed in a subdirectory

Understanding Robots.txt Syntax

User-agent

Specifies which crawler the rules apply to.

User-agent: *

The asterisk (*) means all crawlers. You can also specify individual bots like Googlebot, Bingbot, etc.
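
A crawler follows only the most specific group that matches its name, so a bot-specific group replaces the * group for that bot. For example:

User-agent: Googlebot
Disallow: /nogoogle/

User-agent: *
Disallow: /private/

Here Googlebot is blocked only from /nogoogle/, while every other crawler is blocked only from /private/.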

Allow

Explicitly allows crawling of specific paths.

Allow: /public/

Useful for allowing specific subdirectories when the parent directory is disallowed.
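
For example, to block a directory while keeping one of its subfolders crawlable:

User-agent: *
Disallow: /private/
Allow: /private/downloads/

Crawlers such as Googlebot resolve the conflict in favor of the longer, more specific rule, so /private/downloads/ remains crawlable.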

Disallow

Prevents crawling of specific paths.

Disallow: /admin/

Blocks crawlers from any URL that begins with the specified path, including everything in its subdirectories.

Crawl-delay

Sets the delay (in seconds) between successive requests.

Crawl-delay: 10

Note: Not supported by Googlebot. Mainly used for Bing and other crawlers.
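
For example, to ask Bingbot to wait 10 seconds between requests while leaving other crawlers unaffected:

User-agent: Bingbot
Crawl-delay: 10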

Sitemap

Points crawlers to your XML sitemap.

Sitemap: https://example.com/sitemap.xml

Helps search engines discover all your pages more efficiently. Can include multiple sitemap entries.
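
For example, a site split into several sitemaps can list each one; Sitemap lines are independent of user-agent groups and may appear anywhere in the file:

Sitemap: https://example.com/sitemap-pages.xml
Sitemap: https://example.com/sitemap-posts.xml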

Wildcards and Pattern Matching

Asterisk (*) - Wildcard

Matches any sequence of characters.

Disallow: /*.pdf$

Blocks all PDF files across the entire site.

Disallow: /*?

Blocks all URLs with query parameters.

Dollar Sign ($) - End of URL

Matches the end of the URL.

Disallow: /*.json$

Blocks URLs ending with .json, but not /file.json?param=value
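
Wildcards and the end-of-URL anchor can be combined with Allow to carve out exceptions. For example, to block PDF files everywhere except a /docs/ folder (the longer, more specific rule wins):

User-agent: *
Disallow: /*.pdf$
Allow: /docs/*.pdf$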

Common Use Cases and Examples

Block Admin Area

User-agent: *
Disallow: /admin/
Disallow: /wp-admin/
Disallow: /dashboard/

Prevents all crawlers from accessing administrative sections.

E-commerce Site

User-agent: *
Allow: /
Disallow: /cart/
Disallow: /checkout/
Disallow: /account/
Disallow: /*?*sort=
Disallow: /*?*filter=

Sitemap: https://example.com/sitemap.xml

Blocks user-specific pages and filtered/sorted product listings to avoid duplicate content.

Block Specific Bot

User-agent: BadBot
Disallow: /

User-agent: *
Allow: /

Blocks a specific problematic bot while allowing all others.

Allow Only Specific Files

User-agent: *
Disallow: /
Allow: /public/
Allow: /blog/
Allow: /*.html$

Blocks everything except specific directories and HTML files.

Development/Staging Site

User-agent: *
Disallow: /

Completely blocks all crawlers from indexing your development or staging site.

Common Mistakes to Avoid

  • ❌ Wrong Location: Robots.txt must be in the root directory, not /assets/robots.txt or any subdirectory.
  • ❌ Using Robots.txt for Security: Robots.txt is publicly accessible. Don't use it to hide sensitive content - use proper authentication instead.
  • ❌ Blocking CSS and JavaScript: Don't block /css/ or /js/ folders. Google needs these to render pages properly.
  • ❌ Trailing Slash Confusion: /admin blocks every URL that begins with /admin (including /admin.html and /administrator), while /admin/ blocks only that directory and its contents (see the example after this list).
  • ❌ Syntax Errors: Extra spaces, missing colons, or typos in directive names can break rules, and remember that paths are case-sensitive (/Admin/ and /admin/ are not the same).
  • ❌ Blocking Everything: Disallow: / for all user-agents blocks crawlers from your entire site and will devastate your search visibility.
  • ❌ No Sitemap Reference: Always include your sitemap to help search engines discover your content.
  • ❌ Not Testing: Always test your robots.txt file after deployment to ensure it works as intended.
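
To illustrate the trailing-slash difference mentioned above:

# Blocks /admin, /admin.html, /administrator, and everything under /admin/
Disallow: /admin

# Blocks only URLs inside the /admin/ directory
Disallow: /admin/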

Testing Your Robots.txt File

Google Search Console

Use the robots.txt report in Search Console to see how Googlebot fetched and parsed your robots.txt file, and the URL Inspection tool to check whether a specific URL is blocked.


Manual Testing

After uploading, verify your robots.txt is accessible:

https://yoursite.com/robots.txt

You should see your robots.txt content in plain text.

Online Validators

Use online tools to validate syntax and check for errors before deployment.

Best Practices for Robots.txt

  • ✅ Keep It Simple: Only block what's necessary. Over-blocking can hurt SEO.
  • ✅ Include Your Sitemap: Always reference your XML sitemap.
  • ✅ Use Comments: Add comments (starting with #) to document your rules, as shown in the example after this list.
  • ✅ Be Specific: Target specific user-agents when needed rather than blocking all crawlers.
  • ✅ Test Before Deploy: Always test changes before pushing to production.
  • ✅ Monitor Regularly: Check Google Search Console for crawl errors related to robots.txt.
  • ✅ Update as Needed: Review and update when site structure changes.
  • ✅ Allow Important Resources: Don't block CSS, JavaScript, or images needed for rendering.
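
A commented file, as suggested above, might look like this:

# Block internal search results to save crawl budget
User-agent: *
Disallow: /search/

# Keep checkout pages out of the crawl
Disallow: /checkout/

# Help crawlers find everything else
Sitemap: https://example.com/sitemap.xml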

Understanding User-Agents

Googlebot

Google's main crawler for web search. Use Googlebot-Image, Googlebot-News, or Googlebot-Video for specific media types.
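
For example, to keep a photo directory out of Google Images while leaving regular web search untouched:

User-agent: Googlebot-Image
Disallow: /photos/private/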

Bingbot

Microsoft Bing's crawler. Respects the crawl-delay directive, unlike Googlebot.

Slurp

Yahoo's crawler (now powered by Bing, but still used for some Yahoo properties).

DuckDuckBot

DuckDuckGo's crawler, known for respecting privacy and robots.txt rules.

Baiduspider

Baidu's crawler for the Chinese search engine. Can be aggressive, so crawl-delay is often used.

YandexBot

Yandex's crawler for the Russian search engine.

Frequently Asked Questions

Q: Is robots.txt required for every website?

A: Not required, but highly recommended. If you don't have a robots.txt file, crawlers will assume they can access everything. Having one gives you control over crawling behavior and helps with SEO.

Q: Does robots.txt prevent pages from appearing in search results?

A: Not necessarily. Robots.txt prevents crawling but not indexing. If other sites link to a blocked page, it might still appear in results. Use noindex meta tags or HTTP headers for true indexing prevention.

Q: Can I use robots.txt to hide sensitive content?

A: No! Robots.txt is publicly accessible. Anyone can view it. Use proper authentication, password protection, or noindex tags for sensitive content. Robots.txt should never be your security strategy.

Q: Do all crawlers respect robots.txt?

A: Reputable crawlers (Google, Bing, etc.) respect robots.txt. However, malicious bots, scrapers, and email harvesters often ignore it. Robots.txt is a request, not a security mechanism.

Q: How long does it take for changes to take effect?

A: Crawlers cache robots.txt for up to 24 hours. After updating, it may take a day for all crawlers to see the changes. Google typically re-crawls it more frequently.

Q: Should I block Googlebot to hide duplicate content?

A: No! Use canonical tags, noindex meta tags, or consolidate content instead. Blocking Googlebot prevents Google from understanding your site structure and can harm SEO.

Q: Can I have multiple robots.txt files?

A: Yes, in the sense that each (sub)domain has its own robots.txt at its root. For example, example.com/robots.txt and blog.example.com/robots.txt are separate files, but a single host cannot have more than one.

Robots.txt vs Meta Robots vs X-Robots-Tag

Method       | Purpose                        | When to Use
robots.txt   | Control crawling               | Block entire sections, save crawl budget
Meta Robots  | Control indexing               | Prevent specific pages from appearing in results
X-Robots-Tag | Control indexing (HTTP header) | Control non-HTML files (PDFs, images, etc.)