Free Robots.txt Generator
Create robots.txt files to control how search engines crawl your website. Configure user-agent rules, set crawl delays, and reference your sitemap.
100% Private & Secure: All robots.txt generation happens in your browser. No data is sent to our servers.
Example output:
# robots.txt generated by ZipConvert
# https://zipconvert.com
User-agent: *
Allow: /
Sitemap: https://example.com/sitemap.xml
Quick Guide
Basic Syntax:
- Allow: / - Allow crawling all pages
- Disallow: /admin/ - Block admin section
- Disallow: /*.pdf$ - Block PDF files
Common Patterns:
- * - Wildcard (any characters)
- $ - End of URL
- ? - Query parameters
What is a Robots.txt File?
A robots.txt file is a text file that webmasters create to instruct web robots (typically search engine crawlers) how to crawl and index pages on their website. It's part of the Robots Exclusion Protocol (REP), a group of web standards that regulate how robots crawl the web.
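For instance, the simplest possible robots.txt (placeholder content) allows every crawler to access the whole site:
User-agent: *
Disallow:
An empty Disallow value means nothing is blocked.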
Why Do You Need a Robots.txt File?
- Control Crawling: Prevent search engines from crawling sensitive or duplicate pages
- Save Crawl Budget: Direct crawlers to important content and avoid wasting resources on unimportant pages
- Prevent Indexing: Keep private areas like admin panels out of search results
- Manage Server Load: Use crawl-delay to prevent aggressive crawlers from overloading your server
- Guide Crawlers: Point search engines to your sitemap for efficient discovery of content
- SEO Best Practice: A well-configured robots.txt helps search engines crawl your site efficiently and supports good SEO
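A short sketch that combines several of these goals (the paths, delay value, and domain are placeholders, not recommendations):
User-agent: *
# Keep private and duplicate-prone areas out of the crawl
Disallow: /private/
Disallow: /*?*sort=
# Ask non-Google crawlers to slow down
Crawl-delay: 5
# Point crawlers at the sitemap
Sitemap: https://example.com/sitemap.xml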
Where Should You Place Robots.txt?
Important Location Requirements:
- Must be in the root directory of your website
- Must be accessible at: https://yoursite.com/robots.txt
- Must be named exactly robots.txt (lowercase)
- Must be a plain text file with UTF-8 encoding
- Cannot be placed in a subdirectory
Understanding Robots.txt Syntax
User-agent
Specifies which crawler the rules apply to.
User-agent: *
The asterisk (*) means all crawlers. You can also specify individual bots like Googlebot, Bingbot, etc.
Allow
Explicitly allows crawling of specific paths.
Allow: /public/
Useful for allowing specific subdirectories when the parent directory is disallowed.
Disallow
Prevents crawling of specific paths.
Disallow: /admin/
Blocks crawlers from accessing the specified path and all its subdirectories.
Crawl-delay
Sets the delay (in seconds) between successive requests.
Crawl-delay: 10
Note: Not supported by Googlebot. Mainly used for Bing and other crawlers.
Sitemap
Points crawlers to your XML sitemap.
Sitemap: https://example.com/sitemap.xml
Helps search engines discover all your pages more efficiently. You can include multiple Sitemap entries in one file.
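For example, a site split across several sitemaps (the URLs are placeholders) can list them all:
Sitemap: https://example.com/sitemap-pages.xml
Sitemap: https://example.com/sitemap-posts.xml
Sitemap: https://example.com/sitemap-images.xml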
Wildcards and Pattern Matching
Asterisk (*) - Wildcard
Matches any sequence of characters.
Disallow: /*.pdf$
Blocks all PDF files across the entire site.
Disallow: /*?
Blocks all URLs with query parameters.
Dollar Sign ($) - End of URL
Matches the end of the URL.
Disallow: /*.json$
Blocks URLs ending with .json, but not /file.json?param=value
Common Use Cases and Examples
Block Admin Area
User-agent: *
Disallow: /admin/
Disallow: /wp-admin/
Disallow: /dashboard/
Prevents all crawlers from accessing administrative sections.
E-commerce Site
User-agent: *
Allow: /
Disallow: /cart/
Disallow: /checkout/
Disallow: /account/
Disallow: /*?*sort=
Disallow: /*?*filter=
Sitemap: https://example.com/sitemap.xml
Blocks user-specific pages and filtered/sorted product listings to avoid duplicate content.
Block Specific Bot
User-agent: BadBot
Disallow: /
User-agent: *
Allow: /
Blocks a specific problematic bot while allowing all others.
Allow Only Specific Files
User-agent: *
Disallow: /
Allow: /public/
Allow: /blog/
Allow: /*.html$
Blocks everything except specific directories and HTML files.
Development/Staging Site
User-agent: *
Disallow: /
Completely blocks all compliant crawlers from accessing your development or staging site.
Common Mistakes to Avoid
- ❌ Wrong Location: Robots.txt must be in the root directory, not /assets/robots.txt or any subdirectory.
- ❌ Using Robots.txt for Security: Robots.txt is publicly accessible. Don't use it to hide sensitive content - use proper authentication instead.
- ❌ Blocking CSS and JavaScript: Don't block /css/ or /js/ folders. Google needs these to render pages properly.
- ❌ Trailing Slash Confusion: Disallow: /admin blocks every path that starts with /admin (including /admin.html and /administration), while Disallow: /admin/ blocks only the directory and its contents (see the example after this list).
- ❌ Syntax Errors: Extra spaces, missing colons, or misspelled directives can break rules, and remember that path values are case-sensitive.
- ❌ Blocking Everything: Disallow: / for all user-agents will prevent your site from appearing in search results.
- ❌ No Sitemap Reference: Always include your sitemap to help search engines discover your content.
- ❌ Not Testing: Always test your robots.txt file after deployment to ensure it works as intended.
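To illustrate the trailing-slash difference mentioned above (the paths are placeholders):
# Matches /admin, /admin/, /admin.html, /administration, ...
Disallow: /admin

# Matches only /admin/ and paths beneath it, such as /admin/login
Disallow: /admin/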
Testing Your Robots.txt File
Google Search Console Robots.txt Tester
Test how Googlebot sees your robots.txt file and verify specific URLs.
Test in Google Search Console →
Manual Testing
After uploading, verify your robots.txt is accessible:
https://yoursite.com/robots.txt
You should see your robots.txt content in plain text.
Online Validators
Use online tools to validate syntax and check for errors before deployment.
Best Practices for Robots.txt
- ✅ Keep It Simple: Only block what's necessary. Over-blocking can hurt SEO.
- ✅ Include Your Sitemap: Always reference your XML sitemap.
- ✅ Use Comments: Add comments (starting with #) to document your rules (see the example after this list).
- ✅ Be Specific: Target specific user-agents when needed rather than blocking all crawlers.
- ✅ Test Before Deploy: Always test changes before pushing to production.
- ✅ Monitor Regularly: Check Google Search Console for crawl errors related to robots.txt.
- ✅ Update as Needed: Review and update when site structure changes.
- ✅ Allow Important Resources: Don't block CSS, JavaScript, or images needed for rendering.
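A minimal sketch of a commented file (the rules and URLs are illustrative only):
# robots.txt for example.com
# Keep internal search results out of the crawl to save crawl budget
User-agent: *
Disallow: /search/

# Help crawlers discover content
Sitemap: https://example.com/sitemap.xml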
Understanding User-Agents
Googlebot
Google's main crawler for web search. Use Googlebot-Image, Googlebot-News, or Googlebot-Video for specific media types.
Bingbot
Microsoft Bing's crawler. Unlike Googlebot, it respects the crawl-delay directive (see the example below).
Slurp
Yahoo's crawler (now powered by Bing, but still used for some Yahoo properties).
DuckDuckBot
DuckDuckGo's crawler, known for respecting privacy and robots.txt rules.
Baiduspider
Baidu's crawler for the Chinese search engine. Can be aggressive, so crawl-delay is often used.
YandexBot
Yandex's crawler for the Russian search engine.
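As an illustration of per-bot targeting (the delay values are arbitrary placeholders, not recommendations):
User-agent: Bingbot
Crawl-delay: 5

User-agent: Baiduspider
Crawl-delay: 10

User-agent: *
Allow: /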
Frequently Asked Questions
Q: Is robots.txt required for every website?
A: Not required, but highly recommended. If you don't have a robots.txt file, crawlers will assume they can access everything. Having one gives you control over crawling behavior and helps with SEO.
Q: Does robots.txt prevent pages from appearing in search results?
A: Not necessarily. Robots.txt prevents crawling but not indexing. If other sites link to a blocked page, it might still appear in results. Use noindex meta tags or HTTP headers for true indexing prevention.
Q: Can I use robots.txt to hide sensitive content?
A: No! Robots.txt is publicly accessible. Anyone can view it. Use proper authentication, password protection, or noindex tags for sensitive content. Robots.txt should never be your security strategy.
Q: Do all crawlers respect robots.txt?
A: Reputable crawlers (Google, Bing, etc.) respect robots.txt. However, malicious bots, scrapers, and email harvesters often ignore it. Robots.txt is a request, not a security mechanism.
Q: How long does it take for changes to take effect?
A: Crawlers cache robots.txt for up to 24 hours. After updating, it may take a day for all crawlers to see the changes. Google typically re-crawls it more frequently.
Q: Should I block Googlebot to hide duplicate content?
A: No! Use canonical tags, noindex meta tags, or consolidate content instead. Blocking Googlebot prevents Google from understanding your site structure and can harm SEO.
Q: Can I have multiple robots.txt files?
A: Each (sub)domain can have its own robots.txt in its root. For example, example.com/robots.txt and blog.example.com/robots.txt are separate files. However, a single host cannot have more than one robots.txt file.
Robots.txt vs Meta Robots vs X-Robots-Tag
| Method | Purpose | When to Use |
|---|---|---|
| robots.txt | Control crawling | Block entire sections, save crawl budget |
| Meta Robots | Control indexing | Prevent specific pages from appearing in results |
| X-Robots-Tag | Control indexing (HTTP header) | Control non-HTML files (PDFs, images, etc.) |
Related Web Tools
Explore more powerful web tools to enhance your productivity
- HTML Previewer - Preview HTML code
- Markdown to HTML - Convert Markdown to HTML
- HTML to Markdown - Convert HTML to Markdown
- CSS Minifier - Minify CSS code