SEO · How-To · Beginner · 3 min read

How to Use robots.txt for SEO: Guide and Examples

A robots.txt file placed in your website's root directory tells search engine crawlers which pages or folders to crawl or avoid. It uses simple rules: User-agent specifies which crawler a group of rules applies to, and Disallow blocks paths. This helps manage SEO by controlling which parts of your site crawlers spend time on.

📝 Syntax

The robots.txt file uses simple line-based rules to control web crawlers. Each section starts with a User-agent line naming the crawler the rules apply to. Disallow tells that crawler which pages or folders it should NOT visit, and Allow can override a Disallow rule for a specific path. Comments start with #.

User-agent: *
Disallow: /private/
Allow: /private/public-info.html
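You can check rules like these programmatically with Python's standard-library `urllib.robotparser`. One caveat for this sketch: `robotparser` applies rules in file order (first match wins), unlike Google's longest-match behavior, so the Allow line is placed before the Disallow it overrides. The example.com URLs are placeholders.

```python
from urllib import robotparser

# The rules from the syntax example above, with Allow listed first
# because urllib.robotparser uses first-match-wins ordering.
ROBOTS = """\
User-agent: *
Allow: /private/public-info.html
Disallow: /private/
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS.splitlines())

# Blocked: matches Disallow: /private/
print(rp.can_fetch("*", "https://example.com/private/data.html"))        # False
# Allowed: matches the more specific Allow rule
print(rp.can_fetch("*", "https://example.com/private/public-info.html")) # True
# Allowed: matches no rule, so crawling defaults to permitted
print(rp.can_fetch("*", "https://example.com/index.html"))               # True
```

This is handy for sanity-checking a robots.txt before deploying it, though results for mixed Allow/Disallow files can differ from Google's own tester.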

💻 Example

This example blocks all crawlers from accessing the /admin/ folder but allows them to crawl everything else on the site.

User-agent: *
Disallow: /admin/
Output
Search engines will not crawl any URL starting with /admin/ but will crawl all other pages.
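Crawler-specific groups can be tested the same way. The sketch below (again using Python's `urllib.robotparser`; the site and the rules are hypothetical) shows a Googlebot-specific group taking precedence over the `*` group:

```python
from urllib import robotparser

# Hypothetical rules: everyone is blocked from /admin/,
# and Googlebot is additionally blocked from /drafts/.
ROBOTS = """\
User-agent: Googlebot
Disallow: /admin/
Disallow: /drafts/

User-agent: *
Disallow: /admin/
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS.splitlines())

# Googlebot matches its own group, which blocks /drafts/
print(rp.can_fetch("Googlebot", "https://example.com/drafts/post.html"))    # False
# Other crawlers fall back to the * group, which allows /drafts/
print(rp.can_fetch("SomeOtherBot", "https://example.com/drafts/post.html")) # True
# The * group still blocks /admin/ for everyone else
print(rp.can_fetch("SomeOtherBot", "https://example.com/admin/"))           # False
```

Note that a crawler uses only the most specific matching group, so rules meant for everyone must be repeated in each named group.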

⚠️ Common Pitfalls

  • Placing robots.txt in the wrong folder so it is not found by crawlers.
  • Using Disallow: / unintentionally blocks the entire site.
  • Expecting robots.txt to hide pages from search results; it only blocks crawling, and pages can still be indexed if they are linked from elsewhere.
  • Incorrect syntax, such as a missing colon, can cause a rule to be ignored.
Wrong:
User-agent *
Disallow /private/

Right:
User-agent: *
Disallow: /private/
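A small script can catch slips like the missing colons above before deployment. The checker below is an illustrative sketch, not a full validator: it flags non-comment lines that lack the `Directive: value` shape or use a directive name outside a small known set.

```python
import re

# Directives this sketch recognizes (not an exhaustive list).
KNOWN = {"user-agent", "disallow", "allow", "sitemap", "crawl-delay"}

def check_robots(text):
    """Return (line_number, line) pairs for lines that look malformed."""
    problems = []
    for n, line in enumerate(text.splitlines(), 1):
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue  # blank lines separate groups; nothing to check
        m = re.match(r"([A-Za-z-]+)\s*:\s*(.*)", line)
        if not m or m.group(1).lower() not in KNOWN:
            problems.append((n, line))
    return problems

# Both lines are missing their colon, so both are flagged.
print(check_robots("User-agent *\nDisallow /private/"))
# → [(1, 'User-agent *'), (2, 'Disallow /private/')]
# The corrected version passes cleanly.
print(check_robots("User-agent: *\nDisallow: /private/"))
# → []
```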

📊 Quick Reference

| Directive  | Purpose                                                  | Example                                  |
|------------|----------------------------------------------------------|------------------------------------------|
| User-agent | Specifies which crawler the rules apply to               | User-agent: Googlebot                    |
| Disallow   | Blocks the crawler from the specified path               | Disallow: /secret/                       |
| Allow      | Allows crawling of a path even if its parent is disallowed | Allow: /secret/public.html             |
| Sitemap    | Specifies the location of the sitemap file               | Sitemap: https://example.com/sitemap.xml |
| #          | Adds a comment line                                      | # This is a comment                      |
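Putting the directives together, a complete robots.txt might look like this (the paths and the example.com sitemap URL are placeholders):

```plaintext
# Crawler-specific rules come first
User-agent: Googlebot
Disallow: /secret/
Allow: /secret/public.html

# Rules for all other crawlers
User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
```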

✅ Key Takeaways

  • Place robots.txt in your website's root folder to control crawler access.
  • Use User-agent and Disallow directives to block or allow specific crawlers and paths.
  • robots.txt controls crawling but does not guarantee pages won't appear in search results.
  • Check syntax carefully to avoid blocking your entire site accidentally.
  • Use the Sitemap directive to help crawlers find your sitemap.