How to Use robots.txt for SEO: Guide and Examples
Use a robots.txt file placed in your website's root directory to tell search engines which pages or folders they may crawl and which they should avoid. It uses simple rules: User-agent specifies the crawler, and Disallow blocks paths. This helps manage SEO by controlling which content crawlers can access.
Syntax
The robots.txt file uses simple lines to control web crawlers. Each section starts with User-agent to specify which crawler the rules apply to. Disallow tells the crawler which pages or folders it should NOT visit. Allow can be used to override disallow rules for specific paths. Comments start with #.
plaintext
User-agent: *
Disallow: /private/
Allow: /private/public-info.html
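You can check how a crawler would interpret rules like these with Python's standard-library urllib.robotparser. This is a sketch with a made-up domain; note that Python's parser applies rules in file order (first match wins), while Google applies the most specific (longest) matching rule, so the Allow line is listed first here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules mirroring the example above.
# Allow comes first because Python's parser uses first-match order.
rules = """\
User-agent: *
Allow: /private/public-info.html
Disallow: /private/
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# Blocked: falls under /private/
print(rp.can_fetch("*", "https://example.com/private/secret.html"))
# Allowed: matched by the Allow rule
print(rp.can_fetch("*", "https://example.com/private/public-info.html"))
# Allowed: no rule matches, so crawling defaults to allowed
print(rp.can_fetch("*", "https://example.com/index.html"))
```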
Example
This example blocks all crawlers from accessing the /admin/ folder but allows them to crawl everything else on the site.
plaintext
User-agent: *
Disallow: /admin/
Output
Search engines will not crawl any URL starting with /admin/ but will crawl all other pages.
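A quick way to confirm this behavior is to feed the same rules to urllib.robotparser and probe a few paths (the domain and paths below are illustrative):

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /admin/",
])

# Everything under /admin/ is blocked; all other paths remain crawlable.
for path in ("/admin/login", "/admin/", "/blog/post-1", "/"):
    allowed = rp.can_fetch("*", "https://example.com" + path)
    print(path, "->", "crawl" if allowed else "blocked")
```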
Common Pitfalls
- Placing robots.txt in the wrong folder, so crawlers cannot find it.
- Using Disallow: / unintentionally, which blocks the entire site.
- Expecting robots.txt to hide pages from search results; it only blocks crawling, and pages linked from elsewhere can still be indexed.
- Incorrect syntax, such as missing colons, which can cause rules to be ignored.
plaintext
Wrong:
User-agent *
Disallow /private/

Right:
User-agent: *
Disallow: /private/
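One way to catch the accidental "Disallow: /" pitfall before deploying is a small sanity check with urllib.robotparser (a sketch; the homepage URL is a placeholder):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content about to be deployed.
robots_txt = """\
User-agent: *
Disallow: /
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# If even the homepage is blocked for all crawlers, the whole site is off-limits.
if not rp.can_fetch("*", "https://example.com/"):
    print("WARNING: robots.txt blocks the entire site")
```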
Quick Reference
| Directive | Purpose | Example |
|---|---|---|
| User-agent | Specifies which crawler the rules apply to | User-agent: Googlebot |
| Disallow | Blocks crawler from specified path | Disallow: /secret/ |
| Allow | Allows crawling of a path even if parent is disallowed | Allow: /secret/public.html |
| Sitemap | Specifies location of sitemap file | Sitemap: https://example.com/sitemap.xml |
| # | Adds a comment line | # This is a comment |
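Putting the directives from the table together, a complete robots.txt might look like the following (the domain and paths are placeholders):

```plaintext
# Rules for all crawlers
User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /private/public-info.html

# Rules for Googlebot only
User-agent: Googlebot
Disallow: /staging/

Sitemap: https://example.com/sitemap.xml
```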
Key Takeaways
Place robots.txt in your website's root folder to control crawler access.
Use User-agent and Disallow directives to block or allow specific crawlers and paths.
robots.txt controls crawling but does not guarantee pages won't appear in search results.
Check syntax carefully to avoid blocking your entire site accidentally.
Use the Sitemap directive to help crawlers find your sitemap.