Robots.txt Configuration
📖 Scenario: You manage a website and want to control which parts search engines can access. You will create a robots.txt file to guide web crawlers.
🎯 Goal: Build a robots.txt file that blocks search engines from accessing a private folder but allows them to crawl the rest of the site.
📋 What You'll Learn
Create a robots.txt file with user-agent and disallow rules
Specify the user-agent as all bots using *
Disallow access to the /private/ folder
Allow access to all other parts of the website
💡 Why This Matters
🌍 Real World
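Webmasters use robots.txt files to manage which parts of their website search engines crawl, steering crawlers away from private or low-value areas and improving SEO. The file is a publicly readable request to crawlers, not access control, so it does not secure private content by itself.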
💼 Career
Understanding robots.txt is important for SEO specialists, web developers, and digital marketers to control site visibility and crawler behavior.
1
Create the robots.txt file and specify the user-agent
Create a robots.txt file and write the line User-agent: * to target all web crawlers.
SEO Fundamentals
Hint
The User-agent: * line means the rules below it apply to every crawler.
2
Add a rule to disallow the private folder
Add the line Disallow: /private/ below User-agent: * to block crawlers from the private folder.
Hint
The Disallow line tells crawlers not to visit the specified folder.
3
Allow crawling of all other pages
Add the line Allow: / below the disallow rule to explicitly allow crawling of all other pages.
Hint
The Allow: / line tells crawlers they can visit everything else; the more specific Disallow: /private/ rule still applies to the private folder.
4
Complete the robots.txt file with a comment
Add a comment line # Robots.txt file to control crawler access at the top of the file to describe its purpose.
Hint
Comments start with # and help explain the file to humans.
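The four steps above can be checked with a short script. This is a minimal sketch, assuming Python's standard-library urllib.robotparser as the checker; the bot name ExampleBot, the host example.com, and the two page paths are placeholders that do not come from the exercise.

```python
from urllib.robotparser import RobotFileParser

# The file assembled in steps 1-4, held in a string so it can be tested
# without a web server.
ROBOTS_TXT = """\
# Robots.txt file to control crawler access
User-agent: *
Disallow: /private/
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "ExampleBot" and the URLs below are hypothetical placeholders.
private_ok = parser.can_fetch("ExampleBot", "https://example.com/private/report.html")
public_ok = parser.can_fetch("ExampleBot", "https://example.com/blog/post.html")

print(private_ok, public_ok)  # False True
```

Python's parser applies the first matching rule in file order, while major search engines pick the most specific (longest) matching path. For this file both approaches give the same result, because Disallow: /private/ is both listed first and more specific than Allow: /.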
Practice
1. What is the main purpose of a robots.txt file on a website?
easy
A. To tell search engines which pages to crawl or not crawl
B. To speed up the website loading time
C. To store user login information
D. To create a sitemap for the website
Solution
Step 1: Understand the role of robots.txt
The robots.txt file is used to give instructions to search engine robots about which parts of the website they can access.
Step 2: Identify the correct purpose
It does not speed up loading, store user data, or create sitemaps. Its main role is to control crawling.
Final Answer:
To tell search engines which pages to crawl or not crawl -> Option A
Quick Check:
robots.txt controls crawling = A [OK]
Hint: robots.txt controls crawling rules for search engines [OK]
Common Mistakes:
Thinking robots.txt speeds up website
Confusing robots.txt with sitemap.xml
Assuming robots.txt stores user data
2. Which of the following is the correct syntax to block all web crawlers from accessing the entire website in robots.txt?
easy
A. User-agent: *
Disallow: /
B. User-agent: *
Disallow:
C. User-agent: all
Disallow: /
D. User-agent: *
Allow: /
Solution
Step 1: Understand the syntax for blocking all
To block all crawlers, use User-agent: * to target all, and Disallow: / to block the entire site.
Step 2: Check each option
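Option B (User-agent: * with an empty Disallow:) allows everything, because an empty Disallow value blocks nothing.
Option C (User-agent: all with Disallow: /) does not target every crawler: 'all' is not a wildcard, so the rule would only apply to a bot literally named "all".
Option D (User-agent: * with Allow: /) allows all pages.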
Final Answer:
User-agent: *
Disallow: / -> Option A
Quick Check:
Block all with Disallow: / = A [OK]
Hint: Use Disallow: / to block entire site for all agents [OK]
Common Mistakes:
Leaving Disallow empty to block site
Using 'all' instead of '*' for user-agent
Using Allow instead of Disallow to block
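The four options in question 2 can be compared side by side. The following is a rough sketch, again assuming Python's urllib.robotparser; ExampleBot and the test URL are made-up placeholders, and other crawlers may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# The four answer options from question 2, one robots.txt per label.
OPTIONS = {
    "A": "User-agent: *\nDisallow: /",
    "B": "User-agent: *\nDisallow:",
    "C": "User-agent: all\nDisallow: /",
    "D": "User-agent: *\nAllow: /",
}


def can_crawl(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# "ExampleBot" and the URL are hypothetical placeholders.
results = {
    label: can_crawl(text, "ExampleBot", "https://example.com/page.html")
    for label, text in OPTIONS.items()
}

for label, allowed in results.items():
    print(label, "allowed" if allowed else "blocked")
# A blocked
# B allowed
# C allowed
# D allowed
```

Only option A blocks the page. Option C leaves it open because ExampleBot does not match a user-agent named "all".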
3. Given the following robots.txt content, which URL will be blocked from crawling?
A. https://example.com/private/info.html by Bingbot
B. https://example.com/temp/info.html by Googlebot
C. https://example.com/public/page.html by Googlebot
D. https://example.com/private/data.html by Googlebot
Solution
Step 1: Analyze rules for Googlebot
Googlebot has its own group, so it follows only those rules and ignores the * group. That group disallows /private/ only, so Googlebot is blocked from /private/ but not from /temp/.
Step 2: Analyze rules for other bots
All other bots (like Bingbot) are blocked from /temp/ but not /private/.
Final Answer:
https://example.com/private/data.html by Googlebot -> Option D
Quick Check:
Googlebot blocked /private/ = D [OK]
Hint: Specific user-agent rules override general ones [OK]
Common Mistakes:
Assuming all bots blocked from /private/
Ignoring user-agent specific rules
Confusing /temp/ and /private/ paths
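The robots.txt content for question 3 is not shown above. The sketch below uses a file that is only an assumption, reconstructed from the solution text (Googlebot blocked from /private/, all other bots blocked from /temp/), and checks the four answer options against it with Python's urllib.robotparser.

```python
from urllib.robotparser import RobotFileParser

# ASSUMED file: reconstructed from the solution text, not taken from the
# original question.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /temp/
"""

# The four answer options as (label, crawler, URL).
CASES = [
    ("A", "Bingbot", "https://example.com/private/info.html"),
    ("B", "Googlebot", "https://example.com/temp/info.html"),
    ("C", "Googlebot", "https://example.com/public/page.html"),
    ("D", "Googlebot", "https://example.com/private/data.html"),
]

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Collect the labels of the options whose URL is blocked for that crawler.
blocked = [label for label, agent, url in CASES if not parser.can_fetch(agent, url)]

print(blocked)  # ['D']
```

Under this assumed file, Googlebot is not blocked from /temp/ even though the * group disallows it, because a crawler with its own group does not also inherit the * rules.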
4. Identify the error in this robots.txt snippet:
User-agent: *
Disallow /admin/
medium
A. User-agent should be capitalized
B. Missing colon after Disallow
C. Disallow path should be empty to block
D. User-agent cannot be *
Solution
Step 1: Check syntax for Disallow directive
Each directive must have a colon after the keyword. Here, Disallow is missing a colon.
Step 2: Verify other parts
User-agent can be '*', directive names are not case-sensitive, and the Disallow path /admin/ is the correct way to block that folder.
Final Answer:
Missing colon after Disallow -> Option B
Quick Check:
Directives need colon after keyword = B [OK]
Hint: Check for colon after directives like Disallow [OK]
Common Mistakes:
Omitting colon after Disallow
Thinking * is invalid user-agent
Believing capitalization matters
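The effect of the missing colon in question 4 can be seen by parsing both versions. This is an illustrative sketch using Python's urllib.robotparser, which silently skips a line it cannot split into a directive and a value; other parsers may be more lenient or stricter. ExampleBot and the URL are placeholders.

```python
from urllib.robotparser import RobotFileParser

BROKEN = "User-agent: *\nDisallow /admin/"   # colon missing after Disallow
FIXED = "User-agent: *\nDisallow: /admin/"   # corrected directive


def can_crawl(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# "ExampleBot" and the URL are hypothetical placeholders.
URL = "https://example.com/admin/index.html"
broken_allows = can_crawl(BROKEN, "ExampleBot", URL)
fixed_allows = can_crawl(FIXED, "ExampleBot", URL)

print(broken_allows, fixed_allows)  # True False
```

With this parser the malformed line is dropped without any warning, so /admin/ stays crawlable until the colon is added.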
5. You want to allow Googlebot to crawl everything except the /private/ folder, but block all other bots from the entire site. Which robots.txt configuration achieves this?
hard
A. User-agent: Googlebot
Allow: /
User-agent: *
Disallow: /private/
B. User-agent: *
Disallow: /
User-agent: Googlebot
Allow: /private/
C. User-agent: Googlebot
Disallow: /private/
User-agent: *
Disallow: /
D. User-agent: *
Disallow: /private/
User-agent: Googlebot
Disallow: /
Solution
Step 1: Understand Googlebot's rule
Googlebot should be allowed everywhere except /private/, so Disallow: /private/ applies to Googlebot.
Step 2: Understand other bots' rule
All other bots (*) should be blocked from the entire site, so Disallow: / applies to them.
Step 3: Check options
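Option C (User-agent: Googlebot with Disallow: /private/, then User-agent: * with Disallow: /) matches these rules exactly. Option A lets Googlebot into /private/ and blocks other bots only from /private/, option B never blocks Googlebot from /private/, and option D blocks Googlebot from the whole site.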
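The reasoning for question 5 can be checked mechanically by comparing each option with the behaviour the question asks for. This is a sketch under stated assumptions: it uses Python's urllib.robotparser, Bingbot stands in for "all other bots", and the two page paths are invented for the test.

```python
from urllib.robotparser import RobotFileParser

# The four answer options from question 5.
OPTIONS = {
    "A": "User-agent: Googlebot\nAllow: /\nUser-agent: *\nDisallow: /private/",
    "B": "User-agent: *\nDisallow: /\nUser-agent: Googlebot\nAllow: /private/",
    "C": "User-agent: Googlebot\nDisallow: /private/\nUser-agent: *\nDisallow: /",
    "D": "User-agent: *\nDisallow: /private/\nUser-agent: Googlebot\nDisallow: /",
}

# Desired outcome: True means the crawler may fetch the path.
# Bingbot and both paths are placeholders chosen for this check.
WANTED = {
    ("Googlebot", "/public/page.html"): True,
    ("Googlebot", "/private/data.html"): False,
    ("Bingbot", "/public/page.html"): False,
    ("Bingbot", "/private/data.html"): False,
}


def behaviour(robots_txt: str) -> dict:
    """Evaluate every (agent, path) pair in WANTED against one robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {
        (agent, path): parser.can_fetch(agent, "https://example.com" + path)
        for agent, path in WANTED
    }


matching = [label for label, text in OPTIONS.items() if behaviour(text) == WANTED]

print(matching)  # ['C']
```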