Recall & Review
beginner
Q: What is the purpose of a robots.txt file?
A: A robots.txt file tells web robots (like search engine crawlers) which parts of a website they may or may not visit. It helps control what content gets crawled and, indirectly, what appears in search results.
beginner
Q: What does the User-agent directive specify in a robots.txt file?
A: The User-agent directive names the web robot that the following rules apply to. For example, User-agent: Googlebot targets Google's crawler.
beginner
Q: How do you block all web crawlers from accessing your entire website using robots.txt?
A: Write these two lines:

User-agent: *
Disallow: /

This tells all robots not to visit any pages on the site.
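The block-everything rules above can be checked programmatically. As a minimal sketch, Python's standard urllib.robotparser module can parse a robots.txt (here supplied as an inline string rather than fetched from a site) and report whether a given crawler may fetch a given path:

```python
from urllib import robotparser

# Hypothetical robots.txt that blocks every crawler from the whole site.
rules = """\
User-agent: *
Disallow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# With "Disallow: /" under "User-agent: *", no crawler may fetch any path.
print(rp.can_fetch("Googlebot", "/index.html"))  # False
print(rp.can_fetch("*", "/"))                    # False
```

In real use you would call rp.set_url("https://example.com/robots.txt") followed by rp.read() instead of parsing an inline string.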
beginner
Q: What does the Disallow directive do in a robots.txt file?
A: The Disallow directive tells the specified user-agent which paths or pages it should NOT crawl.
intermediate
Q: Can robots.txt prevent a page from being indexed if other sites link to it?
A: No. robots.txt only controls crawling. If other sites link to a page, search engines may still index its URL without its content.
Q: What does User-agent: * mean in a robots.txt file?
A: User-agent: * means the rules that follow apply to all web crawlers.
Q: How do you allow all web crawlers to access your entire website?
A: Leave the Disallow directive empty (Disallow: with no path). An empty value blocks nothing, so all pages are allowed.
Q: Which directive blocks a specific folder from being crawled?
A: Disallow. For example, Disallow: /folder/ tells robots not to crawl anything under that folder.
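Folder-level blocking can be verified the same way. This sketch (again using Python's standard urllib.robotparser, with a hypothetical /private/ folder) shows that paths inside the disallowed folder are refused while everything else stays allowed:

```python
from urllib import robotparser

# Hypothetical rules: block only the /private/ folder for all crawlers.
rules = """\
User-agent: *
Disallow: /private/
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("*", "/private/data.html"))  # False: inside the blocked folder
print(rp.can_fetch("*", "/public/page.html"))   # True: not covered by any rule
```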
Q: If a page is blocked by robots.txt, can it still appear in search results?
A: Yes. Blocking crawling does not guarantee the page stays out of the index; if other sites link to it, its URL may still be listed.
Q: Where should the robots.txt file be placed on a website?
A: In the root directory of the site (e.g., https://example.com/robots.txt), because that is the only location crawlers check.
Q: Explain how a robots.txt file controls web crawler access to a website.
Hint: Think about how you tell robots where they can and cannot go.
Q: Describe a scenario where blocking a page with robots.txt might not prevent it from appearing in search results.
Hint: Consider what happens if other sites link to a blocked page.