SEO Fundamentals · Knowledge · ~5 mins

How Google discovers pages (crawling) in SEO Fundamentals - Performance & Efficiency

Time Complexity: How Google discovers pages (crawling)
O(n)
Understanding Time Complexity

Google discovers new web pages through a process called crawling. Understanding how long crawling takes helps explain how quickly new content appears in search results.

We want to know how the time to discover pages grows as the number of pages increases.

Scenario Under Consideration

Analyze the time complexity of this simplified crawling process.


# Start with a list of known URLs
queue = [seed_url]
visited = set()

while queue:
  current_url = queue.pop()           # take the next URL to crawl
  if current_url not in visited:
    visited.add(current_url)          # mark the page as crawled
    links = extract_links(current_url)
    for link in links:
      if link not in visited:
        queue.append(link)            # schedule newly discovered pages

This code visits pages, extracts links, and adds new pages to visit until all reachable pages are found.
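The loop above can be made runnable with a toy in-memory link graph standing in for the live web (the page names and `extract_links` stub below are hypothetical, for illustration only):

```python
from collections import deque

# Hypothetical link graph: each page maps to the pages it links to.
LINKS = {
    "a": ["b", "c"],
    "b": ["c", "a"],   # cycles are fine: the visited set prevents re-crawling
    "c": ["d"],
    "d": [],
}

def extract_links(url):
    """Stand-in for fetching a page and parsing its links."""
    return LINKS.get(url, [])

def crawl(seed_url):
    queue = deque([seed_url])
    visited = set()
    while queue:
        current_url = queue.popleft()  # FIFO gives breadth-first order
        if current_url not in visited:
            visited.add(current_url)
            for link in extract_links(current_url):
                if link not in visited:
                    queue.append(link)
    return visited

print(sorted(crawl("a")))  # each reachable page appears exactly once
```

Using a `deque` with `popleft()` makes the traversal breadth-first; the original `list.pop()` gives depth-first order. Either way, the visited set guarantees each page is crawled at most once.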

Identify Repeating Operations

Look at what repeats as the crawler runs:

  • Primary operation: Visiting each page and extracting its links.
  • How many times: Once for each unique page found.
How Execution Grows With Input

As the number of pages grows, the crawler visits more pages and extracts more links.

Input Size (n)   Approx. Operations
10               About 10 page visits and link checks
100              About 100 page visits and link checks
1000             About 1000 page visits and link checks

Pattern observation: The work grows roughly in direct proportion to the number of pages found.
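One way to check this pattern empirically is to instrument the crawler with a visit counter and run it on sites of increasing size (the chain-shaped site below is a hypothetical example):

```python
def make_chain(n):
    """Hypothetical site where page 0 links to page 1, 1 to 2, and so on."""
    links = {i: [i + 1] for i in range(n - 1)}
    links[n - 1] = []  # the last page links nowhere
    return links

def count_visits(links, seed=0):
    queue, visited, visits = [seed], set(), 0
    while queue:
        url = queue.pop()
        if url not in visited:
            visited.add(url)
            visits += 1  # count each unique page visit
            queue.extend(l for l in links[url] if l not in visited)
    return visits

for n in (10, 100, 1000):
    print(n, count_visits(make_chain(n)))  # visits grow in step with n
```

Doubling the number of pages doubles the visit count: the hallmark of linear, O(n), growth.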

Final Time Complexity

Time Complexity: O(n)

This means the time to crawl grows linearly with the number of pages discovered.

Common Mistake

[X] Wrong: "Crawling time grows much faster because every page links to every other page."

[OK] Correct: In reality, each page is visited once, so the crawler does not repeatedly visit the same pages, keeping growth linear.
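This can be demonstrated directly: even on a hypothetical site where every page links to every other page, the visited set keeps the number of page visits at n, not n squared (a sketch):

```python
def crawl_count(links, seed):
    queue, visited = [seed], set()
    page_visits = 0
    while queue:
        url = queue.pop()
        if url not in visited:
            visited.add(url)
            page_visits += 1  # only unique pages count as visits
            queue.extend(l for l in links[url] if l not in visited)
    return page_visits

n = 50
# Fully connected site: every page links to all 49 others.
complete = {i: [j for j in range(n) if j != i] for i in range(n)}
print(crawl_count(complete, 0))  # 50 visits, not 50 * 49
```

The queue may briefly hold duplicate URLs, but duplicates are discarded on arrival, so each page is fetched exactly once.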

Interview Connect

Understanding how crawling scales is a useful model for reasoning about real systems that process large amounts of data. In interviews, this is the core skill behind complexity questions: explaining how an algorithm's work grows with its input size.

Self-Check

"What if the crawler revisited pages multiple times instead of tracking visited ones? How would the time complexity change?"