
IMPORTXML for structured data in Google Sheets - Step-by-Step Guide

Introduction
IMPORTXML lets you pull data from websites into your spreadsheet. It extracts specific parts of a webpage, such as tables or lists, so you don't have to copy them manually. Typical use cases:
When you want to get the latest stock prices from a financial website automatically.
When you need to collect weather data from a weather forecast page for your report.
When you want to extract headlines from a news website to track current events.
When you want to import a list of product prices from an online store for comparison.
When you want to gather sports scores from a sports website without typing them yourself.
Steps
Step 1: Open a Google Sheets document
You see a blank or existing spreadsheet ready for data entry
Step 2: Click a cell where you want the imported data to appear
The cell is selected and ready for formula input
Step 3: Type in the selected cell
You start entering the IMPORTXML formula
💡 The formula looks like =IMPORTXML("URL", "XPath")
Step 4: Enter the full formula in the formula bar
Formula example: =IMPORTXML("https://example.com", "//h2") pulls all h2 headings from the page
💡 Use double quotes around the URL and XPath query
Step 5: Press the Enter key
The cell fills with data extracted from the website matching the XPath query
Step 6: Adjust the XPath query in the formula if needed
You get different parts of the webpage data as required
Before vs After
Before
Cell A1 is empty with no data
After
The h2 headings from the specified webpage fill cell A1 and the cells below it, one heading per row
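To make the "After" state concrete, here is a minimal Python sketch of what the query "//h2" selects. It is an illustration only, not what Google Sheets runs internally: it uses the standard-library xml.etree.ElementTree module, whose ".//h2" path is a close stand-in for "//h2", and SAMPLE_PAGE is a hypothetical, well-formed page invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page standing in for https://example.com.
SAMPLE_PAGE = """<html>
  <body>
    <h1>Site title</h1>
    <h2>First section</h2>
    <div><h2>Nested section</h2></div>
    <p>Body text</p>
  </body>
</html>"""

root = ET.fromstring(SAMPLE_PAGE)

# ElementTree paths are relative to the root element, so ".//h2" plays the
# role of "//h2": every h2 at any depth, in document order.
headings = [h2.text for h2 in root.findall(".//h2")]

# Each heading corresponds to one row of the imported range.
for row in headings:
    print(row)
```

The h1 and p elements are ignored, and the nested h2 is still found because the query searches at every depth.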
Settings Reference
URL
📍 First argument in IMPORTXML formula
Specifies the webpage to pull data from
Default: None
XPath query
📍 Second argument in IMPORTXML formula
Defines which parts of the webpage to extract
Default: None
Common Mistakes
Using a wrong or incomplete XPath query
The formula returns errors or no data because it can't find matching elements
Use an XPath expression that matches the webpage structure, and test it with your browser's developer tools first
Not enclosing URL or XPath in double quotes
Formula shows syntax error because arguments are not recognized as text
Always put URL and XPath inside double quotes like "https://example.com" and "//h2"
Trying to import data from pages that block scraping or require login
IMPORTXML cannot access protected or dynamic content, so it returns errors
Use only publicly accessible static pages or APIs for IMPORTXML
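The quoting mistake above can be avoided by generating the formula text programmatically. The sketch below is a hypothetical helper (build_importxml is not part of Google Sheets or any library) that wraps both arguments in double quotes. It relies on the Sheets convention that a double quote inside a text literal is written as two double quotes.

```python
def build_importxml(url: str, xpath: str) -> str:
    """Return an IMPORTXML formula with both arguments as quoted text."""

    def quote(text: str) -> str:
        # Sheets text literals are wrapped in double quotes; an embedded
        # double quote is written as two double quotes.
        return '"' + text.replace('"', '""') + '"'

    return f"=IMPORTXML({quote(url)}, {quote(xpath)})"


# Single quotes inside the XPath need no escaping.
formula = build_importxml("https://example.com", "//div[@class='price']")
print(formula)

# Double quotes inside the XPath are doubled.
escaped = build_importxml("https://example.com", '//div[@class="price"]')
print(escaped)
```

Using single quotes inside the XPath, as the examples in this guide do, keeps the formula easier to read than the doubled-quote form.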
Summary
IMPORTXML pulls specific data from webpages into your sheet using URL and XPath.
You must write the URL and XPath query correctly inside the formula.
It works only with publicly accessible static web pages.

Practice

1. What does the IMPORTXML function do in Google Sheets?
easy
A. It fetches data from a web page using a URL and XPath query.
B. It imports data from another Google Sheet only.
C. It creates charts based on web data automatically.
D. It exports your sheet data to a web page.

Solution

  1. Step 1: Understand IMPORTXML purpose

    IMPORTXML is designed to pull data from web pages by using a URL and an XPath query to specify what data to extract.
  2. Step 2: Compare options

    Only "It fetches data from a web page using a URL and XPath query." correctly describes this function. The other options describe unrelated features.
  3. Final Answer:

    It fetches data from a web page using a URL and XPath query. -> Option A
  4. Quick Check:

    IMPORTXML = fetch web data [OK]
Hint: IMPORTXML grabs web data using URL + XPath [OK]
Common Mistakes:
  • Thinking IMPORTXML only works with other sheets
  • Confusing IMPORTXML with chart creation
  • Assuming it exports data instead of importing
2. Which of these is the correct syntax for using IMPORTXML to get all <h2> elements from a webpage URL in cell A1?
easy
A. =IMPORTXML(A1, "//h2/@text")
B. =IMPORTXML(A1, "//h2[]")
C. =IMPORTXML(A1, "h2")
D. =IMPORTXML(A1, "//h2")

Solution

  1. Step 1: Recall IMPORTXML syntax

    The function takes two arguments: a URL and an XPath query. To select all <h2> elements, the XPath is "//h2".
  2. Step 2: Evaluate options

    =IMPORTXML(A1, "//h2") uses correct XPath syntax. =IMPORTXML(A1, "//h2[]") has an empty predicate, which is invalid. =IMPORTXML(A1, "h2") omits the leading //, so it does not search the whole page for h2 elements. =IMPORTXML(A1, "//h2/@text") asks for an attribute named "text", which h2 elements don't have.
  3. Final Answer:

    =IMPORTXML(A1, "//h2") -> Option D
  4. Quick Check:

    Correct XPath syntax: =IMPORTXML(A1, "//h2") [OK]
Hint: Use double slashes and tag name for XPath [OK]
Common Mistakes:
  • Adding brackets [] incorrectly in XPath
  • Omitting // in XPath
  • Trying to get text as attribute with @text
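The difference between "//h2" and a bare "h2" can be seen in a small Python sketch. This is an analogy under stated assumptions rather than a reproduction of how Sheets evaluates XPath: xml.etree.ElementTree evaluates paths relative to the root element, and SAMPLE_PAGE is a hypothetical page made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page; both h2 elements sit below <body>, not directly
# below <html>.
SAMPLE_PAGE = """<html>
  <body>
    <h2>Top-level heading</h2>
    <section><h2>Nested heading</h2></section>
  </body>
</html>"""

root = ET.fromstring(SAMPLE_PAGE)

# A bare "h2" only looks at direct children of the starting node
# (here the html element), so it finds nothing.
direct_children = root.findall("h2")

# ".//h2" (the ElementTree spelling of "//h2") searches every depth.
any_depth = [h2.text for h2 in root.findall(".//h2")]

print(len(direct_children))
print(any_depth)
```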
3. Given the formula =IMPORTXML("https://example.com", "//ul/li"), what will the output be?
medium
A. All paragraphs (<p>) from the page.
B. Only the first list item from the page.
C. All list items (<li>) inside unordered lists (<ul>) from the page.
D. An error because XPath is invalid.

Solution

  1. Step 1: Understand the XPath query

    The XPath "//ul/li" selects all <li> elements that are children of any <ul> element on the page.
  2. Step 2: Predict IMPORTXML output

    IMPORTXML will return all matching list items, not just the first, and it won't return paragraphs or an error, since the XPath is valid.
  3. Final Answer:

    All list items (<li>) inside unordered lists (<ul>) from the page. -> Option C
  4. Quick Check:

    XPath selects all matching nodes = All list items (<li>) inside unordered lists (<ul>) from the page. [OK]
Hint: XPath //ul/li selects all list items under ul [OK]
Common Mistakes:
  • Assuming only first match is returned
  • Confusing <li> with <p> tags
  • Thinking XPath syntax is wrong here
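As a rough check of the reasoning in question 3, the following Python sketch runs the equivalent of "//ul/li" against a hypothetical sample page using the standard-library xml.etree.ElementTree module. The page content is invented for illustration, and ElementTree is only a stand-in for whatever XPath engine Sheets uses.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page with two unordered lists, one ordered list,
# and a paragraph.
SAMPLE_PAGE = """<html><body>
  <ul><li>Apples</li><li>Pears</li></ul>
  <ol><li>Step one</li></ol>
  <div><ul><li>Plums</li></ul></div>
  <p>Not a list</p>
</body></html>"""

root = ET.fromstring(SAMPLE_PAGE)

# ".//ul/li" selects li elements whose parent is a ul, at any depth.
items = [li.text for li in root.findall(".//ul/li")]

print(items)
```

Every matching item is returned, the item inside the ordered list is skipped because its parent is ol, and the paragraph is never considered.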
4. You wrote =IMPORTXML("https://example.com", "//div[@class='price']") but get a #N/A error. What is the likely problem?
medium
A. The URL is invalid or unreachable.
B. The XPath syntax for class attribute is incorrect.
C. IMPORTXML does not support attribute filters.
D. You must use single quotes inside the XPath instead of double quotes.

Solution

  1. Step 1: Check XPath syntax

    The XPath "//div[@class='price']" is correct for selecting divs with class 'price'.
  2. Step 2: Consider other causes of #N/A

    #N/A often means the URL is unreachable or blocked. IMPORTXML supports attribute filters, and the formula already uses single quotes inside the XPath, so options B, C, and D do not explain the error.
  3. Final Answer:

    The URL is invalid or unreachable. -> Option A
  4. Quick Check:

    #N/A often means URL problem [OK]
Hint: Check URL accessibility if #N/A error occurs [OK]
Common Mistakes:
  • Assuming XPath syntax is wrong when it's correct
  • Not verifying the URL is accessible
  • Thinking IMPORTXML can't filter by attributes
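To confirm that the attribute filter in question 4 is valid, here is a short Python sketch that applies the same predicate with xml.etree.ElementTree. The sample page is hypothetical. The sketch also shows one behaviour worth keeping in mind when adapting the query to a real site: [@class='price'] compares the whole attribute value, so an element with class="price sale" is not matched.

```python
import xml.etree.ElementTree as ET

# Hypothetical product listing used only for illustration.
SAMPLE_PAGE = """<html><body>
  <div class="price">19.99</div>
  <div class="price sale">14.99</div>
  <div class="title">Widget</div>
</body></html>"""

root = ET.fromstring(SAMPLE_PAGE)

# The predicate keeps only div elements whose class attribute is exactly
# the string "price".
exact_matches = [div.text for div in root.findall(".//div[@class='price']")]

print(exact_matches)
```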
5. You want to import the latest news headlines from https://news.example.com where headlines are in <h3 class='headline'> tags. Which formula correctly imports only the text of these headlines?
hard
A. =IMPORTXML("https://news.example.com", "//h3[@class='headline']")
B. =IMPORTXML("https://news.example.com", "//h3[@class='headline']/text()")
C. =IMPORTXML("https://news.example.com", "//h3[@class='headline']/@text")
D. =IMPORTXML("https://news.example.com", "//h3[@class='headline']/innerText")

Solution

  1. Step 1: Understand XPath to get text content

    To get only the text inside elements, append the XPath node test /text() after selecting the element.
  2. Step 2: Evaluate options

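    =IMPORTXML("https://news.example.com", "//h3[@class='headline']/text()") correctly uses /text(), which selects the text nodes directly inside each matching element. =IMPORTXML("https://news.example.com", "//h3[@class='headline']") selects the element nodes themselves, so the result can also include text from any nested tags. =IMPORTXML("https://news.example.com", "//h3[@class='headline']/@text") asks for an attribute named 'text', which doesn't exist. =IMPORTXML("https://news.example.com", "//h3[@class='headline']/innerText") uses innerText, which is a JavaScript DOM property rather than XPath; as a path step it looks for a child element named innerText and matches nothing.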
  3. Final Answer:

    =IMPORTXML("https://news.example.com", "//h3[@class='headline']/text()") -> Option B
  4. Quick Check:

    Use /text() to get element text [OK]
Hint: Add /text() to XPath to get only text content [OK]
Common Mistakes:
  • Omitting /text() and selecting whole elements instead of their text nodes
  • Using @text which is not an attribute
  • Trying invalid XPath like innerText
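The distinction between selecting an element and selecting its text can be sketched in Python. Note the assumptions: xml.etree.ElementTree does not support the /text() step in its path language, so the sketch uses the element's .text attribute (the text before the first child element) as a rough analogue of a direct text node, and itertext() as an analogue of the element's full text. The sample headlines are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical news page; the first headline contains a nested <em> tag.
SAMPLE_PAGE = """<html><body>
  <h3 class="headline">Markets <em>rally</em> today</h3>
  <h3 class="headline">Rain expected</h3>
  <h3>Not a headline</h3>
</body></html>"""

root = ET.fromstring(SAMPLE_PAGE)
headlines = root.findall(".//h3[@class='headline']")

# Text that sits directly inside each h3, before any nested element.
leading_text = [h3.text for h3 in headlines]

# All text inside each h3, including text from nested elements.
full_text = ["".join(h3.itertext()) for h3 in headlines]

print(leading_text)
print(full_text)
```

For headlines without nested tags the two results are identical, which is why the choice only matters on pages that wrap part of a headline in extra markup.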