How to Scrape Data Using Selenium: Simple Guide with Example
To scrape data using Selenium, you first set up a WebDriver to open a browser, then use the find_element or find_elements methods with locators such as By.CSS_SELECTOR to locate data on the page. Finally, extract the text or attributes from those elements to get the data you want.
Syntax
Here is the basic syntax to scrape data using Selenium:
- driver = webdriver.Chrome(): Starts the Chrome browser.
- driver.get(url): Opens the webpage at the given URL.
- element = driver.find_element(By.CSS_SELECTOR, 'selector'): Finds a single element using a CSS selector.
- elements = driver.find_elements(By.CSS_SELECTOR, 'selector'): Finds all elements matching the selector, returned as a list.
- text = element.text: Gets the visible text inside the element.
- driver.quit(): Closes the browser when done.
```python
from selenium import webdriver
from selenium.webdriver.common.by import By

# Start browser
driver = webdriver.Chrome()

# Open webpage
driver.get('https://example.com')

# Find element
element = driver.find_element(By.CSS_SELECTOR, 'h1')

# Get text
text = element.text

# Close browser
driver.quit()
```
Example
This example opens the example.com homepage, finds the main heading <h1>, and prints its text content.
```python
from selenium import webdriver
from selenium.webdriver.common.by import By

# Initialize Chrome WebDriver
driver = webdriver.Chrome()

# Open the webpage
driver.get('https://example.com')

# Locate the main heading element
heading = driver.find_element(By.CSS_SELECTOR, 'h1')

# Print the text inside the heading
print('Heading text:', heading.text)

# Close the browser
driver.quit()
```
Output
Heading text: Example Domain
Common Pitfalls
- Not waiting for elements: Pages may load slowly, so elements might not be ready. Use explicit waits like WebDriverWait to wait for elements.
- Wrong locators: Unstable locators like absolute XPaths can break your scraper. Prefer CSS selectors or stable attributes.
- Not closing the browser: Forgetting driver.quit() can leave browser processes running.
- Ignoring page navigation: If scraping multiple pages, ensure navigation completes before scraping.
```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Initialize Chrome WebDriver
driver = webdriver.Chrome()

# Open the webpage before looking for elements
driver.get('https://example.com')

# Wrong way: find the element immediately, without waiting
# element = driver.find_element(By.CSS_SELECTOR, 'div.content')  # May fail if not loaded

# Right way: wait until the element is present
wait = WebDriverWait(driver, 10)
element = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, 'div.content')))

# Close browser
driver.quit()
```
Quick Reference
Remember these key points when scraping with Selenium:
- Use By.CSS_SELECTOR or By.XPATH to locate elements.
- Use element.text to get visible text.
- Use explicit waits (WebDriverWait) to handle dynamic pages.
- Always close the browser with driver.quit().
- Keep locators simple and stable for reliable scraping.
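Putting these points together, here is a hedged sketch of a reusable table scraper: wait for the table, read the header cells, pair each data row with the column names, and always quit the browser. The function names and the default 'table' selector are my assumptions, not part of this guide:

```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


def rows_to_dicts(header, rows):
    """Pair each row's cell values with the column names."""
    return [dict(zip(header, row)) for row in rows]


def scrape_table(url, table_selector='table'):
    """Scrape an HTML table into a list of dicts, one per data row."""
    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Explicit wait: handle dynamic pages instead of failing immediately
        wait = WebDriverWait(driver, 10)
        table = wait.until(
            EC.presence_of_element_located((By.CSS_SELECTOR, table_selector))
        )
        header = [th.text for th in table.find_elements(By.CSS_SELECTOR, 'th')]
        rows = [
            [td.text for td in tr.find_elements(By.CSS_SELECTOR, 'td')]
            for tr in table.find_elements(By.CSS_SELECTOR, 'tr')
        ]
        rows = [r for r in rows if r]  # drop the header row (it has no <td> cells)
        return rows_to_dicts(header, rows)
    finally:
        driver.quit()  # always close the browser, even on errors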
Key Takeaways
- Set up Selenium WebDriver and open the target webpage before scraping.
- Use stable locators like CSS selectors and explicit waits to find elements reliably.
- Extract data using element properties like .text or .get_attribute().
- Always close the browser with driver.quit() to free resources.
- Handle dynamic content by waiting for elements to load before scraping.