Python · How-To · Beginner · 4 min read

How to Scrape Dynamic Websites Using Python Easily

To scrape a dynamic website with Python, use Selenium: it controls a real browser, so JavaScript-generated content finishes loading before you read the page. A plain requests call, by contrast, fetches only the initial static HTML and misses anything rendered by scripts.
📐

Syntax

Use Selenium with a web driver to open a browser, load the dynamic page, and extract content after JavaScript runs.

  • webdriver.Chrome(): Starts a Chrome browser controlled by Python.
  • get(url): Opens the webpage URL.
  • find_element(By.<strategy>, selector): Finds the first element matching the selector (e.g. By.CSS_SELECTOR, By.CLASS_NAME).
  • page_source: Gets the full HTML after scripts run.
  • quit(): Closes the browser.
python
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument('--headless')  # Run browser without GUI

service = Service()  # Path to chromedriver if needed
browser = webdriver.Chrome(service=service, options=options)

browser.get('https://example.com')  # Load dynamic page
html = browser.page_source  # Get HTML after JS loads

# Example: find element by CSS selector
element = browser.find_element(By.CSS_SELECTOR, 'h1')
print(element.text)

browser.quit()
💻

Example

This example shows how to scrape the first quote from a dynamic website using Selenium in headless mode. It waits briefly for the page's JavaScript to run, then extracts the text of the first element with the class text.

python
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
import time

options = Options()
options.add_argument('--headless')  # Run browser without GUI

service = Service()  # Use default chromedriver path
browser = webdriver.Chrome(service=service, options=options)

browser.get('https://quotes.toscrape.com/js/')  # Dynamic site example

# Wait for JavaScript to load content
time.sleep(3)

# Extract first quote text
quote = browser.find_element(By.CLASS_NAME, 'text')
print(quote.text)

browser.quit()
Output
"The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking."
⚠️

Common Pitfalls

  • Not waiting for JavaScript to load causes empty or incomplete data.
  • Using requests alone won't work on dynamic sites because it fetches only static HTML.
  • Forgetting to close the browser with quit() can leave processes running.
  • Not using the correct web driver version for your browser causes errors.

Always check the page structure after JavaScript runs to select the right elements.

python
from selenium import webdriver
from selenium.webdriver.common.by import By  # Needed for By.CLASS_NAME below
from selenium.webdriver.chrome.options import Options
import time

options = Options()
options.add_argument('--headless')

browser = webdriver.Chrome(options=options)

browser.get('https://quotes.toscrape.com/js/')

# WRONG: Trying to get element immediately without wait
try:
    quote = browser.find_element(By.CLASS_NAME, 'text')
    print(quote.text)
except Exception as e:
    print('Error:', e)

# RIGHT: Wait for content to load

time.sleep(3)  # Simple wait
quote = browser.find_element(By.CLASS_NAME, 'text')
print('After wait:', quote.text)

browser.quit()
Output
Error: Message: no such element: Unable to locate element: {"method":"css selector","selector":".text"}
After wait: "The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking."
📊

Quick Reference

Tips for scraping dynamic websites with Python:

  • Use Selenium with a browser driver like ChromeDriver.
  • Run in headless mode to avoid opening a visible browser.
  • Wait for JavaScript to load using time.sleep() or, more reliably, explicit waits with WebDriverWait.
  • Use find_element methods to locate elements after page load.
  • Always close the browser with quit() to free resources.

Key Takeaways

  • Use Selenium to control a real browser and load JavaScript content for dynamic sites.
  • Always wait for the page to fully load before extracting data to avoid missing content.
  • Run browsers in headless mode for faster, invisible scraping.
  • Close the browser properly with quit() to prevent resource leaks.
  • Requests alone cannot scrape dynamic content generated by JavaScript.