How to Scrape Dynamic Websites Using Python Easily
To scrape a dynamic website using Python, use Selenium, which controls a real browser so JavaScript content actually loads. This lets you extract data after the page fully renders, unlike plain `requests`, which only fetches the static HTML.
Syntax
Use Selenium with a web driver to open a browser, load the dynamic page, and extract content after JavaScript runs.
- `webdriver.Chrome()`: Starts a Chrome browser controlled by Python.
- `get(url)`: Opens the webpage URL.
- `find_element(By.<STRATEGY>, value)`: Finds the first matching element on the page.
- `page_source`: Gets the full HTML after scripts run.
- `quit()`: Closes the browser.
```python
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument('--headless')  # Run browser without GUI
service = Service()                 # Path to chromedriver if needed

browser = webdriver.Chrome(service=service, options=options)
browser.get('https://example.com')  # Load dynamic page
html = browser.page_source          # Get HTML after JS loads

# Example: find element by CSS selector
element = browser.find_element(By.CSS_SELECTOR, 'h1')
print(element.text)
browser.quit()
```
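Once `page_source` is captured, the HTML can also be handed to an ordinary parser instead of the driver's own locators. A minimal sketch using the stdlib `html.parser`; the HTML string here is a stand-in for a real `browser.page_source`:

```python
from html.parser import HTMLParser

class FirstH1(HTMLParser):
    """Collects the text of the first <h1> tag it sees."""
    def __init__(self):
        super().__init__()
        self._in_h1 = False
        self.text = None

    def handle_starttag(self, tag, attrs):
        if tag == 'h1' and self.text is None:
            self._in_h1 = True

    def handle_data(self, data):
        if self._in_h1:
            self.text = data.strip()
            self._in_h1 = False

# Stand-in for browser.page_source
html = '<html><body><h1>Example Domain</h1><p>More text.</p></body></html>'
parser = FirstH1()
parser.feed(html)
print(parser.text)  # Example Domain
```

Parsing the captured HTML this way lets you close the browser early and keep working on the string, which can matter when scraping many pages.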
Example
This example shows how to scrape the first quote from a dynamic website using Selenium in headless mode. It pauses briefly so the JavaScript can render, then extracts the text of the first element with class `text`.
```python
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
import time

options = Options()
options.add_argument('--headless')  # Run browser without GUI
service = Service()                 # Use default chromedriver path

browser = webdriver.Chrome(service=service, options=options)
browser.get('https://quotes.toscrape.com/js/')  # Dynamic site example

# Wait for JavaScript to load content
time.sleep(3)

# Extract first quote text
quote = browser.find_element(By.CLASS_NAME, 'text')
print(quote.text)
browser.quit()
```
Output
"The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking."
Common Pitfalls
- Not waiting for JavaScript to load causes empty or incomplete data.
- Using `requests` alone won't work on dynamic sites because it fetches only the static HTML.
- Forgetting to close the browser with `quit()` can leave processes running.
- Using a web driver version that doesn't match your browser causes errors.
Always check the page structure after JavaScript runs to select the right elements.
```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
import time

options = Options()
options.add_argument('--headless')
browser = webdriver.Chrome(options=options)
browser.get('https://quotes.toscrape.com/js/')

# WRONG: Trying to get the element immediately, without waiting
try:
    quote = browser.find_element(By.CLASS_NAME, 'text')
    print(quote.text)
except Exception as e:
    print('Error:', e)

# RIGHT: Wait for content to load
time.sleep(3)  # Simple wait
quote = browser.find_element(By.CLASS_NAME, 'text')
print('After wait:', quote.text)
browser.quit()
```
Output
Error: Message: no such element: Unable to locate element: {"method":"css selector","selector":".text"}
After wait: "The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking."
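To guarantee `quit()` runs even when a locator raises, the driver can be wrapped in a context manager. A sketch of the pattern; `FakeBrowser` is a stand-in so it can be shown without launching Chrome, and with a real driver you would pass something like `lambda: webdriver.Chrome(options=options)` as the factory:

```python
from contextlib import contextmanager

@contextmanager
def managed_browser(factory):
    """Create a browser via `factory` and always quit() it on exit."""
    browser = factory()
    try:
        yield browser
    finally:
        browser.quit()

class FakeBrowser:
    """Stand-in for webdriver.Chrome that records quit() calls."""
    def __init__(self):
        self.closed = False
    def quit(self):
        self.closed = True

created = []
def factory():
    b = FakeBrowser()
    created.append(b)
    return b

# quit() runs even though the body raises
try:
    with managed_browser(factory) as browser:
        raise RuntimeError('locator failed')
except RuntimeError:
    pass

print(created[0].closed)  # True
```

Newer Selenium releases also support using the driver object itself in a `with` statement, which quits it automatically on exit.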
Quick Reference
Tips for scraping dynamic websites with Python:
- Use `Selenium` with a browser driver such as ChromeDriver.
- Run in headless mode to avoid opening a visible browser window.
- Wait for JavaScript to load using `time.sleep()` or explicit waits.
- Use the `find_element` methods to locate elements after the page loads.
- Always close the browser with `quit()` to free resources.
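Explicit waits poll the page until a condition holds instead of sleeping for a fixed time. The underlying idea can be sketched in plain Python; `find_quote` below is a hypothetical condition standing in for a real element lookup:

```python
import time

def wait_for(condition, timeout=10.0, poll=0.5):
    """Call `condition` repeatedly until it returns a truthy value,
    or raise TimeoutError once `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f'condition not met within {timeout}s')
        time.sleep(poll)

# Hypothetical condition: succeeds on the third poll, like an element
# that appears only after JavaScript has rendered the page.
attempts = {'n': 0}
def find_quote():
    attempts['n'] += 1
    return 'quote text' if attempts['n'] >= 3 else None

print(wait_for(find_quote, timeout=5.0, poll=0.01))  # quote text
```

In real Selenium code the same pattern is `WebDriverWait(browser, 10).until(EC.presence_of_element_located((By.CLASS_NAME, 'text')))`, using `WebDriverWait` and `expected_conditions` from `selenium.webdriver.support`.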
Key Takeaways
- Use Selenium to control a real browser and load JavaScript content for dynamic sites.
- Always wait for the page to fully load before extracting data, or content will be missing.
- Run browsers in headless mode for faster, invisible scraping.
- Close the browser properly with `quit()` to prevent resource leaks.
- `requests` alone cannot scrape dynamic content generated by JavaScript.