How to Get Page Source in Selenium: Syntax and Example
To get the page source in Selenium, use the driver.page_source property, which returns the entire HTML content of the current page as a string. This lets you inspect or save the page's HTML for testing or debugging.
Syntax
The syntax to get the page source in Selenium is simple. Use driver.page_source where driver is your WebDriver instance. It returns the full HTML content of the current page as a string.
- driver: Your Selenium WebDriver object controlling the browser.
- page_source: Property that fetches the HTML source code of the loaded page.
python
page_html = driver.page_source
Example
This example shows how to open a webpage using Selenium, get its page source, and print the first 500 characters of the HTML content.
python
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options

# Setup Chrome options
options = Options()
options.add_argument('--headless')  # Run browser in headless mode

# Setup Chrome driver service (adjust path to your chromedriver)
service = Service(executable_path='./chromedriver')

# Create WebDriver instance; the with block quits the browser on exit
with webdriver.Chrome(service=service, options=options) as driver:
    driver.get('https://example.com')
    page_html = driver.page_source
    print(page_html[:500])  # Print first 500 chars of page source
Output
<!doctype html>
<html>
<head>
 <title>Example Domain</title>
 <meta charset="utf-8" />
 <meta http-equiv="Content-type" content="text/html; charset=utf-8" />
 <meta name="viewport" content="width=device-width, initial-scale=1" />
 <style type="text/css">
 body {
 background-color: #f0f0f2;
 margin: 40px;
 font-family: "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif;

 }
 </style>
</head>
<body>
<div>
 <h1>Example Domain</h1>
 <p>This domain is for use in illustrative examples in documents.</p>
 <p><a href="https://www.iana.org/domains/example">More information...</a></p>
</div>
</body>
</html>
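Saving the page's HTML, mentioned at the start, can be sketched as follows. The helper name save_page_source is hypothetical and not part of Selenium; it assumes only that the object passed in exposes a page_source attribute, as a WebDriver instance does.

```python
from pathlib import Path


def save_page_source(driver, path):
    """Write driver.page_source to `path` as UTF-8 and return the character count.

    `driver` is assumed to be any object with a `page_source` attribute,
    for example a Selenium WebDriver that has already loaded a page.
    """
    html = driver.page_source  # read once so the file matches the returned length
    Path(path).write_text(html, encoding="utf-8")
    return len(html)
```

With a live session, this would be called after driver.get(...), for example save_page_source(driver, 'example.html').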
Common Pitfalls
- Not waiting for page load: Trying to get page_source before the page fully loads may give incomplete HTML.
- Using wrong driver instance: Ensure you call page_source on the active WebDriver controlling the browser.
- Expecting dynamic content: page_source shows the current HTML, but some content loaded by JavaScript after page load may not appear immediately.
python
from selenium import webdriver

driver = webdriver.Chrome()

# Wrong: calling page_source before get()
# print(driver.page_source)  # Only the browser's blank start page

# Right: navigate first, then read the source
driver.get('https://example.com')
print(driver.page_source)

driver.quit()  # Close the browser when done
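For the dynamic-content pitfall, one possible approach is to poll page_source until an expected piece of text shows up. The sketch below is hand-rolled for illustration: the function name wait_for_text_in_source and its parameters are hypothetical, and it assumes only that the driver exposes page_source.

```python
import time


def wait_for_text_in_source(driver, text, timeout=10.0, poll=0.5):
    """Poll driver.page_source until `text` appears, then return the HTML.

    Raises TimeoutError if `text` has not appeared after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        html = driver.page_source  # re-read: the DOM may have changed
        if text in html:
            return html
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{text!r} not found in page source after {timeout}s")
        time.sleep(poll)
```

Selenium's own WebDriverWait covers the same need, for example WebDriverWait(driver, 10).until(lambda d: 'Example Domain' in d.page_source), and is usually the better choice in real tests.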
Quick Reference
Remember these tips when using driver.page_source:
- Always navigate to the page first with driver.get(url).
- Wait for page elements to load if needed before getting source.
- page_source returns a string of the full HTML.
- Use it for debugging, saving HTML, or verifying page content.
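Verifying page content can be as simple as checking the returned string for expected snippets. The helper below is a hypothetical sketch (missing_snippets is not a Selenium function), and the sample string stands in for a real driver.page_source value. Because the browser may serialize markup slightly differently from the raw file, matching on visible text tends to be less brittle than matching on exact tags.

```python
def missing_snippets(html, expected):
    """Return the items of `expected` that do not occur in `html`."""
    return [snippet for snippet in expected if snippet not in html]


# Stand-in for driver.page_source after loading https://example.com
sample = "<html><body><div><h1>Example Domain</h1></div></body></html>"

print(missing_snippets(sample, ["Example Domain", "Login"]))  # ['Login']
```

An empty list means every expected snippet was found.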
Key Takeaways
Use driver.page_source to get the full HTML content of the current page in Selenium.
Always navigate to the page and wait for it to load before accessing page_source.
page_source returns a string containing the entire HTML, useful for debugging or validation.
Dynamic content loaded after page load may not appear immediately in page_source.
Ensure you use the correct WebDriver instance when calling page_source.