
How to Use BeautifulSoup in Python: Simple Web Scraping Guide

To use BeautifulSoup in Python, first install it with pip install beautifulsoup4. Then import it and parse HTML content using BeautifulSoup(html, 'html.parser') to extract data easily from web pages.

📐

Syntax

The basic syntax to use BeautifulSoup is:

from bs4 import BeautifulSoup: imports the library.
BeautifulSoup(html, 'html.parser'): creates a soup object from HTML text.
Use soup methods like find(), find_all(), or select() to locate elements.

python

from bs4 import BeautifulSoup

html = '<html><body><p>Hello, world!</p></body></html>'
soup = BeautifulSoup(html, 'html.parser')

# Find the first <p> tag
paragraph = soup.find('p')
print(paragraph.text)

Output

Hello, world!
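The snippet above only uses find(). As a minimal sketch of how find(), find_all(), and select() differ, the following parses a small made-up list (the HTML, the id "fruits", and the class "fresh" are illustrative assumptions, not part of any real page):

```python
from bs4 import BeautifulSoup

# Hypothetical HTML used only for illustration
html = """
<ul id="fruits">
  <li class="fresh">Apple</li>
  <li>Banana</li>
  <li class="fresh">Cherry</li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

# find() returns only the first matching tag (or None if nothing matches)
first_item = soup.find("li").text

# find_all() returns a list-like collection of every matching tag
all_items = [li.text for li in soup.find_all("li")]

# select() takes a CSS selector and also returns a list of tags
fresh_items = [li.text for li in soup.select("#fruits li.fresh")]

print(first_item)   # Apple
print(all_items)    # ['Apple', 'Banana', 'Cherry']
print(fresh_items)  # ['Apple', 'Cherry']
```

select() is usually the more compact choice when the target depends on nesting, ids, or classes, while find() and find_all() are enough for simple tag lookups.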

💻

Example

This example shows how to parse a simple HTML string and extract all links (<a> tags) with their URLs and text.

python

from bs4 import BeautifulSoup

html = '''
<html>
  <body>
    <h1>My Website</h1>
    <a href='https://example.com'>Example</a>
    <a href='https://openai.com'>OpenAI</a>
  </body>
</html>
'''

soup = BeautifulSoup(html, 'html.parser')

links = soup.find_all('a')
for link in links:
    print(f'Text: {link.text}, URL: {link.get("href")}')

Output

Text: Example, URL: https://example.com
Text: OpenAI, URL: https://openai.com
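Real pages often contain relative links and <a> tags without an href. One possible way to handle both, sketched here under the assumption that the page lives at a hypothetical BASE_URL, is to filter with href=True and resolve each link with urljoin from the standard library:

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical address of the page being parsed (an assumption for this sketch)
BASE_URL = "https://example.com/blog/"

# Illustrative HTML with relative, missing, and absolute hrefs
html = """
<a href="/about">About</a>
<a href="post-1.html">First post</a>
<a name="top">Anchor without href</a>
<a href="https://example.org/">External</a>
"""
soup = BeautifulSoup(html, "html.parser")

# href=True keeps only <a> tags that actually carry an href attribute
absolute_links = [
    (a.get_text(strip=True), urljoin(BASE_URL, a["href"]))
    for a in soup.find_all("a", href=True)
]

for text, url in absolute_links:
    print(f"{text}: {url}")
# About: https://example.com/about
# First post: https://example.com/blog/post-1.html
# External: https://example.org/
```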

⚠️

Common Pitfalls

Common mistakes when using BeautifulSoup include:

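Not specifying the parser (like 'html.parser'). BeautifulSoup then picks whichever parser is installed and emits a warning, so the same code can produce different results on different machines.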
Trying to parse content before fetching it properly (e.g., parsing an empty string).
Using find() when multiple elements are expected; use find_all() instead.
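Not handling cases where elements might not exist. find() returns None when nothing matches, so accessing .text on the result raises AttributeError: 'NoneType' object has no attribute 'text'.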

python

from bs4 import BeautifulSoup

html = ''  # Empty HTML string

# Wrong: parsing empty content
soup = BeautifulSoup(html, 'html.parser')
print(soup.find('p'))  # Returns None, no error but no data

# Right: check if element exists
paragraph = soup.find('p')
if paragraph:
    print(paragraph.text)
else:
    print('No paragraph found')

Output

None
No paragraph found
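When many lookups need the same existence check, it can help to wrap the check in a small function. The helper below, text_or_default, is a hypothetical name introduced for this sketch and is not part of BeautifulSoup; it relies on select_one(), which returns None when nothing matches:

```python
from bs4 import BeautifulSoup


def text_or_default(soup, selector, default=""):
    """Return the stripped text of the first match for a CSS selector, or default."""
    element = soup.select_one(selector)  # None when nothing matches
    if element is None:
        return default
    return element.get_text(strip=True)


# Illustrative HTML: it has an <h1> but no element with class "price"
soup = BeautifulSoup("<div><h1> Title </h1></div>", "html.parser")

title = text_or_default(soup, "h1")
price = text_or_default(soup, "span.price", default="n/a")

print(title)  # Title
print(price)  # n/a
```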

📊

Quick Reference

Here is a quick reference for common BeautifulSoup methods:

Method	Description
`BeautifulSoup(html, 'html.parser')`	Create soup object from HTML string
`find(tag)`	Find first occurrence of a tag
`find_all(tag)`	Find all occurrences of a tag
`select(css_selector)`	Find elements using CSS selectors
`get(attribute)`	Get attribute value of a tag
`.text`	Get text inside a tag
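As a short illustration of get() from the table, the sketch below contrasts it with dictionary-style indexing on a tag. The HTML is made up for the example; the point is that get() returns None (or a fallback you supply) for a missing attribute, while indexing raises KeyError:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<a href="https://example.com">Example</a>', "html.parser")
link = soup.find("a")

href = link.get("href")                           # attribute value as a string
title = link.get("title")                         # None: the attribute is missing
title_or_default = link.get("title", "untitled")  # fallback value instead of None

try:
    link["title"]  # indexing a missing attribute raises KeyError
    raised = False
except KeyError:
    raised = True

print(href, title, title_or_default, raised)
# https://example.com None untitled True
```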

✅

Key Takeaways

Install BeautifulSoup with pip install beautifulsoup4 before importing it from bs4.

Parse HTML with BeautifulSoup(html, 'html.parser') to create a soup object.

Use find() for one element and find_all() for multiple elements.

Always check if elements exist before accessing their properties to avoid errors.

BeautifulSoup makes extracting data from HTML easy and readable.