How to Parse URL in Python: Simple Guide with Examples
You can parse a URL in Python using the urllib.parse module, specifically the urlparse() function. It breaks a URL down into components such as scheme, netloc, path, params, query, and fragment for easy access.
Syntax
The urlparse() function from the urllib.parse module takes a URL string and returns a ParseResult object (a named tuple) with these parts:
- scheme: The protocol (e.g., http, https).
- netloc: The network location (domain and port).
- path: The path to the resource.
- params: Parameters for the last path element (the part after a ; in that segment); usually empty.
- query: The query string after ?.
- fragment: The part after #.
```python
from urllib.parse import urlparse

result = urlparse('https://example.com:8080/path/to/page?name=alice&age=30#section1')
print(result)
```
Output
```
ParseResult(scheme='https', netloc='example.com:8080', path='/path/to/page', params='', query='name=alice&age=30', fragment='section1')
```
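Note that netloc keeps the host and port together, and params is empty for this URL. The sketch below reuses the sample URL and adds a made-up ;type=a path parameter. It shows the hostname and port convenience attributes of ParseResult, and the one case where params is actually populated:

```python
from urllib.parse import urlparse

result = urlparse('https://example.com:8080/path/to/page?name=alice&age=30#section1')

# netloc keeps host and port together; hostname and port split them out
print(result.netloc)    # example.com:8080
print(result.hostname)  # example.com
print(result.port)      # 8080 (an int, or None when the URL has no port)

# params is only filled when the last path segment carries a ';' suffix
# (the ';type=a' part is a hypothetical example)
with_params = urlparse('https://example.com/path/to/page;type=a?name=alice')
print(with_params.path)    # /path/to/page
print(with_params.params)  # type=a
```

In most modern URLs you will read port and hostname far more often than params.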
Example
This example shows how to parse a URL and access each part separately.
```python
from urllib.parse import urlparse

url = 'https://example.com:8080/path/to/page?name=alice&age=30#section1'
parsed_url = urlparse(url)

# Each component is available as a named attribute of the ParseResult
print('Scheme:', parsed_url.scheme)
print('Network location:', parsed_url.netloc)
print('Path:', parsed_url.path)
print('Parameters:', parsed_url.params)
print('Query:', parsed_url.query)
print('Fragment:', parsed_url.fragment)
```
Output
```
Scheme: https
Network location: example.com:8080
Path: /path/to/page
Parameters:
Query: name=alice&age=30
Fragment: section1
```
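Because ParseResult is a named tuple, attribute access is not the only option. This sketch uses the same URL to show tuple unpacking and indexing. It also uses _replace() (a standard named-tuple method) together with geturl() to build a modified copy, here one with the fragment stripped:

```python
from urllib.parse import urlparse

url = 'https://example.com:8080/path/to/page?name=alice&age=30#section1'
parsed_url = urlparse(url)

# ParseResult is a named tuple, so it supports unpacking and indexing too
scheme, netloc, path, params, query, fragment = parsed_url
print(parsed_url[0])  # https

# Named tuples are immutable: _replace() returns a copy with one field changed
without_fragment = parsed_url._replace(fragment='')
print(without_fragment.geturl())  # https://example.com:8080/path/to/page?name=alice&age=30
```

The original parsed_url is unchanged, so you can derive several variants from one parse.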
Common Pitfalls
One common mistake is trying to parse URLs without importing urlparse from urllib.parse. Another is expecting the query string to be automatically split into key-value pairs; urlparse() only returns the raw query string.
To get query parameters as a dictionary, use parse_qs() from urllib.parse.
```python
from urllib.parse import urlparse, parse_qs

url = 'https://example.com/path?name=alice&age=30'
parsed = urlparse(url)

# Wrong: expecting query to be a dict
print(parsed.query)  # Outputs the raw string

# Right: parse the query string into a dict
query_params = parse_qs(parsed.query)
print(query_params)
```
Output
```
name=alice&age=30
{'name': ['alice'], 'age': ['30']}
```
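The output shows that parse_qs() wraps every value in a list, even when a key appears only once. This happens because a key may legitimately repeat in a query string. The sketch below uses a hypothetical query string with a repeated tag key. It shows how to pull out a single value, how parse_qsl() preserves order as (key, value) pairs, and how blank values are handled:

```python
from urllib.parse import parse_qs, parse_qsl

# A hypothetical query string in which the 'tag' key appears twice
query = 'name=alice&tag=python&tag=web'

# parse_qs groups every value for a key into a list
params = parse_qs(query)
print(params)  # {'name': ['alice'], 'tag': ['python', 'web']}

# Take the first value when you expect a key to appear once
name = params.get('name', [None])[0]
print(name)  # alice

# parse_qsl keeps the original order as a list of (key, value) pairs
print(parse_qsl(query))  # [('name', 'alice'), ('tag', 'python'), ('tag', 'web')]

# Blank values are dropped unless keep_blank_values=True
print(parse_qs('a=1&b='))                          # {'a': ['1']}
print(parse_qs('a=1&b=', keep_blank_values=True))  # {'a': ['1'], 'b': ['']}
```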
Quick Reference
Here is a quick summary of useful functions for URL parsing in Python:
| Function | Description |
|---|---|
| urlparse(url) | Parse URL into components |
| urlunparse(parts) | Combine the six components back into a URL string |
| parse_qs(query) | Parse a query string into a dictionary of lists |
| urljoin(base, url) | Combine base URL with relative URL |
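The table lists urlunparse() and urljoin() without showing them in action. The minimal sketch below uses made-up example.com and other.org paths. It rebuilds a URL from its six components, checks a parse/unparse round trip, and resolves several kinds of relative references against one base URL:

```python
from urllib.parse import urljoin, urlparse, urlunparse

# urlunparse takes (scheme, netloc, path, params, query, fragment)
url = urlunparse(('https', 'example.com', '/path/to/page', '', 'name=alice', 'top'))
print(url)  # https://example.com/path/to/page?name=alice#top

# Round trip: parsing and unparsing gives back the same URL here
print(urlunparse(urlparse(url)) == url)  # True

# urljoin resolves a reference against a base URL (hypothetical paths)
base = 'https://example.com/docs/guide/intro.html'
print(urljoin(base, 'setup.html'))           # https://example.com/docs/guide/setup.html
print(urljoin(base, '../api/'))              # https://example.com/docs/api/
print(urljoin(base, '/about'))               # https://example.com/about
print(urljoin(base, 'https://other.org/x'))  # https://other.org/x
```

Note that urljoin() replaces the last path segment of the base (intro.html) unless the base path ends with a slash, and an absolute URL as the second argument wins outright.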
Key Takeaways
Use urllib.parse.urlparse() to split a URL into parts.
Access URL parts like scheme, netloc, path, query, and fragment from the ParseResult.
Use urllib.parse.parse_qs() to convert query strings into dictionaries.
Remember to import functions from urllib.parse before using them.
Combine URL parts back with urlunparse() if needed.