How to Validate a URL in Python: Simple and Effective Methods
To validate a URL in Python, you can use the urllib.parse module to parse the URL and check its components, or use regular expressions with the re module for pattern matching. The third-party validators library also provides a simple way to check whether a URL is valid.
Syntax
Here are common ways to validate a URL in Python:
- Using urllib.parse: Parse the URL and check that the scheme and netloc parts exist.
- Using the re module: Match the URL string against a regular expression pattern.
- Using the validators library: Call validators.url(url), which returns True if the URL is valid and a falsy failure object otherwise.
python
from urllib.parse import urlparse
import re

import validators  # third-party package: pip install validators

# Using urllib.parse: a valid URL needs both a scheme and a network location
parsed_url = urlparse('https://example.com')
if parsed_url.scheme and parsed_url.netloc:
    print('Valid URL')
else:
    print('Invalid URL')

# Using regex (triple-quoted raw string so the pattern can contain both ' and ")
pattern = re.compile(
    r"""^(https?|ftp)://[\w.-]+(?:\.[\w.-]+)+[/\w\-._~:/?#\[\]@!$&'"()*+,;=.]+$"""
)
url = 'https://example.com'
if pattern.match(url):
    print('Valid URL')
else:
    print('Invalid URL')

# Using validators library
if validators.url('https://example.com'):
    print('Valid URL')
else:
    print('Invalid URL')
Output
Valid URL
Valid URL
Valid URL
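To see why checking scheme and netloc works, it can help to look at what urlparse actually extracts from different inputs. The following is a small illustrative sketch using only the standard library; the helper name describe is hypothetical and not part of any library.

```python
from urllib.parse import urlparse


def describe(url):
    """Return the (scheme, netloc, path) parts that urlparse extracts (hypothetical helper)."""
    parsed = urlparse(url)
    return parsed.scheme, parsed.netloc, parsed.path


for candidate in ('https://example.com/path', 'http:/example.com', 'justtext'):
    print(candidate, '->', describe(candidate))
```

A string with a single slash after the scheme, or with no scheme at all, ends up with an empty netloc, which is what the scheme-and-netloc check relies on.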
Example
This example shows how to validate URLs using urllib.parse together with the validators library. It checks several URLs and prints whether each one is valid.
python
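from urllib.parse import urlparse

import validators  # third-party package: pip install validators


def is_valid_url(url):
    parsed = urlparse(url)
    # Cheap structural check first: both scheme and netloc must be present
    if not (parsed.scheme and parsed.netloc):
        return False
    # validators.url returns True on success and a falsy failure object otherwise
    return validators.url(url) is True


urls = [
    'https://www.google.com',
    'ftp://files.server.com',
    'http:/invalid-url',
    'justtext',
    'https://example.com/path?query=1',
]

for url in urls:
    print(f'{url} ->', 'Valid' if is_valid_url(url) else 'Invalid')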
Output
https://www.google.com -> Valid
ftp://files.server.com -> Valid
http:/invalid-url -> Invalid
justtext -> Invalid
https://example.com/path?query=1 -> Valid
Common Pitfalls
Common mistakes when validating URLs include:
- Only checking if the string starts with http or https without verifying the full structure.
- Using overly simple regex that misses valid URLs or accepts invalid ones.
- Not handling URLs without schemes or with uncommon schemes.
- Ignoring the need to check both scheme and network location parts.
Always use reliable parsing or validation libraries when possible.
python
from urllib.parse import urlparse

# Wrong way: only checking the start of the string
url = 'http:/example.com'  # malformed: only one slash after the scheme
if url.startswith('http://') or url.startswith('https://'):
    print('Valid URL')
else:
    print('Invalid URL')
# Prints 'Invalid URL' here, but the same check would accept 'http://' with no host at all

# Right way: parse and check
parsed = urlparse(url)
if parsed.scheme in ('http', 'https') and parsed.netloc:
    print('Valid URL')
else:
    print('Invalid URL')  # Correctly prints 'Invalid URL'
Output
Invalid URL
Invalid URL
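The pitfalls about missing or uncommon schemes can be addressed with an explicit allow-list of schemes. Below is a minimal standard-library sketch, assuming that only http and https are acceptable in your application; the names ALLOWED_SCHEMES and is_allowed_url are hypothetical.

```python
from urllib.parse import urlparse

# Assumption: only web URLs are acceptable; adjust this set for your application
ALLOWED_SCHEMES = {'http', 'https'}


def is_allowed_url(url):
    """Accept only URLs with an allowed scheme and a non-empty host (hypothetical helper)."""
    try:
        parsed = urlparse(url)
    except ValueError:
        # urlparse raises ValueError for some malformed inputs, e.g. unbalanced IPv6 brackets
        return False
    return parsed.scheme in ALLOWED_SCHEMES and bool(parsed.hostname)


for candidate in ('https://example.com', 'example.com', 'ftp://files.server.com', 'http://'):
    print(f'{candidate} ->', 'Allowed' if is_allowed_url(candidate) else 'Rejected')
```

This only checks structure. It does not confirm that the host exists or that the URL is reachable.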
Quick Reference
Tips for URL validation in Python:
- Use urllib.parse.urlparse() to break down the URL and check its essential parts.
- Use the validators library for simple and reliable validation.
- Be cautious with regex; prefer tested patterns or libraries.
- Always check both the scheme (like http) and the network location (domain).
Key Takeaways
Use urllib.parse.urlparse to check URL components like scheme and netloc for basic validation.
The validators library offers a simple function to confirm if a URL is valid.
Avoid relying only on startswith checks or a simple regex for URL validation.
Always verify both the scheme and network location parts of a URL.
Testing with multiple URL examples helps ensure your validation works correctly.
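One way to put the last takeaway into practice is a small table of inputs and expected results. The sketch below uses only the standard library; the helper has_scheme_and_netloc, the function failing_cases, and the CASES table are illustrative assumptions that you would replace with your own validator and URLs.

```python
from urllib.parse import urlparse


def has_scheme_and_netloc(url):
    """Basic structural check: the URL has both a scheme and a network location."""
    parsed = urlparse(url)
    return bool(parsed.scheme and parsed.netloc)


# (input, expected result) pairs; extend this table with URLs from your own data
CASES = [
    ('https://www.google.com', True),
    ('ftp://files.server.com', True),
    ('http:/invalid-url', False),
    ('justtext', False),
    ('http://', False),
    ('https://example.com/path?query=1', True),
]


def failing_cases():
    """Return the cases where the validator disagrees with the expectation."""
    return [(url, expected) for url, expected in CASES
            if has_scheme_and_netloc(url) != expected]


for url, expected in failing_cases():
    print(f'Unexpected result for {url!r}: expected {expected}')
print('Some cases failed' if failing_cases() else 'All cases passed')
```

Keeping the cases in one table makes it easy to add a new URL whenever a validation bug is found.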