Bash Script to Extract URLs from File
Use grep -oE "https?://[^ \"'<>]+" filename in Bash to extract all URLs from a file by matching http or https links.
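How to Think About It
A URL in running text starts with http:// or https:// and continues until a character that usually ends it, such as a space, a quote, or an angle bracket. Describe that shape as a regular expression and let grep print only the parts of each line that match it.
Algorithm
1. Check that a filename was given; if not, print a usage message and exit with status 1.
2. Read the file line by line.
3. Find every substring that matches the URL pattern.
4. Print each match on its own line.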
Code
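```shell
#!/bin/bash
# Extract URLs from a file passed as argument
if [ $# -eq 0 ]; then
    echo "Usage: $0 filename" >&2
    exit 1
fi
filename="$1"
# Double quotes let the bracket expression contain a single quote;
# the double quote inside it is escaped as \"
grep -oE "https?://[^ \"'<>]+" "$filename"
```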
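A minimal way to try the script, assuming it is saved as extract_urls.sh next to a sample file notes.txt (both are hypothetical names used only for this demo). The sketch writes both into a temporary directory so it can be pasted into a terminal as is. The saved copy keeps the pattern in double quotes so the single quote inside the bracket expression does not end the string.

```shell
# Try the script on a small sample file.
# extract_urls.sh and notes.txt are hypothetical names used only for this demo.
workdir=$(mktemp -d)

# Save the script (pattern in double quotes) and make it executable
cat > "$workdir/extract_urls.sh" <<'EOF'
#!/bin/bash
if [ $# -eq 0 ]; then
    echo "Usage: $0 filename" >&2
    exit 1
fi
grep -oE "https?://[^ \"'<>]+" "$1"
EOF
chmod +x "$workdir/extract_urls.sh"

# Sample input: two URLs on the first line, one on the second
printf 'Visit https://example.com and http://test.org for info.\nDocs: https://example.com/docs\n' > "$workdir/notes.txt"

urls=$("$workdir/extract_urls.sh" "$workdir/notes.txt")
printf '%s\n' "$urls"

# Called without an argument, the script prints its usage line to stderr and exits with status 1
"$workdir/extract_urls.sh" 2>/dev/null
usage_status=$?
echo "exit status without argument: $usage_status"

rm -r "$workdir"
```

Each URL should come out on its own line, in the order it appears in the file.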
Dry Run
Let's trace the input 'Visit https://example.com and http://test.org for info.' through the code:
1. Read line: Visit https://example.com and http://test.org for info.
2. Apply grep regex. Matches found: https://example.com, http://test.org
3. Print matches. Output, one URL per line:
   https://example.com
   http://test.org
| Step | Action | Value |
|---|---|---|
| 1 | Read line | Visit https://example.com and http://test.org for info. |
| 2 | Extract URLs | https://example.com, http://test.org |
| 3 | Print URLs | https://example.com and http://test.org, each on its own line |
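The same trace can be reproduced directly. This sketch feeds the dry-run line to grep on standard input instead of reading a file, which is an assumption made only to keep the example self-contained.

```shell
# Reproduce the dry run: feed the sample line to the same grep command
line='Visit https://example.com and http://test.org for info.'
matches=$(printf '%s\n' "$line" | grep -oE "https?://[^ \"'<>]+")
printf '%s\n' "$matches"
```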
Why This Works
Step 1: Use grep with regex
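grep searches each line of the file for the pattern. The -o option prints only the matched parts instead of the whole line, one match per output line, and -E enables extended regular expressions so that ? and + act as operators.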
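The -E flag matters because POSIX basic regular expressions, which grep uses by default, treat ? and + as ordinary characters. A small check on the dry-run line, assuming a POSIX-conforming grep such as GNU or BSD grep:

```shell
line='Visit https://example.com and http://test.org for info.'

# With -E, ? and + are operators: the "s" is optional and [^ ...] repeats
with_e=$(printf '%s\n' "$line" | grep -oE "https?://[^ \"'<>]+")

# Without -E the pattern is a basic regular expression, where ? and + are
# ordinary characters, so grep searches for the literal text "https?://"
without_e=$(printf '%s\n' "$line" | grep -o "https?://[^ \"'<>]+")

printf 'with -E:\n%s\n' "$with_e"
[ -z "$without_e" ] && echo 'without -E: no matches'
```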
Step 2: Regex pattern explained
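The pattern https?://[^ "'<>]+ matches http or https (the s? makes the s optional), then ://, then one or more characters that are not a space, a double or single quote, or an angle bracket, since those usually mark the end of a URL in text. Other trailing punctuation, such as a comma or period, is not excluded and stays attached to the match. In the Bash script the pattern is wrapped in double quotes, with the inner double quote escaped as \", because a single quote cannot appear inside a single-quoted Bash string.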
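To see where a match stops, the sketch below runs the pattern over three made-up lines: an HTML link, where the closing double quote ends the URL; a sentence where a comma follows the URL and is kept; and an ftp:// address, which the pattern does not match at all.

```shell
# Sample lines (made up for this demo) that exercise the stop characters
input='<a href="https://example.com/page">link</a>
See https://example.com/a?b=1, then stop.
Plain ftp://files.example.net is ignored.'

found=$(printf '%s\n' "$input" | grep -oE "https?://[^ \"'<>]+")
printf '%s\n' "$found"
```

The second result keeps its trailing comma; removing such punctuation would take an extra filtering step.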
Step 3: Extract URLs line by line
grep processes the file line by line and prints every URL it finds on its own line, so a line containing several URLs produces several output lines that are easy to pipe into other tools.
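Because the output has one URL per line, standard line tools can post-process it. This sketch counts the matches with wc -l and removes duplicates with sort -u; both steps are optional additions, and the sample text is made up.

```shell
# Sample text: three URLs on two lines, one of them repeated
text='Visit https://example.com and http://test.org for info.
Back to https://example.com again.'

all=$(printf '%s\n' "$text" | grep -oE "https?://[^ \"'<>]+")
# tr strips the padding some wc implementations add
count=$(printf '%s\n' "$all" | wc -l | tr -d ' ')

# LC_ALL=C makes the sort order independent of the locale
unique=$(printf '%s\n' "$all" | LC_ALL=C sort -u)

echo "matches: $count"
printf '%s\n' "$unique"
```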
Alternative Approaches
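awk, which loops over every match on each line:

```shell
awk -v re="https?://[^ \"'<>]+" '{
    # match() sets RSTART and RLENGTH; print the match, then search the rest of the line
    while (match($0, re)) {
        print substr($0, RSTART, RLENGTH)
        $0 = substr($0, RSTART + RLENGTH)
    }
}' filename
```

sed, which prints at most one URL per line (the last one, because the leading .* is greedy):

```shell
sed -nE "s|.*(https?://[^ \"'<>]+).*|\1|p" filename
```

Complexity: O(n) time, O(k) space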
Time Complexity
The script reads each line once and scans it with the regex, so time grows linearly with the size of the file: O(n), where n is the number of characters in the input.
Space Complexity
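grep streams the file and keeps only the current line in memory, so extra space is O(k), where k is the length of the longest line. Memory use does not grow with file size or with the number of URLs found, because matches are written out rather than collected.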
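A rough illustration of the streaming behavior, not a memory measurement: the sketch generates 100000 made-up lines with yes and head and pipes them straight into grep, so the input never exists as a file.

```shell
# Stream 100000 generated lines straight into grep; the input never exists as a file
count=$(yes 'see https://example.com now' | head -n 100000 \
    | grep -oE "https?://[^ \"'<>]+" | wc -l | tr -d ' ')
echo "URLs found: $count"
```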
Which Approach is Fastest?
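All three approaches run in O(n) time. grep is usually the simplest and fastest way to extract URLs; awk is more flexible when each match needs further processing; sed prints at most one URL per line, and it is the last one because the leading .* is greedy.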
| Approach | Time | Space | Best For |
|---|---|---|---|
| grep with regex | O(n) | O(k) | Simple, fast URL extraction |
| awk with match loop | O(n) | O(k) | Multiple URLs per line, flexible processing |
| sed substitution | O(n) | O(k) | One URL per line (the last match), simple cases |
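To check the difference the table describes, this sketch runs the awk loop and the sed substitution on the dry-run line. Both use the pattern in double quotes (with awk receiving it through -v), which is a quoting choice made here so the single quote can sit inside the bracket expression. awk should print both URLs, while sed prints only http://test.org.

```shell
line='Visit https://example.com and http://test.org for info.'

# awk: loop over every match on the line
awk_out=$(printf '%s\n' "$line" | awk -v re="https?://[^ \"'<>]+" '{
    while (match($0, re)) {
        print substr($0, RSTART, RLENGTH)
        $0 = substr($0, RSTART + RLENGTH)
    }
}')

# sed: one substitution per line; the greedy leading .* keeps only the last URL
sed_out=$(printf '%s\n' "$line" | sed -nE "s|.*(https?://[^ \"'<>]+).*|\1|p")

printf 'awk:\n%s\n' "$awk_out"
printf 'sed:\n%s\n' "$sed_out"
```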
Use grep -oE with a simple regex to quickly extract URLs from text files.
A common mistake is leaving out the -o option in grep, which causes the whole line to print instead of just the URLs.
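The effect of leaving out -o is easy to see on the dry-run line. In this sketch both commands use the same pattern; only the -o flag differs.

```shell
line='Visit https://example.com and http://test.org for info.'

# Without -o: grep prints the entire matching line
whole=$(printf '%s\n' "$line" | grep -E "https?://[^ \"'<>]+")

# With -o: grep prints each match on its own line
only=$(printf '%s\n' "$line" | grep -oE "https?://[^ \"'<>]+")

printf 'without -o:\n%s\n\nwith -o:\n%s\n' "$whole" "$only"
```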