How to Parse XML in Python: Simple Guide with Examples
To parse XML in Python, use the built-in xml.etree.ElementTree module, which provides functions to read and navigate XML data. You can load XML from a string or a file, then access elements and attributes with methods like find() and iter().
Syntax
The main steps to parse XML using xml.etree.ElementTree are:
- import xml.etree.ElementTree as ET: Import the module.
- tree = ET.parse('file.xml'): Load XML from a file.
- root = tree.getroot(): Get the root element of the XML.
- element.find('tag'): Find the first child element with the given tag.
- element.iter('tag'): Iterate over all elements with the given tag.
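```python
import xml.etree.ElementTree as ET

# Load the XML file and get its root element.
tree = ET.parse('file.xml')
root = tree.getroot()

# Print the tag and attributes of each direct child of the root.
for child in root:
    print(child.tag, child.attrib)

# find() returns None when no matching child exists.
item = root.find('item')
print(item.text if item is not None else 'No item found')
```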
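The snippet above assumes that file.xml already exists on disk. As a minimal self-contained sketch, the version below first writes a small sample document to a temporary file so that ET.parse() has something to read. The sample contents and the helper name load_root are assumptions made for illustration, not part of the module.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for illustration.
SAMPLE = "<store><item>apple</item><item>banana</item></store>"


def load_root(path):
    """Parse the XML file at `path` and return its root element."""
    tree = ET.parse(path)
    return tree.getroot()


# Write the sample to a temporary file so the example runs anywhere.
with tempfile.NamedTemporaryFile(
    "w", suffix=".xml", delete=False, encoding="utf-8"
) as f:
    f.write(SAMPLE)
    path = f.name

try:
    root = load_root(path)
    first = root.find("item")
    # Guard against a missing element before reading its text.
    first_text = first.text if first is not None else None
    print(root.tag, first_text)
finally:
    # Clean up the temporary file.
    os.remove(path)
```

In real code the path would usually point at an existing file, and the temporary-file step would not be needed.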
Example
This example shows how to parse a simple XML string, access elements, and print their text and attributes.
```python
import xml.etree.ElementTree as ET

xml_data = '''
<store>
    <item name="apple" price="0.5" />
    <item name="banana" price="0.3" />
    <item name="cherry" price="0.2" />
</store>
'''

root = ET.fromstring(xml_data)

# Walk every <item> element and read its attributes.
for item in root.iter('item'):
    name = item.attrib.get('name')
    price = item.attrib.get('price')
    print(f"Item: {name}, Price: {price}")
```
Output
Item: apple, Price: 0.5
Item: banana, Price: 0.3
Item: cherry, Price: 0.2
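Attribute values read through ElementTree are strings, so the prices printed above are text rather than numbers. The following is a small sketch, assuming the same store data, that converts each price with float() before doing arithmetic. The names prices and total are illustrative.

```python
import xml.etree.ElementTree as ET

xml_data = """
<store>
    <item name="apple" price="0.5" />
    <item name="banana" price="0.3" />
    <item name="cherry" price="0.2" />
</store>
"""

root = ET.fromstring(xml_data)

# Build a name -> price mapping, converting each price string to a float.
prices = {
    item.get("name"): float(item.get("price"))
    for item in root.iter("item")
}

total = sum(prices.values())
print(prices)
print(f"Total: {total:.2f}")
```

Note that float() raises an error if an attribute is missing or not numeric, so real data may need a check or a default value first.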
Common Pitfalls
Common mistakes when parsing XML in Python include:
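- Parsing malformed XML, which raises xml.etree.ElementTree.ParseError.
- Using find() when multiple elements exist; it returns only the first match. Use findall() or iter() to get all of them.
- Not handling missing elements or attributes: find() returns None for a missing element, so reading .text on the result raises AttributeError, and attrib['key'] raises KeyError for a missing attribute.
- Confusing parse() (for files) with fromstring() (for strings).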
Always check if elements exist before accessing their text or attributes.
```python
import xml.etree.ElementTree as ET

xml_data = '<root><item>Value</item></root>'
root = ET.fromstring(xml_data)
item = root.find('item')

# Wrong: assumes the 'price' attribute exists
# price = item.attrib['price']  # KeyError if missing

# Right: check that the element exists, then get the attribute with a default
price = item.attrib.get('price', 'N/A') if item is not None else 'N/A'
print(price)
```
Output
N/A
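The other pitfalls can be handled in a similar way. The sketch below, with a hypothetical helper named parse_or_none, catches ET.ParseError for malformed input, uses findall() to get every match instead of only the first, and uses findtext() with a default for a missing element. It is one possible approach, not the only one.

```python
import xml.etree.ElementTree as ET


def parse_or_none(text):
    """Return the root element, or None if `text` is not well-formed XML."""
    try:
        return ET.fromstring(text)
    except ET.ParseError as exc:
        print(f"Could not parse XML: {exc}")
        return None


good = parse_or_none("<root><item>A</item><item>B</item></root>")
bad = parse_or_none("<root><item>A</root>")  # mismatched closing tag

# find() returns only the first match; findall() returns every matching direct child.
first = good.find("item").text
all_items = [item.text for item in good.findall("item")]

# findtext() returns the default instead of failing when the element is missing.
missing = good.findtext("price", default="N/A")

print(first, all_items, missing, bad)
```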
Quick Reference
| Function/Method | Description |
|---|---|
| ET.parse(filename) | Parse XML file and return ElementTree object |
| ET.fromstring(string) | Parse XML from string and return root element |
| tree.getroot() | Get root element from ElementTree |
| element.find('tag') | Find first child element with tag |
| element.findall('tag') | Find all child elements with tag |
| element.iter('tag') | Iterate over all elements with tag in subtree |
| element.attrib | Dictionary of element's attributes |
| element.text | Text content inside element |
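The difference between findall() and iter() in the table is easiest to see on nested data. In this sketch, the nested document is a made-up example: findall('item') sees only direct children, while iter('item') and the path './/item' search the whole subtree.

```python
import xml.etree.ElementTree as ET

# Hypothetical nested document: one <item> is a direct child, one is nested deeper.
root = ET.fromstring(
    "<store>"
    "<item name='apple'/>"
    "<section><item name='banana'/></section>"
    "</store>"
)

# Direct children of the root only.
direct = [e.get("name") for e in root.findall("item")]

# Every matching element anywhere in the subtree.
subtree = [e.get("name") for e in root.iter("item")]

# The './/' path prefix makes findall() search all descendants as well.
via_path = [e.get("name") for e in root.findall(".//item")]

print(direct)
print(subtree)
print(via_path)
```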
Key Takeaways
Use xml.etree.ElementTree for easy XML parsing in Python.
Load XML from files with ET.parse() or from strings with ET.fromstring().
Access elements with find() and iter(), and check the result of find() for None before using it.
Handle missing attributes with dict.get() to avoid errors.
Most errors come from malformed XML, or from mixing up parse() with fromstring() and find() with findall().