Read and Navigate XML - Beautiful Soup
### How to read and navigate XML
BeautifulSoup is a Python library that makes reading and parsing XML data easier. The documentation is here: [Beautiful Soup Documentation](https://www.crummy.com/software/BeautifulSoup/)
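As a minimal sketch of getting started, the snippet below builds a BeautifulSoup object from an XML string. The `xml_text` sample and the `<data>` wrapper tag are hypothetical stand-ins for the real file, and the choice of `"html.parser"` is an assumption made only because it ships with Python; BeautifulSoup's `"xml"` parser needs the separate lxml package.

```python
from bs4 import BeautifulSoup

# Hypothetical sample data, shaped like the records used in this section
xml_text = """<data>
<record>
<field name="Country or Area" key="ABW">Aruba</field>
<field name="Item" key="SP.POP.TOTL">Population, total</field>
<field name="Year">1960</field>
<field name="Value">54211</field>
</record>
</data>"""

# "html.parser" is an assumption: it needs no extra install.
# The "xml" parser would require the lxml package.
soup = BeautifulSoup(xml_text, "html.parser")

# Print all of the text content, separated by spaces
print(soup.get_text(" ", strip=True))
```

For a file on disk, an open file handle can be passed to `BeautifulSoup` in place of the string.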
The `find()` method returns the first occurrence of an XML element. For example, `find('record')` returns the first record in the XML file:
```xml
<record>
<field name="Country or Area" key="ABW">Aruba</field>
<field name="Item" key="SP.POP.TOTL">Population, total</field>
<field name="Year">1960</field>
<field name="Value">54211</field>
</record>
```
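A short sketch of `find()` in action may help. It assumes a small in-memory sample (`xml_text`, with a hypothetical `<data>` wrapper and a made-up second record) and uses `"html.parser"` only to avoid extra dependencies. Because `find()` takes the tag name as its first argument, filtering on an attribute that is itself called `name` goes through the `attrs` dictionary.

```python
from bs4 import BeautifulSoup

# Sample XML; the second record's values are made up for illustration
xml_text = """<data>
<record>
<field name="Country or Area" key="ABW">Aruba</field>
<field name="Item" key="SP.POP.TOTL">Population, total</field>
<field name="Year">1960</field>
<field name="Value">54211</field>
</record>
<record>
<field name="Country or Area" key="ABW">Aruba</field>
<field name="Item" key="SP.POP.TOTL">Population, total</field>
<field name="Year">1961</field>
<field name="Value">55000</field>
</record>
</data>"""

soup = BeautifulSoup(xml_text, "html.parser")

# find() returns only the first matching tag
first_record = soup.find("record")

# find() also works on a tag, searching only inside that tag
first_field = first_record.find("field")
print(first_field["name"], ":", first_field.text)

# Filter on the XML attribute called "name" through the attrs dictionary
year_field = first_record.find("field", attrs={"name": "Year"})
print(year_field.text)

# find() returns None when nothing matches
print(soup.find("missing"))
```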
The `find_all()` method returns all of the matching tags, so `find_all('record')` returns every element with the `<record>` tag.
Run the code cells below to get a basic idea of how to navigate XML with BeautifulSoup. To navigate through the XML file, you search for a specific tag using the `find()` or `find_all()` method.
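The sketch below illustrates `find_all()` on a small in-memory sample. The `xml_text` data (including the `<data>` wrapper and the second record's values) is made up for illustration, and `"html.parser"` is an assumed parser choice that avoids installing lxml.

```python
from bs4 import BeautifulSoup

# Sample XML; the second record's values are made up for illustration
xml_text = """<data>
<record>
<field name="Year">1960</field>
<field name="Value">54211</field>
</record>
<record>
<field name="Year">1961</field>
<field name="Value">55000</field>
</record>
</data>"""

soup = BeautifulSoup(xml_text, "html.parser")

# find_all() returns a list-like collection of every matching tag
records = soup.find_all("record")
print(len(records))

# Each result is a tag, so it can be searched again
years = [r.find("field", attrs={"name": "Year"}).text for r in records]
print(years)

# The limit argument stops the search after the given number of matches
first_only = soup.find_all("record", limit=1)
print(len(first_only))
```

Passing `limit` is one way to look at only the first few records without keeping a manual counter.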
Below these code cells, there is an exercise for wrangling the XML data.
```python
# Output the first 5 records in the XML file.
# This is an example of how to navigate the XML document with BeautifulSoup.
# Assumes `soup` is a BeautifulSoup object already created from the XML file.
i = 0
# use the find_all method to get all record tags in the document
for record in soup.find_all('record'):
    i += 1
    # use the find_all method to get all fields in each record
    for field in record.find_all('field'):
        print(field['name'], ': ', field.text)
    print()
    # stop after the fifth record
    if i == 5:
        break
```