Read and Navigate XML - Beautiful Soup

### How to read and navigate XML


Beautiful Soup is a Python library that makes reading and parsing XML data easier. For details, see the [Beautiful Soup Documentation](https://www.crummy.com/software/BeautifulSoup/).


The find() method returns the first occurrence of an XML element. For example, find('record') returns the first record in the XML file:


```xml
<record>
  <field name="Country or Area" key="ABW">Aruba</field>
  <field name="Item" key="SP.POP.TOTL">Population, total</field>
  <field name="Year">1960</field>
  <field name="Value">54211</field>
</record>
```
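As a minimal sketch of find() in action, the snippet below parses the record shown above from an inline string. The variable names are illustrative, and the choice of the built-in "html.parser" is an assumption made so the example runs without extra dependencies; for real XML files an XML-specific parser (which requires lxml) is usually the better fit.

```python
from bs4 import BeautifulSoup

# The sample record from the text, held in a string for illustration
xml_text = """
<record>
  <field name="Country or Area" key="ABW">Aruba</field>
  <field name="Item" key="SP.POP.TOTL">Population, total</field>
  <field name="Year">1960</field>
  <field name="Value">54211</field>
</record>
"""

# Assumption: "html.parser" ships with Python and handles this simple markup;
# an XML parser such as "xml" needs the lxml package installed
soup = BeautifulSoup(xml_text, "html.parser")

first_record = soup.find("record")        # first <record> tag in the document
first_field = first_record.find("field")  # first <field> inside that record

country_name = first_field.text     # text between the opening and closing tags
country_key = first_field["key"]    # attributes are read like dictionary keys
missing = soup.find("no_such_tag")  # find() returns None when nothing matches

print(country_name, country_key, missing)
```

Because find() returns None when there is no match, it is worth checking the result before reading attributes from it.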


The find_all() method returns all of the matching tags. So find_all('record') would return all of the elements with the `<record>` tag.
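The following sketch illustrates find_all() on a tiny inline document. The wrapping `<data>` element and the second record ("Exampleland") are invented purely for illustration, and "html.parser" is again an assumption chosen to avoid extra dependencies.

```python
from bs4 import BeautifulSoup

# Two sample records; the second one is made up for this example
xml_text = """
<data>
  <record>
    <field name="Country or Area" key="ABW">Aruba</field>
    <field name="Year">1960</field>
    <field name="Value">54211</field>
  </record>
  <record>
    <field name="Country or Area" key="XXX">Exampleland</field>
    <field name="Year">1960</field>
    <field name="Value">100</field>
  </record>
</data>
"""

soup = BeautifulSoup(xml_text, "html.parser")

# Every <record> tag in the document, returned as a list-like result
records = soup.find_all("record")

# Filter on an attribute. The attribute here is literally called "name",
# which clashes with find_all's own first parameter, so pass it via attrs
values = [f.text for f in soup.find_all("field", attrs={"name": "Value"})]

# limit stops the search after the given number of matches
first_only = soup.find_all("record", limit=1)

# With no matches, find_all() returns an empty result rather than None
no_match = soup.find_all("no_such_tag")

print(len(records), values, len(first_only), len(no_match))
```

Unlike find(), find_all() never returns None, so the result can be looped over safely even when nothing matches.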


Run the code cells below to get a basic idea of how to navigate XML with BeautifulSoup. To navigate through the XML file, search for a specific tag using the find() or find_all() method.


Below these code cells, there is an exercise for wrangling the XML data.
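The loop that follows relies on a `soup` object, but the step that creates it is not shown here. One way it might be built is sketched below, assuming the XML lives in a file. The file name `sample_records.xml`, the `<data>` wrapper, and the "html.parser" choice are all hypothetical stand-ins, and the file is written to a temporary directory only so the example is self-contained.

```python
import os
import tempfile

from bs4 import BeautifulSoup

# Stand-in content for the real data file (one record from the text)
xml_text = """<data>
  <record>
    <field name="Country or Area" key="ABW">Aruba</field>
    <field name="Item" key="SP.POP.TOTL">Population, total</field>
    <field name="Year">1960</field>
    <field name="Value">54211</field>
  </record>
</data>
"""

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name; replace with the path to the actual XML file
    xml_path = os.path.join(tmp_dir, "sample_records.xml")
    with open(xml_path, "w", encoding="utf-8") as fp:
        fp.write(xml_text)

    # BeautifulSoup accepts an open file handle as well as a string;
    # the whole file is parsed when the object is constructed
    with open(xml_path, encoding="utf-8") as fp:
        soup = BeautifulSoup(fp, "html.parser")

# The parsed tree stays usable after the file has been closed
record_count = len(soup.find_all("record"))
print(record_count)
```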


# Output the first 5 records in the XML file.
# This is an example of how to navigate the XML document with BeautifulSoup.
# It assumes `soup` is a BeautifulSoup object already created from the XML file.

# find_all('record') matches every record tag; limit=5 stops after the first five
for record in soup.find_all('record', limit=5):
    # use the find_all method to get all fields in each record
    for field in record.find_all('field'):
        # print the field's name attribute followed by its text content
        print(field['name'], ': ', field.text)
    # blank line between records
    print()
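As a possible next step toward wrangling the data, the same nested traversal can collect each record into a dictionary instead of printing it. This is only a sketch: the `rows` name, the `<data>` wrapper, the invented second record, and the "html.parser" choice are assumptions for illustration.

```python
from bs4 import BeautifulSoup

# Two sample records; the second one is made up for this example
xml_text = """
<data>
  <record>
    <field name="Country or Area" key="ABW">Aruba</field>
    <field name="Item" key="SP.POP.TOTL">Population, total</field>
    <field name="Year">1960</field>
    <field name="Value">54211</field>
  </record>
  <record>
    <field name="Country or Area" key="XXX">Exampleland</field>
    <field name="Item" key="SP.POP.TOTL">Population, total</field>
    <field name="Year">1960</field>
    <field name="Value">100</field>
  </record>
</data>
"""

soup = BeautifulSoup(xml_text, "html.parser")

rows = []
for record in soup.find_all("record"):
    # Map each field's name attribute to its text content
    row = {field["name"]: field.text for field in record.find_all("field")}
    rows.append(row)

print(rows[0])
```

Every value comes back as a string, so numeric columns such as Year and Value would still need converting before any calculation.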
