Web Scraping with Python
Navigating HTML Document
After reading the HTML document, you can navigate it in several ways. To go deeper into the tree, access a tag by name as if it were an attribute of the parsed object. For example, let's examine the <head> element and print it in a structured, indented form using the .prettify() method.
# Importing libraries
from bs4 import BeautifulSoup
from urllib.request import urlopen

# Reading web page
url = "https://codefinity-content-media.s3.eu-west-1.amazonaws.com/18a4e428-1a0f-44c2-a8ad-244cd9c7985e/jesus.html"
page = urlopen(url)
html = page.read().decode("utf-8")

# Reading HTML with BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')
print(soup.head.prettify())
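To see attribute-style navigation without depending on a network request, here is a minimal sketch that parses a small inline document instead of the course page. The sample_html string and its contents are assumptions made up for illustration. It shows that attribute access can be chained, that it returns only the first matching tag, and that a tag missing from the document gives None.

```python
from bs4 import BeautifulSoup

# Hypothetical inline HTML standing in for the course page (no network access needed)
sample_html = """
<html>
  <head>
    <title>Sample page</title>
    <meta charset="utf-8">
  </head>
  <body>
    <h1>Heading</h1>
    <p>First paragraph</p>
    <p>Second paragraph</p>
  </body>
</html>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# Chained attribute access walks down the tree one tag at a time
title_text = soup.head.title.string

# Attribute access returns only the first matching tag
first_paragraph = soup.body.p.string

# A tag that is absent from the document yields None instead of raising an error
missing = soup.body.table

print(title_text)
print(first_paragraph)
print(missing)
```

Because a missing tag gives None, chaining further on it (for example soup.body.table.tr) would raise an AttributeError, so it can be worth checking for None before going deeper.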
Feel free to experiment by replacing the .head attribute with .body, for example. As the prettified output shows, the <head> element contains several children. You can iterate over all direct children of an element using a for loop and the .children attribute.
# Importing libraries
from bs4 import BeautifulSoup
from urllib.request import urlopen

# Reading web page
url = "https://codefinity-content-media.s3.eu-west-1.amazonaws.com/18a4e428-1a0f-44c2-a8ad-244cd9c7985e/jesus.html"
page = urlopen(url)
html = page.read().decode("utf-8")

# Reading HTML with BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')

# Iterating over all element children
for child in soup.head.children:
    print(child)
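One detail worth knowing when iterating over .children: the whitespace and line breaks between tags are children too, as text nodes, so the loop may print blank-looking entries. The following sketch, which again uses a made-up inline document (sample_html is an assumption, not the course page), keeps only the children that are actual tags.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical inline HTML standing in for the course page
sample_html = """<html><head>
<title>Sample page</title>
<meta charset="utf-8">
<link rel="stylesheet" href="style.css">
</head><body></body></html>"""

soup = BeautifulSoup(sample_html, "html.parser")

# Every direct child of <head>, including the newline text nodes between tags
all_children = list(soup.head.children)

# Keep only real tags by checking each child's type
tag_names = [child.name for child in soup.head.children if isinstance(child, Tag)]

print(len(all_children))
print(tag_names)
```

Filtering by type is one possible approach; skipping children whose text is empty after .strip() would be another, depending on whether meaningful text nodes should be kept.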