Navigating HTML Document | Beautiful Soup: Part I
Web Scraping with Python

Navigating HTML Document

After parsing the HTML document, you can navigate it in several ways. The simplest is to access a tag by name as an attribute of the BeautifulSoup object: soup.head returns the first <head> element in the document. For example, let's take the <head> element and print it in an indented, 'structured' form using the .prettify() method.

    # Importing libraries
    from bs4 import BeautifulSoup
    from urllib.request import urlopen

    # Reading web page
    url = "https://codefinity-content-media.s3.eu-west-1.amazonaws.com/18a4e428-1a0f-44c2-a8ad-244cd9c7985e/jesus.html"
    page = urlopen(url)
    html = page.read().decode("utf-8")

    # Reading HTML with BeautifulSoup
    soup = BeautifulSoup(html, 'html.parser')
    print(soup.head.prettify())
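To make attribute-style navigation concrete without a network request, here is a minimal sketch that parses a small inline document. The HTML string and the variable names are made up for illustration and are not the course page. It shows that tag attributes can be chained, that only the first matching tag is returned, and that a tag which does not exist yields None.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML used only for illustration (not the course page)
html = """<html>
<head>
<title>Sample page</title>
<meta charset="utf-8">
</head>
<body>
<p>First paragraph</p>
<p>Second paragraph</p>
</body>
</html>"""

soup = BeautifulSoup(html, "html.parser")

head_tag = soup.head          # the first <head> tag in the document
title_tag = soup.head.title   # attributes can be chained to go deeper
first_p = soup.body.p         # only the FIRST <p> is returned
missing = soup.body.table     # no <table> in this document, so this is None

print(head_tag.name)
print(title_tag.string)
print(first_p.string)
print(missing)
```

Because a missing tag gives None rather than raising an error, chaining further from it (for example soup.body.table.tr) would fail with an AttributeError, so it is worth checking for None when a tag may be absent.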

Feel free to experiment, for example by replacing the .head attribute with .body. As the output shows, the <head> element contains several children. You can iterate over an element's direct children using a for loop and the .children attribute.

    # Importing libraries
    from bs4 import BeautifulSoup
    from urllib.request import urlopen

    # Reading web page
    url = "https://codefinity-content-media.s3.eu-west-1.amazonaws.com/18a4e428-1a0f-44c2-a8ad-244cd9c7985e/jesus.html"
    page = urlopen(url)
    html = page.read().decode("utf-8")

    # Reading HTML with BeautifulSoup
    soup = BeautifulSoup(html, 'html.parser')

    # Iterating over all element children
    for child in soup.head.children:
        print(child)
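One detail worth knowing when iterating with .children: the iterator yields text nodes as well as tags, so the line breaks between tags show up as children too. The sketch below uses a small made-up HTML string (an assumption for illustration, not the course page) to keep only the children that are actual tags.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical HTML used only for illustration (not the course page)
html = """<head>
<title>Sample page</title>
<meta charset="utf-8">
</head>"""

soup = BeautifulSoup(html, "html.parser")

# .children is an iterator over direct children, including whitespace text nodes
all_children = list(soup.head.children)

# Keep only real tags by checking each child's type
tag_children = [child for child in soup.head.children if isinstance(child, Tag)]
tag_names = [child.name for child in tag_children]

print(len(all_children), "children in total")
print(tag_names)
```

Filtering by type like this is one option among several; it keeps the loop from printing blank lines when the source HTML is indented or spread across lines.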


Section 2. Chapter 2