Notice: This page requires JavaScript to function properly.
Please enable JavaScript in your browser settings or update your browser.
Attributes & Contents of Element | Beautiful Soup: Part II
Web Scraping with Python
course content

Course Content

Web Scraping with Python

Web Scraping with Python

1. Getting Acquainted with HTML
2. Beautiful Soup: Part I
3. Beautiful Soup: Part II

book
Attributes & Contents of Element

The methods discussed in the previous sections return specific parts of the HTML code. BeautifulSoup enables us to retrieve the attributes and contents of particular elements. To access the attributes of an object, use the .attrs attribute. For instance, we can retrieve the attributes of the first <div> element.

123456789101112
# Importing libraries from bs4 import BeautifulSoup from urllib.request import urlopen # Reading web page url = "https://codefinity-content-media.s3.eu-west-1.amazonaws.com/18a4e428-1a0f-44c2-a8ad-244cd9c7985e/jesus.html" page = urlopen(url) html = page.read().decode("utf-8") # Reading HTML with BeautifulSoup soup = BeautifulSoup(html, 'html.parser') print(soup.find('div').attrs)
copy

It's important to note that the result of using the .attrs attribute is a dictionary where the keys are attribute names and the values are their respective values. If you wish to obtain the content stored within a tag, employ the .contents attribute. For example, let's examine the contents of the first <div> element.

123456789101112
# Importing libraries from bs4 import BeautifulSoup from urllib.request import urlopen # Reading web page url = "https://codefinity-content-media.s3.eu-west-1.amazonaws.com/18a4e428-1a0f-44c2-a8ad-244cd9c7985e/jesus.html" page = urlopen(url) html = page.read().decode("utf-8") # Reading HTML with BeautifulSoup soup = BeautifulSoup(html, 'html.parser') print(soup.find('div').contents)
copy

As observed above, all the newline characters were included in a list of elements, which may not be the most desirable representation of content. If you want to extract only the text within a specific element, utilize the .get_text() method. Compare the results from the example below with the one obtained earlier.

123456789101112
# Importing libraries from bs4 import BeautifulSoup from urllib.request import urlopen # Reading web page url = "https://codefinity-content-media.s3.eu-west-1.amazonaws.com/18a4e428-1a0f-44c2-a8ad-244cd9c7985e/jesus.html" page = urlopen(url) html = page.read().decode("utf-8") # Reading HTML with BeautifulSoup soup = BeautifulSoup(html, 'html.parser') print(soup.find('div').get_text())
copy

Everything was clear?

How can we improve it?

Thanks for your feedback!

Section 3. Chapter 1
We're sorry to hear that something went wrong. What happened?
some-alt