What is Beautiful Soup?

BeautifulSoup is a Python library that offers extensive functionality for parsing HTML pages. In the previous section, you worked with HTML as a plain string, which imposed significant limitations: a string has no notion of tags or nesting, so every lookup had to be done by text search.

To install BeautifulSoup, execute the following command in your terminal or command prompt:

pip install beautifulsoup4

To get started, import BeautifulSoup from bs4:


# Importing the library
from bs4 import BeautifulSoup
print(BeautifulSoup)

BeautifulSoup parses HTML that you already have; it does not download pages from URLs. You can fetch a page yourself using urlopen from urllib.request. To start parsing, pass two arguments to the BeautifulSoup constructor: the HTML markup (a string or an open file) and the name of the parser (here, Python's built-in html.parser). This creates a BeautifulSoup object. For example, open and read a web page:


# Importing libraries
from bs4 import BeautifulSoup
from urllib.request import urlopen

# Reading web page
url = "https://codefinity-content-media.s3.eu-west-1.amazonaws.com/18a4e428-1a0f-44c2-a8ad-244cd9c7985e/jesus.html"
page = urlopen(url)
html = page.read().decode("utf-8")

# Reading HTML with BeautifulSoup
soup = BeautifulSoup(html, "html.parser")
print(type(soup))
print(soup)
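
Because BeautifulSoup only needs the markup itself, the same two-argument call can be tried without any network access. The following is a minimal sketch under that assumption: the sample HTML string is hypothetical and not taken from the page above, and the attribute access (soup.title, soup.p) is used here only to confirm that parsing worked.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup (an assumption for illustration, not the course page)
html = "<html><head><title>Sample page</title></head><body><p>Hello, soup!</p></body></html>"

# First argument: the markup; second argument: the parser name
soup = BeautifulSoup(html, "html.parser")

# The result is a BeautifulSoup object rather than a plain string
print(type(soup).__name__)

# Tags can now be reached by name instead of by string searching
print(soup.title.string)
print(soup.p.get_text())
```

Working from an inline string like this can be a convenient way to experiment before pointing the code at a real page.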

The first method to explore is .prettify(), which returns the parsed document as a formatted string, with each tag on its own line and indented to show how the elements are nested.


# Importing libraries
from bs4 import BeautifulSoup
from urllib.request import urlopen

# Reading web page
url = "https://codefinity-content-media.s3.eu-west-1.amazonaws.com/18a4e428-1a0f-44c2-a8ad-244cd9c7985e/jesus.html"
page = urlopen(url)
html = page.read().decode("utf-8")

# Reading HTML with BeautifulSoup
soup = BeautifulSoup(html, "html.parser")
print(soup.prettify())
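
To see what .prettify() changes, it may help to compare it against markup written on a single line. This is a small sketch, assuming a hypothetical HTML string rather than the page fetched above.

```python
from bs4 import BeautifulSoup

# Hypothetical one-line markup, used only to make the effect of prettify() visible
html = "<html><body><h1>Title</h1><p>First paragraph</p></body></html>"

soup = BeautifulSoup(html, "html.parser")
pretty = soup.prettify()

# prettify() returns a string, not a new BeautifulSoup object
print(type(pretty).__name__)

# Each tag and each piece of text is placed on its own indented line
print(pretty)
```

Since the result is an ordinary string, it is suited to inspection and debugging; the soup object itself remains the thing to query.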

Section 2. Chapter 1
