What is Beautiful Soup?
BeautifulSoup is a Python library that offers extensive functionality for parsing HTML pages. In the previous section, you worked with HTML as a plain string, which is limiting because a string carries no knowledge of the tag structure.
To install BeautifulSoup, execute the following command in your terminal or command prompt:

    pip install beautifulsoup4

To get started, import BeautifulSoup from the bs4 module:

    # Importing the library
    from bs4 import BeautifulSoup

    print(BeautifulSoup)

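Note that the package is installed as beautifulsoup4 but imported as bs4. As a quick sanity check after installing, a minimal sketch like the one below reports which version is available. The variable name installed_version and the wording of the error message are illustrative choices, not part of the library.

```python
# Quick sanity check that the installation worked.
try:
    import bs4
    from bs4 import BeautifulSoup
except ImportError as exc:
    # Illustrative hint only; the install command is the one shown above.
    raise SystemExit(
        "beautifulsoup4 is not installed; run: pip install beautifulsoup4"
    ) from exc

# The installed package exposes its version as a string.
installed_version = bs4.__version__
print("Beautiful Soup version:", installed_version)
```

If the import fails, the script stops with a reminder of the install command instead of a traceback.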
Beautiful Soup only parses HTML; it does not download pages from URLs. To fetch a page, you can use urlopen from urllib.request. To start parsing, pass two arguments to the BeautifulSoup constructor: the HTML markup (a string or an open file) and the name of the parser (use the built-in html.parser). This creates a BeautifulSoup object. For example, open and read a web page:

    # Importing libraries
    from bs4 import BeautifulSoup
    from urllib.request import urlopen

    # Reading web page
    url = "https://codefinity-content-media.s3.eu-west-1.amazonaws.com/18a4e428-1a0f-44c2-a8ad-244cd9c7985e/jesus.html"
    page = urlopen(url)
    html = page.read().decode("utf-8")

    # Reading HTML with BeautifulSoup
    soup = BeautifulSoup(html, "html.parser")
    print(type(soup))
    print(soup)

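To experiment with the two arguments without network access, the same call can be tried on a small inline document. This is only a sketch: sample_html is made-up markup for illustration, and io.StringIO stands in for an open HTML file.

```python
import io

from bs4 import BeautifulSoup

# Hypothetical sample markup, used so the example runs offline.
sample_html = (
    "<html><head><title>Demo page</title></head>"
    "<body><p>Hello</p></body></html>"
)

# First argument: the markup as a string; second argument: the parser name.
soup_from_string = BeautifulSoup(sample_html, "html.parser")

# The first argument can also be an open file-like object.
soup_from_file = BeautifulSoup(io.StringIO(sample_html), "html.parser")

print(type(soup_from_string).__name__)               # BeautifulSoup
print(soup_from_string.title.string)                 # Demo page
print(str(soup_from_string) == str(soup_from_file))  # True
```

Both calls produce an equivalent BeautifulSoup object, so the choice between a string and a file object depends only on where the HTML comes from.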
The first method to explore is .prettify(), which returns the parsed HTML as a formatted string, with each tag on its own line and indented to reflect its nesting.

    # Importing libraries
    from bs4 import BeautifulSoup
    from urllib.request import urlopen

    # Reading web page
    url = "https://codefinity-content-media.s3.eu-west-1.amazonaws.com/18a4e428-1a0f-44c2-a8ad-244cd9c7985e/jesus.html"
    page = urlopen(url)
    html = page.read().decode("utf-8")

    # Reading HTML with BeautifulSoup
    soup = BeautifulSoup(html, "html.parser")
    print(soup.prettify())

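The effect of .prettify() is easiest to see on a tiny document. The sketch below uses a made-up one-line HTML string (compact_html is an illustrative name) and compares the compact and prettified forms.

```python
from bs4 import BeautifulSoup

# Hypothetical one-line markup, used so the example runs offline.
compact_html = "<html><body><p>Hi</p></body></html>"
soup = BeautifulSoup(compact_html, "html.parser")

# str(soup) keeps the markup on one line;
# prettify() returns a new string with one tag or text node per line.
pretty = soup.prettify()

print(str(soup))
print(pretty)
```

The call does not modify soup; it only returns a string, so it is mainly useful for inspecting a page's structure before extracting data from it.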