Opening an HTML File

Now that you're acquainted with the fundamentals of HTML, let's explore a first way of working with it in Python.

One of the modules you can use to open HTML pages in Python is urllib.request. Import its urlopen function, then pass it the URL of the page you want to open.


# Importing the module
from urllib.request import urlopen

# Opening web page
url = "https://codefinity-content-media.s3.eu-west-1.amazonaws.com/18a4e428-1a0f-44c2-a8ad-244cd9c7985e/mother.html"
page = urlopen(url)
print(page)

As the example above shows, urlopen returns an http.client.HTTPResponse object rather than the HTML itself. To get the HTML, call .read() on the response, which returns bytes, and then call .decode("utf-8") on those bytes.


# Importing the module
from urllib.request import urlopen

# Opening web page
url = "https://codefinity-content-media.s3.eu-west-1.amazonaws.com/18a4e428-1a0f-44c2-a8ad-244cd9c7985e/mother.html"
page = urlopen(url)

# Reading and decoding
web_page = page.read().decode("utf-8")
print(type(web_page))
print(web_page)

After applying .read() and .decode(), you have a string. When printed, it shows the HTML with its original line breaks and indentation, so it is easy to read, and you can apply any string method to it.
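To illustrate what applying string methods can look like, here is a minimal sketch. It uses a short hard-coded HTML string as a stand-in for the decoded page, so it runs without a network connection. The sample markup and the title text are made up for illustration and are not the content of the page above.

```python
# Hypothetical sample standing in for a decoded page (not the real page content)
web_page = (
    "<html>\n"
    "<head><title>Sample Page</title></head>\n"
    "<body><p>Hello</p></body>\n"
    "</html>"
)

# Locate the text between <title> and </title> with str.find and slicing
start = web_page.find("<title>") + len("<title>")
end = web_page.find("</title>")
title = web_page[start:end]
print(title)

# Count opening paragraph tags and check whether a tag is present
print(web_page.count("<p>"))
print("<body>" in web_page)
```

Plain string methods like these are fine for quick checks, but they become fragile on real pages, where tags carry attributes and nesting.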

If you skip .decode(), you get a bytes object instead of a string. When printed, it appears on a single line with a b prefix, and line breaks show up as escape sequences such as \n. Feel free to experiment with it!
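The difference between the two types can be seen without opening a web page. The sketch below assumes a small bytes literal in place of what page.read() would return; the markup is invented for illustration.

```python
# Hypothetical bytes value standing in for the result of page.read()
raw = b"<html>\n  <body>\n    <p>Hello</p>\n  </body>\n</html>"

# Printing bytes shows one line, with line breaks as \n escape sequences
print(type(raw))
print(raw)

# Decoding turns the bytes into a str that prints across several lines
text = raw.decode("utf-8")
print(type(text))
print(text)
```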
