Web Scraping with Python
Working with Specific Elements
Navigating an HTML document through attribute access (such as soup.html.body.div) retrieves only the first occurrence of an element, and it works best when you already know where that element sits in the document. But what if you want the first instance of an element and don't know its full path? In that case, use the .find() method, passing the tag name (without the < > brackets) as a string. For example, let's locate the first <div> element in the HTML document.
# Importing libraries
from bs4 import BeautifulSoup
from urllib.request import urlopen

# Reading web page
url = "https://codefinity-content-media.s3.eu-west-1.amazonaws.com/18a4e428-1a0f-44c2-a8ad-244cd9c7985e/jesus.html"
page = urlopen(url)
html = page.read().decode("utf-8")

# Reading HTML with BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')
print(soup.find('div'))
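One detail worth keeping in mind is that .find() returns None when nothing matches, so it is usually safer to check the result before using it. The sketch below illustrates this on a small inline HTML string; sample_html and the variable names are made up for illustration and are not the course page.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML, used here instead of the course page so the example runs offline
sample_html = (
    "<html><head><title>Sample page</title></head><body>"
    '<div id="intro"><p>First paragraph</p></div>'
    '<div id="main"><p>Second paragraph</p></div>'
    "</body></html>"
)

soup = BeautifulSoup(sample_html, "html.parser")

# .find() returns the first matching tag as a Tag object
first_div = soup.find("div")
first_div_id = first_div["id"]                   # attribute lookup on the tag
first_div_text = first_div.get_text(strip=True)  # text inside the tag

# When no element matches, .find() returns None rather than raising an error
missing = soup.find("table")

print(first_div_id)    # intro
print(first_div_text)  # First paragraph
print(missing)         # None
```

Because the result can be None, a check such as `if missing is not None:` before reading attributes or text avoids an AttributeError or TypeError on pages that lack the element.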
Furthermore, you can retrieve all instances of a specific element with the .find_all() method. It returns a list of every matching element. For instance, let's locate all the <p> tags in the HTML document.
# Importing libraries
from bs4 import BeautifulSoup
from urllib.request import urlopen

# Reading web page
url = "https://codefinity-content-media.s3.eu-west-1.amazonaws.com/18a4e428-1a0f-44c2-a8ad-244cd9c7985e/jesus.html"
page = urlopen(url)
html = page.read().decode("utf-8")

# Reading HTML with BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')
print(soup.find_all('p'))
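Since .find_all() gives back a list-like result, it can be looped over, measured with len(), and capped with the limit argument. The following is a minimal sketch on a made-up HTML string (sample_html and the variable names are assumptions for illustration, not part of the course page).

```python
from bs4 import BeautifulSoup

# Hypothetical HTML, used here instead of the course page so the example runs offline
sample_html = (
    "<html><body>"
    "<p>Alpha</p><p>Beta</p><p>Gamma</p>"
    "</body></html>"
)

soup = BeautifulSoup(sample_html, "html.parser")

# All <p> tags, in the order they appear in the document
paragraphs = soup.find_all("p")
paragraph_texts = [p.get_text(strip=True) for p in paragraphs]

# limit stops the search after the given number of matches
first_two = soup.find_all("p", limit=2)

# When nothing matches, the result is empty (not None, unlike .find())
no_tables = soup.find_all("table")

print(paragraph_texts)  # ['Alpha', 'Beta', 'Gamma']
print(len(first_two))   # 2
print(len(no_tables))   # 0
```

The empty result for a missing tag means a for loop over .find_all() simply does nothing, so no explicit None check is needed here.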
The .find_all() method can also search for several tags at once if you pass it a list of tag names. For example, let's gather all the <div> and <title> elements.
# Importing libraries
from bs4 import BeautifulSoup
from urllib.request import urlopen

# Reading web page
url = "https://codefinity-content-media.s3.eu-west-1.amazonaws.com/18a4e428-1a0f-44c2-a8ad-244cd9c7985e/page.html"
page = urlopen(url)
html = page.read().decode("utf-8")

# Reading HTML with BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')
for el in soup.find_all(["div", "title"]):
    print(el)
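When a list of tag names is passed, the matches come back in document order rather than grouped by tag, so the .name attribute is handy for telling them apart. Here is a small sketch under that assumption, again using a made-up HTML string (sample_html, matches, and names are illustrative names only).

```python
from bs4 import BeautifulSoup

# Hypothetical HTML, used here instead of the course page so the example runs offline
sample_html = (
    "<html><head><title>Sample page</title></head><body>"
    "<div>One</div><p>Skipped</p><div>Two</div>"
    "</body></html>"
)

soup = BeautifulSoup(sample_html, "html.parser")

# Any tag whose name appears in the list is a match
matches = soup.find_all(["div", "title"])

# .name tells which kind of tag each match is
names = [el.name for el in matches]
texts = [el.get_text(strip=True) for el in matches]

print(names)  # ['title', 'div', 'div']
print(texts)  # ['Sample page', 'One', 'Two']
```

The <title> tag appears first here only because it sits in <head>, ahead of both <div> tags in the markup; the order of names in the list passed to .find_all() does not affect the order of the results.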