Work with Soup | Beautiful Soup
Web Scraping with Python (res)

1. HTML Files and DevTools
2. Beautiful Soup
3. CSS Selectors/XPaths
4. Tables

Work with Soup

Let's continue exploring BeautifulSoup and learn some important attributes and methods. We can extract not only a tag but also its parts (for example, its name or attributes):

print(soup.div.name)
print(soup.div.attrs)

In the code, we used the attribute .name to get the tag's name and the attribute .attrs, which holds all of the tag's attributes as a dictionary.
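As a minimal sketch of the two attributes in action, the example below parses a small, made-up HTML snippet (the markup and the variable names are assumptions for illustration, not the course page). It also shows that multi-valued attributes such as class come back as a list, and that a single attribute can be read by key.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet, used only for illustration
html = '<div id="intro" class="lead big"><p>Hello</p></div>'
soup = BeautifulSoup(html, "html.parser")

tag = soup.div          # first <div> in the document
tag_name = tag.name     # the tag's name as a string
tag_attrs = tag.attrs   # dictionary of all attributes

print(tag_name)         # div
print(tag_attrs)        # {'id': 'intro', 'class': ['lead', 'big']}
print(tag["id"])        # read one attribute by key: intro
print(tag.get("title")) # .get() gives None for a missing attribute
```

Using tag.get() instead of tag["..."] avoids a KeyError when an attribute may be absent.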

Another useful method is .get_text(), which extracts all the raw text from the page without the HTML tags.

The output will contain a lot of extra blank lines. This happens because the newline characters between tags in the original HTML file are kept as part of the text.
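The blank lines can be seen, and optionally removed, with a short sketch like the one below. The HTML string is a made-up stand-in for a real page; the separator and strip arguments are parameters of .get_text() that the lesson itself does not cover.

```python
from bs4 import BeautifulSoup

# Hypothetical page source with newlines between the tags
html = """<html>
<body>
<h1>Title</h1>

<p>First paragraph.</p>
</body>
</html>"""
soup = BeautifulSoup(html, "html.parser")

# Raw text: the newlines between tags are kept, producing blank lines
raw_text = soup.get_text()
print(repr(raw_text))

# strip=True trims each piece of text and drops the empty ones;
# separator="\n" joins the remaining pieces with a single newline
clean_text = soup.get_text(separator="\n", strip=True)
print(clean_text)
```

Printing with repr() makes the newline characters visible, which helps when checking where the extra blank lines come from.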

In a similar way, you can get only the text of an extracted HTML tag using the method .get_text() or the attribute .string:

print(soup.h1.string)
print(soup.h1.get_text())

If a tag contains more than one thing (or nothing), it is unclear what .string should refer to, so .string is None. In that case .get_text() still works and returns the combined text of everything inside the tag.
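The difference between .string and .get_text() is easiest to see side by side. The following is a small sketch on an invented snippet (the markup and variable names are assumptions for illustration):

```python
from bs4 import BeautifulSoup

# Hypothetical snippet: one simple tag, one tag with mixed content, one empty tag
html = "<h1>Title</h1><p>Some <b>bold</b> text</p><span></span>"
soup = BeautifulSoup(html, "html.parser")

single = soup.h1.string          # one text child, so the text is returned
mixed = soup.p.string            # three children (text, <b>, text), so None
empty = soup.span.string         # no children, so None
mixed_text = soup.p.get_text()   # all nested text joined together

print(single)      # Title
print(mixed)       # None
print(empty)       # None
print(mixed_text)  # Some bold text
```

When a tag may contain nested tags, .get_text() is the safer choice; .string is convenient when the tag is known to hold a single piece of text.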

Task

Here you will work on the same page about Christ the Redeemer as in the previous task.

  1. Import the BeautifulSoup library.
  2. Print the attributes of the p tag.
  3. Print only the text of the ul tags.

Section 2. Chapter 2