Web Scraping with Python (res)
Work with Soup
Let's continue exploring BeautifulSoup and learn some important functions. We can extract not only tags but also their parts (for example, names or attributes):
print(soup.div.name)
print(soup.div.attrs)
In the code, we used the .name attribute to get the tag's name and the .attrs attribute, which returns all of the tag's attributes as a dictionary.
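The two attributes above can be tried on a small standalone snippet. This is a minimal sketch: the HTML string and the variable names (html, tag, tag_name, tag_attrs) are made up for illustration and are not part of the course page.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML used only for this illustration.
html = '<div id="intro" class="box highlight"><p>Hello</p></div>'
soup = BeautifulSoup(html, "html.parser")

tag = soup.div           # the first <div> in the document
tag_name = tag.name      # the tag's name as a string
tag_attrs = tag.attrs    # the tag's attributes as a dictionary

print(tag_name)     # div
print(tag_attrs)    # {'id': 'intro', 'class': ['box', 'highlight']}
print(tag["id"])    # a single attribute can be read like a dictionary key
```

Note that class comes back as a list, because an HTML element may carry several classes.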
Another useful method is .get_text(), which extracts all the text from the page without the HTML tags.
The output will contain many extra blank lines, because the newline characters in the original HTML file are kept as part of the text.
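The blank lines can be seen, and optionally removed, in a short sketch like the one below. The HTML string is a made-up stand-in for a real page, and using the separator and strip arguments of .get_text() is one possible way to tidy the output, not something the lesson itself requires.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML with newlines between tags, as in a typical file.
html = """<html>
<body>
<h1>Title</h1>

<p>First paragraph.</p>
</body>
</html>"""
soup = BeautifulSoup(html, "html.parser")

# Raw text: the newlines between tags survive, producing blank lines.
raw_text = soup.get_text()
print(repr(raw_text))

# strip=True trims each piece of text and drops the empty ones;
# separator="\n" puts each remaining piece on its own line.
clean_text = soup.get_text(separator="\n", strip=True)
print(clean_text)
```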
In a similar way, you can get only the text inside an extracted HTML tag using the .get_text() method or the .string attribute:
print(soup.h1.string)
print(soup.h1.get_text())
If a tag contains more than one child (or nothing at all), it is unclear what .string should refer to, so .string is None. In that case, use .get_text(), which joins the text of all nested elements.
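The difference between .string and .get_text() is easiest to see side by side. The sketch below uses a made-up HTML string and hypothetical variable names; it only illustrates the behaviour described above.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML: one tag with a single text child, one with
# mixed content, and one that is empty.
html = "<h1>Statue</h1><p>Some <b>bold</b> text.</p><div></div>"
soup = BeautifulSoup(html, "html.parser")

single = soup.h1.string          # exactly one child, so its text is returned
mixed = soup.p.string            # several children, so the result is None
empty = soup.div.string          # no children, so the result is None
mixed_text = soup.p.get_text()   # joins the text of all nested elements

print(single)       # Statue
print(mixed)        # None
print(empty)        # None
print(mixed_text)   # Some bold text.
```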
Task
Here you will work on the same page about Christ the Redeemer as in the previous task.
- Import the BeautifulSoup library.
- Print the attributes of the p tag.
- Print only the text of the ul tags.