Selectors

We already know how to locate any part of an HTML file we want. But how can we extract text using XPaths? For this purpose, import Selector from the scrapy library, which helps to select the data. Then create a selector object, passing the HTML you want to work with as the text parameter:
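A minimal sketch of that setup (the variable name html_text is an assumption; it should hold the HTML document as a string):

from scrapy import Selector
sel = Selector(text=html_text)  # html_text is assumed to hold the page's HTML as a string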

Here the first line imports the needed class, and the second one creates the selector object we will work with.

To get any part of the HTML file, call the .xpath() method of the selector object with your path as the parameter:

title = sel.xpath("//title")
print(title)

The call returns a list of all title tags as selector objects, which can be inconvenient to use directly. To get any tag as a string, pick a list element by its index and apply the .extract() method:

print(title[0].extract())

Here we selected the first element of the list of extracted title tags and converted it to a string.

Beautiful Soup doesn't provide functions for working with XPaths the way it does for CSS locators (which we will consider later). However, you should know how XPaths work: they are an extremely powerful tool, and many other libraries popular with advanced web scrapers (such as the lxml library) support them.
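As an illustration, a minimal sketch of the same title lookup with lxml (the variable html_text is again an assumption for the HTML string):

from lxml import html

# Parse the HTML string into an element tree; html_text is assumed to hold the document
tree = html.fromstring(html_text)

# xpath() returns a list of matching elements; .text gives the tag's text content
titles = tree.xpath("//title")
print(titles[0].text)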

Task

Let's return to our website. Here we work with the following page. You should do the following (a sketch of these steps appears after the list):

  1. Import Selector from scrapy to extract text using XPaths.
  2. Get all p tags using XPaths. Save the list of tags in the variable p_tags.
  3. Get the fourth element of the list p_tags as a string and print it.
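
A minimal sketch of these steps, assuming the page's HTML is already available in a string variable html_text:

from scrapy import Selector

# html_text is assumed to hold the HTML of the page used in this chapter
sel = Selector(text=html_text)

# Steps 1-2: select all <p> tags with an XPath and store the list of selectors
p_tags = sel.xpath("//p")

# Step 3: the fourth element has index 3; .extract() turns it into a string
print(p_tags[3].extract())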
