CSS Selectors in BeautifulSoup | CSS Selectors/XPaths
Web Scraping with Python (res)
CSS Selectors in BeautifulSoup

To extract data with CSS selectors you have already written, you can use the Selector class from the scrapy library. Here, however, we will use CSS selectors with BeautifulSoup, the library from the previous section. To select data from the file, call the .select() method on an existing BeautifulSoup object; it returns a list of all matching elements:

    css_locator = "html > body > div"
    print(soup.select(css_locator))
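As a small illustration of what .select() returns, the sketch below parses a made-up HTML string (the markup and the ids outer and inner are assumptions for this example, not part of the course files) and compares the child combinator > with a plain descendant selector.

```python
from bs4 import BeautifulSoup

# Hypothetical markup used only for this illustration
html = """
<html>
  <body>
    <div id="outer">
      <div id="inner">nested</div>
    </div>
    <p>paragraph</p>
  </body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

# ">" matches direct children only, so the nested div is not selected
css_locator = "html > body > div"
direct_divs = soup.select(css_locator)
print([div["id"] for div in direct_divs])

# A space matches descendants at any depth, so both divs are selected
all_divs = soup.select("body div")
print([div["id"] for div in all_divs])
```

The result is always a list, even when only one element matches, so index into it or loop over it to reach the individual tags.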

We already know how to navigate HTML files using attributes. With CSS selectors, we can also select all elements with a given class or id without specifying the tag name or the path to it. For example:

    print(soup.select("#id-1"))
    print(soup.select(".class-1"))

The first line selects every element whose id is id-1. The second line selects every tag that belongs to the class class-1.
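To make the difference concrete, here is a minimal sketch on an invented HTML fragment (the tags, the class highlight, and the text values are assumptions for illustration only). It also uses .select_one(), which returns the first match or None instead of a list.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: two elements share class-1, one has id-1
html = """
<div>
  <p id="id-1" class="class-1">first</p>
  <span class="class-1 highlight">second</span>
  <p class="class-2">third</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# "#" selects by id, "." selects by class, regardless of tag name
by_id = soup.select("#id-1")
by_class = soup.select(".class-1")
print([tag.get_text() for tag in by_id])
print([tag.name for tag in by_class])

# select_one() gives the first matching tag, or None if nothing matches
first_match = soup.select_one(".class-1")
missing = soup.select_one(".no-such-class")
print(first_match.get_text(), missing)
```

Note that the span is matched by .class-1 even though it carries a second class as well.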

You can also iterate over all the elements a selector matches with a for loop:

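    from urllib.request import urlopen

    for link in soup.select(".class-link > a"):
        # Each match is a Tag, so pass its href (assumed to be an absolute URL) to urlopen
        page = urlopen(link["href"])
        html = page.read().decode("utf-8")
        new_soup = BeautifulSoup(html, "html.parser")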

Here we go through every a tag that is a direct child of an element with the class class-link and create a BeautifulSoup object for each linked page.
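Each item produced by .select() is a tag rather than a URL string, and href values on real pages are often relative. The sketch below shows one possible way to turn the matched tags into absolute URLs before opening them; the markup and the base address https://example.com/ are placeholders assumed for this example.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical page address and markup, used only for illustration
base_url = "https://example.com/"
html = """
<div class="class-link"><a href="/page1.html">One</a></div>
<div class="class-link"><a href="https://example.com/page2.html">Two</a></div>
<div class="other"><a href="/skip.html">Skip</a></div>
"""

soup = BeautifulSoup(html, "html.parser")

urls = []
for link in soup.select(".class-link > a"):
    # link is a Tag; its URL lives in the href attribute
    href = link["href"]
    # urljoin leaves absolute URLs untouched and resolves relative ones
    urls.append(urljoin(base_url, href))

print(urls)
```

The resulting strings can then be passed to urlopen one at a time, as in the loop above.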

Keep in mind that instead of the urllib.request library, you can use the requests library: send a GET request (a request to retrieve a webpage) to the page and read the .content attribute of the response to get the page's HTML as bytes:

    import requests

    page_response = requests.get(url)
    page = page_response.content
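A possible way to combine requests with BeautifulSoup is sketched below. The helper name fetch_soup and the timeout value are assumptions made for this example; the block only defines the function and sends no request until it is called.

```python
import requests
from bs4 import BeautifulSoup


def fetch_soup(url, timeout=10):
    """Download a page with requests and parse it (hypothetical helper)."""
    page_response = requests.get(url, timeout=timeout)
    # Raise an exception for 4xx/5xx answers instead of parsing an error page
    page_response.raise_for_status()
    # .content is an attribute holding the raw bytes of the response body
    return BeautifulSoup(page_response.content, "html.parser")
```

Calling fetch_soup with a page address returns a BeautifulSoup object, so .select() can be used on the result directly. BeautifulSoup accepts bytes as well as text, which is why .content can be passed without decoding it first.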

Task

Go through all the links on the main webpage, get the HTML code of each linked page, and print each page's title. First collect all a tags into a list, then read the href attribute of each extracted tag to get the URLs of the pages.

  1. Import the library for opening URLs.
  2. Select all a tags using the method .select() and CSS Selector as the parameter. Assign the result to the variable a_tags.
  3. Create the empty list links.
  4. Go through the list a_tags with the for loop and the variable a to extract the attributes href and add them to the empty list links.
  5. While going through each webpage by its link, get the title tag using the .title attribute of the BeautifulSoup object and print it.
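The steps above can be sketched roughly as follows. This is one possible outline rather than the reference solution: the function names read_html and page_titles, the read parameter, and the UTF-8 decoding are assumptions made for this example, and the links are assumed to be absolute URLs.

```python
from urllib.request import urlopen  # step 1: library for opening URLs

from bs4 import BeautifulSoup


def read_html(url):
    """Download a page and decode it (UTF-8 is assumed here)."""
    with urlopen(url) as page:
        return page.read().decode("utf-8")


def page_titles(main_html, read=read_html):
    """Print and return the title tag of every page linked from main_html."""
    soup = BeautifulSoup(main_html, "html.parser")

    a_tags = soup.select("a")          # step 2: all a tags
    links = []                         # step 3: empty list
    for a in a_tags:                   # step 4: collect href values
        href = a.get("href")
        if href:                       # skip a tags without an href
            links.append(href)

    titles = []
    for link in links:                 # step 5: open each page
        page_soup = BeautifulSoup(read(link), "html.parser")
        titles.append(page_soup.title)
        print(page_soup.title)
    return titles
```

The read parameter exists only so the function can be tried without network access, for example by passing a function that looks pages up in a dictionary of HTML strings.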
