Web Scraping with Python (res)
First Steps
A web page is built with HTML.
HTML is the markup language for creating web pages.
Let’s get the HTML file of the web page!
We can work with a site's data through its URL. To open a URL in a Python program, use the urlopen() function from the urllib.request module and pass it the URL as a string variable:
    from urllib.request import urlopen

    url = "https://codefinity-content-media.s3.eu-west-1.amazonaws.com/18a4e428-1a0f-44c2-a8ad-244cd9c7985e/page.html"
    page = urlopen(url)
It looks simple, but printing the variable page does not show the contents of the HTML file, because urlopen() returns an HTTPResponse object. To get the contents, call the .read() method, which returns a sequence of bytes, and then the .decode("utf-8") method to convert the bytes to a string:
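    # Note: the name bytes shadows Python's built-in bytes type; it is kept to match the task below
    bytes = page.read()           # raw content as a sequence of bytes
    html = bytes.decode("utf-8")  # bytes converted to a string
    print(html)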
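To see why the decoding step matters, here is a small offline sketch of the bytes-to-string conversion. The sample markup and the names raw and text are made up for illustration; they do not come from the course page.

```python
# Hypothetical sample markup; it stands in for the bytes that page.read() returns.
raw = "<p>café</p>".encode("utf-8")

print(type(raw))   # <class 'bytes'>
print(len(raw))    # 12: "é" takes two bytes in UTF-8

# decode() turns the bytes back into a regular string
text = raw.decode("utf-8")

print(type(text))  # <class 'str'>
print(len(text))   # 11 characters
```

The byte count and the character count differ as soon as the page contains non-ASCII characters, which is why the encoding passed to decode() should match the one the page was written in.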
We can also chain the two methods in a single expression: page.read().decode("utf-8").
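The chained call can be wrapped in a small helper. This is only a sketch: the function name fetch_html is hypothetical, falling back to UTF-8 when the response declares no charset is an assumption, and the data: URL is just an offline stand-in for a real address such as the course page.

```python
from urllib.request import urlopen


def fetch_html(url):
    """Open a URL and return its body as a string (illustrative helper)."""
    # The with-block closes the response once we are done reading.
    with urlopen(url) as page:
        # Use the charset declared in the response headers, if any;
        # falling back to UTF-8 is an assumption of this sketch.
        charset = page.headers.get_content_charset() or "utf-8"
        return page.read().decode(charset)


# A data: URL keeps the example offline; pass a real http(s) URL instead.
html = fetch_html("data:text/html;charset=utf-8,%3Ch1%3EHello%3C%2Fh1%3E")
print(html)  # <h1>Hello</h1>
```

Using a with-block is a design choice of this sketch: it releases the connection even if reading or decoding raises an error.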
This is the URL we will work with in this course!
Task
Write the missing code to get the HTML structure of the page you are interested in.
- Import the urlopen function to open URLs from your code.
- Open the URL. Assign the result to the variable page.
- Get a sequence of bytes using the .read() method. Assign the result to the variable bytes.
- Decode the bytes to a string using the .decode() method. Assign the result to the variable html.