Web Scraping with Python
Getting Acquainted with HTML

Applying String Methods

What can you do with the page you have read? Because it is a string, you can use any string method on it. For example, the .find() method returns the index of the first occurrence of a given substring. You can use it to locate the page title: find the indexes of the opening and closing tags, then add the length of the closing tag so that the tag itself is included.
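To see how .find() behaves on its own, here is a small sketch on a short, made-up string (the sample text is illustrative and not part of the lesson's page). It shows that .find() returns the index of the first match, accepts an optional start position, and returns -1 when the substring is absent.

```python
# A short, hypothetical HTML fragment used only for illustration
text = "<p>one</p><p>two</p>"

# Index of the first occurrence of "<p>"
first = text.find("<p>")
print(first)  # 0

# Search again, starting just after the first match
second = text.find("<p>", first + 1)
print(second)  # 10

# A substring that does not occur yields -1 instead of raising an error
print(text.find("<title>"))  # -1
```

Because -1 is a valid slice index in Python, it is worth checking for it before slicing; otherwise a missing tag silently produces the wrong substring.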

```python
# Importing the module
from urllib.request import urlopen

# Opening web page
url = "https://codefinity-content-media.s3.eu-west-1.amazonaws.com/18a4e428-1a0f-44c2-a8ad-244cd9c7985e/mother.html"
page = urlopen(url)

# Reading and decoding
web_page = page.read().decode("utf-8")

# Indexes of opening and closing title tags
start = web_page.find("<title")
finish = web_page.find("</title>") + len("</title>")
print(web_page[start:finish])
```

In the example above, two variables, start and finish, are created. The start variable stores the index of the first character of the opening <title> tag (the < symbol); searching for "<title" rather than "<title>" also matches an opening tag that carries attributes. The finish variable stores the index of the character right after the closing </title> tag: since .find() returns the index where the closing tag begins, the tag's length is added to move just past it.
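Since the lesson's URL may not always be reachable, the sketch below applies the same start/finish arithmetic to a hardcoded HTML string. The sample page and the helper name extract_title_tag are hypothetical, and the -1 checks are an addition that the original example does not include.

```python
def extract_title_tag(html):
    """Return the full <title>...</title> element, or None if it is missing.

    Hypothetical helper illustrating the start/finish technique.
    """
    # Index of the "<" that opens the title tag
    start = html.find("<title")
    if start == -1:
        return None
    # Index where the closing tag begins, searching only after the opening tag
    close = html.find("</title>", start)
    if close == -1:
        return None
    # Move past the closing tag so the slice includes it
    finish = close + len("</title>")
    return html[start:finish]


# A small, made-up page used instead of a network request
sample = "<html><head><title>My Page</title></head><body></body></html>"
print(extract_title_tag(sample))           # <title>My Page</title>
print(extract_title_tag("<html></html>"))  # None
```

Passing start as the second argument to .find() keeps the search for the closing tag from matching anything that appears before the opening tag.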

Note

Slicing excludes the character at the stop index, which is why finish points to the character right after the closing tag rather than to the tag's last character.
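A quick way to confirm this is to compare slices of the same made-up string with and without the length of the closing tag added (the sample string is illustrative only).

```python
html = "<title>Demo</title>"
start = html.find("<title")
close = html.find("</title>")

# The slice stops before index `close`, so the closing tag is cut off
print(html[start:close])  # <title>Demo

# Adding len("</title>") moves the stop index just past the closing tag
print(html[start:close + len("</title>")])  # <title>Demo</title>
```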


Section 1. Chapter 10
