Applying String Methods
What can you do with the page you have read? Since it is a string, you can use any string method. For example, the .find() method returns the index of the first occurrence of a given substring, or -1 if the substring is not found. You can use it to locate the page title by finding the indexes of the opening and closing tags and accounting for the length of the closing tag.
    # Importing the module
    from urllib.request import urlopen

    # Opening web page
    url = "https://codefinity-content-media.s3.eu-west-1.amazonaws.com/18a4e428-1a0f-44c2-a8ad-244cd9c7985e/mother.html"
    page = urlopen(url)

    # Reading and decoding
    web_page = page.read().decode("utf-8")

    # Indexes of opening and closing title tags
    start = web_page.find("<title")
    finish = web_page.find("</title>") + len("</title>")
    print(web_page[start:finish])
As shown in the example above, two variables, start and finish, were created. The start variable stores the index of the first character of the opening <title> tag, while the finish variable stores the index of the character right after the closing </title> tag. The .find() method returns the starting index of the closing tag, so the tag's length is added to get the final position.
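The index arithmetic can be checked on a small string without downloading anything. The snippet below is a minimal sketch: the sample string is a made-up stand-in for a real page, and the variable names are chosen only for illustration.

```python
# Hypothetical sample standing in for a downloaded page (not the lesson's real page)
sample = "<html><head><title>Demo page</title></head></html>"

closing_tag = "</title>"

# .find() gives the index where the closing tag begins
closing_start = sample.find(closing_tag)

# Adding the tag's length moves the index just past the final ">"
closing_end = closing_start + len(closing_tag)

print(closing_start, closing_end)
print(sample[closing_start:closing_end])  # </title>

# When the substring is absent, .find() returns -1 instead of raising an error
print(sample.find("<h1>"))  # -1
```

Because -1 is also a valid index in a slice, it is usually worth checking the result of .find() before using it.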
String slicing excludes the end index, which is why finish points to the character right after the closing tag.
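The same idea can be adapted to return only the text between the tags. The following is a rough sketch rather than a robust HTML parser: extract_title_text is a hypothetical helper name, the sample strings are invented, and the code assumes the closing tag is written exactly as </title>.

```python
def extract_title_text(html):
    """Return the text between <title...> and </title>, or None if a tag is missing."""
    # Index of the "<" that starts the opening tag
    open_start = html.find("<title")
    if open_start == -1:
        return None

    # Index of the ">" that ends the opening tag
    open_end = html.find(">", open_start)
    if open_end == -1:
        return None

    # Index where the closing tag begins, searching only after the opening tag
    close_start = html.find("</title>", open_end)
    if close_start == -1:
        return None

    # The slice starts one past ">" and stops before the closing tag,
    # because the end index of a slice is excluded
    return html[open_end + 1:close_start]


# Hypothetical sample strings used only for this demonstration
sample = "<html><head><title>Demo page</title></head></html>"
print(extract_title_text(sample))             # Demo page
print(extract_title_text("<p>no title</p>"))  # None
```

Returning None when a tag is missing avoids slicing with -1, which would silently produce the wrong substring.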