Intro to XPath
In the previous sections, we used a lot of different methods to find needed data without knowing the HTML way to the tag. To extract desired parts of code describing where the elements exactly are (in your HTML structure), you can use XPath.
XPath provides us with simple Python-friendly syntax to navigate through the HTML file. The notation is familiar with folder organization on your computer. For example, in the syntax of the XPathes / means move forward to one generation, like in the directory hierarchy where the same symbol moves us deeper into the folder's content. For instance:
xpath = "/html/body/div"
Here we define the path to all div tags in body tags of html tags.
To specify which div from a variety of tags you want, enclose its number in square brackets []:
xpath = "/html/head/div[2]"
This path for the following tree will detect the second p of the body (numeration starts from 1):
If you want to detect all p tags everywhere, use //. This symbol is used to pass tags. For example, you want to detect all p blocks but don’t know the whole path to each one:
xpath = "//p"
It also can be used when you know the beginning or part of your path:
xpath = "/html/body//p"
The code above directs all p tags in the body tag.
Danke für Ihr Feedback!
Fragen Sie AI
Fragen Sie AI
Fragen Sie alles oder probieren Sie eine der vorgeschlagenen Fragen, um unser Gespräch zu beginnen
Fragen Sie mich Fragen zu diesem Thema
Zusammenfassen Sie dieses Kapitel
Zeige reale Beispiele
Awesome!
Completion rate improved to 4.76
Intro to XPath
Swipe um das Menü anzuzeigen
In the previous sections, we used a lot of different methods to find needed data without knowing the HTML way to the tag. To extract desired parts of code describing where the elements exactly are (in your HTML structure), you can use XPath.
XPath provides us with simple Python-friendly syntax to navigate through the HTML file. The notation is familiar with folder organization on your computer. For example, in the syntax of the XPathes / means move forward to one generation, like in the directory hierarchy where the same symbol moves us deeper into the folder's content. For instance:
xpath = "/html/body/div"
Here we define the path to all div tags in body tags of html tags.
To specify which div from a variety of tags you want, enclose its number in square brackets []:
xpath = "/html/head/div[2]"
This path for the following tree will detect the second p of the body (numeration starts from 1):
If you want to detect all p tags everywhere, use //. This symbol is used to pass tags. For example, you want to detect all p blocks but don’t know the whole path to each one:
xpath = "//p"
It also can be used when you know the beginning or part of your path:
xpath = "/html/body//p"
The code above directs all p tags in the body tag.
Danke für Ihr Feedback!