As the elements that we search for may have a regular style, they are likely to have the same CSS classes.
On this first web page, I want to find a wrapper that contains all the relevant information about a single item.
When I navigate to the web page, right-click and select Inspect, I can see that the wrapper I am interested in has an li tag and a series of classes, as shown below:

<li class="Grid-col u-size-1-4-l u-size-3-12-m u-size-6-12 u-size-1-5-xl">

To help find all of these list items whilst I am in the browser, I can switch across to the Console tab in the developer tools in Chrome.
When I am here, I can write a test query.
To do so, I write two dollar signs, $$, and put my query in quotation marks within parentheses, like so:

$$('li.u-size-1-5-xl')

This brings up a list with a length of 40, as shown in the brackets above.
I quickly validate that each list item corresponds to the appropriate beach ball on the web page by scrolling through the list items and visually inspecting the corresponding highlights in the browser.
A working practice I encourage is to check the last item in the list and confirm that it is the last item you are interested in on the page.
In our case it is!

If this was not the case, you could try different CSS queries in the console of the developer tools.
This is a good practice to adopt, as we can perform initial validation in the browser rather than within our Python scripts.
It is also very easy to check, because we can look at the CSS query and the browser page at the same time!

We can now pass this query directly into the Selenium script.
I create a sensibly named variable called beach_balls, which points to a list of all the beach balls found.
We can now iterate over this beach_balls list.
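A minimal sketch of this step, under stated assumptions: the driver is taken to expose the Selenium 3 style find_elements_by_css_selector (Selenium 4.3 and later removed these helpers in favour of find_elements(By.CSS_SELECTOR, ...)), the selector is the console query written with plain hyphens, and FakeDriver is a hypothetical stand-in so the example runs without a browser.

```python
# Selector validated in the browser console (assumed to use plain hyphens).
ITEM_SELECTOR = "li.u-size-1-5-xl"


class FakeDriver:
    """Hypothetical stand-in for a Selenium driver, so the sketch runs offline."""

    def __init__(self, items):
        self._items = items

    def find_elements_by_css_selector(self, query):
        # A real driver would match `query` against the loaded page.
        return list(self._items) if query == ITEM_SELECTOR else []


def find_beach_balls(driver):
    """Return every wrapper element that matches the validated CSS query."""
    return driver.find_elements_by_css_selector(ITEM_SELECTOR)


driver = FakeDriver(["ball-1", "ball-2", "ball-3"])  # made-up placeholder items
beach_balls = find_beach_balls(driver)
print(len(beach_balls))
```

With a real driver, the length printed here should match the count seen in the console.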
print(type(beach_balls))
<class 'list'>

Using this same approach, I can write a simple for loop to extract the information I am interested in.
Here, I use find_element_by_css_selector (note element, not elements) to find the tags and classes for the other pieces of information contained within the original wrapper.
When I find the appropriate elements for the description, etc., I use the .text attribute to extract the text and the lstrip method for simple clean-up of the string.
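The loop described above might look roughly like the sketch below. The child selectors (DESC_SELECTOR, PRICE_SELECTOR), the FakeElement class and the sample values are all hypothetical; the real selectors come from inspecting the page, and a real wrapper would be a Selenium element.

```python
class FakeElement:
    """Hypothetical stand-in for a Selenium element, so the sketch runs offline."""

    def __init__(self, text="", children=None):
        self.text = text
        self._children = children or {}

    def find_element_by_css_selector(self, query):
        return self._children[query]


# Hypothetical child selectors, not taken from the real page.
DESC_SELECTOR = "a.product-title"
PRICE_SELECTOR = "span.price"


def extract_item(wrapper):
    """Pull the description and price text out of one wrapper element."""
    desc = wrapper.find_element_by_css_selector(DESC_SELECTOR).text
    price = wrapper.find_element_by_css_selector(PRICE_SELECTOR).text
    # .text is an attribute; lstrip() removes leading whitespace only.
    return desc.lstrip(), price.lstrip()


# Made-up sample data standing in for one scraped list item.
beach_balls = [
    FakeElement(children={
        DESC_SELECTOR: FakeElement("  Rainbow Beach Ball"),
        PRICE_SELECTOR: FakeElement("\n$4.97"),
    })
]

for ball in beach_balls:
    desc, price = extract_item(ball)
    print(desc, price)
```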
This code is working well, but the really useful aspect of web scraping is the automation it provides.
To demonstrate, I will scrape just two pages of beach balls from Walmart (although we could easily scrape data from many more pages).
I have written a while loop that will iterate twice, based on the condition provided.
It is important to note that not all the information is available for every item.
For example, the shipping information is missing in a few instances.
Normally, when you encounter this situation while writing a scraper, you should write a condition within the for loop so that all the information lines up.
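One possible shape for such a condition, as a sketch only: Selenium's plural finders return an empty list when nothing matches, so a helper can fall back to a default value. SHIPPING_SELECTOR, text_or_default, FakeElement and the sample values are hypothetical names and data introduced for illustration.

```python
class FakeElement:
    """Hypothetical stand-in for a Selenium element, so the sketch runs offline."""

    def __init__(self, text="", children=None):
        self.text = text
        self._children = children or {}

    def find_elements_by_css_selector(self, query):
        # Like Selenium's plural finder: an empty list when nothing matches.
        return self._children.get(query, [])


SHIPPING_SELECTOR = "div.shipping"  # hypothetical selector


def text_or_default(wrapper, query, default=""):
    """Return the cleaned text of the first match, or `default` if none exists."""
    matches = wrapper.find_elements_by_css_selector(query)
    return matches[0].text.lstrip() if matches else default


# Made-up wrappers: one with shipping information, one without.
with_shipping = FakeElement(children={SHIPPING_SELECTOR: [FakeElement(" Free delivery")]})
without_shipping = FakeElement()

shipping = [
    text_or_default(wrapper, SHIPPING_SELECTOR)
    for wrapper in (with_shipping, without_shipping)
]
print(shipping)  # ['Free delivery', '']
```

Because every item contributes exactly one entry, the shipping list stays the same length as the other lists.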
However, this Walmart page is organised so that, when information is missing, empty padding fills the space.
This means no conditional checks are required within the for loop in this instance, but be careful: this will not always be the case!

Multiple Page Scrape

With each iteration, I add the relevant item to the appropriate list.
Once the for loop has finished with a page, I click through to the next page.
I find the tag and classes which correspond to the next-page button and use the .click() method to navigate to it.
The script should end on the third page if everything has worked as intended.
There should have been two iterations, according to my while condition.
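The two-page loop can be sketched as follows. This is an illustration under assumptions, not the original script: NEXT_SELECTOR is a hypothetical selector for the next-page control, the item selector is the console query written with plain hyphens, and FakeDriver simulates three pages of made-up items so the example runs without a browser.

```python
from types import SimpleNamespace

ITEM_SELECTOR = "li.u-size-1-5-xl"  # console query, assumed plain hyphens
NEXT_SELECTOR = "button.next-page"  # hypothetical next-page selector


class FakeDriver:
    """Hypothetical stand-in for a Selenium driver on a paginated site."""

    def __init__(self, pages):
        self.pages = pages
        self.current = 0  # zero-based index of the page currently loaded

    def find_elements_by_css_selector(self, query):
        return self.pages[self.current] if query == ITEM_SELECTOR else []

    def find_element_by_css_selector(self, query):
        # Return an object whose click() loads the next page.
        return SimpleNamespace(click=self._go_next)

    def _go_next(self):
        self.current += 1


def scrape_pages(driver, max_pages=2):
    """Collect item text from `max_pages` pages, clicking 'next' after each."""
    descriptions = []
    page = 0
    while page < max_pages:
        for ball in driver.find_elements_by_css_selector(ITEM_SELECTOR):
            descriptions.append(ball.text.lstrip())
        driver.find_element_by_css_selector(NEXT_SELECTOR).click()
        page += 1
    return descriptions


pages = [
    [SimpleNamespace(text=" ball A"), SimpleNamespace(text="ball B")],
    [SimpleNamespace(text="ball C")],
    [],
]
driver = FakeDriver(pages)
descriptions = scrape_pages(driver)
print(descriptions)
print(driver.current + 1)  # page number the driver ends on
```

With max_pages=2 the fake driver ends on its third page, mirroring the behaviour described above. A real script would usually also wait for each page to load before querying it.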
The script has worked as intended.
Below, the third page has loaded, as shown by the green icon at the bottom of the web page.
Finally, I will write the output to a CSV file by zipping my lists together and giving the columns sensible names using a pandas DataFrame.
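A short sketch of this last step, assuming three equal-length lists. The list names, column names, sample values and the file name beach_balls.csv are all hypothetical.

```python
import pandas as pd

# Made-up sample lists standing in for the scraped results.
descriptions = ["Rainbow Beach Ball", "Giant Beach Ball"]
prices = ["$4.97", "$12.99"]
shipping = ["Free delivery", ""]

# zip() pairs up the i-th entry of each list into one row.
rows = list(zip(descriptions, prices, shipping))
df = pd.DataFrame(rows, columns=["description", "price", "shipping"])

# index=False leaves the row numbers out of the CSV file.
df.to_csv("beach_balls.csv", index=False)
```

zip stops at the shortest list, so a missing entry in any one list would silently drop rows; that is why the lists need to stay aligned during scraping.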
This tutorial has focused on using Selenium to scrape from one website across many of its pages.
If we wanted, we could scrape from different websites and begin constructing a price comparison model for our beach balls, or any other item we may be interested in.
But to conclude, go and get yourself a beach ball and head down the beach!