Web Scraping Mountain Weather Forecasts using Python and a Raspberry Pi
Extracting Large Amounts of Data without an API
Andres Vourakis
Apr 11

Photos: @andresvourakis (image on the left), @ilyapavlov (image on the right)

Motivation

Before walking you through the project, let me tell you a little bit about the motivation behind it.
Aside from Data Science and Machine Learning, my other passion is spending time in the mountains.
Preparing for a trip to any mountain requires lots of careful planning in order to minimize the risks.
That means paying close attention to the weather conditions as the summit day approaches.
My absolute favorite website for this is Mountain-Forecast.com, which gives you the weather forecasts for almost any mountain in the world at different elevations.
The only problem is that it doesn’t offer any historical data (as far as I can tell), which can sometimes be useful when determining if it is a good idea to make the trip or wait for better conditions.
This problem has been in the back of my mind for a while and I finally decided to do something about it.
Below, I’ll describe how I wrote a web scraper for Mountain-Forecast.com using Python and Beautiful Soup, and put it on a Raspberry Pi to collect the data on a daily basis.
If you’d rather skip to the code, then check out the repository on GitHub.

Web Scraping

Inspecting the Website

In order to figure out which elements I needed to target, I started by inspecting the source code of the page.
This can be easily done by right-clicking on the element of interest and selecting Inspect.
This brings up the HTML code where we can see the element that each field is contained within.

Figure: Weather Forecast Table
Figure: Inspector for Weather Forecast Table

Lucky for me, the forecast information for every mountain is contained within a table.
The only problem is that each day has multiple sub-columns associated with it (i.e., AM, PM and night), so I would need to figure out a way to iterate through them.
In addition, since weather forecasts are provided at different elevations, I would need to extract the link for each one of them and scrape them individually.
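
One way to deal with the day sub-columns can be sketched as follows. This is not the author's code: it assumes that each day header is a cell spanning several sub-columns through a `colspan` attribute, and the sample markup and the function names `expand_day_headers` and `label_columns` are made up for illustration. The real page may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one day-header row, one AM/PM/night row, one data row.
SAMPLE_TABLE_HTML = """
<table>
  <tr><td colspan="3">Friday</td><td colspan="2">Saturday</td></tr>
  <tr><td>AM</td><td>PM</td><td>night</td><td>AM</td><td>PM</td></tr>
  <tr><td>10</td><td>15</td><td>20</td><td>25</td><td>30</td></tr>
</table>
"""


def expand_day_headers(header_row):
    """Repeat each day name once per sub-column it spans."""
    days = []
    for cell in header_row.find_all(["th", "td"]):
        span = int(cell.get("colspan", 1))  # assumption: colspan marks the width
        days.extend([cell.get_text(strip=True)] * span)
    return days


def label_columns(table_html):
    """Return one dict per data row, keyed by (day, time of day)."""
    soup = BeautifulSoup(table_html, "html.parser")
    day_row, period_row, *value_rows = soup.find_all("tr")
    days = expand_day_headers(day_row)
    periods = [c.get_text(strip=True) for c in period_row.find_all(["th", "td"])]
    labels = list(zip(days, periods))
    return [
        dict(zip(labels, (c.get_text(strip=True) for c in row.find_all("td"))))
        for row in value_rows
    ]


rows = label_columns(SAMPLE_TABLE_HTML)
print(rows[0][("Friday", "night")])
```

Keying every value by a (day, time of day) pair keeps the sub-columns aligned even when a day has fewer than three of them.
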
Similarly, I inspected the directory containing the URLs for the highest 100 mountains in the United States.

Figure: Highest 100 mountains directory
Figure: Inspector for highest 100 mountains directory

This seemed like a much easier task since all I needed from the table were the URLs and mountain names, in no specific order.

Parsing the Web Page using Beautiful Soup

After familiarizing myself with the HTML structure of the page, it was time to get started.
My first task was to collect the URLs for the mountains I was interested in.
I wrote a couple of functions to store the information in a dictionary, where the key is Mountain Name and the value is a list of all the URLs associated with it (URLs by elevation).
Then I used the pickle module to serialize the dictionary and save it into a file so that it could be easily retrieved when needed.
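
A minimal sketch of this step might look like the following. It is an illustration under assumptions, not the code from the repository: the sample markup, the base URL and the function names `collect_mountain_urls`, `save_urls` and `load_urls` are hypothetical. The links for each elevation could be appended to each mountain's list in the same way.

```python
import pickle
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://www.mountain-forecast.com"  # assumed base URL of the site

# Hypothetical directory markup; the real table may look different.
SAMPLE_DIRECTORY_HTML = """
<table>
  <tr><td><a href="/peaks/Example-Peak">Example Peak</a></td></tr>
  <tr><td><a href="/peaks/Another-Peak">Another Peak</a></td></tr>
</table>
"""


def collect_mountain_urls(directory_html, base_url=BASE_URL):
    """Map each mountain name to a list of absolute URLs found in the table."""
    soup = BeautifulSoup(directory_html, "html.parser")
    mountains = {}
    for link in soup.select("table a[href]"):
        name = link.get_text(strip=True)
        url = urljoin(base_url, link["href"])
        mountains.setdefault(name, []).append(url)
    return mountains


def save_urls(mountains, path):
    """Serialize the dictionary so later runs can skip the directory page."""
    with open(path, "wb") as f:
        pickle.dump(mountains, f)


def load_urls(path):
    """Load a dictionary previously written by save_urls."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

Calling `save_urls(collect_mountain_urls(html), "mountain_urls.pickle")` once and `load_urls("mountain_urls.pickle")` on every later run avoids re-scraping the directory each day. The file name here is only an example.
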
The code I wrote for this step is in the GitHub repository.

My next task was to collect the weather forecast for each mountain in the dictionary.
I used requests to get the content of the page and beautifulsoup4 to parse it.
In the code, I manually saved each element of interest into its own variable instead of iterating through them.
It wasn’t pretty, but I decided to do it that way since I wasn’t interested in all of the elements (i.e., weather maps and freezing scale) and a few of them needed to be handled differently from the rest.
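
The fetch-and-parse step can be sketched roughly as below. Note that this sketch swaps in a different approach from the one described above: instead of one variable per element, it loops over the table rows and keeps only the rows whose labels are listed. The row labels, the sample markup and the function names `fetch_html` and `extract_forecast` are assumptions for illustration.

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical row labels; the labels on the real page may differ.
ROWS_OF_INTEREST = ("Wind (km/h)", "Max temp (C)")

# Hypothetical forecast markup used to try the parser without network access.
SAMPLE_FORECAST_HTML = """
<table>
  <tr><th>Wind (km/h)</th><td>10</td><td>15</td></tr>
  <tr><th>Max temp (C)</th><td>-5</td><td>-3</td></tr>
  <tr><th>Weather map</th><td>map</td><td>map</td></tr>
</table>
"""


def fetch_html(url, timeout=30):
    """Download a page and return its HTML, raising on HTTP errors."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text


def extract_forecast(html, wanted=ROWS_OF_INTEREST):
    """Return {row label: [cell texts]} for the rows listed in `wanted`."""
    soup = BeautifulSoup(html, "html.parser")
    forecast = {}
    for row in soup.find_all("tr"):
        cells = row.find_all(["th", "td"])
        if not cells:
            continue
        label = cells[0].get_text(strip=True)
        if label in wanted:  # skip rows such as weather maps
            forecast[label] = [c.get_text(strip=True) for c in cells[1:]]
    return forecast


print(extract_forecast(SAMPLE_FORECAST_HTML))
```

In a real run, `extract_forecast(fetch_html(url))` would be called for each URL in the dictionary. Rows that need special handling would still need their own code, which is the trade-off the manual approach avoids.
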

Saving the data

Since my goal was to scrape daily and the forecasts get updated every day, it was important to figure out a way to update old forecasts instead of creating duplicates, and to append the new ones.
I used the pandas module to turn the data into a DataFrame (a two-dimensional data structure consisting of rows and columns) so that I could easily manipulate it and then save it as a CSV file.
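
One possible way to update old forecasts and append new ones with pandas is sketched below. It is an assumption about how this could be done, not the repository's code: the column names in `KEY_COLUMNS` and the functions `merge_forecasts` and `update_csv` are hypothetical. The idea is to concatenate the stored rows with the fresh ones and, for every forecast slot that appears twice, keep only the most recent row.

```python
from pathlib import Path

import pandas as pd

# Hypothetical columns that together identify one forecast slot.
KEY_COLUMNS = ["mountain", "elevation", "date", "time_of_day"]


def merge_forecasts(existing, new):
    """Append new rows; where a slot already exists, the newer row wins."""
    combined = pd.concat([existing, new], ignore_index=True)
    combined = combined.drop_duplicates(subset=KEY_COLUMNS, keep="last")
    return combined.reset_index(drop=True)


def update_csv(path, new):
    """Merge `new` into the CSV at `path`, creating the file if needed."""
    path = Path(path)
    if path.exists():
        merged = merge_forecasts(pd.read_csv(path), new)
    else:
        merged = new.drop_duplicates(subset=KEY_COLUMNS, keep="last")
    merged.to_csv(path, index=False)
    return merged
```

Because `keep="last"` retains the row that comes later in the concatenated frame, a forecast scraped today replaces yesterday's forecast for the same day and time of day, while forecasts for new days are simply appended.
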
The code for this step is also in the GitHub repository.
Once the data collection and manipulation was done, I ended up with a single table of forecasts.

Running the Scraper on Raspberry Pi

The Raspberry Pi is a low-cost, credit-card-sized computer that can be used for a variety of projects, such as retro-gaming emulation, home automation, robotics or, in this case, web scraping.
Running the scraper on the Raspberry Pi can be a better alternative to leaving your personal desktop or laptop running all the time, or investing in a server.

Setting it up

First I needed to install an operating system on the Raspberry Pi, and I chose Raspbian Stretch Lite, a Debian-based operating system without a graphical desktop, just a terminal.
After installing Raspbian Stretch Lite, I used the command sudo raspi-config to open up the configuration tool and change the password, expand the filesystem, change the host name and enable SSH.
Finally, I used sudo apt-get update && sudo apt-get upgrade to make sure everything was up to date and proceeded to install all of the dependencies necessary to run my script (i.e., Pandas, Beautiful Soup 4, etc.).

Automating the script

In order to schedule the script to run daily, I used cron, a time-based job scheduler in Unix-like operating systems (i.e., Ubuntu, Raspbian, macOS, etc.).
Using the following crontab entry, the script was scheduled to run daily at 10:00 AM:

0 10 * * * /usr/bin/python3 /home/pi/scraper.py

This is what the final set-up looks like:

Figure: Raspberry Pi 3 connected to the internet via Ethernet

Since SSH is enabled on the Raspberry Pi, I can now easily connect to it via terminal (no need for an extra monitor and keyboard) using my personal laptop or phone and keep an eye on the scraper.

I hope you enjoyed this walkthrough and that it inspired you to code your own web scraper using Python and a Raspberry Pi.
If you have any questions or feedback, I’ll be happy to read them in the comments below :).