Python News Scraper with React [Part 1]
Edwin Capel
Feb 10
This tutorial is meant for complete beginners.
I teach at a coding boot camp, so I understand the struggles of a coding padawan, and I hope this article inspires beginners to pick up coding.
In this tutorial, we will build a scraper that collects news articles mentioning Donald Trump from a set of news feeds.
We will then save the data to a JSON file.
Finally, we will load and display the collected data in our React app.
Some context: I once wrote a script that scraped the top 10 videos from a subreddit and compiled them into a top 10 video compilation.
A friend of mine then asked whether I could scrape local news about fires.
I managed to build the app and decided to write a tutorial on something similar!
There will be two parts to this tutorial:
In part 1, we will learn how to scrape the data we want from the news sources’ RSS feeds.
In part 2, we will learn how to display the data we’ve collected using React.
PYTHON RSS SCRAPER
To keep the files we’re going to work with organized, let’s create a new directory called data.
Inside it, add two files: the Python script we will be working with (app.py) and a JSON file where we will store the data we scrape (article.json).
Your directory should now look like this:
data/
  app.py
  article.json
[Prerequisites] Make sure you have an up-to-date Python and conda installed.
1. Create a conda environment
First, let’s create an environment for our Python project by typing this line in the terminal:
conda create -n trumpTracker numpy python=3.7
This creates a new Python environment called trumpTracker and pins it to Python 3.7.
2. Activate the environment
conda activate trumpTracker
3. Install the library we’re going to use with pip:
pip install feedparser
Let’s start coding!
Open the directory in your favorite code editor (mine is VS Code), then open app.py and insert this block of code.
Lines 1-3 import the Python packages we want to use.
Line 5 declares a list of the news sources we scrape from (make sure these are links to each news source’s RSS feed).
Lines 7-10 iterate through that list: for each news source, we parse the XML and inspect the keys of each entry.
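As a rough illustration of the steps above, here is a minimal sketch of what app.py might look like at this stage, assuming feedparser is installed. The URLs in NEWS_SOURCES are placeholders and the helper name entry_titles is hypothetical, so the line numbers will not line up exactly with the walkthrough.

```python
# Minimal sketch of app.py at this stage (assumes `pip install feedparser`).

# Hypothetical placeholder feeds: replace with real news-source RSS URLs.
NEWS_SOURCES = [
    "https://example.com/news/rss.xml",
    "https://example.org/world/feed",
]


def entry_titles(feed):
    """Return the title of every entry in a parsed feed.

    `feed` can be any dict-like object with an "entries" list, such as the
    FeedParserDict returned by feedparser.parse().
    """
    return [entry.get("title", "") for entry in feed.get("entries", [])]


def main():
    # Imported here so entry_titles() can be reused without feedparser installed.
    import feedparser

    for source in NEWS_SOURCES:
        feed = feedparser.parse(source)  # downloads and parses the RSS XML
        for title in entry_titles(feed):
            print(title)


if __name__ == "__main__":
    main()
```

Running python app.py with real feed URLs should print one article title per line.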
In our case, we want the title of each article.
Run the Python script!
python app.py
Check out all the article titles returned to us; you should see something like this:
The result from the Python script
If you’re interested in checking out all the keys available to us, you can include this block of code.
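A minimal sketch of such a key inspector, assuming feedparser is installed; the feed URL and the helper name available_keys are hypothetical. It collects the keys from every entry, since not every entry in a feed necessarily carries the same fields.

```python
# Hypothetical helper: list every key that appears on the entries of one feed.
# Assumes `pip install feedparser`; FEED_URL is a placeholder.
FEED_URL = "https://example.com/news/rss.xml"


def available_keys(feed):
    """Return a sorted list of all keys found across a parsed feed's entries."""
    keys = set()
    for entry in feed.get("entries", []):
        keys.update(entry.keys())
    return sorted(keys)


def main():
    import feedparser  # imported here so available_keys() works without it

    feed = feedparser.parse(FEED_URL)
    print(available_keys(feed))


if __name__ == "__main__":
    main()
```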
Let’s discuss the pseudocode:
if article.json exists and is larger than 0 bytes:
    open article.json
    load the JSON file into a data variable
else:
    data = empty dictionary
    data['articles'] = empty list
// go through each news source and store entries whose summary mentions trump
for each source in news sources:
    parse source with feedparser
    for each entry in the parsed entries:
        if 'trump' is in the lowercased entry.summary and entry.link is not in our stored data:
            push entry into data['articles']
open article.json
dump data into article.json
This pseudocode then translates into the next block of code.
Finally, let’s make the script scrape the RSS feeds every X minutes.
We do this with a Homebrew formula called watch, which you can install by entering this line in the terminal:
brew install watch
Now run the Python script through watch (-n 60 reruns it every 60 seconds), and you should see data flowing into your JSON file:
watch -n 60 python3 app.py
Extension to part 1 – store JSON online
In this extension, we will learn how to store our JSON file online so that we can load the data later with our React app.
To host our JSON, we’re going to use a site called JSON Blob. We’ll also use a library called requests (pip install requests) so that we can update the JSON file hosted on JSON Blob.
*** Blobs that are not accessed in 150 DAYS will be removed. ***
First, visit JSON Blob.
Create a new JSON Blob by clicking on New, then save the blob.
Once you have saved it, a modal will pop up.
Make sure to save the link that’s provided to you.
Example of what you should see after saving the JSON Blob: copy the URL provided.
Explore the JSON Blob API documentation here to discover the endpoints it provides.
In our case, we want to access the update endpoint so that we can make changes to the JSON Blob that we just created.
Let’s discuss the pseudocode:
import requests
once we’re done collecting data [part 1]:
    define the URL we want to send the request to
    define the headers
    make the update request
This pseudocode then translates into the next block of code.
On line 5, we import requests.
On lines 29-30, we define the URL and headers.
Finally, on line 32, we make the update request to JSON Blob.
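As a rough sketch of the full app.py, here is one way to combine the part 1 pseudocode (load article.json, keep new entries whose summary mentions Trump, save) with the JSON Blob update. Several details are assumptions: the feed URL, the BLOB_URL placeholder (check the JSON Blob API documentation for the exact update endpoint), the helper names, the case-insensitive match, and storing only title, link, summary and published for each entry. The line numbers will not match the walkthrough above.

```python
import json
import os

# Placeholders (assumptions): use real RSS feeds and your own JSON Blob ID.
NEWS_SOURCES = [
    "https://example.com/news/rss.xml",
]
DATA_FILE = "article.json"
BLOB_URL = "https://jsonblob.com/api/jsonBlob/YOUR_BLOB_ID"  # hypothetical ID


def load_data(path):
    """Load saved articles, or start fresh if the file is missing or empty."""
    if os.path.exists(path) and os.path.getsize(path) > 0:
        with open(path) as f:
            return json.load(f)
    return {"articles": []}


def add_matching_entries(data, entries, keyword="trump"):
    """Append entries whose summary mentions `keyword` and whose link is new.

    Returns the number of entries added.
    """
    seen_links = {article["link"] for article in data["articles"]}
    added = 0
    for entry in entries:
        summary = entry.get("summary", "")
        link = entry.get("link")
        # Case-insensitive match (an assumption; the pseudocode checks 'trump').
        if link and keyword.lower() in summary.lower() and link not in seen_links:
            # Store a few plain string fields rather than the raw feed entry.
            data["articles"].append({
                "title": entry.get("title", ""),
                "link": link,
                "summary": summary,
                "published": entry.get("published", ""),
            })
            seen_links.add(link)
            added += 1
    return added


def save_data(path, data):
    """Write all collected articles back to the local JSON file."""
    with open(path, "w") as f:
        json.dump(data, f, indent=2)


def main():
    # Third-party packages: pip install feedparser requests
    import feedparser
    import requests

    data = load_data(DATA_FILE)
    for source in NEWS_SOURCES:
        feed = feedparser.parse(source)
        add_matching_entries(data, feed.entries)
    save_data(DATA_FILE, data)

    # Replace the blob's contents with the full data set (HTTP PUT).
    headers = {"Content-Type": "application/json", "Accept": "application/json"}
    response = requests.put(BLOB_URL, json=data, headers=headers, timeout=10)
    response.raise_for_status()


if __name__ == "__main__":
    main()
```

Because the link check runs against everything already saved, rerunning the script every minute with watch only appends articles it hasn’t seen before.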
Run the Python script!
watch -n 60 python3 app.py
YAY! We’re done with part 1! In the second part, we will learn how to connect this code to React and display the data we’ve collected.
Many thanks to aizat for making improvements to the code, and to Amy for proofreading!
Link to GitHub project here.
If you liked this article, please do share it :)
Best regards,
Edwin Capel