Can Python be used for Facebook chat backup? Of course it can!

Enter a directory where you'd like to store your code and run

scrapy startproject facebook

This will create a facebook directory with the following contents:

facebook/
    scrapy.cfg            # deploy configuration file
    facebook/             # project's Python module; you'll import your code from here
        __init__.py
        items.py          # project items definition file
        middlewares.py    # project middlewares file
        pipelines.py      # project pipelines file
        settings.py       # project settings file
        spiders/          # a directory where you'll later put your spiders
            __init__.py

Don't worry if you don't understand what's going on here; we won't be touching most of these files. The only changes we make are writing a spider to scrape the content and setting ROBOTSTXT_OBEY = False in settings.py, since Facebook won't allow bots to log in. You can learn more about robots.txt here.

Let's build our spider

Create a Python file under the spiders directory, then import scrapy, pandas, and FormRequest, which we'll use to feed in the credentials for logging in.

Here fb_text is the name of our spider. We can write any number of spiders under the spiders directory, each serving a different purpose; in our case we could write one for scraping posts, another for comments, and so on. Each spider must have a unique name.

We pass the login credentials through the terminal when we run the spider:

scrapy crawl fb_text -a email="FB USER EMAIL" -a password="FB USER PASSWORD"

After receiving the credentials, we feed them to FormRequest, which fills in the login form (user_email and password) at the URL in start_urls and returns the home page.

Let's add some superpowers to our spider

Now that the structure is defined, it's time to give our spider some superpowers. One is that it should be able to crawl through pages to fetch content; the other is to scrape that content/data. The Request function sends the response to its callback function: in our case, we reach the messages page, fetch the people we've had conversations with along with their links, and from that list scrape one conversation.

Above is the core part of the spider: it fetches the conversation between the two parties along with timestamps and writes it out to a CSV file. The full spider file can be found here. For simplicity and easier understanding, Scrapy's item pipeline is not used for storing the data.

How to use

Make sure to clone this repository if you skipped the previous part. Navigate to the project's top-level directory and launch Scrapy with:

scrapy crawl fb -a email="EMAILTOLOGIN" -a password="PASSWORDTOLOGIN"

This will list the 10 most recent conversations. Select the conversation to be scraped; the bot/spider will scrape it down to the very last message in that conversation and return a CSV file with the columns Name, Text, Date. Check out the sample below.

Road ends here. Github repo

In the Pipeline

Data is the starting point for solving any ML/AI problem, and we don't always end up with well-structured data. This is where web scraping comes in handy: it lets us scrape/fetch data straight from a website. Tutorials on web scraping from the basics will be posted in the future; make sure to follow and support. More details
