5 Lines of Code to Convince You to Learn R
A brief treatise for those on the fence
Isaac Faber
Feb 18
All of the code supporting this article can be forklifted from this MatrixDS project.
Some good advice for data scientists (or really anyone).
If you write the same code more than once, create a function.
If you give the same advice more than once, write a blog post.
So here it is.
This week for the thousandth time in my life I found myself convincing someone to learn R (an open source statistical programming language).
This conversation can be easy when someone is a business analyst that uses Excel every day.
If you struggle through a complicated set of spreadsheets, it quickly becomes apparent why writing code is superior.
It’s readable, reproducible, portable, and scalable.
However, most objections to R come from the developer and computer science communities.
I move between Python and R day-to-day and project-to-project.
As such, some of R’s syntax quirks don’t quite line up with the expectations of these communities.
But concern over quirks is not a good reason to avoid a powerful tool.
So for those on the fence about learning R, be you a business analyst or developer, here is my best attempt to convince you in the next 5 minutes.
Yes, R is designed for data, data science, and statistics, but it is also extremely flexible and robust.
Here are five simple lines of code (or rather, functions) that represent how awesome the language is.
Line 1
install.packages('any_package_on_CRAN')
Inside of every R session is a doorway that accesses the work of thousands of community members.
This line of code installs (from within R) packages from CRAN (Comprehensive R Archive Network), a network of volunteer servers around the world.
CRAN hosts a collection of totally open source and free ‘packages’ (libraries) that other R users have written.
Most importantly, you are directly connected to a massive network of engineers, statisticians, data scientists, scientists, and developers.
These packages extend the R language in almost every way possible.
They are carefully validated before they get to CRAN (through a hybrid of automated checks and human peer review), so you can be reasonably confident they will work across all sorts of platforms.
If you want to do something with data, there is probably a package for that.
Some fun examples include:
Web scraping: httr, rvest
Social Media: twitteR, Rfacebook, Rlinkedin
Business Ops: salesforcer, gmailr, officer
Finance: tidyquant, empirical finance
Cloud storage: Dropbox, googledrive, Box
Maps: maps, leaflet
Deep learning: keras, tensorflow
I am just scratching the surface with this list.
With the popularity of R, the number of available packages has grown exponentially.
At the time of writing, there are nearly 14,000 packages available.
If the usefulness of a programming language can be judged on the supporting libraries, R exceeds the wildest expectations.
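In practice, scripts often install a package only when it is missing, so they can be rerun safely on any machine. Below is a minimal sketch of that pattern; the helper name `install_if_missing()` is hypothetical (it is not part of base R), and the default repository assumes the public cloud.r-project.org CRAN mirror.

```r
# Hypothetical helper (not part of base R): install only the CRAN packages
# that cannot already be loaded in this R session.
install_if_missing <- function(pkgs, repos = "https://cloud.r-project.org") {
  # requireNamespace() quietly returns FALSE when a package is not installed
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing <- pkgs[!installed]
  if (length(missing) > 0) {
    install.packages(missing, repos = repos)
  }
  # Return the names that needed installing, without printing them
  invisible(missing)
}

# 'stats' and 'utils' ship with R, so nothing is downloaded here
install_if_missing(c("stats", "utils"))
```

Calling `install_if_missing(c("httr", "rvest"))` would download those two packages the first time and do nothing on later runs.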
Line 2
library(tidyverse)
Of all the packages available for R, the most useful for the basic tasks of data science and analytics is the tidyverse.
Written, in part, by Hadley Wickham, the tidyverse is a collection of packages that make common data science tasks simple, elegant, reproducible, and fast.
The beauty of these packages is that they all share a common grammar and design, so working with them is intuitive and efficient.
The core packages enable moving through most of the data science process with ease.
Reading data into R: readr, databases
Munging and wrangling data: dplyr, tidyr, tibble, stringr
Visualizing data: ggplot2
One of the biggest problems with programming languages is that it can be quite difficult to read someone else’s code.
However, when using the tidyverse and the core principles of tidy data, this common pain point melts away.
Common-sense verbs and basic code styling make interpreting others’ work a breeze.
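To give a flavor of those verbs, here is a minimal dplyr sketch on R’s built-in `cars` data. It assumes the tidyverse (or at least dplyr 1.0 or later) is installed, and the 15 mph cut-off between ‘slow’ and ‘fast’ is an arbitrary choice for illustration.

```r
library(dplyr)

# Each verb does one thing, and %>% passes the result on to the next step
speed_summary <- cars %>%
  mutate(speed_band = if_else(speed < 15, "slow", "fast")) %>%  # illustrative 15 mph cut-off
  group_by(speed_band) %>%
  summarise(
    n = n(),                 # number of cars in each band
    mean_dist = mean(dist),  # average stopping distance (ft)
    .groups = "drop"
  ) %>%
  arrange(speed_band)

print(speed_summary)
```

Read top to bottom, the pipeline describes itself: add a column, group, summarize, sort.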
Line 3
my_linear_model <- lm(formula = dist ~ speed, data = cars)
The first two lines of code are all about the vast ecosystem around R.
However, base R (what you get without any external packages) is terrific in its own right.
Available without importing any external libraries, this line of code is a vectorized (i.e., super fast) linear regression.
You can run this code on a data set with millions of rows and hundreds of columns.
As long as you don’t run out of memory on your computer, it will be extremely efficient.
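A quick way to check the efficiency claim on your own machine is to fit `lm()` to a simulated data set with a million rows. This is a sketch with made-up true coefficients (2, 3, and -1.5) chosen only for illustration; the timing you see will depend on your hardware.

```r
set.seed(1)  # make the simulation reproducible

n <- 1e6
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
# Illustrative true model: y = 2 + 3*x1 - 1.5*x2 + noise
sim$y <- 2 + 3 * sim$x1 - 1.5 * sim$x2 + rnorm(n)

# system.time() reports how long the fit took; the model is assigned as a side effect
timing <- system.time(big_model <- lm(y ~ x1 + x2, data = sim))
print(timing)
print(coef(big_model))  # estimates should land very close to 2, 3, -1.5
```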
Summary statistics, Monte Carlo simulation, matrix operations, and generalized linear models are just a few examples of what you get with base R out of the box.
What’s more, these models are implemented rigorously and produce interpretable output that is close to publication-ready.
For example:
summary(my_linear_model) # model summary
#> Call:
#> lm(formula = dist ~ speed, data = cars)
#>
#> Residuals:
#>     Min      1Q  Median      3Q     Max
#> -29.069  -9.525  -2.272   9.215  43.201
#>
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -17.5791     6.7584  -2.601   0.0123 *
#> speed         3.9324     0.4155   9.464 1.49e-12 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 15.38 on 48 degrees of freedom
#> Multiple R-squared:  0.6511, Adjusted R-squared:  0.6438
#> F-statistic: 89.57 on 1 and 48 DF,  p-value: 1.49e-12
The results can then be used for analysis or applications.
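For instance, the fitted model object can be queried directly. This short sketch uses base R’s `coef()` and `predict()`; the speeds of 10 and 20 mph are arbitrary example inputs.

```r
# Refit the model from Line 3 so this snippet runs on its own
my_linear_model <- lm(formula = dist ~ speed, data = cars)

# Estimated intercept and slope
print(coef(my_linear_model))

# Predicted stopping distance (ft) for two example speeds (mph),
# with 95% prediction intervals (columns: fit, lwr, upr)
new_speeds <- data.frame(speed = c(10, 20))
predictions <- predict(my_linear_model, newdata = new_speeds, interval = "prediction")
print(predictions)
```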
The major point here is that R is designed from the ground up for data science and statistics, whereas other languages, like Python, rely on external libraries for this (not that there is anything wrong with that!).
This really shines through when you use it.
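The other base R tools mentioned above work the same way, with no packages loaded. The sketch below is illustrative only: a Monte Carlo estimate of pi, the `cars` regression re-solved by hand with matrix algebra, and a logistic regression (a generalized linear model) on the built-in `mtcars` data. The specific models are chosen purely as examples.

```r
set.seed(42)

# Monte Carlo simulation: the share of random points in the unit square
# that fall inside the quarter circle approximates pi / 4
n <- 1e5
x <- runif(n)
y <- runif(n)
pi_hat <- 4 * mean(x^2 + y^2 <= 1)

# Matrix operations: solve the normal equations (X'X) b = X'y by hand
X <- cbind(1, cars$speed)  # design matrix: intercept column + speed
beta_hat <- solve(crossprod(X), crossprod(X, cars$dist))

# Generalized linear model: logistic regression of transmission type
# (am: 0 = automatic, 1 = manual) on car weight
logit_model <- glm(am ~ wt, data = mtcars, family = binomial)

print(pi_hat)
print(drop(beta_hat))  # should match coef(lm(dist ~ speed, data = cars))
print(coef(logit_model))
```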
Line 4
knit()
Every data science project requires some documentation.
At some point, you need to take that hard-earned data-driven insight and present it to someone else.
There are few better ways to do this than with RMarkdown.
RMarkdown, which uses knitr under the hood, ties together simple markdown syntax with R code ‘chunks.’
This results in a powerful documentation system.
The process is straightforward: ‘knit’ the R chunks, then render the resulting markdown to whatever format you want (HTML, PDF, Word, etc.).
This means that your code is immediately ready for presentation in many formats:
Reports/Papers
Slides
Websites
Interactive Dashboards
Books
Few other languages have this breadth of support from within their own ecosystem.
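To see the knit step in isolation, the sketch below writes a tiny R Markdown document to a temporary file and runs `knitr::knit()` on it, which executes the R code and weaves the results into a plain markdown file. It assumes the knitr package is installed; turning that markdown into HTML, PDF, or Word is typically done with `rmarkdown::render()`, which additionally requires Pandoc.

```r
library(knitr)

# A minimal R Markdown source: YAML header, inline R code, and one R chunk
rmd_source <- c(
  "---",
  "title: \"Stopping distances\"",
  "---",
  "",
  "The cars data set has `r nrow(cars)` rows.",
  "",
  "```{r}",
  "summary(cars$dist)",
  "```"
)

rmd_file <- tempfile(fileext = ".Rmd")
md_file <- tempfile(fileext = ".md")
writeLines(rmd_source, rmd_file)

# knit(): run the R code and write markdown containing its results
knit(input = rmd_file, output = md_file, quiet = TRUE)

cat(readLines(md_file), sep = "\n")
```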
You can learn R and then literally be ready to write and publish a book!
Line 5
runApp()
Many data science projects do not end with a static output (like we get from RMarkdown).
Instead, you want to deliver something that integrates into your customer’s operations.
Enter Shiny, the web application framework for R.
When making the leap from analysis to analytics, nothing is more accessible than Shiny.
I have worked on dozens of projects, and the most critical ones involve some type of automation.
For example, sometimes business problems require ongoing solutions; you can use Shiny to create a problem-specific app to leave with your customer.
What’s more, you do not have to start from scratch.
There is a great collection of examples.
The best way to get started is to build a simple and beautiful app with shinydashboard.
With just a few lines of R code and the runApp() command, you have a fully functional web application.
Then it can be hosted at places like MatrixDS or shinyapps.io.
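As an illustration of how little code a dashboard needs, here is a minimal sketch using shiny and shinydashboard. It assumes both packages are installed; the app itself (a slider that filters the `cars` data and redraws the regression line) is a hypothetical example, not one of the gallery apps.

```r
library(shiny)
library(shinydashboard)

# Layout: header, sidebar with one input, body with one plot
ui <- dashboardPage(
  dashboardHeader(title = "Stopping distances"),
  dashboardSidebar(
    sliderInput("max_speed", "Maximum speed (mph)", min = 5, max = 25, value = 25)
  ),
  dashboardBody(
    box(plotOutput("dist_plot"), width = 12)
  )
)

# Server logic: redraw the scatter plot and fit whenever the slider moves
server <- function(input, output, session) {
  output$dist_plot <- renderPlot({
    subset_cars <- cars[cars$speed <= input$max_speed, ]
    plot(dist ~ speed, data = subset_cars,
         xlab = "Speed (mph)", ylab = "Stopping distance (ft)")
    # Only draw a regression line when there are enough points to fit one
    if (nrow(subset_cars) > 2) {
      abline(lm(dist ~ speed, data = subset_cars), col = "steelblue")
    }
  })
}

app <- shinyApp(ui = ui, server = server)

# runApp() starts a local web server; only launch it in an interactive session
if (interactive()) runApp(app)
```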
Wrapping Up
With high-quality, community-driven resources, from packages to web applications, R is a must-know language for data science.
Moreover, when you learn R you join a super awesome and fun community.
Here are a few things you can do:
Go to a conference
Join a meetup
Follow Hadley Wickham on Twitter
Bookmark R Bloggers
Join R Ladies
In addition, I would be remiss if I didn’t mention that RStudio, the most popular development environment for R, is a major contributor to the open source movement.
The packages and the language have heavyweight backing across academia and industry.
Therefore, R is not going anywhere.
You might be asking how to get started.
Here are a few resources to get you off on the right foot.
Online Courses
Business Science University
DataCamp Intro to R
Udemy
Or Start Right Now
If you want to learn R for free at your own pace without having to install anything, just forklift this MatrixDS project and follow the directions.
Connect with me on LinkedIn: https://www.
com/in/isaacfaber/
Connect with me on MatrixDS: https://community.