A Simple Example of Refactoring with the R magrittr Package
Alexis Idlette-Wilson
Jan 12
Photo by Lidya Nada on Unsplash
Confession time.
I used to be a little bit of a tutorial junkie.
If it said “Introduction to,” I was there for it.
The problem with tutorials, though, is that they continually tell you what to do next.
Of course, one needs that at first (I assume), but the real learning occurs when you pick up your own project and try to apply the concepts from the tutorial without direction.
For this reason, I feel delighted when I find a real world situation that allows me to gingerly connect those dots I picked up from my most recent online class.
Like most R programmers, I dutifully downloaded the magrittr package with the promise of syntactically efficient chains of code.
I copied and pasted several dozen “%>%” pipes before I actually conceived an original thought to utilize one outside of a pre-written exercise.
I found a small data set of wholesale client data that I thought was perfect to analyze with the factoextra package for cluster analysis.
The data set was minimal; it listed the revenue from each wholesale customer in six different sectors: Fresh, Milk, Grocery, Frozen, Detergents_Paper and Delicassen.
In this imaginary world, I am an international grocer.
To analyze the characteristics of the resulting clusters, I needed values that possessed some comparative value.
The obvious solution was to calculate each sector’s proportion of revenue for each client.
After pulling the data into a data.frame, I initially wrote the following code.
library(dplyr)  # provides mutate()

# Add a total-spend column, then each sector's share of that total
data2 <- mutate(data1,
  totalSpend      = Fresh + Milk + Grocery + Frozen + Detergents_Paper + Delicassen,
  FreshSpend      = Fresh / totalSpend,
  MilkSpend       = Milk / totalSpend,
  GrocerySpend    = Grocery / totalSpend,
  FrozenSpend     = Frozen / totalSpend,
  DPSpend         = Detergents_Paper / totalSpend,
  DelicassenSpend = Delicassen / totalSpend
)
# View(data2)

# Standardize the newly added columns (9 through 15) for analysis
dataX <- scale(data2[9:15])

Yes, when I wrote these lines I knew I had committed the cardinal coding sin of repeating myself.
In the name of progress, I did what came to me at that moment and then I repented.
The above code uses the mutate function from dplyr (part of the tidyverse) to add columns to the data set.
I then scaled the data for analysis.
Below see the resulting data table.
A couple of days later, I was reading some technical article and came across the prop.table function.
I had a Homer Simpson moment and proceeded to refactor my code.
That’s when the brain cells way in the back woke up and told me to use a pipe.
Well, you know what happened next.
Ten lines of code became two.
dataProp <- prop.table(as.matrix(data1[3:8]), 1) %>%
  scale()

I created a new table object named dataProp to hold the output of prop.table.
The prop.table function creates a table wherein each cell corresponds to each original cell’s proportion of some value.
With the “1” parameter added, prop.table calculates each cell’s proportion to the total of the row.
In this case, it calculates the percentage of each customer’s spend attributable to each grocery category.
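To make the margin argument concrete, here is a minimal sketch on a tiny made-up matrix. The client names and numbers are hypothetical and are not taken from the wholesale data set.

```r
# Toy revenue matrix: two hypothetical clients, three sectors (made-up numbers)
spend <- matrix(
  c(10, 30, 60,
    25, 25, 50),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("clientA", "clientB"), c("Fresh", "Milk", "Grocery"))
)

# margin = 1: divide each cell by its row total (share of each client's spend)
row_props <- prop.table(spend, 1)
print(row_props)

# margin = 2: divide each cell by its column total instead (share of each sector)
col_props <- prop.table(spend, 2)
print(col_props)

# With margin = 1, every row adds up to 1
print(rowSums(row_props))
```

With margin 1, each row describes how one client splits its spending across sectors, which is the comparison the clustering needs.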
A deep dive into the prop.table function can be found here.
prop.table was not happy with my data frame and gave me a very nasty error before I added the as.matrix function:

Error in margin.table(x, margin) : 'x' is not an array

After using as.matrix, the line ran without an error and I got the same results as the first table.
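The data frame problem can be reproduced in a small sketch. The data frame below is a hypothetical stand-in with made-up numbers, and the exact wording of the error may differ between R versions, so the code only checks that an error occurs.

```r
# Hypothetical stand-in for the wholesale data (made-up numbers)
df <- data.frame(
  Fresh   = c(10, 25, 40),
  Milk    = c(30, 25, 40),
  Grocery = c(60, 50, 20)
)

# Passing the data frame directly with a margin fails: a data frame is not an array
result <- tryCatch(prop.table(df, 1), error = function(e) e)
print(inherits(result, "error"))

# Converting to a matrix first works
props <- prop.table(as.matrix(df), 1)

# Cross-check against dividing every cell by its row total by hand
manual <- as.matrix(df) / rowSums(df)
max_diff <- max(abs(props - manual))
print(max_diff < 1e-12)
```

The manual division is the same arithmetic as the repeated mutate columns, which is why the two approaches should agree on the proportion columns.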
Originally, there was more code involved but I realized I didn’t need half of it.
I figured I would share my little triumph with you anyway and discuss all the other bits another time.
Anyone else have those moments when you surprise yourself by actually remembering something?
That may just be an old folks’ thing.