Let us see how using a multilevel model can help us accomplish this.
First, the relevant libraries and the dataset are imported.

# Import libraries
library(lme4)
library(ggplot2)
library(reshape2)
library(dplyr)
library(data.table)

# Load data and convert to numeric
setwd("yourdirectory")
mydata <- read.csv("…")   # your CSV file
attach(mydata)

From this dataset, we import General Purchases across different agencies (identified by their Agency Number), along with the Amount data. All positive values of Amount are assumed to represent purchases from these vendors.
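The source says positive amounts are taken to be purchases but does not show a filtering step. If the raw file also contains zero or negative amounts (refunds, for example), dropping them in base R might look like the minimal sketch below. The toy data frame and its vendor names are hypothetical stand-ins for the real CSV.

```r
# Hypothetical toy data standing in for the real purchases CSV
mydata <- data.frame(
  Vendor = c("Acme", "Acme", "Globex", "Initech", "Initech"),
  Amount = c(120.5, -40, 310, 0, 75.25)
)

# Keep only rows with a strictly positive Amount, i.e. the rows assumed to be purchases
purchases_only <- subset(mydata, Amount > 0)
print(purchases_only)
```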
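The Vendor variable is converted into numeric format, and the data frame is rebuilt with this numeric copy attached:

Vendor <- as.numeric(as.factor(Vendor))   # integer codes; as.numeric() alone gives NA for text columns
mydata <- data.frame(mydata, Vendor)      # the new column is named Vendor.1
attach(mydata)

The multilevel model is then fitted, and the conditional modes of the random effects are extracted using ranef():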
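Two details of this step are easy to trip over. Since R 4.0.0, read.csv() no longer turns text columns into factors by default, so as.numeric() applied directly to vendor names yields NA; going through a factor gives integer codes instead. And because mydata already has a column called Vendor, data.frame() makes the new column's name unique as Vendor.1, which is why that name appears in the model formula. A small sketch with hypothetical vendor names:

```r
# Hypothetical vendor names standing in for the real data
mydata <- data.frame(Vendor = c("Acme", "Globex", "Acme", "Initech"))
Vendor <- mydata$Vendor

# Direct conversion of text fails (NA, with the warning suppressed here)
suppressWarnings(as.numeric("Acme"))

# Converting via a factor gives integer codes in alphabetical level order
Vendor <- as.numeric(as.factor(Vendor))
print(Vendor)  # 1 2 1 3

# "Vendor" is already taken, so data.frame() names the new column "Vendor.1"
mydata <- data.frame(mydata, Vendor)
print(names(mydata))
```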
mlevel <- lmer(Amount ~ 1 + (1 | Vendor.1), data = mydata)
ranef(mlevel)

Here are the regression results:

> mlevel
Linear mixed model fit by REML ['lmerMod']
Formula: Amount ~ 1 + (1 | Vendor.1)
   Data: mydata
REML criterion at convergence: 4967261
Random effects:
 Groups   Name        Std.Dev.
 Vendor.1 (Intercept) 4616
 Residual             5910
Number of obs: 244051, groups:  Vendor.1, 39789
Fixed Effects:
(Intercept)
      574.2

For the purchase data, the fixed and random effects are added together to give an estimated average purchase amount for each vendor, and these estimates are plotted for a slice of about 20 vendors near the end of the list.
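lme4 can also perform this addition: coef() on a fitted lmer model returns, for each group, the fixed effects plus that group's conditional modes. The sketch below checks this on a small simulated dataset, assuming lme4 is installed; the vendor count, means and standard deviations are made-up values, not taken from the article's data.

```r
library(lme4)

set.seed(1)
# Hypothetical simulated data: 30 vendors with 10 purchases each
n_vendors <- 30
sim <- data.frame(Vendor = factor(rep(seq_len(n_vendors), each = 10)))
vendor_effect <- rnorm(n_vendors, mean = 0, sd = 50)
sim$Amount <- 500 + vendor_effect[as.integer(sim$Vendor)] + rnorm(nrow(sim), sd = 80)

# Random-intercept model with the same structure as the article's
fit <- lmer(Amount ~ 1 + (1 | Vendor), data = sim)

# Manual approach: fixed intercept plus each vendor's conditional mode
manual <- fixef(fit)[["(Intercept)"]] + ranef(fit)$Vendor[["(Intercept)"]]

# coef() returns the same per-vendor intercepts already combined
combined <- coef(fit)$Vendor[["(Intercept)"]]

print(all.equal(manual, combined))
```

With the article's model, coef(mlevel)$Vendor.1 should therefore hold the same per-vendor intercepts as the purchases data frame built in the code that follows.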
# Average sales (amount) by vendor: fixed intercept + each vendor's random intercept
purchases <- fixef(mlevel) + ranef(mlevel)$Vendor.1
purchases$Vendor.1 <- rownames(purchases)
names(purchases)[1] <- "Intercept"
purchases <- purchases[, c(2, 1)]

# Plot rows 39750 to 39770 of the per-vendor estimates
ggplot(purchases[39750:39770, ], aes(x = Vendor.1, y = Intercept)) + geom_point()

Now that an estimate has been obtained for each observed vendor, 20 simulations are run to generate predictions for 20 hypothetical new vendors; in other words, what sales could a new vendor entering this market expect? The fixed intercept is added to a normally distributed random number with mean 0 and a standard deviation of 200:

# Simulation – 20 new vendors
new_purchases <- data.frame(Vendor.1 = as.character(39800:39819),
                            Intercept = fixef(mlevel) + rnorm(20, 0, 200),
                            Status = "Simulated")
purchases$Status <- "Observed"
purchases2 <- rbind(purchases, new_purchases)

Now, the simulated amounts can be plotted against the observed amounts to gauge potential vendor sales:

# Plot simulated vs observed
ggplot(purchases2[39709:39809, ], aes(x = Vendor.1, y = Intercept, color = Status)) +
  geom_point() +
  geom_hline(yintercept = fixef(mlevel), linewidth = 1.5)

We can see that the simulated sales are more or less in line with those observed in the actual data.
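The simulation above uses a standard deviation of 200, while the model itself estimates the spread of vendor intercepts at about 4616. As an alternative (not the approach used in the article), here is a sketch that draws new-vendor intercepts from the fitted between-vendor distribution, with the two values copied by hand from the model output above. The vendor IDs follow the article's 39800:39819 convention.

```r
set.seed(42)

# Values read off the model output shown earlier
fixed_intercept <- 574.2   # Fixed Effects: (Intercept)
sd_vendor       <- 4616    # Random effects: Vendor.1 (Intercept) Std.Dev.

# Draw each new vendor's intercept from N(fixed_intercept, sd_vendor^2)
new_purchases <- data.frame(
  Vendor.1  = as.character(39800:39819),
  Intercept = fixed_intercept + rnorm(20, mean = 0, sd = sd_vendor),
  Status    = "Simulated"
)

summary(new_purchases$Intercept)
```

With the fitted model in memory, as.data.frame(VarCorr(mlevel)) reports the same standard deviations in its sdcor column, so they need not be typed in. With this wider spread a sizeable share of the simulated averages will fall below zero, a sign that a normal random intercept is only a rough description of purchase amounts.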
As mentioned, the advantage of a multilevel model is that it accounts for differences across levels (here, vendors) when the model is fitted. This avoids the problem of markedly different trends across levels being collapsed into the “one size fits all” result that a standard linear regression would give.
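One way to see how the model takes differences across vendors into account is the partial-pooling weight. For a random-intercept model with the variance components held at their estimates, each vendor's predicted average is w times its own mean plus (1 - w) times the overall intercept, where w = σ_u² / (σ_u² + σ²/n) and n is that vendor's number of purchases. A sketch using the standard deviations reported in the output above (pooling_weight is a hypothetical helper name):

```r
# Variance components taken from the model output
sd_vendor <- 4616   # between-vendor standard deviation (sigma_u)
sd_resid  <- 5910   # within-vendor residual standard deviation (sigma)

# Weight on a vendor's own mean when it has n purchases; 1 - w goes to the overall intercept
pooling_weight <- function(n) sd_vendor^2 / (sd_vendor^2 + sd_resid^2 / n)

n_purchases <- c(1, 10, 100)
print(round(pooling_weight(n_purchases), 3))
```

This gives roughly 0.38, 0.86 and 0.98. A vendor with a single purchase is pulled strongly toward the overall intercept (for n = 1 the weight equals the intraclass correlation), while a vendor with many purchases stays close to its own average. A single pooled regression would in effect set w to 0 for every vendor.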
Conclusion

In this example, we have seen:

- How to implement a multilevel model in R
- The advantages of these models in modelling data with multiple categories
- Running simulations with the model

You can also find another example of how to run a multilevel model here.
Thank you for your time! Feel free to view further data science and machine learning content at michaeljgrogan.