# Creating functions and using lapply in R

Michael Grogan (MGCodesandStats), May 31

Functions are used to simplify a series of calculations.

For instance, let us suppose that there exists an array of numbers which we wish to add to another variable.

Instead of carrying out separate calculations for each number in the array, it would be much easier to simply create a function that does this for us automatically.

A function in R generally works by:

a. Defining the variables to include in the function and the calculation, e.g. to add two numbers together, our function is:

    number1addnumber2 <- function(number1, number2) {(number1 + number2)}

b. Using sapply to apply the function to each number in a vector, typically passing an associated sequence as the second argument:

    sapply(c(20, 40, 60), number1addnumber2, seq(2, 20, by = 2))

To see how this works in practice, let us take a look at the examples below:

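## 1. Add an array of numbers to a sequence

    > #Function1
    > number1addnumber2<-function(number1,number2) {(number1+number2)}
    > result1<-sapply(c(20,40,60),number1addnumber2,seq(2,20,by=2))
    > result1df<-data.frame(result1)
    > result1df
       X1 X2 X3
    1  22 42 62
    2  24 44 64
    3  26 46 66
    4  28 48 68
    5  30 50 70
    6  32 52 72
    7  34 54 74
    8  36 56 76
    9  38 58 78
    10 40 60 80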
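To make the sapply call above less of a black box, here is a minimal sketch of what it does with the extra argument: the function is called once per element of the first vector, the sequence is passed unchanged as the second argument, and the three resulting vectors become the columns of a matrix. The names `add_two`, `steps`, `manual` and `via_sapply` are illustrative assumptions.

```r
# Illustrative re-definition of the addition function (hypothetical name)
add_two <- function(number1, number2) number1 + number2

steps <- seq(2, 20, by = 2)

# What sapply does, written out by hand: one call per element, then cbind
manual <- cbind(add_two(20, steps), add_two(40, steps), add_two(60, steps))

# The same computation through sapply; `steps` is passed on as number2
via_sapply <- sapply(c(20, 40, 60), add_two, steps)

dim(via_sapply)              # 10 rows, 3 columns
all(manual == via_sapply)    # TRUE
```

Each column of the result therefore corresponds to one starting number, and each row to one value of the sequence.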

## 2. Subtract a sequence from an array of numbers

    > #Function2
    > number1minusnumber2<-function(number1,number2) {(number1-number2)}
    > result2<-sapply(c(20,40,60),number1minusnumber2,seq(2,20,by=2))
    > result2df<-data.frame(result2)
    > result2df
       X1 X2 X3
    1  18 38 58
    2  16 36 56
    3  14 34 54
    4  12 32 52
    5  10 30 50
    6   8 28 48
    7   6 26 46
    8   4 24 44
    9   2 22 42
    10  0 20 40
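The default column labels X1, X2 and X3 do not say which starting number each column belongs to. One possible way to make the table self-describing is to set `dimnames` on the matrix before converting it to a data frame. This is only a sketch, and the names `starts`, `steps`, `labelled` and `labelled_df`, as well as the label prefixes, are illustrative assumptions.

```r
starts <- c(20, 40, 60)
steps  <- seq(2, 20, by = 2)

# Same subtraction as in the example, written as an anonymous function
labelled <- sapply(starts, function(number1, number2) number1 - number2, steps)

# Rows are labelled by the subtracted value, columns by the starting number
dimnames(labelled) <- list(paste0("minus_", steps), paste0("start_", starts))

# check.names = FALSE keeps the column labels exactly as written
labelled_df <- data.frame(labelled, check.names = FALSE)

labelled_df["minus_2", "start_20"]   # 18
```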

## 3. Multiply an array of numbers by a sequence

    > #Function3
    > multiplynumber1bynumber2 <-function(number1,number2) {(number1*number2)}
    > result3<-sapply(c(20,40,60), multiplynumber1bynumber2, seq(2,20, by=2))
    > result3df=data.frame(result3)
    > result3df
        X1  X2   X3
    1   40  80  120
    2   80 160  240
    3  120 240  360
    4  160 320  480
    5  200 400  600
    6  240 480  720
    7  280 560  840
    8  320 640  960
    9  360 720 1080
    10 400 800 1200

## 4. Divide an array of numbers by a sequence

    > #Function4
    > dividenumber1bynumber2<-function(number1,number2) {(number1/number2)}
    > result4<-sapply(c(20,40,60), dividenumber1bynumber2, seq(2, 20, by=2))
    > result4df<-data.frame(result4)
    > result4df
              X1        X2        X3
    1  10.000000 20.000000 30.000000
    2   5.000000 10.000000 15.000000
    3   3.333333  6.666667 10.000000
    4   2.500000  5.000000  7.500000
    5   2.000000  4.000000  6.000000
    6   1.666667  3.333333  5.000000
    7   1.428571  2.857143  4.285714
    8   1.250000  2.500000  3.750000
    9   1.111111  2.222222  3.333333
    10  1.000000  2.000000  3.000000

## 5. Raise number to a power

    > #Function5
    > raisenumber1bypower<-function(number1,power) {(number1^power)}
    > result5<-sapply(c(20,40,60),raisenumber1bypower,seq(0.5,2.5,by=0.5))
    > result5df<-data.frame(result5)
    > result5df
               X1           X2           X3
    1    4.472136     6.324555     7.745967
    2   20.000000    40.000000    60.000000
    3   89.442719   252.982213   464.758002
    4  400.000000  1600.000000  3600.000000
    5 1788.854382 10119.288513 27885.480093
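The examples above share one pattern in which only the operator changes. As a compact alternative, and not the approach used in this post, the arithmetic operators in R are themselves functions and can be passed to sapply by name. The sketch below uses lapply over a named list of operators to build the four tables from examples 1 to 4 in one pass. The names `starts`, `steps`, `ops` and `results` are hypothetical.

```r
starts <- c(20, 40, 60)
steps  <- seq(2, 20, by = 2)

# Arithmetic operators are ordinary functions and can be looked up by name
ops <- list(add = "+", subtract = "-", multiply = "*", divide = "/")

# For each operator, run the same sapply call used in the examples
results <- lapply(ops, function(op) sapply(starts, op, steps))

names(results)          # "add" "subtract" "multiply" "divide"
results$add[1, ]        # 22 42 62
results$divide[10, ]    # 1 2 3
```

Writing a named function for each operation, as the examples do, remains the more readable choice once the calculation involves more than a single operator.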

## 6. Create a probability function

This function calculates a mu variable from lambda and powers.

You can see more details on how the function below works at my other post.
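Written out as a formula, the calculation is shown below, using $n$ as an assumed symbol for powers. One possible reading, which is an assumption rather than something stated here, is that if $\lambda$ is the probability of an event in a single independent trial, then $\mu$ is the probability of seeing the event at least once in $n$ trials. The worked value corresponds to lambda = 0.02 and powers = 10.

```latex
\mu(\lambda, n) = 1 - (1 - \lambda)^{n},
\qquad
\mu(0.02, 10) = 1 - 0.98^{10} \approx 0.1829
```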

    > #Functionprob
    > muCalculation <- function(lambda, powers) {1 - ((1 - lambda)^powers)}
    > probability_at_lambda <- sapply(c(0.02, 0.04, 0.06), muCalculation, seq(0, 100, 10))
    > probability_at_lambdadf=data.frame(probability_at_lambda)
    > probability_at_lambdadf
              X1        X2        X3
    1  0.0000000 0.0000000 0.0000000
    2  0.1829272 0.3351674 0.4613849
    3  0.3323920 0.5579976 0.7098938
    4  0.4545157 0.7061424 0.8437444
    5  0.5542996 0.8046338 0.9158384
    6  0.6358303 0.8701142 0.9546693
    7  0.7024469 0.9136477 0.9755842
    8  0.7568774 0.9425902 0.9868493
    9  0.8013511 0.9618321 0.9929168
    10 0.8376894 0.9746247 0.9961849
    11 0.8673804 0.9831297 0.9979451

## Run Functions Across Lists Using lapply

Functions are very useful when it comes to running a command more than once on particular groups of data.

While we could use a for loop for this purpose, combining a pre-defined function with lapply is a concise and useful alternative in R.
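As a minimal sketch of that comparison, the snippet below computes the mean of each element of a small list twice: once with a for loop that fills a pre-allocated list, and once with lapply. The list `groups` and the helper `group_mean` are made-up illustrations, not data from this post.

```r
# Hypothetical example data: two small numeric groups
groups <- list(A = c(1, 2, 3), B = c(10, 20, 30))

# Pre-defined function to run on every group
group_mean <- function(x) mean(x)

# Version 1: explicit for loop filling a pre-allocated list
loop_result <- vector("list", length(groups))
names(loop_result) <- names(groups)
for (name in names(groups)) {
  loop_result[[name]] <- group_mean(groups[[name]])
}

# Version 2: lapply applies the function to each element and returns a list
lapply_result <- lapply(groups, group_mean)

identical(loop_result, lapply_result)   # TRUE
```

Both versions give the same named list. The lapply version simply removes the bookkeeping of allocating and indexing the result.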

Let’s see how it works.

Suppose that we have two groups of time series (TS1 and TS2).

The objective is to split these two time series and then run an ARIMA forecast on both of them.

Instead of running the ARIMA forecast twice, we wish to use a function to run it repeatedly in a similar manner to a loop.

Firstly, we define our data frame and then split it by group. Note that the values are entered as numbers rather than quoted strings, since auto.arima needs numeric input:

    group <- c(rep("TS1", 20), rep("TS2", 20))
    value <- c(23, 19, 22, 31, 33, 25, 32, 29, 32, 34,
               41, 45, 47, 49, 42, 44, 43, 50, 55, 57,
               410, 395, 402, 403, 401, 390, 420, 415, 417, 410,
               425, 427, 423, 430, 432, 428, 410, 405, 410, 414)
    df1 <- data.frame(group, value)
    newlist <- split(df1, df1$group)
    newlist

Then, we define our forecast function using auto.arima from the forecast package:

    library(forecast)
    arm <- function(x) plot(forecast(auto.arima(x$value), 10))

We can now use lapply to run the function on the list that has now been split by group:

    armforecast <- lapply(newlist, arm)

Upon doing this, R draws an ARIMA forecast plot for each of our time series.
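Because the function above only draws plots, the list returned by lapply is of limited use afterwards. If the forecast values themselves are needed, one option is to return them from the function instead. The sketch below is a variation on the example rather than the method used in this post: to stay free of package dependencies it swaps auto.arima for base R's `arima` with a fixed ARIMA(0,1,1) order chosen purely for illustration, and uses `predict` for the ten-step forecast. The names `series`, `forecast_values` and `forecasts` are hypothetical.

```r
# Same data as in the example, stored as numeric vectors in a named list
series <- list(
  TS1 = c(23, 19, 22, 31, 33, 25, 32, 29, 32, 34,
          41, 45, 47, 49, 42, 44, 43, 50, 55, 57),
  TS2 = c(410, 395, 402, 403, 401, 390, 420, 415, 417, 410,
          425, 427, 423, 430, 432, 428, 410, 405, 410, 414)
)

# Fit a fixed-order ARIMA model (assumed order, not selected automatically)
# and return the point forecasts for the next `h` periods
forecast_values <- function(x, h = 10) {
  fit <- arima(x, order = c(0, 1, 1))
  as.numeric(predict(fit, n.ahead = h)$pred)
}

# One numeric vector of forecasts per series, keyed by series name
forecasts <- lapply(series, forecast_values)

names(forecasts)      # "TS1" "TS2"
lengths(forecasts)    # 10 forecasts for each series
```

Returning values rather than plotting inside the function keeps the lapply result reusable, for example for tabulating or plotting the forecasts later.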

To summarise, we have taken a look at how to create functions in R, and how we can use lapply to replicate a for loop by running a function numerous times on data that has been split according to certain criteria.

Note that you can find even more detail on how to use apply, sapply and lapply at this post from R-bloggers.