Python vs Excel — Compound Annual Growth Rate (CAGR)

Two lines of code to have your very own CAGR function.

You will notice a slight difference from the Excel function.

In this function, there is a minus 1 after num_periods.

This lets me calculate the CAGR correctly by passing the total number of time periods as an argument, rather than the number of compounding periods (which is always one less than the total).

I do this because I have, in the past, repeatedly made the mistake of specifying the wrong number of compounding periods in Excel.
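The function itself is not reproduced in the text above, so here is a minimal sketch of what it could look like, assuming the standard CAGR formula. The name `cagr` and the argument order follow the call `cagr(start_value, end_value, num_periods)` used later in this article; the body is an assumption based on the description of the "minus 1 after num_periods".

```python
def cagr(start_value, end_value, num_periods):
    """Compound annual growth rate over `num_periods` observations.

    `num_periods` is the total number of time periods (rows), so the
    number of compounding periods is `num_periods - 1`.
    """
    return (end_value / start_value) ** (1 / (num_periods - 1)) - 1


# 100 growing to 121 across three yearly observations means two
# compounding periods, i.e. roughly 10% a year.
print(cagr(100, 121, 3))
```

Passing the row count directly means you never have to subtract one by hand, which is exactly the step that tends to go wrong in a spreadsheet.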

Now that you have your CAGR function, you can either save it in a separate Python file (from which you can import it into any other Python file) or define it in the same file where you load the Excel table into a dataframe, and apply it there.

To apply your CAGR function, first import the Pandas library:

    import pandas as pd

Pandas is an open-source, easy-to-use Python library that can convert any CSV or Excel file into a dataframe for data analysis.

It’s a must-have for any data analyst programming in Python.

Next, load the Excel file into a Pandas dataframe:

    ExcelFile = 'ExcelTable.xlsx'  # name of my Excel file
    df = pd.read_excel(ExcelFile, index_col='Year')

Notice that I’ve added the parameter index_col to assign the ‘Year’ column as the index, leaving ‘Yearly Income (RM)’ as the only column in the dataframe.

Print your dataframe to check that the Excel table was successfully converted:

    print(df)

Looks like it came out right. You now have a dataframe to which you can apply the CAGR function.
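If you do not have the Excel file at hand, a stand-in dataframe is enough to follow along. The sketch below builds one from an in-memory CSV, using the same index_col idea as the pd.read_excel call. Only the first value (36000), the last value (102000) and the row count (9) come from the output quoted in this article; the years and the values in between are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-in for ExcelTable.xlsx. The years and the
# intermediate incomes are illustrative, not the article's real data.
csv_text = """Year,Yearly Income (RM)
2008,36000
2009,40000
2010,47000
2011,55000
2012,63000
2013,72000
2014,81000
2015,90000
2016,102000
"""

# index_col="Year" turns the Year column into the index, so the income
# column is the only data column left.
df = pd.read_csv(io.StringIO(csv_text), index_col="Year")

print(df)
print(list(df.columns))
```

Because 'Year' becomes the index, rows can later be looked up by year label instead of by position.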

The keys to the CAGR function in Python, and the reason it is much easier to use there, are the variables below:

    start_value = float(df.iloc[0])
    end_value = float(df.iloc[-1])
    num_periods = len(df)

These variables extract the arguments you need for your CAGR function. In this case, printing them returns:

    36000.0
    102000.0
    9

The great thing about these variables is that the code does not have to change even as your dataset grows.

Using the .iloc method together with list-style indexing ensures that the function always uses the first row as the start value and the last row as the end value, and the len function always counts the total number of time periods correctly.

Even if you add an extra year(s), the code works the same.
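To make that concrete, here is a small sketch that wraps the three variables in a helper and runs it before and after the dataset grows. The helper name `cagr_inputs`, the years, the intermediate values and the year of the added row are all assumptions for illustration. It also reads the scalar directly with `.iloc[0, 0]`, which avoids converting a one-row Series to float.

```python
import pandas as pd

# Hypothetical data: only the first value (36000), the last value
# (102000) and the row count (9) match the article; the rest is made up.
df = pd.DataFrame(
    {"Yearly Income (RM)": [36000, 40000, 47000, 55000, 63000,
                            72000, 81000, 90000, 102000]},
    index=pd.Index(range(2008, 2017), name="Year"),
)


def cagr_inputs(frame):
    """Return (start_value, end_value, num_periods) for a one-column frame."""
    start_value = float(frame.iloc[0, 0])   # first row, first column
    end_value = float(frame.iloc[-1, 0])    # last row, first column
    num_periods = len(frame)                # total number of rows
    return start_value, end_value, num_periods


print(cagr_inputs(df))  # (36000.0, 102000.0, 9)

# Add one more (hypothetical) year; the helper itself does not change.
df.loc[2017] = [84000]
print(cagr_inputs(df))  # (36000.0, 84000.0, 10)
```

The extraction code is identical in both calls; only the data changed.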

Say, for example, you add a row to the dataframe:

    df.loc =

Printing the same variables returns different output, in line with the new data:

    36000.0
    84000.0
    10

If you want to restrict the CAGR calculation to a specific time period, say the 5 years from 2012 to 2016, you can use the .loc method to do so easily:

    start_value = float(df.loc[2012])
    end_value = float(df.loc[2016])
    num_periods = len(df.loc[2012:2016])

Now let’s apply the CAGR function to this sliced dataset by using the variables as input arguments:

    result = cagr(start_value, end_value, num_periods)
    print(result)

    0.12801507993497308

But wait! The output shows the result as a float with too many decimals.

Though I personally prefer this kind of output, it is generally not easy on the eyes.

We will need string formatting to present the output as a percentage:

    print("{:.2%}".format(result))

    12.80%

Much more human-friendly.

And that’s all you need to perform CAGR analysis in Python!
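As a recap, the steps can be put together in one short script. This is only a sketch under stated assumptions: the `cagr` body is reconstructed from the description earlier in this article, `cagr_between` is a hypothetical helper name, and the dataframe holds made-up values chosen to be consistent with the figures quoted above (the 2012 and 2016 incomes in particular are assumptions).

```python
import pandas as pd


def cagr(start_value, end_value, num_periods):
    # num_periods counts rows, so compounding periods = num_periods - 1
    return (end_value / start_value) ** (1 / (num_periods - 1)) - 1


def cagr_between(frame, first_year, last_year, column):
    """CAGR of `column` between two index labels, both inclusive."""
    window = frame.loc[first_year:last_year, column]  # label-based slice
    return cagr(float(window.iloc[0]), float(window.iloc[-1]), len(window))


# Hypothetical data, not the article's real Excel table.
df = pd.DataFrame(
    {"Yearly Income (RM)": [36000, 40000, 47000, 55000, 63000,
                            72000, 81000, 90000, 102000, 84000]},
    index=pd.Index(range(2008, 2018), name="Year"),
)

result = cagr_between(df, 2012, 2016, "Yearly Income (RM)")
print(f"{result:.2%}")  # f-string equivalent of "{:.2%}".format(result)
```

With this made-up data, where income goes from 63000 in 2012 to 102000 in 2016, the script prints 12.80%. Changing the two year arguments is all it takes to analyse a different window.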