How to write your favorite R functions — in Python?

How about if the distribution is Poisson? How do you calculate the inter-quartile range of a series of data points? How do you generate a few random numbers following a Student's t-distribution? The R programming environment lets you answer each of these with a single function call.

On the other hand, Python's scripting ability allows an analyst to use those statistics in a wide variety of analytics pipelines with limitless sophistication and creativity.

To combine the advantages of both worlds, one needs a simple Python-based wrapper library that contains the most commonly used functions for probability distributions and descriptive statistics, defined in R style, so that users can call those functions quickly without having to dig into the full Python statistical libraries and figure out their whole list of methods and arguments.

A Python wrapper script for the most convenient R functions

I wrote a Python script that defines, in Python, the most convenient and widely used R functions for simple statistical analysis.

After importing this script, you will be able to use those R functions naturally, just as in an R programming environment.

The goal of this script is to provide simple Python subroutines that mimic R-style statistical functions for quickly calculating density/point estimates, cumulative distributions, and quantiles, and for generating random variates, for various important probability distributions.

To maintain the spirit of R styling, no class hierarchy is used; only raw functions are defined in this file, so users can import this single Python script and call any of the functions by name whenever they need them.

Note that I use the word "mimic".

Under no circumstances am I claiming to emulate the true functional programming paradigm of R, which consists of a deep environment setup and complex interrelationships between those environments and objects.

This script just allows me (and I hope countless other Python users too) to quickly fire up a Python program or Jupyter notebook, import the script, and start doing simple descriptive statistics in no time.

That’s the goal, nothing more, nothing less.

Or perhaps you coded in R in grad school and are just starting to learn and use Python for data analysis.

You will be happy to see and use some of the same well-known functions in your Jupyter notebook, in a manner similar to how you used them in R.

Whatever the reason may be, it is fun :-)

Simple Examples

To start, just import the script and start working with lists of numbers as if they were data vectors in R.

```python
from R_functions import *
lst = [20, 12, 16, 32, 27, 65, 44, 45, 22, 18]
# <more code, more statistics...>
```

For example, say you want to calculate the Tukey five-number summary of a vector of data points.

You just call one simple function, fivenum, and pass it the vector.

It will return the five-number summary as a NumPy array.
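A function like fivenum can be a thin wrapper over NumPy. The sketch below is an illustration, not the author's code, and the names fivenum_percentile and fivenum_tukey are hypothetical. It shows two common definitions of the summary: quartiles taken from np.percentile (linear interpolation), and Tukey's hinges, which is the rule R's own fivenum() follows. The two can disagree on the lower and upper quartiles, so it is worth knowing which definition a wrapper uses.

```python
import numpy as np

def fivenum_percentile(x):
    """Min, Q1, median, Q3, max via NumPy's default (linear-interpolation) percentiles."""
    return np.percentile(np.asarray(x, dtype=float), [0, 25, 50, 75, 100])

def fivenum_tukey(x):
    """Tukey's five-number summary using hinges, the rule R's fivenum() follows."""
    xs = np.sort(np.asarray(x, dtype=float))
    n = len(xs)
    n4 = np.floor((n + 3) / 2) / 2
    # 1-based positions as in R, shifted to 0-based indices for NumPy
    d = np.array([1, n4, (n + 1) / 2, n + 1 - n4, n]) - 1
    # average the two neighbours when a position falls halfway between elements
    return 0.5 * (xs[np.floor(d).astype(int)] + xs[np.ceil(d).astype(int)])
```

For the vector used in the example that follows, fivenum_percentile gives 12, 18.5, 24.5, 41, 65 (the same values as that example's output), while fivenum_tukey gives 12, 18, 24.5, 44, 65, which is what R's fivenum returns for the same data.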

```
lst = [20, 12, 16, 32, 27, 65, 44, 45, 22, 18]
fivenum(lst)
> array([12. , 18.5, 24.5, 41. , 65. ])
```

Or say you want to answer the following question.

Suppose a machine outputs 10 finished goods per hour on average with a standard deviation of 2.

The output pattern follows an approximately normal distribution.

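What is the probability that the machine will output at least 7 but no more than 12 units in the next hour? With the hourly output X normally distributed with mean 10 and standard deviation 2, the answer is essentially this:

$$P(7 \le X \le 12) = F(12) - F(7) = \Phi\left(\frac{12-10}{2}\right) - \Phi\left(\frac{7-10}{2}\right) = \Phi(1) - \Phi(-1.5)$$

where F is the cumulative distribution function of X and Φ is the standard normal cumulative distribution function. You can obtain the answer with just one line of code using pnorm:

```
pnorm(12, 10, 2) - pnorm(7, 10, 2)
> 0.7745375447996848
```

Or consider the following. Suppose you have a loaded coin that lands heads up with probability 60% every time you toss it.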

You are playing a game of 10 tosses.

How do you plot the chances of every possible number of wins (from 0 to 10) with this coin? You can obtain a nice bar chart with just a few lines of code, using just one function, dbinom:

```python
import matplotlib.pyplot as plt

probs = []
for i in range(11):
    # probability of exactly i heads in 10 tosses with P(head) = 0.6
    probs.append(dbinom(i, 10, 0.6))

plt.bar(range(11), probs)
plt.show()
```

Simple interface for probability calculations

R offers an extremely simple and intuitive interface for quick calculations with the essential probability distributions.

The interface goes like this:

- d{distribution}: gives the density function value at a point x
- p{distribution}: gives the cumulative value at a point x
- q{distribution}: gives the quantile function value at a probability p
- r{distribution}: generates one or more random variates

In our implementation, we stick to this interface and the associated argument lists so that you can execute these functions exactly as in an R environment.
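To make the d/p/q/r pattern concrete, here is a minimal sketch for the normal and binomial distributions that uses only the Python standard library (statistics.NormalDist and math.comb). It is an assumption about how such wrappers could look, not the author's script (which may well use SciPy), and it skips R's optional arguments such as lower.tail and log.p.

```python
import math
import random
from statistics import NormalDist

# Normal distribution: R-style (x | q | p | n, mean, sd) argument order
def dnorm(x, mean=0.0, sd=1.0):
    """Density of N(mean, sd^2) at x."""
    return NormalDist(mean, sd).pdf(x)

def pnorm(q, mean=0.0, sd=1.0):
    """P(X <= q) for X ~ N(mean, sd^2)."""
    return NormalDist(mean, sd).cdf(q)

def qnorm(p, mean=0.0, sd=1.0):
    """Quantile: the value x with P(X <= x) = p, for 0 < p < 1."""
    return NormalDist(mean, sd).inv_cdf(p)

def rnorm(n, mean=0.0, sd=1.0):
    """A list of n random variates from N(mean, sd^2)."""
    return NormalDist(mean, sd).samples(n)

# Binomial distribution: R-style (x | q | p | n, size, prob) argument order
def dbinom(x, size, prob):
    """P(X = x) for X ~ Binomial(size, prob); math.comb is 0 when x > size."""
    return math.comb(size, x) * prob**x * (1 - prob)**(size - x)

def pbinom(q, size, prob):
    """P(X <= q), summing the mass function from 0 up to min(q, size)."""
    upper = min(math.floor(q), size)
    return sum(dbinom(k, size, prob) for k in range(0, upper + 1))

def qbinom(p, size, prob):
    """Smallest integer x with P(X <= x) >= p."""
    cumulative = 0.0
    for k in range(size + 1):
        cumulative += dbinom(k, size, prob)
        if cumulative >= p:
            return k
    return size  # guards against floating-point shortfall near p = 1

def rbinom(n, size, prob):
    """n variates, each the number of successes in `size` Bernoulli(prob) trials."""
    return [sum(random.random() < prob for _ in range(size)) for _ in range(n)]
```

With these definitions, the machine question becomes `pnorm(12, 10, 2) - pnorm(7, 10, 2)` (about 0.7745), and the heights of the coin bar chart are `[dbinom(i, 10, 0.6) for i in range(11)]`. For the remaining distributions (F, t, chi-square, beta, gamma), SciPy's `scipy.stats` objects, with their `pdf`/`pmf`, `cdf`, `ppf` and `rvs` methods, would be a natural backend.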

Currently implemented functions

Currently, the following R-style functions are implemented in the script for fast calling:

- Mean, median, variance, standard deviation
- Tukey five-number summary, IQR
- Covariance of a matrix or between two vectors
- Density, cumulative probability, quantile function, and random variate generation for the following distributions: normal, uniform, binomial, Poisson, F, Student's t, chi-square, beta, and gamma.
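The descriptive functions in this list hide one trap: R's var() and sd() use the sample denominator n - 1, whereas NumPy's np.var and np.std default to n. The NumPy sketch below shows R-consistent versions; it is an illustration under that assumption, not the script's actual code. IQR uses np.percentile's default linear interpolation, which corresponds to R's default quantile type 7.

```python
import numpy as np

def mean(x):
    return float(np.mean(x))

def median(x):
    return float(np.median(x))

def var(x):
    """Sample variance with denominator n - 1, as R's var() uses (NumPy defaults to n)."""
    return float(np.var(x, ddof=1))

def sd(x):
    """Sample standard deviation, the square root of var(x)."""
    return float(np.std(x, ddof=1))

def IQR(x):
    """Q3 - Q1 with linear interpolation (R's default quantile type 7)."""
    q1, q3 = np.percentile(x, [25, 75])
    return float(q3 - q1)

def cov(x, y=None):
    """Covariance between two vectors, or the covariance matrix of a matrix's columns."""
    if y is None:
        # rowvar=False treats each column as a variable, like R's cov(matrix)
        return np.cov(np.asarray(x, dtype=float), rowvar=False)
    # np.cov normalizes by n - 1 by default; entry [0, 1] is cov(x, y)
    return float(np.cov(x, y)[0, 1])
```

For the example vector from earlier, this gives a mean of 30.1, a median of 24.5 and an IQR of 22.5.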

Work in progress…

Obviously, this is a work in progress, and I plan to add more convenient R functions to this script.

For example, in R a single call to lm fits an ordinary least-squares model to a numerical data set and reports all the necessary inferential statistics (p-values, standard errors, etc.).

This is powerfully brief and compact! In Python, on the other hand, standard linear regression problems are often tackled with scikit-learn, which needs a bit more scripting to accomplish this and does not report p-values or standard errors out of the box.

I plan to incorporate this single-function linear model fitting feature, using Python's statsmodels library as the backend.
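statsmodels already offers an R-like formula interface (`statsmodels.formula.api.ols(formula, data).fit()`, whose `.summary()` prints an lm-style table), so such a wrapper could be thin. To show what a one-call fit actually computes, here is a NumPy-only sketch of ordinary least squares. The name lm_sketch is hypothetical, and unlike R's lm it stops short of p-values, which would additionally need a t-distribution tail function such as `scipy.stats.t.sf`.

```python
import numpy as np

def lm_sketch(X, y):
    """Hypothetical OLS helper with an intercept.

    Returns coefficients, standard errors, t values and the residual standard error.
    """
    y = np.asarray(y, dtype=float)
    # prepend a column of ones so the first coefficient is the intercept
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    sigma2 = residuals @ residuals / (n - p)      # unbiased residual variance
    cov_beta = sigma2 * np.linalg.inv(X.T @ X)    # covariance matrix of the estimates
    se = np.sqrt(np.diag(cov_beta))
    return {"coef": beta, "se": se, "t": beta / se, "sigma": np.sqrt(sigma2)}
```

For example, `lm_sketch([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])` returns an intercept of 0.05 and a slope of 1.99, matching `np.polyfit` on the same data.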

If you like this script and find it useful in your work, please star/fork my GitHub repo and spread the news.

If you have any questions or ideas to share, please contact the author at tirthajyoti[AT]gmail.


Also, you can check the author's GitHub repositories for other fun code snippets in Python, R, or MATLAB, and for machine learning resources.

If you are, like me, passionate about machine learning/data science, please feel free to add me on LinkedIn or follow me on Twitter.

If you liked this article, please don’t forget to leave a clap :-).
