This leads to a multivariate normal distribution, the equation of which is given below:Σ is a covariance matrix.

Function symbol N and f are used interchangeably.

Effects of the covariance matrix or the correlation among random variables are given below:Kevin Murphy’s slideMaximum Likelihood Estimate (MLE) of Mean and VarianceImagine that we are given a bunch of adult weights and we want to find a normal distribution model ( mean, variance) to represent the data.

To find optimal mean and variance which maximize the likelihood of the data, MLE is used.

Look at the below image which clearly explains the concept of MLE.

The bell-shaped red curves are given theoretical normal distribution functions using varying values of the mean ( values increasing from left to right).

The likelihood of observed data peaks at the mean value which coincides with sample average.

The same process applies to the estimate of variance.

StatQuest: https://www.

youtube.

com/watch?v=XepXtl9YKwc&t=107sTherefore, the MLE of mean and variance:Enough for background theories.

Let’s dive into Gaussian classifier.

Gaussian ClassifierLet’s imagine that we are given a training dataset which falls into two classes (1 and 2 — binary classification).

The objective here is to predict the class to which new data belongs.

In other words, for a given new data(x), we want to estimate p ( y = 1|x) and p ( y = 2|x).

X is assigned to any class which has the highest probability (make sense ?).

Bayes’ theorem can help us with estimating p ( y = 1|x) and p ( y = 2|x).

p ( y = c|x) is p ( y = 1|x) and p ( y = 2|x)p ( x|y = c ) is assumed to be Gaussian/normal distribution.

p ( y = c) is a class prior which is ratio of #class c/ #total sample.

For the purpose of classification, we are only interested in nominator so we can ignore the normalisation constant which is aimed to make class posterior a valid probability distribution.

For the curious reader, please refer to the above reference.

All terms in nominator can be estimated from the training dataset.

It is called ‘Gaussian’ classifier because of the assumption that p ( x|y = c ) is Gaussian distribution.

It is also known as ‘Mixture Gaussian’ and ‘Discriminant’ classifier.

There are two variants of Gaussian classifier, depending on whether covariance matrices of classes are assumed to be equal or not.

Covariance matrix assumption has an impact on the class boundary.

Shared covariance matrix leads to the linear boundary while separate covariance matrices lead to the quadratic boundary.

Binary Gaussian Classifier Implementation in PythonNow let’s get it real.

To apply all the above theory and for the sake of simplicity, we implement Gaussian classifier for simple binary classification in Python.

We create GaussianClassifier class with two methods — train() and predict().

We test our implementation using breast cancer dataset.

We use 67% of the data as the training set and keep the rest as the test set.

We print out accuracy score and confusion.

Here are the resulting confusion matrix and accuracy score which are not bad at all.

Generative vs DiscriminativeA Gaussian classifier is a generative approach in the sense that it attempts to model class posterior as well as input class-conditional distribution.

Therefore, we can generate new samples in input space with a Gaussian classifier.

On the contrary, a method like logistic regression (another article of mine: https://medium.

com/@rinabuoy13/logistic-regression-with-pytorch-b67ebd75a814) is discriminative since it attempts to model class posterior directly.

So not all classification methods are the same.

.