Calculus — Multivariate Calculus And Machine Learning

A Must Know Concept For Every Professional

Farhad Malik · May 4

Calculus plays a vital role in Machine Learning projects.
In this article, I am going to provide:

- An overview of multivariate calculus
- Uses of multivariate calculus in Machine Learning
- An explanation of the calculus behind gradient descent

If I could go back to my school days and study one topic again, it would be calculus.
Calculus has to be one of the most fascinating mathematical concepts ever discovered.
An understanding of multivariate calculus is one of the most important data science skills.
Multivariate calculus is used everywhere in Machine Learning projects.
We are often faced with problems whereby we are attempting to predict a variable that is dependent on multiple variables.
For instance, we might want to predict the price of a stock, which can depend on a number of factors such as company growth, the inflation rate, the interest rate and so on.
Therefore, to quantify these relationships and predict the price accurately, the field of Machine Learning utilises multivariate calculus.
What Is Multivariate Calculus?

Multivariate implies that there are multiple variables. Therefore, multivariate calculus is the field of calculus that involves multiple variables.

If the output of your function, z, depends on one input variable, x, then you declare it as:

z = f(x)

Subsequently, if the output of your function z depends on multiple inputs, for instance x and y, then you declare the function as:

z = f(x, y)

These variables, x and y, are the inputs of the function, therefore they can influence the result of the output.
Most machine learning algorithms are trained on multiple features (variables), therefore an understanding of how multivariate calculus works is crucial for all of us.
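In code, a function of several variables is simply a function with several inputs. A quick sketch (the specific functions here are illustrative choices of my own, not from the article):

```python
# A function of one variable, z = f(x)
def f(x):
    return 2 * x + 1

# A function of two variables, z = g(x, y): both inputs
# influence the output, just like features in a model.
def g(x, y):
    return x ** 2 + y

print(f(3))     # 7
print(g(3, 4))  # 13
```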
Multivariate Calculus Uses

Multivariate calculus is a field that helps us explain the relationships between input and output variables. It provides us with the tools to build accurate predictive models.
Moreover, multivariate calculus can explain the change in our target variable in relation to the rate of change in the input variables.
Have a look at these quick articles if you want to understand the foundations of calculus:

- Calculus 1 — A Must Know Concept For Every Professional: Only Constant In Life Is Change And Calculus Measures Its Rate (medium.com)
- Calculus 2 — A Must Know Concept For Every Professional: Learn How Integration Can Combine Changing Quantities (medium.com)

If there are two functions, y = f(x) and x = g(z), implying that y depends on x and x is a function of the variable z, then we can declare y as a composite function:

y = f(g(z))

This is known as a composite function. We can then use the chain rule to find the derivative of y with respect to z:

dy/dz = (dy/dx) · (dx/dz)

Now let's assume that the function depends on two variables, x and m, each of which depends on z. We can extend the chain rule above in multivariate calculus by summing the contribution of each variable:

dy/dz = (∂y/∂x) · (dx/dz) + (∂y/∂m) · (dm/dz)

Solving Multivariate Calculus

There are essentially two key techniques for solving a multivariate calculus problem.
If a function depends on a number of variables, then we can use the partial derivative to find the derivative of the function with respect to one of those variables. The trick is to hold all of the other variables constant. If we instead allow all of the variables to change and find the derivative, it is known as the "total derivative".
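The total derivative can be sketched numerically. Here f(x, y) = x·y with x = t² and y = 3t are illustrative functions of my own choosing; the total derivative sums each partial derivative multiplied by that variable's rate of change:

```python
# Total derivative of f(x, y) = x*y where both x and y change with t:
# x = t**2, y = 3*t, so df/dt = (df/dx)*(dx/dt) + (df/dy)*(dy/dt)
def total_derivative(t):
    x, y = t ** 2, 3 * t
    return y * (2 * t) + x * 3  # partials times rates of change

# Direct check: f(t) = (t**2) * (3*t) = 3*t**3, so df/dt = 9*t**2
t = 2.0
print(total_derivative(t))  # 36.0
print(9 * t ** 2)           # 36.0
```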
If there are functions f(x) and g(x), and let's also consider that the derivatives of both functions can be computed, then the derivative of their sum is:

(f + g)' = f' + g'

Hence the derivative of the sum is the sum of the derivatives of the functions f and g.
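The sum rule and the chain rule above can both be checked numerically with a finite-difference approximation (the functions f and g below are arbitrary illustrative choices):

```python
def derivative(fn, x, h=1e-6):
    """Central finite-difference approximation of fn'(x)."""
    return (fn(x + h) - fn(x - h)) / (2 * h)

f = lambda x: x ** 3
g = lambda x: 5 * x

# Sum rule: (f + g)' = f' + g'
lhs = derivative(lambda x: f(x) + g(x), 2.0)
rhs = derivative(f, 2.0) + derivative(g, 2.0)
print(abs(lhs - rhs) < 1e-6)  # True

# Chain rule: d/dz f(g(z)) = f'(g(z)) * g'(z)
lhs_c = derivative(lambda z: f(g(z)), 2.0)
rhs_c = derivative(f, g(2.0)) * derivative(g, 2.0)
print(abs(lhs_c - rhs_c) < 1e-3)  # True (within numerical error)
```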
Partial Derivative

This section will help us understand the steps involved in computing a partial derivative.
Understanding With An Example

z = f(x, y) = x² + y

There are two input variables involved (x and y), therefore there are two main steps:

1. Take the derivative of the function with respect to x first. The key is to assume y is a constant number. The derivative of a constant is 0, therefore the derivative of y is 0. This yields:

∂z/∂x = 2x + 0 = 2x

2. Now assume x is a constant number and take the derivative of the function with respect to y. The derivative of a constant is 0, therefore the derivative of x² is 0. This yields:

∂z/∂y = 0 + 1 = 1

3. Combine the two derivatives which we calculated in steps 1 and 2.
The multivariate derivative (gradient) is:

∇f = (2x, 1)

Another Example

z = f(x, y) = x² + y²

We are going to take the derivative of the function with respect to x first.
1. Assume y is a constant number and take the derivative of the function with respect to x:

∂z/∂x = 2x

2. Now assume x is a constant and take the derivative of the function with respect to y:

∂z/∂y = 2y

3. Combine the two derivatives together.
The multivariate derivative (gradient) is:

∇f = (2x, 2y)

How Is Multivariate Calculus Used In Machine Learning?

I am going to explain how multivariate calculus is used in machine learning by walking through the process in detail, as I believe it's important to grasp it.
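The two worked examples above can be verified numerically by holding one variable fixed while nudging the other, which is exactly what a partial derivative does:

```python
# Finite-difference checks of the partial derivatives
# from the two worked examples above.
def partial_x(fn, x, y, h=1e-6):
    return (fn(x + h, y) - fn(x - h, y)) / (2 * h)

def partial_y(fn, x, y, h=1e-6):
    return (fn(x, y + h) - fn(x, y - h)) / (2 * h)

f1 = lambda x, y: x ** 2 + y       # dz/dx = 2x, dz/dy = 1
f2 = lambda x, y: x ** 2 + y ** 2  # dz/dx = 2x, dz/dy = 2y

x, y = 3.0, 4.0
print(round(partial_x(f1, x, y), 4), round(partial_y(f1, x, y), 4))  # 6.0 1.0
print(round(partial_x(f2, x, y), 4), round(partial_y(f2, x, y), 4))  # 6.0 8.0
```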
In the support vector machine algorithm, multivariate calculus is used to find the maximal margin. In the EM algorithm, it is used to find the maxima. Optimisation problems in general rely on multivariate calculus. In gradient descent, it is used to find the local and global minima.
Let’s assume we are attempting to forecast a variable that is dependent on multiple variables.
The variable that we are predicting is a continuous variable, e.g. monthly rainfall, and is dependent on a number of variables, e.g. temperature, wind speed and so on.
We decide to utilise the linear regression algorithm.
Linear regression with multiple variables is known as multiple linear regression.

Our sole aim in linear regression is to fit the actual values as closely as possible.
We quantify the difference between the predicted and actual values by using a loss (or cost) function.
This loss function could be based on the mean squared error or root mean squared error algorithms.
We are going to repeat the exercise until the loss function converges and we reach the minimum value.
When an algorithm meets its target, it is known as converging. This is achieved by using the gradient descent algorithm, which internally depends on the partial differentiation techniques of multivariate calculus.
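The loop described above can be sketched end to end. This is a minimal multiple linear regression trained with gradient descent on synthetic data (the data, true coefficients, learning rate and iteration count are all illustrative assumptions of mine, not from the article):

```python
import random

# Synthetic data: the true relationship is y = 2*x1 + 3*x2 + 1.
random.seed(0)
data = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(100)]
targets = [2 * x1 + 3 * x2 + 1 for x1, x2 in data]

w1, w2, b = 0.0, 0.0, 0.0  # parameters to learn
lr = 0.1                   # learning rate (alpha)
n = len(data)

for _ in range(2000):
    # Partial derivatives of the MSE loss w.r.t. each parameter
    g1 = g2 = gb = 0.0
    for (x1, x2), y in zip(data, targets):
        err = (w1 * x1 + w2 * x2 + b) - y
        g1 += 2 * err * x1 / n
        g2 += 2 * err * x2 / n
        gb += 2 * err / n
    # Step against the gradient to reduce the loss
    w1 -= lr * g1
    w2 -= lr * g2
    b -= lr * gb

print(round(w1, 2), round(w2, 2), round(b, 2))  # 2.0 3.0 1.0
```

The loss converges because each iteration moves the parameters in the direction that decreases the MSE, as described above.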
Let’s Look Deeper And Understand Mean Squared Error First

As the name implies, the Mean Squared Error (MSE) is the mean of the squared residuals (errors).
To calculate the MSE:

1. First we need to subtract the fitted values of y (y hat) from the actual values of y. The difference is known as the residual. Some of the values will be positive and the other values will be negative.
2. Therefore square the residuals and then sum them. This is known as the sum of squared residuals (SSR).

From the points of y, we can also calculate the mean of y:

1. Calculate the difference between the value of the best-fit line and the mean of y.
2. Square the differences and then sum the values. This final value is known as the sum of squared deviations of y from its mean (ESS).

We could calculate the accuracy using the root mean squared error.

Root Mean Squared Error

1. Calculate the difference between each prediction and the actual observation.
2. Square the differences.
3. Sum the squared differences.
4. Divide the sum of squared differences by the total number of observations.
5. Calculate the square root of the result.

To get a general idea of the mathematical measures, read: Must Know Mathematical Measures For Every Data Scientist: Key Mathematical Formulae Explained In Easy To Follow Bullet Points (medium.com)

The key point to note is that the aim of these algorithms is to compute the difference between the actual and predicted values.
The closer the fit, the lower the value of the cost function and the better trained the algorithm is.
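The MSE and RMSE steps above translate directly into code (the sample values are illustrative):

```python
import math

# MSE: residual = actual - predicted, squared, then averaged.
def mse(actual, predicted):
    residuals = [a - p for a, p in zip(actual, predicted)]
    return sum(r ** 2 for r in residuals) / len(residuals)

# RMSE: square root of the MSE.
def rmse(actual, predicted):
    return math.sqrt(mse(actual, predicted))

actual = [3.0, 5.0, 7.0]
predicted = [2.5, 5.0, 8.0]
print(mse(actual, predicted))   # (0.25 + 0 + 1) / 3 ≈ 0.4167
print(rmse(actual, predicted))  # ≈ 0.6455
```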
Now each time we run the regression algorithm, we change the values of the inputs, compute the outputs and then calculate the cost function.
The question is: how should we change the input values? Should we increase or should we decrease them?

Understanding Gradient Descent To Understand Multivariate Calculus

Gradient descent is used in a number of algorithms, from regression to neural networks.
It relies heavily on multivariate calculus to find the minimum points.
What Is Gradient Descent?

The algorithm known as gradient descent (GD) is used for finding the minima of a function (or, run in reverse as gradient ascent, the maxima).
This function could be the cost function of a machine learning algorithm.
Let’s imagine you are kicking a football on an uneven ground whilst you are blindfolded.
The aim of the game is to kick the ball until it lands on one of the lowest points of the surface; call these points A and B.
You are also instructed to complete the game within a minute.
We know that the slower you kick the ball, the slower the ball will move and the longer it will take to reach point A or B, but you are more likely to reach your target than to miss it.
However, we might not converge within the target time.
We are also aware of the fact that the faster you kick the ball, the higher the chances of you overshooting and missing the target points completely.
And we might never converge.
This is the principle behind the gradient descent technique when it attempts to find the minimum point.
The speed of the ball kick is known as the learning rate, α.
Gradient descent finds the rate of change of the variables and adjusts them to move towards the minimum point. The minimum point in this case corresponds to the values of the input variables that give us the minimum value of the cost function.
If there are two variables involved, then you will need to compute partial derivatives with respect to both of the variables individually.

Let's assume you kick the ball with minimal power so that it moves at a very low speed. The ball moves forward slowly and you keep kicking until it reaches point A. The faster you kick the ball, the further it moves and the more likely you are to miss the minimum point due to overshooting.
Each time the algorithm decides if it needs to increase or decrease the values of the input variables.
It does so by calculating the rate of change of the variables.
The rate of change of the variables is computed using partial derivative technique of multivariate calculus.
For example, if your target variable y depends on variables a and b then it will find the partial derivative of the function with respect to a and then with respect to b to understand the rate of change.
It will then use this information to alter the values iteratively.
Each time, the learning rate, along with the partial derivative with respect to the input variable, determines whether you move forwards or backwards.
When you reach the minimum point, the derivative of the function will be 0.
This implies that gradient descent has reached the minimum point.

The minimum point is where the derivative is 0. Although there is a closed-form algebraic solution for finding the local minimum, the iterative nature of gradient descent scales better when we have a larger data set.
To apply gradient descent with multiple variables, we treat each variable separately by holding all of the other variables constant and then finding the partial derivative of the function with respect to it.
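This can be sketched on the second worked example, f(x, y) = x² + y², whose partial derivatives are 2x and 2y (the starting point and learning rate below are arbitrary illustrative choices):

```python
# Gradient descent on f(x, y) = x**2 + y**2. Each step moves
# against the partial derivatives (2x, 2y), scaled by the
# learning rate alpha.
alpha = 0.1
x, y = 3.0, 4.0  # arbitrary starting point

for _ in range(100):
    dx, dy = 2 * x, 2 * y  # partial derivatives at the current point
    x -= alpha * dx
    y -= alpha * dy

# Both coordinates shrink towards 0, the minimum, where the gradient is 0.
print(round(x, 6), round(y, 6))  # 0.0 0.0
```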
This is a very crucial use case of multivariate partial calculus in machine learning algorithms.
This technique is used in optimisation algorithms, regression and neural networks.
Neural network cost functions and their optimisation are also dependent on the same multivariate calculus.
Summary

This article provided an overview of multivariate calculus.
It explained that multivariate calculus is the calculus of multiple variables.
It also explained the benefits of multivariate calculus and how it works.
It provided an overview of partial differentiation.
It then explained how the multivariate calculus works in machine learning algorithms.
Have a look at these articles if you want to understand the foundations of calculus:

- Calculus 1 — A Must Know Concept For Every Professional: Only Constant In Life Is Change And Calculus Measures Its Rate (medium.com)
- Calculus 2 — A Must Know Concept For Every Professional: Learn How Integration Can Combine Changing Quantities (medium.com)

Hope it helps.