In other words, the minimum of the cost function has to be found. Batch Gradient Descent can be used as the optimization strategy in this case.

Implementation of Multi-Variate Linear Regression using Batch Gradient Descent:

The implementation consists of 3 modules, each performing a different operation in the training process.

=> hypothesis(): This function calculates and outputs the hypothesis value of the Target Variable, given theta (theta_0, theta_1, theta_2, theta_3, …, theta_n) and the features in a matrix X of dimension [m X (n+1)], where m is the number of samples and n is the number of features. The implementation of hypothesis() is given below:

```python
import numpy as np

def hypothesis(theta, X, n):
    # h holds one hypothesis value per sample
    h = np.ones((X.shape[0], 1))
    theta = theta.reshape(1, n + 1)
    for i in range(0, X.shape[0]):
        # dot product of the parameter vector with the i-th sample
        h[i] = float(np.matmul(theta, X[i])[0])
    h = h.reshape(X.shape[0])
    return h
```

=> BGD(): This function performs the Batch Gradient Descent algorithm. It takes the current value of theta (theta_0, theta_1, …, theta_n), the learning rate (alpha), the number of iterations (num_iters), the list of hypothesis values of all samples (h), the feature set (X), the Target Variable set (y) and the number of features (n) as input. It outputs the optimized theta (theta_0, theta_1, theta_2, theta_3, …, theta_n), theta_history (the value of theta at every iteration) and cost (the value of the cost function at every iteration). The implementation of BGD() is given below:

```python
def BGD(theta, alpha, num_iters, h, X, y, n):
    theta_history = np.ones((num_iters, n + 1))
    cost = np.ones(num_iters)
    for i in range(0, num_iters):
        # simultaneous update: every theta[j] uses h from the previous iteration
        theta[0] = theta[0] - (alpha / X.shape[0]) * sum(h - y)
        for j in range(1, n + 1):
            theta[j] = theta[j] - (alpha / X.shape[0]) * sum((h - y) * X.transpose()[j])
        theta_history[i] = theta
        h = hypothesis(theta, X, n)
        # cost is computed against the y argument passed to this function
        cost[i] = (1 / X.shape[0]) * 0.5 * sum(np.square(h - y))
    theta = theta.reshape(1, n + 1)
    return theta, theta_history, cost
```

=> linear_regression(): It is the principal
function that takes the features matrix (X), the Target Variable vector (y), the learning rate (alpha) and the number of iterations (num_iters) as input. It outputs the final optimized theta, i.e., the values of [theta_0, theta_1, theta_2, theta_3, …, theta_n] for which the cost function almost reaches its minimum under Batch Gradient Descent, together with theta_history, which stores the value of theta at every iteration, and cost, which stores the value of the cost at every iteration.

```python
def linear_regression(X, y, alpha, num_iters):
    n = X.shape[1]
    # prepend a column of ones for the intercept term theta_0
    one_column = np.ones((X.shape[0], 1))
    X = np.concatenate((one_column, X), axis=1)
    # initializing the parameter vector
    theta = np.zeros(n + 1)
    # hypothesis calculation
    h = hypothesis(theta, X, n)
    # returning the optimized parameters by Gradient Descent
    theta, theta_history, cost = BGD(theta, alpha, num_iters, h, X, y, n)
    return theta, theta_history, cost
```

Now, let's move on to the application of Multi-Variate Linear Regression to a practice data set.

Let us consider a housing price data set of Portland, Oregon. It contains the size of the house (in square feet) and the number of bedrooms as features, and the price of the house as the Target Variable. The data set is available in the GitHub repository navoneel1092283/multivariate_regression.

Problem Statement: "Given the size of the house and number of bedrooms, analyze and predict the possible price of the house"

Data Reading into Numpy Arrays:

```python
data = np.loadtxt('data2.txt', delimiter=',')
X_train = data[:, [0, 1]]  # feature set
y_train = data[:, 2]       # label set
```

Feature Normalization or Feature Scaling:

This involves scaling the features so that Gradient Descent converges quickly and efficiently.

Standard Feature Scaling Strategy: x_scaled = (x - u) / sigma, where u is the Mean and sigma is the Standard Deviation of the feature.

Implementation of feature scaling:

```python
mean = np.ones(X_train.shape[1])
std = np.ones(X_train.shape[1])
for i in range(0, X_train.shape[1]):
    mean[i] = np.mean(X_train.transpose()[i])
    std[i] = np.std(X_train.transpose()[i])
    # scale every sample of feature i in place
    for j in range(0, X_train.shape[0]):
        X_train[j][i] = (X_train[j][i] - mean[i]) / std[i]
```

Here,
Mean of the feature "size of the house (in sq. feet)" or F1: 2000.6808
Mean of the feature "number of bed-rooms" or F2: 3.1702
Standard Deviation of F1: 7.86202619e+02
Standard Deviation of F2: 7.52842809e-01

```python
# calling the principal function with learning_rate = 0.0001 and
# num_iters = 300000
theta, theta_history, cost = linear_regression(X_train, y_train, 0.0001, 300000)
```

[Figure: theta after Multi-Variate Linear Regression]

The cost is reduced iteration by iteration in the course of Batch Gradient Descent. The reduction in the cost is shown with the help of a line curve.

```python
import matplotlib.pyplot as plt

cost = list(cost)
n_iterations = [x for x in range(1, 300001)]
plt.plot(n_iterations, cost)
plt.xlabel('No. of iterations')
plt.ylabel('Cost')
```

[Figure: Line Curve Representation of Cost Minimization using BGD in Multi-Variate Linear Regression]

Side-by-Side Visualization of Features and Target Variable, Actual and Prediction, using 3-D Scatter Plots:

=> Actual Target Variable Visualization:

```python
from matplotlib import pyplot
from mpl_toolkits.mplot3d import Axes3D  # registers the 3D projection on older Matplotlib

sequence_containing_x_vals = list(X_train.transpose()[0])
sequence_containing_y_vals = list(X_train.transpose()[1])
sequence_containing_z_vals = list(y_train)
fig = pyplot.figure()
# add_subplot attaches the 3D axes to the figure on current Matplotlib versions
ax = fig.add_subplot(projection='3d')
ax.scatter(sequence_containing_x_vals, sequence_containing_y_vals,
           sequence_containing_z_vals)
ax.set_xlabel('Living Room Area', fontsize=10)
ax.set_ylabel('Number of Bed Rooms', fontsize=10)
ax.set_zlabel('Actual Housing Price', fontsize=10)
```

=> Prediction Target Variable Visualization:

```python
# Getting the predictions
X_train = np.concatenate((np.ones((X_train.shape[0], 1)), X_train), axis=1)
predictions = hypothesis(theta, X_train, X_train.shape[1] - 1)

from matplotlib import pyplot
from mpl_toolkits.mplot3d import Axes3D  # registers the 3D projection on older Matplotlib

# column 0 is now the column of ones, so the features are columns 1 and 2
sequence_containing_x_vals = list(X_train.transpose()[1])
sequence_containing_y_vals = list(X_train.transpose()[2])
sequence_containing_z_vals = list(predictions)
fig = pyplot.figure()
ax = fig.add_subplot(projection='3d')
ax.scatter(sequence_containing_x_vals, sequence_containing_y_vals,
           sequence_containing_z_vals)
ax.set_xlabel('Living Room Area', fontsize=10)
ax.set_ylabel('Number of Bed Rooms', fontsize=10)
ax.set_zlabel('Housing Price Predictions', fontsize=10)
```

[Figure: Actual Housing Price Vs Predicted Housing Price]

Performance Analysis:

Mean Absolute Error: 51502.7803 (in dollars)
Mean Square Error: 4086560101.2158 (in dollars square)
Root Mean Square Error: 63926.2082 (in dollars)
R-Square Score: 0.7329

One thing to note is that the Mean Absolute Error, Mean Square Error and Root Mean Square Error are not unit-free. To make them unit-free, the Target Label can be scaled before training the model, in the same way the features were scaled. Other than that, a decent R-Square Score of 0.7329 is obtained.

That's all about the implementation of Multi-Variate Linear Regression in Python using Gradient Descent from scratch.

REFERENCES

[1] https://medium.com/@nc2012/polynomial-regression-in-python-a-z-f030736cd6aa

For personal contact regarding the article, or for discussions on Machine Learning, Data Mining or any other area of Data Science, feel free to reach out to me (Navoneel Chakrabarty) on LinkedIn.
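As a supplement to the Performance Analysis figures, which are reported without the code that produced them, here is a minimal sketch of how such metrics are commonly computed. The function name `regression_metrics` and the toy arrays are assumptions introduced for illustration; they are not taken from the article, and the definitions used here (plain mean of squared residuals for the Mean Square Error, 1 - SS_res / SS_tot for the R-Square Score) are the standard ones rather than a confirmed copy of the author's calculation.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R-squared for two 1-D arrays of equal length.

    Hypothetical helper, not part of the original article.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred

    mae = np.mean(np.abs(residuals))   # Mean Absolute Error
    mse = np.mean(residuals ** 2)      # Mean Square Error
    rmse = np.sqrt(mse)                # Root Mean Square Error

    # R-squared: 1 minus residual sum of squares over total sum of squares
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot

    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}

# Toy values for illustration only (not the housing data)
toy_actual = np.array([3.0, 5.0, 7.0, 9.0])
toy_predicted = np.array([2.0, 5.0, 8.0, 9.0])
metrics = regression_metrics(toy_actual, toy_predicted)
print(metrics)
```

With the variables from the walkthrough, the call would presumably be `regression_metrics(y_train, predictions)`. Note that MAE, MSE and RMSE carry the units of the target, which is why the article suggests scaling the target to obtain unit-free values.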