The implementation of hypothesis() remains the same.

=> BGD(): the function that performs the Batch Gradient Descent algorithm. It takes the current values of theta_0 and theta_1, the learning rate (alpha), the number of iterations (num_iters), the hypothesis values (h), the feature set (X) and the target variable set (y) as input. It outputs the optimized theta (theta_0 and theta_1), the theta_0 and theta_1 histories (i.e., the values of theta_0 and theta_1 at each iteration) and, finally, the cost history, which contains the value of the cost function over all the iterations. The implementation of BGD() is given below:

```python
def BGD(theta, alpha, num_iters, h, X, y):
    cost = np.ones(num_iters)
    theta_0 = np.ones(num_iters)
    theta_1 = np.ones(num_iters)
    for i in range(num_iters):
        # simultaneous update of both parameters using the current hypothesis
        theta[0] = theta[0] - (alpha / X.shape[0]) * sum(h - y)
        theta[1] = theta[1] - (alpha / X.shape[0]) * sum((h - y) * X)
        h = hypothesis(theta, X)
        cost[i] = (1 / X.shape[0]) * 0.5 * sum(np.square(h - y))
        theta_0[i] = theta[0]
        theta_1[i] = theta[1]
    theta = theta.reshape(1, 2)
    return theta, theta_0, theta_1, cost
```

=> linear_regression(): the principal function that takes the feature set (X), the target variable set (y) and the learning rate as input, and outputs the final optimized theta, i.e., the values of theta_0 and theta_1 for which the cost function almost reaches its minimum under Batch Gradient Descent.

```python
def linear_regression(X, y, alpha):
    # initializing the parameter vector
    theta = np.zeros(2)
    # hypothesis calculation
    h = hypothesis(theta, X)
    # returning the optimized parameters by Gradient Descent
    theta, theta_0, theta_1, cost = BGD(theta, alpha, 300, h, X, y)
    return theta, theta_0, theta_1, cost
```

Using the 3-module Linear Regression with BGD:

```python
data = np.loadtxt('data1.txt', delimiter=',')
X_train = data[:, 0]  # the feature set
y_train = data[:, 1]  # the labels

# calling the principal function with learning_rate = 0.0001
theta, theta_0, theta_1, cost = linear_regression(X_train, y_train, 0.0001)
```

The theta output comes out to be:

[Figure: theta after BGD]

Visualization of theta on Scatter Plot:

The regression line for the obtained theta can be visualized on a scatter plot:

```python
import matplotlib.pyplot as plt

training_predictions = hypothesis(theta, X_train)
scatter = plt.scatter(X_train, y_train, label="training data")
regression_line = plt.plot(X_train, training_predictions, label="linear regression")
plt.legend()
plt.xlabel('Population of City in 10,000s')
plt.ylabel('Profit in $10,000s')
```

The regression line visualization comes out to be:

[Figure: Regression Line Visualization after BGD]

Also, the cost is reduced iteration by iteration over the course of Batch Gradient Descent. The reduction in the cost is shown with the help of a line curve and a surface plot.

Line curve representing the reduction in cost over 300 iterations:

```python
import matplotlib.pyplot as plt

cost = list(cost)
n_iterations = [x for x in range(1, 301)]
plt.plot(n_iterations, cost)
plt.xlabel('No. of iterations')
plt.ylabel('Cost')
```

The line curve comes out to be:

[Figure: Line Curve Representation of Cost Minimization using BGD]

Surface plot representing the reduction in cost:

```python
from mpl_toolkits.mplot3d import Axes3D

J = np.ones((300, 300))
in1 = 0
in2 = 0
theta_0 = theta_0.reshape(300)
theta_1 = theta_1.reshape(300)
# evaluating the cost function over the grid of visited (theta_0, theta_1) values
for i in theta_0:
    for j in theta_1:
        t = np.array([i, j])
        h = hypothesis(t, X_train)
        J[in1][in2] = (1 / X_train.shape[0]) * 0.5 * sum(np.square(h - y_train))
        in2 = in2 + 1
    in1 = in1 + 1
    in2 = 0

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
X, Y = np.meshgrid(theta_0, theta_1)
ax.plot_surface(X, Y, J)
ax.set_xlabel('theta_0')
ax.set_ylabel('theta_1')
ax.set_zlabel('J')
```

[Figure: Surface Plot Representation of Cost Minimization with values of theta_0 and theta_1]

Performance Analysis (SGD vs BGD)

The model performance analysis is done on the following metrics:

=> Mean Absolute Error: the average of the absolute differences between the predictions and the actual observations over a sample of instances.

=> Mean Square Error: the average of the squared differences between the predictions and the actual observations over a sample of
instances.

=> Root Mean Square Error: the square root of the average of the squared differences between the predictions and the actual observations over a sample of instances.

=> R-Square Score or Coefficient of Determination: the proportion of the variance in the target variable that is explained by the model.

Comparison between SGD and BGD:

[Figure: SGD vs BGD comparison on the above metrics]

So, Batch Gradient Descent is a clear winner over Stochastic Gradient Descent in all respects!

That's all about the implementation of Uni-Variate Linear Regression in Python using Gradient Descent from scratch.

For personal contacts regarding the article, or discussions on Machine Learning/Data Mining or any department of Data Science, feel free to reach out to me (Navoneel Chakrabarty) on LinkedIn.
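As a footnote, the four metrics above can be computed directly with NumPy. The snippet below is a minimal sketch: the hypothesis() here is an assumed two-parameter version (theta_0 + theta_1 * x) consistent with the shapes used in this article, evaluate() is a hypothetical helper name, and the data is a small synthetic example rather than data1.txt.

```python
import numpy as np

def hypothesis(theta, X):
    # assumed uni-variate hypothesis: h(x) = theta_0 + theta_1 * x
    return theta[0] + theta[1] * X

def evaluate(theta, X, y):
    """Compute MAE, MSE, RMSE and R-square for predictions vs. targets."""
    errors = hypothesis(theta, X) - y
    mae = np.mean(np.abs(errors))        # Mean Absolute Error
    mse = np.mean(np.square(errors))     # Mean Square Error
    rmse = np.sqrt(mse)                  # Root Mean Square Error
    # R-square: 1 - (residual sum of squares / total sum of squares)
    ss_res = np.sum(np.square(errors))
    ss_tot = np.sum(np.square(y - np.mean(y)))
    r2 = 1 - ss_res / ss_tot
    return mae, mse, rmse, r2

# synthetic, noise-free example: y = 2 + 3x, so theta = [2, 3] fits exactly
X = np.array([1.0, 2.0, 3.0, 4.0])
y = 2 + 3 * X
mae, mse, rmse, r2 = evaluate(np.array([2.0, 3.0]), X, y)
print(mae, mse, rmse, r2)  # a perfect fit gives 0, 0, 0 and an R-square of 1
```

On the noise-free data the errors vanish, so MAE, MSE and RMSE are all zero and R-square is exactly 1; on the real data1.txt fit, the same helper would report the nonzero values used in the SGD-vs-BGD comparison.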