To predict Y (salary) given X (age), we need to know the values of a and b, the model’s coefficients. When a regression model is trained, it is these coefficients that are learned and fitted to the training data.

During training we try to minimize the error between the actual and predicted values, and thereby minimize the cost function. In the figure, the red points are the data points and the blue line is the line predicted from the training data. To get a predicted value, a data point is projected onto the line. To summarize, our aim is to find the coefficient values that minimize the cost function.

The most common cost function is Mean Squared Error (MSE), which is the average squared difference between an observation’s actual and predicted values. In brief, gradient descent starts with some random coefficient values, computes the gradient of the cost function at those values, updates the coefficients, and calculates the cost function again; these steps repeat until the cost stops improving.

For regression, the ID3 algorithm can be used to identify the splitting node by reducing standard deviation (for classification, information gain is used). A decision tree is built by partitioning the data into subsets containing instances with similar (homogeneous) values. If a numerical sample is completely homogeneous, its standard deviation is zero. The first step in finding the splitting node is to calculate the standard deviation of the target variable.
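For the decision tree part above, here is a rough sketch of how standard deviation can drive the choice of a split. It makes several assumptions that the text does not spell out: the population form of the standard deviation (dividing by n), a categorical candidate feature, and the commonly used "standard deviation reduction" score (the target's standard deviation minus the size-weighted standard deviation of each subset). The function names and the toy data are invented for illustration.

```python
from collections import defaultdict
from math import sqrt


def std_dev(values):
    """Population standard deviation (divides by n; an assumption)."""
    n = len(values)
    mean = sum(values) / n
    return sqrt(sum((v - mean) ** 2 for v in values) / n)


def sd_reduction(feature, target):
    """Standard deviation of the target minus the weighted standard
    deviation of the subsets produced by splitting on `feature`."""
    groups = defaultdict(list)
    for f, t in zip(feature, target):
        groups[f].append(t)
    n = len(target)
    weighted = sum(len(g) / n * std_dev(g) for g in groups.values())
    return std_dev(target) - weighted


# Toy data, invented for illustration only.
target = [10.0, 10.0, 20.0, 20.0]
good_split = ["a", "a", "b", "b"]  # subsets are perfectly homogeneous
poor_split = ["a", "b", "a", "b"]  # subsets are as mixed as the whole

print(std_dev(target))
print(sd_reduction(good_split, target))
print(sd_reduction(poor_split, target))
```

Under these assumptions, the feature with the largest reduction would be the one chosen as the splitting node, because its subsets are the most homogeneous.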
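For the linear regression part above, the gradient descent loop on the MSE cost can be sketched in a few lines. This is only an illustration under stated assumptions: the model is taken to be the straight line y = a + b*x, the learning rate and step count are arbitrary choices, the random start uses a fixed seed so the run is repeatable, and the toy data is invented (it lies exactly on y = 1 + 2x).

```python
import random


def mse(a, b, xs, ys):
    """Mean squared error of the line y = a + b*x over the data."""
    n = len(xs)
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / n


def gradient_descent(xs, ys, lr=0.01, steps=5000, seed=0):
    """Fit a and b by repeatedly stepping against the MSE gradient."""
    rng = random.Random(seed)
    a, b = rng.random(), rng.random()  # random starting coefficients
    n = len(xs)
    for _ in range(steps):
        # Prediction error for every observation at the current a, b.
        errors = [(a + b * x) - y for x, y in zip(xs, ys)]
        # Partial derivatives of the MSE with respect to a and b.
        grad_a = 2 * sum(errors) / n
        grad_b = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        # Move the coefficients a small step downhill.
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b


# Toy data, invented for illustration: points on y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

a, b = gradient_descent(xs, ys)
print(a, b, mse(a, b, xs, ys))
```

On real data the features would usually be scaled first, and the learning rate tuned, since a step size that is too large makes the cost grow instead of shrink.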