# Get started with Machine Learning by building a simple project

No matter what the input is, the function outputs a value between 0 and 1. We have implemented this cost function in our `lrCostFunction.m`. Let's see what it does.

```matlab
function [J, grad] = lrCostFunction(theta, X, y, lambda)
  m = size(X, 1);
  n = size(X, 2);
  J = 0;
  grad = zeros(size(theta));

  h = sigmoid(X*theta);  % here h is our hypothesis function
  J = (1/m)*(-y'*log(h) - (1 - y)'*log(1 - h));
  J_reg = (lambda/(2*m))*(sum(theta.^2) - theta(1)^2);  % bias term excluded
  J = J + J_reg;

  grad = (1/m).*(X'*(h - y)) + (lambda/m).*theta;
  grad(1) = (1/m)*(sum(h - y));  % bias term is not regularized
end
```

We take `theta`, `X`, `y`, and `lambda` as inputs to the function. Then we calculate the gradient (all theta values are computed simultaneously, in one vectorized expression), and to exclude the bias term from regularization we overwrite `grad(1)` as shown above.

## Sigmoid function

It calculates the sigmoid of its argument (a matrix or vector), which results in values between 0 and 1.

```matlab
function [ret_z] = sigmoid(z)
  % takes z as an argument and returns its sigmoid
  ret_z = 1./(1 + exp(-z));  % Note: here we have passed z = X*theta
end
```

## oneVsAll algorithm

This algorithm splits the given dataset (here, 3 classes) into 2 sets of data, one class versus all the others, and finds the best fit, or more precisely the parameters theta needed to fit those 2 sets. So for the given training set we get 3 different hypothesis functions, whose parameters we store in our `all_theta` matrix. With that said, let's look at the oneVsAll code to see how it works.

```matlab
% oneVsAll divides the dataset into 2 classes and finds
% optimum theta values for the respective classes
function [all_theta] = oneVsAll(X, y, num_of_classes, lambda)
  m = size(X, 1);
  n = size(X, 2);
  all_theta = zeros(num_of_classes, n);  % for this example we have 3 classes

  for i = 1:num_of_classes
    % Set the parameters for our advanced optimization
    initial_theta = zeros(n, 1);
    options = optimset('GradObj', 'on', 'MaxIter', 50);

    costfunc = @(t)(lrCostFunction(t, X, y == i, lambda));
    [theta] = fmincg(costfunc, initial_theta, options);
    % (y == i) divides the training set into 2 distinct classes and
    % fmincg then tries to find optimum theta values to fit that split
    all_theta(i,:) = theta;  % all_theta ends up as a 3x5 matrix
  end
end
```

Now let's look at some details of this function. We made the `all_theta` matrix to store the best-fit theta values that `fmincg` returns. Minimizing the cost function yields an optimum theta (the best-fit values for theta), which we store into `all_theta` on each iteration of the loop.

## predictOneVsAll.m

In this function we pass our `all_theta` matrix and `X` matrix to predict the species for all the training examples. Let's look at the code for the function and discuss some important concepts.

```matlab
% Here we use all the training examples and find their respective
% classes using the hypothesis function sigmoid. The class for which
% the hypothesis returns the max value is our prediction, stored in p.
% Note: p is a column vector
function [p] = predictOneVsAll(X, all_theta)
  m = size(X, 1);
  p = zeros(m, 1);
  h = sigmoid(X*all_theta');           % one hypothesis value per class
  [max_val, max_ind] = max(h, [], 2);  % row-wise maximum
  p = max_ind;
end
```

We define a prediction vector `p`, with one entry per row of `X`, to store the predicted value for every training example. To perform this task we calculate the hypothesis matrix, which returns a hypothesis value for each of the classes (each of the 3 species). The class with the maximum hypothesis value among the three is taken as the prediction.

You don't really need to know the `fmincg` code right now; it is quite advanced for a beginner, so just focus on what it does. An abridged excerpt of the listing follows.

```matlab
function [X, fX, i] = fmincg(f, X, options, P1, P2, P3, P4, P5)
if exist('options', 'var') && ~isempty(options) && isfield(options, 'MaxIter')
  length = options.MaxIter;
else
  length = 100;
end

RHO = 0.01;   % a bunch of constants for line searches
SIG = 0.5;    % RHO and SIG are the constants in the Wolfe-Powell conditions
INT = 0.1;    % don't reevaluate within 0.1 of the limit of the current bracket
EXT = 3.0;    % extrapolate maximum 3 times the current bracket
MAX = 20;     % max 20 function evaluations per line search
RATIO = 100;  % maximum allowed slope ratio

argstr = ['feval(f, X'];  % compose string used to call function
for i = 1:(nargin - 3)
  argstr = [argstr, ',P', int2str(i)];
end
argstr = [argstr, ')'];

if max(size(length)) == 2, red=length(2); length=length(1); else red=1; end
S = ['Iteration '];

i = 0;          % zero the run length counter
ls_failed = 0;  % no previous line search has failed
fX = [];
[f1 df1] = eval(argstr);  % get function value and gradient
i = i + (length<0);       % count epochs?!
s = -df1;                 % search direction is steepest
d1 = -s'*s;               % this is the slope
z1 = red/(1-d1);          % initial step is red/(|s|+1)

while i < abs(length)          % while not finished
  i = i + (length>0);          % count iterations?!
  X0 = X; f0 = f1; df0 = df1;  % make a copy of current values
  X = X + z1*s;                % begin line search
  [f2 df2] = eval(argstr);
  i = i + (length<0);          % count epochs?!
  d2 = df2'*s;

  % ... (the line-search bracketing loop is omitted in this excerpt) ...

  if success                   % if line search succeeded
    f1 = f2; fX = [fX' f1]';
    fprintf('%s %4i | Cost: %4.6e\r', S, i, f1);
    s = (df2'*df2-df1'*df2)/(df1'*df1)*s - df2;  % Polack-Ribiere direction
    tmp = df1; df1 = df2; df2 = tmp;             % swap derivatives
    d2 = df1'*s;
    if d2 > 0      % new slope must be negative
      s = -df1;    % otherwise use steepest direction
      d2 = -s'*s;
    end
    z1 = z1 * min(RATIO, d1/(d2-realmin));  % slope ratio but max RATIO
    d1 = d2;
    ls_failed = 0;  % this line search did not fail
  else
    X = X0; f1 = f0; df1 = df0;  % restore point from before failed line search
    if ls_failed || i > abs(length)  % line search failed twice in a row
      break;                         % or we ran out of time, so we give up
    end
    tmp = df1; df1 = df2; df2 = tmp;  % swap derivatives
    s = -df1;        % try steepest
    d1 = -s'*s;
    z1 = 1/(1-d1);
    ls_failed = 1;   % this line search failed
  end
  if exist('OCTAVE_VERSION')
    fflush(stdout);
  end
end
fprintf('\n');
```
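To check the cost and gradient formulas numerically, here is a small NumPy translation of `lrCostFunction` (the article's code is Octave; this Python version, its variable names, and the tiny example matrix are illustrative only):

```python
import numpy as np

def sigmoid(z):
    # squashes any real input into the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def lr_cost_function(theta, X, y, lam):
    # mirrors lrCostFunction.m: regularized cost J and gradient,
    # with the bias parameter theta[0] excluded from the penalty
    m = X.shape[0]
    h = sigmoid(X @ theta)                   # hypothesis for every example
    J = (1/m) * (-y @ np.log(h) - (1 - y) @ np.log(1 - h))
    J += (lam/(2*m)) * np.sum(theta[1:]**2)  # sum(theta.^2) - theta(1)^2
    grad = (1/m) * (X.T @ (h - y)) + (lam/m) * theta
    grad[0] = (1/m) * np.sum(h - y)          # assumes X[:, 0] is all ones
    return J, grad

X = np.array([[1.0,  0.5],
              [1.0, -1.5],
              [1.0,  2.0]])       # first column is the bias feature
y = np.array([1.0, 0.0, 1.0])
J, grad = lr_cost_function(np.zeros(2), X, y, lam=1.0)
print(round(J, 4))                # -log(0.5), since every h is 0.5 at theta = 0
```

At `theta = 0` every hypothesis value is 0.5, so each example contributes `-log(0.5) ≈ 0.6931` to the cost and the regularization term vanishes, which makes this an easy sanity check.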
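The oneVsAll loop can be sketched the same way. Since `fmincg` belongs to the course materials, this sketch substitutes plain gradient descent as the optimizer (a deliberate simplification, not the article's method); the part to focus on is the `(y == i)` relabeling that turns a 3-class problem into three binary ones:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lr_cost_function(theta, X, y, lam):
    # regularized logistic-regression cost and gradient (bias not penalized)
    m = X.shape[0]
    h = np.clip(sigmoid(X @ theta), 1e-12, 1 - 1e-12)  # keep log() finite
    J = (1/m) * (-y @ np.log(h) - (1 - y) @ np.log(1 - h)) \
        + (lam/(2*m)) * np.sum(theta[1:]**2)
    grad = (1/m) * (X.T @ (h - y)) + (lam/m) * theta
    grad[0] = (1/m) * np.sum(h - y)        # bias gradient, un-regularized
    return J, grad

def one_vs_all(X, y, num_classes, lam, lr=0.3, iters=3000):
    # one binary classifier per class; gradient descent stands in for fmincg
    all_theta = np.zeros((num_classes, X.shape[1]))
    for c in range(num_classes):
        theta = np.zeros(X.shape[1])
        yc = (y == c).astype(float)        # the (y == i) relabeling step
        for _ in range(iters):
            _, grad = lr_cost_function(theta, X, yc, lam)
            theta -= lr * grad
        all_theta[c] = theta
    return all_theta

# tiny separable dataset: bias column + 2 features, 3 classes
X = np.array([[1, -2.0, 0.0], [1, -2.5, 0.5],   # class 0
              [1,  2.0, 0.0], [1,  2.5, 0.5],   # class 1
              [1,  0.0, 2.0], [1,  0.5, 2.5]])  # class 2
y = np.array([0, 0, 1, 1, 2, 2])

all_theta = one_vs_all(X, y, num_classes=3, lam=0.1)
p = np.argmax(sigmoid(X @ all_theta.T), axis=1)  # predicted class per row
print(float(np.mean(p == y)))                    # training accuracy
```

Each pass through the loop trains "class c versus everything else" and stores the fitted parameters as one row of `all_theta`, exactly as the Octave version does.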
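Finally, the prediction step in `predictOneVsAll` is just a row-wise argmax over the per-class scores. A minimal sketch, with made-up `all_theta` values chosen only to make the arithmetic easy to follow:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# hypothetical fitted parameters: one row per class (3 classes, 2 features)
all_theta = np.array([[ 2.0, -1.0],
                      [-1.0,  0.5],
                      [-2.0,  3.0]])
X = np.array([[1.0, 0.0],
              [1.0, 4.0]])         # first column = bias feature

h = sigmoid(X @ all_theta.T)       # shape (2, 3): one score per class
p = np.argmax(h, axis=1)           # row-wise max picks the predicted class
print(p.tolist())
```

Because the sigmoid is monotonic, taking the argmax of `h` gives the same answer as taking the argmax of the raw scores `X @ all_theta.T`: the first row's largest score is for class 0, the second row's is for class 2.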