Yes, a goal function.
A goal function can guide us on how to update the parameter in the right way.
For logistic regression, we usually use the log-likelihood.
Wait, wait… what on earth are all these things? Don’t panic.
Let’s take it apart.
1->2 (how to get from line 1 to line 2): log(ab) = log a + log b
2->3: log(a^b) = b * log a
3->4: Because we only have two classes, y=0 and y=1, the two probabilities sum to one: P(y=0|x) = 1 - P(y=1|x)
4->5: we substitute a shorter notation to make the equation more readable
So we get the final part.
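Written out, the five lines probably look like the following. This is a sketch in standard notation rather than the author's exact equations: it assumes the likelihood is the product of the per-sample class probabilities, and the symbol $f_\theta(x) = P(y=1 \mid x)$ (the sigmoid output) is an assumed name for the shorter notation in the last step.

```latex
\begin{align}
\log L(\theta)
&= \log \prod_{i=1}^{n} P(y^{(i)}=1 \mid x^{(i)})^{y^{(i)}} \, P(y^{(i)}=0 \mid x^{(i)})^{1-y^{(i)}} \tag{1} \\
&= \sum_{i=1}^{n} \left( \log P(y^{(i)}=1 \mid x^{(i)})^{y^{(i)}} + \log P(y^{(i)}=0 \mid x^{(i)})^{1-y^{(i)}} \right) \tag{2} \\
&= \sum_{i=1}^{n} \left( y^{(i)} \log P(y^{(i)}=1 \mid x^{(i)}) + (1-y^{(i)}) \log P(y^{(i)}=0 \mid x^{(i)}) \right) \tag{3} \\
&= \sum_{i=1}^{n} \left( y^{(i)} \log P(y^{(i)}=1 \mid x^{(i)}) + (1-y^{(i)}) \log \left( 1 - P(y^{(i)}=1 \mid x^{(i)}) \right) \right) \tag{4} \\
&= \sum_{i=1}^{n} \left( y^{(i)} \log f_\theta(x^{(i)}) + (1-y^{(i)}) \log \left( 1 - f_\theta(x^{(i)}) \right) \right) \tag{5}
\end{align}
```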
Don’t forget why we started this.
A goal function can guide us on how to update the parameter in the right way.
We need to use this to calculate the loss to update the parameter.
More specifically, we need to calculate the derivative of the log-likelihood function.
Here I will directly give the final update equation.
In step 6, the most important equation is this one.
If you cannot understand how to get this, it is totally ok.
All we need to do is to write it as real code.
But if you are interested, this video should be helpful.
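If you would rather check the derivative than derive it, a numerical comparison can help. The sketch below assumes the standard result that the derivative of the log-likelihood with respect to θj is Σ (y_i − f(x_i)) · x_ij, and compares it against a finite-difference estimate. The helper names (`sigmoid`, `log_likelihood`, `analytic_gradient`, `numeric_gradient`) and the random toy data are made up for illustration; they are not the article's dataset.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(theta, X, y):
    # sum over samples of y*log(f) + (1-y)*log(1-f)
    p = sigmoid(X @ theta)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def analytic_gradient(theta, X, y):
    # assumed closed form: sum_i (y_i - f(x_i)) * x_ij for each j
    return (y - sigmoid(X @ theta)) @ X

def numeric_gradient(theta, X, y, eps=1e-6):
    # central finite differences, one parameter at a time
    grad = np.zeros_like(theta)
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = eps
        upper = log_likelihood(theta + step, X, y)
        lower = log_likelihood(theta - step, X, y)
        grad[j] = (upper - lower) / (2 * eps)
    return grad

# toy data: 20 samples, a bias column of ones plus two random features
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(20), rng.normal(size=(20, 2))])
y = rng.integers(0, 2, size=20).astype(float)
theta = rng.normal(size=3)

grad_a = analytic_gradient(theta, X, y)
grad_n = numeric_gradient(theta, X, y)
print(np.allclose(grad_a, grad_n, atol=1e-5))
```

Moving θ in the direction of this gradient increases the log-likelihood, which is the same as subtracting η · Σ (f(x_i) − y_i) · x_ij from θj.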
8 Update parameter θ
Step 8 is a little longer, but it is very important.
We will crack it.
θj is the j-th parameter.
η is the learning rate; we set it to 0.001.
n is the number of data samples; in our case, we have 20.
i is the index of the data sample.
Because we have three parameters, we can write it as three equations.
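Using the symbols just defined, the update rule and its three per-parameter versions can be written roughly as follows. This is a sketch: it assumes $f_\theta$ is the sigmoid output and that $x_0^{(i)} = 1$ is the constant bias input, matching the column of ones that the code builds in `to_matrix`. The exact layout of the original equations may differ.

```latex
\theta_j := \theta_j - \eta \sum_{i=1}^{n} \left( f_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}

\begin{align}
\theta_0 &:= \theta_0 - \eta \sum_{i=1}^{n} \left( f_\theta(x^{(i)}) - y^{(i)} \right) x_0^{(i)} \\
\theta_1 &:= \theta_1 - \eta \sum_{i=1}^{n} \left( f_\theta(x^{(i)}) - y^{(i)} \right) x_1^{(i)} \\
\theta_2 &:= \theta_2 - \eta \sum_{i=1}^{n} \left( f_\theta(x^{(i)}) - y^{(i)} \right) x_2^{(i)}
\end{align}
```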
The := notation is just like =.
You can find the explanation here.
The most difficult part is the Σ (summation symbol), so I expand the Σ for better understanding.
I colored the three parts in the equation because we can represent them as matrices.
Look at the red and blue parts in the first row, where we update theta 0.
We write the red part and the blue part as column vectors.
Because we have 20 data samples, the dimension of f is (20, 1).
The dimension of x0 is (20, 1).
We can write the sum as a matrix multiplication with a transpose.
So the dimension should be (1, 20) x (20, 1) -> (1, 1).
We get one scalar to update theta 0.
x1 and x2 are also column vectors.
And we can stack them into an X matrix.
And theta is a row vector.
Back to the equation.
We can write it in matrix form, and then write it as one equation.
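In matrix form, the single equation probably looks like the following. This sketch assumes $f$ stacks the residuals $f_\theta(x^{(i)}) - y^{(i)}$ into a (20, 1) column vector and that $X$ places $x_0$, $x_1$, $x_2$ side by side as columns, which is one reading of the colored parts described above.

```latex
f = \begin{bmatrix}
f_\theta(x^{(1)}) - y^{(1)} \\
\vdots \\
f_\theta(x^{(20)}) - y^{(20)}
\end{bmatrix},
\qquad
X = \begin{bmatrix} x_0 & x_1 & x_2 \end{bmatrix},
\qquad
\theta := \theta - \eta \, f^{T} X
```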
A NumPy array-like version might be easier to understand.
Let’s do a little calculation to make sure the dimensions are right.
θ: (1, 3)
f^T: (1, 20)
x: (20, 3)
dot product: (1, 20) x (20, 3) -> (1, 3)
Everything seems right.
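To double-check the dimension bookkeeping, the sketch below compares the expanded Σ version (two nested loops) with the matrix version using explicit 2-D shapes. The random toy data and the names `residual`, `theta_loop`, and `theta_matrix` are assumptions made for illustration only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(42)
n = 20
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # (20, 3)
y = rng.integers(0, 2, size=n).astype(float)                # (20,)
theta = rng.normal(size=3)                                  # (3,)
eta = 1e-3

residual = sigmoid(X @ theta) - y                           # (20,)

# expanded summation: one sum over the samples per parameter
theta_loop = theta.copy()
for j in range(3):
    total = 0.0
    for i in range(n):
        total += residual[i] * X[i, j]
    theta_loop[j] = theta[j] - eta * total

# matrix version with explicit 2-D shapes: (1, 20) x (20, 3) -> (1, 3)
f_col = residual.reshape(n, 1)                              # (20, 1)
theta_row = theta.reshape(1, 3)                             # (1, 3)
theta_matrix = theta_row - eta * (f_col.T @ X)              # (1, 3)

print(theta_matrix.shape)
print(np.allclose(theta_loop, theta_matrix.ravel()))
```

Both versions perform the same single update step; the matrix form just lets NumPy do the summation.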
Let's write the code.
Actually, the update itself takes just two lines.
    import numpy as np
    import matplotlib.pyplot as plt

    # read data (replace "….csv" with the path to your data file)
    data = np.loadtxt("….csv", delimiter=',', skiprows=1)
    train_x = data[:, 0:2]
    train_y = data[:, 2]

    # initialize parameter
    theta = np.random.randn(3)

    # standardization
    mu = train_x.mean(axis=0)
    sigma = train_x.std(axis=0)

    def standardizer(x):
        return (x - mu) / sigma

    std_x = standardizer(train_x)

    # get matrix: prepend a column of ones for the bias term theta[0]
    def to_matrix(std_x):
        return np.array([[1, x1, x2] for x1, x2 in std_x])

    mat_x = to_matrix(std_x)

    # sigmoid function applied to the dot product of x and theta
    def f(x):
        return 1 / (1 + np.exp(-np.dot(x, theta)))

    # update times
    epoch = 2000

    # learning rate
    ETA = 1e-3

    # update parameter
    for _ in range(epoch):
        # f(mat_x) - train_y: (20,)
        # mat_x: (20, 3)
        # theta: (3,)
        # dot product: (20,) x (20, 3) -> (3,)
        theta = theta - ETA * np.dot(f(mat_x) - train_y, mat_x)

Something strange? Remember what we wrote before the code?
dot product: (1, 20) x (20, 3) -> (1, 3)
The dimension changes make sense here.
But why, when we write code, do we use (20,) x (20, 3) -> (3,)?
Actually, this is not real math notation; it is NumPy notation.
And if you are using TensorFlow or PyTorch, you should be familiar with it.
(20,) means this is a 1-D array with 20 numbers.
It can be a row vector or a column vector because it only has one dimension.
If we set this as a 2-D array, like (20, 1) or (1, 20), we can easily determine that (20, 1) is a column vector and (1, 20) is a row vector.
But why not explicitly set the dimension to eliminate ambiguity?
Well.
Believe me, I had the same question when I first saw this.
But after some coding practice, I think I know the reason.
Because it saves us time!
Take (20,) x (20, 3) -> (3,) as an example.
If we wanted to do this with explicit 2-D shapes, (1, 20) x (20, 3) -> (1, 3), what would we need to do to get from (20,) and (20, 3) to (3,)?
Convert (20,) to (1, 20)
Calculate (1, 20) x (20, 3) -> (1, 3)
Because (1, 3) is a 2-D row vector, we need to convert it to a 1-D array: (1, 3) -> (3,)
Honestly, it is frustrating.
Why can’t we complete these in just one step?
Yes, that’s why we can write (20,) x (20, 3) -> (3,).
OK, let’s take a look at what the numpy.dot() doc says.
numpy.dot(): If a is an N-D array and b is a 1-D array, it is a sum product over the last axis of a and b.
Hmm, actually I cannot get the point.
numpy.matmul() describes a similar calculation, with a reshape to (20, 1) or (1, 20) to perform a standard 2-D matrix product.
Maybe we can get some inspiration.
numpy.matmul(): If the first argument is 1-D, it is promoted to a matrix by prepending a 1 to its dimensions.
After matrix multiplication the prepended 1 is removed.
Ha, this is the missing part!
So in our case, (20,) becomes (1, 20) by prepending a 1, which lines up with the first dimension of (20, 3), which is 20.
And (1, 20) x (20, 3) -> (1, 3).
Then the prepended 1 is removed, so we get (3,).
One step for all.
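The sketch below compares the one-step call with the three explicit steps described above. The arrays are filled with placeholder numbers (not the training data), and the names `residual`, `one_step`, `row`, `product`, and `explicit` are made up for illustration.

```python
import numpy as np

residual = np.arange(20, dtype=float)               # shape (20,)
mat_x = np.arange(60, dtype=float).reshape(20, 3)   # shape (20, 3)

# one step: the 1-D array is treated as a row and the result is 1-D
one_step = np.dot(residual, mat_x)                  # shape (3,)
same_with_matmul = np.matmul(residual, mat_x)       # shape (3,)

# the explicit route with 2-D arrays
row = residual.reshape(1, 20)                       # (20,) -> (1, 20)
product = row @ mat_x                               # (1, 20) x (20, 3) -> (1, 3)
explicit = product.reshape(3)                       # (1, 3) -> (3,)

print(one_step.shape, product.shape, explicit.shape)
print(np.allclose(one_step, explicit))
```

Both routes give the same three numbers; the 1-D version simply skips the two reshapes.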
9 Plot the line
After updating the parameters 2000 times, we should plot the result to see the performance of our model.
We will make some data points as x1, and calculate x2 based on the parameters we learned.
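The line in question is the decision boundary. As a sketch, assuming the boundary is where the model outputs exactly 0.5 (that is, where the weighted sum inside the sigmoid is zero, with $x_0 = 1$), solving for $x_2$ gives:

```latex
\theta_0 + \theta_1 x_1 + \theta_2 x_2 = 0
\quad\Longrightarrow\quad
x_2 = -\frac{\theta_0 + \theta_1 x_1}{\theta_2}
```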
    # plot line: the boundary where theta[0] + theta[1] * x1 + theta[2] * x2 = 0
    x1 = np.linspace(-2, 2, 100)
    x2 = -(theta[0] + x1 * theta[1]) / theta[2]

    plt.plot(std_x[train_y == 1, 0], std_x[train_y == 1, 1], 'o')  # train data of class 1
    plt.plot(std_x[train_y == 0, 0], std_x[train_y == 0, 1], 'x')  # train data of class 0
    plt.plot(x1, x2, linestyle='dashed')  # plot the line we learned
    plt.show()

10 Summary
Congratulations! I am glad you made it.
Hope my article is helpful for you.
You can find the whole code below.
Leave comments to let me know whether my article is easy to understand.
Stay tuned for my next article about the non-linear separable problem.