Bivariate Logistic Regression Example (Python): Intuitive Understanding and Simple Exercise

Andrew Hershy · Jun 24

Logistic regression is a model used to predict an "either-or" (binary) target variable.

The example we will be working on is:

Target variable: whether the student will pass or fail the exam.

Independent variable: hours spent studying per week.

Logistic models are essentially linear models with an extra step.

In a logistic model, the output of a linear regression is run through a "sigmoid function," which compresses it into a value between 0 and 1 that can be read as the probability of the "1" class.

If we wanted to predict actual test scores, we would use a linear model.

If we wanted to predict “pass”/ “fail”, we would use a logistic regression model.

Linear (predict numerical test score): y = b0 + b1*x

Logistic (predict "pass"/"fail"): p = 1 / (1 + e^-(b0 + b1*x))

Visualization: in the image below, the straight line is linear, and the "S"-shaped line is logistic.

Logistic regressions are more accurate for "either-or" targets because of their shape: the "S" curve is bounded between 0 and 1, so its output reads naturally as a probability, while a straight line can run below 0 or above 1.

Logistic regressions are "S"-shaped.

Linear regressions are straight.
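The bounded "S" shape comes straight from the sigmoid formula above. Here is a quick numerical sketch; the coefficients b0 = -8 and b1 = 0.5 are made up for illustration and are not the fitted values from the article's data:

```python
import numpy as np

def sigmoid(z):
    """Compress any real number into the (0, 1) range."""
    return 1 / (1 + np.exp(-z))

b0, b1 = -8, 0.5  # illustrative coefficients, not fitted values
for hours in [0, 8, 16, 24, 32]:
    p = sigmoid(b0 + b1 * hours)
    print(f"{hours:2d} hours -> P(pass) = {p:.3f}")
```

However far the linear part b0 + b1*x runs off toward plus or minus infinity, the sigmoid keeps the output pinned between 0 and 1.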

Understanding the data:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

df = pd.read_excel(r"C:\Users\xx\Log_test.xlsx")
x = df['W_hours']
y = df['Y']

plt.scatter(x, y)
plt.show()

df.info()
x.plot.hist()
y.plot.hist()
```

There are 23 rows in the dataset.

Below is the distribution of hours studied:

Below is the distribution of pass (1)/fail (0):

Data preparation / modeling

Next, we will use the sklearn library to import "LogisticRegression".

Details about the parameters can be found here.

We convert x to 2 dimensions with the .reshape() function. We define 1 column and leave the number of rows to be inferred from the size of the dataset, so the new shape of x is (23, 1), a vertical array. This is needed to make the sklearn function work properly, since sklearn expects the features as a 2-D array. Use logreg.fit(x, y) to fit the regression.
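To see what .reshape(-1, 1) does, here is a minimal sketch on a toy array (the values are made up for illustration; the -1 tells numpy to infer that dimension from the array's size):

```python
import numpy as np

a = np.array([5, 10, 15])   # 1-D array, shape (3,)
b = a.reshape(-1, 1)        # column vector: 1 column, rows inferred
print(a.shape)              # (3,)
print(b.shape)              # (3, 1)
```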

```python
from sklearn.linear_model import LogisticRegression

logreg = LogisticRegression(C=1.0, solver='lbfgs', multi_class='ovr')

# Convert the 1-D pandas Series to a 2-D numpy column array
x = x.values.reshape(-1, 1)

# Run the logistic regression
logreg.fit(x, y)
```

Using and visualizing the model

Let's write a program where we can get the predicted probability of passing and failing by hours studied.

We input the study time in the code below: Examples of 12, 16, and 20 hours studied.

```python
print(logreg.predict_proba([[12]]))
print(logreg.predict_proba([[16]]))
print(logreg.predict_proba([[20]]))
```

In each output, the number on the left is the probability of failing and the number on the right is the probability of passing.
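The column order of predict_proba follows logreg.classes_ (sorted class labels), which is why "fail" (0) comes first. Here is a quick check, sketched on a tiny synthetic stand-in for the spreadsheet (these numbers are made up, not the article's data):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up stand-in for the hours-studied data
x = np.array([2, 4, 8, 12, 16, 20, 24]).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 1, 1, 1])

logreg = LogisticRegression(C=1.0, solver='lbfgs')
logreg.fit(x, y)

print(logreg.classes_)               # class labels in column order: fail first
print(logreg.predict_proba([[18]]))  # [[P(fail), P(pass)]], rows sum to 1
```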

In order to visualize the model, let’s make a loop where we run each half-hour of study time into the regression from 0 to 33.

```python
hours = np.arange(0, 33, 0.5)
probabilities = []
for i in hours:
    p_fail, p_pass = logreg.predict_proba([[i]])[0]
    probabilities.append(p_pass)

plt.scatter(hours, probabilities)
plt.title("Logistic Regression Model")
plt.xlabel('Hours')
plt.ylabel('Status (1:Pass, 0:Fail)')
plt.show()
```

In this fictional set of data, the model predicts a near-certain pass for a student who studies more than 20 hours and a near-certain fail for one who studies less than 10 hours.

17 hours is the 50/50 mark.
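The 50/50 mark can also be read straight off the fitted coefficients: p = 0.5 exactly when b0 + b1*x = 0, i.e. x = -b0/b1. A sketch on the same kind of made-up synthetic data as above (so the boundary here will not be the article's 17 hours):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up stand-in data, not the article's spreadsheet
x = np.array([2, 4, 8, 12, 16, 20, 24]).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 1, 1, 1])

logreg = LogisticRegression(solver='lbfgs').fit(x, y)

b0 = logreg.intercept_[0]
b1 = logreg.coef_[0][0]
boundary = -b0 / b1  # hours at which P(pass) = 0.5
print(boundary)
```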

Thanks for reading!

Check out a more detailed logistic regression model predicting cancer here.

Find out about linear vs polynomial regressions here.

Find out more about R-squared here.

Please subscribe if you found this helpful.