Decision Boundary Visualization (A-Z)
Meaning, Significance, Implementation
Navoneel Chakrabarty
Jan 16

Classification problems have been very common and essential in the field of Data Science.
Examples include Diabetic Retinopathy detection, Mood or Sentiment Analysis, Digit Recognition and Cancer-Type prediction (Malignant or Benign).
These problems are often solved with Machine Learning or Deep Learning.
In Computer Vision, projects such as Diabetic Retinopathy or Glaucoma Detection nowadays often use Texture Analysis instead of either Classical Machine Learning with conventional Image Processing or Deep Learning.
However, Deep Learning has been reported as the state of the art for Diabetic Retinopathy detection in the research paper “A Deep Learning Method for the detection of Diabetic Retinopathy”.
In classification problems, a particular class has to be predicted out of multiple classes.
In other words, the problem can be framed as placing a particular instance (a data-point, in terms of Feature Space Geometry) in a particular region (signifying its class) and separating it from the other regions (signifying the other classes).
This separation from other regions can be visualized by a boundary known as the Decision Boundary.
The Decision Boundary is visualized in feature space on a Scatter Plot, where every point depicts a data-point of the dataset and the axes depict the features.
The Decision Boundary separates the data-points into regions, which correspond to the classes they belong to.
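To make the idea concrete, here is a minimal illustrative sketch (not from the original article) of a hypothetical linear decision rule in 2-D. The line w·x + b = 0 is the decision boundary, and the sign of w·x + b decides which region (class) a point falls in. The function name `linear_region` and the values of `w` and `b` are arbitrary assumptions chosen for the example.

```python
import numpy as np

def linear_region(points, w, b):
    """Assign each 2-D point to class 1 or 0 depending on which side of
    the line w[0]*x1 + w[1]*x2 + b = 0 it lies (illustrative only)."""
    scores = points @ w + b            # signed score for every point
    return (scores >= 0).astype(int)   # points exactly on the line go to class 1

# Hypothetical boundary: x1 + x2 - 10 = 0
w = np.array([1.0, 1.0])
b = -10.0
points = np.array([[2.0, 3.0], [8.0, 7.0], [5.0, 5.0]])
print(linear_region(points, w, b))  # [0 1 1]
```

The point (5, 5) lies exactly on the boundary; which class it receives is simply a convention of the rule.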
Importance/Significance of a Decision Boundary:
After training a Machine Learning Model on a dataset, it is often necessary to visualize how the data-points are classified in Feature Space.
A Decision Boundary on a Scatter Plot serves this purpose. The Scatter Plot contains the data-points belonging to different classes (denoted by colour or shape), and the Decision Boundary can be drawn following different strategies:
Single-Line Decision Boundary: The basic strategy for drawing the Decision Boundary on a Scatter Plot is to find a single line that separates the data-points into regions signifying different classes.
This line is found using the parameters of the Machine Learning Algorithm that are obtained after training the model.
The line's co-ordinates are computed from these parameters and from the intuition behind the algorithm.
This strategy cannot be used if the intuition and working mechanism of the ML Algorithm are not known.
Contour-Based Decision Boundary: Another strategy is to draw contours, i.e. filled regions that each enclose data-points. The colour of each data-point depicts the class it actually belongs to, the colour of each contour depicts the predicted class, and the two match or closely match.
This is the most commonly followed strategy, as it does not require the parameters of the Machine Learning Algorithm obtained after training, or any calculations based on them.
On the other hand, it does not separate the data-points with a single exact line, which can only be obtained from the trained parameters and the co-ordinates calculated from them.
Exemplar Implementation of Single-Line Decision Boundary:
Here, I am going to demonstrate a Single-Line Decision Boundary for a Machine Learning Model based on Logistic Regression.
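Going into the hypothesis of Logistic Regression:

$$h(z) = \frac{1}{1 + e^{-z}}$$

where $z$ is defined as:

$$z = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n$$

Here $\theta_0, \theta_1, \dots, \theta_n$ are the parameters of Logistic Regression and $x_1, x_2, \dots, x_n$ are the features. So, $h(z)$ is a Sigmoid Function, whose output always lies strictly between 0 and 1.
For plotting the Decision Boundary, $h(z)$ is set equal to the threshold value used in Logistic Regression, which is conventionally 0.5.
So, if

$$h(z) = \frac{1}{1 + e^{-z}} = 0.5$$

then $e^{-z} = 1$, which holds only when $z = 0$:

$$\theta_0 + \theta_1 x_1 + \dots + \theta_n x_n = 0$$

Now, for plotting the Decision Boundary, 2 features are required ($n = 2$), plotted along the x and y axes of the Scatter Plot. So,

$$x'_2 = -\frac{\theta_0 + \theta_1 x'_1}{\theta_2}$$

where

$$x'_1 \in \{\min(x_1) - 2,\ \max(x_1) + 2\}$$

and $x_1$ is the original first feature of the dataset. So, 2 values of $x'_1$ are obtained along with 2 corresponding values of $x'_2$.
The $x'_1$ values are the x-coordinates and the $x'_2$ values are the y-coordinates of the two end-points of the Single Line Decision Boundary.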
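The derivation above can be checked numerically. The following sketch is my own illustration: the helper name `boundary_endpoints` and the sample `theta` and feature values are hypothetical. It assumes the parameter vector is ordered as [θ_0, θ_1, θ_2], computes the two end-points with the same ±2 margin used in the plotting code, and confirms that the sigmoid evaluates to the 0.5 threshold at both end-points.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def boundary_endpoints(theta, x1, margin=2.0):
    """Return (x'_1, x'_2) for the line theta0 + theta1*x1 + theta2*x2 = 0.

    theta is assumed to be [theta0, theta1, theta2]; x1 holds the values of
    the first feature. The line is extended by `margin` beyond the data range.
    """
    theta0, theta1, theta2 = theta
    x1_ends = np.array([x1.min() - margin, x1.max() + margin])
    x2_ends = -(theta0 + theta1 * x1_ends) / theta2
    return x1_ends, x2_ends

# Hypothetical parameters and feature values, for illustration only
theta = np.array([-25.0, 0.2, 0.2])
x1 = np.array([30.0, 55.0, 90.0])
x1_ends, x2_ends = boundary_endpoints(theta, x1)
print(x1_ends, x2_ends)

# Every point on the line gives z = 0, so h(z) equals the 0.5 threshold
z = theta[0] + theta[1] * x1_ends + theta[2] * x2_ends
print(sigmoid(z))
```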
Application on a Fictional Dataset:
The dataset contains the marks obtained by 100 students in 2 exams and a label (0/1) indicating whether the student will be admitted to a university (1, the positive class) or not (0, the negative class).
The dataset is available in the GitHub repository navoneel1092283/logistic_regression.
Problem Statement: “Given the marks obtained in 2 exams, predict whether the student will be admitted to the university or not using Logistic Regression.”
Here, the marks in the 2 exams are the 2 features considered.
The following is the implementation of Logistic Regression, split into 3 modules. The detailed implementation is explained in the article “Logistic Regression in Python from scratch”.

import numpy as np
from math import exp

def logistic_regression(X, y, alpha):
    n = X.shape[1]                                   # number of features
    one_column = np.ones((X.shape[0], 1))
    X = np.concatenate((one_column, X), axis=1)      # prepend the bias column
    theta = np.zeros(n + 1)
    h = hypothesis(theta, X, n)
    theta, theta_history, cost = Gradient_Descent(theta, alpha, 100000, h, X, y, n)
    return theta, theta_history, cost

def Gradient_Descent(theta, alpha, num_iters, h, X, y, n):
    theta_history = np.ones((num_iters, n + 1))
    cost = np.ones(num_iters)
    for i in range(0, num_iters):
        # update the intercept, then each feature weight, using the same h
        theta[0] = theta[0] - (alpha / X.shape[0]) * sum(h - y)
        for j in range(1, n + 1):
            theta[j] = theta[j] - (alpha / X.shape[0]) * sum((h - y) * X.transpose()[j])
        theta_history[i] = theta
        h = hypothesis(theta, X, n)
        # cross-entropy cost for this iteration
        cost[i] = (-1 / X.shape[0]) * sum(y * np.log(h) + (1 - y) * np.log(1 - h))
    theta = theta.reshape(1, n + 1)
    return theta, theta_history, cost

def hypothesis(theta, X, n):
    h = np.ones((X.shape[0], 1))
    theta = theta.reshape(1, n + 1)
    for i in range(0, X.shape[0]):
        # sigmoid of theta . x_i for every sample
        h[i] = 1 / (1 + exp(-np.matmul(theta, X[i]).item()))
    h = h.reshape(X.shape[0])
    return h

Executing Logistic Regression on the dataset:

# replace <dataset-file> with the path to the dataset file from the repository above
data = np.loadtxt('<dataset-file>.txt', delimiter=',')
X_train = data[:, [0, 1]]   # marks in exam 1 and exam 2
y_train = data[:, 2]        # admission label (0/1)
theta, theta_history, cost = logistic_regression(X_train, y_train, 0.001)

This gives the trained theta (parameter) vector.
Getting the predictions (probabilities and predicted classes) of the data-points:

Xp = np.concatenate((np.ones((X_train.shape[0], 1)), X_train), axis=1)
h = hypothesis(theta, Xp, Xp.shape[1] - 1)   # predicted probabilities
y_pred = (h >= 0.5).astype(int)              # predicted classes at the 0.5 threshold

Plotting the Single Line Decision Boundary:

import matplotlib.pyplot as plt

x0 = X_train[y_train == 0]   # matrix of label 0 instances
x1 = X_train[y_train == 1]   # matrix of label 1 instances
X = [x0, x1]
colors = ["green", "blue"]   # colours for the Scatter Plot
theta = theta.reshape(3)
# getting the x co-ordinates of the decision boundary
plot_x = np.array([min(X_train[:, 0]) - 2, max(X_train[:, 0]) + 2])
# getting the corresponding y co-ordinates of the decision boundary
plot_y = (-1 / theta[2]) * (theta[1] * plot_x + theta[0])
# Plotting the Single Line Decision Boundary
for x, c in zip(X, colors):
    if c == "green":
        plt.scatter(x[:, 0], x[:, 1], color=c, label="Not Admitted")
    else:
        plt.scatter(x[:, 0], x[:, 1], color=c, label="Admitted")
plt.plot(plot_x, plot_y, label="Decision_Boundary")
plt.xlabel("Marks obtained in 1st Exam")
plt.ylabel("Marks obtained in 2nd Exam")
plt.legend()
plt.show()

[Figure: Obtained Single Line Decision Boundary]

In this way, a Single Line Decision Boundary can be plotted for any Logistic Regression based Machine Learning Model.
For models based on other Machine Learning algorithms, the corresponding hypothesis and intuition must be known.
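The loop-based modules above can also be written in vectorized NumPy form, which is much faster for large datasets. The sketch below is an alternative formulation, not the author's code: `train_logistic_regression` is a hypothetical name, the two-cluster data is synthetic and generated only for illustration, and the learning rate and iteration count are arbitrary choices. It runs batch gradient descent on the same cross-entropy cost, then derives the single-line boundary end-points from the learned θ exactly as in the plotting code.

```python
import numpy as np

def train_logistic_regression(X, y, alpha=0.1, num_iters=5000):
    """Batch gradient descent for logistic regression (vectorized sketch).

    X: (m, n) feature matrix, y: (m,) labels in {0, 1}.
    Returns theta of shape (n + 1,), with theta[0] the intercept.
    """
    m, n = X.shape
    Xb = np.hstack([np.ones((m, 1)), X])        # prepend the bias column
    theta = np.zeros(n + 1)
    for _ in range(num_iters):
        h = 1.0 / (1.0 + np.exp(-(Xb @ theta)))  # sigmoid hypothesis
        theta -= (alpha / m) * (Xb.T @ (h - y))  # gradient of the cross-entropy cost
    return theta

# Synthetic 2-feature data: two Gaussian clusters (illustration only)
rng = np.random.default_rng(0)
X0 = rng.normal(loc=-1.0, scale=0.5, size=(50, 2))   # class 0 cluster
X1 = rng.normal(loc=1.0, scale=0.5, size=(50, 2))    # class 1 cluster
X = np.vstack([X0, X1])
y = np.concatenate([np.zeros(50), np.ones(50)])

theta = train_logistic_regression(X, y)
Xb = np.hstack([np.ones((len(X), 1)), X])
pred = (1.0 / (1.0 + np.exp(-(Xb @ theta))) >= 0.5).astype(int)
print("training accuracy:", (pred == y).mean())

# End-points of the single-line boundary, as in the plotting code above
plot_x = np.array([X[:, 0].min() - 2, X[:, 0].max() + 2])
plot_y = (-1 / theta[2]) * (theta[1] * plot_x + theta[0])
```

The single matrix product `Xb.T @ (h - y)` computes the same per-parameter sums as the inner `for j` loop, so all parameters are updated simultaneously from the same `h`.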
Exemplar Implementation of Contour-Based Decision Boundary:
Using the same fictional problem, dataset and trained model, a Contour-Based Decision Boundary is now plotted.
# Plotting decision regions
x_min, x_max = X_train[:, 0].min() - 1, X_train[:, 0].max() + 1
y_min, y_max = X_train[:, 1].min() - 1, X_train[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.1),
                     np.arange(y_min, y_max, 0.1))
# every grid point as a row, with the bias column prepended
X = np.concatenate((np.ones((xx.ravel().shape[0], 1)), np.c_[xx.ravel(), yy.ravel()]), axis=1)
h = hypothesis(theta, X, 2)
h = (h >= 0.5).astype(int).reshape(xx.shape)   # predicted class at each grid point
plt.contourf(xx, yy, h)
plt.scatter(X_train[:, 0], X_train[:, 1], c=y_train, s=30, edgecolor='k')
plt.xlabel("Marks obtained in 1st Exam")
plt.ylabel("Marks obtained in 2nd Exam")
plt.show()

[Figure: Obtained Contour-Based Decision Boundary, where yellow -> Admitted and blue -> Not Admitted]

This method is more convenient, as it requires no intuition about the hypothesis or any of the Mathematics behind the Machine Learning Algorithm: only the trained model's predictions over a grid of points are needed.
All that is required is a knack for Python programming! So, it is a general method of plotting Decision Boundaries for any Machine Learning Model.
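Since the contour approach only needs predictions, it can be wrapped in a helper that accepts any prediction function. The sketch below is illustrative: `grid_predictions` and the circular toy classifier `inside_circle` are hypothetical names, and the default grid step and margin simply mirror the 0.1 and 1 used above. The returned `xx`, `yy` and `Z` can be passed directly to `plt.contourf(xx, yy, Z)`.

```python
import numpy as np

def grid_predictions(predict, X, step=0.1, margin=1.0):
    """Evaluate `predict` on a regular grid covering the 2-D data X.

    predict: callable mapping a (k, 2) array of points to k class labels.
    Returns xx, yy (meshgrid coordinates) and Z (labels, same shape as xx).
    """
    x_min, x_max = X[:, 0].min() - margin, X[:, 0].max() + margin
    y_min, y_max = X[:, 1].min() - margin, X[:, 1].max() + margin
    xx, yy = np.meshgrid(np.arange(x_min, x_max, step),
                         np.arange(y_min, y_max, step))
    grid = np.c_[xx.ravel(), yy.ravel()]       # one row per grid point
    Z = np.asarray(predict(grid)).reshape(xx.shape)
    return xx, yy, Z

# A hypothetical non-linear "model": class 1 inside the unit circle
def inside_circle(points):
    return (np.sum(points ** 2, axis=1) < 1.0).astype(int)

X = np.array([[-2.0, -2.0], [2.0, 2.0], [0.0, 0.0]])
xx, yy, Z = grid_predictions(inside_circle, X)
print(xx.shape, Z.shape, np.unique(Z))
# With matplotlib: plt.contourf(xx, yy, Z); plt.scatter(X[:, 0], X[:, 1])
```

For a trained model with a scikit-learn style interface, `model.predict` could be passed as `predict`; the circular boundary here shows that the method does not depend on the boundary being a straight line.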
In most practical and advanced-level projects, many features are involved.
How, then, can Decision Boundaries be plotted on 2-D Scatter Plots? In those cases, there are a couple of ways out:
Feature Importance Scores given by a Random Forest Classifier or an Extra Trees Classifier can be used to obtain the 2 most important features, and the Decision Boundary can then be plotted on the Scatter Plot of those 2 features.
Dimensionality Reduction techniques like Principal Component Analysis (PCA) or Linear Discriminant Analysis (LDA) can be used to reduce N features to 2 features (n_components = 2), so that the information in the N features is embedded into the 2 new features. Note that LDA yields at most one component fewer than the number of classes, so 2 LDA components require at least 3 classes.
The Decision Boundary can then be plotted on the Scatter Plot of those 2 features.
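Both options reduce the data to two columns that can then be plotted exactly as before. The sketch below is a NumPy-only illustration under stated assumptions: the hypothetical helper `top_two_features` expects an importance vector from any fitted model (for example the `feature_importances_` attribute of a scikit-learn `RandomForestClassifier` or `ExtraTreesClassifier`), and `pca_2d` implements PCA directly through a singular value decomposition rather than calling a library. LDA is not shown, and the data and importance scores are synthetic.

```python
import numpy as np

def top_two_features(X, importances):
    """Keep the 2 columns of X with the highest importance scores."""
    idx = np.argsort(importances)[::-1][:2]   # indices of the two largest scores
    return X[:, idx], idx

def pca_2d(X):
    """Project X onto its first 2 principal components (PCA via SVD)."""
    Xc = X - X.mean(axis=0)                   # centre each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T                      # scores on PC1 and PC2

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))                 # synthetic 5-feature data
importances = np.array([0.05, 0.40, 0.10, 0.30, 0.15])  # hypothetical scores

X_top, idx = top_two_features(X, importances)
X_pca = pca_2d(X)
print(idx, X_top.shape, X_pca.shape)  # [1 3] (200, 2) (200, 2)
```

Either 2-column result can be fed to the single-line or contour plotting code in place of `X_train`, after retraining the model on those 2 columns.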
That’s all about Decision Boundary Visualization.
Reference:
Chakrabarty, “A Deep Learning Method for the detection of Diabetic Retinopathy,” 2018 5th IEEE Uttar Pradesh Section International Conference on Electrical, Electronics and Computer Engineering (UPCON), Gorakhpur, India, 2018, pp. … 8596839.

For personal contact regarding the article, or for discussions on Machine Learning/Data Mining or any other area of Data Science, feel free to reach out to me on LinkedIn: Navoneel Chakrabarty, Contributing Writer, Hacker Noon.