# A Gentle Introduction to Credit Risk Modeling with Data Science — Part 2

In our last post, we started using Data Science for Credit Risk Modeling by analyzing loan data from Lending Club. We raised some possible indications that the loan grades assigned by Lending Club are not as optimal as they could be. Over the next posts, our objective will be to use Machine Learning to beat those loan grades.

*[Image: Soon this guy will take your job AND generate your credit score.]*

We will do this by conceptualizing a new credit score predictive model in order to predict loan grades.

In this post, we will use Data Science and Exploratory Data Analysis to delve deeper into some of the Borrower Variables, such as annual income and employment status, and see how they affect other variables. This is crucial to help us visualize and understand what kind of public we are dealing with, allowing us to come up with an Economic Profile.

## Economic Profile

*[Image: Would our friend W get a loan grade B?]*

In the dataset, we have some variables from each borrower’s economic profile, such as:

- Income: annual income in USD.
- Employment Length: how many years of employment at the current job.
- Employment Title: job title provided by the borrower in the loan application.

We will analyze each of these variables and how they interact within our dataset.

## Show me the Money

We have two variables reflecting borrowers’ income, depending on the nature of the application: single annual income or joint annual income. Single applications are the ones filed by only one person, while joint applications are filed by two or more people.

As can be seen in the countplot, the number of joint applications is negligible.

Let’s generate a couple of violin plots and analyze the annual income for single and joint applications together. Violin plots are similar to box plots, except that they also show the probability density of the data at different values (in the simplest case this could be a histogram).

Our first violin plot looks weird. Before digging deeper, let’s zoom in and try to understand why it has this
format by generating a distribution plot for annual incomes from single applications.

*[Figure: Annual Incomes for Single Applications]*

We can observe a few particularities:

- It is heavily skewed and deviates from the Gaussian distribution.
- It is heavily peaked.
- It has a long right tail.

These aspects are usually observed in distributions that are fit by a Power Law. A power law is a functional relationship between two quantities, where a relative change in one quantity results in a proportional relative change in the other quantity, independent of the initial size of those quantities: one quantity varies as a power of another.

Throughout the years, some scientists have analyzed a variety of distributions. If you’re interested in this subject, check out this excellent paper from Aaron Clauset.

Coming back to our problem, let’s zoom into the distribution for annual joint incomes and see how it differs from the distribution of annual single incomes.

Interestingly enough, we have a different animal here. Our distribution is unimodal and resembles the Gaussian distribution, although it is skewed to the left.

## Income versus Loan Amount

We will check the relationship between Income and Loan Amount by generating a boxplot. To do this, we’ll look at a subset of our data where income is less than USD 120K per year. The reason is that applications with income above this limit are not statistically representative of our population: of the 880K loans, only 10% have annual incomes higher than USD 120K. If we didn’t cap annual income, we would have a lot of outliers and our boxplot would look like the one on the left.

From the right-side boxplot, we have a few highlights:

- Fully Paid status quartile distribution is very different from Charged Off.

[…]

This could be explained by some possible scenarios, among them:

- Unemployed people are getting loans.
- Some of the applications are being filed without this information.

Let’s investigate this further by checking the annual income from people with
“None” as employment title.

All the quartiles seem to be a little bit above zero, which wouldn’t be a surprise. But there are also a lot of outliers: some people with no employment title and an annual income of more than USD 250K, for example. Maybe Compliance and KYC are not doing a good job? At this point, we don’t know. Before answering this question, let me introduce you to NINJA Loans.

## NINJA Loans

In financial markets, NINJA Loan is a slang term for a loan extended to a borrower with “no income, no job and no assets.” Whereas most lenders require the borrower to show a stable stream of income or sufficient collateral, a NINJA loan ignores the verification process.

Let’s see if the applications with no income have actually got a loan.

We can conclude that many people with almost zero annual income actually got loans. At the same time, there are current loans which were granted to people with zero annual income.

The bottom line for now is that we can’t safely assume this is a problem with Compliance or KYC.
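
The check on zero-income applications could be sketched roughly as follows. This is only an illustration, not the code used for this post: the column names `annual_inc` and `loan_status` are assumptions about how the Lending Club data is laid out, the helper `low_income_loans_by_status` is hypothetical, and the tiny DataFrame is made-up toy data rather than real results.

```python
import pandas as pd


def low_income_loans_by_status(loans: pd.DataFrame, max_income: float = 0.0) -> pd.Series:
    """Count loans per status for borrowers reporting income <= max_income.

    Assumes (hypothetically) that `loans` has the columns
    `annual_inc` and `loan_status`.
    """
    # Rows with a missing income compare as False and are left out.
    low_income = loans[loans["annual_inc"] <= max_income]
    return low_income["loan_status"].value_counts()


# Toy data, invented purely for illustration.
toy_loans = pd.DataFrame(
    {
        "annual_inc": [0.0, 0.0, 55000.0, 0.0, 120000.0],
        "loan_status": ["Current", "Fully Paid", "Fully Paid", "Current", "Charged Off"],
    }
)

counts = low_income_loans_by_status(toy_loans)
print(counts)
```

Raising `max_income` slightly (for example to a few hundred USD) would also capture the "almost zero" incomes mentioned above; the right threshold is a judgment call that depends on the data.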