P2P Lending Platform Data Analysis: Exploratory Data Analysis in R — Part 1

I went through Prosper’s Form S-1, Annual Report and Wiki, and summarized the background:

Before 2009, the major credit risk metric displayed for investors was Credit Grade, which was based on the borrower’s credit score from an independent credit reporting agency.

But loans on Prosper did not perform very well at that time.

After a temporary shutdown requested by the SEC and a restructuring, Prosper launched a new credit risk metric in July 2009, the Prosper Rating, which was regarded as applying stricter credit guidelines to borrowers.

Loan performance since then shows that Prosper’s loan default rate has been significantly reduced.

It seems that Prosper Rating performed better than the old Credit Grade metric, which evaluated a loan the way a bank would.

This leads me to compare the Loan Status before and after 2009 in the Prosper loan data.

Some tips for getting to know the background of the data

Before proceeding to the next part, I want to share some tips on how to quickly pick up the basic domain knowledge behind a data set.

As an experienced investment banker, I find that quickly understanding basic domain knowledge and summarizing it is a major part of the daily job.

When we start learning about a brand new industry, it’s very useful to find a company’s listing documents, such as SEC filings.

The best-known listing document is the Annual Report.

If the company of interest has never been listed on a stock market, it’s also feasible to look up the documents of leading listed companies in the same industry.

The following are the major sources for quickly getting to know a new industry and the history of a company or industry:

Listed documents: Form S-1, Form 10-K, Annual Report, etc.

They provide basic company background and history, industry information and competition, an introduction to the main products and services, financial performance, etc.

They can easily be found on the company’s Investor Relations (IR) web page if the company has gone through an initial public offering.

Industry reports: These cover industry trends and major players in depth.

Well-known sources include IBISWorld, IDC, MarketResearch.com, etc.

Note that most industry report sources require a paid account, but they typically provide a report summary from which we can get some basic information.

Statistical sources: These are useful for examining how a quantity performed over a period.

Most listed documents include financial reports.

If you want a more integrated statistical source, the one I recommend most is Statista.

Wikipedia and Google Search: Panacea for nearly everything.

Loan Performance before and after 2009

In the data set, I defined HighRisk loans as loans that are PastDue, Chargeoff or Defaulted, and Completed loans as loans that are Completed, FinalPaymentInProgress or Cancelled.
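The grouping described above could be sketched in base R roughly as follows. This is a minimal sketch, not the code behind the report: the helper name `group_loan_status`, the `Other` bucket for statuses that fall in neither group, and the exact label spellings are assumptions (the spellings follow the wording above, and the raw data may write them differently, for example with a day range appended to past-due statuses).

```r
# Hypothetical helper: map raw loan status labels to the two groups
# described in the text. Label spellings are assumptions.
group_loan_status <- function(status) {
  high_risk <- c("Chargeoff", "Defaulted")
  completed <- c("Completed", "FinalPaymentInProgress", "Cancelled")

  # Match "PastDue" as well as variants such as "Past Due (1-15 days)".
  is_past_due <- grepl("^Past ?Due", status)

  ifelse(is_past_due | status %in% high_risk, "HighRisk",
         ifelse(status %in% completed, "Completed", "Other"))
}

# Toy input, not real Prosper records
toy_status <- c("Completed", "PastDue", "Chargeoff", "Current",
                "Cancelled", "Past Due (1-15 days)")
toy_group <- group_loan_status(toy_status)
print(toy_group)
```

Keeping an explicit `Other` bucket makes it easy to see how many loans (for example, loans that are still current) are left out of the comparison.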

The bar chart above shows that the proportion of high risk loans decreased after 2009, from about 37% to 30%.
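A comparison like this one can be computed with a row-wise proportion table. The sketch below uses a tiny made-up data frame, so its numbers have nothing to do with the figures quoted above. The column names `LoanOriginationDate` and `StatusGroup`, and the choice of 1 July 2009 as the cutoff, are assumptions for illustration.

```r
# Toy data for illustration only; these are not real Prosper loans.
toy <- data.frame(
  LoanOriginationDate = as.Date(c("2007-03-01", "2008-06-15", "2008-09-30",
                                  "2010-01-10", "2011-05-20", "2012-08-05")),
  StatusGroup = c("HighRisk", "Completed", "HighRisk",
                  "Completed", "Completed", "HighRisk")
)

# Assumed cutoff: July 2009, when Prosper Rating was introduced.
cutoff <- as.Date("2009-07-01")
toy$Period <- factor(
  ifelse(toy$LoanOriginationDate < cutoff, "Before 2009", "After 2009"),
  levels = c("Before 2009", "After 2009")
)

# Row-wise proportions: share of each status group within each period.
counts <- table(toy$Period, toy$StatusGroup)
shares <- prop.table(counts, margin = 1)
print(round(shares, 2))
```

Using `margin = 1` normalizes within each period, which is what makes the two periods comparable even when they contain very different numbers of loans.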

Let’s compare how each level of Credit Grade and Prosper Rating, from HR to AA (high to low risk), relates to Loan Status.

I want to check how Prosper Rating and Credit Grade assess both bad and good loans.

The percentage of High Risk loans is inversely related to both Prosper Rating and Credit Grade: it falls as the risk level decreases.

The better the rating, the lower the percentage of High Risk loans.

We can also see that High Risk loans overall (in green) actually decreased after Prosper Rating was launched.

I further grouped the loans by credit level, from AA to HR (low to high risk), for both HighRisk and Completed loans:

The chart above shows that the number of loans with good ratings decreased after 2009 for both Completed and High Risk loans, which implies that Prosper audited loans more strictly after 2009.

Furthermore, High Risk loans decreased in total compared with before 2009, as shown in the previous plot, while High Risk loans rated D and E still increased after 2009.

It can be inferred that:

Prosper conducted stricter loan auditing after 2009.

Prosper Rating was better at assessing high risk loans than the Credit Grade applied before 2009.

Components of Prosper Rating

We have seen from the Prosper data that Prosper Rating performed well.

So how is the Prosper Rating determined?

According to this page, Prosper Rating is determined by the Estimated Loss Rate, which in turn is determined by two scores: 1) a custom Prosper Score and 2) a credit score from a consumer credit reporting agency (like Experian).

So I will look more closely at Prosper Score and Credit Score to see how they make Prosper Rating more accurate than Credit Grade.


Prosper Score

According to the Prosper website, Prosper Score was built using historical Prosper data to assess the risk of Prosper borrower listings.

It ranges from 1 to 11, with 11 being the lowest risk and 1 being the highest risk.

The graph above shows that Prosper Score has a bell-shaped distribution in the Prosper data, peaking at scores 4, 6 and 7, with fewer loans at both the lowest-risk and highest-risk ends.

Grouping Prosper Score by Loan Status, we can see that it has a left-skewed distribution for completed loans, which means completed loans mostly sit at good scores.

However, Prosper Score has a bell-shaped distribution for high risk loans.

Compared with the left-skewed shape of Prosper Rating shown before, Prosper Score seems less able to detect high risk loans.
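The shape comparison above is read off the plots, but it can also be checked numerically with a cross-tabulation and a skewness statistic. The following is only a sketch on made-up scores: the `skewness` helper (a simple moment-based estimate) and the column name `StatusGroup` are assumptions, and the toy values were chosen to produce one left-skewed and one symmetric group.

```r
# Toy Prosper Scores (1 = highest risk, 11 = lowest risk); illustrative only.
toy <- data.frame(
  StatusGroup = c(rep("Completed", 8), rep("HighRisk", 7)),
  ProsperScore = c(3, 6, 8, 9, 9, 10, 10, 10,
                   4, 5, 6, 6, 6, 7, 8)
)

# Counts of each score within each loan status group.
score_counts <- table(toy$StatusGroup, toy$ProsperScore)
print(score_counts)

# Moment-based sample skewness: negative means a longer tail toward
# low scores (left-skewed), near zero means roughly symmetric.
skewness <- function(x) {
  centered <- x - mean(x)
  mean(centered^3) / mean(centered^2)^1.5
}

skew_by_status <- tapply(toy$ProsperScore, toy$StatusGroup, skewness)
print(round(skew_by_status, 2))
```

A clearly negative value for one group and a value near zero for the other would correspond to the "left-skewed" versus "bell-shaped" contrast described in the text.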

The other component of Prosper Rating is the credit score from a reporting agency.

In this data set, the variables related to this kind of score are CreditScoreRangeLower and CreditScoreRangeUpper.

I created a new variable, CreditScoreAverage, by averaging the two, to serve as a representative variable for the credit score.
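In R, deriving this variable is a one-liner. The sketch below uses a made-up three-row data frame named `loans` (the name is an assumption); the column names are the ones given in the text.

```r
# Toy rows for illustration; the column names follow the text above.
loans <- data.frame(
  CreditScoreRangeLower = c(640, 700, NA),
  CreditScoreRangeUpper = c(659, 719, NA)
)

# Midpoint of the reported range; rows with a missing bound stay NA.
loans$CreditScoreAverage <-
  (loans$CreditScoreRangeLower + loans$CreditScoreRangeUpper) / 2
print(loans)
```

Leaving missing bounds as `NA` rather than imputing them keeps loans without a reported score out of the later distribution plots.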


Credit Score Average

Before 2009, Prosper did not allow individuals with a credit score (Experian Scorex PLUS) below 520 to post listings on the platform.

After 2009, Prosper raised the minimum credit score to 640, but in some cases allowed a minimum of 600 if the borrower had previously completed a Prosper loan.

So I split the graph into two periods and limited the minimum score on the x-axis to 510 and 630, respectively, to exclude the outliers from these special cases.
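The text describes setting a lower limit on the x-axis of each plot. A roughly equivalent way to see which loans remain in view is to filter rows against a per-period limit, as in this sketch. The data frame, the `Period` labels and the `min_score` lookup vector are assumptions for illustration; only the limits 510 and 630 come from the text.

```r
# Toy data for illustration only.
toy <- data.frame(
  Period = c("Before 2009", "Before 2009", "Before 2009",
             "After 2009", "After 2009", "After 2009"),
  CreditScoreAverage = c(489.5, 529.5, 649.5, 609.5, 649.5, 709.5)
)

# Lower x-axis limits from the text: 510 before 2009, 630 after.
min_score <- c("Before 2009" = 510, "After 2009" = 630)

# Keep only rows at or above the limit for their own period.
# as.character() guards against Period being stored as a factor.
keep <- toy$CreditScoreAverage >= min_score[as.character(toy$Period)]
trimmed <- toy[keep, ]
print(trimmed)
```

Filtering changes the counts that feed any summary statistics, whereas an axis limit only changes what is drawn, so the two approaches should not be mixed without care.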

CreditScoreAverage is right-skewed both before and after 2009, with most loans between 610 and 670 before 2009 and between 670 and 710 after 2009.

Overall, the average credit score after 2009 is clearly higher than before 2009.

The reason is that Prosper did set a higher threshold on borrowers’ credit scores after 2009, which also matches what we observed in the previous section on Prosper Rating.

But how do Credit Score Average and Prosper Score make the Prosper Rating more accurate?

Does Credit Score Average distinguish Completed from High Risk loans before and after 2009?

I grouped the Average Credit Score by loan status (Completed and High Risk) before and after 2009:

Comparing the distributions of Completed and High Risk loans, the graphs above show nearly identical right-skewed distributions in both periods.

It seems that CreditScoreAverage does not help distinguish Completed from High Risk loans either before or after 2009; the only change is the threshold.

It turns out that if Prosper had used only the credit score for auditing, then under the stricter assessment after 2009 (higher threshold), most borrowers’ credit scores would have fallen in the high-risk tiers, even for loans with a high probability of being completed.

However, because Prosper also incorporated Prosper Score, Prosper Rating measures risk much better and discriminates much better between completed and high risk loans.

Investigation so far…

Let’s make a brief summary.

After 2009, Prosper applied the Prosper Score so that Prosper Rating could better discriminate between bad loans and completed loans, on top of a stricter bureau score threshold.

So we can say Prosper Score played an important role in the Prosper Rating metric.

Let’s use the data to examine this assumption:

The graphs above show a slightly positive trend between Prosper Rating and Prosper Score, and the values of Prosper Score within each Prosper Rating are more concentrated.

By comparison, the variance of Credit Score Average within each Prosper Rating is somewhat larger than that of Prosper Score.

It seems that Prosper puts more linear weight on Prosper Score than on Credit Score Average in its own Prosper Rating model.
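One way to put numbers on this visual comparison is to measure, for each score, how much spread remains inside a rating level relative to the score's overall spread, and to look at the rank correlation between rating and score. The sketch below is not the method used in the report. The `relative_spread` helper and the normalization by overall standard deviation are my assumptions, needed because the two scores live on very different scales. The toy data is constructed so that Prosper Score is tighter within each rating, so the output demonstrates the calculation only and proves nothing about real loans.

```r
# Toy data for illustration only; constructed, not real Prosper loans.
toy <- data.frame(
  ProsperRating = factor(c("HR", "HR", "D", "D", "B", "B", "AA", "AA"),
                         levels = c("HR", "D", "B", "AA")),
  ProsperScore = c(2, 3, 4, 5, 7, 8, 10, 11),
  CreditScoreAverage = c(649.5, 709.5, 669.5, 729.5,
                         689.5, 769.5, 709.5, 809.5)
)

# Within-rating standard deviation as a fraction of the overall
# standard deviation, so the two score scales can be compared.
relative_spread <- function(score, group) {
  tapply(score, group, sd) / sd(score)
}

rel_prosper <- relative_spread(toy$ProsperScore, toy$ProsperRating)
rel_credit  <- relative_spread(toy$CreditScoreAverage, toy$ProsperRating)
print(round(rbind(rel_prosper, rel_credit), 2))

# Rank correlation between the ordered rating levels and each score.
rating_rank <- as.integer(toy$ProsperRating)
rho_prosper <- cor(rating_rank, toy$ProsperScore, method = "spearman")
rho_credit  <- cor(rating_rank, toy$CreditScoreAverage, method = "spearman")
print(round(c(prosper = rho_prosper, credit = rho_credit), 2))
```

A smaller relative spread and a higher rank correlation for one score would be consistent with that score carrying more weight in the rating, although neither statistic reveals the actual model weights.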

Next step: Lifting the Veil of Prosper Score

So what are the major elements of Prosper Score?

I went through Prosper’s annual reports from 2010 and 2013 and found some information about Prosper Score:

Prosper Score is built to estimate the likelihood that a loan will go 61+ days past due.

Unlike a credit score obtained from a credit reporting agency, which is based on a much broader population, Prosper Score is based on a more precise picture from a smaller subset, the lending platform’s own population.


I infer that if Prosper measured borrower credit only through a traditional credit bureau, it would in fact be assessing loans much like a bank or other official lending institution.

Prosper Score considers borrower behaviors that are unique to the platform’s population.

Such a custom assessment may be more suitable for the lending platform market, because it is built specifically on the Prosper borrower and applicant population.

After all, a lending platform offers borrowers an additional channel when they can’t borrow from a bank, which assesses credit scores more strictly.

The lending platform spreads the risk across many investors, which makes the way risk is measured very different.

Hence, I searched for the major elements of Prosper Score.

I found several sources, like the website or this one, indicating that Prosper Score was composed of different sets of elements over time.

I am not going to explore all the related features in the Prosper data, to avoid making the report too long.

Instead, I will choose some variables that I think are important and that are also covered by the variable lists from these sources.

In the next part, I will explore the major features that may be related to Prosper Score and that may make Prosper Rating more discriminating in evaluating loan quality.

Note: For more detailed exploration results, see my report on RPubs and the code on GitHub!
