As with the Linear Regression Model, we use the cleaned up Lending Club data set as input.

In [1]:

```
%pylab inline
import pandas as pd
dfr = pd.read_csv('../datasets/loanf.csv')
# inspect, sanity check
dfr.head()
```

Populating the interactive namespace from numpy and matplotlib

Out[1]:

| | Interest.Rate | FICO.Score | Loan.Length | Monthly.Income | Loan.Amount |
|---|---|---|---|---|---|
| 6 | 15.31 | 670 | 36 | 4891.67 | 6000 |
| 11 | 19.72 | 670 | 36 | 3575.00 | 2000 |
| 12 | 14.27 | 665 | 36 | 4250.00 | 10625 |
| 13 | 21.67 | 670 | 60 | 14166.67 | 28000 |
| 21 | 21.98 | 665 | 36 | 6666.67 | 22000 |

In [10]:

```
# we add a column which indicates (True/False) whether the interest rate is <= 12
dfr['TF']=dfr['Interest.Rate']<=12
# inspect again
dfr.head()
# we see that the TF values are False as Interest.Rate is higher than 12 in all these cases
```

Out[10]:

| | Interest.Rate | FICO.Score | Loan.Length | Monthly.Income | Loan.Amount | TF |
|---|---|---|---|---|---|---|
| 6 | 15.31 | 670 | 36 | 4891.67 | 6000 | False |
| 11 | 19.72 | 670 | 36 | 3575.00 | 2000 | False |
| 12 | 14.27 | 665 | 36 | 4250.00 | 10625 | False |
| 13 | 21.67 | 670 | 60 | 14166.67 | 28000 | False |
| 21 | 21.98 | 665 | 36 | 6666.67 | 22000 | False |

In [11]:

```
# now we check the rows that have interest rate == 10 (just some number < 12)
# this is just to confirm that the TF value is True where we expect it to be
d = dfr[dfr['Interest.Rate']==10]
d.head()
# all is well
```

Out[11]:

| | Interest.Rate | FICO.Score | Loan.Length | Monthly.Income | Loan.Amount | TF |
|---|---|---|---|---|---|---|
| 650 | 10 | 700 | 36 | 3250.00 | 2800 | True |
| 204 | 10 | 715 | 36 | 15416.67 | 6000 | True |
| 440 | 10 | 730 | 36 | 6250.00 | 21000 | True |
| 521 | 10 | 715 | 36 | 5000.00 | 12000 | True |
| 1017 | 10 | 735 | 60 | 4000.00 | 5000 | True |

In [12]:

```
import statsmodels.api as sm
# statsmodels requires us to add a constant column representing the intercept
dfr['intercept']=1.0
# identify the independent variables
ind_cols=['FICO.Score','Loan.Amount','intercept']
logit = sm.Logit(dfr['TF'], dfr[ind_cols])
result=logit.fit()
```

Optimization terminated successfully. Current function value: 798.758166 Iterations 8

In [13]:

```
# get the fitted coefficients from the results
coeff = result.params
print(coeff)
```

FICO.Score 0.087423 Loan.Amount -0.000174 intercept -60.125045 dtype: float64

So, using the above coefficients, the linear part of our predictor is

$$z = -60.125 + 0.087423 \cdot FicoScore - 0.000174 \cdot LoanAmount$$

Finally, the probability of our desired outcome, i.e. getting a loan at 12% interest or less, is

$$p(z) = \frac{1}{1 + e^{-z}} = \frac{1}{1 + e^{-(b_0 + b_1 \cdot FicoScore + b_2 \cdot LoanAmount)}}$$

where $b_0 = -60.125$, $b_1 = 0.087423$ and $b_2 = -0.000174$.
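As a quick check of the formula, here is the arithmetic for a FICO score of 720 and a loan amount of 10,000. It uses the rounded coefficients printed above, so the last digits are approximate:

$$z = -60.125 + 0.087423 \cdot 720 - 0.000174 \cdot 10000 = -60.125 + 62.9446 - 1.74 \approx 1.0796$$

$$p(z) = \frac{1}{1 + e^{-1.0796}} \approx \frac{1}{1 + 0.3397} \approx 0.746$$

Note the negative sign in the exponent: a larger $z$ gives a probability closer to 1.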

We create a function in code that encapsulates all this.

It takes as input a borrower's FICO score, the desired loan amount and the coefficient vector from our model. It returns the probability of getting the loan at 12% interest or less, a number between 0 and 1.

In [14]:

```
def pz(fico, amt, coeff):
    # compute the linear expression by multiplying the inputs by their respective coefficients;
    # coeff is the pandas Series from result.params, so we look each coefficient up by label
    z = coeff['FICO.Score']*fico + coeff['Loan.Amount']*amt + coeff['intercept']
    # apply the logistic function (exp comes from the %pylab namespace)
    return 1/(1+exp(-1*z))
```

In [15]:

```
pz(720,10000,coeff)
```

Out[15]:

0.74637858895151077

Now we are going to try (fico, amt) pairs as follows:

- 720,20000
- 720,30000
- 820,10000
- 820,20000
- 820,30000

In [16]:

```
print("Trying multiple FICO Loan Amount combinations: ")
print('----')
print("fico=720, amt=10,000")
print(pz(720,10000,coeff))
print("fico=720, amt=20,000")
print(pz(720,20000,coeff))
print("fico=720, amt=30,000")
print(pz(720,30000,coeff))
print("fico=820, amt=10,000")
print(pz(820,10000,coeff))
print("fico=820, amt=20,000")
print(pz(820,20000,coeff))
print("fico=820, amt=30,000")
print(pz(820,30000,coeff))
```
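The same sweep can also be written as a loop. The following is a minimal, self-contained sketch rather than the notebook's own code: `COEFF` and `pz_sketch` are hypothetical names, and the coefficient values are copied by hand from the fitted-model output above, so results agree with `pz()` only up to rounding.

```python
import math

# Assumption: coefficients copied from the printed model output (rounded).
COEFF = {"FICO.Score": 0.087423, "Loan.Amount": -0.000174, "intercept": -60.125045}

def pz_sketch(fico, amt, coeff=COEFF):
    """Logistic probability of a rate <= 12% for one (fico, amt) pair."""
    z = coeff["FICO.Score"] * fico + coeff["Loan.Amount"] * amt + coeff["intercept"]
    return 1.0 / (1.0 + math.exp(-z))

# Sweep every combination instead of writing one print call per pair.
results = {}
for fico in (720, 820):
    for amt in (10000, 20000, 30000):
        results[(fico, amt)] = pz_sketch(fico, amt)
        print("fico=%d, amt=%6d -> p=%.3f" % (fico, amt, results[(fico, amt)]))
```

Keeping the results in a dictionary keyed by `(fico, amt)` makes it easy to compare individual pairs against a threshold afterwards.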

In [17]:

```
pz(820,63000,coeff)
```

Out[17]:

0.64525116319288345

Plug the following (fico, amt) pairs into the pz() function, mimicking the syntax below. What insight does this give you?

- 820,50000
- 820,60000
- 820,70000
- 820,63000
- 820,65000

Place your cursor on the cell below. Hit shift-enter to recreate the result.

Then click Insert->Cell Below via the Insert menu dropdown. This creates a new empty cell.
Now enter the pz() function with the next pair of values. Hit shift-enter.
Repeat this till the end of the list of values.
Answer the question above, if possible.
Then explore other pairs as you wish.

In [18]:

```
pz(820,50000,coeff)
```

Out[18]:

0.94586368176054425

Use the supporting notebooks in the appendix to learn some plotting techniques, then try to create a yes/no plot with loan amount on the x-axis and probability of getting the loan on the y-axis for a FICO score of 720. Do the same for a FICO score of 820.

How would you create a plot that showed the probability of getting a loan as a function of *both* FICO score and loan amount varying? What tools would you need?
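One possible starting point for the two-variable question is to tabulate the probability on a grid of FICO scores and loan amounts. The sketch below uses only the standard library. `COEFF`, `prob`, and the grid ranges are illustrative assumptions, with the coefficients copied by hand from the model output above.

```python
import math

# Assumption: coefficients copied from the printed model output (rounded).
COEFF = {"FICO.Score": 0.087423, "Loan.Amount": -0.000174, "intercept": -60.125045}

def prob(fico, amt, coeff=COEFF):
    """Logistic probability of a rate <= 12% for one (fico, amt) pair."""
    z = coeff["FICO.Score"] * fico + coeff["Loan.Amount"] * amt + coeff["intercept"]
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative grid: FICO 660..840 in steps of 20, amounts 5,000..40,000 in steps of 5,000.
fico_values = list(range(660, 841, 20))
amt_values = list(range(5000, 40001, 5000))

# grid[i][j] is the probability for fico_values[i] and amt_values[j].
grid = [[prob(f, a) for a in amt_values] for f in fico_values]

# Print the grid as a text table: one row per FICO score, one column per amount.
print("FICO".ljust(6) + "".join("%8d" % a for a in amt_values))
for f, row in zip(fico_values, grid):
    print(str(f).ljust(6) + "".join("%8.3f" % p for p in row))
```

A grid like this could then be handed to a 2-D plotting routine, for example matplotlib's `contourf` or `imshow`, to show both variables at once.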

For the (720, 10000) case we see a probability of about 0.75, which tells us that we have a good chance of getting the loan at a favorable interest rate. Using our threshold of 0.67, we count this as a 'yes'.

Using a logistic regression model fitted to the Lending Club dataset, with a desired interest rate of 12 per cent or less, we computed the probability of getting a 10,000 dollar loan with a FICO score of 720. The resulting probability is above our 0.67 threshold, so we expect to be able to procure a loan on these terms.

When we try the multiple combinations we see the following:

- With a FICO score of 720, the probabilities for the 20,000 and 30,000 loans are both lower than 0.67, so we count those as a probable "no".
- For the same amounts, a FICO score of 820 corresponds to probabilities greater than 0.75, and we count those as a "yes".
- For the same FICO score, the probability goes down with increasing loan amount.
- For the same loan amount, the lower FICO score has a lower probability.
- This is consistent with the signs of the coefficients for these variables in our model.
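The threshold reasoning above can be turned around: instead of testing amounts one by one, we can solve the logistic formula for the loan amount at which the probability equals the threshold. The sketch below is an illustration under stated assumptions: `COEFF`, `prob`, and `break_even_amount` are hypothetical names, and the coefficients are copied by hand from the model output, so the amounts are approximate.

```python
import math

# Assumption: coefficients copied from the printed model output (rounded).
COEFF = {"FICO.Score": 0.087423, "Loan.Amount": -0.000174, "intercept": -60.125045}

def prob(fico, amt, coeff=COEFF):
    """Logistic probability of a rate <= 12% for one (fico, amt) pair."""
    z = coeff["FICO.Score"] * fico + coeff["Loan.Amount"] * amt + coeff["intercept"]
    return 1.0 / (1.0 + math.exp(-z))

def break_even_amount(fico, threshold=0.67, coeff=COEFF):
    """Loan amount at which the probability equals the threshold.

    Setting p = threshold gives z = log(p / (1 - p)); solving
    z = b0 + b1*fico + b2*amt for amt yields the expression below.
    """
    z_target = math.log(threshold / (1.0 - threshold))
    return (z_target - coeff["intercept"] - coeff["FICO.Score"] * fico) / coeff["Loan.Amount"]

for fico in (720, 820):
    amt = break_even_amount(fico)
    print("fico=%d: p drops below 0.67 above roughly %.0f dollars" % (fico, amt))
```

Because the loan-amount coefficient is negative, amounts below the break-even value give probabilities above the threshold and amounts above it give probabilities below.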
