This is a Python toolbox for visualising common generalized linear models.
The source code is hosted here: https://github.com/pxr687/RegressionVisualiser3D
The user supplies parameters for the data-generating process, and a population of observations is created from those parameters. The population data consist of two continuous predictor variables and one outcome variable. The type of outcome variable depends on the regression model being visualised (continuous for linear regression, binary for logistic regression, etc.).
A random sample (of a size specified by the user) is drawn from the population data. A regression model is then fit to the sample data.
3D visualisations are then shown, depicting the population data with its regression surface and the sample data with its fitted regression surface.
An optional regression table (with slopes and p-values etc.) is also shown, alongside the true regression equation used to generate the data.
To aid understanding, the user can also specify the names of the predictor variables and the outcome variable.
If the user does not supply population parameters, defaults are used.
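The package's internals are not shown here, but the workflow described above (generate a population from the supplied parameters, draw a random sample, fit a model to the sample) can be sketched with NumPy alone. Everything below — variable names, ranges, the noise level — is an illustrative assumption, not the package's actual code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population parameters, mirroring model_plot()'s arguments.
intercept, b1, b2, b12, corr_weight = 10, 0.5, 1.2, 0.05, 0.2

# Population: two continuous predictors, loosely correlated via corr_weight.
n_pop = 1000
x1 = rng.uniform(0, 100, n_pop)
x2 = corr_weight * x1 + rng.uniform(0, 100, n_pop)

# Continuous outcome (the linear regression case) with random error.
y = intercept + b1 * x1 + b2 * x2 + b12 * x1 * x2 + rng.normal(0, 10, n_pop)

# Draw a random sample and fit an OLS model to it via least squares.
idx = rng.choice(n_pop, size=100, replace=False)
X = np.column_stack([np.ones(100), x1[idx], x2[idx], x1[idx] * x2[idx]])
coefs, *_ = np.linalg.lstsq(X, y[idx], rcond=None)
print(coefs)  # sample estimates of [intercept, b1, b2, b12]
```

Because the sample is random, the fitted coefficients land near, but not exactly on, the true population parameters — which is the comparison the two graphs are designed to show.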
Note: RegressionVisualiser3D is designed to be used in a Jupyter Notebook, but it can also be used from the terminal (set `markdown = False` if using it from the terminal).
Important note: certain combinations of parameters will generate computational errors. If that happens, adjust the parameters until you find a combination that runs without error. For teaching, it is best to find working combinations ahead of time :)
```python
# import the package
import RegressionVisualiser3D

# generate the visualisation (model_plot is the only function that the user
# interacts with). The user can set the population parameters:
RegressionVisualiser3D.model_plot(intercept = 10,                      # set the intercept
                                  predictor1_slope = 0.5,              # set the slope of predictor 1
                                  predictor2_slope = 1.2,              # set the slope of predictor 2
                                  interaction_slope = 0.05,            # set the slope of the interaction between the two predictors
                                  predictor_correlation_weight = 0.2)  # set how correlated the predictors should be
```
3D linear regression visualiser:
A population of 1000 observations has been generated. A random sample of 100 observations has been drawn from that population. Two graphs have been created.
The left-hand graph shows the population data and the population regression surface (i.e. the surface from a regression model fit to all of the population data).
The right-hand graph shows the sample which was randomly drawn from the population. It also shows the sample regression surface (i.e. from a regression model fit to the sample data).
Beneath is the regression table (with slopes and p-values etc.) from the sample data. The true population regression equation used to generate the data is also shown.
Dep. Variable: | Outcome Variable | R-squared: | 0.999 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.999 |
Method: | Least Squares | F-statistic: | 4.919e+04 |
Date: | Sat, 29 Apr 2023 | Prob (F-statistic): | 8.34e-153 |
Time: | 12:19:29 | Log-Likelihood: | -189.92 |
No. Observations: | 100 | AIC: | 387.8 |
Df Residuals: | 96 | BIC: | 398.3 |
Df Model: | 3 | | |
Covariance Type: | nonrobust | | |
| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
const | 9.8506 | 2.231 | 4.416 | 0.000 | 5.423 | 14.278 |
Predictor 1 | 0.5263 | 0.048 | 11.016 | 0.000 | 0.431 | 0.621 |
Predictor 2 | 1.2091 | 0.052 | 23.155 | 0.000 | 1.105 | 1.313 |
Predictor 1 * Predictor 2 | 0.0493 | 0.001 | 45.743 | 0.000 | 0.047 | 0.051 |
Omnibus: | 5.384 | Durbin-Watson: | 1.725 |
---|---|---|---|
Prob(Omnibus): | 0.068 | Jarque-Bera (JB): | 4.835 |
Skew: | -0.429 | Prob(JB): | 0.0892 |
Kurtosis: | 3.651 | Cond. No. | 2.98e+04 |
True population regression equation:
$Y = 10 + 0.5 * X_1 + 1.2 * X_2 + 0.05 * X_1 * X_2 + error$
Where:
$Y$: Outcome Variable
$X_1$: Predictor 1
$X_2$: Predictor 2
```python
RegressionVisualiser3D.model_plot(model_type = "poisson_regression",
                                  intercept = 10,
                                  predictor1_slope = 0.02,
                                  predictor2_slope = 0.06,
                                  interaction_slope = 0,
                                  predictor_correlation_weight = 0.3)
```
3D Poisson regression visualiser:
A population of 1000 observations has been generated. A random sample of 100 observations has been drawn from that population. Two graphs have been created.
The left-hand graph shows the population data and the population regression surface (i.e. the surface from a regression model fit to all of the population data).
The right-hand graph shows the sample which was randomly drawn from the population. It also shows the sample regression surface (i.e. from a regression model fit to the sample data).
Beneath is the regression table (with slopes and p-values etc.) from the sample data. The true population regression equation used to generate the data is also shown.
Dep. Variable: | Outcome Variable | No. Observations: | 100 |
---|---|---|---|
Model: | GLM | Df Residuals: | 96 |
Model Family: | Poisson | Df Model: | 3 |
Link Function: | Log | Scale: | 1.0000 |
Method: | IRLS | Log-Likelihood: | -1.0345e+07 |
Date: | Sat, 29 Apr 2023 | Deviance: | 2.0689e+07 |
Time: | 12:20:12 | Pearson chi2: | 2.13e+07 |
No. Iterations: | 8 | Pseudo R-squ. (CS): | 1.000 |
Covariance Type: | nonrobust | | |
| | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
const | 8.9243 | 0.002 | 4394.250 | 0.000 | 8.920 | 8.928 |
Predictor 1 | 0.0457 | 4.39e-05 | 1040.874 | 0.000 | 0.046 | 0.046 |
Predictor 2 | 0.0735 | 2.82e-05 | 2606.362 | 0.000 | 0.073 | 0.074 |
Predictor 1 * Predictor 2 | -0.0003 | 6.05e-07 | -493.170 | 0.000 | -0.000 | -0.000 |
True population regression equation:
$\ln(Y) = 10 + 0.02 * X_1 + 0.06 * X_2 + 0 * X_1 * X_2 + error$
Where:
$Y$: Outcome Variable
$X_1$: Predictor 1
$X_2$: Predictor 2
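For the Poisson case, the outcome is a count generated through the log link: the linear predictor sets the logarithm of the expected count, and counts are drawn from a Poisson distribution around that expectation. A minimal sketch of that data-generating step (the parameter values match the call above; all other details are illustrative assumptions, not the package's code):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical parameters matching the model_plot() call above.
intercept, b1, b2 = 10, 0.02, 0.06

# Two loosely correlated continuous predictors.
x1 = rng.uniform(0, 100, 1000)
x2 = 0.3 * x1 + rng.uniform(0, 100, 1000)

# Log link: exponentiate the linear predictor to get the expected count,
# then draw Poisson-distributed counts around it.
rate = np.exp(intercept + b1 * x1 + b2 * x2)
y = rng.poisson(rate)  # non-negative integer counts: the Poisson outcome
```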
```python
RegressionVisualiser3D.model_plot(model_type = "binary_logistic_regression",
                                  intercept = 0,
                                  predictor1_slope = -0.01,
                                  predictor2_slope = 0.02,
                                  interaction_slope = 0,
                                  predictor_correlation_weight = 0.02)
```
3D binary logistic regression visualiser:
A population of 1000 observations has been generated. A random sample of 100 observations has been drawn from that population. Two graphs have been created.
The left-hand graph shows the population data and the population regression surface (i.e. the surface from a regression model fit to all of the population data).
The right-hand graph shows the sample which was randomly drawn from the population. It also shows the sample regression surface (i.e. from a regression model fit to the sample data).
Beneath is the regression table (with slopes and p-values etc.) from the sample data. The true population regression equation used to generate the data is also shown.
Optimization terminated successfully. Current function value: 0.230833 Iterations 9
Dep. Variable: | Outcome Variable | No. Observations: | 100 |
---|---|---|---|
Model: | Logit | Df Residuals: | 96 |
Method: | MLE | Df Model: | 3 |
Date: | Sat, 29 Apr 2023 | Pseudo R-squ.: | 0.6497 |
Time: | 12:25:07 | Log-Likelihood: | -23.083 |
converged: | True | LL-Null: | -65.896 |
Covariance Type: | nonrobust | LLR p-value: | 1.906e-18 |
| | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
const | -9.2721 | 9.192 | -1.009 | 0.313 | -27.289 | 8.745 |
Predictor 1 | -0.0476 | 0.088 | -0.539 | 0.590 | -0.221 | 0.126 |
Predictor 2 | 0.4467 | 0.203 | 2.196 | 0.028 | 0.048 | 0.845 |
Predictor 1 * Predictor 2 | -0.0017 | 0.002 | -0.992 | 0.321 | -0.005 | 0.002 |
True population regression equation:
$logit(Y) = 0 - 0.01 * X_1 + 0.02 * X_2 + 0 * X_1 * X_2 + error$
Where:
$Y$: Outcome Variable
$X_1$: Predictor 1
$X_2$: Predictor 2
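For binary logistic regression, the linear predictor sets the log-odds of the outcome, which is converted to a probability and used to draw 0/1 outcomes. A minimal sketch of that step under the same caveats (parameter values from the call above; everything else is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical parameters matching the model_plot() call above.
intercept, b1, b2 = 0, -0.01, 0.02

# Two weakly correlated continuous predictors.
x1 = rng.uniform(0, 100, 1000)
x2 = 0.02 * x1 + rng.uniform(0, 100, 1000)

# Logit link: convert the linear predictor (log-odds) to a probability,
# then draw Bernoulli (0/1) outcomes with that probability.
log_odds = intercept + b1 * x1 + b2 * x2
p = 1 / (1 + np.exp(-log_odds))
y = rng.binomial(1, p)  # binary outcome variable
```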