This is a Python toolbox for visualising common generalized linear models.
The source code is hosted here: https://github.com/pxr687/RegressionVisualiser3D
The user supplies parameters for the data-generating process, and a population of observations is created from those parameters. The population data consist of two continuous predictor variables and one outcome variable. The type of outcome variable depends on the regression model being visualised (continuous for linear regression, binary for logistic regression, etc.).
A random sample (of a size specified by the user) is drawn from the population data. A regression model is then fit to the sample data.
3D visualisations are then shown, depicting the population data with its regression surface and the sample data with its fitted regression surface.
An optional regression table (with slopes and p-values etc.) is also shown, alongside the true regression equation used to generate the data.
To aid understanding, the user can also specify the names of the predictor variables and the outcome variable.
If the user does not supply population parameters, defaults are used.
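The package's internals are not shown here, but the workflow described above (generate a population from the supplied parameters, draw a random sample, fit a model to the sample) can be sketched with NumPy alone. Everything below — variable names, ranges, the noise level — is an illustrative assumption, not the package's actual code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population parameters, mirroring model_plot()'s arguments.
intercept, b1, b2, b12, corr_weight = 10, 0.5, 1.2, 0.05, 0.2

# Population: two continuous predictors, loosely correlated via corr_weight.
n_pop = 1000
x1 = rng.uniform(0, 100, n_pop)
x2 = corr_weight * x1 + rng.uniform(0, 100, n_pop)

# Continuous outcome (the linear regression case) with random error.
y = intercept + b1 * x1 + b2 * x2 + b12 * x1 * x2 + rng.normal(0, 10, n_pop)

# Draw a random sample and fit an OLS model to it via least squares.
idx = rng.choice(n_pop, size=100, replace=False)
X = np.column_stack([np.ones(100), x1[idx], x2[idx], x1[idx] * x2[idx]])
coefs, *_ = np.linalg.lstsq(X, y[idx], rcond=None)
print(coefs)  # sample estimates of [intercept, b1, b2, b12]
```

Because the sample is random, the fitted coefficients land near, but not exactly on, the true population parameters — which is the comparison the two graphs are designed to show.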
Note: RegressionVisualiser3D is designed to be used in a Jupyter Notebook, but it can also be used from the terminal (set `markdown = False` if using it from the terminal).
Important note: certain combinations of parameters will generate computational errors. If that happens, adjust the parameters until you find a combination that runs without error. For teaching, it is best to find working combinations ahead of time :)
```python
# import the package
import RegressionVisualiser3D

# generate the visualisation (model_plot is the only function that the user
# interacts with). The user can set the population parameters:
RegressionVisualiser3D.model_plot(intercept = 10,                      # set the intercept
                                  predictor1_slope = 0.5,              # set the slope of predictor 1
                                  predictor2_slope = 1.2,              # set the slope of predictor 2
                                  interaction_slope = 0.05,            # set the slope of the interaction between the two predictors
                                  predictor_correlation_weight = 0.2)  # set how correlated the predictors should be
```
3D linear regression visualiser:
A population of 1000 observations has been generated. A random sample of 100 observations has been drawn from that population. Two graphs have been created.
The left-hand graph shows the population data and the population regression surface (i.e. the surface from a regression model fit to all of the population data).
The right-hand graph shows the sample which was randomly drawn from the population. It also shows the sample regression surface (i.e. from a regression model fit to the sample data).
Beneath is the regression table (with slopes and p-values etc.) from the sample data. The true population regression equation used to generate the data is also shown.
Dep. Variable: | Outcome Variable | R-squared: | 0.999 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.999 |
Method: | Least Squares | F-statistic: | 4.919e+04 |
Date: | Sat, 29 Apr 2023 | Prob (F-statistic): | 8.34e-153 |
Time: | 12:19:29 | Log-Likelihood: | -189.92 |
No. Observations: | 100 | AIC: | 387.8 |
Df Residuals: | 96 | BIC: | 398.3 |
Df Model: | 3 | | |
Covariance Type: | nonrobust | | |
| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
const | 9.8506 | 2.231 | 4.416 | 0.000 | 5.423 | 14.278 |
Predictor 1 | 0.5263 | 0.048 | 11.016 | 0.000 | 0.431 | 0.621 |
Predictor 2 | 1.2091 | 0.052 | 23.155 | 0.000 | 1.105 | 1.313 |
Predictor 1 * Predictor 2 | 0.0493 | 0.001 | 45.743 | 0.000 | 0.047 | 0.051 |
Omnibus: | 5.384 | Durbin-Watson: | 1.725 |
---|---|---|---|
Prob(Omnibus): | 0.068 | Jarque-Bera (JB): | 4.835 |
Skew: | -0.429 | Prob(JB): | 0.0892 |
Kurtosis: | 3.651 | Cond. No. | 2.98e+04 |
True population regression equation:
$Y = 10 + 0.5 * X_1 + 1.2 * X_2 + 0.05 * X_1 * X_2 + error$
Where:
$Y$: Outcome Variable
$X_1$: Predictor 1
$X_2$: Predictor 2
```python
RegressionVisualiser3D.model_plot(model_type = "poisson_regression",
                                  intercept = 10,
                                  predictor1_slope = 0.02,
                                  predictor2_slope = 0.06,
                                  interaction_slope = 0,
                                  predictor_correlation_weight = 0.3)
```
3D Poisson regression visualiser:
A population of 1000 observations has been generated. A random sample of 100 observations has been drawn from that population. Two graphs have been created.
The left-hand graph shows the population data and the population regression surface (i.e. the surface from a regression model fit to all of the population data).
The right-hand graph shows the sample which was randomly drawn from the population. It also shows the sample regression surface (i.e. from a regression model fit to the sample data).
Beneath is the regression table (with slopes and p-values etc.) from the sample data. The true population regression equation used to generate the data is also shown.
Dep. Variable: | Outcome Variable | No. Observations: | 100 |
---|---|---|---|
Model: | GLM | Df Residuals: | 96 |
Model Family: | Poisson | Df Model: | 3 |
Link Function: | Log | Scale: | 1.0000 |
Method: | IRLS | Log-Likelihood: | -1.0345e+07 |
Date: | Sat, 29 Apr 2023 | Deviance: | 2.0689e+07 |
Time: | 12:20:12 | Pearson chi2: | 2.13e+07 |
No. Iterations: | 8 | Pseudo R-squ. (CS): | 1.000 |
Covariance Type: | nonrobust | | |
| | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
const | 8.9243 | 0.002 | 4394.250 | 0.000 | 8.920 | 8.928 |
Predictor 1 | 0.0457 | 4.39e-05 | 1040.874 | 0.000 | 0.046 | 0.046 |
Predictor 2 | 0.0735 | 2.82e-05 | 2606.362 | 0.000 | 0.073 | 0.074 |
Predictor 1 * Predictor 2 | -0.0003 | 6.05e-07 | -493.170 | 0.000 | -0.000 | -0.000 |
True population regression equation:
$\ln(Y) = 10 + 0.02 * X_1 + 0.06 * X_2 + 0 * X_1 * X_2 + error$
Where:
$Y$: Outcome Variable
$X_1$: Predictor 1
$X_2$: Predictor 2
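For the Poisson case, the outcome is a count generated through the log link: the linear predictor sets the logarithm of the expected count, and counts are drawn from a Poisson distribution around that expectation. A minimal sketch of that data-generating step (the parameter values match the call above; all other details are illustrative assumptions, not the package's code):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical parameters matching the model_plot() call above.
intercept, b1, b2 = 10, 0.02, 0.06

# Two loosely correlated continuous predictors.
x1 = rng.uniform(0, 100, 1000)
x2 = 0.3 * x1 + rng.uniform(0, 100, 1000)

# Log link: exponentiate the linear predictor to get the expected count,
# then draw Poisson-distributed counts around it.
rate = np.exp(intercept + b1 * x1 + b2 * x2)
y = rng.poisson(rate)  # non-negative integer counts: the Poisson outcome
```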
```python
RegressionVisualiser3D.model_plot(model_type = "binary_logistic_regression",
                                  intercept = 0,
                                  predictor1_slope = -0.01,
                                  predictor2_slope = 0.02,
                                  interaction_slope = 0,
                                  predictor_correlation_weight = 0.02)
```
3D binary logistic regression visualiser:
A population of 1000 observations has been generated. A random sample of 100 observations has been drawn from that population. Two graphs have been created.
The left-hand graph shows the population data and the population regression surface (i.e. the surface from a regression model fit to all of the population data).
The right-hand graph shows the sample which was randomly drawn from the population. It also shows the sample regression surface (i.e. from a regression model fit to the sample data).
Beneath is the regression table (with slopes and p-values etc.) from the sample data. The true population regression equation used to generate the data is also shown.
Optimization terminated successfully. Current function value: 0.230833 Iterations 9
Dep. Variable: | Outcome Variable | No. Observations: | 100 |
---|---|---|---|
Model: | Logit | Df Residuals: | 96 |
Method: | MLE | Df Model: | 3 |
Date: | Sat, 29 Apr 2023 | Pseudo R-squ.: | 0.6497 |
Time: | 12:25:07 | Log-Likelihood: | -23.083 |
converged: | True | LL-Null: | -65.896 |
Covariance Type: | nonrobust | LLR p-value: | 1.906e-18 |
| | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
const | -9.2721 | 9.192 | -1.009 | 0.313 | -27.289 | 8.745 |
Predictor 1 | -0.0476 | 0.088 | -0.539 | 0.590 | -0.221 | 0.126 |
Predictor 2 | 0.4467 | 0.203 | 2.196 | 0.028 | 0.048 | 0.845 |
Predictor 1 * Predictor 2 | -0.0017 | 0.002 | -0.992 | 0.321 | -0.005 | 0.002 |
True population regression equation:
$logit(Y) = 0 - 0.01 * X_1 + 0.02 * X_2 + 0 * X_1 * X_2 + error$
Where:
$Y$: Outcome Variable
$X_1$: Predictor 1
$X_2$: Predictor 2
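For binary logistic regression, the linear predictor sets the log-odds of the outcome, which is converted to a probability and used to draw 0/1 outcomes. A minimal sketch of that step under the same caveats (parameter values from the call above; everything else is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical parameters matching the model_plot() call above.
intercept, b1, b2 = 0, -0.01, 0.02

# Two weakly correlated continuous predictors.
x1 = rng.uniform(0, 100, 1000)
x2 = 0.02 * x1 + rng.uniform(0, 100, 1000)

# Logit link: convert the linear predictor (log-odds) to a probability,
# then draw Bernoulli (0/1) outcomes with that probability.
log_odds = intercept + b1 * x1 + b2 * x2
p = 1 / (1 + np.exp(-log_odds))
y = rng.binomial(1, p)  # binary outcome variable
```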