The key goal of research data science is to learn from data. One of the most powerful methods of learning from data is statistical modelling.
In modelling, you build mathematical descriptions of the processes that generate your data. Doing so lets you see beyond the data and peek into the phenomenon that gave rise to it.
This module provides a high-level introduction to statistical modelling. We aim to demystify the key concepts involved, providing a foundational approach to modelling that one could apply to any modelling problem. In doing so we will also cover the main pitfalls that any modeller needs to contend with.
Abstract concepts are better understood through application. Here we use simple models (linear and logistic regression) to bring modelling concepts to life, but the intended take-homes are not specific to any particular modelling technique.
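To give a flavour of what this looks like in practice, here is a minimal sketch, not taken from the module itself. It assumes a made-up data-generating process (a straight line plus Gaussian noise, with hypothetical values for the intercept, slope, and noise level), simulates data from it, and then fits a linear regression with NumPy to see how well the parameters can be recovered from the data alone.

```python
import numpy as np

# Hypothetical data-generating process (an assumption for illustration):
# y = 2.0 + 0.5 * x + Gaussian noise with standard deviation 1.
rng = np.random.default_rng(seed=0)
true_intercept, true_slope = 2.0, 0.5
x = rng.uniform(0, 10, size=200)
y = true_intercept + true_slope * x + rng.normal(0, 1.0, size=200)

# The model: a mathematical description of how y depends on x.
# A column of ones lets least squares estimate the intercept as well.
X = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
est_intercept, est_slope = coef

print(f"estimated intercept: {est_intercept:.2f}, slope: {est_slope:.2f}")
```

Because the data here are simulated, the estimates can be compared against the values used to generate them. With real data the generating process is unknown, which is why the choice and checking of the model matter.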
The module is structured as follows:
References:
We will include more specific references as we move through the module. Useful, accessible introductions to modelling that have inspired much of this module’s content are Poldrack’s Statistical Thinking for the 21st Century, Holmes and Huber’s Modern Statistics for Modern Biology, the introductory sections of Richard McElreath’s wonderfully readable Statistical Rethinking, and Bishop’s classic textbook Pattern Recognition and Machine Learning.