Often the experimental design or the data suggests a linear mixed model whose random effects are associated with multiple grouping factors.
The inclusion of multiple random effects terms corresponding to different grouping factors is often referred to as crossed random effects. A good treatment of linear mixed models with crossed random effects can be found in Chapter 2 of Douglas Bates' lme4 book.
A simple example of crossed random effects from Bates' book is the following model. The $i$th observation of diameter in the $j$th sample on the $k$th plate is modeled as:
$$diameter_{ijk} = Intercept + SampleIntercept_{j} + PlateIntercept_{k} + RandomError_{ijk},$$ where $Intercept$ is the overall average, and $SampleIntercept_{j}$ and $PlateIntercept_{k}$ are random intercept terms due to the sample and the plate that a particular observation comes from.
In mixed_models we would fit such a model with:
LMM.from_formula(formula: "diameter ~ 1 + (1 | Sample) + (1 | Plate)",
data: penicillin)
As an example we fit a linear mixed model, which can be written as
$$y = \beta_{0} + \beta_{1} \cdot x + b_{0} + b_{1} \cdot x + c_{0} + c_{1} \cdot x + \epsilon,$$ where $y$ is the response and $x$ is a predictor variable; $\beta_{0}$ and $\beta_{1}$ are the fixed intercept and slope coefficients; $b_{0}$ and $b_{1}$ are random intercept and slope coefficients due to factor $g$; $c_{0}$ and $c_{1}$ are random intercept and slope coefficients due to factor $h$.
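Data with this structure can be simulated in plain Ruby along the following lines. This is only a sketch: the numbers of factor levels, the fixed coefficients, and the variance parameters below are arbitrary choices for illustration, not the parameters behind the data set loaded next.

```ruby
# Minimal sketch: simulate data with crossed random intercepts and slopes.
# Standard normal draws via the Box-Muller transform (no external gems needed).
def randn
  Math.sqrt(-2.0 * Math.log(1.0 - rand)) * Math.cos(2.0 * Math::PI * rand)
end

n, num_g, num_h = 60, 5, 3
beta0, beta1 = 1.0, 2.0 # fixed intercept and slope (arbitrary)

# One random intercept and one random slope per level of each grouping factor.
b = Array.new(num_g) { [randn, randn] } # random effects due to g
c = Array.new(num_h) { [randn, randn] } # random effects due to h

data = Array.new(n) do |i|
  gi, hi = i % num_g, i % num_h # crossed: g levels meet all h levels
  x = randn
  y = beta0 + beta1 * x +
      b[gi][0] + b[gi][1] * x +
      c[hi][0] + c[hi][1] * x +
      0.5 * randn # residual error
  { g: gi + 1, h: hi + 1, x: x, y: y }
end
```

Because every level of $g$ eventually co-occurs with every level of $h$, the two grouping factors are crossed rather than nested.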
The simulated data set is loaded, and its first five rows are displayed with:
require 'mixed_models'
# we pass `headers: true` to `#from_csv`, because
# mixed_models expects all variable names in the data frame to be Ruby Symbols
df = Daru::DataFrame.from_csv "../spec/data/crossed_effects_data.csv", headers: true
df.head 5
Daru::DataFrame:47127301961640 rows: 5 cols: 4

| | g | h | x | y |
|---|---|---|---|---|
| 0 | 1 | 1 | 1.71742040246789 | 0.202546206520008 |
| 1 | 2 | 3 | 0.223744902239436 | 0.840573625427331 |
| 2 | 3 | 1 | -1.11598926418025 | -0.998332155138107 |
| 3 | 1 | 2 | -0.15562952641427 | -0.0145985318440115 |
| 4 | 1 | 2 | -0.108919415063593 | 0.722443338784882 |
Then we fit a linear mixed model in mixed_models, and display the estimated correlation structure of the random effects:
mod = LMM.from_formula(formula: "y ~ x + (x|g) + (x|h)", data: df, reml: false)
mod.ran_ef_summary
Daru::DataFrame:47127296310540 rows: 4 cols: 4

| | g | g_x | h | h_x |
|---|---|---|---|---|
| g | 0.7539785718983487 | 0.999999998031062 | | |
| g_x | 0.999999998031062 | 0.7490861098771483 | | |
| h | | | 0.5638620447671433 | 0.9999999971404151 |
| h_x | | | 0.9999999971404151 | 0.38533198068098046 |
We can see that the crossed random effects corresponding to the grouping factors $g$ and $h$ form uncorrelated blocks in the correlation matrix. That is, crossed random effects are assumed to be independent by the model.
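Equivalently, the joint covariance matrix of the random effects vector is block diagonal. A small sketch with Ruby's standard library `matrix`, using covariance values roughly implied by the estimates above (rounded, for illustration only):

```ruby
require 'matrix'

# Covariance blocks for the random intercept and slope of each grouping
# factor (illustrative, rounded values).
cov_g = Matrix[[0.57, 0.56], [0.56, 0.56]] # factor g
cov_h = Matrix[[0.32, 0.22], [0.22, 0.15]] # factor h
zero  = Matrix.zero(2)

# Crossed random effects => block diagonal joint covariance matrix:
cov = Matrix.vstack(Matrix.hstack(cov_g, zero),
                    Matrix.hstack(zero, cov_h))
```

The off-diagonal zero blocks encode the model's assumption that the random effects due to $g$ are independent of those due to $h$.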
Of course, we can use all of the model attributes, diagnostics and inference methods described in the other mixed_models tutorials for this model as well.
For example, we can test for the significance of the fixed slope effect, using the bootstrap approach with the following line of code:
p_value = mod.fix_ef_p(variable: :x, method: :bootstrap, nsim: 1000)
0.0989010989010989
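Leaving mixed_models' internals aside, a parametric bootstrap p-value is commonly computed by simulating the test statistic under the null model and applying the standard finite-sample correction $p = (1 + \#\{\text{simulated statistics at least as extreme}\}) / (1 + nsim)$. A sketch with made-up numbers (`null_stats` stands in for statistics computed on data simulated from the null model):

```ruby
# Sketch of the usual parametric-bootstrap p-value convention.
def bootstrap_p_value(observed_stat, null_stats)
  extreme = null_stats.count { |s| s >= observed_stat }
  (1.0 + extreme) / (1.0 + null_stats.length)
end

# Made-up simulated statistics for illustration:
null_stats = [0.1, 2.3, 0.7, 4.2, 1.1, 0.4, 3.0, 0.2, 5.6, 0.9]
bootstrap_p_value(3.5, null_stats)
# => 0.2727... (= 3/11: two of the ten simulated stats exceed 3.5)
```

The value 0.0989010989… reported above equals 99/1001, which is consistent with this convention at `nsim: 1000`, though the gem's exact procedure may differ.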
Or we can use the likelihood ratio test instead:
alternative_p_value = mod.fix_ef_p(variable: :x, method: :lrt)
0.04871184664935746
Since the p-value obtained by LRT is barely below a significance level of 5%, and since the bootstrap method is typically more accurate than LRT and produces a rather high p-value here, we conclude that the data does not show enough evidence for statistical significance of predictor $x$ as a fixed effect.
The grouping factors of random effects terms can be nested in each other. We refer to such random effects structures as nested random effects (even though, strictly speaking, it is not the random effects but the corresponding grouping factors that are nested). As with crossed random effects, a good reference for linear mixed models with nested random effects is Chapter 2 of Douglas Bates' lme4 book.
For example, consider an experiment where we measure the bone volume of each digit in each foot of a number of mice (i.e. digit is nested within foot, which is nested within mouse). The $i$th observation of volume in the $m$th digit of the $k$th foot of the $j$th mouse can be modeled as:
$$volume_{ijkm} = Intercept + MouseIntercept_{j} + FootIntercept_{kj} + RandomError_{ijkm},$$ i.e. the random effect foot only appears as nested within mouse (the intercept due to foot 1 in mouse 1 is different from the intercept due to foot 1 in mouse 2).
In mixed_models we could fit such a model with:
LMM.from_formula(formula: "volume ~ 1 + (1 | mouse) + (1 | mouse:foot)",
data: bone_data)
Remark: In the R package lme4, instead of the formula "volume ~ 1 + (1|mouse) + (1|mouse:foot)", the shorter equivalent formula "volume ~ 1 + (1|mouse/foot)" can be used to fit the same model. However, the formula parser in mixed_models currently does not support the shortcut notation /.
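Until the parser supports the shortcut, it is easy to expand by hand, since (1|a/b) is merely notation for (1|a) + (1|a:b). A hypothetical helper (not part of mixed_models) could do the rewriting:

```ruby
# Hypothetical helper: expand the lme4-style nesting shortcut (1|a/b)
# into the equivalent (1|a) + (1|a:b) that mixed_models understands.
def expand_nesting(formula)
  formula.gsub(/\(1\s*\|\s*(\w+)\s*\/\s*(\w+)\)/) do
    "(1 | #{$1}) + (1 | #{$1}:#{$2})"
  end
end

expand_nesting("volume ~ 1 + (1 | mouse/foot)")
# => "volume ~ 1 + (1 | mouse) + (1 | mouse:foot)"
```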
As an example we fit a linear mixed model with nested random effects to the following data.
df = Daru::DataFrame.from_csv("../spec/data/nested_effects_with_slope_data.csv", headers: true)
df.head 5
Daru::DataFrame:47127301459180 rows: 5 cols: 4

| | a | b | x | y |
|---|---|---|---|---|
| 0 | a3 | b1 | 0.388425310194731 | 5.10364866735101 |
| 1 | a3 | b2 | 0.446223000551612 | 6.23307061450375 |
| 2 | a3 | b1 | 1.54993657118302 | 12.2050404173393 |
| 3 | a3 | b1 | 1.52786614599715 | 12.0067595454774 |
| 4 | a3 | b2 | 0.760112121512708 | 8.20054527384668 |
We consider the following model:

- y to be the response and x its predictor;
- the factor b to be nested within the factor a;
- a random intercept due to a; that is, a different (random) intercept term for each level of a;
- a random intercept due to b which is nested in a; that is, a different (random) intercept for each combination of levels of a and b.

We fit this model in mixed_models, and display the estimated random effects correlation structure.
mod = LMM.from_formula(formula: "y ~ x + (1|a) + (1|a:b)", data: df, reml: false)
mod.ran_ef_summary
Daru::DataFrame:47127301931880 rows: 2 cols: 2

| | a | a_and_b |
|---|---|---|
| a | 1.3410830040769561 | |
| a_and_b | | 0.9769750031499026 |
We see that the standard deviation of the random intercept due to a and that of the nested effect of b within a are of comparable magnitude.
We can use all methods available in LMM to look at various parameter estimates or to do statistical inference.
For example, we can test the nested random effect for significance, in order to decide whether we should drop that term from the model to reduce model complexity. We can use the chi-squared based likelihood ratio test as follows.
p_val = mod.ran_ef_p(variable: :intercept, grouping: [:a, :b], method: :lrt)
0.0050606262424956515
Note that the nested grouping factor is supplied as the Array [:a, :b].
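For reference, the statistic behind such a test is $2(\log L_{full} - \log L_{reduced})$, compared against a $\chi^2$ reference distribution. A sketch for one degree of freedom, where the $\chi^2_1$ survival function reduces to `Math.erfc` (the log-likelihood values below are made up; note also that testing a variance component on the boundary of its parameter space makes the $\chi^2$ reference distribution only approximate):

```ruby
# Sketch: chi-squared (1 df) likelihood ratio test.
# For X ~ chi-squared with 1 df, P(X > x) = erfc(sqrt(x / 2)).
def lrt_p_value(loglik_full, loglik_reduced)
  stat = 2.0 * (loglik_full - loglik_reduced)
  Math.erfc(Math.sqrt(stat / 2.0))
end

# Made-up log-likelihoods for illustration:
lrt_p_value(-120.3, -124.2)  # => ~0.0052
```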
The p-value is small, suggesting that we probably should keep the term (1|a:b) in the model formula. To be more certain, we can perform a bootstrap based hypothesis test as follows.
p_val_boot = mod.ran_ef_p(variable: :intercept, grouping: [:a, :b],
method: :bootstrap, nsim: 1000)
0.000999000999000999
The bootstrap p-value also supports the above conclusion.