Often the experimental design or the data suggests a linear mixed model whose random effects are associated with multiple grouping factors.

The inclusion of multiple random effects terms corresponding to different grouping factors is often referred to as *crossed random effects*. A good coverage of linear mixed models with crossed random effects can be found in Chapter 2 of Douglas Bates' lme4 book.

A simple example of crossed random effects from Bates' book is the following model. The $i$th observation of *diameter* in the $j$th *sample* from the $k$th *plate* is modeled as:

$$diameter_{ijk} = Intercept + SampleIntercept_{j} + PlateIntercept_{k} + \epsilon_{ijk},$$

where *Intercept* is the overall average, *SampleIntercept* and *PlateIntercept* are random intercept terms due to the *sample* and the *plate* that a particular observation comes from, and $\epsilon_{ijk}$ is the residual error.

In `mixed_models` we would fit such a model with:

```
LMM.from_formula(formula: "diameter ~ 1 + (1 | Sample) + (1 | Plate)",
data: penicillin)
```

As an example, we fit a linear mixed model which can be written as

$$y = \beta_{0} + \beta_{1} \cdot x + b_{0} + b_{1} \cdot x + c_{0} + c_{1} \cdot x + \epsilon,$$

where $y$ is the response and $x$ is a predictor variable; $\beta_{0}$ and $\beta_{1}$ are the fixed intercept and slope coefficients; $b_{0}$ and $b_{1}$ are *random* intercept and slope coefficients due to the grouping factor $g$; and $c_{0}$ and $c_{1}$ are *random* intercept and slope coefficients due to the grouping factor $h$.
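Data of this form can be simulated in a few lines. The following is a hypothetical pure-Ruby sketch (not the script that produced the CSV used below), with made-up parameter values and uniform noise standing in for the Gaussian random terms:

```ruby
# Sketch of simulating data with two crossed grouping factors g and h.
# All parameter values are illustrative; real simulations would draw the
# random effects and residuals from normal distributions.
srand(42)

num_levels   = 5           # assumed number of levels per grouping factor
n            = 100         # number of observations
beta0, beta1 = 1.0, 2.0    # fixed intercept and slope

# one random intercept and slope per level of each factor
b = Array.new(num_levels) { [rand - 0.5, rand - 0.5] } # effects due to g
c = Array.new(num_levels) { [rand - 0.5, rand - 0.5] } # effects due to h

data = Array.new(n) do
  g, h = rand(num_levels), rand(num_levels)
  x = rand * 4 - 2
  y = beta0 + beta1 * x +
      b[g][0] + b[g][1] * x +  # contribution of factor g
      c[h][0] + c[h][1] * x +  # contribution of factor h
      (rand - 0.5) * 0.2       # residual noise
  { g: g + 1, h: h + 1, x: x, y: y }
end

puts data.first
```

Every observation receives a contribution from one level of `g` *and* one level of `h`, which is what makes the two grouping factors crossed rather than nested.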

The simulated data set is loaded, and its first five rows are displayed with:

In [1]:

```
require 'mixed_models'
# we pass `headers: true` to `#from_csv`, because
# mixed_models expects that all variable names in the data frame are ruby Symbols
df = Daru::DataFrame.from_csv "../spec/data/crossed_effects_data.csv", headers: true
df.head 5
```

Out[1]:

Daru::DataFrame: 5 rows, 4 cols

| | g | h | x | y |
|---|---|---|---|---|
| 0 | 1 | 1 | 1.71742040246789 | 0.202546206520008 |
| 1 | 2 | 3 | 0.223744902239436 | 0.840573625427331 |
| 2 | 3 | 1 | -1.11598926418025 | -0.998332155138107 |
| 3 | 1 | 2 | -0.15562952641427 | -0.0145985318440115 |
| 4 | 1 | 2 | -0.108919415063593 | 0.722443338784882 |

We fit this model with `mixed_models`, and display the estimated correlation structure of the random effects:

In [2]:

```
mod = LMM.from_formula(formula: "y ~ x + (x|g) + (x|h)", data: df, reml: false)
mod.ran_ef_summary
```

Out[2]:

Daru::DataFrame: 4 rows, 4 cols

| | g | g_x | h | h_x |
|---|---|---|---|---|
| g | 0.7539785718983487 | 0.999999998031062 | | |
| g_x | 0.999999998031062 | 0.7490861098771483 | | |
| h | | | 0.5638620447671433 | 0.9999999971404151 |
| h_x | | | 0.9999999971404151 | 0.38533198068098046 |

Of course, we can use all of the model attributes, diagnostics and inference methods described in other `mixed_models` tutorials for this model as well.

For example, we can test for the significance of the fixed slope effect, using the bootstrap approach with the following line of code:

In [3]:

```
p_value = mod.fix_ef_p(variable: :x, method: :bootstrap, nsim: 1000)
```

Out[3]:

0.0989010989010989
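Bootstrap p-values are commonly computed with the add-one convention: if $k$ of the `nsim` simulated test statistics are at least as extreme as the observed one, the reported p-value is $(k + 1)/(nsim + 1)$. Assuming `mixed_models` follows this convention (a guess from the output, not taken from its source), the value above corresponds to $99/1001$, and the smallest attainable p-value with `nsim: 1000` is $1/1001 \approx 0.000999$:

```ruby
# Add-one bootstrap p-value convention (assumed, not verified against the
# mixed_models source): k is the number of bootstrap statistics at least
# as extreme as the observed one.
def bootstrap_p(k_extreme, nsim)
  (k_extreme + 1).fdiv(nsim + 1)
end

puts bootstrap_p(98, 1000)  # 99/1001
puts bootstrap_p(0, 1000)   # 1/1001, the smallest attainable value
```

The add-one correction guards against reporting an exact zero p-value from a finite number of simulations.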

Or we can use the likelihood ratio test instead:

In [4]:

```
alternative_p_value = mod.fix_ef_p(variable: :x, method: :lrt)
```

Out[4]:

0.04871184664935746
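The likelihood ratio test compares twice the log-likelihood difference between the full model and the model without `x` against a $\chi^2$ distribution; dropping a single fixed effect gives one degree of freedom. For one degree of freedom the upper tail has a closed form via the complementary error function. The sketch below uses the textbook 5% critical value as an illustration, not a quantity from this model fit:

```ruby
# Upper tail of the chi-squared distribution with 1 degree of freedom:
# P(X > stat) = erfc(sqrt(stat / 2)).
def chisq1_p_value(stat)
  Math.erfc(Math.sqrt(stat / 2.0))
end

# The classic 5% critical value for chi-squared with 1 df is about 3.841:
puts chisq1_p_value(3.841)  # approximately 0.05
```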

The grouping factors of random effects terms can also be *nested* within each other. We refer to such random effects structures as *nested* random effects (even though, strictly speaking, it is not the random effects but the corresponding grouping factors that are nested). As for crossed random effects, a good reference for linear mixed models with nested random effects is Chapter 2 of Douglas Bates' lme4 book.

For example, consider an experiment where we measure the bone volume of each digit in each foot of a number of mice (i.e. digit is nested within foot, which is nested within mouse).
The $i$th observation of *volume* in the $m$th *digit* of the $k$th *foot* of the $j$th *mouse* can be modeled as:

$$volume_{ijkm} = Intercept + MouseIntercept_{j} + FootIntercept_{jk} + \epsilon_{ijkm},$$

i.e. the random effect *foot* only appears as nested within *mouse* (that is, the intercept due to foot 1 in mouse 1 is different than the intercept due to foot 1 in mouse 2).

In `mixed_models` we could fit such a model with:

```
LMM.from_formula(formula: "volume ~ 1 + (1 | mouse) + (1 | mouse:foot)",
data: bone_data)
```

**Remark:** In the `R` package `lme4`, instead of the formula "`volume ~ 1 + (1|mouse) + (1|mouse:foot)`" a shorter equivalent formula "`volume ~ 1 + (1|mouse/foot)`" can be used to fit the model. However, the formula parser in `mixed_models` currently does not support the shortcut notation `/`.
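The term `(1 | mouse:foot)` groups observations by mouse-foot *combinations*, so each foot gets a distinct intercept within each mouse. The following plain-Ruby illustration (with hypothetical labels) shows how the interaction produces one grouping level per combination:

```ruby
mice = ["mouse1", "mouse2", "mouse3"]  # hypothetical labels
feet = ["left", "right"]

# Grouping by foot alone would give only feet.length levels, while the
# nested grouping mouse:foot has one level per mouse-foot combination:
nested_levels = mice.product(feet).map { |m, f| "#{m}:#{f}" }

puts feet.length           # 2
puts nested_levels.length  # 6
puts nested_levels.first   # "mouse1:left"
```

Using `(1 | foot)` instead would incorrectly pool, say, the left feet of all mice into a single grouping level.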

As an example we fit a linear mixed model with nested random effects to the following data.

In [5]:

```
df = Daru::DataFrame.from_csv("../spec/data/nested_effects_with_slope_data.csv", headers: true)
df.head 5
```

Out[5]:

Daru::DataFrame: 5 rows, 4 cols

| | a | b | x | y |
|---|---|---|---|---|
| 0 | a3 | b1 | 0.388425310194731 | 5.10364866735101 |
| 1 | a3 | b2 | 0.446223000551612 | 6.23307061450375 |
| 2 | a3 | b1 | 1.54993657118302 | 12.2050404173393 |
| 3 | a3 | b1 | 1.52786614599715 | 12.0067595454774 |
| 4 | a3 | b2 | 0.760112121512708 | 8.20054527384668 |

We consider the following model:

- We take `y` to be the response and `x` its predictor.
- We consider the factor `b` to be nested within the factor `a`.
- We assume that the intercept varies due to variable `a`; that is, a different (random) intercept term for each level of `a`.
- Moreover, we assume that the intercept varies due to the factor `b` which is nested in `a`; that is, a different (random) intercept for each combination of levels of `a` and `b`.

We fit this model in `mixed_models`, and display the estimated random effects correlation structure.

In [6]:

```
mod = LMM.from_formula(formula: "y ~ x + (1|a) + (1|a:b)", data: df, reml: false)
mod.ran_ef_summary
```

Out[6]:

Daru::DataFrame: 2 rows, 2 cols

| | a | a_and_b |
|---|---|---|
| a | 1.3410830040769561 | |
| a_and_b | | 0.9769750031499026 |

We see that the estimated standard deviations of the random intercept due to `a` and of the nested effect of `b` and `a` are of comparable magnitude.

We can use all methods available in `LMM` to look at various parameter estimates or to do statistical inference.

For example, we can test the nested random effect for significance, in order to decide whether we should drop that term from the model to reduce model complexity. We can use the chi-squared based likelihood ratio test as follows.

In [7]:

```
p_val = mod.ran_ef_p(variable: :intercept, grouping: [:a, :b], method: :lrt)
```

Out[7]:

0.0050606262424956515

Note that the nested grouping factor is supplied as an Array `[:a, :b]`.

The p-value is small, suggesting that we probably should keep the term `(1|a:b)` in the model formula. To be more certain, we can perform a bootstrap based hypothesis test as follows.

In [8]:

```
p_val_boot = mod.ran_ef_p(variable: :intercept, grouping: [:a, :b],
method: :bootstrap, nsim: 1000)
```

Out[8]:

0.000999000999000999

The bootstrap p-value also supports the above conclusion.