1 Loading packages

We start by loading the required packages. If they are not already installed, then the code below will first install them, before loading them. The package faux is not available on CRAN for R version 4.4.2, hence we will install it via github. See details here

requiredPackages = c('tidyverse', 'knitr', 'lme4', 'ggstats', 'ggstatsplot', 'sjPlot', 'paletteer', 'lattice', 'car')

for(p in requiredPackages){
  if(!require(p,character.only = TRUE)) install.packages(p, dependencies = TRUE)
  library(p,character.only = TRUE)
}

Specific to the faux package

First install the package devtools, and then install the package faux via the github install.

# install.packages("devtools")
# devtools::install_github("debruine/faux")
library(faux)

2 Data set

Our experiment has the following structure: we asked 40 subjects to respond to 40 items in a fully crossed design. There were two IVs: Condition with congruent and incongruent and Age with young and old. The Condition is a within subject variable; age is a between subject. However, Condition and Age were both within item variables. The dependant variable was LookingTime.

This meant that we used items within each of the young and the old subjects in addition to items within each of the congurent and incongruent conditions.

Our research question is as follows: Age of subject will impact the Looking Time in the two conditions. Our hypothesis is: The older a subject is, the more the looking time it is to the incongruent condition.

We will use the package faux to simulate a dataframe. This step is crucial in assessing contributions of particular predictors and for testing ideas as we already know the distribution of the data and what to expect as an outcome when it comes to the fixed effect in question.

The simulated data has the following parameters

set.seed(42)
# define parameters
Subj_n = 40  # number of subjects
Item_n = 40  # number of items
b0 = 100       # intercept
b1 = 2.5 * b0   # fixed effect of condition
u0s_sd = 300   # random intercept SD for subjects
u0i_sd = 200   # random intercept SD for items
u1s_sd = 100   # random b1 slope SD for subjects
u1i_sd = 50   # random b1 slope SD for items
r01s = -0.3     # correlation between random effects 0 and 1 for subjects
r01i = 0.2     # correlation between random effects 0 and 1 for items
sigma_sd = 150 # error SD

# set up data structure
dataCong <- add_random(Subj = Subj_n, Item = Item_n) %>%
  # add within categorical variable for subject
  add_within("Subj", Cond = c("Congruent", "Incongruent")) %>%
  add_recode("Cond", "Cond.Incongruent", Congruent = 0, Incongruent = 1) %>%
  # add between categorical variable for subject
  add_between("Subj", Age = c("Young", "Old")) %>%
  add_recode("Age", "Age.Old", Young = 0, Old = 1) %>%
  # add random effects 
  add_ranef("Subj", u0s = u0s_sd, u1s = u1s_sd, .cors = r01s) %>%
  add_ranef("Item", u0i = u0i_sd, u1i = u1i_sd, .cors = r01i) %>%
  ##add_ranef(c("Subj", "Item"), u0si = u0s_sd + u0i_sd) %>%
  ##add_ranef(c("Subj", "Item"), u1si = u1s_sd + u1i_sd) %>%
  add_ranef(sigma = sigma_sd) %>%
  # calculate DV
  mutate(LookingTime = b0 + b1 + u0s + u0i + #u0si + u1si +
           (((b1 + u1s) + 0.5) * Cond.Incongruent) + (((b1 + u1s) + 0.9) * Age.Old) + # subject specific variation
           (((b1 + u1i) - 0.3) * Cond.Incongruent) + (((b1 + u1i) - 0.25) * Age.Old) + # item specific variation  
           sigma)
write.csv(dataCong, "dataCong.csv")

If you were not able to install the faux package, simply uncomment the following line of code below to import the dataset

#dataCong <- read.csv("dataCong.csv")[-1]

3 Verify dataframe

3.1 Tibble

dataCong <- dataCong %>% 
  mutate(Subj = factor(Subj),
         Item = factor(Item))
dataCong

3.2 Counts

3.2.1 Subjects

dataCong %>% 
  group_by(Cond, Age, Subj) %>% 
  summarise(count = n())

3.2.2 Items

3.2.2.1 Age

dataCong %>% 
  group_by(Age, Item) %>% 
  summarise(count = n())

3.2.2.2 Cond

dataCong %>% 
  group_by(Cond, Item) %>% 
  summarise(count = n())

4 Visualisations

4.1 Condition by Age

The figure below confirms this, where we see an increase in LookingTime in the incongruent condition and oevrall, older participants show an increase in LookingTime

dataCong %>% 
  ggplot(aes(x = Cond,
             y = LookingTime,
             colour = Age)) +
  theme_bw() + 
  geom_boxplot() +
  geom_smooth(aes(as.numeric(Cond)), method = "lm")

4.2 Subject by Condition

This figure shows that subjects are variable in how they responded to this task

dataCong %>% 
  ggplot(aes(x = Cond,
             y = LookingTime,
             colour = Subj)) +
  theme_bw() + 
  geom_point() +
  geom_smooth(aes(as.numeric(Cond)), method = "lm", se = FALSE) +
  scale_colour_manual(values = paletteer_c("grDevices::rainbow", length(unique(dataCong$Subj))))

4.3 Subject by Age

This figure shows that subjects had an impact on the LookingTime in both age groups, simply due to their variable responses to the different items

dataCong %>% 
  ggplot(aes(x = Age,
             y = LookingTime,
             colour = Subj)) +
  theme_bw() + 
  geom_point() +
  geom_smooth(aes(as.numeric(Cond)), method = "lm", se = FALSE) +
  scale_colour_manual(values = paletteer_c("grDevices::rainbow", length(unique(dataCong$Subj))))

4.4 Item by Condition

This figure shows that items had an impact on the LookingTime in both conditions

dataCong %>% 
  ggplot(aes(x = Cond,
             y = LookingTime,
             colour = Item)) +
  theme_bw() + 
  geom_point() +
  geom_smooth(aes(as.numeric(Cond)), method = "lm", se = FALSE) +
  scale_colour_manual(values = paletteer_c("grDevices::rainbow", length(unique(dataCong$Item))))

4.5 Subject by Age

This figure shows that items had an impact on the LookingTime in both age groups

dataCong %>% 
  ggplot(aes(x = Age,
             y = LookingTime,
             colour = Item)) +
  theme_bw() + 
  geom_point() +
  geom_smooth(aes(as.numeric(Cond)), method = "lm", se = FALSE) +
  scale_colour_manual(values = paletteer_c("grDevices::rainbow", length(unique(dataCong$Item))))

5 Modelling strategy

We use an LMER model with a crossed random effect. To choose our optimal model, we start first by a simple model with only random intercepts, increasing complexity by accounting for random slopes for both subjects and items. It is clear from our visualisation above, that there is no interaction between the two predictors. However, for demonstration purposes, we do test for this

6 Simple Linear Model

mdl.lm <- dataCong %>% 
  lm(LookingTime ~ Cond + Age, data = .)
summary(mdl.lm)

Call:
lm(formula = LookingTime ~ Cond + Age, data = .)

Residuals:
     Min       1Q   Median       3Q      Max 
-1215.59  -290.17   -21.78   264.89  1661.41 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)       304.95      12.91   23.61   <2e-16 ***
CondIncongruent   504.35      14.91   33.82   <2e-16 ***
AgeOld            566.44      14.91   37.98   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 421.8 on 3197 degrees of freedom
Multiple R-squared:  0.4472,    Adjusted R-squared:  0.4469 
F-statistic:  1293 on 2 and 3197 DF,  p-value: < 2.2e-16
hist(residuals(mdl.lm))

qqnorm(residuals(mdl.lm)); qqline(residuals(mdl.lm))

plot(fitted(mdl.lm), residuals(mdl.lm), cex = 4)

6.1 No interaction

6.1.1 Crossed random intercepts

xmdl.rand.Interc <- dataCong %>% 
  lmer(LookingTime ~ Cond + Age + 
         (1 | Subj) + 
         (1 | Item), data = ., REML = FALSE,
       control = lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e5)))

6.1.2 Crossed random intercepts + By-speaker random slopes

xmdl.rand.Slope1 <- dataCong %>% 
  lmer(LookingTime ~ Cond + Age + 
         (1 + Cond | Subj) + 
         (1 | Item), data = ., REML = FALSE,
       control = lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e5)))

6.1.3 Crossed random intercepts + By-speaker and by-item random slopes

xmdl.rand.Slope2 <- dataCong %>% 
  lmer(LookingTime ~ Cond + Age + 
         (1 + Cond | Subj) + 
         (1 + Cond | Item), data = ., REML = FALSE,
       control = lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e5)))

6.1.4 Crossed random intercepts + By-speaker and by-item random slopes

xmdl.rand.Slope3 <- dataCong %>% 
  lmer(LookingTime ~ Cond + Age + 
         (1 + Cond | Subj) + 
         (1 + Cond + Age| Item), data = ., REML = FALSE,
       control = lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e5)))

6.2 With interaction

6.2.1 Crossed random intercepts + Interaction

xmdl.rand.Interc.Int <- dataCong %>% 
  lmer(LookingTime ~ Cond * Age + 
         (1 | Subj) + 
         (1 | Item), data = ., REML = FALSE,
       control = lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e5)))

6.2.2 Crossed random intercepts + By-speaker random slopes + Interaction

xmdl.rand.Slope1.Int <- dataCong %>% 
  lmer(LookingTime ~ Cond * Age + 
         (1 + Cond | Subj) + 
         (1 | Item), data = ., REML = FALSE,
       control = lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e5)))

6.2.3 Crossed random intercepts + By-speaker and by-item random slopes + Interaction

xmdl.rand.Slope2.Int <- dataCong %>% 
  lmer(LookingTime ~ Cond * Age + 
         (1 + Cond | Subj) + 
         (1 + Cond | Item), data = ., REML = FALSE,
       control = lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e5)))

6.2.4 Crossed random intercepts + By-speaker and by-item random slopes

xmdl.rand.Slope3.Int <- dataCong %>% 
  lmer(LookingTime ~ Cond * Age + 
         (1 + Cond | Subj) + 
         (1 + Cond * Age| Item), data = ., REML = FALSE,
       control = lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e5)))

6.3 Model comparison

anova(xmdl.rand.Interc, xmdl.rand.Slope1, xmdl.rand.Slope2, xmdl.rand.Slope3, xmdl.rand.Interc.Int, xmdl.rand.Slope1.Int, xmdl.rand.Slope2.Int, xmdl.rand.Slope3.Int)
Data: .
Models:
xmdl.rand.Interc: LookingTime ~ Cond + Age + (1 | Subj) + (1 | Item)
xmdl.rand.Interc.Int: LookingTime ~ Cond * Age + (1 | Subj) + (1 | Item)
xmdl.rand.Slope1: LookingTime ~ Cond + Age + (1 + Cond | Subj) + (1 | Item)
xmdl.rand.Slope1.Int: LookingTime ~ Cond * Age + (1 + Cond | Subj) + (1 | Item)
xmdl.rand.Slope2: LookingTime ~ Cond + Age + (1 + Cond | Subj) + (1 + Cond | Item)
xmdl.rand.Slope2.Int: LookingTime ~ Cond * Age + (1 + Cond | Subj) + (1 + Cond | Item)
xmdl.rand.Slope3: LookingTime ~ Cond + Age + (1 + Cond | Subj) + (1 + Cond + Age | Item)
xmdl.rand.Slope3.Int: LookingTime ~ Cond * Age + (1 + Cond | Subj) + (1 + Cond * Age | Item)
                     npar   AIC   BIC logLik deviance    Chisq Df Pr(>Chisq)    
xmdl.rand.Interc        6 42074 42110 -21031    42062                           
xmdl.rand.Interc.Int    7 42050 42093 -21018    42036  25.8359  1  3.717e-07 ***
xmdl.rand.Slope1        8 41834 41883 -20909    41818 217.8699  1  < 2.2e-16 ***
xmdl.rand.Slope1.Int    9 41833 41888 -20908    41815   3.0247  1     0.0820 .  
xmdl.rand.Slope2       10 41808 41869 -20894    41788  27.3253  1  1.719e-07 ***
xmdl.rand.Slope2.Int   11 41807 41874 -20892    41785   3.0149  1     0.0825 .  
xmdl.rand.Slope3       13 41780 41858 -20877    41754  31.2599  2  1.629e-07 ***
xmdl.rand.Slope3.Int   18 41786 41895 -20875    41750   3.3401  5     0.6477    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The results above highlight that the model accounting for both by-subject and by-item random intercepts and random slopes for Condition are improving the model fit in comparison to a more complex model. We rerun the model with REML = False

7 Optimal model

xmdl.Optimal <- dataCong %>% 
  lmer(LookingTime ~ Cond + Age + 
         (1 + Cond | Subj) + 
         (1 + Cond + Age | Item), data = ., REML = TRUE,
       control = lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e5)))

7.1 Model criticism

hist(residuals(xmdl.Optimal))

qqnorm(residuals(xmdl.Optimal)); qqline(residuals(xmdl.Optimal))

plot(fitted(xmdl.Optimal), residuals(xmdl.Optimal), cex = 4)

7.2 Summary

xmdl.Optimal %>% 
  summary()
Linear mixed model fit by REML ['lmerMod']
Formula: LookingTime ~ Cond + Age + (1 + Cond | Subj) + (1 + Cond + Age |      Item)
   Data: .
Control: lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e+05))

REML criterion at convergence: 41724.6

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.5337 -0.6485 -0.0054  0.6647  3.5358 

Random effects:
 Groups   Name            Variance Std.Dev. Corr       
 Subj     (Intercept)     123480   351.40              
          CondIncongruent  10746   103.66   -0.26      
 Item     (Intercept)      38781   196.93              
          CondIncongruent   1872    43.27    0.31      
          AgeOld            1851    43.03   -0.13  0.69
 Residual                  22613   150.38              
Number of obs: 3200, groups:  Subj, 40; Item, 40

Fixed effects:
                Estimate Std. Error t value
(Intercept)       315.04      83.42   3.777
CondIncongruent   504.35      18.54  27.204
AgeOld            546.28     107.70   5.072

Correlation of Fixed Effects:
            (Intr) CndInc
CndIncngrnt -0.120       
AgeOld      -0.646  0.016

7.3 ANOVA

Anova(xmdl.Optimal)
Analysis of Deviance Table (Type II Wald chisquare tests)

Response: LookingTime
       Chisq Df Pr(>Chisq)    
Cond 740.067  1  < 2.2e-16 ***
Age   25.729  1  3.928e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

7.4 Plotting model’s output

7.4.1 With ggstats

We use two functions from the package ggstats.

7.4.1.1 A plot

ggcoef_model(xmdl.Optimal)

7.4.2 A plot + a table + 95% CI

ggcoef_table(xmdl.Optimal)

7.4.3 With ggstatsplot

ggcoefstats(xmdl.Optimal, point.args = list(color = paletteer_c("grDevices::rainbow", 13), stats.label.color = paletteer_c("grDevices::rainbow", 13)))
Warning: Ignoring unknown parameters: `stats.label.colour`Number of labels is greater than default palette color count.
• Select another color `palette` (and/or `package`).

7.5 Exploring random effects

7.5.1 Subject random effects

random_effects <- ranef(xmdl.Optimal) %>%
  pluck(1) %>%
  rownames_to_column() %>%
  rename(Subject = rowname, Intercept = "(Intercept)") 
 
random_effects %>%
  knitr::kable()
Subject Intercept CondIncongruent
Subj01 -404.45054 58.1953368
Subj02 164.05584 56.9126271
Subj03 -56.37438 -104.0600477
Subj04 -87.06837 43.7884729
Subj05 -55.65214 104.5323900
Subj06 -42.39999 -7.1394328
Subj07 -448.81496 124.6524549
Subj08 -171.75810 -105.4504187
Subj09 -587.74594 123.6911001
Subj10 -44.50407 -50.9618431
Subj11 -346.46288 1.6396966
Subj12 -550.49633 158.6524706
Subj13 417.66027 -169.1409700
Subj14 -11.45716 -30.0008252
Subj15 62.46659 -14.3520076
Subj16 -210.03389 -23.9077430
Subj17 124.19946 -122.7635167
Subj18 666.67147 -42.1836697
Subj19 815.78333 217.1615447
Subj20 -393.58521 26.1633665
Subj21 120.64645 37.9297097
Subj22 455.31816 -46.4925696
Subj23 42.42431 -54.4053551
Subj24 -496.29436 -31.7202386
Subj25 -552.91357 144.2714898
Subj26 -21.91509 -159.7897817
Subj27 65.14521 54.5212747
Subj28 373.40089 -166.0862255
Subj29 -132.82108 -21.8820276
Subj30 104.73419 -112.8914459
Subj31 -79.72260 125.0118699
Subj32 -182.30982 15.3898936
Subj33 -303.68384 -0.1957386
Subj34 215.98609 136.5078869
Subj35 -152.60397 76.5559812
Subj36 401.50547 -145.0045141
Subj37 274.61818 -114.4122298
Subj38 139.48473 -82.7958618
Subj39 733.87830 58.5575032
Subj40 155.08935 41.5013939
NA

7.5.2 Items random effects

random_effects <- ranef(xmdl.Optimal) %>%
  pluck(2) %>%
  rownames_to_column() %>%
  rename(Items = rowname, Intercept = "(Intercept)") 
 
random_effects %>%
  knitr::kable()
Items Intercept CondIncongruent AgeOld
Item01 -247.414507 16.2991489 28.312990
Item02 -28.240445 20.6149279 37.267856
Item03 -27.548429 11.7099339 -3.218254
Item04 30.999046 32.2392558 32.548236
Item05 246.216859 38.0585978 2.963549
Item06 -135.583736 28.3151555 28.127356
Item07 75.461498 28.6715548 -2.095288
Item08 72.679400 69.1730999 73.479523
Item09 -180.071444 32.9237845 60.953057
Item10 -170.029328 8.4112590 23.865843
Item11 -278.857705 -39.6392919 -22.526611
Item12 114.001985 4.7652485 -15.353357
Item13 -137.533542 -49.5701246 -36.764423
Item14 -272.928912 -85.2576750 -45.340742
Item15 219.855541 -70.2475223 -83.121735
Item16 243.071753 13.0287924 -4.812872
Item17 210.792625 10.6898312 17.840004
Item18 328.737750 -36.9498461 -82.922663
Item19 -46.910049 28.2829527 22.930110
Item20 -107.870056 -25.5384308 -13.846115
Item21 -209.598744 -56.8339801 -29.482588
Item22 -190.253581 -3.0439700 20.502593
Item23 207.958662 34.4328035 33.373202
Item24 -371.211245 -46.1955836 1.751542
Item25 163.603914 15.0837752 -7.780876
Item26 -56.954235 -39.1972201 -59.652416
Item27 126.648406 7.5867262 -1.102683
Item28 48.176096 -0.4969082 1.358911
Item29 -76.699978 -27.4397677 -1.462473
Item30 -20.393146 33.1813806 36.286219
Item31 33.763413 -15.9993987 -14.400133
Item32 -15.979008 47.9237115 67.643383
Item33 64.451213 -8.2252951 -47.157817
Item34 102.286457 22.6554454 17.134672
Item35 319.882212 54.1470278 10.485257
Item36 115.886689 22.7596669 9.094458
Item37 93.471368 9.1514824 -17.490313
Item38 -523.293815 -46.1545897 -5.185309
Item39 287.858945 20.0462752 2.292928
Item40 -8.431927 -59.3622338 -34.495018
NA

7.5.3 Plots

7.5.3.1 sjPlot

7.5.3.1.1 Fixed effects
7.5.3.1.1.1 Condition
plot_model(xmdl.Optimal, type="pred", terms=c("Cond"), ci.lvl = NA, dodge = 0) + theme_bw() + geom_line()

7.5.3.1.1.2 Age
plot_model(xmdl.Optimal, type="pred", terms=c("Age"), ci.lvl = NA, dodge = 0) + theme_bw() + geom_line()

7.5.3.1.1.3 Condition by Age
plot_model(xmdl.Optimal, type="emm", terms=c("Cond", "Age"), ci.lvl = NA, dodge = 0) + theme_bw() + geom_line()

7.5.3.1.2 Random effects
7.5.3.1.2.1 Subject
plot_model(xmdl.Optimal, type="pred", terms=c("Subj"), pred.type="re", ci.lvl = NA, dodge = 0) + theme_bw() + geom_line()

7.5.3.1.2.2 Subject by Condition
plot_model(xmdl.Optimal, type="pred", terms=c("Cond", "Subj"), pred.type="re", ci.lvl = NA, dodge = 0, colors = paletteer_c("grDevices::rainbow", length(unique(dataCong$Subj)))) + theme_bw() + geom_line() 

7.5.3.1.2.3 Subject by Age
plot_model(xmdl.Optimal, type="pred", terms=c("Age", "Subj"), pred.type="re", ci.lvl = NA, dodge = 0, colors = paletteer_c("grDevices::rainbow", length(unique(dataCong$Subj)))) + theme_bw() + geom_line()

7.5.3.1.2.4 Item
plot_model(xmdl.Optimal, type="pred", terms=c("Item"), pred.type="re", ci.lvl = NA, dodge = 0) + theme_bw() + geom_line()

7.5.3.1.2.5 Item by Cond
plot_model(xmdl.Optimal, type="pred", terms=c("Cond", "Item"), pred.type="re", ci.lvl = NA, dodge = 0, colors = paletteer_c("grDevices::rainbow", length(unique(dataCong$Item)))) + theme_bw() + geom_line()

7.5.3.1.2.6 Item by Age
plot_model(xmdl.Optimal, type="pred", terms=c("Age", "Item"), pred.type="re", ci.lvl = NA, dodge = 0, colors = paletteer_c("grDevices::rainbow", length(unique(dataCong$Item)))) + theme_bw() + geom_line()

7.5.3.1.2.7 Item by Cond facetted by Age
plot_model(xmdl.Optimal, type="pred", terms=c("Cond", "Item", "Age"), pred.type="re", ci.lvl = NA, dodge = 0, colors = paletteer_c("grDevices::rainbow", length(unique(dataCong$Item)))) + theme_bw() + geom_line()

7.5.3.1.2.8 Subject by Item
plot_model(xmdl.Optimal, type="pred", terms=c("Subj", "Item"), pred.type="re", ci.lvl = NA, dodge = 0, colors = paletteer_c("grDevices::rainbow", length(unique(dataCong$Subj)))) + theme_bw() + geom_line()

7.5.3.1.2.9 Subject by Item facetted by Cond
plot_model(xmdl.Optimal, type="pred", terms=c("Subj", "Item", "Cond"), pred.type="re", ci.lvl = NA, dodge = 0, colors = paletteer_c("grDevices::rainbow", length(unique(dataCong$Subj)))) + theme_bw() + geom_line()

7.5.3.1.2.10 Subject by Item facetted by Age
plot_model(xmdl.Optimal, type="pred", terms=c("Subj", "Item", "Age"), pred.type="re", ci.lvl = NA, dodge = 0, colors = paletteer_c("grDevices::rainbow", length(unique(dataCong$Subj)))) + theme_bw() + geom_line()

7.5.3.2 Lattice

7.5.3.2.1 Subject Intercepts
dotplot(ranef(xmdl.Optimal))$Subj[1]

7.5.3.2.2 Subject Slopes
dotplot(ranef(xmdl.Optimal), xlim = c(-350, 350))$Subj[2]

7.5.3.2.3 Item Intercepts
dotplot(ranef(xmdl.Optimal))$Item[1]

7.5.3.2.4 Item Slopes for Cond
dotplot(ranef(xmdl.Optimal), xlim = c(-150, 150))$Item[2]

7.5.3.2.5 Item Slopes for Age
dotplot(ranef(xmdl.Optimal), xlim = c(-150, 150))$Item[3]

8 Conclusion

This tutorial showed how one can explore random effects and formally assess the need for Random slopes

As a rule of thumb: Any within-subject (or within-item) should be tested for a potential inclusion as a random slope

Fixed effects provides averages over all observations, even when using mixed effects regressions; we need to explore what random effects (intercepts and slopes) tell us.

In this example, we see that many subjects vary beyond the fixed effect; Standard Errors are not enough to quantify this type of variation. The same is true for items that are not explored routinely!

I hope this tutorial helped you to uncover the role of participants and items and what they can tell us beyond the fixed effect!

9 session info

sessionInfo()
R version 4.4.2 (2024-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 10 x64 (build 19045)

Matrix products: default


locale:
[1] LC_COLLATE=English_United Kingdom.utf8  LC_CTYPE=English_United Kingdom.utf8    LC_MONETARY=English_United Kingdom.utf8 LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.utf8    

time zone: Europe/Paris
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] faux_1.2.1.9002    car_3.1-3          carData_3.0-5      lattice_0.22-6     paletteer_1.6.0    sjPlot_2.8.17      ggstatsplot_0.13.0 ggstats_0.7.0      lme4_1.1-35.5      Matrix_1.7-1      
[11] knitr_1.49         lubridate_1.9.4    forcats_1.0.0      stringr_1.5.1      dplyr_1.1.4        purrr_1.0.2        readr_2.1.5        tidyr_1.3.1        tibble_3.2.1       ggplot2_3.5.1     
[21] tidyverse_2.0.0   

loaded via a namespace (and not attached):
 [1] rematch2_2.1.2         sandwich_3.1-1         rlang_1.1.4            magrittr_2.0.3         multcomp_1.4-26        furrr_0.3.1            compiler_4.4.2         statsExpressions_1.6.2
 [9] mgcv_1.9-1             vctrs_0.6.5            pkgconfig_2.0.3        fastmap_1.2.0          backports_1.5.0        labeling_0.4.3         effectsize_1.0.0       utf8_1.2.4            
[17] rmarkdown_2.29         tzdb_0.4.0             haven_2.5.4            nloptr_2.1.1           xfun_0.49              cachem_1.1.0           labelled_2.13.0        jsonlite_1.8.9        
[25] sjmisc_2.8.10          ggeffects_2.0.0        broom_1.0.7            parallel_4.4.2         R6_2.5.1               bslib_0.8.0            RColorBrewer_1.1-3     stringi_1.8.4         
[33] parallelly_1.40.1      boot_1.3-31            jquerylib_0.1.4        estimability_1.5.1     Rcpp_1.0.13-1          zoo_1.8-12             parameters_0.24.0      correlation_0.8.6     
[41] splines_4.4.2          timechange_0.3.0       tidyselect_1.2.1       rstudioapi_0.17.1      abind_1.4-8            yaml_2.3.10            codetools_0.2-20       sjlabelled_1.2.0      
[49] listenv_0.9.1          withr_3.0.2            bayestestR_0.15.0      coda_0.19-4.1          evaluate_1.0.1         future_1.34.0          survival_3.7-0         RcppParallel_5.1.9    
[57] pillar_1.9.0           insight_1.0.0          generics_0.1.3         hms_1.1.3              rstantools_2.4.0       munsell_0.5.1          scales_1.3.0           minqa_1.2.8           
[65] globals_0.16.3         xtable_1.8-4           glue_1.8.0             emmeans_1.10.5         tools_4.4.2            mvtnorm_1.3-2          grid_4.4.2             cards_0.4.0           
[73] datawizard_0.13.0      colorspace_2.1-1       nlme_3.1-166           patchwork_1.3.0        performance_0.12.4     Formula_1.2-5          cli_3.6.3              fansi_1.0.6           
[81] broom.helpers_1.17.0   sjstats_0.19.0         gtable_0.3.6           broom.mixed_0.2.9.6    sass_0.4.9             zeallot_0.1.0          digest_0.6.37          prismatic_1.1.2       
[89] ggrepel_0.9.6          TH.data_1.1-2          farver_2.1.2           htmltools_0.5.8.1      lifecycle_1.0.4        MASS_7.3-61           
---
title: "Tutorial: Exploring Random Effects: What Do Participants and Items Tell us Beyond the Fixed Effects?"
author:
  name: "Jalal Al-Tamimi"
  affiliation: Université Paris Cité
date: "`r format(Sys.time(), '%d %B %Y')`"
output:
  html_notebook:
    highlight: pygments
    number_sections: yes
    toc: yes
    toc_depth: 6
    toc_float:
      collapsed: yes
      fig_crop: no
editor_options: 
  markdown: 
    wrap: sentence
---


# Loading packages 

We start by loading the required packages. If they are not already installed, then the code below will first install them, before loading them. The package `faux` is not available on CRAN for R version 4.4.2, hence we will install it via github. See details [here](https://debruine.github.io/faux/)

```{r warning=FALSE, message=FALSE, error=FALSE}
requiredPackages = c('tidyverse', 'knitr', 'lme4', 'ggstats', 'ggstatsplot', 'sjPlot', 'paletteer', 'lattice', 'car')

for(p in requiredPackages){
  if(!require(p,character.only = TRUE)) install.packages(p, dependencies = TRUE)
  library(p,character.only = TRUE)
}
```

Specific to the `faux` package

First install the package devtools, and then install the package `faux` via the github install. 

```{r warning=FALSE, message=FALSE, error=FALSE}
# install.packages("devtools")
# devtools::install_github("debruine/faux")
library(faux)
```


# Data set

Our experiment has the following structure: we asked 40 subjects to respond to 40 items in a fully crossed design. There were two IVs: Condition with `congruent` and `incongruent` and Age with `young` and `old`. The Condition is a within subject variable; age is a between subject. However, Condition and Age were both within item variables. The dependant variable was LookingTime. 

This meant that we used items within each of the `young` and the `old` subjects in addition to items within each of the `congurent` and `incongruent` conditions.  

Our research question is as follows: `Age` of subject will impact the `Looking Time` in the two conditions. 
Our hypothesis is: The older a subject is, the more the looking time it is to the incongruent condition. 

We will use the package `faux` to simulate a dataframe. This step is crucial in assessing contributions of particular predictors and for testing ideas as we already know the distribution of the data and what to expect as an outcome when it comes to the fixed effect in question.


The simulated data has the following parameters

```{r warning=FALSE, message=FALSE, error=FALSE}
set.seed(42)
# define parameters
Subj_n = 40  # number of subjects
Item_n = 40  # number of items
b0 = 100       # intercept
b1 = 2.5 * b0   # fixed effect of condition
u0s_sd = 300   # random intercept SD for subjects
u0i_sd = 200   # random intercept SD for items
u1s_sd = 100   # random b1 slope SD for subjects
u1i_sd = 50   # random b1 slope SD for items
r01s = -0.3     # correlation between random effects 0 and 1 for subjects
r01i = 0.2     # correlation between random effects 0 and 1 for items
sigma_sd = 150 # error SD

# set up data structure
dataCong <- add_random(Subj = Subj_n, Item = Item_n) %>%
  # add within categorical variable for subject
  add_within("Subj", Cond = c("Congruent", "Incongruent")) %>%
  add_recode("Cond", "Cond.Incongruent", Congruent = 0, Incongruent = 1) %>%
  # add between categorical variable for subject
  add_between("Subj", Age = c("Young", "Old")) %>%
  add_recode("Age", "Age.Old", Young = 0, Old = 1) %>%
  # add random effects 
  add_ranef("Subj", u0s = u0s_sd, u1s = u1s_sd, .cors = r01s) %>%
  add_ranef("Item", u0i = u0i_sd, u1i = u1i_sd, .cors = r01i) %>%
  ##add_ranef(c("Subj", "Item"), u0si = u0s_sd + u0i_sd) %>%
  ##add_ranef(c("Subj", "Item"), u1si = u1s_sd + u1i_sd) %>%
  add_ranef(sigma = sigma_sd) %>%
  # calculate DV
  mutate(LookingTime = b0 + b1 + u0s + u0i + #u0si + u1si +
           (((b1 + u1s) + 0.5) * Cond.Incongruent) + (((b1 + u1s) + 0.9) * Age.Old) + # subject specific variation
           (((b1 + u1i) - 0.3) * Cond.Incongruent) + (((b1 + u1i) - 0.25) * Age.Old) + # item specific variation  
           sigma)
write.csv(dataCong, "dataCong.csv")
```


If you were not able to install the `faux` package, simply uncomment the following line of code below to import the dataset

```{r warning=FALSE, message=FALSE, error=FALSE}
#dataCong <- read.csv("dataCong.csv")[-1]
```

# Verify dataframe

## Tibble

```{r warning=FALSE, message=FALSE, error=FALSE}
dataCong <- dataCong %>% 
  mutate(Subj = factor(Subj),
         Item = factor(Item))
dataCong
```


## Counts

### Subjects

```{r warning=FALSE, message=FALSE, error=FALSE}
dataCong %>% 
  group_by(Cond, Age, Subj) %>% 
  summarise(count = n())
```



### Items

#### Age

```{r warning=FALSE, message=FALSE, error=FALSE}
dataCong %>% 
  group_by(Age, Item) %>% 
  summarise(count = n())
```


#### Cond

```{r warning=FALSE, message=FALSE, error=FALSE}
dataCong %>% 
  group_by(Cond, Item) %>% 
  summarise(count = n())
```



# Visualisations

## Condition by Age

The figure below confirms this, where we see an increase in LookingTime in the `incongruent` condition and oevrall, `older` participants show an increase in LookingTime


```{r warning=FALSE, message=FALSE, error=FALSE}
dataCong %>% 
  ggplot(aes(x = Cond,
             y = LookingTime,
             colour = Age)) +
  theme_bw() + 
  geom_boxplot() +
  geom_smooth(aes(as.numeric(Cond)), method = "lm")
```

## Subject by Condition

This figure shows that subjects are variable in how they responded to this task


```{r echo=TRUE, error=FALSE, message=FALSE, warning=FALSE}
dataCong %>% 
  ggplot(aes(x = Cond,
             y = LookingTime,
             colour = Subj)) +
  theme_bw() + 
  geom_point() +
  geom_smooth(aes(as.numeric(Cond)), method = "lm", se = FALSE) +
  scale_colour_manual(values = paletteer_c("grDevices::rainbow", length(unique(dataCong$Subj))))
```


## Subject by Age

This figure shows that subjects had an impact on the LookingTime in both age groups, simply due to their variable responses to the different items


```{r echo=TRUE, error=FALSE, message=FALSE, warning=FALSE}
dataCong %>% 
  ggplot(aes(x = Age,
             y = LookingTime,
             colour = Subj)) +
  theme_bw() + 
  geom_point() +
  geom_smooth(aes(as.numeric(Cond)), method = "lm", se = FALSE) +
  scale_colour_manual(values = paletteer_c("grDevices::rainbow", length(unique(dataCong$Subj))))
```



## Item by Condition

This figure shows that items had an impact on the LookingTime in both conditions


```{r echo=TRUE, error=FALSE, message=FALSE, warning=FALSE}
dataCong %>% 
  ggplot(aes(x = Cond,
             y = LookingTime,
             colour = Item)) +
  theme_bw() + 
  geom_point() +
  geom_smooth(aes(as.numeric(Cond)), method = "lm", se = FALSE) +
  scale_colour_manual(values = paletteer_c("grDevices::rainbow", length(unique(dataCong$Item))))
```


## Subject by Age

This figure shows that items had an impact on the LookingTime in both age groups


```{r echo=TRUE, error=FALSE, message=FALSE, warning=FALSE}
dataCong %>% 
  ggplot(aes(x = Age,
             y = LookingTime,
             colour = Item)) +
  theme_bw() + 
  geom_point() +
  geom_smooth(aes(as.numeric(Cond)), method = "lm", se = FALSE) +
  scale_colour_manual(values = paletteer_c("grDevices::rainbow", length(unique(dataCong$Item))))
```

# Modelling strategy

We use an LMER model with a crossed random effect. To choose our optimal model, we start first by a simple model with only random intercepts, increasing complexity by accounting for random slopes for both subjects and items. It is clear from our visualisation above, that there is no interaction between the two predictors. However, for demonstration purposes, we do test for this


# Simple Linear Model

```{r warning=FALSE, message=FALSE, error=FALSE}
mdl.lm <- dataCong %>% 
  lm(LookingTime ~ Cond + Age, data = .)
summary(mdl.lm)
hist(residuals(mdl.lm))
qqnorm(residuals(mdl.lm)); qqline(residuals(mdl.lm))
plot(fitted(mdl.lm), residuals(mdl.lm), cex = 4)
```

## No interaction

### Crossed random intercepts

```{r warning=FALSE, message=FALSE, error=FALSE}
xmdl.rand.Interc <- dataCong %>% 
  lmer(LookingTime ~ Cond + Age + 
         (1 | Subj) + 
         (1 | Item), data = ., REML = FALSE,
       control = lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e5)))
```


### Crossed random intercepts + By-speaker random slopes

```{r warning=FALSE, message=FALSE, error=FALSE}
xmdl.rand.Slope1 <- dataCong %>% 
  lmer(LookingTime ~ Cond + Age + 
         (1 + Cond | Subj) + 
         (1 | Item), data = ., REML = FALSE,
       control = lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e5)))
```

### Crossed random intercepts + By-speaker and by-item random slopes

```{r warning=FALSE, message=FALSE, error=FALSE}
xmdl.rand.Slope2 <- dataCong %>% 
  lmer(LookingTime ~ Cond + Age + 
         (1 + Cond | Subj) + 
         (1 + Cond | Item), data = ., REML = FALSE,
       control = lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e5)))
```


### Crossed random intercepts + By-speaker and by-item random slopes

```{r warning=FALSE, message=FALSE, error=FALSE}
xmdl.rand.Slope3 <- dataCong %>% 
  lmer(LookingTime ~ Cond + Age + 
         (1 + Cond | Subj) + 
         (1 + Cond + Age| Item), data = ., REML = FALSE,
       control = lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e5)))
```


## With interaction

### Crossed random intercepts + Interaction

```{r warning=FALSE, message=FALSE, error=FALSE}
xmdl.rand.Interc.Int <- dataCong %>% 
  lmer(LookingTime ~ Cond * Age + 
         (1 | Subj) + 
         (1 | Item), data = ., REML = FALSE,
       control = lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e5)))
```


### Crossed random intercepts + By-speaker random slopes + Interaction

```{r warning=FALSE, message=FALSE, error=FALSE}
xmdl.rand.Slope1.Int <- dataCong %>% 
  lmer(LookingTime ~ Cond * Age + 
         (1 + Cond | Subj) + 
         (1 | Item), data = ., REML = FALSE,
       control = lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e5)))
```

### Crossed random intercepts + By-speaker and by-item random slopes + Interaction

```{r warning=FALSE, message=FALSE, error=FALSE}
xmdl.rand.Slope2.Int <- dataCong %>% 
  lmer(LookingTime ~ Cond * Age + 
         (1 + Cond | Subj) + 
         (1 + Cond | Item), data = ., REML = FALSE,
       control = lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e5)))
```


### Crossed random intercepts + By-speaker and by-item random slopes

```{r warning=FALSE, message=FALSE, error=FALSE}
xmdl.rand.Slope3.Int <- dataCong %>% 
  lmer(LookingTime ~ Cond * Age + 
         (1 + Cond | Subj) + 
         (1 + Cond * Age| Item), data = ., REML = FALSE,
       control = lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e5)))
```


## Model comparison


```{r warning=FALSE, message=FALSE, error=FALSE}
anova(xmdl.rand.Interc, xmdl.rand.Slope1, xmdl.rand.Slope2, xmdl.rand.Slope3, xmdl.rand.Interc.Int, xmdl.rand.Slope1.Int, xmdl.rand.Slope2.Int, xmdl.rand.Slope3.Int)
```


The results above highlight that the model accounting for both by-subject and by-item random intercepts and random slopes for Condition are improving the model fit in comparison to a more complex model. We rerun the model with `REML = False`

# Optimal model

```{r warning=FALSE, message=FALSE, error=FALSE}
xmdl.Optimal <- dataCong %>% 
  lmer(LookingTime ~ Cond + Age + 
         (1 + Cond | Subj) + 
         (1 + Cond + Age | Item), data = ., REML = TRUE,
       control = lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e5)))
```


## Model criticism

```{r warning=FALSE, message=FALSE, error=FALSE}
hist(residuals(xmdl.Optimal))
qqnorm(residuals(xmdl.Optimal)); qqline(residuals(xmdl.Optimal))
plot(fitted(xmdl.Optimal), residuals(xmdl.Optimal), cex = 4)
```

## Summary

```{r warning=FALSE, message=FALSE, error=FALSE}
xmdl.Optimal %>% 
  summary()
```

## ANOVA
```{r warning=FALSE, message=FALSE, error=FALSE}
Anova(xmdl.Optimal)
```


## Plotting model's output

### With `ggstats`

We use two functions from the package `ggstats`. 

#### A plot


```{r warning=FALSE, message=FALSE, error=FALSE}
ggcoef_model(xmdl.Optimal)
```



### A plot + a table + 95% CI

```{r warning=FALSE, message=FALSE, error=FALSE}
ggcoef_table(xmdl.Optimal)
```



### With `ggstatsplot`

```{r}
ggcoefstats(xmdl.Optimal, point.args = list(color = paletteer_c("grDevices::rainbow", 13), stats.label.color = paletteer_c("grDevices::rainbow", 13)))

```


## Exploring random effects


### Subject random effects


```{r warning=FALSE, message=FALSE, error=FALSE}
random_effects <- ranef(xmdl.Optimal) %>%
  pluck(1) %>%
  rownames_to_column() %>%
  rename(Subject = rowname, Intercept = "(Intercept)") 
 
random_effects %>%
  knitr::kable()

```




### Items random effects


```{r warning=FALSE, message=FALSE, error=FALSE}
random_effects <- ranef(xmdl.Optimal) %>%
  pluck(2) %>%
  rownames_to_column() %>%
  rename(Items = rowname, Intercept = "(Intercept)") 
 
random_effects %>%
  knitr::kable()

```


### Plots

#### sjPlot

##### Fixed effects

###### Condition

```{r}
plot_model(xmdl.Optimal, type="pred", terms=c("Cond"), ci.lvl = NA, dodge = 0) + theme_bw() + geom_line()

```

###### Age

```{r}
plot_model(xmdl.Optimal, type="pred", terms=c("Age"), ci.lvl = NA, dodge = 0) + theme_bw() + geom_line()

```


###### Condition by Age

```{r}
plot_model(xmdl.Optimal, type="emm", terms=c("Cond", "Age"), ci.lvl = NA, dodge = 0) + theme_bw() + geom_line()

```

##### Random effects

###### Subject

```{r}
plot_model(xmdl.Optimal, type="pred", terms=c("Subj"), pred.type="re", ci.lvl = NA, dodge = 0) + theme_bw() + geom_line()

```

###### Subject by Condition

```{r}
plot_model(xmdl.Optimal, type="pred", terms=c("Cond", "Subj"), pred.type="re", ci.lvl = NA, dodge = 0, colors = paletteer_c("grDevices::rainbow", length(unique(dataCong$Subj)))) + theme_bw() + geom_line() 

```

###### Subject by Age

```{r}
plot_model(xmdl.Optimal, type="pred", terms=c("Age", "Subj"), pred.type="re", ci.lvl = NA, dodge = 0, colors = paletteer_c("grDevices::rainbow", length(unique(dataCong$Subj)))) + theme_bw() + geom_line()

```



###### Item

```{r}
plot_model(xmdl.Optimal, type="pred", terms=c("Item"), pred.type="re", ci.lvl = NA, dodge = 0) + theme_bw() + geom_line()

```





###### Item by Cond

```{r}
plot_model(xmdl.Optimal, type="pred", terms=c("Cond", "Item"), pred.type="re", ci.lvl = NA, dodge = 0, colors = paletteer_c("grDevices::rainbow", length(unique(dataCong$Item)))) + theme_bw() + geom_line()

```




###### Item by Age

```{r}
plot_model(xmdl.Optimal, type="pred", terms=c("Age", "Item"), pred.type="re", ci.lvl = NA, dodge = 0, colors = paletteer_c("grDevices::rainbow", length(unique(dataCong$Item)))) + theme_bw() + geom_line()

```



###### Item by Cond facetted by Age

```{r}
plot_model(xmdl.Optimal, type="pred", terms=c("Cond", "Item", "Age"), pred.type="re", ci.lvl = NA, dodge = 0, colors = paletteer_c("grDevices::rainbow", length(unique(dataCong$Item)))) + theme_bw() + geom_line()

```


###### Subject by Item

```{r}
plot_model(xmdl.Optimal, type="pred", terms=c("Subj", "Item"), pred.type="re", ci.lvl = NA, dodge = 0, colors = paletteer_c("grDevices::rainbow", length(unique(dataCong$Subj)))) + theme_bw() + geom_line()

```

###### Subject by Item facetted by Cond

```{r}
plot_model(xmdl.Optimal, type="pred", terms=c("Subj", "Item", "Cond"), pred.type="re", ci.lvl = NA, dodge = 0, colors = paletteer_c("grDevices::rainbow", length(unique(dataCong$Subj)))) + theme_bw() + geom_line()

```


###### Subject by Item facetted by Age

```{r}
plot_model(xmdl.Optimal, type="pred", terms=c("Subj", "Item", "Age"), pred.type="re", ci.lvl = NA, dodge = 0, colors = paletteer_c("grDevices::rainbow", length(unique(dataCong$Subj)))) + theme_bw() + geom_line()

```

#### Lattice

##### Subject Intercepts

```{r}
dotplot(ranef(xmdl.Optimal))$Subj[1]
```


##### Subject Slopes

```{r}
dotplot(ranef(xmdl.Optimal), xlim = c(-350, 350))$Subj[2]
```




##### Item Intercepts

```{r}
dotplot(ranef(xmdl.Optimal))$Item[1]
```


##### Item Slopes for Cond

```{r}
dotplot(ranef(xmdl.Optimal), xlim = c(-150, 150))$Item[2]
```


##### Item Slopes for Age

```{r}
dotplot(ranef(xmdl.Optimal), xlim = c(-150, 150))$Item[3]
```




# Conclusion

This tutorial showed how one can explore random effects and formally assess the need for Random slopes

As a rule of thumb: Any within-subject (or within-item) should be tested for a potential inclusion as a random slope

Fixed effects provides averages over all observations, even when using mixed effects regressions; we need to explore what random effects (intercepts and slopes) tell us.

In this example, we see that many subjects vary beyond the fixed effect; Standard Errors are not enough to quantify this type of variation. The same is true for items that are not explored routinely!


I hope this tutorial helped you to uncover the role of participants and items and what they can tell us beyond the fixed effect!

# session info

```{r warning=FALSE, message=FALSE, error=FALSE}
sessionInfo()
```

