Loading packages
We start by loading the required packages. If they are not already
installed, then the code below will first install them, before loading
them. The package faux
is not available on CRAN for R
version 4.4.2, hence we will install it via github. See details here
requiredPackages = c('tidyverse', 'knitr', 'lme4', 'ggstats', 'ggstatsplot', 'sjPlot', 'paletteer', 'lattice', 'car')
for(p in requiredPackages){
if(!require(p,character.only = TRUE)) install.packages(p, dependencies = TRUE)
library(p,character.only = TRUE)
}
Specific to the faux
package
First install the package devtools, and then install the package
faux
via the github install.
# install.packages("devtools")
# devtools::install_github("debruine/faux")
library(faux)
Data set
Our experiment has the following structure: we asked 40 subjects to
respond to 40 items in a fully crossed design. There were two IVs:
Condition with congruent
and incongruent
and
Age with young
and old
. The Condition is a
within subject variable; age is a between subject. However, Condition
and Age were both within item variables. The dependant variable was
LookingTime.
This meant that we used items within each of the young
and the old
subjects in addition to items within each of
the congurent
and incongruent
conditions.
Our research question is as follows: Age
of subject will
impact the Looking Time
in the two conditions. Our
hypothesis is: The older a subject is, the more the looking time it is
to the incongruent condition.
We will use the package faux
to simulate a dataframe.
This step is crucial in assessing contributions of particular predictors
and for testing ideas as we already know the distribution of the data
and what to expect as an outcome when it comes to the fixed effect in
question.
The simulated data has the following parameters
set.seed(42)
# define parameters
Subj_n = 40 # number of subjects
Item_n = 40 # number of items
b0 = 100 # intercept
b1 = 2.5 * b0 # fixed effect of condition
u0s_sd = 300 # random intercept SD for subjects
u0i_sd = 200 # random intercept SD for items
u1s_sd = 100 # random b1 slope SD for subjects
u1i_sd = 50 # random b1 slope SD for items
r01s = -0.3 # correlation between random effects 0 and 1 for subjects
r01i = 0.2 # correlation between random effects 0 and 1 for items
sigma_sd = 150 # error SD
# set up data structure
dataCong <- add_random(Subj = Subj_n, Item = Item_n) %>%
# add within categorical variable for subject
add_within("Subj", Cond = c("Congruent", "Incongruent")) %>%
add_recode("Cond", "Cond.Incongruent", Congruent = 0, Incongruent = 1) %>%
# add between categorical variable for subject
add_between("Subj", Age = c("Young", "Old")) %>%
add_recode("Age", "Age.Old", Young = 0, Old = 1) %>%
# add random effects
add_ranef("Subj", u0s = u0s_sd, u1s = u1s_sd, .cors = r01s) %>%
add_ranef("Item", u0i = u0i_sd, u1i = u1i_sd, .cors = r01i) %>%
##add_ranef(c("Subj", "Item"), u0si = u0s_sd + u0i_sd) %>%
##add_ranef(c("Subj", "Item"), u1si = u1s_sd + u1i_sd) %>%
add_ranef(sigma = sigma_sd) %>%
# calculate DV
mutate(LookingTime = b0 + b1 + u0s + u0i + #u0si + u1si +
(((b1 + u1s) + 0.5) * Cond.Incongruent) + (((b1 + u1s) + 0.9) * Age.Old) + # subject specific variation
(((b1 + u1i) - 0.3) * Cond.Incongruent) + (((b1 + u1i) - 0.25) * Age.Old) + # item specific variation
sigma)
write.csv(dataCong, "dataCong.csv")
If you were not able to install the faux
package, simply
uncomment the following line of code below to import the dataset
#dataCong <- read.csv("dataCong.csv")[-1]
Verify dataframe
Tibble
dataCong <- dataCong %>%
mutate(Subj = factor(Subj),
Item = factor(Item))
dataCong
Counts
Subjects
dataCong %>%
group_by(Cond, Age, Subj) %>%
summarise(count = n())
Items
Age
dataCong %>%
group_by(Age, Item) %>%
summarise(count = n())
Cond
dataCong %>%
group_by(Cond, Item) %>%
summarise(count = n())
Visualisations
Condition by Age
The figure below confirms this, where we see an increase in
LookingTime in the incongruent
condition and oevrall,
older
participants show an increase in LookingTime
dataCong %>%
ggplot(aes(x = Cond,
y = LookingTime,
colour = Age)) +
theme_bw() +
geom_boxplot() +
geom_smooth(aes(as.numeric(Cond)), method = "lm")

Subject by
Condition
This figure shows that subjects are variable in how they responded to
this task
dataCong %>%
ggplot(aes(x = Cond,
y = LookingTime,
colour = Subj)) +
theme_bw() +
geom_point() +
geom_smooth(aes(as.numeric(Cond)), method = "lm", se = FALSE) +
scale_colour_manual(values = paletteer_c("grDevices::rainbow", length(unique(dataCong$Subj))))

Subject by Age
This figure shows that subjects had an impact on the LookingTime in
both age groups, simply due to their variable responses to the different
items
dataCong %>%
ggplot(aes(x = Age,
y = LookingTime,
colour = Subj)) +
theme_bw() +
geom_point() +
geom_smooth(aes(as.numeric(Cond)), method = "lm", se = FALSE) +
scale_colour_manual(values = paletteer_c("grDevices::rainbow", length(unique(dataCong$Subj))))

Item by
Condition
This figure shows that items had an impact on the LookingTime in both
conditions
dataCong %>%
ggplot(aes(x = Cond,
y = LookingTime,
colour = Item)) +
theme_bw() +
geom_point() +
geom_smooth(aes(as.numeric(Cond)), method = "lm", se = FALSE) +
scale_colour_manual(values = paletteer_c("grDevices::rainbow", length(unique(dataCong$Item))))

Subject by Age
This figure shows that items had an impact on the LookingTime in both
age groups
dataCong %>%
ggplot(aes(x = Age,
y = LookingTime,
colour = Item)) +
theme_bw() +
geom_point() +
geom_smooth(aes(as.numeric(Cond)), method = "lm", se = FALSE) +
scale_colour_manual(values = paletteer_c("grDevices::rainbow", length(unique(dataCong$Item))))

Modelling strategy
We use an LMER model with a crossed random effect. To choose our
optimal model, we start first by a simple model with only random
intercepts, increasing complexity by accounting for random slopes for
both subjects and items. It is clear from our visualisation above, that
there is no interaction between the two predictors. However, for
demonstration purposes, we do test for this
Simple Linear
Model
mdl.lm <- dataCong %>%
lm(LookingTime ~ Cond + Age, data = .)
summary(mdl.lm)
Call:
lm(formula = LookingTime ~ Cond + Age, data = .)
Residuals:
Min 1Q Median 3Q Max
-1215.59 -290.17 -21.78 264.89 1661.41
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 304.95 12.91 23.61 <2e-16 ***
CondIncongruent 504.35 14.91 33.82 <2e-16 ***
AgeOld 566.44 14.91 37.98 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 421.8 on 3197 degrees of freedom
Multiple R-squared: 0.4472, Adjusted R-squared: 0.4469
F-statistic: 1293 on 2 and 3197 DF, p-value: < 2.2e-16

qqnorm(residuals(mdl.lm)); qqline(residuals(mdl.lm))

plot(fitted(mdl.lm), residuals(mdl.lm), cex = 4)

No interaction
Crossed random
intercepts
xmdl.rand.Interc <- dataCong %>%
lmer(LookingTime ~ Cond + Age +
(1 | Subj) +
(1 | Item), data = ., REML = FALSE,
control = lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e5)))
Crossed random
intercepts + By-speaker random slopes
xmdl.rand.Slope1 <- dataCong %>%
lmer(LookingTime ~ Cond + Age +
(1 + Cond | Subj) +
(1 | Item), data = ., REML = FALSE,
control = lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e5)))
Crossed random
intercepts + By-speaker and by-item random slopes
xmdl.rand.Slope2 <- dataCong %>%
lmer(LookingTime ~ Cond + Age +
(1 + Cond | Subj) +
(1 + Cond | Item), data = ., REML = FALSE,
control = lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e5)))
Crossed random
intercepts + By-speaker and by-item random slopes
xmdl.rand.Slope3 <- dataCong %>%
lmer(LookingTime ~ Cond + Age +
(1 + Cond | Subj) +
(1 + Cond + Age| Item), data = ., REML = FALSE,
control = lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e5)))
With interaction
Crossed random
intercepts + Interaction
xmdl.rand.Interc.Int <- dataCong %>%
lmer(LookingTime ~ Cond * Age +
(1 | Subj) +
(1 | Item), data = ., REML = FALSE,
control = lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e5)))
Crossed random
intercepts + By-speaker random slopes + Interaction
xmdl.rand.Slope1.Int <- dataCong %>%
lmer(LookingTime ~ Cond * Age +
(1 + Cond | Subj) +
(1 | Item), data = ., REML = FALSE,
control = lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e5)))
Crossed random
intercepts + By-speaker and by-item random slopes + Interaction
xmdl.rand.Slope2.Int <- dataCong %>%
lmer(LookingTime ~ Cond * Age +
(1 + Cond | Subj) +
(1 + Cond | Item), data = ., REML = FALSE,
control = lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e5)))
Crossed random
intercepts + By-speaker and by-item random slopes
xmdl.rand.Slope3.Int <- dataCong %>%
lmer(LookingTime ~ Cond * Age +
(1 + Cond | Subj) +
(1 + Cond * Age| Item), data = ., REML = FALSE,
control = lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e5)))
Model comparison
anova(xmdl.rand.Interc, xmdl.rand.Slope1, xmdl.rand.Slope2, xmdl.rand.Slope3, xmdl.rand.Interc.Int, xmdl.rand.Slope1.Int, xmdl.rand.Slope2.Int, xmdl.rand.Slope3.Int)
Data: .
Models:
xmdl.rand.Interc: LookingTime ~ Cond + Age + (1 | Subj) + (1 | Item)
xmdl.rand.Interc.Int: LookingTime ~ Cond * Age + (1 | Subj) + (1 | Item)
xmdl.rand.Slope1: LookingTime ~ Cond + Age + (1 + Cond | Subj) + (1 | Item)
xmdl.rand.Slope1.Int: LookingTime ~ Cond * Age + (1 + Cond | Subj) + (1 | Item)
xmdl.rand.Slope2: LookingTime ~ Cond + Age + (1 + Cond | Subj) + (1 + Cond | Item)
xmdl.rand.Slope2.Int: LookingTime ~ Cond * Age + (1 + Cond | Subj) + (1 + Cond | Item)
xmdl.rand.Slope3: LookingTime ~ Cond + Age + (1 + Cond | Subj) + (1 + Cond + Age | Item)
xmdl.rand.Slope3.Int: LookingTime ~ Cond * Age + (1 + Cond | Subj) + (1 + Cond * Age | Item)
npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
xmdl.rand.Interc 6 42074 42110 -21031 42062
xmdl.rand.Interc.Int 7 42050 42093 -21018 42036 25.8359 1 3.717e-07 ***
xmdl.rand.Slope1 8 41834 41883 -20909 41818 217.8699 1 < 2.2e-16 ***
xmdl.rand.Slope1.Int 9 41833 41888 -20908 41815 3.0247 1 0.0820 .
xmdl.rand.Slope2 10 41808 41869 -20894 41788 27.3253 1 1.719e-07 ***
xmdl.rand.Slope2.Int 11 41807 41874 -20892 41785 3.0149 1 0.0825 .
xmdl.rand.Slope3 13 41780 41858 -20877 41754 31.2599 2 1.629e-07 ***
xmdl.rand.Slope3.Int 18 41786 41895 -20875 41750 3.3401 5 0.6477
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The results above highlight that the model accounting for both
by-subject and by-item random intercepts and random slopes for Condition
are improving the model fit in comparison to a more complex model. We
rerun the model with REML = False
Optimal model
xmdl.Optimal <- dataCong %>%
lmer(LookingTime ~ Cond + Age +
(1 + Cond | Subj) +
(1 + Cond + Age | Item), data = ., REML = TRUE,
control = lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e5)))
Model criticism
hist(residuals(xmdl.Optimal))

qqnorm(residuals(xmdl.Optimal)); qqline(residuals(xmdl.Optimal))

plot(fitted(xmdl.Optimal), residuals(xmdl.Optimal), cex = 4)

Summary
xmdl.Optimal %>%
summary()
Linear mixed model fit by REML ['lmerMod']
Formula: LookingTime ~ Cond + Age + (1 + Cond | Subj) + (1 + Cond + Age | Item)
Data: .
Control: lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e+05))
REML criterion at convergence: 41724.6
Scaled residuals:
Min 1Q Median 3Q Max
-3.5337 -0.6485 -0.0054 0.6647 3.5358
Random effects:
Groups Name Variance Std.Dev. Corr
Subj (Intercept) 123480 351.40
CondIncongruent 10746 103.66 -0.26
Item (Intercept) 38781 196.93
CondIncongruent 1872 43.27 0.31
AgeOld 1851 43.03 -0.13 0.69
Residual 22613 150.38
Number of obs: 3200, groups: Subj, 40; Item, 40
Fixed effects:
Estimate Std. Error t value
(Intercept) 315.04 83.42 3.777
CondIncongruent 504.35 18.54 27.204
AgeOld 546.28 107.70 5.072
Correlation of Fixed Effects:
(Intr) CndInc
CndIncngrnt -0.120
AgeOld -0.646 0.016
ANOVA
Analysis of Deviance Table (Type II Wald chisquare tests)
Response: LookingTime
Chisq Df Pr(>Chisq)
Cond 740.067 1 < 2.2e-16 ***
Age 25.729 1 3.928e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Plotting model’s
output
With
ggstats
We use two functions from the package ggstats
.
A plot
ggcoef_model(xmdl.Optimal)

A plot + a table +
95% CI
ggcoef_table(xmdl.Optimal)

With
ggstatsplot
ggcoefstats(xmdl.Optimal, point.args = list(color = paletteer_c("grDevices::rainbow", 13), stats.label.color = paletteer_c("grDevices::rainbow", 13)))
Warning: Ignoring unknown parameters: `stats.label.colour`Number of labels is greater than default palette color count.
• Select another color `palette` (and/or `package`).

Exploring random
effects
Subject random
effects
random_effects <- ranef(xmdl.Optimal) %>%
pluck(1) %>%
rownames_to_column() %>%
rename(Subject = rowname, Intercept = "(Intercept)")
random_effects %>%
knitr::kable()
Subj01 |
-404.45054 |
58.1953368 |
Subj02 |
164.05584 |
56.9126271 |
Subj03 |
-56.37438 |
-104.0600477 |
Subj04 |
-87.06837 |
43.7884729 |
Subj05 |
-55.65214 |
104.5323900 |
Subj06 |
-42.39999 |
-7.1394328 |
Subj07 |
-448.81496 |
124.6524549 |
Subj08 |
-171.75810 |
-105.4504187 |
Subj09 |
-587.74594 |
123.6911001 |
Subj10 |
-44.50407 |
-50.9618431 |
Subj11 |
-346.46288 |
1.6396966 |
Subj12 |
-550.49633 |
158.6524706 |
Subj13 |
417.66027 |
-169.1409700 |
Subj14 |
-11.45716 |
-30.0008252 |
Subj15 |
62.46659 |
-14.3520076 |
Subj16 |
-210.03389 |
-23.9077430 |
Subj17 |
124.19946 |
-122.7635167 |
Subj18 |
666.67147 |
-42.1836697 |
Subj19 |
815.78333 |
217.1615447 |
Subj20 |
-393.58521 |
26.1633665 |
Subj21 |
120.64645 |
37.9297097 |
Subj22 |
455.31816 |
-46.4925696 |
Subj23 |
42.42431 |
-54.4053551 |
Subj24 |
-496.29436 |
-31.7202386 |
Subj25 |
-552.91357 |
144.2714898 |
Subj26 |
-21.91509 |
-159.7897817 |
Subj27 |
65.14521 |
54.5212747 |
Subj28 |
373.40089 |
-166.0862255 |
Subj29 |
-132.82108 |
-21.8820276 |
Subj30 |
104.73419 |
-112.8914459 |
Subj31 |
-79.72260 |
125.0118699 |
Subj32 |
-182.30982 |
15.3898936 |
Subj33 |
-303.68384 |
-0.1957386 |
Subj34 |
215.98609 |
136.5078869 |
Subj35 |
-152.60397 |
76.5559812 |
Subj36 |
401.50547 |
-145.0045141 |
Subj37 |
274.61818 |
-114.4122298 |
Subj38 |
139.48473 |
-82.7958618 |
Subj39 |
733.87830 |
58.5575032 |
Subj40 |
155.08935 |
41.5013939 |
Items random
effects
random_effects <- ranef(xmdl.Optimal) %>%
pluck(2) %>%
rownames_to_column() %>%
rename(Items = rowname, Intercept = "(Intercept)")
random_effects %>%
knitr::kable()
Item01 |
-247.414507 |
16.2991489 |
28.312990 |
Item02 |
-28.240445 |
20.6149279 |
37.267856 |
Item03 |
-27.548429 |
11.7099339 |
-3.218254 |
Item04 |
30.999046 |
32.2392558 |
32.548236 |
Item05 |
246.216859 |
38.0585978 |
2.963549 |
Item06 |
-135.583736 |
28.3151555 |
28.127356 |
Item07 |
75.461498 |
28.6715548 |
-2.095288 |
Item08 |
72.679400 |
69.1730999 |
73.479523 |
Item09 |
-180.071444 |
32.9237845 |
60.953057 |
Item10 |
-170.029328 |
8.4112590 |
23.865843 |
Item11 |
-278.857705 |
-39.6392919 |
-22.526611 |
Item12 |
114.001985 |
4.7652485 |
-15.353357 |
Item13 |
-137.533542 |
-49.5701246 |
-36.764423 |
Item14 |
-272.928912 |
-85.2576750 |
-45.340742 |
Item15 |
219.855541 |
-70.2475223 |
-83.121735 |
Item16 |
243.071753 |
13.0287924 |
-4.812872 |
Item17 |
210.792625 |
10.6898312 |
17.840004 |
Item18 |
328.737750 |
-36.9498461 |
-82.922663 |
Item19 |
-46.910049 |
28.2829527 |
22.930110 |
Item20 |
-107.870056 |
-25.5384308 |
-13.846115 |
Item21 |
-209.598744 |
-56.8339801 |
-29.482588 |
Item22 |
-190.253581 |
-3.0439700 |
20.502593 |
Item23 |
207.958662 |
34.4328035 |
33.373202 |
Item24 |
-371.211245 |
-46.1955836 |
1.751542 |
Item25 |
163.603914 |
15.0837752 |
-7.780876 |
Item26 |
-56.954235 |
-39.1972201 |
-59.652416 |
Item27 |
126.648406 |
7.5867262 |
-1.102683 |
Item28 |
48.176096 |
-0.4969082 |
1.358911 |
Item29 |
-76.699978 |
-27.4397677 |
-1.462473 |
Item30 |
-20.393146 |
33.1813806 |
36.286219 |
Item31 |
33.763413 |
-15.9993987 |
-14.400133 |
Item32 |
-15.979008 |
47.9237115 |
67.643383 |
Item33 |
64.451213 |
-8.2252951 |
-47.157817 |
Item34 |
102.286457 |
22.6554454 |
17.134672 |
Item35 |
319.882212 |
54.1470278 |
10.485257 |
Item36 |
115.886689 |
22.7596669 |
9.094458 |
Item37 |
93.471368 |
9.1514824 |
-17.490313 |
Item38 |
-523.293815 |
-46.1545897 |
-5.185309 |
Item39 |
287.858945 |
20.0462752 |
2.292928 |
Item40 |
-8.431927 |
-59.3622338 |
-34.495018 |
Plots
sjPlot
Fixed
effects
Condition
plot_model(xmdl.Optimal, type="pred", terms=c("Cond"), ci.lvl = NA, dodge = 0) + theme_bw() + geom_line()

Age
plot_model(xmdl.Optimal, type="pred", terms=c("Age"), ci.lvl = NA, dodge = 0) + theme_bw() + geom_line()

Condition by
Age
plot_model(xmdl.Optimal, type="emm", terms=c("Cond", "Age"), ci.lvl = NA, dodge = 0) + theme_bw() + geom_line()

Random
effects
Subject
plot_model(xmdl.Optimal, type="pred", terms=c("Subj"), pred.type="re", ci.lvl = NA, dodge = 0) + theme_bw() + geom_line()

Subject by
Condition
plot_model(xmdl.Optimal, type="pred", terms=c("Cond", "Subj"), pred.type="re", ci.lvl = NA, dodge = 0, colors = paletteer_c("grDevices::rainbow", length(unique(dataCong$Subj)))) + theme_bw() + geom_line()

Subject by
Age
plot_model(xmdl.Optimal, type="pred", terms=c("Age", "Subj"), pred.type="re", ci.lvl = NA, dodge = 0, colors = paletteer_c("grDevices::rainbow", length(unique(dataCong$Subj)))) + theme_bw() + geom_line()

Item
plot_model(xmdl.Optimal, type="pred", terms=c("Item"), pred.type="re", ci.lvl = NA, dodge = 0) + theme_bw() + geom_line()

Item by
Cond
plot_model(xmdl.Optimal, type="pred", terms=c("Cond", "Item"), pred.type="re", ci.lvl = NA, dodge = 0, colors = paletteer_c("grDevices::rainbow", length(unique(dataCong$Item)))) + theme_bw() + geom_line()

Item by
Age
plot_model(xmdl.Optimal, type="pred", terms=c("Age", "Item"), pred.type="re", ci.lvl = NA, dodge = 0, colors = paletteer_c("grDevices::rainbow", length(unique(dataCong$Item)))) + theme_bw() + geom_line()

Item by Cond
facetted by Age
plot_model(xmdl.Optimal, type="pred", terms=c("Cond", "Item", "Age"), pred.type="re", ci.lvl = NA, dodge = 0, colors = paletteer_c("grDevices::rainbow", length(unique(dataCong$Item)))) + theme_bw() + geom_line()

Subject by
Item
plot_model(xmdl.Optimal, type="pred", terms=c("Subj", "Item"), pred.type="re", ci.lvl = NA, dodge = 0, colors = paletteer_c("grDevices::rainbow", length(unique(dataCong$Subj)))) + theme_bw() + geom_line()

Subject by
Item facetted by Cond
plot_model(xmdl.Optimal, type="pred", terms=c("Subj", "Item", "Cond"), pred.type="re", ci.lvl = NA, dodge = 0, colors = paletteer_c("grDevices::rainbow", length(unique(dataCong$Subj)))) + theme_bw() + geom_line()

Subject by
Item facetted by Age
plot_model(xmdl.Optimal, type="pred", terms=c("Subj", "Item", "Age"), pred.type="re", ci.lvl = NA, dodge = 0, colors = paletteer_c("grDevices::rainbow", length(unique(dataCong$Subj)))) + theme_bw() + geom_line()

Lattice
Subject
Intercepts
dotplot(ranef(xmdl.Optimal))$Subj[1]

Subject
Slopes
dotplot(ranef(xmdl.Optimal), xlim = c(-350, 350))$Subj[2]

Item
Intercepts
dotplot(ranef(xmdl.Optimal))$Item[1]

Item Slopes for
Cond
dotplot(ranef(xmdl.Optimal), xlim = c(-150, 150))$Item[2]

Item Slopes for
Age
dotplot(ranef(xmdl.Optimal), xlim = c(-150, 150))$Item[3]

Conclusion
This tutorial showed how one can explore random effects and formally
assess the need for Random slopes
As a rule of thumb: Any within-subject (or within-item) should be
tested for a potential inclusion as a random slope
Fixed effects provides averages over all observations, even when
using mixed effects regressions; we need to explore what random effects
(intercepts and slopes) tell us.
In this example, we see that many subjects vary beyond the fixed
effect; Standard Errors are not enough to quantify this type of
variation. The same is true for items that are not explored
routinely!
I hope this tutorial helped you to uncover the role of participants
and items and what they can tell us beyond the fixed effect!
session info
R version 4.4.2 (2024-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 10 x64 (build 19045)
Matrix products: default
locale:
[1] LC_COLLATE=English_United Kingdom.utf8 LC_CTYPE=English_United Kingdom.utf8 LC_MONETARY=English_United Kingdom.utf8 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.utf8
time zone: Europe/Paris
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] faux_1.2.1.9002 car_3.1-3 carData_3.0-5 lattice_0.22-6 paletteer_1.6.0 sjPlot_2.8.17 ggstatsplot_0.13.0 ggstats_0.7.0 lme4_1.1-35.5 Matrix_1.7-1
[11] knitr_1.49 lubridate_1.9.4 forcats_1.0.0 stringr_1.5.1 dplyr_1.1.4 purrr_1.0.2 readr_2.1.5 tidyr_1.3.1 tibble_3.2.1 ggplot2_3.5.1
[21] tidyverse_2.0.0
loaded via a namespace (and not attached):
[1] rematch2_2.1.2 sandwich_3.1-1 rlang_1.1.4 magrittr_2.0.3 multcomp_1.4-26 furrr_0.3.1 compiler_4.4.2 statsExpressions_1.6.2
[9] mgcv_1.9-1 vctrs_0.6.5 pkgconfig_2.0.3 fastmap_1.2.0 backports_1.5.0 labeling_0.4.3 effectsize_1.0.0 utf8_1.2.4
[17] rmarkdown_2.29 tzdb_0.4.0 haven_2.5.4 nloptr_2.1.1 xfun_0.49 cachem_1.1.0 labelled_2.13.0 jsonlite_1.8.9
[25] sjmisc_2.8.10 ggeffects_2.0.0 broom_1.0.7 parallel_4.4.2 R6_2.5.1 bslib_0.8.0 RColorBrewer_1.1-3 stringi_1.8.4
[33] parallelly_1.40.1 boot_1.3-31 jquerylib_0.1.4 estimability_1.5.1 Rcpp_1.0.13-1 zoo_1.8-12 parameters_0.24.0 correlation_0.8.6
[41] splines_4.4.2 timechange_0.3.0 tidyselect_1.2.1 rstudioapi_0.17.1 abind_1.4-8 yaml_2.3.10 codetools_0.2-20 sjlabelled_1.2.0
[49] listenv_0.9.1 withr_3.0.2 bayestestR_0.15.0 coda_0.19-4.1 evaluate_1.0.1 future_1.34.0 survival_3.7-0 RcppParallel_5.1.9
[57] pillar_1.9.0 insight_1.0.0 generics_0.1.3 hms_1.1.3 rstantools_2.4.0 munsell_0.5.1 scales_1.3.0 minqa_1.2.8
[65] globals_0.16.3 xtable_1.8-4 glue_1.8.0 emmeans_1.10.5 tools_4.4.2 mvtnorm_1.3-2 grid_4.4.2 cards_0.4.0
[73] datawizard_0.13.0 colorspace_2.1-1 nlme_3.1-166 patchwork_1.3.0 performance_0.12.4 Formula_1.2-5 cli_3.6.3 fansi_1.0.6
[81] broom.helpers_1.17.0 sjstats_0.19.0 gtable_0.3.6 broom.mixed_0.2.9.6 sass_0.4.9 zeallot_0.1.0 digest_0.6.37 prismatic_1.1.2
[89] ggrepel_0.9.6 TH.data_1.1-2 farver_2.1.2 htmltools_0.5.8.1 lifecycle_1.0.4 MASS_7.3-61
---
title: "Tutorial: Exploring Random Effects: What Do Participants and Items Tell us Beyond the Fixed Effects?"
author:
  name: "Jalal Al-Tamimi"
  affiliation: Université Paris Cité
date: "`r format(Sys.time(), '%d %B %Y')`"
output:
  html_notebook:
    highlight: pygments
    number_sections: yes
    toc: yes
    toc_depth: 6
    toc_float:
      collapsed: yes
      fig_crop: no
editor_options: 
  markdown: 
    wrap: sentence
---


# Loading packages 

We start by loading the required packages. If they are not already installed, then the code below will first install them, before loading them. The package `faux` is not available on CRAN for R version 4.4.2, hence we will install it via github. See details [here](https://debruine.github.io/faux/)

```{r warning=FALSE, message=FALSE, error=FALSE}
requiredPackages = c('tidyverse', 'knitr', 'lme4', 'ggstats', 'ggstatsplot', 'sjPlot', 'paletteer', 'lattice', 'car')

for(p in requiredPackages){
  if(!require(p,character.only = TRUE)) install.packages(p, dependencies = TRUE)
  library(p,character.only = TRUE)
}
```

Specific to the `faux` package

First install the package devtools, and then install the package `faux` via the github install. 

```{r warning=FALSE, message=FALSE, error=FALSE}
# install.packages("devtools")
# devtools::install_github("debruine/faux")
library(faux)
```


# Data set

Our experiment has the following structure: we asked 40 subjects to respond to 40 items in a fully crossed design. There were two IVs: Condition with `congruent` and `incongruent` and Age with `young` and `old`. The Condition is a within subject variable; age is a between subject. However, Condition and Age were both within item variables. The dependant variable was LookingTime. 

This meant that we used items within each of the `young` and the `old` subjects in addition to items within each of the `congurent` and `incongruent` conditions.  

Our research question is as follows: `Age` of subject will impact the `Looking Time` in the two conditions. 
Our hypothesis is: The older a subject is, the more the looking time it is to the incongruent condition. 

We will use the package `faux` to simulate a dataframe. This step is crucial in assessing contributions of particular predictors and for testing ideas as we already know the distribution of the data and what to expect as an outcome when it comes to the fixed effect in question.


The simulated data has the following parameters

```{r warning=FALSE, message=FALSE, error=FALSE}
set.seed(42)
# define parameters
Subj_n = 40  # number of subjects
Item_n = 40  # number of items
b0 = 100       # intercept
b1 = 2.5 * b0   # fixed effect of condition
u0s_sd = 300   # random intercept SD for subjects
u0i_sd = 200   # random intercept SD for items
u1s_sd = 100   # random b1 slope SD for subjects
u1i_sd = 50   # random b1 slope SD for items
r01s = -0.3     # correlation between random effects 0 and 1 for subjects
r01i = 0.2     # correlation between random effects 0 and 1 for items
sigma_sd = 150 # error SD

# set up data structure
dataCong <- add_random(Subj = Subj_n, Item = Item_n) %>%
  # add within categorical variable for subject
  add_within("Subj", Cond = c("Congruent", "Incongruent")) %>%
  add_recode("Cond", "Cond.Incongruent", Congruent = 0, Incongruent = 1) %>%
  # add between categorical variable for subject
  add_between("Subj", Age = c("Young", "Old")) %>%
  add_recode("Age", "Age.Old", Young = 0, Old = 1) %>%
  # add random effects 
  add_ranef("Subj", u0s = u0s_sd, u1s = u1s_sd, .cors = r01s) %>%
  add_ranef("Item", u0i = u0i_sd, u1i = u1i_sd, .cors = r01i) %>%
  ##add_ranef(c("Subj", "Item"), u0si = u0s_sd + u0i_sd) %>%
  ##add_ranef(c("Subj", "Item"), u1si = u1s_sd + u1i_sd) %>%
  add_ranef(sigma = sigma_sd) %>%
  # calculate DV
  mutate(LookingTime = b0 + b1 + u0s + u0i + #u0si + u1si +
           (((b1 + u1s) + 0.5) * Cond.Incongruent) + (((b1 + u1s) + 0.9) * Age.Old) + # subject specific variation
           (((b1 + u1i) - 0.3) * Cond.Incongruent) + (((b1 + u1i) - 0.25) * Age.Old) + # item specific variation  
           sigma)
write.csv(dataCong, "dataCong.csv")
```


If you were not able to install the `faux` package, simply uncomment the following line of code below to import the dataset

```{r warning=FALSE, message=FALSE, error=FALSE}
#dataCong <- read.csv("dataCong.csv")[-1]
```

# Verify dataframe

## Tibble

```{r warning=FALSE, message=FALSE, error=FALSE}
dataCong <- dataCong %>% 
  mutate(Subj = factor(Subj),
         Item = factor(Item))
dataCong
```


## Counts

### Subjects

```{r warning=FALSE, message=FALSE, error=FALSE}
dataCong %>% 
  group_by(Cond, Age, Subj) %>% 
  summarise(count = n())
```



### Items

#### Age

```{r warning=FALSE, message=FALSE, error=FALSE}
dataCong %>% 
  group_by(Age, Item) %>% 
  summarise(count = n())
```


#### Cond

```{r warning=FALSE, message=FALSE, error=FALSE}
dataCong %>% 
  group_by(Cond, Item) %>% 
  summarise(count = n())
```



# Visualisations

## Condition by Age

The figure below confirms this, where we see an increase in LookingTime in the `incongruent` condition and oevrall, `older` participants show an increase in LookingTime


```{r warning=FALSE, message=FALSE, error=FALSE}
dataCong %>% 
  ggplot(aes(x = Cond,
             y = LookingTime,
             colour = Age)) +
  theme_bw() + 
  geom_boxplot() +
  geom_smooth(aes(as.numeric(Cond)), method = "lm")
```

## Subject by Condition

This figure shows that subjects are variable in how they responded to this task


```{r echo=TRUE, error=FALSE, message=FALSE, warning=FALSE}
dataCong %>% 
  ggplot(aes(x = Cond,
             y = LookingTime,
             colour = Subj)) +
  theme_bw() + 
  geom_point() +
  geom_smooth(aes(as.numeric(Cond)), method = "lm", se = FALSE) +
  scale_colour_manual(values = paletteer_c("grDevices::rainbow", length(unique(dataCong$Subj))))
```


## Subject by Age

This figure shows that subjects had an impact on the LookingTime in both age groups, simply due to their variable responses to the different items


```{r echo=TRUE, error=FALSE, message=FALSE, warning=FALSE}
dataCong %>% 
  ggplot(aes(x = Age,
             y = LookingTime,
             colour = Subj)) +
  theme_bw() + 
  geom_point() +
  geom_smooth(aes(as.numeric(Cond)), method = "lm", se = FALSE) +
  scale_colour_manual(values = paletteer_c("grDevices::rainbow", length(unique(dataCong$Subj))))
```



## Item by Condition

This figure shows that items had an impact on the LookingTime in both conditions


```{r echo=TRUE, error=FALSE, message=FALSE, warning=FALSE}
dataCong %>% 
  ggplot(aes(x = Cond,
             y = LookingTime,
             colour = Item)) +
  theme_bw() + 
  geom_point() +
  geom_smooth(aes(as.numeric(Cond)), method = "lm", se = FALSE) +
  scale_colour_manual(values = paletteer_c("grDevices::rainbow", length(unique(dataCong$Item))))
```


## Subject by Age

This figure shows that items had an impact on the LookingTime in both age groups


```{r echo=TRUE, error=FALSE, message=FALSE, warning=FALSE}
dataCong %>% 
  ggplot(aes(x = Age,
             y = LookingTime,
             colour = Item)) +
  theme_bw() + 
  geom_point() +
  geom_smooth(aes(as.numeric(Cond)), method = "lm", se = FALSE) +
  scale_colour_manual(values = paletteer_c("grDevices::rainbow", length(unique(dataCong$Item))))
```

# Modelling strategy

We use an LMER model with a crossed random effect. To choose our optimal model, we start first by a simple model with only random intercepts, increasing complexity by accounting for random slopes for both subjects and items. It is clear from our visualisation above, that there is no interaction between the two predictors. However, for demonstration purposes, we do test for this


# Simple Linear Model

```{r warning=FALSE, message=FALSE, error=FALSE}
mdl.lm <- dataCong %>% 
  lm(LookingTime ~ Cond + Age, data = .)
summary(mdl.lm)
hist(residuals(mdl.lm))
qqnorm(residuals(mdl.lm)); qqline(residuals(mdl.lm))
plot(fitted(mdl.lm), residuals(mdl.lm), cex = 4)
```

## No interaction

### Crossed random intercepts

```{r warning=FALSE, message=FALSE, error=FALSE}
xmdl.rand.Interc <- dataCong %>% 
  lmer(LookingTime ~ Cond + Age + 
         (1 | Subj) + 
         (1 | Item), data = ., REML = FALSE,
       control = lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e5)))
```


### Crossed random intercepts + By-speaker random slopes

```{r warning=FALSE, message=FALSE, error=FALSE}
xmdl.rand.Slope1 <- dataCong %>% 
  lmer(LookingTime ~ Cond + Age + 
         (1 + Cond | Subj) + 
         (1 | Item), data = ., REML = FALSE,
       control = lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e5)))
```

### Crossed random intercepts + By-speaker and by-item random slopes

```{r warning=FALSE, message=FALSE, error=FALSE}
xmdl.rand.Slope2 <- dataCong %>% 
  lmer(LookingTime ~ Cond + Age + 
         (1 + Cond | Subj) + 
         (1 + Cond | Item), data = ., REML = FALSE,
       control = lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e5)))
```


### Crossed random intercepts + By-speaker and by-item random slopes

```{r warning=FALSE, message=FALSE, error=FALSE}
xmdl.rand.Slope3 <- dataCong %>% 
  lmer(LookingTime ~ Cond + Age + 
         (1 + Cond | Subj) + 
         (1 + Cond + Age| Item), data = ., REML = FALSE,
       control = lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e5)))
```


## With interaction

### Crossed random intercepts + Interaction

```{r warning=FALSE, message=FALSE, error=FALSE}
xmdl.rand.Interc.Int <- dataCong %>% 
  lmer(LookingTime ~ Cond * Age + 
         (1 | Subj) + 
         (1 | Item), data = ., REML = FALSE,
       control = lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e5)))
```


### Crossed random intercepts + By-speaker random slopes + Interaction

```{r warning=FALSE, message=FALSE, error=FALSE}
xmdl.rand.Slope1.Int <- dataCong %>% 
  lmer(LookingTime ~ Cond * Age + 
         (1 + Cond | Subj) + 
         (1 | Item), data = ., REML = FALSE,
       control = lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e5)))
```

### Crossed random intercepts + By-speaker and by-item random slopes + Interaction

```{r warning=FALSE, message=FALSE, error=FALSE}
xmdl.rand.Slope2.Int <- dataCong %>% 
  lmer(LookingTime ~ Cond * Age + 
         (1 + Cond | Subj) + 
         (1 + Cond | Item), data = ., REML = FALSE,
       control = lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e5)))
```


### Crossed random intercepts + By-speaker and by-item random slopes

```{r warning=FALSE, message=FALSE, error=FALSE}
xmdl.rand.Slope3.Int <- dataCong %>% 
  lmer(LookingTime ~ Cond * Age + 
         (1 + Cond | Subj) + 
         (1 + Cond * Age| Item), data = ., REML = FALSE,
       control = lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e5)))
```


## Model comparison


```{r warning=FALSE, message=FALSE, error=FALSE}
anova(xmdl.rand.Interc, xmdl.rand.Slope1, xmdl.rand.Slope2, xmdl.rand.Slope3, xmdl.rand.Interc.Int, xmdl.rand.Slope1.Int, xmdl.rand.Slope2.Int, xmdl.rand.Slope3.Int)
```


The results above highlight that the model accounting for both by-subject and by-item random intercepts and random slopes for Condition are improving the model fit in comparison to a more complex model. We rerun the model with `REML = False`

# Optimal model

```{r warning=FALSE, message=FALSE, error=FALSE}
xmdl.Optimal <- dataCong %>% 
  lmer(LookingTime ~ Cond + Age + 
         (1 + Cond | Subj) + 
         (1 + Cond + Age | Item), data = ., REML = TRUE,
       control = lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e5)))
```


## Model criticism

```{r warning=FALSE, message=FALSE, error=FALSE}
hist(residuals(xmdl.Optimal))
qqnorm(residuals(xmdl.Optimal)); qqline(residuals(xmdl.Optimal))
plot(fitted(xmdl.Optimal), residuals(xmdl.Optimal), cex = 4)
```

## Summary

```{r warning=FALSE, message=FALSE, error=FALSE}
xmdl.Optimal %>% 
  summary()
```

## ANOVA
```{r warning=FALSE, message=FALSE, error=FALSE}
Anova(xmdl.Optimal)
```


## Plotting model's output

### With `ggstats`

We use two functions from the package `ggstats`. 

#### A plot


```{r warning=FALSE, message=FALSE, error=FALSE}
ggcoef_model(xmdl.Optimal)
```



### A plot + a table + 95% CI

```{r warning=FALSE, message=FALSE, error=FALSE}
ggcoef_table(xmdl.Optimal)
```



### With `ggstatsplot`

```{r}
ggcoefstats(xmdl.Optimal, point.args = list(color = paletteer_c("grDevices::rainbow", 13), stats.label.color = paletteer_c("grDevices::rainbow", 13)))

```


## Exploring random effects


### Subject random effects


```{r warning=FALSE, message=FALSE, error=FALSE}
random_effects <- ranef(xmdl.Optimal) %>%
  pluck(1) %>%
  rownames_to_column() %>%
  rename(Subject = rowname, Intercept = "(Intercept)") 
 
random_effects %>%
  knitr::kable()

```




### Items random effects


```{r warning=FALSE, message=FALSE, error=FALSE}
random_effects <- ranef(xmdl.Optimal) %>%
  pluck(2) %>%
  rownames_to_column() %>%
  rename(Items = rowname, Intercept = "(Intercept)") 
 
random_effects %>%
  knitr::kable()

```


### Plots

#### sjPlot

##### Fixed effects

###### Condition

```{r}
plot_model(xmdl.Optimal, type="pred", terms=c("Cond"), ci.lvl = NA, dodge = 0) + theme_bw() + geom_line()

```

###### Age

```{r}
plot_model(xmdl.Optimal, type="pred", terms=c("Age"), ci.lvl = NA, dodge = 0) + theme_bw() + geom_line()

```


###### Condition by Age

```{r}
plot_model(xmdl.Optimal, type="emm", terms=c("Cond", "Age"), ci.lvl = NA, dodge = 0) + theme_bw() + geom_line()

```

##### Random effects

###### Subject

```{r}
plot_model(xmdl.Optimal, type="pred", terms=c("Subj"), pred.type="re", ci.lvl = NA, dodge = 0) + theme_bw() + geom_line()

```

###### Subject by Condition

```{r}
plot_model(xmdl.Optimal, type="pred", terms=c("Cond", "Subj"), pred.type="re", ci.lvl = NA, dodge = 0, colors = paletteer_c("grDevices::rainbow", length(unique(dataCong$Subj)))) + theme_bw() + geom_line() 

```

###### Subject by Age

```{r}
plot_model(xmdl.Optimal, type="pred", terms=c("Age", "Subj"), pred.type="re", ci.lvl = NA, dodge = 0, colors = paletteer_c("grDevices::rainbow", length(unique(dataCong$Subj)))) + theme_bw() + geom_line()

```



###### Item

```{r}
plot_model(xmdl.Optimal, type="pred", terms=c("Item"), pred.type="re", ci.lvl = NA, dodge = 0) + theme_bw() + geom_line()

```





###### Item by Cond

```{r}
plot_model(xmdl.Optimal, type="pred", terms=c("Cond", "Item"), pred.type="re", ci.lvl = NA, dodge = 0, colors = paletteer_c("grDevices::rainbow", length(unique(dataCong$Item)))) + theme_bw() + geom_line()

```




###### Item by Age

```{r}
plot_model(xmdl.Optimal, type="pred", terms=c("Age", "Item"), pred.type="re", ci.lvl = NA, dodge = 0, colors = paletteer_c("grDevices::rainbow", length(unique(dataCong$Item)))) + theme_bw() + geom_line()

```



###### Item by Cond facetted by Age

```{r}
plot_model(xmdl.Optimal, type="pred", terms=c("Cond", "Item", "Age"), pred.type="re", ci.lvl = NA, dodge = 0, colors = paletteer_c("grDevices::rainbow", length(unique(dataCong$Item)))) + theme_bw() + geom_line()

```


###### Subject by Item

```{r}
plot_model(xmdl.Optimal, type="pred", terms=c("Subj", "Item"), pred.type="re", ci.lvl = NA, dodge = 0, colors = paletteer_c("grDevices::rainbow", length(unique(dataCong$Subj)))) + theme_bw() + geom_line()

```

###### Subject by Item facetted by Cond

```{r}
plot_model(xmdl.Optimal, type="pred", terms=c("Subj", "Item", "Cond"), pred.type="re", ci.lvl = NA, dodge = 0, colors = paletteer_c("grDevices::rainbow", length(unique(dataCong$Subj)))) + theme_bw() + geom_line()

```


###### Subject by Item facetted by Age

```{r}
plot_model(xmdl.Optimal, type="pred", terms=c("Subj", "Item", "Age"), pred.type="re", ci.lvl = NA, dodge = 0, colors = paletteer_c("grDevices::rainbow", length(unique(dataCong$Subj)))) + theme_bw() + geom_line()

```

#### Lattice

##### Subject Intercepts

```{r}
dotplot(ranef(xmdl.Optimal))$Subj[1]
```


##### Subject Slopes

```{r}
dotplot(ranef(xmdl.Optimal), xlim = c(-350, 350))$Subj[2]
```




##### Item Intercepts

```{r}
dotplot(ranef(xmdl.Optimal))$Item[1]
```


##### Item Slopes for Cond

```{r}
dotplot(ranef(xmdl.Optimal), xlim = c(-150, 150))$Item[2]
```


##### Item Slopes for Age

```{r}
dotplot(ranef(xmdl.Optimal), xlim = c(-150, 150))$Item[3]
```




# Conclusion

This tutorial showed how one can explore random effects and formally assess the need for Random slopes

As a rule of thumb: Any within-subject (or within-item) should be tested for a potential inclusion as a random slope

Fixed effects provides averages over all observations, even when using mixed effects regressions; we need to explore what random effects (intercepts and slopes) tell us.

In this example, we see that many subjects vary beyond the fixed effect; Standard Errors are not enough to quantify this type of variation. The same is true for items that are not explored routinely!


I hope this tutorial helped you to uncover the role of participants and items and what they can tell us beyond the fixed effect!

# session info

```{r warning=FALSE, message=FALSE, error=FALSE}
sessionInfo()
```

