This analysis accompanies the article “Al-Tamimi, J. and Khattab, G., (2018). Acoustic correlates of the voicing contrast in Lebanese Arabic singleton and geminate plosives. Invited manuscript for the special issue of Journal of Phonetics,”Marking 50 Years of Research on Voice Onset Time and the Voicing Contrast in the World’s Languages" (eds., T. Cho, G. Docherty & D. Whalen). DOI: https://doi.org/10.1016/j.wocn.2018.09.010
This notebook presents the analyses and results of the Linear Mixed Effects Modelling that was used as a confirmatory analysis. The aim was to evaluate the relationship between each of the 19 acoustic correlates (outcome) and the six predictors.
The data used here are part of a larger project to investigate the acoustic correlates of gemination in Lebanese Arabic. 20 speakers (10 males) produced a list of items with all possible consonants in Lebanese Arabic. The data used in this notebook (and the paper) are concerned with the following stops: /b t tˤ d dˤ k/ as realised in one of the following syllable structures: ˈCVCVC (e.g. /ˈʕadad/ “number”), ˈCV:CVC (e.g. /ˈʕaːded/ “counting”), ˈCVC:VC (e.g. /ˈʕadːad)/ “he enumerated”), ˈCV:C:VC (e.g. /ˈʕaːdːe(h)/ “having counted”) and CVˈC:V:C (e.g. /ʕaˈdːeːd/ “counter”). The vowels surrounding the medial stop were either /a or aː/. 98 items were recorded from each participant. A total of 1960 words were elicited (98 words by 20 speakers), and 171 tokens were discarded due to noise or technical error, leaving a total of 1793 words for subsequent analyses.
19 acoustic correlates were looked at here to evaluate their impact on the four-way contrast between voicing and gemination, i.e., Voiced Singleton, Voiceless Singleton, Voiced Geminate and Voiceless Geminate. The 19 acoustic correlates used included 6 durational, three voicing, and 10 non-temporal including f0, F1 and harmonic differences. This Notebook looked at how each individual acoustic correlate was associated with the predictors. The next notebook extends the analysis by looking at the combination of these acoustic correlates and their predictive power in discriminating between the four categories above see notebook.
We start by loading the required packages.
requiredPackages = c('dplyr','lme4','lmerTest','emmeans','ggplot2')
for(p in requiredPackages){
if(!require(p,character.only = TRUE)) install.packages(p)
library(p,character.only = TRUE)
}
We start reading in the data and checking the structure.
geminationRes <- read.csv("resultsGemination.csv")
str(geminationRes)
'data.frame': 7172 obs. of 28 variables:
$ file_name : Factor w/ 68 levels "01_F_W_main_1.wav",..: 1 1 1 1 4 4 4 4 7 7 ...
$ speaker : Factor w/ 20 levels "sp1","sp10","sp11",..: 1 1 1 1 3 3 3 3 12 12 ...
$ sex : Factor w/ 2 levels "F","M": 1 1 1 1 2 2 2 2 1 1 ...
$ Number : int 364 365 366 367 2379 2380 2381 2382 4373 4374 ...
$ wordTarget : Factor w/ 98 levels "2aabbeh","2aabeD",..: 1 1 1 1 1 1 1 1 1 1 ...
$ word : Factor w/ 331 levels "2AAbaD","2aabbeh",..: 2 2 2 2 54 54 54 54 54 54 ...
$ singGem : Factor w/ 2 levels "geminate","singleton": 1 1 1 1 1 1 1 1 1 1 ...
$ syllType : Factor w/ 2 levels "iambic","trochaic": 2 2 2 2 2 2 2 2 2 2 ...
$ vowelLength : Factor w/ 2 levels "longV1","shortV1": 1 1 1 1 1 1 1 1 1 1 ...
$ syllStruct : Factor w/ 5 levels "V1C2V2","V1CC2V2",..: 5 5 5 5 5 5 5 5 5 5 ...
$ c.v : Factor w/ 7 levels "BVOT","C2","CC2",..: 6 3 1 5 6 3 1 5 6 3 ...
$ phoneme : Factor w/ 17 levels "a","aa","b","bb",..: 2 4 4 10 11 4 4 10 11 4 ...
$ modeArticulation4 : Factor w/ 1 level "stop": 1 1 1 1 1 1 1 1 1 1 ...
$ placeArticulation : Factor w/ 4 levels "alveolar","bilabial",..: 2 2 2 2 2 2 2 2 2 2 ...
$ voiced.unvoiced : Factor w/ 2 levels "voiced","voiceless": 1 1 1 1 1 1 1 1 1 1 ...
$ placeVoicing : Factor w/ 6 levels "alveolar voiced",..: 3 3 3 3 3 3 3 3 3 3 ...
$ duration : num 139.9 165.5 18.5 128.9 158.1 ...
$ durationBurst : num NA NA 4 NA NA NA 5 NA NA NA ...
$ durationAsp : num NA NA 15 NA NA NA 6 NA NA NA ...
$ durationVoicedPerc: num NA 46.4 0 NA NA ...
$ f0Onset : num NA 216 NA 247 NA ...
$ f0Offset : num 218 NA 248 NA 120 ...
$ intensityOnset : num 68.3 60.7 45.4 66.5 73 ...
$ intensityOffset : num 61.8 42.5 63.8 67.7 74.6 ...
$ f1Onset : num NA NA NA 427 NA ...
$ f1Offset : num 619 NA NA NA 462 ...
$ h1mnh2OnsetNorm : num NA NA NA 1.9 NA ...
$ h1mnh2OffsetNorm : num 3.79 NA NA NA 2.69 ...
Various predictors are available in data-frame, and we are interested in the impact of the following predictors: 1. Voicing: Voiced vs Voiceless 2. Gemination: Singleton vs Geminate 3. Sex: Male vs Female 4. (Preceding) Vowel Length: Short vs Long 5. Syllable Type: Iambic vs Trochaic 6. Place of Articulation: Bilabial, Alveolar, (Alveolar-)Pharyngealised and Velar
# using "dplyr"
levels(geminationRes$voiced.unvoiced)
[1] "voiced" "voiceless"
geminationRes$voiced.unvoiced <- factor(geminationRes$voiced.unvoiced,
levels = c("voiceless","voiced"))
levels(geminationRes$voiced.unvoiced)
[1] "voiceless" "voiced"
levels(geminationRes$singGem)
[1] "geminate" "singleton"
geminationRes$singGem <- factor(geminationRes$singGem,
levels = c("singleton","geminate"))
levels(geminationRes$singGem)
[1] "singleton" "geminate"
levels(geminationRes$sex)
[1] "F" "M"
# no change
levels(geminationRes$vowelLength)
[1] "longV1" "shortV1"
geminationRes$vowelLength <- factor(geminationRes$vowelLength,
levels = c("shortV1","longV1"))
levels(geminationRes$vowelLength)
[1] "shortV1" "longV1"
levels(geminationRes$syllType)
[1] "iambic" "trochaic"
geminationRes$syllType <- factor(geminationRes$syllType,
levels = c("trochaic","iambic"))
levels(geminationRes$syllType)
[1] "trochaic" "iambic"
levels(geminationRes$placeArticulation)
[1] "alveolar" "bilabial" "pharyngealised" "velar"
geminationRes$placeArticulation <- factor(geminationRes$placeArticulation,
levels = c("bilabial","alveolar","pharyngealised","velar"))
levels(geminationRes$placeArticulation)
[1] "bilabial" "alveolar" "pharyngealised" "velar"
In our study, we wanted to use a model that allows for a meaningful interpretation of the coefficients and of the intercept. For this, we used contrast coding on all fixed effects by centring them to values between -0.5 and 0.5. With contrast coding, the Intercept is equal to the weighted average across all predictors, and all the coefficients for predictors will be equal to main effects rather than simple effects.
With this centring, our models will show the overall change from the level assigned the negative value to the level with the positive value. E.g., with voicing, voiceless was assigned -0.5 and voiced 0.5. In our statistical model the overall effect voicing will be shown as a change from voiceless to voiced (see article for more details).
# using "dplyr"
# for change from Voiceless to Voiced
geminationRes <- mutate(geminationRes,
voiced.unvoiced_c = ifelse(voiced.unvoiced == 'voiced', 0.5, -0.5))
# for change from Singleton to Geminate
geminationRes <- mutate(geminationRes,
singGem_c = ifelse(singGem == 'geminate', 0.5, -0.5))
# for change from Male to Female
geminationRes <- mutate(geminationRes,
sex_c = ifelse(sex == 'F', 0.5, -0.5))
# for change from Short V1 to Long V1
geminationRes <- mutate(geminationRes,
vowelLength_c = ifelse(vowelLength == 'longV1', 0.5, -0.5))
# for change from Trochaic to Iambic
geminationRes <- mutate(geminationRes,
syllType_c = ifelse(syllType == 'iambic', 0.5, -0.5))
# for change from Bilabial, to Alveolar to (Alveolar-)Pharyngealised to Velar
geminationRes <- mutate(geminationRes, place_c = ifelse(placeArticulation == 'velar', 0.5,
ifelse(placeVoicing =='pharyngealised', 0.133,
ifelse(placeVoicing == 'alveolar', -0.133,-0.5))))
Here we create four new datasets that are subsets of the original dataset. The aim is to have specific data-frames for each of the preceding vowel (V1), medial consonant (CD), Release phase (BVOT) and following vowel (V2). V1 can be either short or long (V1, VV1); CD can be short or long (C2, CC2).
l <- c("a","aa", "e", "ee")
l2 <- c("V2", "VV2")
l3 <- c("V1", "VV1")
l4 <- c("C2","CC2")
l5 <- "BVOT"
geminationVowels <- filter(geminationRes, phoneme %in% l)
geminationV2 <- filter(geminationVowels, c.v %in% l2)
geminationV1 <- filter(geminationVowels, c.v %in% l3)
geminationCons <- filter(geminationRes, c.v %in% l4)
geminationVOT <- filter(geminationRes, c.v %in% l5)
In our original submission, we used the proportion of voicing in both the closure and release phases based the computations we developed see here. This proportion of voicing aims at explaining the patterns of passive devoicing and/or active voicing in the voicing by gemination contrasts. We also wanted to look at the voicing patterns in the voicing by gemination contrasts following the traditional “VOT” notion. Hence we computed the “VOT” by following the revised procedure as described in Abramson, A.S. & Whalen, D. (2017) (https://doi.org/10.1016/j.wocn.2017.05.002). Below is the implementation of this method based on our computations of the proportion of voicing in both closure and release phases. We use the threshold of 50% voicing; if “voicing” is equal or is higher than 50%, the total duration of the closure is considered negative VT, otherwise, the VOT is positive and is equivalent to the total duration of the Release phase (i.e., burst, and aspiration).
# start by creating a new predictor
geminationCons$VOTDec <- NA
x <- -geminationCons$duration
y <- geminationVOT$duration
# below, we use the same threshold
geminationCons$VOTDec <- ifelse(geminationCons$durationVoicedPerc >= 50,
x,y)
Now that the preprocessing of the data finished, we start with our modelling. As a remonder, our aim is to use these analyses in a confirmatory manner. We aim to evaluate the effects of the six predictors described above on each of the acoustic correlates. Crossed random effects were used for speaker and items.
For all models below, we used a combination of models: fixed effects only; random intercepts; random intercepts and slopes; interactions for the fixed effects; interactions for the slopes; etc. The optimal model in each case was the one presented below and had the following structure:
After each model was run, we compared models with an intercept only model (i.e., without the fixed effects). This is used to provide the statistical significance of our model. We are also using the package “lmerTest” to display the p values of each predictor. This uses an approximation of the degrees of freedom, and allows us to obtain estimated p values. This may be problematic given that these are approximations, and thus it is always best to evaluate the significance levels of our predictors using model comparisons.
We evaluate model’s residuals, and then we present the optimal models’ summaries. We then use the “predict” function from “lme4” to obtain the fitted values that take into account both fixed and random effects structure. The fitted (or predicted) values are adjusted by our statistical model and are then used in the figures.
Within duration, we computed the durations of the vowel preceding the consonant (V1), the medial consonant (CD), Release Phase (Rel), Burst Phase (B), Aspiration Phase (Asp) and the vowel following the consonant (V2)
We run three models. For model comparisons, we are required to use Maximum Likelihood (ML) on both Optimal and Null models (by adding REML=F). Then we run a second Optimal models with Restricted Maximum Likelihood (REML), to obtain the coefficients (by adding REML=T)
durationV1.xmdl.Full.M <- lmer(duration~voiced.unvoiced_c+singGem_c+sex_c+vowelLength_c+syllType_c+place_c+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationV1,na.action=na.exclude,
REML=F,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
durationV1.xmdl.Full.R <- lmer(duration~voiced.unvoiced_c+singGem_c+sex_c+vowelLength_c+syllType_c+place_c+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationV1,na.action=na.exclude,
REML=T,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
durationV1.xmdl.Null.M <- lmer(duration~1+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationV1,na.action=na.exclude,
REML=F,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
anova(durationV1.xmdl.Null.M,durationV1.xmdl.Full.M)
Data: geminationV1
Models:
durationV1.xmdl.Null.M: duration ~ 1 + (voiced.unvoiced_c + singGem_c + vowelLength_c +
durationV1.xmdl.Null.M: syllType_c + place_c || speaker) + (1 | wordTarget)
durationV1.xmdl.Full.M: duration ~ voiced.unvoiced_c + singGem_c + sex_c + vowelLength_c +
durationV1.xmdl.Full.M: syllType_c + place_c + (voiced.unvoiced_c + singGem_c + vowelLength_c +
durationV1.xmdl.Full.M: syllType_c + place_c || speaker) + (1 | wordTarget)
Df AIC BIC logLik deviance Chisq Chi Df
durationV1.xmdl.Null.M 9 15286 15335 -7633.9 15268
durationV1.xmdl.Full.M 15 15162 15244 -7566.0 15132 135.82 6
Pr(>Chisq)
durationV1.xmdl.Null.M
durationV1.xmdl.Full.M < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
As can be seen, our optimal model improved the model fit.
We inspect our model below, by looking at the residuals. There are minor deviations observed, due to the nature of our data.
hist(residuals(durationV1.xmdl.Full.R))
qqnorm(residuals(durationV1.xmdl.Full.R)); qqline(residuals(durationV1.xmdl.Full.R))
plot(fitted(durationV1.xmdl.Full.R),residuals(durationV1.xmdl.Full.R), cex = 4)
summary(durationV1.xmdl.Full.R)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula:
duration ~ voiced.unvoiced_c + singGem_c + sex_c + vowelLength_c +
syllType_c + place_c + (voiced.unvoiced_c + singGem_c + vowelLength_c +
syllType_c + place_c || speaker) + (1 | wordTarget)
Data: geminationV1
Control:
lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e+05))
REML criterion at convergence: 15102
Scaled residuals:
Min 1Q Median 3Q Max
-4.2034 -0.5417 -0.0270 0.5011 6.1062
Random effects:
Groups Name Variance Std.Dev.
wordTarget (Intercept) 99.9410 9.9970
speaker place_c 0.4836 0.6954
speaker.1 syllType_c 23.0797 4.8041
speaker.2 vowelLength_c 209.1023 14.4604
speaker.3 singGem_c 14.8229 3.8500
speaker.4 voiced.unvoiced_c 3.9109 1.9776
speaker.5 (Intercept) 229.1941 15.1392
Residual 216.9083 14.7278
Number of obs: 1793, groups: wordTarget, 98; speaker, 20
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 102.818 3.893 29.724 26.412 < 2e-16 ***
voiced.unvoiced_c 10.567 2.643 92.002 3.998 0.000129 ***
singGem_c -10.362 2.546 93.293 -4.070 9.83e-05 ***
sex_c 4.038 6.832 17.968 0.591 0.561863
vowelLength_c 81.472 4.057 39.506 20.082 < 2e-16 ***
syllType_c -14.920 3.305 94.147 -4.515 1.83e-05 ***
place_c -2.028 3.427 88.501 -0.592 0.555494
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) vcd.n_ sngGm_ sex_c vwlLn_ syllT_
vocd.nvcd_c 0.091
singGem_c -0.123 -0.034
sex_c -0.001 0.001 0.001
vowlLngth_c 0.098 -0.089 0.045 0.000
syllType_c 0.282 0.036 -0.343 -0.001 0.162
place_c 0.265 0.527 -0.012 0.000 -0.035 0.029
The figure presented below, uses the predicted (or fitted) values; these are the values adjusted by the statistical model. These take into account the random effects’ structure as well as the fixed effects’ structure.
The following code is used to record the predicted (or fitted) values into the same data-frame. This will then be used to plot the results of the model
geminationV1$fitted_durationV1 <- predict(lmer(duration~voiced.unvoiced_c+singGem_c+sex_c+vowelLength_c+syllType_c+place_c+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationV1,na.action=na.exclude,
REML=T,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
)
We use “ggplot2” to plot the results. This will use the “predicted” (or fitted) values adjusted by our statistical model. The dashed lines within the boxplots are the mean values; the median is marked by the horizontal straight lines.
labels <- c(singleton = "Singleton", geminate = "Geminate")
ggdurV1ByVoice <- ggplot(geminationV1, aes(y=fitted_durationV1, x=voiced.unvoiced)) +
geom_boxplot(outlier.colour = "NA",na.rm=TRUE) +
stat_summary(fun.y = mean, geom = "errorbar", aes(ymax = ..y.., ymin = ..y..),
width = .75, linetype = "dashed") +
ylab(expression(paste("Duration (ms)"))) + labs(title = "V1") +
coord_cartesian(ylim = c(25,325)) +
theme_set(theme_bw()) + theme(text=element_text(size=20)) +
theme(axis.title.x = element_blank()) + theme(legend.title=element_blank()) +
scale_x_discrete(breaks=levels(geminationV1$voiced.unvoiced), labels=c("Voiceless","Voiced")) +
facet_wrap(~singGem,ncol=1,labeller=labeller(singGem = labels))
ggdurV1ByVoice
After running our statistical model above, we wanted to evaluate whether the differences observed between “voicing” and “gemination” is statistically significant (or not). We present the means and standard deviations for each of the four-way contrast based on the predicted (or fitted) values.
The predicted means and SDs of the preceding vowel duration are provided below. As can be seen, there are already differences observed related to both “voicing” and “gemination”.
# using "dplyr"
geminationV1 %>% group_by(singGem:voiced.unvoiced) %>% summarise(mean(fitted_durationV1), sd(fitted_durationV1))
We used the pairwise t-test as available in base R. The input was the predicted (or fitted) values as adjusted by our statistical model. It was not possible to directly use our statistical model’s output to compare models, given that our LMM model was looking at overall change as a main effect rather than as a simple effect’s model. This is why we have used the predicted (or fitted) values as adjusted by our statistical model; the results are the same if we were to use the “actual” results rather than the predicted ones. We used FDR (for False Discovery Rate) correction for multiple comparisons as this allowed for correcting for bias in the differences in doing multiple comparisons.
pairwise.t.test(geminationV1$fitted_durationV1, geminationV1$singGem:geminationV1$voiced.unvoiced, p.adj = "fdr")
Pairwise comparisons using t tests with pooled SD
data: geminationV1$fitted_durationV1 and geminationV1$singGem:geminationV1$voiced.unvoiced
singleton:voiceless singleton:voiced
singleton:voiced 5.8e-16 -
geminate:voiceless < 2e-16 < 2e-16
geminate:voiced 0.005 < 2e-16
geminate:voiceless
singleton:voiced -
geminate:voiceless -
geminate:voiced 4.5e-16
P value adjustment method: fdr
We run three models. For model comparisons, we are required to use Maximum Likelihood (ML) on both Optimal and Null models (by adding REML=F). Then we run a second Optimal models with Restricted Maximum Likelihood (REML), to obtain the coefficients (by adding REML=T)
durationCD.xmdl.Full.M <- lmer(duration~voiced.unvoiced_c+singGem_c+sex_c+vowelLength_c+syllType_c+place_c+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationCons,na.action=na.exclude,
REML=F,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
durationCD.xmdl.Full.R <- lmer(duration~voiced.unvoiced_c+singGem_c+sex_c+vowelLength_c+syllType_c+place_c+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationCons,na.action=na.exclude,
REML=T,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
durationCD.xmdl.Null.M <- lmer(duration~1+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationCons,na.action=na.exclude,
REML=F,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
anova(durationCD.xmdl.Null.M,durationCD.xmdl.Full.M)
Data: geminationCons
Models:
durationCD.xmdl.Null.M: duration ~ 1 + (voiced.unvoiced_c + singGem_c + vowelLength_c +
durationCD.xmdl.Null.M: syllType_c + place_c || speaker) + (1 | wordTarget)
durationCD.xmdl.Full.M: duration ~ voiced.unvoiced_c + singGem_c + sex_c + vowelLength_c +
durationCD.xmdl.Full.M: syllType_c + place_c + (voiced.unvoiced_c + singGem_c + vowelLength_c +
durationCD.xmdl.Full.M: syllType_c + place_c || speaker) + (1 | wordTarget)
Df AIC BIC logLik deviance Chisq Chi Df
durationCD.xmdl.Null.M 9 15398 15447 -7689.7 15380
durationCD.xmdl.Full.M 15 15272 15355 -7621.1 15242 137.23 6
Pr(>Chisq)
durationCD.xmdl.Null.M
durationCD.xmdl.Full.M < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
As can be seen, our optimal model improved the model fit.
We inspect our model below, by looking at the residuals. There are minor deviations observed, due to the nature of our data.
hist(residuals(durationCD.xmdl.Full.R))
qqnorm(residuals(durationCD.xmdl.Full.R)); qqline(residuals(durationCD.xmdl.Full.R))
plot(fitted(durationCD.xmdl.Full.R),residuals(durationCD.xmdl.Full.R), cex = 4)
summary(durationCD.xmdl.Full.R)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula:
duration ~ voiced.unvoiced_c + singGem_c + sex_c + vowelLength_c +
syllType_c + place_c + (voiced.unvoiced_c + singGem_c + vowelLength_c +
syllType_c + place_c || speaker) + (1 | wordTarget)
Data: geminationCons
Control:
lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e+05))
REML criterion at convergence: 15214.2
Scaled residuals:
Min 1Q Median 3Q Max
-4.4156 -0.5681 -0.0335 0.5333 6.7738
Random effects:
Groups Name Variance Std.Dev.
wordTarget (Intercept) 44.77 6.691
speaker place_c 20.75 4.555
speaker.1 syllType_c 75.48 8.688
speaker.2 vowelLength_c 19.18 4.379
speaker.3 singGem_c 294.82 17.170
speaker.4 voiced.unvoiced_c 31.33 5.597
speaker.5 (Intercept) 154.61 12.434
Residual 234.14 15.302
Number of obs: 1793, groups: wordTarget, 98; speaker, 20
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 121.250 3.104 25.366 39.060 < 2e-16 ***
voiced.unvoiced_c -22.568 2.251 65.416 -10.023 7.57e-15 ***
singGem_c 101.658 4.208 24.950 24.157 < 2e-16 ***
sex_c 9.818 5.669 17.818 1.732 0.100556
vowelLength_c 1.662 2.014 64.632 0.825 0.412313
syllType_c -7.409 2.968 50.626 -2.496 0.015862 *
place_c -10.939 2.663 72.044 -4.107 0.000104 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) vcd.n_ sngGm_ sex_c vwlLn_ syllT_
vocd.nvcd_c 0.070
singGem_c -0.048 -0.013
sex_c -0.002 0.000 0.001
vowlLngth_c 0.127 -0.108 0.029 -0.001
syllType_c 0.202 0.025 -0.120 -0.002 0.187
place_c 0.222 0.410 -0.005 -0.002 -0.047 0.022
The figure presented below, uses the predicted (or fitted) values; these are the values adjusted by the statistical model. These take into account the random effects’ structure as well as the fixed effects’ structure.
The following code is used to record the predicted (or fitted) values into the same data-frame. This will then be used to plot the results of the model
geminationCons$fitted_durationCD <- predict(lmer(duration~voiced.unvoiced_c+singGem_c+sex_c+vowelLength_c+syllType_c+place_c+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationCons,na.action=na.exclude,
REML=T,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
)
We use “ggplot2” to plot the results. This will use the “predicted” (or fitted) values adjusted by our statistical model. The dashed lines within the boxplots are the mean values; the median is marked by the horizontal straight lines.
labels <- c(singleton = "Singleton", geminate = "Geminate")
ggdurCDByVoice <- ggplot(geminationCons, aes(y=fitted_durationCD, x=voiced.unvoiced)) +
geom_boxplot(outlier.colour = "NA",na.rm=TRUE) +
stat_summary(fun.y = mean, geom = "errorbar", aes(ymax = ..y.., ymin = ..y..),
width = .75, linetype = "dashed") +
ylab(expression(paste("Duration (ms)"))) + labs(title = "CD") +
coord_cartesian(ylim = c(25,325)) +
theme_set(theme_bw()) + theme(text=element_text(size=20)) +
theme(axis.title.x = element_blank()) + theme(legend.title=element_blank()) +
scale_x_discrete(breaks=levels(geminationCons$voiced.unvoiced), labels=c("Voiceless","Voiced")) +
facet_wrap(~singGem,ncol=1,labeller=labeller(singGem = labels))
ggdurCDByVoice
After running our statistical model above, we wanted to evaluate whether the differences observed between “voicing” and “gemination” is statistically significant (or not). We present the means and standard deviations for each of the four-way contrast based on the predicted (or fitted) values.
The predicted means and SDs of the closure duration are provided below. As can be seen, there are already differences observed related to both “voicing” and “gemination”.
# using "dplyr"
geminationCons %>% group_by(singGem:voiced.unvoiced) %>% summarise(mean(fitted_durationCD), sd(fitted_durationCD))
We used the pairwise t-test as available in base R. The input was the predicted (or fitted) values as adjusted by our statistical model. It was not possible to directly use our statistical model’s output to compare models, given that our LMM model was looking at overall change as a main effect rather than as a simple effect’s model. This is why we have used the predicted (or fitted) values as adjusted by our statistical model; the results are the same if we were to use the “actual” results rather than the predicted ones. We used FDR (for False Discovery Rate) correction for multiple comparisons as this allowed for correcting for bias in the differences in doing multiple comparisons.
pairwise.t.test(geminationCons$fitted_durationCD, geminationCons$singGem:geminationCons$voiced.unvoiced, p.adj = "fdr")
Pairwise comparisons using t tests with pooled SD
data: geminationCons$fitted_durationCD and geminationCons$singGem:geminationCons$voiced.unvoiced
singleton:voiceless singleton:voiced
singleton:voiced <2e-16 -
geminate:voiceless <2e-16 <2e-16
geminate:voiced <2e-16 <2e-16
geminate:voiceless
singleton:voiced -
geminate:voiceless -
geminate:voiced <2e-16
P value adjustment method: fdr
We run three models. For model comparisons, we are required to use Maximum Likelihood (ML) on both Optimal and Null models (by adding REML=F). Then we run a second Optimal models with Restricted Maximum Likelihood (REML), to obtain the coefficients (by adding REML=T)
durationRel.xmdl.Full.M <- lmer(duration~voiced.unvoiced_c+singGem_c+sex_c+vowelLength_c+syllType_c+place_c+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationVOT,na.action=na.exclude,
REML=F,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
durationRel.xmdl.Full.R <- lmer(duration~voiced.unvoiced_c+singGem_c+sex_c+vowelLength_c+syllType_c+place_c+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationVOT,na.action=na.exclude,
REML=T,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
durationRel.xmdl.Null.M <- lmer(duration~1+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationVOT,na.action=na.exclude,
REML=F,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
anova(durationRel.xmdl.Null.M,durationRel.xmdl.Full.M)
Data: geminationVOT
Models:
durationRel.xmdl.Null.M: duration ~ 1 + (voiced.unvoiced_c + singGem_c + vowelLength_c +
durationRel.xmdl.Null.M: syllType_c + place_c || speaker) + (1 | wordTarget)
durationRel.xmdl.Full.M: duration ~ voiced.unvoiced_c + singGem_c + sex_c + vowelLength_c +
durationRel.xmdl.Full.M: syllType_c + place_c + (voiced.unvoiced_c + singGem_c + vowelLength_c +
durationRel.xmdl.Full.M: syllType_c + place_c || speaker) + (1 | wordTarget)
Df AIC BIC logLik deviance Chisq Chi Df
durationRel.xmdl.Null.M 9 12369 12418 -6175.3 12351
durationRel.xmdl.Full.M 15 12290 12373 -6130.2 12260 90.209 6
Pr(>Chisq)
durationRel.xmdl.Null.M
durationRel.xmdl.Full.M < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
As can be seen, our optimal model improved the model fit.
We inspect our model below, by looking at the residuals. There are minor deviations observed, due to the nature of our data.
hist(residuals(durationRel.xmdl.Full.R))
qqnorm(residuals(durationRel.xmdl.Full.R)); qqline(residuals(durationRel.xmdl.Full.R))
plot(fitted(durationRel.xmdl.Full.R),residuals(durationRel.xmdl.Full.R), cex = 4)
summary(durationRel.xmdl.Full.R)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula:
duration ~ voiced.unvoiced_c + singGem_c + sex_c + vowelLength_c +
syllType_c + place_c + (voiced.unvoiced_c + singGem_c + vowelLength_c +
syllType_c + place_c || speaker) + (1 | wordTarget)
Data: geminationVOT
Control:
lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e+05))
REML criterion at convergence: 12244.2
Scaled residuals:
Min 1Q Median 3Q Max
-3.5399 -0.5191 -0.0873 0.3942 8.2724
Random effects:
Groups Name Variance Std.Dev.
wordTarget (Intercept) 12.958 3.600
speaker place_c 23.041 4.800
speaker.1 syllType_c 2.278 1.509
speaker.2 vowelLength_c 6.832 2.614
speaker.3 singGem_c 4.564 2.136
speaker.4 voiced.unvoiced_c 16.655 4.081
speaker.5 (Intercept) 21.039 4.587
Residual 44.108 6.641
Number of obs: 1793, groups: wordTarget, 98; speaker, 20
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 24.9692 1.2494 33.1458 19.985 < 2e-16 ***
voiced.unvoiced_c -9.3847 1.3296 53.0690 -7.058 3.61e-09 ***
singGem_c 0.3376 1.0092 77.6741 0.335 0.739
sex_c 1.6417 2.1058 17.3511 0.780 0.446
vowelLength_c 5.1409 1.0808 71.5438 4.757 9.93e-06 ***
syllType_c 1.5479 1.2070 85.8098 1.282 0.203
place_c 12.3890 1.6645 58.2555 7.443 5.15e-10 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) vcd.n_ sngGm_ sex_c vwlLn_ syllT_
vocd.nvcd_c 0.078
singGem_c -0.133 -0.024
sex_c -0.003 0.000 0.001
vowlLngth_c 0.157 -0.091 0.059 0.000
syllType_c 0.330 0.027 -0.327 -0.001 0.229
place_c 0.235 0.297 -0.009 -0.003 -0.037 0.023
The figure presented below, uses the predicted (or fitted) values; these are the values adjusted by the statistical model. These take into account the random effects’ structure as well as the fixed effects’ structure.
The following code is used to record the predicted (or fitted) values into the same data-frame. This will then be used to plot the results of the model
geminationVOT$fitted_durationRel <- predict(lmer(duration~voiced.unvoiced_c+singGem_c+sex_c+vowelLength_c+syllType_c+place_c+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationVOT,na.action=na.exclude,
REML=T,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
)
We use “ggplot2” to plot the results. This will use the “predicted” (or fitted) values adjusted by our statistical model. The dashed lines within the boxplots are the mean values; the median is marked by the horizontal straight lines.
labels <- c(singleton = "Singleton", geminate = "Geminate")
ggdurRelByVoice <- ggplot(geminationVOT, aes(y=fitted_durationRel, x=voiced.unvoiced)) +
geom_boxplot(outlier.colour = "NA",na.rm=TRUE) +
stat_summary(fun.y = mean, geom = "errorbar", aes(ymax = ..y.., ymin = ..y..),
width = .75, linetype = "dashed") +
ylab(expression(paste("Duration (ms)"))) + labs(title = "Rel") +
coord_cartesian(ylim = c(0,60)) +
theme_set(theme_bw()) + theme(text=element_text(size=20)) +
theme(axis.title.x = element_blank()) + theme(legend.title=element_blank()) +
scale_x_discrete(breaks=levels(geminationVOT$voiced.unvoiced), labels=c("Voiceless","Voiced")) +
facet_wrap(~singGem,ncol=1,labeller=labeller(singGem = labels))
ggdurRelByVoice
After running our statistical model above, we wanted to evaluate whether the differences observed between “voicing” and “gemination” is statistically significant (or not). We present the means and standard deviations for each of the four-way contrast based on the predicted (or fitted) values.
The predicted means and SDs of the duration of the release phase are provided below. As can be seen, there are already differences observed related to both “voicing” and “gemination”.
# using "dplyr"
geminationVOT %>% group_by(singGem:voiced.unvoiced) %>% summarise(mean(fitted_durationRel), sd(fitted_durationRel))
We used the pairwise t-test as available in base R. The input was the predicted (or fitted) values as adjusted by our statistical model. It was not possible to directly use our statistical model’s output to compare models, given that our LMM model was looking at overall change as a main effect rather than as a simple effect’s model. This is why we have used the predicted (or fitted) values as adjusted by our statistical model; the results are the same if we were to use the “actual” results rather than the predicted ones. We used FDR (for False Discovery Rate) correction for multiple comparisons as this allowed for correcting for bias in the differences in doing multiple comparisons.
pairwise.t.test(geminationVOT$fitted_durationRel, geminationVOT$singGem:geminationVOT$voiced.unvoiced, p.adj = "fdr")
Pairwise comparisons using t tests with pooled SD
data: geminationVOT$fitted_durationRel and geminationVOT$singGem:geminationVOT$voiced.unvoiced
singleton:voiceless singleton:voiced
singleton:voiced < 2e-16 -
geminate:voiceless 6.9e-12 < 2e-16
geminate:voiced < 2e-16 1.5e-06
geminate:voiceless
singleton:voiced -
geminate:voiceless -
geminate:voiced < 2e-16
P value adjustment method: fdr
We run three models. For model comparisons, we are required to use Maximum Likelihood (ML) on both Optimal and Null models (by adding REML=F). Then we run a second Optimal models with Restricted Maximum Likelihood (REML), to obtain the coefficients (by adding REML=T)
durationB.xmdl.Full.M <- lmer(durationBurst~voiced.unvoiced_c+singGem_c+sex_c+vowelLength_c+syllType_c+place_c+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationVOT,na.action=na.exclude,
REML=F,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
durationB.xmdl.Full.R <- lmer(durationBurst~voiced.unvoiced_c+singGem_c+sex_c+vowelLength_c+syllType_c+place_c+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationVOT,na.action=na.exclude,
REML=T,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
durationB.xmdl.Null.M <- lmer(durationBurst~1+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationVOT,na.action=na.exclude,
REML=F,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
anova(durationB.xmdl.Null.M,durationB.xmdl.Full.M)
Data: geminationVOT
Models:
durationB.xmdl.Null.M: durationBurst ~ 1 + (voiced.unvoiced_c + singGem_c + vowelLength_c +
durationB.xmdl.Null.M: syllType_c + place_c || speaker) + (1 | wordTarget)
durationB.xmdl.Full.M: durationBurst ~ voiced.unvoiced_c + singGem_c + sex_c + vowelLength_c +
durationB.xmdl.Full.M: syllType_c + place_c + (voiced.unvoiced_c + singGem_c + vowelLength_c +
durationB.xmdl.Full.M: syllType_c + place_c || speaker) + (1 | wordTarget)
Df AIC BIC logLik deviance Chisq Chi Df
durationB.xmdl.Null.M 9 9199.7 9248.8 -4590.8 9181.7
durationB.xmdl.Full.M 15 9127.6 9209.5 -4548.8 9097.6 84.046 6
Pr(>Chisq)
durationB.xmdl.Null.M
durationB.xmdl.Full.M 5.204e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
As can be seen, our optimal model improved the model fit.
We inspect our model below, by looking at the residuals. There are minor deviations observed, due to the nature of our data.
hist(residuals(durationB.xmdl.Full.R))
qqnorm(residuals(durationB.xmdl.Full.R)); qqline(residuals(durationB.xmdl.Full.R))
plot(fitted(durationB.xmdl.Full.R),residuals(durationB.xmdl.Full.R), cex = 4)
summary(durationB.xmdl.Full.R)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula:
durationBurst ~ voiced.unvoiced_c + singGem_c + sex_c + vowelLength_c +
syllType_c + place_c + (voiced.unvoiced_c + singGem_c + vowelLength_c +
syllType_c + place_c || speaker) + (1 | wordTarget)
Data: geminationVOT
Control:
lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e+05))
REML criterion at convergence: 9099.8
Scaled residuals:
Min 1Q Median 3Q Max
-3.6555 -0.5779 -0.1718 0.4141 6.7797
Random effects:
Groups Name Variance Std.Dev.
wordTarget (Intercept) 0.47150 0.6867
speaker place_c 4.96529 2.2283
speaker.1 syllType_c 0.03274 0.1810
speaker.2 vowelLength_c 0.18264 0.4274
speaker.3 singGem_c 0.00000 0.0000
speaker.4 voiced.unvoiced_c 0.87766 0.9368
speaker.5 (Intercept) 1.74348 1.3204
Residual 10.18220 3.1910
Number of obs: 1731, groups: wordTarget, 98; speaker, 20
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 8.8508 0.3488 26.1896 25.372 < 2e-16 ***
voiced.unvoiced_c -2.1172 0.3277 33.2882 -6.462 2.40e-07 ***
singGem_c 0.9949 0.2333 92.1656 4.264 4.86e-05 ***
sex_c 1.0501 0.6258 17.5100 1.678 0.111
vowelLength_c 0.3432 0.2572 38.4606 1.334 0.190
syllType_c 0.4512 0.3038 48.9295 1.485 0.144
place_c 4.1499 0.5979 26.5279 6.941 2.04e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) vcd.n_ sngGm_ sex_c vwlLn_ syllT_
vocd.nvcd_c 0.080
singGem_c -0.141 -0.038
sex_c -0.009 -0.002 0.003
vowlLngth_c 0.161 -0.109 0.080 0.002
syllType_c 0.316 0.032 -0.384 -0.001 0.260
place_c 0.159 0.225 -0.007 -0.007 -0.031 0.018
The figure presented below, uses the predicted (or fitted) values; these are the values adjusted by the statistical model. These take into account the random effects’ structure as well as the fixed effects’ structure.
The following code is used to record the predicted (or fitted) values into the same data-frame. This will then be used to plot the results of the model
geminationVOT$fitted_durationBurst <- predict(lmer(durationBurst~voiced.unvoiced_c+singGem_c+sex_c+vowelLength_c+syllType_c+place_c+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationVOT,na.action=na.exclude,
REML=T,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
)
We use “ggplot2” to plot the results. This will use the “predicted” (or fitted) values adjusted by our statistical model. The dashed lines within the boxplots are the mean values; the median is marked by the horizontal straight lines.
labels <- c(singleton = "Singleton", geminate = "Geminate")
ggdurBByVoice <- ggplot(geminationVOT, aes(y=fitted_durationBurst, x=voiced.unvoiced)) +
geom_boxplot(outlier.colour = "NA",na.rm=TRUE) +
stat_summary(fun.y = mean, geom = "errorbar", aes(ymax = ..y.., ymin = ..y..),
width = .75, linetype = "dashed") +
ylab(expression(paste("Duration (ms)"))) + labs(title = "B") +
coord_cartesian(ylim = c(2.5,17.5)) +
theme_set(theme_bw()) + theme(text=element_text(size=20)) +
theme(axis.title.x = element_blank()) + theme(legend.title=element_blank()) +
scale_x_discrete(breaks=levels(geminationVOT$voiced.unvoiced), labels=c("Voiceless","Voiced")) +
facet_wrap(~singGem,ncol=1,labeller=labeller(singGem = labels))
ggdurBByVoice
After running our statistical model above, we wanted to evaluate whether the differences observed between “voicing” and “gemination” is statistically significant (or not). We present the means and standard deviations for each of the four-way contrast based on the predicted (or fitted) values.
The predicted means and SDs of the duration of the Burst phase are provided below. As can be seen, there are already differences observed related to both “voicing” and “gemination”.
# using "dplyr"
geminationVOT %>% group_by(singGem:voiced.unvoiced) %>% summarise(mean(fitted_durationBurst,na.rm = T), sd(fitted_durationBurst,na.rm = T))
We used the pairwise t-test as available in base R. The input was the predicted (or fitted) values as adjusted by our statistical model. It was not possible to directly use our statistical model’s output to compare models, given that our LMM model was looking at overall change as a main effect rather than as a simple effect’s model. This is why we have used the predicted (or fitted) values as adjusted by our statistical model; the results are the same if we were to use the “actual” results rather than the predicted ones. We used FDR (for False Discovery Rate) correction for multiple comparisons as this allowed for correcting for bias in the differences in doing multiple comparisons.
pairwise.t.test(geminationVOT$fitted_durationBurst, geminationVOT$singGem:geminationVOT$voiced.unvoiced, p.adj = "fdr")
Pairwise comparisons using t tests with pooled SD
data: geminationVOT$fitted_durationBurst and geminationVOT$singGem:geminationVOT$voiced.unvoiced
singleton:voiceless singleton:voiced
singleton:voiced <2e-16 -
geminate:voiceless 6e-11 <2e-16
geminate:voiced <2e-16 <2e-16
geminate:voiceless
singleton:voiced -
geminate:voiceless -
geminate:voiced <2e-16
P value adjustment method: fdr
We run three models. For model comparisons, we are required to use Maximum Likelihood (ML) on both Optimal and Null models (by adding REML=F). Then we run a second Optimal models with Restricted Maximum Likelihood (REML), to obtain the coefficients (by adding REML=T)
durationAsp.xmdl.Full.M <- lmer(durationAsp~voiced.unvoiced_c+singGem_c+sex_c+vowelLength_c+syllType_c+place_c+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationVOT,na.action=na.exclude,
REML=F,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
durationAsp.xmdl.Full.R <- lmer(durationAsp~voiced.unvoiced_c+singGem_c+sex_c+vowelLength_c+syllType_c+place_c+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationVOT,na.action=na.exclude,
REML=T,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
durationAsp.xmdl.Null.M <- lmer(durationAsp~1+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationVOT,na.action=na.exclude,
REML=F,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
anova(durationAsp.xmdl.Null.M,durationAsp.xmdl.Full.M)
Data: geminationVOT
Models:
durationAsp.xmdl.Null.M: durationAsp ~ 1 + (voiced.unvoiced_c + singGem_c + vowelLength_c +
durationAsp.xmdl.Null.M: syllType_c + place_c || speaker) + (1 | wordTarget)
durationAsp.xmdl.Full.M: durationAsp ~ voiced.unvoiced_c + singGem_c + sex_c + vowelLength_c +
durationAsp.xmdl.Full.M: syllType_c + place_c + (voiced.unvoiced_c + singGem_c + vowelLength_c +
durationAsp.xmdl.Full.M: syllType_c + place_c || speaker) + (1 | wordTarget)
Df AIC BIC logLik deviance Chisq Chi Df
durationAsp.xmdl.Null.M 9 12101 12150 -6041.3 12083
durationAsp.xmdl.Full.M 15 12042 12124 -6005.8 12012 70.961 6
Pr(>Chisq)
durationAsp.xmdl.Null.M
durationAsp.xmdl.Full.M 2.597e-13 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
As can be seen, our optimal model improved the model fit.
We inspect our model below, by looking at the residuals. There are minor deviations observed, due to the nature of our data.
hist(residuals(durationAsp.xmdl.Full.R))
qqnorm(residuals(durationAsp.xmdl.Full.R)); qqline(residuals(durationAsp.xmdl.Full.R))
plot(fitted(durationAsp.xmdl.Full.R),residuals(durationAsp.xmdl.Full.R), cex = 4)
summary(durationAsp.xmdl.Full.R)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula:
durationAsp ~ voiced.unvoiced_c + singGem_c + sex_c + vowelLength_c +
syllType_c + place_c + (voiced.unvoiced_c + singGem_c + vowelLength_c +
syllType_c + place_c || speaker) + (1 | wordTarget)
Data: geminationVOT
Control:
lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e+05))
REML criterion at convergence: 11996.8
Scaled residuals:
Min 1Q Median 3Q Max
-2.9429 -0.5018 -0.0765 0.3343 10.6900
Random effects:
Groups Name Variance Std.Dev.
wordTarget (Intercept) 9.673 3.110
speaker place_c 20.398 4.516
speaker.1 syllType_c 1.129 1.062
speaker.2 vowelLength_c 5.792 2.407
speaker.3 singGem_c 2.842 1.686
speaker.4 voiced.unvoiced_c 14.200 3.768
speaker.5 (Intercept) 18.469 4.298
Residual 50.368 7.097
Number of obs: 1731, groups: wordTarget, 98; speaker, 20
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 16.2473 1.1564 31.0503 14.050 5.32e-15 ***
voiced.unvoiced_c -7.1656 1.2129 47.3349 -5.908 3.62e-07 ***
singGem_c -1.0769 0.8884 68.9790 -1.212 0.230
sex_c 0.2458 1.9829 17.3256 0.124 0.903
vowelLength_c 4.5724 0.9833 60.1015 4.650 1.87e-05 ***
syllType_c 1.0605 1.0709 75.6114 0.990 0.325
place_c 8.0374 1.5266 50.0638 5.265 2.95e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) vcd.n_ sngGm_ sex_c vwlLn_ syllT_
vocd.nvcd_c 0.076
singGem_c -0.134 -0.028
sex_c -0.005 -0.001 0.002
vowlLngth_c 0.152 -0.091 0.063 0.000
syllType_c 0.327 0.028 -0.340 -0.002 0.231
place_c 0.225 0.287 -0.009 -0.004 -0.037 0.023
The figure presented below, uses the predicted (or fitted) values; these are the values adjusted by the statistical model. These take into account the random effects’ structure as well as the fixed effects’ structure.
The following code is used to record the predicted (or fitted) values into the same data-frame. This will then be used to plot the results of the model
geminationVOT$fitted_durationAsp <- predict(lmer(durationAsp~voiced.unvoiced_c+singGem_c+sex_c+vowelLength_c+syllType_c+place_c+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationVOT,na.action=na.exclude,
REML=T,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
)
We use “ggplot2” to plot the results. This will use the “predicted” (or fitted) values adjusted by our statistical model. The dashed lines within the boxplots are the mean values; the median is marked by the horizontal straight lines.
labels <- c(singleton = "Singleton", geminate = "Geminate")
ggdurAspByVoice <- ggplot(geminationVOT, aes(y=fitted_durationAsp, x=voiced.unvoiced)) +
geom_boxplot(outlier.colour = "NA",na.rm=TRUE) +
stat_summary(fun.y = mean, geom = "errorbar", aes(ymax = ..y.., ymin = ..y..),
width = .75, linetype = "dashed") +
ylab(expression(paste("Duration (ms)"))) + labs(title = "Asp") +
coord_cartesian(ylim = c(0,40)) +
theme_set(theme_bw()) + theme(text=element_text(size=20)) +
theme(axis.title.x = element_blank()) + theme(legend.title=element_blank()) +
scale_x_discrete(breaks=levels(geminationVOT$voiced.unvoiced), labels=c("Voiceless","Voiced")) +
facet_wrap(~singGem,ncol=1,labeller=labeller(singGem = labels))
ggdurAspByVoice
After running our statistical model above, we wanted to evaluate whether the differences observed between “voicing” and “gemination” is statistically significant (or not). We present the means and standard deviations for each of the four-way contrast based on the predicted (or fitted) values.
The predicted means and SDs of the duration of the Aspiration phase are provided below. As can be seen, there are already differences observed related to both “voicing” and “gemination”.
# using "dplyr"
geminationVOT %>% group_by(singGem:voiced.unvoiced) %>% summarise(mean(fitted_durationAsp,na.rm = T), sd(fitted_durationAsp,na.rm = T))
We used the pairwise t-test as available in base R. The input was the predicted (or fitted) values as adjusted by our statistical model. It was not possible to directly use our statistical model’s output to compare models, given that our LMM model was looking at overall change as a main effect rather than as a simple effect’s model. This is why we have used the predicted (or fitted) values as adjusted by our statistical model; the results are the same if we were to use the “actual” results rather than the predicted ones. We used FDR (for False Discovery Rate) correction for multiple comparisons as this allowed for correcting for bias in the differences in doing multiple comparisons.
pairwise.t.test(geminationVOT$fitted_durationAsp, geminationVOT$singGem:geminationVOT$voiced.unvoiced, p.adj = "fdr")
Pairwise comparisons using t tests with pooled SD
data: geminationVOT$fitted_durationAsp and geminationVOT$singGem:geminationVOT$voiced.unvoiced
singleton:voiceless singleton:voiced
singleton:voiced <2e-16 -
geminate:voiceless <2e-16 <2e-16
geminate:voiced <2e-16 0.84
geminate:voiceless
singleton:voiced -
geminate:voiceless -
geminate:voiced <2e-16
P value adjustment method: fdr
We run three models. For model comparisons, we are required to use Maximum Likelihood (ML) on both Optimal and Null models (by adding REML=F). Then we run a second Optimal models with Restricted Maximum Likelihood (REML), to obtain the coefficients (by adding REML=T)
durationV2.xmdl.Full.M <- lmer(duration~voiced.unvoiced_c+singGem_c+sex_c+vowelLength_c+syllType_c+place_c+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationV2,na.action=na.exclude,
REML=F,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
durationV2.xmdl.Full.R <- lmer(duration~voiced.unvoiced_c+singGem_c+sex_c+vowelLength_c+syllType_c+place_c+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationV2,na.action=na.exclude,
REML=T,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
durationV2.xmdl.Null.M <- lmer(duration~1+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationV2,na.action=na.exclude,
REML=F,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
anova(durationV2.xmdl.Null.M,durationV2.xmdl.Full.M)
Data: geminationV2
Models:
durationV2.xmdl.Null.M: duration ~ 1 + (voiced.unvoiced_c + singGem_c + vowelLength_c +
durationV2.xmdl.Null.M: syllType_c + place_c || speaker) + (1 | wordTarget)
durationV2.xmdl.Full.M: duration ~ voiced.unvoiced_c + singGem_c + sex_c + vowelLength_c +
durationV2.xmdl.Full.M: syllType_c + place_c + (voiced.unvoiced_c + singGem_c + vowelLength_c +
durationV2.xmdl.Full.M: syllType_c + place_c || speaker) + (1 | wordTarget)
Df AIC BIC logLik deviance Chisq Chi Df
durationV2.xmdl.Null.M 9 17047 17097 -8514.7 17029
durationV2.xmdl.Full.M 15 16976 17058 -8472.8 16946 83.673 6
Pr(>Chisq)
durationV2.xmdl.Null.M
durationV2.xmdl.Full.M 6.215e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
As can be seen, our optimal model improved the model fit.
We inspect our model below, by looking at the residuals. There are minor deviations observed, due to the nature of our data.
hist(residuals(durationV2.xmdl.Full.R))
qqnorm(residuals(durationV2.xmdl.Full.R)); qqline(residuals(durationV2.xmdl.Full.R))
plot(fitted(durationV2.xmdl.Full.R),residuals(durationV2.xmdl.Full.R), cex = 4)
summary(durationV2.xmdl.Full.R)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula:
duration ~ voiced.unvoiced_c + singGem_c + sex_c + vowelLength_c +
syllType_c + place_c + (voiced.unvoiced_c + singGem_c + vowelLength_c +
syllType_c + place_c || speaker) + (1 | wordTarget)
Data: geminationV2
Control:
lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e+05))
REML criterion at convergence: 16910
Scaled residuals:
Min 1Q Median 3Q Max
-4.6382 -0.5865 -0.0072 0.5329 4.5934
Random effects:
Groups Name Variance Std.Dev.
wordTarget (Intercept) 182.26 13.501
speaker place_c 0.00 0.000
speaker.1 syllType_c 583.55 24.157
speaker.2 vowelLength_c 68.01 8.247
speaker.3 singGem_c 35.19 5.932
speaker.4 voiced.unvoiced_c 39.88 6.315
speaker.5 (Intercept) 644.58 25.389
Residual 606.29 24.623
Number of obs: 1793, groups: wordTarget, 98; speaker, 20
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 182.939 6.273 25.530 29.163 < 2e-16 ***
voiced.unvoiced_c 7.465 3.884 91.598 1.922 0.0577 .
singGem_c 15.922 3.582 83.903 4.445 2.68e-05 ***
sex_c -12.002 11.470 17.999 -1.046 0.3092
vowelLength_c 4.221 3.870 78.721 1.091 0.2788
syllType_c 94.464 6.929 40.672 13.633 < 2e-16 ***
place_c -3.281 4.753 92.169 -0.690 0.4918
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) vcd.n_ sngGm_ sex_c vwlLn_ syllT_
vocd.nvcd_c 0.074
singGem_c -0.105 -0.032
sex_c -0.001 0.001 0.001
vowlLngth_c 0.122 -0.122 0.065 -0.001
syllType_c 0.161 0.023 -0.225 -0.001 0.156
place_c 0.229 0.498 -0.012 0.000 -0.051 0.020
The figure presented below, uses the predicted (or fitted) values; these are the values adjusted by the statistical model. These take into account the random effects’ structure as well as the fixed effects’ structure.
The following code is used to record the predicted (or fitted) values into the same data-frame. This will then be used to plot the results of the model
geminationV2$fitted_durationV2 <- predict(lmer(duration~voiced.unvoiced_c+singGem_c+sex_c+vowelLength_c+syllType_c+place_c+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationV2,na.action=na.exclude,
REML=T,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
)
We use “ggplot2” to plot the results. This will use the “predicted” (or fitted) values adjusted by our statistical model. The dashed lines within the boxplots are the mean values; the median is marked by the horizontal straight lines.
labels <- c(singleton = "Singleton", geminate = "Geminate")
ggdurV2ByVoice <- ggplot(geminationV2, aes(y=fitted_durationV2, x=voiced.unvoiced)) +
geom_boxplot(outlier.colour = "NA",na.rm=TRUE) +
stat_summary(fun.y = mean, geom = "errorbar", aes(ymax = ..y.., ymin = ..y..),
width = .75, linetype = "dashed") +
ylab(expression(paste("Duration (ms)"))) + labs(title = "V2") +
coord_cartesian(ylim = c(25,325)) +
theme_set(theme_bw()) + theme(text=element_text(size=20)) +
theme(axis.title.x = element_blank()) + theme(legend.title=element_blank()) +
scale_x_discrete(breaks=levels(geminationV2$voiced.unvoiced), labels=c("Voiceless","Voiced")) +
facet_wrap(~singGem,ncol=1,labeller=labeller(singGem = labels))
ggdurV2ByVoice
After running our statistical model above, we wanted to evaluate whether the differences observed between “voicing” and “gemination” is statistically significant (or not). We present the means and standard deviations for each of the four-way contrast based on the predicted (or fitted) values.
The predicted means and SDs of the duration of the following vowel (V2) are provided below. As can be seen, there are already differences observed related to both “voicing” and “gemination”.
# using "dplyr"
geminationV2 %>% group_by(singGem:voiced.unvoiced) %>% summarise(mean(fitted_durationV2), sd(fitted_durationV2))
We used the pairwise t-test as available in base R. The input was the predicted (or fitted) values as adjusted by our statistical model. It was not possible to directly use our statistical model’s output to compare models, given that our LMM model was looking at overall change as a main effect rather than as a simple effect’s model. This is why we have used the predicted (or fitted) values as adjusted by our statistical model; the results are the same if we were to use the “actual” results rather than the predicted ones. We used FDR (for False Discovery Rate) correction for multiple comparisons as this allowed for correcting for bias in the differences in doing multiple comparisons.
pairwise.t.test(geminationV2$fitted_durationV2, geminationV2$singGem:geminationV2$voiced.unvoiced, p.adj = "fdr")
Pairwise comparisons using t tests with pooled SD
data: geminationV2$fitted_durationV2 and geminationV2$singGem:geminationV2$voiced.unvoiced
singleton:voiceless singleton:voiced
singleton:voiced 0.055 -
geminate:voiceless <2e-16 <2e-16
geminate:voiced <2e-16 <2e-16
geminate:voiceless
singleton:voiced -
geminate:voiceless -
geminate:voiced 0.416
P value adjustment method: fdr
Within Voicing, we have computed three metrics: positive and negative VOT (Voicing VOT), proportion of voicing in the closure duration (% Voicing CD) and proportion of voicing in the Release Phase (% Voicing Rel)
We run three models. For model comparisons, we are required to use Maximum Likelihood (ML) on both Optimal and Null models (by adding REML=F). Then we run a second Optimal models with Restricted Maximum Likelihood (REML), to obtain the coefficients (by adding REML=T)
durationVoiceVOT.xmdl.Full.M <- lmer(VOTDec~voiced.unvoiced_c+singGem_c+sex_c+vowelLength_c+syllType_c+place_c+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationCons,na.action=na.exclude,
REML=F,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
durationVoiceVOT.xmdl.Full.R <- lmer(VOTDec~voiced.unvoiced_c+singGem_c+sex_c+vowelLength_c+syllType_c+place_c+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationCons,na.action=na.exclude,
REML=T,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
durationVoiceVOT.xmdl.Null.M <- lmer(VOTDec~1+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationCons,na.action=na.exclude,
REML=F,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
anova(durationVoiceVOT.xmdl.Null.M,durationVoiceVOT.xmdl.Full.M)
Data: geminationCons
Models:
durationVoiceVOT.xmdl.Null.M: VOTDec ~ 1 + (voiced.unvoiced_c + singGem_c + vowelLength_c +
durationVoiceVOT.xmdl.Null.M: syllType_c + place_c || speaker) + (1 | wordTarget)
durationVoiceVOT.xmdl.Full.M: VOTDec ~ voiced.unvoiced_c + singGem_c + sex_c + vowelLength_c +
durationVoiceVOT.xmdl.Full.M: syllType_c + place_c + (voiced.unvoiced_c + singGem_c + vowelLength_c +
durationVoiceVOT.xmdl.Full.M: syllType_c + place_c || speaker) + (1 | wordTarget)
Df AIC BIC logLik deviance Chisq
durationVoiceVOT.xmdl.Null.M 9 19206 19256 -9594.1 19188
durationVoiceVOT.xmdl.Full.M 15 19123 19206 -9546.5 19093 95.02
Chi Df Pr(>Chisq)
durationVoiceVOT.xmdl.Null.M
durationVoiceVOT.xmdl.Full.M 6 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
As can be seen, our optimal model improved the model fit.
We inspect our model below, by looking at the residuals. There are minor deviations observed, due to the nature of our data.
hist(residuals(durationVoiceVOT.xmdl.Full.R))
qqnorm(residuals(durationVoiceVOT.xmdl.Full.R)); qqline(residuals(durationVoiceVOT.xmdl.Full.R))
plot(fitted(durationVoiceVOT.xmdl.Full.R),residuals(durationVoiceVOT.xmdl.Full.R), cex = 4)
summary(durationVoiceVOT.xmdl.Full.R)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula:
VOTDec ~ voiced.unvoiced_c + singGem_c + sex_c + vowelLength_c +
syllType_c + place_c + (voiced.unvoiced_c + singGem_c + vowelLength_c +
syllType_c + place_c || speaker) + (1 | wordTarget)
Data: geminationCons
Control:
lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e+05))
REML criterion at convergence: 19053.3
Scaled residuals:
Min 1Q Median 3Q Max
-4.5542 -0.3456 0.0334 0.3465 4.2393
Random effects:
Groups Name Variance Std.Dev.
wordTarget (Intercept) 520.21 22.808
speaker place_c 73.92 8.598
speaker.1 syllType_c 0.00 0.000
speaker.2 vowelLength_c 111.91 10.579
speaker.3 singGem_c 340.41 18.450
speaker.4 voiced.unvoiced_c 627.86 25.057
speaker.5 (Intercept) 331.50 18.207
Residual 2085.69 45.669
Number of obs: 1793, groups: wordTarget, 98; speaker, 20
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) -42.308 6.127 60.677 -6.905 3.46e-09 ***
voiced.unvoiced_c -115.230 8.364 55.606 -13.776 < 2e-16 ***
singGem_c -45.268 7.046 66.785 -6.425 1.62e-08 ***
sex_c 6.462 8.521 17.678 0.758 0.4582
vowelLength_c 10.904 6.300 82.187 1.731 0.0872 .
syllType_c 5.191 7.442 92.022 0.697 0.4873
place_c 6.701 8.383 84.233 0.799 0.4263
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) vcd.n_ sngGm_ sex_c vwlLn_ syllT_
vocd.nvcd_c 0.104
singGem_c -0.160 -0.022
sex_c -0.004 0.001 0.001
vowlLngth_c 0.227 -0.103 0.060 0.000
syllType_c 0.450 0.029 -0.313 0.000 0.263
place_c 0.392 0.386 -0.010 -0.002 -0.052 0.030
The figure presented below, uses the predicted (or fitted) values; these are the values adjusted by the statistical model. These take into account the random effects’ structure as well as the fixed effects’ structure.
The following code is used to record the predicted (or fitted) values into the same data-frame. This will then be used to plot the results of the model
geminationCons$fitted_VoiceVOT <- predict(lmer(VOTDec~voiced.unvoiced_c+singGem_c+sex_c+vowelLength_c+syllType_c+place_c+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationCons,na.action=na.exclude,
REML=T,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
)
We use “ggplot2” to plot the results. This will use the “predicted” (or fitted) values adjusted by our statistical model. The dashed lines within the boxplots are the mean values; the median is marked by the horizontal straight lines.
labels <- c(singleton = "Singleton", geminate = "Geminate")
ggVoiceVOTByVoice <- ggplot(geminationCons, aes(y=fitted_VoiceVOT, x=voiced.unvoiced)) +
geom_boxplot(outlier.colour = "NA",na.rm=TRUE) +
stat_summary(fun.y = mean, geom = "errorbar", aes(ymax = ..y.., ymin = ..y..),
width = .75, linetype = "dashed") +
ylab(expression(paste("VOT"))) + labs(title = "VOT") +
theme_set(theme_bw()) + theme(text=element_text(size=20)) +
theme(axis.title.x = element_blank()) + theme(legend.title=element_blank()) +
scale_x_discrete(breaks=levels(geminationCons$voiced.unvoiced), labels=c("Voiceless","Voiced")) +
facet_wrap(~singGem,ncol=1,labeller=labeller(singGem = labels))
ggVoiceVOTByVoice
After running our statistical model above, we wanted to evaluate whether the differences observed between “voicing” and “gemination” is statistically significant (or not). We present the means and standard deviations for each of the four-way contrast based on the predicted (or fitted) values.
The predicted means and SDs of the positive and negative VOT (VoiceVOT) are provided below. As can be seen, there are already differences observed related to both “voicing” and “gemination”.
# using "dplyr"
geminationCons %>% group_by(singGem:voiced.unvoiced) %>% summarise(mean(fitted_VoiceVOT), sd(fitted_VoiceVOT))
We used the pairwise t-test as available in base R. The input was the predicted (or fitted) values as adjusted by our statistical model. It was not possible to directly use our statistical model’s output to compare models, given that our LMM model was looking at overall change as a main effect rather than as a simple effect’s model. This is why we have used the predicted (or fitted) values as adjusted by our statistical model; the results are the same if we were to use the “actual” results rather than the predicted ones. We used FDR (for False Discovery Rate) correction for multiple comparisons as this allowed for correcting for bias in the differences in doing multiple comparisons.
pairwise.t.test(geminationCons$fitted_VoiceVOT, geminationCons$singGem:geminationCons$voiced.unvoiced, p.adj = "fdr")
Pairwise comparisons using t tests with pooled SD
data: geminationCons$fitted_VoiceVOT and geminationCons$singGem:geminationCons$voiced.unvoiced
singleton:voiceless singleton:voiced
singleton:voiced <2e-16 -
geminate:voiceless 0.064 <2e-16
geminate:voiced <2e-16 <2e-16
geminate:voiceless
singleton:voiced -
geminate:voiceless -
geminate:voiced <2e-16
P value adjustment method: fdr
We run three models. For model comparisons, we are required to use Maximum Likelihood (ML) on both Optimal and Null models (by adding REML=F). Then we run a second Optimal models with Restricted Maximum Likelihood (REML), to obtain the coefficients (by adding REML=T)
durationPropVoiceCD.xmdl.Full.M <- lmer(durationVoicedPerc~voiced.unvoiced_c+singGem_c+sex_c+vowelLength_c+syllType_c+place_c+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationCons,na.action=na.exclude,
REML=F,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
durationPropVoiceCD.xmdl.Full.R <- lmer(durationVoicedPerc~voiced.unvoiced_c+singGem_c+sex_c+vowelLength_c+syllType_c+place_c+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationCons,na.action=na.exclude,
REML=T,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
durationPropVoiceCD.xmdl.Null.M <- lmer(durationVoicedPerc~1+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationCons,na.action=na.exclude,
REML=F,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
anova(durationPropVoiceCD.xmdl.Null.M,durationPropVoiceCD.xmdl.Full.M)
Data: geminationCons
Models:
durationPropVoiceCD.xmdl.Null.M: durationVoicedPerc ~ 1 + (voiced.unvoiced_c + singGem_c + vowelLength_c +
durationPropVoiceCD.xmdl.Null.M: syllType_c + place_c || speaker) + (1 | wordTarget)
durationPropVoiceCD.xmdl.Full.M: durationVoicedPerc ~ voiced.unvoiced_c + singGem_c + sex_c +
durationPropVoiceCD.xmdl.Full.M: vowelLength_c + syllType_c + place_c + (voiced.unvoiced_c +
durationPropVoiceCD.xmdl.Full.M: singGem_c + vowelLength_c + syllType_c + place_c || speaker) +
durationPropVoiceCD.xmdl.Full.M: (1 | wordTarget)
Df AIC BIC logLik deviance Chisq
durationPropVoiceCD.xmdl.Null.M 9 15646 15696 -7814.1 15628
durationPropVoiceCD.xmdl.Full.M 15 15544 15626 -7756.9 15514 114.35
Chi Df Pr(>Chisq)
durationPropVoiceCD.xmdl.Null.M
durationPropVoiceCD.xmdl.Full.M 6 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
As can be seen, our optimal model improved the model fit.
We inspect our model below, by looking at the residuals. There are minor deviations observed, due to the nature of our data.
hist(residuals(durationPropVoiceCD.xmdl.Full.R))
qqnorm(residuals(durationPropVoiceCD.xmdl.Full.R)); qqline(residuals(durationPropVoiceCD.xmdl.Full.R))
plot(fitted(durationPropVoiceCD.xmdl.Full.R),residuals(durationPropVoiceCD.xmdl.Full.R), cex = 4)
summary(durationPropVoiceCD.xmdl.Full.R)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: durationVoicedPerc ~ voiced.unvoiced_c + singGem_c + sex_c +
vowelLength_c + syllType_c + place_c + (voiced.unvoiced_c +
singGem_c + vowelLength_c + syllType_c + place_c || speaker) +
(1 | wordTarget)
Data: geminationCons
Control:
lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e+05))
REML criterion at convergence: 15489.7
Scaled residuals:
Min 1Q Median 3Q Max
-3.4488 -0.4823 0.0168 0.4620 3.9848
Random effects:
Groups Name Variance Std.Dev.
wordTarget (Intercept) 12.39 3.519
speaker place_c 21.39 4.625
speaker.1 syllType_c 14.54 3.814
speaker.2 vowelLength_c 29.42 5.424
speaker.3 singGem_c 22.14 4.706
speaker.4 voiced.unvoiced_c 228.20 15.106
speaker.5 (Intercept) 83.35 9.130
Residual 294.90 17.173
Number of obs: 1793, groups: wordTarget, 98; speaker, 20
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 58.368 2.261 23.234 25.816 < 2e-16 ***
voiced.unvoiced_c 63.207 3.626 21.977 17.430 2.36e-14 ***
singGem_c -14.467 1.608 31.848 -8.995 2.96e-10 ***
sex_c -9.774 4.229 17.935 -2.311 0.03291 *
vowelLength_c -5.577 1.736 29.878 -3.212 0.00315 **
syllType_c 1.693 1.795 38.294 0.943 0.35139
place_c 5.561 2.020 35.168 2.753 0.00928 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) vcd.n_ sngGm_ sex_c vwlLn_ syllT_
vocd.nvcd_c 0.029
singGem_c -0.084 -0.010
sex_c -0.004 0.001 0.001
vowlLngth_c 0.101 -0.039 0.044 -0.001
syllType_c 0.226 0.014 -0.261 -0.002 0.177
place_c 0.200 0.167 -0.008 -0.004 -0.037 0.025
The figure presented below, uses the predicted (or fitted) values; these are the values adjusted by the statistical model. These take into account the random effects’ structure as well as the fixed effects’ structure.
The following code is used to record the predicted (or fitted) values into the same data-frame. This will then be used to plot the results of the model
geminationCons$fitted_PropVoiceCD <- predict(lmer(durationVoicedPerc~voiced.unvoiced_c+singGem_c+sex_c+vowelLength_c+syllType_c+place_c+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationCons,na.action=na.exclude,
REML=T,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
)
We use “ggplot2” to plot the results. This will use the “predicted” (or fitted) values adjusted by our statistical model. The dashed lines within the boxplots are the mean values; the median is marked by the horizontal straight lines.
labels <- c(singleton = "Singleton", geminate = "Geminate")
ggPropVoiceCDByVoice <- ggplot(geminationCons, aes(y=fitted_PropVoiceCD, x=voiced.unvoiced)) +
geom_boxplot(outlier.colour = "NA",na.rm=TRUE) +
stat_summary(fun.y = mean, geom = "errorbar", aes(ymax = ..y.., ymin = ..y..),
width = .75, linetype = "dashed") +
ylab(expression(paste("Voicing (%)"))) + labs(title = "CD") +
coord_cartesian(ylim = c(-20,120)) +
theme_set(theme_bw()) + theme(text=element_text(size=20)) +
theme(axis.title.x = element_blank()) + theme(legend.title=element_blank()) +
scale_x_discrete(breaks=levels(geminationCons$voiced.unvoiced), labels=c("Voiceless","Voiced")) +
facet_wrap(~singGem,ncol=1,labeller=labeller(singGem = labels))
ggPropVoiceCDByVoice
After running our statistical model above, we wanted to evaluate whether the differences observed between “voicing” and “gemination” is statistically significant (or not). We present the means and standard deviations for each of the four-way contrast based on the predicted (or fitted) values.
The predicted means and SDs of the Proportion of voicing in the closure duration (% Voicing CD) are provided below. As can be seen, there are already differences observed related to both “voicing” and “gemination”.
# using "dplyr"
geminationCons %>% group_by(singGem:voiced.unvoiced) %>% summarise(mean(fitted_PropVoiceCD), sd(fitted_PropVoiceCD))
We used the pairwise t-test as available in base R. The input was the predicted (or fitted) values as adjusted by our statistical model. It was not possible to directly use our statistical model’s output to compare models, given that our LMM model was looking at overall change as a main effect rather than as a simple effect’s model. This is why we have used the predicted (or fitted) values as adjusted by our statistical model; the results are the same if we were to use the “actual” results rather than the predicted ones. We used FDR (for False Discovery Rate) correction for multiple comparisons as this allowed for correcting for bias in the differences in doing multiple comparisons.
pairwise.t.test(geminationCons$fitted_PropVoiceCD, geminationCons$singGem:geminationCons$voiced.unvoiced, p.adj = "fdr")
Pairwise comparisons using t tests with pooled SD
data: geminationCons$fitted_PropVoiceCD and geminationCons$singGem:geminationCons$voiced.unvoiced
singleton:voiceless singleton:voiced
singleton:voiced <2e-16 -
geminate:voiceless <2e-16 <2e-16
geminate:voiced <2e-16 <2e-16
geminate:voiceless
singleton:voiced -
geminate:voiceless -
geminate:voiced <2e-16
P value adjustment method: fdr
We run three models. For model comparisons, we are required to use Maximum Likelihood (ML) on both Optimal and Null models (by adding REML=F). Then we run a second Optimal models with Restricted Maximum Likelihood (REML), to obtain the coefficients (by adding REML=T)
durationPropVoiceRel.xmdl.Full.M <- lmer(durationVoicedPerc~voiced.unvoiced_c+singGem_c+sex_c+vowelLength_c+syllType_c+place_c+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationVOT,na.action=na.exclude,
REML=F,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
durationPropVoiceRel.xmdl.Full.R <- lmer(durationVoicedPerc~voiced.unvoiced_c+singGem_c+sex_c+vowelLength_c+syllType_c+place_c+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationVOT,na.action=na.exclude,
REML=T,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
durationPropVoiceRel.xmdl.Null.M <- lmer(durationVoicedPerc~1+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationVOT,na.action=na.exclude,
REML=F,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
anova(durationPropVoiceRel.xmdl.Null.M,durationPropVoiceRel.xmdl.Full.M)
Data: geminationVOT
Models:
durationPropVoiceRel.xmdl.Null.M: durationVoicedPerc ~ 1 + (voiced.unvoiced_c + singGem_c + vowelLength_c +
durationPropVoiceRel.xmdl.Null.M: syllType_c + place_c || speaker) + (1 | wordTarget)
durationPropVoiceRel.xmdl.Full.M: durationVoicedPerc ~ voiced.unvoiced_c + singGem_c + sex_c +
durationPropVoiceRel.xmdl.Full.M: vowelLength_c + syllType_c + place_c + (voiced.unvoiced_c +
durationPropVoiceRel.xmdl.Full.M: singGem_c + vowelLength_c + syllType_c + place_c || speaker) +
durationPropVoiceRel.xmdl.Full.M: (1 | wordTarget)
Df AIC BIC logLik deviance
durationPropVoiceRel.xmdl.Null.M 9 17398 17447 -8689.8 17380
durationPropVoiceRel.xmdl.Full.M 15 17310 17392 -8639.8 17280
Chisq Chi Df Pr(>Chisq)
durationPropVoiceRel.xmdl.Null.M
durationPropVoiceRel.xmdl.Full.M 99.948 6 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
As can be seen, our optimal model improved the model fit.
We inspect our model below, by looking at the residuals. There are minor deviations observed, due to the nature of our data.
hist(residuals(durationPropVoiceRel.xmdl.Full.R))
qqnorm(residuals(durationPropVoiceRel.xmdl.Full.R)); qqline(residuals(durationPropVoiceRel.xmdl.Full.R))
plot(fitted(durationPropVoiceRel.xmdl.Full.R),residuals(durationPropVoiceRel.xmdl.Full.R), cex = 4)
summary(durationPropVoiceRel.xmdl.Full.R)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: durationVoicedPerc ~ voiced.unvoiced_c + singGem_c + sex_c +
vowelLength_c + syllType_c + place_c + (voiced.unvoiced_c +
singGem_c + vowelLength_c + syllType_c + place_c || speaker) +
(1 | wordTarget)
Data: geminationVOT
Control:
lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e+05))
REML criterion at convergence: 17250.1
Scaled residuals:
Min 1Q Median 3Q Max
-3.3311 -0.4157 -0.0002 0.4979 2.9874
Random effects:
Groups Name Variance Std.Dev.
wordTarget (Intercept) 55.52 7.451
speaker place_c 0.00 0.000
speaker.1 syllType_c 36.37 6.031
speaker.2 vowelLength_c 65.13 8.070
speaker.3 singGem_c 112.70 10.616
speaker.4 voiced.unvoiced_c 546.69 23.381
speaker.5 (Intercept) 66.85 8.176
Residual 791.28 28.130
Number of obs: 1793, groups: wordTarget, 98; speaker, 20
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 38.6360 2.5703 41.8188 15.032 < 2e-16 ***
voiced.unvoiced_c 63.1715 5.7754 24.5602 10.938 6.30e-11 ***
singGem_c -23.5631 3.2786 33.0544 -7.187 3.05e-08 ***
sex_c -20.6862 3.9928 16.9392 -5.181 7.60e-05 ***
vowelLength_c -7.8711 2.9311 39.7212 -2.685 0.01052 *
syllType_c 8.7082 3.2335 50.8425 2.693 0.00956 **
place_c 0.7589 3.2208 91.9741 0.236 0.81425
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) vcd.n_ sngGm_ sex_c vwlLn_ syllT_
vocd.nvcd_c 0.056
singGem_c -0.127 -0.011
sex_c -0.007 0.002 0.002
vowlLngth_c 0.182 -0.050 0.044 -0.002
syllType_c 0.384 0.016 -0.245 -0.003 0.202
place_c 0.379 0.227 -0.008 0.000 -0.046 0.030
The figure presented below, uses the predicted (or fitted) values; these are the values adjusted by the statistical model. These take into account the random effects’ structure as well as the fixed effects’ structure.
The following code is used to record the predicted (or fitted) values into the same data-frame. This will then be used to plot the results of the model
geminationVOT$fitted_PropVoiceRel <- predict(lmer(durationVoicedPerc~voiced.unvoiced_c+singGem_c+sex_c+vowelLength_c+syllType_c+place_c+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationVOT,na.action=na.exclude,
REML=T,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
)
We use “ggplot2” to plot the results. This will use the “predicted” (or fitted) values adjusted by our statistical model. The dashed lines within the boxplots are the mean values; the median is marked by the horizontal straight lines.
labels <- c(singleton = "Singleton", geminate = "Geminate")
ggPropVoiceRelByVoice <- ggplot(geminationVOT, aes(y=fitted_PropVoiceRel, x=voiced.unvoiced)) +
geom_boxplot(outlier.colour = "NA",na.rm=TRUE) +
stat_summary(fun.y = mean, geom = "errorbar", aes(ymax = ..y.., ymin = ..y..),
width = .75, linetype = "dashed") +
ylab(expression(paste("Voicing (%)"))) + labs(title = "Rel") +
coord_cartesian(ylim = c(-20,120)) +
theme_set(theme_bw()) + theme(text=element_text(size=20)) +
theme(axis.title.x = element_blank()) + theme(legend.title=element_blank()) +
scale_x_discrete(breaks=levels(geminationVOT$voiced.unvoiced), labels=c("Voiceless","Voiced")) +
facet_wrap(~singGem,ncol=1,labeller=labeller(singGem = labels))
ggPropVoiceRelByVoice
After running our statistical model above, we wanted to evaluate whether the differences observed between “voicing” and “gemination” is statistically significant (or not). We present the means and standard deviations for each of the four-way contrast based on the predicted (or fitted) values.
The predicted means and SDs of the Proportion of voicing in the Release Phase (% Voicing Rel) are provided below. As can be seen, there are already differences observed related to both “voicing” and “gemination”.
# using "dplyr"
geminationVOT %>% group_by(singGem:voiced.unvoiced) %>% summarise(mean(fitted_PropVoiceRel), sd(fitted_PropVoiceRel))
We used the pairwise t-test as available in base R. The input was the predicted (or fitted) values as adjusted by our statistical model. It was not possible to directly use our statistical model’s output to compare models, given that our LMM model was looking at overall change as a main effect rather than as a simple effect’s model. This is why we have used the predicted (or fitted) values as adjusted by our statistical model; the results are the same if we were to use the “actual” results rather than the predicted ones. We used FDR (for False Discovery Rate) correction for multiple comparisons as this allowed for correcting for bias in the differences in doing multiple comparisons.
pairwise.t.test(geminationVOT$fitted_PropVoiceRel, geminationVOT$singGem:geminationVOT$voiced.unvoiced, p.adj = "fdr")
Pairwise comparisons using t tests with pooled SD
data: geminationVOT$fitted_PropVoiceRel and geminationVOT$singGem:geminationVOT$voiced.unvoiced
singleton:voiceless singleton:voiced
singleton:voiced < 2e-16 -
geminate:voiceless 3.7e-12 < 2e-16
geminate:voiced < 2e-16 < 2e-16
geminate:voiceless
singleton:voiced -
geminate:voiceless -
geminate:voiced < 2e-16
P value adjustment method: fdr
Within Intensity, we have computed the intensity at four locations: At the Offset of the preceding vowel (V1Offset), at the onset of the Release Phase (RelOnset), at the Offset of the Release Phase (RelOffset) and at the Onset of the following vowel (V2Offset).
We run three models. For model comparisons, we are required to use Maximum Likelihood (ML) on both Optimal and Null models (by adding REML=F). Then we run a second Optimal models with Restricted Maximum Likelihood (REML), to obtain the coefficients (by adding REML=T)
intensityV1Offset.xmdl.Full.M <- lmer(intensityOffset~voiced.unvoiced_c+singGem_c+sex_c+vowelLength_c+syllType_c+place_c+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationV1,na.action=na.exclude,
REML=F,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
intensityV1Offset.xmdl.Full.R <- lmer(intensityOffset~voiced.unvoiced_c+singGem_c+sex_c+vowelLength_c+syllType_c+place_c+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationV1,na.action=na.exclude,
REML=T,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
intensityV1Offset.xmdl.Null.M <- lmer(intensityOffset~1+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationV1,na.action=na.exclude,
REML=F,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
anova(intensityV1Offset.xmdl.Null.M,intensityV1Offset.xmdl.Full.M)
Data: geminationV1
Models:
intensityV1Offset.xmdl.Null.M: intensityOffset ~ 1 + (voiced.unvoiced_c + singGem_c + vowelLength_c +
intensityV1Offset.xmdl.Null.M: syllType_c + place_c || speaker) + (1 | wordTarget)
intensityV1Offset.xmdl.Full.M: intensityOffset ~ voiced.unvoiced_c + singGem_c + sex_c + vowelLength_c +
intensityV1Offset.xmdl.Full.M: syllType_c + place_c + (voiced.unvoiced_c + singGem_c + vowelLength_c +
intensityV1Offset.xmdl.Full.M: syllType_c + place_c || speaker) + (1 | wordTarget)
Df AIC BIC logLik deviance Chisq
intensityV1Offset.xmdl.Null.M 9 9091.6 9141 -4536.8 9073.6
intensityV1Offset.xmdl.Full.M 15 8978.6 9061 -4474.3 8948.6 124.98
Chi Df Pr(>Chisq)
intensityV1Offset.xmdl.Null.M
intensityV1Offset.xmdl.Full.M 6 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
As can be seen, our optimal model improved the model fit.
We inspect our model below, by looking at the residuals. There are minor deviations observed, due to the nature of our data.
hist(residuals(intensityV1Offset.xmdl.Full.R))
qqnorm(residuals(intensityV1Offset.xmdl.Full.R)); qqline(residuals(intensityV1Offset.xmdl.Full.R))
plot(fitted(intensityV1Offset.xmdl.Full.R),residuals(intensityV1Offset.xmdl.Full.R), cex = 4)
summary(intensityV1Offset.xmdl.Full.R)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula:
intensityOffset ~ voiced.unvoiced_c + singGem_c + sex_c + vowelLength_c +
syllType_c + place_c + (voiced.unvoiced_c + singGem_c + vowelLength_c +
syllType_c + place_c || speaker) + (1 | wordTarget)
Data: geminationV1
Control:
lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e+05))
REML criterion at convergence: 8945.9
Scaled residuals:
Min 1Q Median 3Q Max
-4.3661 -0.6119 -0.0346 0.5985 4.8353
Random effects:
Groups Name Variance Std.Dev.
wordTarget (Intercept) 0.9241 0.9613
speaker place_c 0.1154 0.3398
speaker.1 syllType_c 1.1642 1.0790
speaker.2 vowelLength_c 0.7610 0.8724
speaker.3 singGem_c 0.0000 0.0000
speaker.4 voiced.unvoiced_c 0.8781 0.9371
speaker.5 (Intercept) 16.8285 4.1023
Residual 7.3419 2.7096
Number of obs: 1793, groups: wordTarget, 98; speaker, 20
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 67.3528 0.9409 19.5721 71.585 < 2e-16 ***
voiced.unvoiced_c 2.3547 0.3530 53.0072 6.671 1.52e-08 ***
singGem_c 0.1617 0.2613 92.9803 0.619 0.53760
sex_c -5.6355 1.8425 18.0004 -3.059 0.00676 **
vowelLength_c -1.5195 0.3309 50.8626 -4.592 2.91e-05 ***
syllType_c -3.7793 0.4173 56.6523 -9.057 1.31e-12 ***
place_c -1.8546 0.3808 65.8930 -4.870 7.30e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) vcd.n_ sngGm_ sex_c vwlLn_ syllT_
vocd.nvcd_c 0.034
singGem_c -0.059 -0.030
sex_c 0.000 0.001 0.000
vowlLngth_c 0.059 -0.097 0.065 0.000
syllType_c 0.109 0.027 -0.317 -0.001 0.186
place_c 0.118 0.421 -0.012 0.000 -0.046 0.026
The figure presented below, uses the predicted (or fitted) values; these are the values adjusted by the statistical model. These take into account the random effects’ structure as well as the fixed effects’ structure.
The following code is used to record the predicted (or fitted) values into the same data-frame. This will then be used to plot the results of the model
geminationV1$fitted_V1IntensityOffset <- predict(lmer(intensityOffset~voiced.unvoiced_c+singGem_c+sex_c+vowelLength_c+syllType_c+place_c+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationV1,na.action=na.exclude,
REML=T,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
)
We use “ggplot2” to plot the results. This will use the “predicted” (or fitted) values adjusted by our statistical model. The dashed lines within the boxplots are the mean values; the median is marked by the horizontal straight lines.
labels <- c(singleton = "Singleton", geminate = "Geminate")
ggIntensityV1OffsetByVoice <- ggplot(geminationV1, aes(y=fitted_V1IntensityOffset, x=voiced.unvoiced)) +
geom_boxplot(outlier.colour = "NA",na.rm=TRUE) +
stat_summary(fun.y = mean, geom = "errorbar", aes(ymax = ..y.., ymin = ..y..),
width = .75, linetype = "dashed") +
ylab(expression(paste("Intensity (dB)"))) + labs(title = "V1 Offset") +
theme_set(theme_bw()) + theme(text=element_text(size=20)) +
coord_cartesian(ylim = c(50,80)) +
theme(axis.title.x = element_blank()) + theme(legend.title=element_blank()) +
scale_x_discrete(breaks=levels(geminationV1$voiced.unvoiced), labels=c("Voiceless","Voiced")) +
facet_wrap(~singGem,ncol=1,labeller=labeller(singGem = labels))
ggIntensityV1OffsetByVoice
After running our statistical model above, we wanted to evaluate whether the differences observed between “voicing” and “gemination” is statistically significant (or not). We present the means and standard deviations for each of the four-way contrast based on the predicted (or fitted) values.
The predicted means and SDs of the Intensity at the Offset of the preceding vowel (V1Offset) are provided below. As can be seen, there are already differences observed related to both “voicing” and “gemination”.
# using "dplyr"
geminationV1 %>% group_by(singGem:voiced.unvoiced) %>% summarise(mean(fitted_V1IntensityOffset), sd(fitted_V1IntensityOffset))
We used the pairwise t-test as available in base R. The input was the predicted (or fitted) values as adjusted by our statistical model. It was not possible to directly use our statistical model’s output to compare models, given that our LMM model was looking at overall change as a main effect rather than as a simple effect’s model. This is why we have used the predicted (or fitted) values as adjusted by our statistical model; the results are the same if we were to use the “actual” results rather than the predicted ones. We used FDR (for False Discovery Rate) correction for multiple comparisons as this allowed for correcting for bias in the differences in doing multiple comparisons.
pairwise.t.test(geminationV1$fitted_V1IntensityOffset, geminationV1$singGem:geminationV1$voiced.unvoiced, p.adj = "fdr")
Pairwise comparisons using t tests with pooled SD
data: geminationV1$fitted_V1IntensityOffset and geminationV1$singGem:geminationV1$voiced.unvoiced
singleton:voiceless singleton:voiced
singleton:voiced 5.5e-13 -
geminate:voiceless 0.0013 < 2e-16
geminate:voiced 7.6e-10 0.1047
geminate:voiceless
singleton:voiced -
geminate:voiceless -
geminate:voiced < 2e-16
P value adjustment method: fdr
We run three models. For model comparisons, we are required to use Maximum Likelihood (ML) on both Optimal and Null models (by adding REML=F). Then we run a second Optimal models with Restricted Maximum Likelihood (REML), to obtain the coefficients (by adding REML=T)
intensityRelOnset.xmdl.Full.M <- lmer(intensityOnset~voiced.unvoiced_c+singGem_c+sex_c+vowelLength_c+syllType_c+place_c+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationVOT,na.action=na.exclude,
REML=F,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
intensityRelOnset.xmdl.Full.R <- lmer(intensityOnset~voiced.unvoiced_c+singGem_c+sex_c+vowelLength_c+syllType_c+place_c+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationVOT,na.action=na.exclude,
REML=T,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
intensityRelOnset.xmdl.Null.M <- lmer(intensityOnset~1+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationVOT,na.action=na.exclude,
REML=F,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
anova(intensityRelOnset.xmdl.Null.M,intensityRelOnset.xmdl.Full.M)
Data: geminationVOT
Models:
intensityRelOnset.xmdl.Null.M: intensityOnset ~ 1 + (voiced.unvoiced_c + singGem_c + vowelLength_c +
intensityRelOnset.xmdl.Null.M: syllType_c + place_c || speaker) + (1 | wordTarget)
intensityRelOnset.xmdl.Full.M: intensityOnset ~ voiced.unvoiced_c + singGem_c + sex_c + vowelLength_c +
intensityRelOnset.xmdl.Full.M: syllType_c + place_c + (voiced.unvoiced_c + singGem_c + vowelLength_c +
intensityRelOnset.xmdl.Full.M: syllType_c + place_c || speaker) + (1 | wordTarget)
Df AIC BIC logLik deviance Chisq
intensityRelOnset.xmdl.Null.M 9 9512.7 9562.1 -4747.3 9494.7
intensityRelOnset.xmdl.Full.M 15 9440.9 9523.3 -4705.5 9410.9 83.755
Chi Df Pr(>Chisq)
intensityRelOnset.xmdl.Null.M
intensityRelOnset.xmdl.Full.M 6 5.977e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
As can be seen, our optimal model improved the model fit.
We inspect our model below, by looking at the residuals. There are minor deviations observed, due to the nature of our data.
hist(residuals(intensityRelOnset.xmdl.Full.R))
qqnorm(residuals(intensityRelOnset.xmdl.Full.R)); qqline(residuals(intensityRelOnset.xmdl.Full.R))
plot(fitted(intensityRelOnset.xmdl.Full.R),residuals(intensityRelOnset.xmdl.Full.R), cex = 4)
summary(intensityRelOnset.xmdl.Full.R)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula:
intensityOnset ~ voiced.unvoiced_c + singGem_c + sex_c + vowelLength_c +
syllType_c + place_c + (voiced.unvoiced_c + singGem_c + vowelLength_c +
syllType_c + place_c || speaker) + (1 | wordTarget)
Data: geminationVOT
Control:
lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e+05))
REML criterion at convergence: 9403.7
Scaled residuals:
Min 1Q Median 3Q Max
-5.2016 -0.5727 0.0334 0.6051 3.6262
Random effects:
Groups Name Variance Std.Dev.
wordTarget (Intercept) 2.0187 1.4208
speaker place_c 3.6786 1.9180
speaker.1 syllType_c 1.8039 1.3431
speaker.2 vowelLength_c 0.8808 0.9385
speaker.3 singGem_c 1.1726 1.0829
speaker.4 voiced.unvoiced_c 9.9030 3.1469
speaker.5 (Intercept) 12.5941 3.5488
Residual 8.8587 2.9764
Number of obs: 1793, groups: wordTarget, 98; speaker, 20
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 59.33375 0.84419 22.02045 70.285 < 2e-16 ***
voiced.unvoiced_c 5.43892 0.80469 28.71351 6.759 2.14e-07 ***
singGem_c -2.34891 0.43301 65.32732 -5.425 9.08e-07 ***
sex_c -3.79889 1.60302 17.95114 -2.370 0.02921 *
vowelLength_c -2.59166 0.42277 71.64730 -6.130 4.30e-08 ***
syllType_c 0.09273 0.55597 67.37955 0.167 0.86803
place_c 1.89781 0.66917 55.03426 2.836 0.00638 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) vcd.n_ sngGm_ sex_c vwlLn_ syllT_
vocd.nvcd_c 0.031
singGem_c -0.075 -0.015
sex_c -0.001 0.000 0.001
vowlLngth_c 0.097 -0.063 0.058 0.000
syllType_c 0.173 0.016 -0.270 -0.001 0.207
place_c 0.141 0.199 -0.008 -0.002 -0.039 0.020
The figure presented below, uses the predicted (or fitted) values; these are the values adjusted by the statistical model. These take into account the random effects’ structure as well as the fixed effects’ structure.
The following code is used to record the predicted (or fitted) values into the same data-frame. This will then be used to plot the results of the model
geminationVOT$fitted_RelIntensityOnset <- predict(lmer(intensityOnset~voiced.unvoiced_c+singGem_c+sex_c+vowelLength_c+syllType_c+place_c+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationVOT,na.action=na.exclude,
REML=T,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
)
We use “ggplot2” to plot the results. This will use the “predicted” (or fitted) values adjusted by our statistical model. The dashed lines within the boxplots are the mean values; the median is marked by the horizontal straight lines.
labels <- c(singleton = "Singleton", geminate = "Geminate")
ggIntensityRelOnsetByVoice <- ggplot(geminationVOT, aes(y=fitted_RelIntensityOnset, x=voiced.unvoiced)) +
geom_boxplot(outlier.colour = "NA",na.rm=TRUE) +
stat_summary(fun.y = mean, geom = "errorbar", aes(ymax = ..y.., ymin = ..y..),
width = .75, linetype = "dashed") +
ylab(expression(paste("Intensity (dB)"))) + labs(title = "Rel Onset") +
theme_set(theme_bw()) + theme(text=element_text(size=20)) +
coord_cartesian(ylim = c(50,80)) +
theme(axis.title.x = element_blank()) + theme(legend.title=element_blank()) +
scale_x_discrete(breaks=levels(geminationVOT$voiced.unvoiced), labels=c("Voiceless","Voiced")) +
facet_wrap(~singGem,ncol=1,labeller=labeller(singGem = labels))
ggIntensityRelOnsetByVoice
After running our statistical model above, we wanted to evaluate whether the differences observed between “voicing” and “gemination” is statistically significant (or not). We present the means and standard deviations for each of the four-way contrast based on the predicted (or fitted) values.
The predicted means and SDs of the Intensity at the Onset of the Release phase (RelOnset) are provided below. As can be seen, there are already differences observed related to both “voicing” and “gemination”.
# using "dplyr"
geminationVOT %>% group_by(singGem:voiced.unvoiced) %>% summarise(mean(fitted_RelIntensityOnset), sd(fitted_RelIntensityOnset))
We used the pairwise t-test as available in base R. The input was the predicted (or fitted) values as adjusted by our statistical model. It was not possible to directly use our statistical model’s output to compare models, given that our LMM model was looking at overall change as a main effect rather than as a simple effect’s model. This is why we have used the predicted (or fitted) values as adjusted by our statistical model; the results are the same if we were to use the “actual” results rather than the predicted ones. We used FDR (for False Discovery Rate) correction for multiple comparisons as this allowed for correcting for bias in the differences in doing multiple comparisons.
pairwise.t.test(geminationVOT$fitted_RelIntensityOnset, geminationVOT$singGem:geminationVOT$voiced.unvoiced, p.adj = "fdr")
Pairwise comparisons using t tests with pooled SD
data: geminationVOT$fitted_RelIntensityOnset and geminationVOT$singGem:geminationVOT$voiced.unvoiced
singleton:voiceless singleton:voiced
singleton:voiced < 2e-16 -
geminate:voiceless 0.078 < 2e-16
geminate:voiced < 2e-16 9.7e-16
geminate:voiceless
singleton:voiced -
geminate:voiceless -
geminate:voiced < 2e-16
P value adjustment method: fdr
We run three models. For model comparisons, we are required to use Maximum Likelihood (ML) on both Optimal and Null models (by adding REML=F). Then we run a second Optimal models with Restricted Maximum Likelihood (REML), to obtain the coefficients (by adding REML=T)
intensityRelOffset.xmdl.Full.M <- lmer(intensityOffset~voiced.unvoiced_c+singGem_c+sex_c+vowelLength_c+syllType_c+place_c+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationVOT,na.action=na.exclude,
REML=F,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
intensityRelOffset.xmdl.Full.R <- lmer(intensityOffset~voiced.unvoiced_c+singGem_c+sex_c+vowelLength_c+syllType_c+place_c+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationVOT,na.action=na.exclude,
REML=T,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
intensityRelOffset.xmdl.Null.M <- lmer(intensityOffset~1+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationVOT,na.action=na.exclude,
REML=F,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
anova(intensityRelOffset.xmdl.Null.M,intensityRelOffset.xmdl.Full.M)
Data: geminationVOT
Models:
intensityRelOffset.xmdl.Null.M: intensityOffset ~ 1 + (voiced.unvoiced_c + singGem_c + vowelLength_c +
intensityRelOffset.xmdl.Null.M: syllType_c + place_c || speaker) + (1 | wordTarget)
intensityRelOffset.xmdl.Full.M: intensityOffset ~ voiced.unvoiced_c + singGem_c + sex_c + vowelLength_c +
intensityRelOffset.xmdl.Full.M: syllType_c + place_c + (voiced.unvoiced_c + singGem_c + vowelLength_c +
intensityRelOffset.xmdl.Full.M: syllType_c + place_c || speaker) + (1 | wordTarget)
Df AIC BIC logLik deviance Chisq
intensityRelOffset.xmdl.Null.M 9 8651.3 8700.7 -4316.6 8633.3
intensityRelOffset.xmdl.Full.M 15 8566.1 8648.4 -4268.0 8536.1 97.2
Chi Df Pr(>Chisq)
intensityRelOffset.xmdl.Null.M
intensityRelOffset.xmdl.Full.M 6 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
As can be seen, our optimal model improved the model fit.
We inspect our model below, by looking at the residuals. There are minor deviations observed, due to the nature of our data.
hist(residuals(intensityRelOffset.xmdl.Full.R))
qqnorm(residuals(intensityRelOffset.xmdl.Full.R)); qqline(residuals(intensityRelOffset.xmdl.Full.R))
plot(fitted(intensityRelOffset.xmdl.Full.R),residuals(intensityRelOffset.xmdl.Full.R), cex = 4)
summary(intensityRelOffset.xmdl.Full.R)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula:
intensityOffset ~ voiced.unvoiced_c + singGem_c + sex_c + vowelLength_c +
syllType_c + place_c + (voiced.unvoiced_c + singGem_c + vowelLength_c +
syllType_c + place_c || speaker) + (1 | wordTarget)
Data: geminationVOT
Control:
lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e+05))
REML criterion at convergence: 8534.2
Scaled residuals:
Min 1Q Median 3Q Max
-3.4273 -0.6503 -0.0024 0.6486 4.5428
Random effects:
Groups Name Variance Std.Dev.
wordTarget (Intercept) 0.6806 0.8250
speaker place_c 0.4863 0.6974
speaker.1 syllType_c 1.2283 1.1083
speaker.2 vowelLength_c 0.3749 0.6123
speaker.3 singGem_c 0.2322 0.4819
speaker.4 voiced.unvoiced_c 1.0280 1.0139
speaker.5 (Intercept) 16.2430 4.0303
Residual 5.7548 2.3989
Number of obs: 1793, groups: wordTarget, 98; speaker, 20
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 64.5944 0.9193 19.1496 70.265 < 2e-16 ***
voiced.unvoiced_c 1.2817 0.3347 45.3702 3.830 0.000391 ***
singGem_c -0.7361 0.2509 57.9888 -2.933 0.004794 **
sex_c -4.3568 1.8100 17.9928 -2.407 0.027037 *
vowelLength_c -2.4445 0.2691 55.8602 -9.085 1.34e-12 ***
syllType_c 0.8141 0.3854 47.0427 2.113 0.039969 *
place_c -1.6463 0.3594 59.2184 -4.581 2.43e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) vcd.n_ sngGm_ sex_c vwlLn_ syllT_
vocd.nvcd_c 0.027
singGem_c -0.047 -0.025
sex_c -0.001 0.000 0.001
vowlLngth_c 0.056 -0.094 0.063 0.000
syllType_c 0.091 0.023 -0.269 -0.001 0.186
place_c 0.096 0.354 -0.010 -0.001 -0.045 0.022
The figure presented below, uses the predicted (or fitted) values; these are the values adjusted by the statistical model. These take into account the random effects’ structure as well as the fixed effects’ structure.
The following code is used to record the predicted (or fitted) values into the same data-frame. This will then be used to plot the results of the model
geminationVOT$fitted_RelIntensityOffset <- predict(lmer(intensityOffset~voiced.unvoiced_c+singGem_c+sex_c+vowelLength_c+syllType_c+place_c+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationVOT,na.action=na.exclude,
REML=T,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
)
We use “ggplot2” to plot the results. This will use the “predicted” (or fitted) values adjusted by our statistical model. The dashed lines within the boxplots are the mean values; the median is marked by the horizontal straight lines.
labels <- c(singleton = "Singleton", geminate = "Geminate")
ggIntensityRelOffsetByVoice <- ggplot(geminationVOT, aes(y=fitted_RelIntensityOffset, x=voiced.unvoiced)) +
geom_boxplot(outlier.colour = "NA",na.rm=TRUE) +
stat_summary(fun.y = mean, geom = "errorbar", aes(ymax = ..y.., ymin = ..y..),
width = .75, linetype = "dashed") +
ylab(expression(paste("Intensity (dB)"))) + labs(title = "Rel Offset") +
theme_set(theme_bw()) + theme(text=element_text(size=20)) +
coord_cartesian(ylim = c(50,80)) +
theme(axis.title.x = element_blank()) + theme(legend.title=element_blank()) +
scale_x_discrete(breaks=levels(geminationVOT$voiced.unvoiced), labels=c("Voiceless","Voiced")) +
facet_wrap(~singGem,ncol=1,labeller=labeller(singGem = labels))
ggIntensityRelOffsetByVoice
After running our statistical model above, we wanted to evaluate whether the differences observed between “voicing” and “gemination” is statistically significant (or not). We present the means and standard deviations for each of the four-way contrast based on the predicted (or fitted) values.
The predicted means and SDs of the Intensity at the Offset of the Release phase (RelOffset) are provided below. As can be seen, there are already differences observed related to both “voicing” and “gemination”.
# using "dplyr"
geminationVOT %>% group_by(singGem:voiced.unvoiced) %>% summarise(mean(fitted_RelIntensityOffset), sd(fitted_RelIntensityOffset))
We used the pairwise t-test as available in base R. The input was the predicted (or fitted) values as adjusted by our statistical model. It was not possible to directly use our statistical model’s output to compare models, given that our LMM model was looking at overall change as a main effect rather than as a simple effect’s model. This is why we have used the predicted (or fitted) values as adjusted by our statistical model; the results are the same if we were to use the “actual” results rather than the predicted ones. We used FDR (for False Discovery Rate) correction for multiple comparisons as this allowed for correcting for bias in the differences in doing multiple comparisons.
pairwise.t.test(geminationVOT$fitted_RelIntensityOffset, geminationVOT$singGem:geminationVOT$voiced.unvoiced, p.adj = "fdr")
Pairwise comparisons using t tests with pooled SD
data: geminationVOT$fitted_RelIntensityOffset and geminationVOT$singGem:geminationVOT$voiced.unvoiced
singleton:voiceless singleton:voiced
singleton:voiced 2.7e-08 -
geminate:voiceless 0.12 2.0e-05
geminate:voiced 2.7e-08 0.73
geminate:voiceless
singleton:voiced -
geminate:voiceless -
geminate:voiced 2.2e-05
P value adjustment method: fdr
We run three models. For model comparisons, we are required to use Maximum Likelihood (ML) on both Optimal and Null models (by adding REML=F). Then we run a second Optimal models with Restricted Maximum Likelihood (REML), to obtain the coefficients (by adding REML=T)
intensityV2Onset.xmdl.Full.M <- lmer(intensityOnset~voiced.unvoiced_c+singGem_c+sex_c+vowelLength_c+syllType_c+place_c+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationV2,na.action=na.exclude,
REML=F,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
intensityV2Onset.xmdl.Full.R <- lmer(intensityOnset~voiced.unvoiced_c+singGem_c+sex_c+vowelLength_c+syllType_c+place_c+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationV2,na.action=na.exclude,
REML=T,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
intensityV2Onset.xmdl.Null.M <- lmer(intensityOnset~1+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationV2,na.action=na.exclude,
REML=F,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
anova(intensityV2Onset.xmdl.Null.M,intensityV2Onset.xmdl.Full.M)
Data: geminationV2
Models:
intensityV2Onset.xmdl.Null.M: intensityOnset ~ 1 + (voiced.unvoiced_c + singGem_c + vowelLength_c +
intensityV2Onset.xmdl.Null.M: syllType_c + place_c || speaker) + (1 | wordTarget)
intensityV2Onset.xmdl.Full.M: intensityOnset ~ voiced.unvoiced_c + singGem_c + sex_c + vowelLength_c +
intensityV2Onset.xmdl.Full.M: syllType_c + place_c + (voiced.unvoiced_c + singGem_c + vowelLength_c +
intensityV2Onset.xmdl.Full.M: syllType_c + place_c || speaker) + (1 | wordTarget)
Df AIC BIC logLik deviance Chisq
intensityV2Onset.xmdl.Null.M 9 8549.7 8599.1 -4265.9 8531.7
intensityV2Onset.xmdl.Full.M 15 8468.1 8550.5 -4219.1 8438.1 93.575
Chi Df Pr(>Chisq)
intensityV2Onset.xmdl.Null.M
intensityV2Onset.xmdl.Full.M 6 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
As can be seen, our optimal model improved the model fit.
We inspect our model below, by looking at the residuals. There are minor deviations observed, due to the nature of our data.
hist(residuals(intensityV2Onset.xmdl.Full.R))
qqnorm(residuals(intensityV2Onset.xmdl.Full.R)); qqline(residuals(intensityV2Onset.xmdl.Full.R))
plot(fitted(intensityV2Onset.xmdl.Full.R),residuals(intensityV2Onset.xmdl.Full.R), cex = 4)
summary(intensityV2Onset.xmdl.Full.R)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula:
intensityOnset ~ voiced.unvoiced_c + singGem_c + sex_c + vowelLength_c +
syllType_c + place_c + (voiced.unvoiced_c + singGem_c + vowelLength_c +
syllType_c + place_c || speaker) + (1 | wordTarget)
Data: geminationV2
Control:
lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e+05))
REML criterion at convergence: 8435.8
Scaled residuals:
Min 1Q Median 3Q Max
-3.4796 -0.6459 -0.0052 0.6282 3.8583
Random effects:
Groups Name Variance Std.Dev.
wordTarget (Intercept) 0.6972 0.8350
speaker place_c 0.5136 0.7166
speaker.1 syllType_c 1.4878 1.2198
speaker.2 vowelLength_c 0.1494 0.3866
speaker.3 singGem_c 0.3525 0.5937
speaker.4 voiced.unvoiced_c 1.6696 1.2921
speaker.5 (Intercept) 17.4143 4.1730
Residual 5.3894 2.3215
Number of obs: 1793, groups: wordTarget, 98; speaker, 20
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 67.1891 0.9506 19.0806 70.684 < 2e-16 ***
voiced.unvoiced_c 0.7896 0.3793 37.7429 2.082 0.044213 *
singGem_c -0.4436 0.2623 56.6391 -1.691 0.096298 .
sex_c -6.0558 1.8732 17.9905 -3.233 0.004619 **
vowelLength_c -2.4906 0.2468 65.2472 -10.093 5.9e-15 ***
syllType_c 0.9921 0.4015 44.9078 2.471 0.017328 *
place_c -1.4757 0.3607 59.4414 -4.091 0.000131 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) vcd.n_ sngGm_ sex_c vwlLn_ syllT_
vocd.nvcd_c 0.023
singGem_c -0.043 -0.021
sex_c 0.000 0.000 0.000
vowlLngth_c 0.058 -0.090 0.065 0.000
syllType_c 0.084 0.019 -0.246 -0.001 0.194
place_c 0.092 0.310 -0.010 -0.001 -0.049 0.021
The figure presented below, uses the predicted (or fitted) values; these are the values adjusted by the statistical model. These take into account the random effects’ structure as well as the fixed effects’ structure.
The following code is used to record the predicted (or fitted) values into the same data-frame. This will then be used to plot the results of the model
geminationV2$fitted_V2IntensityOnset <- predict(lmer(intensityOnset~voiced.unvoiced_c+singGem_c+sex_c+vowelLength_c+syllType_c+place_c+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationV2,na.action=na.exclude,
REML=T,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
)
We use “ggplot2” to plot the results. This will use the “predicted” (or fitted) values adjusted by our statistical model. The dashed lines within the boxplots are the mean values; the median is marked by the horizontal straight lines.
labels <- c(singleton = "Singleton", geminate = "Geminate")
ggIntensityV2OnsetByVoice <- ggplot(geminationV2, aes(y=fitted_V2IntensityOnset, x=voiced.unvoiced)) +
geom_boxplot(outlier.colour = "NA",na.rm=TRUE) +
stat_summary(fun.y = mean, geom = "errorbar", aes(ymax = ..y.., ymin = ..y..),
width = .75, linetype = "dashed") +
ylab(expression(paste("Intensity (dB)"))) + labs(title = "V2 Onset") +
theme_set(theme_bw()) + theme(text=element_text(size=20)) +
coord_cartesian(ylim = c(50,80)) +
theme(axis.title.x = element_blank()) + theme(legend.title=element_blank()) +
scale_x_discrete(breaks=levels(geminationV2$voiced.unvoiced), labels=c("Voiceless","Voiced")) +
facet_wrap(~singGem,ncol=1,labeller=labeller(singGem = labels))
ggIntensityV2OnsetByVoice
After running our statistical model above, we wanted to evaluate whether the differences observed between “voicing” and “gemination” is statistically significant (or not). We present the means and standard deviations for each of the four-way contrast based on the predicted (or fitted) values.
The predicted means and SDs of the Intensity at the Onset of the following vowel (V2Onset) are provided below. As can be seen, there are already differences observed related to both “voicing” and “gemination”.
# using "dplyr"
geminationV2 %>% group_by(singGem:voiced.unvoiced) %>% summarise(mean(fitted_V2IntensityOnset), sd(fitted_V2IntensityOnset))
We used the pairwise t-test as available in base R. The input was the predicted (or fitted) values as adjusted by our statistical model. It was not possible to directly use our statistical model’s output to compare models, given that our LMM model was looking at overall change as a main effect rather than as a simple effect’s model. This is why we have used the predicted (or fitted) values as adjusted by our statistical model; the results are the same if we were to use the “actual” results rather than the predicted ones. We used FDR (for False Discovery Rate) correction for multiple comparisons as this allowed for correcting for bias in the differences in doing multiple comparisons.
pairwise.t.test(geminationV2$fitted_V2IntensityOnset, geminationV2$singGem:geminationV2$voiced.unvoiced, p.adj = "fdr")
Pairwise comparisons using t tests with pooled SD
data: geminationV2$fitted_V2IntensityOnset and geminationV2$singGem:geminationV2$voiced.unvoiced
singleton:voiceless singleton:voiced
singleton:voiced 0.0031 -
geminate:voiceless 0.1039 0.1504
geminate:voiced 3e-05 0.2000
geminate:voiceless
singleton:voiced -
geminate:voiceless -
geminate:voiced 0.0087
P value adjustment method: fdr
Within the fundamental frequency f0, we have computed f0 at two locations: At the Offset of the preceding vowel (V1Offset) and at the Onset of the following vowel (V2Offset).
We run three models. For model comparisons, we are required to use Maximum Likelihood (ML) on both Optimal and Null models (by adding REML=F). Then we run a second Optimal models with Restricted Maximum Likelihood (REML), to obtain the coefficients (by adding REML=T)
f0V1Offset.xmdl.Full.M <- lmer(f0Offset~voiced.unvoiced_c+singGem_c+sex_c+vowelLength_c+syllType_c+place_c+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationV1,na.action=na.exclude,
REML=F,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
f0V1Offset.xmdl.Full.R <- lmer(f0Offset~voiced.unvoiced_c+singGem_c+sex_c+vowelLength_c+syllType_c+place_c+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationV1,na.action=na.exclude,
REML=T,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
f0V1Offset.xmdl.Null.M <- lmer(f0Offset~1+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationV1,na.action=na.exclude,
REML=F,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
anova(f0V1Offset.xmdl.Null.M,f0V1Offset.xmdl.Full.M)
Data: geminationV1
Models:
f0V1Offset.xmdl.Null.M: f0Offset ~ 1 + (voiced.unvoiced_c + singGem_c + vowelLength_c +
f0V1Offset.xmdl.Null.M: syllType_c + place_c || speaker) + (1 | wordTarget)
f0V1Offset.xmdl.Full.M: f0Offset ~ voiced.unvoiced_c + singGem_c + sex_c + vowelLength_c +
f0V1Offset.xmdl.Full.M: syllType_c + place_c + (voiced.unvoiced_c + singGem_c + vowelLength_c +
f0V1Offset.xmdl.Full.M: syllType_c + place_c || speaker) + (1 | wordTarget)
Df AIC BIC logLik deviance Chisq Chi Df
f0V1Offset.xmdl.Null.M 9 14274 14324 -7128.1 14256
f0V1Offset.xmdl.Full.M 15 14173 14256 -7071.7 14143 112.75 6
Pr(>Chisq)
f0V1Offset.xmdl.Null.M
f0V1Offset.xmdl.Full.M < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
As can be seen, our optimal model improved the model fit.
We inspect our model below, by looking at the residuals. There are minor deviations observed, due to the nature of our data.
hist(residuals(f0V1Offset.xmdl.Full.R))
qqnorm(residuals(f0V1Offset.xmdl.Full.R)); qqline(residuals(f0V1Offset.xmdl.Full.R))
plot(fitted(f0V1Offset.xmdl.Full.R),residuals(f0V1Offset.xmdl.Full.R), cex = 4)
summary(f0V1Offset.xmdl.Full.R)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula:
f0Offset ~ voiced.unvoiced_c + singGem_c + sex_c + vowelLength_c +
syllType_c + place_c + (voiced.unvoiced_c + singGem_c + vowelLength_c +
syllType_c + place_c || speaker) + (1 | wordTarget)
Data: geminationV1
Control:
lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e+05))
REML criterion at convergence: 14118.1
Scaled residuals:
Min 1Q Median 3Q Max
-3.5284 -0.4831 -0.0411 0.5022 5.6981
Random effects:
Groups Name Variance Std.Dev.
wordTarget (Intercept) 13.704 3.702
speaker place_c 0.000 0.000
speaker.1 syllType_c 72.720 8.528
speaker.2 vowelLength_c 105.182 10.256
speaker.3 singGem_c 5.242 2.290
speaker.4 voiced.unvoiced_c 14.319 3.784
speaker.5 (Intercept) 435.365 20.865
Residual 128.927 11.355
Number of obs: 1793, groups: wordTarget, 98; speaker, 20
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 161.417 4.739 18.894 34.064 < 2e-16 ***
voiced.unvoiced_c -4.657 1.407 55.014 -3.310 0.00165 **
singGem_c 4.557 1.155 54.227 3.946 0.00023 ***
sex_c 89.850 9.362 17.997 9.597 1.68e-08 ***
vowelLength_c -12.207 2.526 24.199 -4.833 6.22e-05 ***
syllType_c -21.748 2.335 30.551 -9.313 1.96e-10 ***
place_c 0.401 1.476 91.648 0.272 0.78649
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) vcd.n_ sngGm_ sex_c vwlLn_ syllT_
vocd.nvcd_c 0.026
singGem_c -0.041 -0.027
sex_c 0.000 0.001 0.001
vowlLngth_c 0.024 -0.050 0.030 0.000
syllType_c 0.061 0.019 -0.201 -0.001 0.068
place_c 0.094 0.427 -0.011 0.000 -0.024 0.019
The figure presented below, uses the predicted (or fitted) values; these are the values adjusted by the statistical model. These take into account the random effects’ structure as well as the fixed effects’ structure.
The following code is used to record the predicted (or fitted) values into the same data-frame. This will then be used to plot the results of the model
geminationV1$fitted_V1F0Offset <- predict(lmer(f0Offset~voiced.unvoiced_c+singGem_c+sex_c+vowelLength_c+syllType_c+place_c+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationV1,na.action=na.exclude,
REML=T,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
)
We use “ggplot2” to plot the results. This will use the “predicted” (or fitted) values adjusted by our statistical model. The dashed lines within the boxplots are the mean values; the median is marked by the horizontal straight lines.
labels <- c(singleton = "Singleton", geminate = "Geminate")
ggF0V1OffsetByVoice <- ggplot(geminationV1, aes(y=fitted_V1F0Offset, x=voiced.unvoiced)) +
geom_boxplot(outlier.colour = "NA",na.rm=TRUE) +
stat_summary(fun.y = mean, geom = "errorbar", aes(ymax = ..y.., ymin = ..y..),
width = .75, linetype = "dashed") +
ylab(expression(paste("F0 (Hz)"))) + labs(title = "V1 Offset") +
theme_set(theme_bw()) + theme(text=element_text(size=20)) +
coord_cartesian(ylim = c(80,300)) +
theme(axis.title.x = element_blank()) + theme(legend.title=element_blank()) +
scale_x_discrete(breaks=levels(geminationV1$voiced.unvoiced), labels=c("Voiceless","Voiced")) +
facet_wrap(~singGem,ncol=1,labeller=labeller(singGem = labels))
ggF0V1OffsetByVoice
After running our statistical model above, we wanted to evaluate whether the differences observed between “voicing” and “gemination” is statistically significant (or not). We present the means and standard deviations for each of the four-way contrast based on the predicted (or fitted) values.
The predicted means and SDs of the F0 at the Offset of the preceding vowel (V1Offset) are provided below. As can be seen, there are already differences observed related to both “voicing” and “gemination”.
# using "dplyr"
geminationV1 %>% group_by(singGem:voiced.unvoiced) %>% summarise(mean(fitted_V1F0Offset), sd(fitted_V1F0Offset))
We used the pairwise t-test as available in base R. The input was the predicted (or fitted) values as adjusted by our statistical model. It was not possible to directly use our statistical model’s output to compare models, given that our LMM model was looking at overall change as a main effect rather than as a simple effect’s model. This is why we have used the predicted (or fitted) values as adjusted by our statistical model; the results are the same if we were to use the “actual” results rather than the predicted ones. We used FDR (for False Discovery Rate) correction for multiple comparisons as this allowed for correcting for bias in the differences in doing multiple comparisons.
pairwise.t.test(geminationV1$fitted_V1F0Offset, geminationV1$singGem:geminationV1$voiced.unvoiced, p.adj = "fdr")
Pairwise comparisons using t tests with pooled SD
data: geminationV1$fitted_V1F0Offset and geminationV1$singGem:geminationV1$voiced.unvoiced
singleton:voiceless singleton:voiced
singleton:voiced 0.06 -
geminate:voiceless 0.44 0.20
geminate:voiced 0.06 0.94
geminate:voiceless
singleton:voiced -
geminate:voiceless -
geminate:voiced 0.20
P value adjustment method: fdr
We run three models. For model comparisons, we are required to use Maximum Likelihood (ML) on both Optimal and Null models (by adding REML=F). Then we run a second Optimal models with Restricted Maximum Likelihood (REML), to obtain the coefficients (by adding REML=T)
f0V2Onset.xmdl.Full.M <- lmer(f0Onset~voiced.unvoiced_c+singGem_c+sex_c+vowelLength_c+syllType_c+place_c+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationV2,na.action=na.exclude,
REML=F,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
f0V2Onset.xmdl.Full.R <- lmer(f0Onset~voiced.unvoiced_c+singGem_c+sex_c+vowelLength_c+syllType_c+place_c+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationV2,na.action=na.exclude,
REML=T,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
f0V2Onset.xmdl.Null.M <- lmer(f0Onset~1+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationV2,na.action=na.exclude,
REML=F,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
anova(f0V2Onset.xmdl.Null.M,f0V2Onset.xmdl.Full.M)
Data: geminationV2
Models:
f0V2Onset.xmdl.Null.M: f0Onset ~ 1 + (voiced.unvoiced_c + singGem_c + vowelLength_c +
f0V2Onset.xmdl.Null.M: syllType_c + place_c || speaker) + (1 | wordTarget)
f0V2Onset.xmdl.Full.M: f0Onset ~ voiced.unvoiced_c + singGem_c + sex_c + vowelLength_c +
f0V2Onset.xmdl.Full.M: syllType_c + place_c + (voiced.unvoiced_c + singGem_c + vowelLength_c +
f0V2Onset.xmdl.Full.M: syllType_c + place_c || speaker) + (1 | wordTarget)
Df AIC BIC logLik deviance Chisq Chi Df
f0V2Onset.xmdl.Null.M 9 14956 15006 -7469.2 14938
f0V2Onset.xmdl.Full.M 15 14898 14980 -7433.8 14868 70.926 6
Pr(>Chisq)
f0V2Onset.xmdl.Null.M
f0V2Onset.xmdl.Full.M 2.64e-13 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
As can be seen, our optimal model improved the model fit.
We inspect our model below, by looking at the residuals. There are minor deviations observed, due to the nature of our data.
hist(residuals(f0V2Onset.xmdl.Full.R))
qqnorm(residuals(f0V2Onset.xmdl.Full.R)); qqline(residuals(f0V2Onset.xmdl.Full.R))
plot(fitted(f0V2Onset.xmdl.Full.R),residuals(f0V2Onset.xmdl.Full.R), cex = 4)
summary(f0V2Onset.xmdl.Full.R)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula:
f0Onset ~ voiced.unvoiced_c + singGem_c + sex_c + vowelLength_c +
syllType_c + place_c + (voiced.unvoiced_c + singGem_c + vowelLength_c +
syllType_c + place_c || speaker) + (1 | wordTarget)
Data: geminationV2
Control:
lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e+05))
REML criterion at convergence: 14840.7
Scaled residuals:
Min 1Q Median 3Q Max
-3.8461 -0.5214 -0.0433 0.4939 6.3033
Random effects:
Groups Name Variance Std.Dev.
wordTarget (Intercept) 19.61 4.429
speaker place_c 14.50 3.807
speaker.1 syllType_c 206.14 14.358
speaker.2 vowelLength_c 28.80 5.367
speaker.3 singGem_c 16.04 4.005
speaker.4 voiced.unvoiced_c 28.51 5.339
speaker.5 (Intercept) 364.99 19.105
Residual 194.22 13.936
Number of obs: 1793, groups: wordTarget, 98; speaker, 20
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 166.589 4.388 19.466 37.968 < 2e-16 ***
voiced.unvoiced_c -4.001 1.807 44.099 -2.213 0.032089 *
singGem_c -3.419 1.538 46.068 -2.223 0.031150 *
sex_c 86.824 8.605 18.005 10.090 7.75e-09 ***
vowelLength_c -3.551 1.753 39.840 -2.026 0.049511 *
syllType_c 15.306 3.599 25.112 4.252 0.000257 ***
place_c 3.116 1.977 57.301 1.576 0.120614
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) vcd.n_ sngGm_ sex_c vwlLn_ syllT_
vocd.nvcd_c 0.032
singGem_c -0.049 -0.023
sex_c -0.001 0.000 0.001
vowlLngth_c 0.054 -0.081 0.048 -0.001
syllType_c 0.062 0.014 -0.143 -0.001 0.093
place_c 0.111 0.362 -0.009 -0.001 -0.039 0.013
The figure presented below, uses the predicted (or fitted) values; these are the values adjusted by the statistical model. These take into account the random effects’ structure as well as the fixed effects’ structure.
The following code is used to record the predicted (or fitted) values into the same data-frame. This will then be used to plot the results of the model
geminationV2$fitted_V2F0Onset <- predict(lmer(f0Onset~voiced.unvoiced_c+singGem_c+sex_c+vowelLength_c+syllType_c+place_c+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationV2,na.action=na.exclude,
REML=T,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
)
We use “ggplot2” to plot the results. This will use the “predicted” (or fitted) values adjusted by our statistical model. The dashed lines within the boxplots are the mean values; the median is marked by the horizontal straight lines.
labels <- c(singleton = "Singleton", geminate = "Geminate")
ggF0V2OnsetByVoice <- ggplot(geminationV2, aes(y=fitted_V2F0Onset, x=voiced.unvoiced)) +
geom_boxplot(outlier.colour = "NA",na.rm=TRUE) +
stat_summary(fun.y = mean, geom = "errorbar", aes(ymax = ..y.., ymin = ..y..),
width = .75, linetype = "dashed") +
ylab(expression(paste("F0 (Hz)"))) + labs(title = "V2 Onset") +
theme_set(theme_bw()) + theme(text=element_text(size=20)) +
coord_cartesian(ylim = c(80,300)) +
theme(axis.title.x = element_blank()) + theme(legend.title=element_blank()) +
scale_x_discrete(breaks=levels(geminationV2$voiced.unvoiced), labels=c("Voiceless","Voiced")) +
facet_wrap(~singGem,ncol=1,labeller=labeller(singGem = labels))
ggF0V2OnsetByVoice
After running our statistical model above, we wanted to evaluate whether the differences observed between “voicing” and “gemination” is statistically significant (or not). We present the means and standard deviations for each of the four-way contrast based on the predicted (or fitted) values.
The predicted means and SDs of the F0 at the Onset of the following vowel (V2Onset) are provided below. As can be seen, there are already differences observed related to both “voicing” and “gemination”.
# using "dplyr"
geminationV2 %>% group_by(singGem:voiced.unvoiced) %>% summarise(mean(fitted_V2F0Onset), sd(fitted_V2F0Onset))
We used the pairwise t-test as available in base R. The input was the predicted (or fitted) values as adjusted by our statistical model. It was not possible to directly use our statistical model’s output to compare models, given that our LMM model was looking at overall change as a main effect rather than as a simple effect’s model. This is why we have used the predicted (or fitted) values as adjusted by our statistical model; the results are the same if we were to use the “actual” results rather than the predicted ones. We used FDR (for False Discovery Rate) correction for multiple comparisons as this allowed for correcting for bias in the differences in doing multiple comparisons.
pairwise.t.test(geminationV2$fitted_V2F0Onset, geminationV2$singGem:geminationV2$voiced.unvoiced, p.adj = "fdr")
Pairwise comparisons using t tests with pooled SD
data: geminationV2$fitted_V2F0Onset and geminationV2$singGem:geminationV2$voiced.unvoiced
singleton:voiceless singleton:voiced
singleton:voiced 0.0741 -
geminate:voiceless 0.3912 0.0053
geminate:voiced 0.1380 0.5774
geminate:voiceless
singleton:voiced -
geminate:voiceless -
geminate:voiced 0.0090
P value adjustment method: fdr
Within the first formant frequency (F1), we have computed F1 at two locations: At the Offset of the preceding vowel (V1Offset) and at the Onset of the following vowel (V2Offset).
We run three models. For model comparisons, we are required to use Maximum Likelihood (ML) on both Optimal and Null models (by adding REML=F). Then we run a second Optimal models with Restricted Maximum Likelihood (REML), to obtain the coefficients (by adding REML=T)
f1V1Offset.xmdl.Full.M <- lmer(f1Offset~voiced.unvoiced_c+singGem_c+sex_c+vowelLength_c+syllType_c+place_c+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationV1,na.action=na.exclude,
REML=F,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
f1V1Offset.xmdl.Full.R <- lmer(f1Offset~voiced.unvoiced_c+singGem_c+sex_c+vowelLength_c+syllType_c+place_c+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationV1,na.action=na.exclude,
REML=T,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
f1V1Offset.xmdl.Null.M <- lmer(f1Offset~1+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationV1,na.action=na.exclude,
REML=F,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
anova(f1V1Offset.xmdl.Null.M,f1V1Offset.xmdl.Full.M)
Data: geminationV1
Models:
f1V1Offset.xmdl.Null.M: f1Offset ~ 1 + (voiced.unvoiced_c + singGem_c + vowelLength_c +
f1V1Offset.xmdl.Null.M: syllType_c + place_c || speaker) + (1 | wordTarget)
f1V1Offset.xmdl.Full.M: f1Offset ~ voiced.unvoiced_c + singGem_c + sex_c + vowelLength_c +
f1V1Offset.xmdl.Full.M: syllType_c + place_c + (voiced.unvoiced_c + singGem_c + vowelLength_c +
f1V1Offset.xmdl.Full.M: syllType_c + place_c || speaker) + (1 | wordTarget)
Df AIC BIC logLik deviance Chisq Chi Df
f1V1Offset.xmdl.Null.M 9 21072 21122 -10527 21054
f1V1Offset.xmdl.Full.M 15 21029 21111 -10500 20999 55.37 6
Pr(>Chisq)
f1V1Offset.xmdl.Null.M
f1V1Offset.xmdl.Full.M 3.902e-10 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
As can be seen, our optimal model improved the model fit.
We inspect our model below, by looking at the residuals. There are minor deviations observed, due to the nature of our data.
hist(residuals(f1V1Offset.xmdl.Full.R))
qqnorm(residuals(f1V1Offset.xmdl.Full.R)); qqline(residuals(f1V1Offset.xmdl.Full.R))
plot(fitted(f1V1Offset.xmdl.Full.R),residuals(f1V1Offset.xmdl.Full.R), cex = 4)
summary(f1V1Offset.xmdl.Full.R)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula:
f1Offset ~ voiced.unvoiced_c + singGem_c + sex_c + vowelLength_c +
syllType_c + place_c + (voiced.unvoiced_c + singGem_c + vowelLength_c +
syllType_c + place_c || speaker) + (1 | wordTarget)
Data: geminationV1
Control:
lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e+05))
REML criterion at convergence: 20948.7
Scaled residuals:
Min 1Q Median 3Q Max
-3.8871 -0.5834 -0.0374 0.4792 6.0431
Random effects:
Groups Name Variance Std.Dev.
wordTarget (Intercept) 2345.9 48.43
speaker place_c 656.3 25.62
speaker.1 syllType_c 2997.7 54.75
speaker.2 vowelLength_c 874.6 29.57
speaker.3 singGem_c 0.0 0.00
speaker.4 voiced.unvoiced_c 1266.7 35.59
speaker.5 (Intercept) 1678.2 40.97
Residual 5780.7 76.03
Number of obs: 1793, groups: wordTarget, 98; speaker, 20
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 500.385 13.113 54.128 38.159 < 2e-16 ***
voiced.unvoiced_c -57.289 15.005 81.271 -3.818 0.000262 ***
singGem_c -1.322 11.689 92.445 -0.113 0.910211
sex_c 70.557 19.127 18.080 3.689 0.001669 **
vowelLength_c -52.440 13.664 84.584 -3.838 0.000239 ***
syllType_c -85.399 19.557 68.263 -4.367 4.39e-05 ***
place_c -64.575 17.674 91.021 -3.654 0.000432 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) vcd.n_ sngGm_ sex_c vwlLn_ syllT_
vocd.nvcd_c 0.114
singGem_c -0.190 -0.031
sex_c -0.004 0.000 0.001
vowlLngth_c 0.205 -0.111 0.070 -0.001
syllType_c 0.337 0.026 -0.301 -0.002 0.194
place_c 0.364 0.429 -0.012 -0.003 -0.048 0.023
The figure presented below, uses the predicted (or fitted) values; these are the values adjusted by the statistical model. These take into account the random effects’ structure as well as the fixed effects’ structure.
The following code is used to record the predicted (or fitted) values into the same data-frame. This will then be used to plot the results of the model
geminationV1$fitted_V1F1Offset <- predict(lmer(f1Offset~voiced.unvoiced_c+singGem_c+sex_c+vowelLength_c+syllType_c+place_c+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationV1,na.action=na.exclude,
REML=T,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
)
We use “ggplot2” to plot the results. This will use the “predicted” (or fitted) values adjusted by our statistical model. The dashed lines within the boxplots are the mean values; the median is marked by the horizontal straight lines.
labels <- c(singleton = "Singleton", geminate = "Geminate")
ggF1V1OffsetByVoice <- ggplot(geminationV1, aes(y=fitted_V1F1Offset, x=voiced.unvoiced)) +
geom_boxplot(outlier.colour = "NA",na.rm=TRUE) +
stat_summary(fun.y = mean, geom = "errorbar", aes(ymax = ..y.., ymin = ..y..),
width = .75, linetype = "dashed") +
ylab(expression(paste("F1 (Hz)"))) + labs(title = "V1 Offset") +
theme_set(theme_bw()) + theme(text=element_text(size=20)) +
coord_cartesian(ylim = c(300,900)) +
theme(axis.title.x = element_blank()) + theme(legend.title=element_blank()) +
scale_x_discrete(breaks=levels(geminationV1$voiced.unvoiced), labels=c("Voiceless","Voiced")) +
facet_wrap(~singGem,ncol=1,labeller=labeller(singGem = labels))
ggF1V1OffsetByVoice
After running our statistical model above, we wanted to evaluate whether the differences observed between “voicing” and “gemination” is statistically significant (or not). We present the means and standard deviations for each of the four-way contrast based on the predicted (or fitted) values.
The predicted means and SDs of the F1 at the Offset of the preceding vowel (V1Offset) are provided below. As can be seen, there are already differences observed related to both “voicing” and “gemination”.
# using "dplyr"
geminationV1 %>% group_by(singGem:voiced.unvoiced) %>% summarise(mean(fitted_V1F1Offset), sd(fitted_V1F1Offset))
We used the pairwise t-test as available in base R. The input was the predicted (or fitted) values as adjusted by our statistical model. It was not possible to directly use our statistical model’s output to compare models, given that our LMM model was looking at overall change as a main effect rather than as a simple effect’s model. This is why we have used the predicted (or fitted) values as adjusted by our statistical model; the results are the same if we were to use the “actual” results rather than the predicted ones. We used FDR (for False Discovery Rate) correction for multiple comparisons as this allowed for correcting for bias in the differences in doing multiple comparisons.
pairwise.t.test(geminationV1$fitted_V1F1Offset, geminationV1$singGem:geminationV1$voiced.unvoiced, p.adj = "fdr")
Pairwise comparisons using t tests with pooled SD
data: geminationV1$fitted_V1F1Offset and geminationV1$singGem:geminationV1$voiced.unvoiced
singleton:voiceless singleton:voiced
singleton:voiced 5e-09 -
geminate:voiceless 0.00055 0.00991
geminate:voiced < 2e-16 0.00202
geminate:voiceless
singleton:voiced -
geminate:voiceless -
geminate:voiced 2e-08
P value adjustment method: fdr
We run three models. For model comparisons, we are required to use Maximum Likelihood (ML) on both Optimal and Null models (by adding REML=F). Then we run a second Optimal models with Restricted Maximum Likelihood (REML), to obtain the coefficients (by adding REML=T)
f1V2Onset.xmdl.Full.M <- lmer(f1Onset~voiced.unvoiced_c+singGem_c+sex_c+vowelLength_c+syllType_c+place_c+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationV2,na.action=na.exclude,
REML=F,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
f1V2Onset.xmdl.Full.R <- lmer(f1Onset~voiced.unvoiced_c+singGem_c+sex_c+vowelLength_c+syllType_c+place_c+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationV2,na.action=na.exclude,
REML=T,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
f1V2Onset.xmdl.Null.M <- lmer(f1Onset~1+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationV2,na.action=na.exclude,
REML=F,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
anova(f1V2Onset.xmdl.Null.M,f1V2Onset.xmdl.Full.M)
Data: geminationV2
Models:
f1V2Onset.xmdl.Null.M: f1Onset ~ 1 + (voiced.unvoiced_c + singGem_c + vowelLength_c +
f1V2Onset.xmdl.Null.M: syllType_c + place_c || speaker) + (1 | wordTarget)
f1V2Onset.xmdl.Full.M: f1Onset ~ voiced.unvoiced_c + singGem_c + sex_c + vowelLength_c +
f1V2Onset.xmdl.Full.M: syllType_c + place_c + (voiced.unvoiced_c + singGem_c + vowelLength_c +
f1V2Onset.xmdl.Full.M: syllType_c + place_c || speaker) + (1 | wordTarget)
Df AIC BIC logLik deviance Chisq Chi Df
f1V2Onset.xmdl.Null.M 9 20131 20180 -10056 20113
f1V2Onset.xmdl.Full.M 15 20044 20127 -10007 20014 98.677 6
Pr(>Chisq)
f1V2Onset.xmdl.Null.M
f1V2Onset.xmdl.Full.M < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
As can be seen, our optimal model improved the model fit.
We inspect our model below, by looking at the residuals. There are minor deviations observed, due to the nature of our data.
hist(residuals(f1V2Onset.xmdl.Full.R))
qqnorm(residuals(f1V2Onset.xmdl.Full.R)); qqline(residuals(f1V2Onset.xmdl.Full.R))
plot(fitted(f1V2Onset.xmdl.Full.R),residuals(f1V2Onset.xmdl.Full.R), cex = 4)
summary(f1V2Onset.xmdl.Full.R)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula:
f1Onset ~ voiced.unvoiced_c + singGem_c + sex_c + vowelLength_c +
syllType_c + place_c + (voiced.unvoiced_c + singGem_c + vowelLength_c +
syllType_c + place_c || speaker) + (1 | wordTarget)
Data: geminationV2
Control:
lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e+05))
REML criterion at convergence: 19968.1
Scaled residuals:
Min 1Q Median 3Q Max
-3.7533 -0.5218 -0.0820 0.4367 9.4623
Random effects:
Groups Name Variance Std.Dev.
wordTarget (Intercept) 1183.657 34.40
speaker place_c 923.620 30.39
speaker.1 syllType_c 357.077 18.90
speaker.2 vowelLength_c 403.158 20.08
speaker.3 singGem_c 3.763 1.94
speaker.4 voiced.unvoiced_c 641.256 25.32
speaker.5 (Intercept) 1319.244 36.32
Residual 3383.564 58.17
Number of obs: 1793, groups: wordTarget, 98; speaker, 20
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 451.219 10.548 41.734 42.778 < 2e-16 ***
voiced.unvoiced_c -48.764 10.737 78.891 -4.542 1.98e-05 ***
singGem_c -13.838 8.394 85.723 -1.649 0.102905
sex_c 66.262 16.766 17.982 3.952 0.000936 ***
vowelLength_c -82.624 9.680 82.697 -8.536 5.69e-13 ***
syllType_c -55.577 11.723 91.512 -4.741 7.78e-06 ***
place_c -61.507 13.788 82.751 -4.461 2.55e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) vcd.n_ sngGm_ sex_c vwlLn_ syllT_
vocd.nvcd_c 0.102
singGem_c -0.169 -0.031
sex_c -0.004 0.000 0.001
vowlLngth_c 0.185 -0.112 0.071 0.000
syllType_c 0.359 0.031 -0.359 -0.002 0.234
place_c 0.299 0.395 -0.011 -0.003 -0.045 0.025
The figure presented below, uses the predicted (or fitted) values; these are the values adjusted by the statistical model. These take into account the random effects’ structure as well as the fixed effects’ structure.
The following code is used to record the predicted (or fitted) values into the same data-frame. This will then be used to plot the results of the model
geminationV2$fitted_V2F1Onset <- predict(lmer(f1Onset~voiced.unvoiced_c+singGem_c+sex_c+vowelLength_c+syllType_c+place_c+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationV2,na.action=na.exclude,
REML=T,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
)
We use “ggplot2” to plot the results. This will use the “predicted” (or fitted) values adjusted by our statistical model. The dashed lines within the boxplots are the mean values; the median is marked by the horizontal straight lines.
labels <- c(singleton = "Singleton", geminate = "Geminate")
ggF1V2OnsetByVoice <- ggplot(geminationV2, aes(y=fitted_V2F1Onset, x=voiced.unvoiced)) +
geom_boxplot(outlier.colour = "NA",na.rm=TRUE) +
stat_summary(fun.y = mean, geom = "errorbar", aes(ymax = ..y.., ymin = ..y..),
width = .75, linetype = "dashed") +
ylab(expression(paste("F1 (Hz)"))) + labs(title = "V2 Onset") +
theme_set(theme_bw()) + theme(text=element_text(size=20)) +
coord_cartesian(ylim = c(300,900)) +
theme(axis.title.x = element_blank()) + theme(legend.title=element_blank()) +
scale_x_discrete(breaks=levels(geminationV2$voiced.unvoiced), labels=c("Voiceless","Voiced")) +
facet_wrap(~singGem,ncol=1,labeller=labeller(singGem = labels))
ggF1V2OnsetByVoice
After running our statistical model above, we wanted to evaluate whether the differences observed between “voicing” and “gemination” is statistically significant (or not). We present the means and standard deviations for each of the four-way contrast based on the predicted (or fitted) values.
The predicted means and SDs of the F1 at the Onset of the following vowel (V2Onset) are provided below. As can be seen, there are already differences observed related to both “voicing” and “gemination”.
# using "dplyr"
geminationV2 %>% group_by(singGem:voiced.unvoiced) %>% summarise(mean(fitted_V2F1Onset), sd(fitted_V2F1Onset))
We used the pairwise t-test as available in base R. The input was the predicted (or fitted) values as adjusted by our statistical model. It was not possible to directly use our statistical model’s output to compare models, given that our LMM model was looking at overall change as a main effect rather than as a simple effect’s model. This is why we have used the predicted (or fitted) values as adjusted by our statistical model; the results are the same if we were to use the “actual” results rather than the predicted ones. We used FDR (for False Discovery Rate) correction for multiple comparisons as this allowed for correcting for bias in the differences in doing multiple comparisons.
pairwise.t.test(geminationV2$fitted_V2F1Onset, geminationV2$singGem:geminationV2$voiced.unvoiced, p.adj = "fdr")
Pairwise comparisons using t tests with pooled SD
data: geminationV2$fitted_V2F1Onset and geminationV2$singGem:geminationV2$voiced.unvoiced
singleton:voiceless singleton:voiced
singleton:voiced 3.5e-11 -
geminate:voiceless 0.00103 0.00038
geminate:voiced < 2e-16 0.01329
geminate:voiceless
singleton:voiced -
geminate:voiceless -
geminate:voiced 9.5e-10
P value adjustment method: fdr
Within the differences between the normalised first and second harmonics to cue voice quality (H1*-H2*), we have computed H1*-H2* at two locations: At the Offset of the preceding vowel (V1Offset) and at the Onset of the following vowel (V2Offset).
We run three models. For model comparisons, we are required to use Maximum Likelihood (ML) on both Optimal and Null models (by adding REML=F). Then we run a second Optimal models with Restricted Maximum Likelihood (REML), to obtain the coefficients (by adding REML=T)
h1mnh2V1Offset.xmdl.Full.M <- lmer(h1mnh2OffsetNorm~voiced.unvoiced_c+singGem_c+sex_c+vowelLength_c+syllType_c+place_c+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationV1,na.action=na.exclude,
REML=F,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
h1mnh2V1Offset.xmdl.Full.R <- lmer(h1mnh2OffsetNorm~voiced.unvoiced_c+singGem_c+sex_c+vowelLength_c+syllType_c+place_c+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationV1,na.action=na.exclude,
REML=T,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
h1mnh2V1Offset.xmdl.Null.M <- lmer(h1mnh2OffsetNorm~1+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationV1,na.action=na.exclude,
REML=F,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
anova(h1mnh2V1Offset.xmdl.Null.M,h1mnh2V1Offset.xmdl.Full.M)
Data: geminationV1
Models:
h1mnh2V1Offset.xmdl.Null.M: h1mnh2OffsetNorm ~ 1 + (voiced.unvoiced_c + singGem_c + vowelLength_c +
h1mnh2V1Offset.xmdl.Null.M: syllType_c + place_c || speaker) + (1 | wordTarget)
h1mnh2V1Offset.xmdl.Full.M: h1mnh2OffsetNorm ~ voiced.unvoiced_c + singGem_c + sex_c + vowelLength_c +
h1mnh2V1Offset.xmdl.Full.M: syllType_c + place_c + (voiced.unvoiced_c + singGem_c + vowelLength_c +
h1mnh2V1Offset.xmdl.Full.M: syllType_c + place_c || speaker) + (1 | wordTarget)
Df AIC BIC logLik deviance Chisq
h1mnh2V1Offset.xmdl.Null.M 9 9135.3 9183.5 -4558.7 9117.3
h1mnh2V1Offset.xmdl.Full.M 15 9143.1 9223.4 -4556.5 9113.1 4.2286
Chi Df Pr(>Chisq)
h1mnh2V1Offset.xmdl.Null.M
h1mnh2V1Offset.xmdl.Full.M 6 0.6458
As can be seen, our optimal model did not improve the model fit. However, we will still use the optimal model to allow for comparison with the other models. This non-significant effect is indicative of a non-association between H1*-H2* and the predictors.
We inspect our model below, by looking at the residuals. There are minor deviations observed, due to the nature of our data.
hist(residuals(h1mnh2V1Offset.xmdl.Full.R))
qqnorm(residuals(h1mnh2V1Offset.xmdl.Full.R)); qqline(residuals(h1mnh2V1Offset.xmdl.Full.R))
plot(fitted(h1mnh2V1Offset.xmdl.Full.R),residuals(h1mnh2V1Offset.xmdl.Full.R), cex = 4)
summary(h1mnh2V1Offset.xmdl.Full.R)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula:
h1mnh2OffsetNorm ~ voiced.unvoiced_c + singGem_c + sex_c + vowelLength_c +
syllType_c + place_c + (voiced.unvoiced_c + singGem_c + vowelLength_c +
syllType_c + place_c || speaker) + (1 | wordTarget)
Data: geminationV1
Control:
lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e+05))
REML criterion at convergence: 9113
Scaled residuals:
Min 1Q Median 3Q Max
-3.0232 -0.6442 -0.1055 0.5667 4.2980
Random effects:
Groups Name Variance Std.Dev.
wordTarget (Intercept) 0.000e+00 0.000e+00
speaker place_c 2.997e-15 5.474e-08
speaker.1 syllType_c 0.000e+00 0.000e+00
speaker.2 vowelLength_c 7.493e-02 2.737e-01
speaker.3 singGem_c 0.000e+00 0.000e+00
speaker.4 voiced.unvoiced_c 1.638e-01 4.047e-01
speaker.5 (Intercept) 1.194e+01 3.456e+00
Residual 1.872e+01 4.326e+00
Number of obs: 1567, groups: wordTarget, 98; speaker, 20
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) -2.039e+00 7.979e-01 1.962e+01 -2.555 0.0191 *
voiced.unvoiced_c 5.505e-02 2.840e-01 3.357e+01 0.194 0.8475
singGem_c -3.431e-03 2.452e-01 1.515e+03 -0.014 0.9888
sex_c 1.829e+00 1.561e+00 1.798e+01 1.172 0.2566
vowelLength_c 8.373e-02 2.593e-01 2.352e+01 0.323 0.7496
syllType_c 3.428e-01 3.223e-01 1.514e+03 1.064 0.2877
place_c 4.023e-01 3.550e-01 1.521e+03 1.133 0.2573
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) vcd.n_ sngGm_ sex_c vwlLn_ syllT_
vocd.nvcd_c 0.041
singGem_c -0.064 -0.026
sex_c -0.001 0.003 -0.001
vowlLngth_c 0.080 -0.150 0.074 -0.001
syllType_c 0.150 0.034 -0.392 -0.001 0.264
place_c 0.134 0.509 -0.010 0.001 -0.065 0.030
The figure presented below, uses the predicted (or fitted) values; these are the values adjusted by the statistical model. These take into account the random effects’ structure as well as the fixed effects’ structure.
The following code is used to record the predicted (or fitted) values into the same data-frame. This will then be used to plot the results of the model
geminationV1$fitted_V1h1mnh2OffsetNorm <- predict(lmer(h1mnh2OffsetNorm~voiced.unvoiced_c+singGem_c+sex_c+vowelLength_c+syllType_c+place_c+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationV1,na.action=na.exclude,
REML=T,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
)
We use “ggplot2” to plot the results. This will use the “predicted” (or fitted) values adjusted by our statistical model. The dashed lines within the boxplots are the mean values; the median is marked by the horizontal straight lines.
labels <- c(singleton = "Singleton", geminate = "Geminate")
ggh1mnh2V1OffsetByVoice <- ggplot(geminationV1, aes(y=fitted_V1h1mnh2OffsetNorm, x=voiced.unvoiced)) +
geom_boxplot(outlier.colour = "NA",na.rm=TRUE) +
stat_summary(fun.y = mean, geom = "errorbar", aes(ymax = ..y.., ymin = ..y..),
width = .75, linetype = "dashed") +
ylab(expression(paste(italic("H"),"1*-",italic("H"),"2* (dB)"))) + labs(title = "V1 Offset") +
theme_set(theme_bw()) + theme(text=element_text(size=20)) +
coord_cartesian(ylim = c(-10,10)) +
theme(axis.title.x = element_blank()) + theme(legend.title=element_blank()) +
scale_x_discrete(breaks=levels(geminationV1$voiced.unvoiced), labels=c("Voiceless","Voiced")) +
facet_wrap(~singGem,ncol=1,labeller=labeller(singGem = labels))
ggh1mnh2V1OffsetByVoice
After running our statistical model above, we wanted to evaluate whether the differences observed between “voicing” and “gemination” is statistically significant (or not). We present the means and standard deviations for each of the four-way contrast based on the predicted (or fitted) values.
The predicted means and SDs of the H1*-H2* at the Offset of the preceding vowel (V1Offset) are provided below. As can be seen, there are already differences observed related to both “voicing” and “gemination”.
# using "dplyr"
geminationV1 %>% group_by(singGem:voiced.unvoiced) %>% summarise(mean(fitted_V1h1mnh2OffsetNorm,na.rm = T), sd(fitted_V1h1mnh2OffsetNorm,na.rm = T))
We used the pairwise t-test as available in base R. The input was the predicted (or fitted) values as adjusted by our statistical model. It was not possible to directly use our statistical model’s output to compare models, given that our LMM model was looking at overall change as a main effect rather than as a simple effect’s model. This is why we have used the predicted (or fitted) values as adjusted by our statistical model; the results are the same if we were to use the “actual” results rather than the predicted ones. We used FDR (for False Discovery Rate) correction for multiple comparisons as this allowed for correcting for bias in the differences in doing multiple comparisons.
pairwise.t.test(geminationV1$fitted_V1h1mnh2OffsetNorm, geminationV1$singGem:geminationV1$voiced.unvoiced, p.adj = "fdr")
Pairwise comparisons using t tests with pooled SD
data: geminationV1$fitted_V1h1mnh2OffsetNorm and geminationV1$singGem:geminationV1$voiced.unvoiced
singleton:voiceless singleton:voiced
singleton:voiced 0.84 -
geminate:voiceless 0.85 0.84
geminate:voiced 0.85 0.84
geminate:voiceless
singleton:voiced -
geminate:voiceless -
geminate:voiced 0.84
P value adjustment method: fdr
We run three models. For model comparisons, we are required to use Maximum Likelihood (ML) on both Optimal and Null models (by adding REML=F). Then we run a second Optimal models with Restricted Maximum Likelihood (REML), to obtain the coefficients (by adding REML=T)
h1mnh2V2Onset.xmdl.Full.M <- lmer(h1mnh2OnsetNorm~voiced.unvoiced_c+singGem_c+sex_c+vowelLength_c+syllType_c+place_c+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationV2,na.action=na.exclude,
REML=F,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
h1mnh2V2Onset.xmdl.Full.R <- lmer(h1mnh2OnsetNorm~voiced.unvoiced_c+singGem_c+sex_c+vowelLength_c+syllType_c+place_c+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationV2,na.action=na.exclude,
REML=T,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
h1mnh2V2Onset.xmdl.Null.M <- lmer(h1mnh2OnsetNorm~1+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationV2,na.action=na.exclude,
REML=F,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
anova(h1mnh2V2Onset.xmdl.Null.M,h1mnh2V2Onset.xmdl.Full.M)
Data: geminationV2
Models:
h1mnh2V2Onset.xmdl.Null.M: h1mnh2OnsetNorm ~ 1 + (voiced.unvoiced_c + singGem_c + vowelLength_c +
h1mnh2V2Onset.xmdl.Null.M: syllType_c + place_c || speaker) + (1 | wordTarget)
h1mnh2V2Onset.xmdl.Full.M: h1mnh2OnsetNorm ~ voiced.unvoiced_c + singGem_c + sex_c + vowelLength_c +
h1mnh2V2Onset.xmdl.Full.M: syllType_c + place_c + (voiced.unvoiced_c + singGem_c + vowelLength_c +
h1mnh2V2Onset.xmdl.Full.M: syllType_c + place_c || speaker) + (1 | wordTarget)
Df AIC BIC logLik deviance Chisq
h1mnh2V2Onset.xmdl.Null.M 9 10989 11039 -5485.7 10971
h1mnh2V2Onset.xmdl.Full.M 15 10988 11070 -5478.7 10958 13.965
Chi Df Pr(>Chisq)
h1mnh2V2Onset.xmdl.Null.M
h1mnh2V2Onset.xmdl.Full.M 6 0.03003 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
As can be seen, our optimal model improved the model fit.
We inspect our model below, by looking at the residuals. There are minor deviations observed, due to the nature of our data.
hist(residuals(h1mnh2V2Onset.xmdl.Full.R))
qqnorm(residuals(h1mnh2V2Onset.xmdl.Full.R)); qqline(residuals(h1mnh2V2Onset.xmdl.Full.R))
plot(fitted(h1mnh2V2Onset.xmdl.Full.R),residuals(h1mnh2V2Onset.xmdl.Full.R), cex = 4)
summary(h1mnh2V2Onset.xmdl.Full.R)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula:
h1mnh2OnsetNorm ~ voiced.unvoiced_c + singGem_c + sex_c + vowelLength_c +
syllType_c + place_c + (voiced.unvoiced_c + singGem_c + vowelLength_c +
syllType_c + place_c || speaker) + (1 | wordTarget)
Data: geminationV2
Control:
lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 1e+05))
REML criterion at convergence: 10955.2
Scaled residuals:
Min 1Q Median 3Q Max
-3.8200 -0.5976 -0.1252 0.4466 4.7005
Random effects:
Groups Name Variance Std.Dev.
wordTarget (Intercept) 0.000e+00 0.000e+00
speaker place_c 1.303e+00 1.141e+00
speaker.1 syllType_c 0.000e+00 0.000e+00
speaker.2 vowelLength_c 5.084e-12 2.255e-06
speaker.3 singGem_c 2.356e-01 4.854e-01
speaker.4 voiced.unvoiced_c 5.053e-01 7.109e-01
speaker.5 (Intercept) 1.506e+01 3.881e+00
Residual 2.595e+01 5.094e+00
Number of obs: 1783, groups: wordTarget, 98; speaker, 20
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) -2.2902 0.8945 19.2938 -2.560 0.0190 *
voiced.unvoiced_c -0.1869 0.3347 25.0199 -0.558 0.5816
singGem_c -0.7220 0.2933 27.0620 -2.462 0.0205 *
sex_c 2.6897 1.7578 17.9598 1.530 0.1434
vowelLength_c -0.2544 0.2779 1713.4068 -0.915 0.3601
syllType_c 0.3365 0.3524 1710.6079 0.955 0.3397
place_c 0.8090 0.4643 23.3301 1.742 0.0946 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) vcd.n_ sngGm_ sex_c vwlLn_ syllT_
vocd.nvcd_c 0.041
singGem_c -0.058 -0.032
sex_c -0.001 0.001 0.001
vowlLngth_c 0.080 -0.130 0.073 0.001
syllType_c 0.145 0.042 -0.372 0.000 0.281
place_c 0.110 0.392 -0.010 -0.002 -0.050 0.031
The figure presented below, uses the predicted (or fitted) values; these are the values adjusted by the statistical model. These take into account the random effects’ structure as well as the fixed effects’ structure.
The following code is used to record the predicted (or fitted) values into the same data-frame. This will then be used to plot the results of the model
geminationV2$fitted_V2h1mnh2OnsetNorm <- predict(lmer(h1mnh2OnsetNorm~voiced.unvoiced_c+singGem_c+sex_c+vowelLength_c+syllType_c+place_c+
(voiced.unvoiced_c+singGem_c+vowelLength_c+syllType_c+place_c||speaker)+(1|wordTarget),
data = geminationV2,na.action=na.exclude,
REML=T,
control = lmerControl(optimizer = "bobyqa",optCtrl=list(maxfun=1e5)))
)
We use “ggplot2” to plot the results. This will use the “predicted” (or fitted) values adjusted by our statistical model. The dashed lines within the boxplots are the mean values; the median is marked by the horizontal straight lines.
labels <- c(singleton = "Singleton", geminate = "Geminate")
ggh1mnh2V2OnsetByVoice <- ggplot(geminationV2, aes(y=fitted_V2h1mnh2OnsetNorm, x=voiced.unvoiced)) +
geom_boxplot(outlier.colour = "NA",na.rm=TRUE) +
stat_summary(fun.y = mean, geom = "errorbar", aes(ymax = ..y.., ymin = ..y..),
width = .75, linetype = "dashed") +
ylab(expression(paste(italic("H"),"1*-",italic("H"),"2* (dB)"))) + labs(title = "V2 Onset") +
theme_set(theme_bw()) + theme(text=element_text(size=20)) +
coord_cartesian(ylim = c(-10,10)) +
theme(axis.title.x = element_blank()) + theme(legend.title=element_blank()) +
scale_x_discrete(breaks=levels(geminationV2$voiced.unvoiced), labels=c("Voiceless","Voiced")) +
facet_wrap(~singGem,ncol=1,labeller=labeller(singGem = labels))
ggh1mnh2V2OnsetByVoice
After running our statistical model above, we wanted to evaluate whether the differences observed between “voicing” and “gemination” is statistically significant (or not). We present the means and standard deviations for each of the four-way contrast based on the predicted (or fitted) values.
The predicted means and SDs of the H1*-H2* at the Onset of the following vowel (V2Onset) are provided below. As can be seen, there are already differences observed related to both “voicing” and “gemination”.
# using "dplyr"
geminationV2 %>% group_by(singGem:voiced.unvoiced) %>% summarise(mean(fitted_V2h1mnh2OnsetNorm,na.rm = T), sd(fitted_V2h1mnh2OnsetNorm,na.rm = T))
We used the pairwise t-test as available in base R. The input was the predicted (or fitted) values as adjusted by our statistical model. It was not possible to directly use our statistical model’s output to compare models, given that our LMM model was looking at overall change as a main effect rather than as a simple effect’s model. This is why we have used the predicted (or fitted) values as adjusted by our statistical model; the results are the same if we were to use the “actual” results rather than the predicted ones. We used FDR (for False Discovery Rate) correction for multiple comparisons as this allowed for correcting for bias in the differences in doing multiple comparisons.
pairwise.t.test(geminationV2$fitted_V2h1mnh2OnsetNorm, geminationV2$singGem:geminationV2$voiced.unvoiced, p.adj = "fdr")
Pairwise comparisons using t tests with pooled SD
data: geminationV2$fitted_V2h1mnh2OnsetNorm and geminationV2$singGem:geminationV2$voiced.unvoiced
singleton:voiceless singleton:voiced
singleton:voiced 0.05387 -
geminate:voiceless 0.06751 0.87847
geminate:voiced 0.00017 0.04972
geminate:voiceless
singleton:voiced -
geminate:voiceless -
geminate:voiced 0.04972
P value adjustment method: fdr