## Use the code below to check whether you have all required packages installed. If some are not installed yet, the code below will install them. If you have all packages installed, you can load them with the second code.
requiredPackages = c('tidyverse', 'broom', 'knitr', 'Hmisc', 'corrplot', 'lme4', 'lmerTest', 'party', 'ranger', 'doFuture', 'tidymodels', 'pROC', 'varImp', 'lattice', 'vip', 'emmeans', 'ggsignif', 'PresenceAbsence', 'languageR', 'FactoMineR', 'factoextra', 'RColorBrewer', 'scatterplot3d', 'cowplot', 'psycho', 'ordinal')
# For each package: install it if it cannot be loaded, then load it
for(p in requiredPackages){
  if(!require(p, character.only = TRUE)) install.packages(p)
  library(p, character.only = TRUE)
}
Loading required package: tidyverse
Registered S3 methods overwritten by 'dbplyr':
method from
print.tbl_lazy
print.tbl_sql
-- Attaching packages -----------
v ggplot2 3.3.5 v purrr 0.3.4
v tibble 3.1.5 v dplyr 1.0.7
v tidyr 1.1.4 v stringr 1.4.0
v readr 2.0.2 v forcats 0.5.1
-- Conflicts --------------------
x dplyr::filter() masks stats::filter()
x dplyr::lag() masks stats::lag()
Loading required package: broom
Loading required package: knitr
Loading required package: Hmisc
Loading required package: lattice
Loading required package: survival
Loading required package: Formula
Registered S3 method overwritten by 'htmlwidgets':
method from
print.htmlwidget tools:rstudio
Registered S3 method overwritten by 'data.table':
method from
print.data.table
Attaching package: ‘Hmisc’
The following objects are masked from ‘package:dplyr’:
src, summarize
The following objects are masked from ‘package:base’:
format.pval, units
Loading required package: corrplot
corrplot 0.90 loaded
Loading required package: lme4
Loading required package: Matrix
Attaching package: ‘Matrix’
The following objects are masked from ‘package:tidyr’:
expand, pack, unpack
Loading required package: lmerTest
Attaching package: ‘lmerTest’
The following object is masked from ‘package:lme4’:
lmer
The following object is masked from ‘package:stats’:
step
Loading required package: party
Loading required package: grid
Loading required package: mvtnorm
Loading required package: modeltools
Loading required package: stats4
Attaching package: ‘modeltools’
The following object is masked from ‘package:lme4’:
refit
Loading required package: strucchange
Loading required package: zoo
Attaching package: ‘zoo’
The following objects are masked from ‘package:base’:
as.Date, as.Date.numeric
Loading required package: sandwich
Attaching package: ‘strucchange’
The following object is masked from ‘package:stringr’:
boundary
Loading required package: ranger
Loading required package: doFuture
Loading required package: foreach
Attaching package: ‘foreach’
The following objects are masked from ‘package:purrr’:
accumulate, when
Loading required package: future
Attaching package: ‘future’
The following object is masked from ‘package:survival’:
cluster
Loading required package: tidymodels
Registered S3 method overwritten by 'tune':
method from
required_pkgs.model_spec parsnip
-- Attaching packages -----------
v dials 0.0.10 v rsample 0.1.0
v infer 1.0.0 v tune 0.1.6
v modeldata 0.1.1 v workflows 0.2.4
v parsnip 0.1.7 v workflowsets 0.1.0
v recipes 0.1.17 v yardstick 0.0.8
-- Conflicts --------------------
x foreach::accumulate() masks purrr::accumulate()
x scales::discard() masks purrr::discard()
x Matrix::expand() masks tidyr::expand()
x dplyr::filter() masks stats::filter()
x parsnip::fit() masks infer::fit(), party::fit(), modeltools::fit()
x recipes::fixed() masks stringr::fixed()
x dplyr::lag() masks stats::lag()
x Matrix::pack() masks tidyr::pack()
x tune::parameters() masks dials::parameters(), modeltools::parameters()
x yardstick::spec() masks readr::spec()
x Hmisc::src() masks dplyr::src()
x recipes::step() masks lmerTest::step(), stats::step()
x Hmisc::summarize() masks dplyr::summarize()
x parsnip::translate() masks Hmisc::translate()
x Matrix::unpack() masks tidyr::unpack()
x recipes::update() masks stats4::update(), Matrix::update(), stats::update()
x foreach::when() masks purrr::when()
* Dig deeper into tidy modeling with R at https://www.tmwr.org
Loading required package: pROC
Type 'citation("pROC")' for a citation.
Attaching package: ‘pROC’
The following objects are masked from ‘package:stats’:
cov, smooth, var
Loading required package: varImp
Loading required package: measures
Loading required package: vip
Attaching package: ‘vip’
The following object is masked from ‘package:utils’:
vi
Loading required package: emmeans
Loading required package: ggsignif
Loading required package: PresenceAbsence
Attaching package: ‘PresenceAbsence’
The following object is masked from ‘package:pROC’:
auc
The following objects are masked from ‘package:yardstick’:
sensitivity, specificity
Loading required package: languageR
Loading required package: FactoMineR
Loading required package: factoextra
Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa
Loading required package: RColorBrewer
Loading required package: scatterplot3d
Loading required package: cowplot
Loading required package: psycho
Attaching package: ‘psycho’
The following object is masked from ‘package:future’:
values
The following object is masked from ‘package:lme4’:
golden
Loading required package: ordinal
Attaching package: ‘ordinal’
The following objects are masked from ‘package:lme4’:
ranef, VarCorr
The following object is masked from ‘package:dplyr’:
slice
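Loading every package prints a lot of messages, as seen above. As a minimal sketch, the check for missing packages can also be done without loading anything, by comparing the list against what is already installed. The short `pkgs` vector and the name `missingPkgs` are hypothetical and only used for illustration; in practice the full `requiredPackages` vector would take their place.

```r
# Hypothetical short list for illustration; substitute the full vector of packages
pkgs <- c("stats", "utils", "tidyverse")

# Names of all packages found in the current library paths
installed <- rownames(installed.packages())

# Packages from the list that are not installed yet
missingPkgs <- setdiff(pkgs, installed)
print(missingPkgs)

# To install only the missing ones, uncomment the next line:
# if (length(missingPkgs) > 0) install.packages(missingPkgs)
```

This only reports what is missing; the packages still need to be loaded with library() before use.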
Let us start with a basic correlation test. We want to evaluate if two numeric variables are correlated with each other.
We use the function cor to obtain the Pearson correlation and cor.test to run a basic correlation test on our data with significance testing.
cor(english$RTlexdec, english$RTnaming, method = "pearson")
[1] 0.7587033
cor.test(english$RTlexdec, english$RTnaming)
Pearson's product-moment correlation
data: english$RTlexdec and english$RTnaming
t = 78.699, df = 4566, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.7461195 0.7707453
sample estimates:
cor
0.7587033
What are these results telling us? There is a positive correlation between RTlexdec and RTnaming. The correlation coefficient (r) is 0.76 (it can range between -1 and 1). This correlation is statistically significant, with a t value of 78.699, 4566 degrees of freedom and a p-value < 2.2e-16.
What are the degrees of freedom? These are the total number of observations minus the number of comparisons. Here we have 4568 observations in the dataset and two variables being compared, hence 4568 - 2 = 4566.
For the p value, there is a threshold we usually use: p = 0.05. A result is treated as statistically significant when its p value falls below this threshold. The p value is the probability of obtaining a result at least as extreme as the one observed if there were in fact no correlation, so a threshold of 0.05 means we accept a 5% chance of declaring a result significant when there is no real effect. In our case, the p value is lower than 2.2e-16. How to interpret this number? It is scientific notation for 2.2 × 10^-16: there are 15 zeros after the decimal point before the first 2, i.e., 0.00000000000000022. This probability is very (very!) low, so we conclude that there is a statistically significant correlation between the two variables.
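To see where the reported p value comes from, the following sketch recomputes it from the t value and the degrees of freedom printed by cor.test. The variable names are illustrative. Note that 2.2e-16 is R's machine epsilon (.Machine$double.eps): the true p value here is far smaller, and may even underflow to 0, which is why R prints it only as "< 2.2e-16".

```r
# Values copied from the cor.test() output
t_value <- 78.699
df <- 4566

# Two-sided p value: probability of a |t| at least this large
# if the true correlation were 0
p_value <- 2 * pt(abs(t_value), df = df, lower.tail = FALSE)
print(p_value)

# Machine epsilon, the floor below which R reports p values as "< ..."
print(.Machine$double.eps)

# format.pval() produces the "< ..." style of display for tiny p values
print(format.pval(p_value))
```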
The formula to calculate the t value is below.
$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$
where x̄ = sample mean, μ0 = population mean, s = sample standard deviation, n = sample size.
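For a Pearson correlation specifically, the t value reported by cor.test can be recomputed from the correlation coefficient and the number of observations, using t = r × sqrt(n - 2) / sqrt(1 - r²). The sketch below does this with the numbers from the output; the variable names are illustrative, and the result should come out close to the reported 78.699.

```r
# Values copied from the output
r <- 0.7587033   # Pearson correlation returned by cor()
n <- 4568        # number of observations in the dataset

# Degrees of freedom for a correlation test
df <- n - 2
print(df)

# t statistic for the null hypothesis that the true correlation is 0
t_value <- r * sqrt(df) / sqrt(1 - r^2)
print(round(t_value, 2))

# Squared correlation: share of variance in one variable
# accounted for by the other (not the same thing as r itself)
print(round(r^2, 2))
```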
The p value is influenced by various factors: the number of observations, the strength of the difference, the mean values, etc. You should always interpret p values with care, taking everything else into account.
corrplot
Above, we did a correlation test on two predictors. What if we want to obtain a nice plot of all numeric predictors and add significance levels?
corr <-
  english %>%
  select(where(is.numeric)) %>%
  cor() %>%
  print()
RTlexdec
RTlexdec 1.000000000
RTnaming 0.758703280
Familiarity -0.444409734
WrittenFrequency -0.434814982
WrittenSpokenFrequencyRatio 0.039820007
FamilySize -0.349595853
DerivationalEntropy -0.161164620
InflectionalEntropy -0.088418681
NumberSimplexSynsets -0.309008140
NumberComplexSynsets -0.328613209
LengthInLetters 0.049747275
Ncount -0.065726313
MeanBigramFrequency 0.002633525
FrequencyInitialDiphone -0.074452719
ConspelV -0.032867467
ConspelN -0.107538023
ConphonV -0.021747588
ConphonN -0.080930543
ConfriendsV -0.025720833
ConfriendsN -0.117532883
ConffV -0.016494945
ConffN -0.005679088
ConfbV -0.022515482
ConfbN -0.018539499
NounFrequency -0.167189500
VerbFrequency -0.076388309
FrequencyInitialDiphoneWord -0.042640861
FrequencyInitialDiphoneSyllable -0.035503708
CorrectLexdec -0.253188184
RTnaming
RTlexdec 0.758703280
RTnaming 1.000000000
Familiarity -0.094793069
WrittenFrequency -0.095994313
WrittenSpokenFrequencyRatio 0.036592754
FamilySize -0.088037010
DerivationalEntropy -0.049456670
InflectionalEntropy -0.022110376
NumberSimplexSynsets -0.071900207
NumberComplexSynsets -0.076384846
LengthInLetters 0.094497065
Ncount -0.094669618
MeanBigramFrequency 0.048459360
FrequencyInitialDiphone -0.057216874
ConspelV -0.025165185
ConspelN -0.034801239
ConphonV 0.001175572
ConphonN -0.014364896
ConfriendsV -0.027741071
ConfriendsN -0.044064997
ConffV 0.007924551
ConffN 0.011407182
ConfbV 0.019159417
ConfbN 0.017011731
NounFrequency -0.043148572
VerbFrequency -0.024593780
FrequencyInitialDiphoneWord 0.020488545
FrequencyInitialDiphoneSyllable 0.026897756
CorrectLexdec 0.151348043
Familiarity
RTlexdec -0.44440973
RTnaming -0.09479307
Familiarity 1.00000000
WrittenFrequency 0.79125559
WrittenSpokenFrequencyRatio -0.18989881
FamilySize 0.59191973
DerivationalEntropy 0.22071588
InflectionalEntropy 0.10795420
NumberSimplexSynsets 0.51065170
NumberComplexSynsets 0.51001913
LengthInLetters -0.08215272
Ncount 0.09650461
MeanBigramFrequency 0.02962138
FrequencyInitialDiphone 0.12847193
ConspelV 0.07451417
ConspelN 0.21437628
ConphonV 0.05408975
ConphonN 0.16180440
ConfriendsV 0.04584224
ConfriendsN 0.20673977
ConffV 0.05610343
ConffN 0.02945722
ConfbV 0.05687343
ConfbN 0.05199653
NounFrequency 0.38119070
VerbFrequency 0.23817700
FrequencyInitialDiphoneWord 0.09106333
FrequencyInitialDiphoneSyllable 0.07354114
CorrectLexdec 0.52685458
WrittenFrequency
RTlexdec -0.43481498
RTnaming -0.09599431
Familiarity 0.79125559
WrittenFrequency 1.00000000
WrittenSpokenFrequencyRatio 0.07158067
FamilySize 0.66253864
DerivationalEntropy 0.25522889
InflectionalEntropy -0.04048005
NumberSimplexSynsets 0.55874958
NumberComplexSynsets 0.59105478
LengthInLetters -0.06663196
Ncount 0.10492564
MeanBigramFrequency 0.07758879
FrequencyInitialDiphone 0.16748670
ConspelV 0.05864228
ConspelN 0.28248908
ConphonV 0.08201255
ConphonN 0.22245283
ConfriendsV 0.02146455
ConfriendsN 0.26326498
ConffV 0.08162166
ConffN 0.05028101
ConfbV 0.11975724
ConfbN 0.10409106
NounFrequency 0.46955152
VerbFrequency 0.27879235
FrequencyInitialDiphoneWord 0.10827895
FrequencyInitialDiphoneSyllable 0.09111661
CorrectLexdec 0.45797185
WrittenSpokenFrequencyRatio
RTlexdec 0.039820007
RTnaming 0.036592754
Familiarity -0.189898811
WrittenFrequency 0.071580669
WrittenSpokenFrequencyRatio 1.000000000
FamilySize -0.108801543
DerivationalEntropy -0.010987756
InflectionalEntropy -0.118804044
NumberSimplexSynsets -0.085088573
NumberComplexSynsets -0.104598273
LengthInLetters 0.204091196
Ncount -0.188200595
MeanBigramFrequency 0.191513493
FrequencyInitialDiphone 0.020799272
ConspelV -0.157689304
ConspelN -0.057307387
ConphonV -0.034796879
ConphonN -0.025856126
ConfriendsV -0.136348207
ConfriendsN -0.055568525
ConffV -0.026731676
ConffN -0.003403134
ConfbV 0.093614167
ConfbN 0.070239322
NounFrequency 0.012482293
VerbFrequency -0.096554415
FrequencyInitialDiphoneWord 0.002627022
FrequencyInitialDiphoneSyllable 0.012872673
CorrectLexdec 0.008564774
FamilySize
RTlexdec -0.349595853
RTnaming -0.088037010
Familiarity 0.591919733
WrittenFrequency 0.662538635
WrittenSpokenFrequencyRatio -0.108801543
FamilySize 1.000000000
DerivationalEntropy 0.692088896
InflectionalEntropy 0.101743523
NumberSimplexSynsets 0.590763556
NumberComplexSynsets 0.645411663
LengthInLetters -0.122995009
Ncount 0.174107015
MeanBigramFrequency -0.001056468
FrequencyInitialDiphone 0.126804334
ConspelV 0.110812602
ConspelN 0.249522442
ConphonV 0.050171973
ConphonN 0.161531494
ConfriendsV 0.079271194
ConfriendsN 0.242711641
ConffV 0.080362377
ConffN 0.059199476
ConfbV 0.050020822
ConfbN 0.038457171
NounFrequency 0.417794301
VerbFrequency 0.114132925
FrequencyInitialDiphoneWord 0.096705342
FrequencyInitialDiphoneSyllable 0.086426850
CorrectLexdec 0.360613035
DerivationalEntropy
RTlexdec -0.161164620
RTnaming -0.049456670
Familiarity 0.220715880
WrittenFrequency 0.255228886
WrittenSpokenFrequencyRatio -0.010987756
FamilySize 0.692088896
DerivationalEntropy 1.000000000
InflectionalEntropy -0.050795034
NumberSimplexSynsets 0.223943027
NumberComplexSynsets 0.331924999
LengthInLetters -0.104860729
Ncount 0.123827732
MeanBigramFrequency -0.020837051
FrequencyInitialDiphone 0.083835479
ConspelV 0.046117273
ConspelN 0.137519738
ConphonV -0.002754648
ConphonN 0.074468614
ConfriendsV 0.028929529
ConfriendsN 0.131756437
ConffV 0.041097365
ConffN 0.040363830
ConfbV 0.007743200
ConfbN 0.011462290
NounFrequency 0.172254519
VerbFrequency -0.019725738
FrequencyInitialDiphoneWord 0.029479534
FrequencyInitialDiphoneSyllable 0.027755605
CorrectLexdec 0.188753214
InflectionalEntropy
RTlexdec -0.088418681
RTnaming -0.022110376
Familiarity 0.107954197
WrittenFrequency -0.040480046
WrittenSpokenFrequencyRatio -0.118804044
FamilySize 0.101743523
DerivationalEntropy -0.050795034
InflectionalEntropy 1.000000000
NumberSimplexSynsets 0.398736053
NumberComplexSynsets 0.005589502
LengthInLetters 0.052485031
Ncount -0.003252708
MeanBigramFrequency 0.024789643
FrequencyInitialDiphone -0.034461207
ConspelV 0.140798520
ConspelN 0.046826086
ConphonV 0.082962738
ConphonN 0.031410725
ConfriendsV 0.131247972
ConfriendsN 0.067675205
ConffV 0.014388810
ConffN 0.010758578
ConfbV 0.002863801
ConfbN -0.007583063
NounFrequency -0.114401007
VerbFrequency 0.094002603
FrequencyInitialDiphoneWord 0.052469468
FrequencyInitialDiphoneSyllable 0.050939450
CorrectLexdec 0.182065382
NumberSimplexSynsets
RTlexdec -0.3090081404
RTnaming -0.0719002065
Familiarity 0.5106516971
WrittenFrequency 0.5587495840
WrittenSpokenFrequencyRatio -0.0850885729
FamilySize 0.5907635556
DerivationalEntropy 0.2239430269
InflectionalEntropy 0.3987360535
NumberSimplexSynsets 1.0000000000
NumberComplexSynsets 0.5245365002
LengthInLetters -0.0063644110
Ncount 0.1129586209
MeanBigramFrequency 0.0539689516
FrequencyInitialDiphone 0.0571276406
ConspelV 0.1660590990
ConspelN 0.2275859556
ConphonV 0.0747186906
ConphonN 0.1443620165
ConfriendsV 0.1546554020
ConfriendsN 0.2524518522
ConffV 0.0362906719
ConffN 0.0227783558
ConfbV 0.0215807967
ConfbN 0.0002057108
NounFrequency 0.2380855612
VerbFrequency 0.1887961418
FrequencyInitialDiphoneWord 0.1270081956
FrequencyInitialDiphoneSyllable 0.1160585801
CorrectLexdec 0.3500774208
NumberComplexSynsets
RTlexdec -0.328613209
RTnaming -0.076384846
Familiarity 0.510019126
WrittenFrequency 0.591054783
WrittenSpokenFrequencyRatio -0.104598273
FamilySize 0.645411663
DerivationalEntropy 0.331924999
InflectionalEntropy 0.005589502
NumberSimplexSynsets 0.524536500
NumberComplexSynsets 1.000000000
LengthInLetters -0.120445975
Ncount 0.137748482
MeanBigramFrequency -0.023604116
FrequencyInitialDiphone 0.103684145
ConspelV 0.071760775
ConspelN 0.193733204
ConphonV 0.047783270
ConphonN 0.142498720
ConfriendsV 0.037761099
ConfriendsN 0.180629522
ConffV 0.082102822
ConffN 0.057010830
ConfbV 0.052172715
ConfbN 0.049175194
NounFrequency 0.349469930
VerbFrequency 0.092248597
FrequencyInitialDiphoneWord 0.058217178
FrequencyInitialDiphoneSyllable 0.047009152
CorrectLexdec 0.329011088
LengthInLetters
RTlexdec 0.049747275
RTnaming 0.094497065
Familiarity -0.082152716
WrittenFrequency -0.066631955
WrittenSpokenFrequencyRatio 0.204091196
FamilySize -0.122995009
DerivationalEntropy -0.104860729
InflectionalEntropy 0.052485031
NumberSimplexSynsets -0.006364411
NumberComplexSynsets -0.120445975
LengthInLetters 1.000000000
Ncount -0.625129141
MeanBigramFrequency 0.790492091
FrequencyInitialDiphone -0.060443836
ConspelV -0.226416938
ConspelN -0.170022083
ConphonV -0.202368726
ConphonN -0.205167896
ConfriendsV -0.192199942
ConfriendsN -0.156898314
ConffV -0.019244458
ConffN 0.010765359
ConfbV -0.040037290
ConfbN -0.069985486
NounFrequency -0.035331865
VerbFrequency -0.083729951
FrequencyInitialDiphoneWord 0.155454553
FrequencyInitialDiphoneSyllable 0.150391668
CorrectLexdec 0.046317578
Ncount
RTlexdec -0.065726313
RTnaming -0.094669618
Familiarity 0.096504609
WrittenFrequency 0.104925644
WrittenSpokenFrequencyRatio -0.188200595
FamilySize 0.174107015
DerivationalEntropy 0.123827732
InflectionalEntropy -0.003252708
NumberSimplexSynsets 0.112958621
NumberComplexSynsets 0.137748482
LengthInLetters -0.625129141
Ncount 1.000000000
MeanBigramFrequency -0.387546284
FrequencyInitialDiphone 0.135888890
ConspelV 0.474710938
ConspelN 0.346547943
ConphonV 0.210193628
ConphonN 0.190732679
ConfriendsV 0.436821402
ConfriendsN 0.340831193
ConffV 0.076838593
ConffN 0.069850580
ConfbV -0.036034465
ConfbN -0.042132633
NounFrequency 0.035870248
VerbFrequency 0.053361797
FrequencyInitialDiphoneWord 0.007890710
FrequencyInitialDiphoneSyllable 0.021719522
CorrectLexdec 0.016048288
MeanBigramFrequency
RTlexdec 0.002633525
RTnaming 0.048459360
Familiarity 0.029621385
WrittenFrequency 0.077588795
WrittenSpokenFrequencyRatio 0.191513493
FamilySize -0.001056468
DerivationalEntropy -0.020837051
InflectionalEntropy 0.024789643
NumberSimplexSynsets 0.053968952
NumberComplexSynsets -0.023604116
LengthInLetters 0.790492091
Ncount -0.387546284
MeanBigramFrequency 1.000000000
FrequencyInitialDiphone 0.324815461
ConspelV -0.091270605
ConspelN 0.060952203
ConphonV -0.122457211
ConphonN -0.064645303
ConfriendsV -0.078215883
ConfriendsN 0.049075415
ConffV 0.072593437
ConffN 0.113704972
ConfbV 0.002954953
ConfbN -0.019550727
NounFrequency 0.043361959
VerbFrequency -0.045835069
FrequencyInitialDiphoneWord 0.214165942
FrequencyInitialDiphoneSyllable 0.201570329
CorrectLexdec 0.063566285
FrequencyInitialDiphone
RTlexdec -0.0744527186
RTnaming -0.0572168742
Familiarity 0.1284719337
WrittenFrequency 0.1674867041
WrittenSpokenFrequencyRatio 0.0207992723
FamilySize 0.1268043344
DerivationalEntropy 0.0838354786
InflectionalEntropy -0.0344612070
NumberSimplexSynsets 0.0571276406
NumberComplexSynsets 0.1036841450
LengthInLetters -0.0604438357
Ncount 0.1358888899
MeanBigramFrequency 0.3248154611
FrequencyInitialDiphone 1.0000000000
ConspelV -0.0557304958
ConspelN 0.0309540623
ConphonV -0.0142352920
ConphonN 0.0104023606
ConfriendsV -0.0495263185
ConfriendsN 0.0362062483
ConffV -0.0001697975
ConffN 0.0021493798
ConfbV 0.0334157038
ConfbN 0.0191463198
NounFrequency 0.0985964971
VerbFrequency 0.0557187478
FrequencyInitialDiphoneWord 0.1310981285
FrequencyInitialDiphoneSyllable 0.1188976490
CorrectLexdec 0.0486800603
ConspelV
RTlexdec -0.03286747
RTnaming -0.02516518
Familiarity 0.07451417
WrittenFrequency 0.05864228
WrittenSpokenFrequencyRatio -0.15768930
FamilySize 0.11081260
DerivationalEntropy 0.04611727
InflectionalEntropy 0.14079852
NumberSimplexSynsets 0.16605910
NumberComplexSynsets 0.07176077
LengthInLetters -0.22641694
Ncount 0.47471094
MeanBigramFrequency -0.09127061
FrequencyInitialDiphone -0.05573050
ConspelV 1.00000000
ConspelN 0.64214341
ConphonV 0.54021641
ConphonN 0.41727634
ConfriendsV 0.91949493
ConfriendsN 0.62267751
ConffV 0.23618800
ConffN 0.16655427
ConfbV 0.04946527
ConfbN 0.02724069
NounFrequency -0.01696005
VerbFrequency 0.06291967
FrequencyInitialDiphoneWord 0.11861458
FrequencyInitialDiphoneSyllable 0.12276477
CorrectLexdec 0.04934274
ConspelN
RTlexdec -0.10753802
RTnaming -0.03480124
Familiarity 0.21437628
WrittenFrequency 0.28248908
WrittenSpokenFrequencyRatio -0.05730739
FamilySize 0.24952244
DerivationalEntropy 0.13751974
InflectionalEntropy 0.04682609
NumberSimplexSynsets 0.22758596
NumberComplexSynsets 0.19373320
LengthInLetters -0.17002208
Ncount 0.34654794
MeanBigramFrequency 0.06095220
FrequencyInitialDiphone 0.03095406
ConspelV 0.64214341
ConspelN 1.00000000
ConphonV 0.38047467
ConphonN 0.65365781
ConfriendsV 0.55820343
ConfriendsN 0.88292615
ConffV 0.27788182
ConffN 0.34213915
ConfbV 0.14221895
ConfbN 0.14531495
NounFrequency 0.11924516
VerbFrequency 0.12533768
FrequencyInitialDiphoneWord 0.11832011
FrequencyInitialDiphoneSyllable 0.11843351
CorrectLexdec 0.10432858
ConphonV
RTlexdec -0.021747588
RTnaming 0.001175572
Familiarity 0.054089750
WrittenFrequency 0.082012553
WrittenSpokenFrequencyRatio -0.034796879
FamilySize 0.050171973
DerivationalEntropy -0.002754648
InflectionalEntropy 0.082962738
NumberSimplexSynsets 0.074718691
NumberComplexSynsets 0.047783270
LengthInLetters -0.202368726
Ncount 0.210193628
MeanBigramFrequency -0.122457211
FrequencyInitialDiphone -0.014235292
ConspelV 0.540216414
ConspelN 0.380474673
ConphonV 1.000000000
ConphonN 0.665883587
ConfriendsV 0.533170763
ConfriendsN 0.378854310
ConffV 0.039617689
ConffN 0.060092579
ConfbV 0.741851531
ConfbN 0.609947163
NounFrequency -0.008769463
VerbFrequency 0.064268066
FrequencyInitialDiphoneWord 0.029920110
FrequencyInitialDiphoneSyllable 0.033639918
CorrectLexdec 0.020750103
ConphonN
RTlexdec -0.08093054
RTnaming -0.01436490
Familiarity 0.16180440
WrittenFrequency 0.22245283
WrittenSpokenFrequencyRatio -0.02585613
FamilySize 0.16153149
DerivationalEntropy 0.07446861
InflectionalEntropy 0.03141073
NumberSimplexSynsets 0.14436202
NumberComplexSynsets 0.14249872
LengthInLetters -0.20516790
Ncount 0.19073268
MeanBigramFrequency -0.06464530
FrequencyInitialDiphone 0.01040236
ConspelV 0.41727634
ConspelN 0.65365781
ConphonV 0.66588359
ConphonN 1.00000000
ConfriendsV 0.38675454
ConfriendsN 0.65028040
ConffV 0.08270634
ConffN 0.09538521
ConfbV 0.56997101
ConfbN 0.66832337
NounFrequency 0.08519330
VerbFrequency 0.10336888
FrequencyInitialDiphoneWord 0.05363076
FrequencyInitialDiphoneSyllable 0.05429738
CorrectLexdec 0.06878849
ConfriendsV
RTlexdec -0.02572083
RTnaming -0.02774107
Familiarity 0.04584224
WrittenFrequency 0.02146455
WrittenSpokenFrequencyRatio -0.13634821
FamilySize 0.07927119
DerivationalEntropy 0.02892953
InflectionalEntropy 0.13124797
NumberSimplexSynsets 0.15465540
NumberComplexSynsets 0.03776110
LengthInLetters -0.19219994
Ncount 0.43682140
MeanBigramFrequency -0.07821588
FrequencyInitialDiphone -0.04952632
ConspelV 0.91949493
ConspelN 0.55820343
ConphonV 0.53317076
ConphonN 0.38675454
ConfriendsV 1.00000000
ConfriendsN 0.64901860
ConffV -0.12054529
ConffN -0.10390690
ConfbV 0.01094067
ConfbN -0.01072111
NounFrequency -0.02244854
VerbFrequency 0.00393842
FrequencyInitialDiphoneWord 0.12811075
FrequencyInitialDiphoneSyllable 0.13787711
CorrectLexdec 0.04575846
ConfriendsN
RTlexdec -0.117532883
RTnaming -0.044064997
Familiarity 0.206739773
WrittenFrequency 0.263264980
WrittenSpokenFrequencyRatio -0.055568525
FamilySize 0.242711641
DerivationalEntropy 0.131756437
InflectionalEntropy 0.067675205
NumberSimplexSynsets 0.252451852
NumberComplexSynsets 0.180629522
LengthInLetters -0.156898314
Ncount 0.340831193
MeanBigramFrequency 0.049075415
FrequencyInitialDiphone 0.036206248
ConspelV 0.622677513
ConspelN 0.882926151
ConphonV 0.378854310
ConphonN 0.650280396
ConfriendsV 0.649018597
ConfriendsN 1.000000000
ConffV 0.006536121
ConffN 0.020956643
ConfbV 0.083786596
ConfbN 0.089220835
NounFrequency 0.120108606
VerbFrequency 0.118902818
FrequencyInitialDiphoneWord 0.115161203
FrequencyInitialDiphoneSyllable 0.121747881
CorrectLexdec 0.124420710
ConffV
RTlexdec -0.0164949450
RTnaming 0.0079245511
Familiarity 0.0561034304
WrittenFrequency 0.0816216587
WrittenSpokenFrequencyRatio -0.0267316761
FamilySize 0.0803623769
DerivationalEntropy 0.0410973651
InflectionalEntropy 0.0143888100
NumberSimplexSynsets 0.0362906719
NumberComplexSynsets 0.0821028216
LengthInLetters -0.0192444580
Ncount 0.0768385931
MeanBigramFrequency 0.0725934375
FrequencyInitialDiphone -0.0001697975
ConspelV 0.2361879962
ConspelN 0.2778818176
ConphonV 0.0396176894
ConphonN 0.0827063427
ConfriendsV -0.1205452871
ConfriendsN 0.0065361208
ConffV 1.0000000000
ConffN 0.8241820547
ConfbV 0.0729283492
ConfbN 0.0683948055
NounFrequency 0.0367115079
VerbFrequency 0.1198303241
FrequencyInitialDiphoneWord 0.0058281749
FrequencyInitialDiphoneSyllable -0.0110108598
CorrectLexdec 0.0072904730
ConffN
RTlexdec -0.005679088
RTnaming 0.011407182
Familiarity 0.029457215
WrittenFrequency 0.050281007
WrittenSpokenFrequencyRatio -0.003403134
FamilySize 0.059199476
DerivationalEntropy 0.040363830
InflectionalEntropy 0.010758578
NumberSimplexSynsets 0.022778356
NumberComplexSynsets 0.057010830
LengthInLetters 0.010765359
Ncount 0.069850580
MeanBigramFrequency 0.113704972
FrequencyInitialDiphone 0.002149380
ConspelV 0.166554266
ConspelN 0.342139148
ConphonV 0.060092579
ConphonN 0.095385208
ConfriendsV -0.103906902
ConfriendsN 0.020956643
ConffV 0.824182055
ConffN 1.000000000
ConfbV 0.114815595
ConfbN 0.093364309
NounFrequency 0.010288796
VerbFrequency 0.082163492
FrequencyInitialDiphoneWord 0.003618182
FrequencyInitialDiphoneSyllable -0.008732692
CorrectLexdec -0.007205900
ConfbV
RTlexdec -0.022515482
RTnaming 0.019159417
Familiarity 0.056873430
WrittenFrequency 0.119757242
WrittenSpokenFrequencyRatio 0.093614167
FamilySize 0.050020822
DerivationalEntropy 0.007743200
InflectionalEntropy 0.002863801
NumberSimplexSynsets 0.021580797
NumberComplexSynsets 0.052172715
LengthInLetters -0.040037290
Ncount -0.036034465
MeanBigramFrequency 0.002954953
FrequencyInitialDiphone 0.033415704
ConspelV 0.049465271
ConspelN 0.142218950
ConphonV 0.741851531
ConphonN 0.569971012
ConfriendsV 0.010940674
ConfriendsN 0.083786596
ConffV 0.072928349
ConffN 0.114815595
ConfbV 1.000000000
ConfbN 0.842446966
NounFrequency 0.021685118
VerbFrequency 0.050519739
FrequencyInitialDiphoneWord -0.019915368
FrequencyInitialDiphoneSyllable -0.027020380
CorrectLexdec 0.005393638
ConfbN
RTlexdec -0.0185394993
RTnaming 0.0170117309
Familiarity 0.0519965301
WrittenFrequency 0.1040910628
WrittenSpokenFrequencyRatio 0.0702393218
FamilySize 0.0384571715
DerivationalEntropy 0.0114622898
InflectionalEntropy -0.0075830633
NumberSimplexSynsets 0.0002057108
NumberComplexSynsets 0.0491751942
LengthInLetters -0.0699854865
Ncount -0.0421326334
MeanBigramFrequency -0.0195507270
FrequencyInitialDiphone 0.0191463198
ConspelV 0.0272406932
ConspelN 0.1453149511
ConphonV 0.6099471634
ConphonN 0.6683233736
ConfriendsV -0.0107211109
ConfriendsN 0.0892208353
ConffV 0.0683948055
ConffN 0.0933643088
ConfbV 0.8424469664
ConfbN 1.0000000000
NounFrequency 0.0252276045
VerbFrequency 0.0329567506
FrequencyInitialDiphoneWord -0.0188858839
FrequencyInitialDiphoneSyllable -0.0244143624
CorrectLexdec 0.0039391579
NounFrequency
RTlexdec -0.167189500
RTnaming -0.043148572
Familiarity 0.381190698
WrittenFrequency 0.469551521
WrittenSpokenFrequencyRatio 0.012482293
FamilySize 0.417794301
DerivationalEntropy 0.172254519
InflectionalEntropy -0.114401007
NumberSimplexSynsets 0.238085561
NumberComplexSynsets 0.349469930
LengthInLetters -0.035331865
Ncount 0.035870248
MeanBigramFrequency 0.043361959
FrequencyInitialDiphone 0.098596497
ConspelV -0.016960053
ConspelN 0.119245162
ConphonV -0.008769463
ConphonN 0.085193295
ConfriendsV -0.022448538
ConfriendsN 0.120108606
ConffV 0.036711508
ConffN 0.010288796
ConfbV 0.021685118
ConfbN 0.025227604
NounFrequency 1.000000000
VerbFrequency -0.003117231
FrequencyInitialDiphoneWord 0.047626002
FrequencyInitialDiphoneSyllable 0.034300335
CorrectLexdec 0.128263251
VerbFrequency
RTlexdec -0.076388309
RTnaming -0.024593780
Familiarity 0.238176996
WrittenFrequency 0.278792355
WrittenSpokenFrequencyRatio -0.096554415
FamilySize 0.114132925
DerivationalEntropy -0.019725738
InflectionalEntropy 0.094002603
NumberSimplexSynsets 0.188796142
NumberComplexSynsets 0.092248597
LengthInLetters -0.083729951
Ncount 0.053361797
MeanBigramFrequency -0.045835069
FrequencyInitialDiphone 0.055718748
ConspelV 0.062919672
ConspelN 0.125337679
ConphonV 0.064268066
ConphonN 0.103368885
ConfriendsV 0.003938420
ConfriendsN 0.118902818
ConffV 0.119830324
ConffN 0.082163492
ConfbV 0.050519739
ConfbN 0.032956751
NounFrequency -0.003117231
VerbFrequency 1.000000000
FrequencyInitialDiphoneWord 0.069596145
FrequencyInitialDiphoneSyllable 0.055821617
CorrectLexdec 0.050165423
FrequencyInitialDiphoneWord
RTlexdec -0.042640861
RTnaming 0.020488545
Familiarity 0.091063334
WrittenFrequency 0.108278953
WrittenSpokenFrequencyRatio 0.002627022
FamilySize 0.096705342
DerivationalEntropy 0.029479534
InflectionalEntropy 0.052469468
NumberSimplexSynsets 0.127008196
NumberComplexSynsets 0.058217178
LengthInLetters 0.155454553
Ncount 0.007890710
MeanBigramFrequency 0.214165942
FrequencyInitialDiphone 0.131098129
ConspelV 0.118614576
ConspelN 0.118320106
ConphonV 0.029920110
ConphonN 0.053630763
ConfriendsV 0.128110751
ConfriendsN 0.115161203
ConffV 0.005828175
ConffN 0.003618182
ConfbV -0.019915368
ConfbN -0.018885884
NounFrequency 0.047626002
VerbFrequency 0.069596145
FrequencyInitialDiphoneWord 1.000000000
FrequencyInitialDiphoneSyllable 0.978742189
CorrectLexdec 0.062039751
FrequencyInitialDiphoneSyllable
RTlexdec -0.035503708
RTnaming 0.026897756
Familiarity 0.073541144
WrittenFrequency 0.091116609
WrittenSpokenFrequencyRatio 0.012872673
FamilySize 0.086426850
DerivationalEntropy 0.027755605
InflectionalEntropy 0.050939450
NumberSimplexSynsets 0.116058580
NumberComplexSynsets 0.047009152
LengthInLetters 0.150391668
Ncount 0.021719522
MeanBigramFrequency 0.201570329
FrequencyInitialDiphone 0.118897649
ConspelV 0.122764768
ConspelN 0.118433514
ConphonV 0.033639918
ConphonN 0.054297378
ConfriendsV 0.137877114
ConfriendsN 0.121747881
ConffV -0.011010860
ConffN -0.008732692
ConfbV -0.027020380
ConfbN -0.024414362
NounFrequency 0.034300335
VerbFrequency 0.055821617
FrequencyInitialDiphoneWord 0.978742189
FrequencyInitialDiphoneSyllable 1.000000000
CorrectLexdec 0.057000795
CorrectLexdec
RTlexdec -0.253188184
RTnaming 0.151348043
Familiarity 0.526854585
WrittenFrequency 0.457971849
WrittenSpokenFrequencyRatio 0.008564774
FamilySize 0.360613035
DerivationalEntropy 0.188753214
InflectionalEntropy 0.182065382
NumberSimplexSynsets 0.350077421
NumberComplexSynsets 0.329011088
LengthInLetters 0.046317578
Ncount 0.016048288
MeanBigramFrequency 0.063566285
FrequencyInitialDiphone 0.048680060
ConspelV 0.049342737
ConspelN 0.104328581
ConphonV 0.020750103
ConphonN 0.068788492
ConfriendsV 0.045758455
ConfriendsN 0.124420710
ConffV 0.007290473
ConffN -0.007205900
ConfbV 0.005393638
ConfbN 0.003939158
NounFrequency 0.128263251
VerbFrequency 0.050165423
FrequencyInitialDiphoneWord 0.062039751
FrequencyInitialDiphoneSyllable 0.057000795
CorrectLexdec 1.000000000
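A printed matrix of this size is hard to scan. One possible way to find the strongest relationships is to reshape the matrix into one row per pair of variables and sort by absolute correlation. The sketch below uses the built-in mtcars dataset as a stand-in so that it runs on its own (an assumption for illustration); the names corr_demo and pairs_long are hypothetical, and the same steps could be applied to the corr matrix.

```r
# Stand-in data: built-in mtcars (assumption for illustration only)
corr_demo <- cor(mtcars)

# Keep each pair once by blanking the diagonal and the lower triangle
corr_demo[lower.tri(corr_demo, diag = TRUE)] <- NA

# Reshape to one row per pair of variables and drop the blanked cells
pairs_long <- as.data.frame(as.table(corr_demo), stringsAsFactors = FALSE)
names(pairs_long) <- c("var1", "var2", "r")
pairs_long <- pairs_long[!is.na(pairs_long$r), ]

# Sort by absolute correlation, strongest first
pairs_long <- pairs_long[order(-abs(pairs_long$r)), ]
print(head(pairs_long, 5))
```

Sorting on the absolute value keeps strong negative correlations near the top alongside strong positive ones.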
print(corr)
RTlexdec
RTlexdec 1.000000000
RTnaming 0.758703280
Familiarity -0.444409734
WrittenFrequency -0.434814982
WrittenSpokenFrequencyRatio 0.039820007
FamilySize -0.349595853
DerivationalEntropy -0.161164620
InflectionalEntropy -0.088418681
NumberSimplexSynsets -0.309008140
NumberComplexSynsets -0.328613209
LengthInLetters 0.049747275
Ncount -0.065726313
MeanBigramFrequency 0.002633525
FrequencyInitialDiphone -0.074452719
ConspelV -0.032867467
ConspelN -0.107538023
ConphonV -0.021747588
ConphonN -0.080930543
ConfriendsV -0.025720833
ConfriendsN -0.117532883
ConffV -0.016494945
ConffN -0.005679088
ConfbV -0.022515482
ConfbN -0.018539499
NounFrequency -0.167189500
VerbFrequency -0.076388309
FrequencyInitialDiphoneWord -0.042640861
FrequencyInitialDiphoneSyllable -0.035503708
CorrectLexdec -0.253188184
RTnaming
RTlexdec 0.758703280
RTnaming 1.000000000
Familiarity -0.094793069
WrittenFrequency -0.095994313
WrittenSpokenFrequencyRatio 0.036592754
FamilySize -0.088037010
DerivationalEntropy -0.049456670
InflectionalEntropy -0.022110376
NumberSimplexSynsets -0.071900207
NumberComplexSynsets -0.076384846
LengthInLetters 0.094497065
Ncount -0.094669618
MeanBigramFrequency 0.048459360
FrequencyInitialDiphone -0.057216874
ConspelV -0.025165185
ConspelN -0.034801239
ConphonV 0.001175572
ConphonN -0.014364896
ConfriendsV -0.027741071
ConfriendsN -0.044064997
ConffV 0.007924551
ConffN 0.011407182
ConfbV 0.019159417
ConfbN 0.017011731
NounFrequency -0.043148572
VerbFrequency -0.024593780
FrequencyInitialDiphoneWord 0.020488545
FrequencyInitialDiphoneSyllable 0.026897756
CorrectLexdec 0.151348043
Familiarity
RTlexdec -0.44440973
RTnaming -0.09479307
Familiarity 1.00000000
WrittenFrequency 0.79125559
WrittenSpokenFrequencyRatio -0.18989881
FamilySize 0.59191973
DerivationalEntropy 0.22071588
InflectionalEntropy 0.10795420
NumberSimplexSynsets 0.51065170
NumberComplexSynsets 0.51001913
LengthInLetters -0.08215272
Ncount 0.09650461
MeanBigramFrequency 0.02962138
FrequencyInitialDiphone 0.12847193
ConspelV 0.07451417
ConspelN 0.21437628
ConphonV 0.05408975
ConphonN 0.16180440
ConfriendsV 0.04584224
ConfriendsN 0.20673977
ConffV 0.05610343
ConffN 0.02945722
ConfbV 0.05687343
ConfbN 0.05199653
NounFrequency 0.38119070
VerbFrequency 0.23817700
FrequencyInitialDiphoneWord 0.09106333
FrequencyInitialDiphoneSyllable 0.07354114
CorrectLexdec 0.52685458
WrittenFrequency
RTlexdec -0.43481498
RTnaming -0.09599431
Familiarity 0.79125559
WrittenFrequency 1.00000000
WrittenSpokenFrequencyRatio 0.07158067
FamilySize 0.66253864
DerivationalEntropy 0.25522889
InflectionalEntropy -0.04048005
NumberSimplexSynsets 0.55874958
NumberComplexSynsets 0.59105478
LengthInLetters -0.06663196
Ncount 0.10492564
MeanBigramFrequency 0.07758879
FrequencyInitialDiphone 0.16748670
ConspelV 0.05864228
ConspelN 0.28248908
ConphonV 0.08201255
ConphonN 0.22245283
ConfriendsV 0.02146455
ConfriendsN 0.26326498
ConffV 0.08162166
ConffN 0.05028101
ConfbV 0.11975724
ConfbN 0.10409106
NounFrequency 0.46955152
VerbFrequency 0.27879235
FrequencyInitialDiphoneWord 0.10827895
FrequencyInitialDiphoneSyllable 0.09111661
CorrectLexdec 0.45797185
WrittenSpokenFrequencyRatio
RTlexdec 0.039820007
RTnaming 0.036592754
Familiarity -0.189898811
WrittenFrequency 0.071580669
WrittenSpokenFrequencyRatio 1.000000000
FamilySize -0.108801543
DerivationalEntropy -0.010987756
InflectionalEntropy -0.118804044
NumberSimplexSynsets -0.085088573
NumberComplexSynsets -0.104598273
LengthInLetters 0.204091196
Ncount -0.188200595
MeanBigramFrequency 0.191513493
FrequencyInitialDiphone 0.020799272
ConspelV -0.157689304
ConspelN -0.057307387
ConphonV -0.034796879
ConphonN -0.025856126
ConfriendsV -0.136348207
ConfriendsN -0.055568525
ConffV -0.026731676
ConffN -0.003403134
ConfbV 0.093614167
ConfbN 0.070239322
NounFrequency 0.012482293
VerbFrequency -0.096554415
FrequencyInitialDiphoneWord 0.002627022
FrequencyInitialDiphoneSyllable 0.012872673
CorrectLexdec 0.008564774
FamilySize
RTlexdec -0.349595853
RTnaming -0.088037010
Familiarity 0.591919733
WrittenFrequency 0.662538635
WrittenSpokenFrequencyRatio -0.108801543
FamilySize 1.000000000
DerivationalEntropy 0.692088896
InflectionalEntropy 0.101743523
NumberSimplexSynsets 0.590763556
NumberComplexSynsets 0.645411663
LengthInLetters -0.122995009
Ncount 0.174107015
MeanBigramFrequency -0.001056468
FrequencyInitialDiphone 0.126804334
ConspelV 0.110812602
ConspelN 0.249522442
ConphonV 0.050171973
ConphonN 0.161531494
ConfriendsV 0.079271194
ConfriendsN 0.242711641
ConffV 0.080362377
ConffN 0.059199476
ConfbV 0.050020822
ConfbN 0.038457171
NounFrequency 0.417794301
VerbFrequency 0.114132925
FrequencyInitialDiphoneWord 0.096705342
FrequencyInitialDiphoneSyllable 0.086426850
CorrectLexdec 0.360613035
DerivationalEntropy
RTlexdec -0.161164620
RTnaming -0.049456670
Familiarity 0.220715880
WrittenFrequency 0.255228886
WrittenSpokenFrequencyRatio -0.010987756
FamilySize 0.692088896
DerivationalEntropy 1.000000000
InflectionalEntropy -0.050795034
NumberSimplexSynsets 0.223943027
NumberComplexSynsets 0.331924999
LengthInLetters -0.104860729
Ncount 0.123827732
MeanBigramFrequency -0.020837051
FrequencyInitialDiphone 0.083835479
ConspelV 0.046117273
ConspelN 0.137519738
ConphonV -0.002754648
ConphonN 0.074468614
ConfriendsV 0.028929529
ConfriendsN 0.131756437
ConffV 0.041097365
ConffN 0.040363830
ConfbV 0.007743200
ConfbN 0.011462290
NounFrequency 0.172254519
VerbFrequency -0.019725738
FrequencyInitialDiphoneWord 0.029479534
FrequencyInitialDiphoneSyllable 0.027755605
CorrectLexdec 0.188753214
InflectionalEntropy
RTlexdec -0.088418681
RTnaming -0.022110376
Familiarity 0.107954197
WrittenFrequency -0.040480046
WrittenSpokenFrequencyRatio -0.118804044
FamilySize 0.101743523
DerivationalEntropy -0.050795034
InflectionalEntropy 1.000000000
NumberSimplexSynsets 0.398736053
NumberComplexSynsets 0.005589502
LengthInLetters 0.052485031
Ncount -0.003252708
MeanBigramFrequency 0.024789643
FrequencyInitialDiphone -0.034461207
ConspelV 0.140798520
ConspelN 0.046826086
ConphonV 0.082962738
ConphonN 0.031410725
ConfriendsV 0.131247972
ConfriendsN 0.067675205
ConffV 0.014388810
ConffN 0.010758578
ConfbV 0.002863801
ConfbN -0.007583063
NounFrequency -0.114401007
VerbFrequency 0.094002603
FrequencyInitialDiphoneWord 0.052469468
FrequencyInitialDiphoneSyllable 0.050939450
CorrectLexdec 0.182065382
NumberSimplexSynsets
RTlexdec -0.3090081404
RTnaming -0.0719002065
Familiarity 0.5106516971
WrittenFrequency 0.5587495840
WrittenSpokenFrequencyRatio -0.0850885729
FamilySize 0.5907635556
DerivationalEntropy 0.2239430269
InflectionalEntropy 0.3987360535
NumberSimplexSynsets 1.0000000000
NumberComplexSynsets 0.5245365002
LengthInLetters -0.0063644110
Ncount 0.1129586209
MeanBigramFrequency 0.0539689516
FrequencyInitialDiphone 0.0571276406
ConspelV 0.1660590990
ConspelN 0.2275859556
ConphonV 0.0747186906
ConphonN 0.1443620165
ConfriendsV 0.1546554020
ConfriendsN 0.2524518522
ConffV 0.0362906719
ConffN 0.0227783558
ConfbV 0.0215807967
ConfbN 0.0002057108
NounFrequency 0.2380855612
VerbFrequency 0.1887961418
FrequencyInitialDiphoneWord 0.1270081956
FrequencyInitialDiphoneSyllable 0.1160585801
CorrectLexdec 0.3500774208
NumberComplexSynsets
RTlexdec -0.328613209
RTnaming -0.076384846
Familiarity 0.510019126
WrittenFrequency 0.591054783
WrittenSpokenFrequencyRatio -0.104598273
FamilySize 0.645411663
DerivationalEntropy 0.331924999
InflectionalEntropy 0.005589502
NumberSimplexSynsets 0.524536500
NumberComplexSynsets 1.000000000
LengthInLetters -0.120445975
Ncount 0.137748482
MeanBigramFrequency -0.023604116
FrequencyInitialDiphone 0.103684145
ConspelV 0.071760775
ConspelN 0.193733204
ConphonV 0.047783270
ConphonN 0.142498720
ConfriendsV 0.037761099
ConfriendsN 0.180629522
ConffV 0.082102822
ConffN 0.057010830
ConfbV 0.052172715
ConfbN 0.049175194
NounFrequency 0.349469930
VerbFrequency 0.092248597
FrequencyInitialDiphoneWord 0.058217178
FrequencyInitialDiphoneSyllable 0.047009152
CorrectLexdec 0.329011088
LengthInLetters
RTlexdec 0.049747275
RTnaming 0.094497065
Familiarity -0.082152716
WrittenFrequency -0.066631955
WrittenSpokenFrequencyRatio 0.204091196
FamilySize -0.122995009
DerivationalEntropy -0.104860729
InflectionalEntropy 0.052485031
NumberSimplexSynsets -0.006364411
NumberComplexSynsets -0.120445975
LengthInLetters 1.000000000
Ncount -0.625129141
MeanBigramFrequency 0.790492091
FrequencyInitialDiphone -0.060443836
ConspelV -0.226416938
ConspelN -0.170022083
ConphonV -0.202368726
ConphonN -0.205167896
ConfriendsV -0.192199942
ConfriendsN -0.156898314
ConffV -0.019244458
ConffN 0.010765359
ConfbV -0.040037290
ConfbN -0.069985486
NounFrequency -0.035331865
VerbFrequency -0.083729951
FrequencyInitialDiphoneWord 0.155454553
FrequencyInitialDiphoneSyllable 0.150391668
CorrectLexdec 0.046317578
Ncount
RTlexdec -0.065726313
RTnaming -0.094669618
Familiarity 0.096504609
WrittenFrequency 0.104925644
WrittenSpokenFrequencyRatio -0.188200595
FamilySize 0.174107015
DerivationalEntropy 0.123827732
InflectionalEntropy -0.003252708
NumberSimplexSynsets 0.112958621
NumberComplexSynsets 0.137748482
LengthInLetters -0.625129141
Ncount 1.000000000
MeanBigramFrequency -0.387546284
FrequencyInitialDiphone 0.135888890
ConspelV 0.474710938
ConspelN 0.346547943
ConphonV 0.210193628
ConphonN 0.190732679
ConfriendsV 0.436821402
ConfriendsN 0.340831193
ConffV 0.076838593
ConffN 0.069850580
ConfbV -0.036034465
ConfbN -0.042132633
NounFrequency 0.035870248
VerbFrequency 0.053361797
FrequencyInitialDiphoneWord 0.007890710
FrequencyInitialDiphoneSyllable 0.021719522
CorrectLexdec 0.016048288
MeanBigramFrequency
RTlexdec 0.002633525
RTnaming 0.048459360
Familiarity 0.029621385
WrittenFrequency 0.077588795
WrittenSpokenFrequencyRatio 0.191513493
FamilySize -0.001056468
DerivationalEntropy -0.020837051
InflectionalEntropy 0.024789643
NumberSimplexSynsets 0.053968952
NumberComplexSynsets -0.023604116
LengthInLetters 0.790492091
Ncount -0.387546284
MeanBigramFrequency 1.000000000
FrequencyInitialDiphone 0.324815461
ConspelV -0.091270605
ConspelN 0.060952203
ConphonV -0.122457211
ConphonN -0.064645303
ConfriendsV -0.078215883
ConfriendsN 0.049075415
ConffV 0.072593437
ConffN 0.113704972
ConfbV 0.002954953
ConfbN -0.019550727
NounFrequency 0.043361959
VerbFrequency -0.045835069
FrequencyInitialDiphoneWord 0.214165942
FrequencyInitialDiphoneSyllable 0.201570329
CorrectLexdec 0.063566285
FrequencyInitialDiphone
RTlexdec -0.0744527186
RTnaming -0.0572168742
Familiarity 0.1284719337
WrittenFrequency 0.1674867041
WrittenSpokenFrequencyRatio 0.0207992723
FamilySize 0.1268043344
DerivationalEntropy 0.0838354786
InflectionalEntropy -0.0344612070
NumberSimplexSynsets 0.0571276406
NumberComplexSynsets 0.1036841450
LengthInLetters -0.0604438357
Ncount 0.1358888899
MeanBigramFrequency 0.3248154611
FrequencyInitialDiphone 1.0000000000
ConspelV -0.0557304958
ConspelN 0.0309540623
ConphonV -0.0142352920
ConphonN 0.0104023606
ConfriendsV -0.0495263185
ConfriendsN 0.0362062483
ConffV -0.0001697975
ConffN 0.0021493798
ConfbV 0.0334157038
ConfbN 0.0191463198
NounFrequency 0.0985964971
VerbFrequency 0.0557187478
FrequencyInitialDiphoneWord 0.1310981285
FrequencyInitialDiphoneSyllable 0.1188976490
CorrectLexdec 0.0486800603
ConspelV
RTlexdec -0.03286747
RTnaming -0.02516518
Familiarity 0.07451417
WrittenFrequency 0.05864228
WrittenSpokenFrequencyRatio -0.15768930
FamilySize 0.11081260
DerivationalEntropy 0.04611727
InflectionalEntropy 0.14079852
NumberSimplexSynsets 0.16605910
NumberComplexSynsets 0.07176077
LengthInLetters -0.22641694
Ncount 0.47471094
MeanBigramFrequency -0.09127061
FrequencyInitialDiphone -0.05573050
ConspelV 1.00000000
ConspelN 0.64214341
ConphonV 0.54021641
ConphonN 0.41727634
ConfriendsV 0.91949493
ConfriendsN 0.62267751
ConffV 0.23618800
ConffN 0.16655427
ConfbV 0.04946527
ConfbN 0.02724069
NounFrequency -0.01696005
VerbFrequency 0.06291967
FrequencyInitialDiphoneWord 0.11861458
FrequencyInitialDiphoneSyllable 0.12276477
CorrectLexdec 0.04934274
ConspelN
RTlexdec -0.10753802
RTnaming -0.03480124
Familiarity 0.21437628
WrittenFrequency 0.28248908
WrittenSpokenFrequencyRatio -0.05730739
FamilySize 0.24952244
DerivationalEntropy 0.13751974
InflectionalEntropy 0.04682609
NumberSimplexSynsets 0.22758596
NumberComplexSynsets 0.19373320
LengthInLetters -0.17002208
Ncount 0.34654794
MeanBigramFrequency 0.06095220
FrequencyInitialDiphone 0.03095406
ConspelV 0.64214341
ConspelN 1.00000000
ConphonV 0.38047467
ConphonN 0.65365781
ConfriendsV 0.55820343
ConfriendsN 0.88292615
ConffV 0.27788182
ConffN 0.34213915
ConfbV 0.14221895
ConfbN 0.14531495
NounFrequency 0.11924516
VerbFrequency 0.12533768
FrequencyInitialDiphoneWord 0.11832011
FrequencyInitialDiphoneSyllable 0.11843351
CorrectLexdec 0.10432858
ConphonV
RTlexdec -0.021747588
RTnaming 0.001175572
Familiarity 0.054089750
WrittenFrequency 0.082012553
WrittenSpokenFrequencyRatio -0.034796879
FamilySize 0.050171973
DerivationalEntropy -0.002754648
InflectionalEntropy 0.082962738
NumberSimplexSynsets 0.074718691
NumberComplexSynsets 0.047783270
LengthInLetters -0.202368726
Ncount 0.210193628
MeanBigramFrequency -0.122457211
FrequencyInitialDiphone -0.014235292
ConspelV 0.540216414
ConspelN 0.380474673
ConphonV 1.000000000
ConphonN 0.665883587
ConfriendsV 0.533170763
ConfriendsN 0.378854310
ConffV 0.039617689
ConffN 0.060092579
ConfbV 0.741851531
ConfbN 0.609947163
NounFrequency -0.008769463
VerbFrequency 0.064268066
FrequencyInitialDiphoneWord 0.029920110
FrequencyInitialDiphoneSyllable 0.033639918
CorrectLexdec 0.020750103
ConphonN
RTlexdec -0.08093054
RTnaming -0.01436490
Familiarity 0.16180440
WrittenFrequency 0.22245283
WrittenSpokenFrequencyRatio -0.02585613
FamilySize 0.16153149
DerivationalEntropy 0.07446861
InflectionalEntropy 0.03141073
NumberSimplexSynsets 0.14436202
NumberComplexSynsets 0.14249872
LengthInLetters -0.20516790
Ncount 0.19073268
MeanBigramFrequency -0.06464530
FrequencyInitialDiphone 0.01040236
ConspelV 0.41727634
ConspelN 0.65365781
ConphonV 0.66588359
ConphonN 1.00000000
ConfriendsV 0.38675454
ConfriendsN 0.65028040
ConffV 0.08270634
ConffN 0.09538521
ConfbV 0.56997101
ConfbN 0.66832337
NounFrequency 0.08519330
VerbFrequency 0.10336888
FrequencyInitialDiphoneWord 0.05363076
FrequencyInitialDiphoneSyllable 0.05429738
CorrectLexdec 0.06878849
ConfriendsV
RTlexdec -0.02572083
RTnaming -0.02774107
Familiarity 0.04584224
WrittenFrequency 0.02146455
WrittenSpokenFrequencyRatio -0.13634821
FamilySize 0.07927119
DerivationalEntropy 0.02892953
InflectionalEntropy 0.13124797
NumberSimplexSynsets 0.15465540
NumberComplexSynsets 0.03776110
LengthInLetters -0.19219994
Ncount 0.43682140
MeanBigramFrequency -0.07821588
FrequencyInitialDiphone -0.04952632
ConspelV 0.91949493
ConspelN 0.55820343
ConphonV 0.53317076
ConphonN 0.38675454
ConfriendsV 1.00000000
ConfriendsN 0.64901860
ConffV -0.12054529
ConffN -0.10390690
ConfbV 0.01094067
ConfbN -0.01072111
NounFrequency -0.02244854
VerbFrequency 0.00393842
FrequencyInitialDiphoneWord 0.12811075
FrequencyInitialDiphoneSyllable 0.13787711
CorrectLexdec 0.04575846
ConfriendsN
RTlexdec -0.117532883
RTnaming -0.044064997
Familiarity 0.206739773
WrittenFrequency 0.263264980
WrittenSpokenFrequencyRatio -0.055568525
FamilySize 0.242711641
DerivationalEntropy 0.131756437
InflectionalEntropy 0.067675205
NumberSimplexSynsets 0.252451852
NumberComplexSynsets 0.180629522
LengthInLetters -0.156898314
Ncount 0.340831193
MeanBigramFrequency 0.049075415
FrequencyInitialDiphone 0.036206248
ConspelV 0.622677513
ConspelN 0.882926151
ConphonV 0.378854310
ConphonN 0.650280396
ConfriendsV 0.649018597
ConfriendsN 1.000000000
ConffV 0.006536121
ConffN 0.020956643
ConfbV 0.083786596
ConfbN 0.089220835
NounFrequency 0.120108606
VerbFrequency 0.118902818
FrequencyInitialDiphoneWord 0.115161203
FrequencyInitialDiphoneSyllable 0.121747881
CorrectLexdec 0.124420710
ConffV
RTlexdec -0.0164949450
RTnaming 0.0079245511
Familiarity 0.0561034304
WrittenFrequency 0.0816216587
WrittenSpokenFrequencyRatio -0.0267316761
FamilySize 0.0803623769
DerivationalEntropy 0.0410973651
InflectionalEntropy 0.0143888100
NumberSimplexSynsets 0.0362906719
NumberComplexSynsets 0.0821028216
LengthInLetters -0.0192444580
Ncount 0.0768385931
MeanBigramFrequency 0.0725934375
FrequencyInitialDiphone -0.0001697975
ConspelV 0.2361879962
ConspelN 0.2778818176
ConphonV 0.0396176894
ConphonN 0.0827063427
ConfriendsV -0.1205452871
ConfriendsN 0.0065361208
ConffV 1.0000000000
ConffN 0.8241820547
ConfbV 0.0729283492
ConfbN 0.0683948055
NounFrequency 0.0367115079
VerbFrequency 0.1198303241
FrequencyInitialDiphoneWord 0.0058281749
FrequencyInitialDiphoneSyllable -0.0110108598
CorrectLexdec 0.0072904730
ConffN
RTlexdec -0.005679088
RTnaming 0.011407182
Familiarity 0.029457215
WrittenFrequency 0.050281007
WrittenSpokenFrequencyRatio -0.003403134
FamilySize 0.059199476
DerivationalEntropy 0.040363830
InflectionalEntropy 0.010758578
NumberSimplexSynsets 0.022778356
NumberComplexSynsets 0.057010830
LengthInLetters 0.010765359
Ncount 0.069850580
MeanBigramFrequency 0.113704972
FrequencyInitialDiphone 0.002149380
ConspelV 0.166554266
ConspelN 0.342139148
ConphonV 0.060092579
ConphonN 0.095385208
ConfriendsV -0.103906902
ConfriendsN 0.020956643
ConffV 0.824182055
ConffN 1.000000000
ConfbV 0.114815595
ConfbN 0.093364309
NounFrequency 0.010288796
VerbFrequency 0.082163492
FrequencyInitialDiphoneWord 0.003618182
FrequencyInitialDiphoneSyllable -0.008732692
CorrectLexdec -0.007205900
ConfbV
RTlexdec -0.022515482
RTnaming 0.019159417
Familiarity 0.056873430
WrittenFrequency 0.119757242
WrittenSpokenFrequencyRatio 0.093614167
FamilySize 0.050020822
DerivationalEntropy 0.007743200
InflectionalEntropy 0.002863801
NumberSimplexSynsets 0.021580797
NumberComplexSynsets 0.052172715
LengthInLetters -0.040037290
Ncount -0.036034465
MeanBigramFrequency 0.002954953
FrequencyInitialDiphone 0.033415704
ConspelV 0.049465271
ConspelN 0.142218950
ConphonV 0.741851531
ConphonN 0.569971012
ConfriendsV 0.010940674
ConfriendsN 0.083786596
ConffV 0.072928349
ConffN 0.114815595
ConfbV 1.000000000
ConfbN 0.842446966
NounFrequency 0.021685118
VerbFrequency 0.050519739
FrequencyInitialDiphoneWord -0.019915368
FrequencyInitialDiphoneSyllable -0.027020380
CorrectLexdec 0.005393638
ConfbN
RTlexdec -0.0185394993
RTnaming 0.0170117309
Familiarity 0.0519965301
WrittenFrequency 0.1040910628
WrittenSpokenFrequencyRatio 0.0702393218
FamilySize 0.0384571715
DerivationalEntropy 0.0114622898
InflectionalEntropy -0.0075830633
NumberSimplexSynsets 0.0002057108
NumberComplexSynsets 0.0491751942
LengthInLetters -0.0699854865
Ncount -0.0421326334
MeanBigramFrequency -0.0195507270
FrequencyInitialDiphone 0.0191463198
ConspelV 0.0272406932
ConspelN 0.1453149511
ConphonV 0.6099471634
ConphonN 0.6683233736
ConfriendsV -0.0107211109
ConfriendsN 0.0892208353
ConffV 0.0683948055
ConffN 0.0933643088
ConfbV 0.8424469664
ConfbN 1.0000000000
NounFrequency 0.0252276045
VerbFrequency 0.0329567506
FrequencyInitialDiphoneWord -0.0188858839
FrequencyInitialDiphoneSyllable -0.0244143624
CorrectLexdec 0.0039391579
NounFrequency
RTlexdec -0.167189500
RTnaming -0.043148572
Familiarity 0.381190698
WrittenFrequency 0.469551521
WrittenSpokenFrequencyRatio 0.012482293
FamilySize 0.417794301
DerivationalEntropy 0.172254519
InflectionalEntropy -0.114401007
NumberSimplexSynsets 0.238085561
NumberComplexSynsets 0.349469930
LengthInLetters -0.035331865
Ncount 0.035870248
MeanBigramFrequency 0.043361959
FrequencyInitialDiphone 0.098596497
ConspelV -0.016960053
ConspelN 0.119245162
ConphonV -0.008769463
ConphonN 0.085193295
ConfriendsV -0.022448538
ConfriendsN 0.120108606
ConffV 0.036711508
ConffN 0.010288796
ConfbV 0.021685118
ConfbN 0.025227604
NounFrequency 1.000000000
VerbFrequency -0.003117231
FrequencyInitialDiphoneWord 0.047626002
FrequencyInitialDiphoneSyllable 0.034300335
CorrectLexdec 0.128263251
VerbFrequency
RTlexdec -0.076388309
RTnaming -0.024593780
Familiarity 0.238176996
WrittenFrequency 0.278792355
WrittenSpokenFrequencyRatio -0.096554415
FamilySize 0.114132925
DerivationalEntropy -0.019725738
InflectionalEntropy 0.094002603
NumberSimplexSynsets 0.188796142
NumberComplexSynsets 0.092248597
LengthInLetters -0.083729951
Ncount 0.053361797
MeanBigramFrequency -0.045835069
FrequencyInitialDiphone 0.055718748
ConspelV 0.062919672
ConspelN 0.125337679
ConphonV 0.064268066
ConphonN 0.103368885
ConfriendsV 0.003938420
ConfriendsN 0.118902818
ConffV 0.119830324
ConffN 0.082163492
ConfbV 0.050519739
ConfbN 0.032956751
NounFrequency -0.003117231
VerbFrequency 1.000000000
FrequencyInitialDiphoneWord 0.069596145
FrequencyInitialDiphoneSyllable 0.055821617
CorrectLexdec 0.050165423
FrequencyInitialDiphoneWord
RTlexdec -0.042640861
RTnaming 0.020488545
Familiarity 0.091063334
WrittenFrequency 0.108278953
WrittenSpokenFrequencyRatio 0.002627022
FamilySize 0.096705342
DerivationalEntropy 0.029479534
InflectionalEntropy 0.052469468
NumberSimplexSynsets 0.127008196
NumberComplexSynsets 0.058217178
LengthInLetters 0.155454553
Ncount 0.007890710
MeanBigramFrequency 0.214165942
FrequencyInitialDiphone 0.131098129
ConspelV 0.118614576
ConspelN 0.118320106
ConphonV 0.029920110
ConphonN 0.053630763
ConfriendsV 0.128110751
ConfriendsN 0.115161203
ConffV 0.005828175
ConffN 0.003618182
ConfbV -0.019915368
ConfbN -0.018885884
NounFrequency 0.047626002
VerbFrequency 0.069596145
FrequencyInitialDiphoneWord 1.000000000
FrequencyInitialDiphoneSyllable 0.978742189
CorrectLexdec 0.062039751
FrequencyInitialDiphoneSyllable
RTlexdec -0.035503708
RTnaming 0.026897756
Familiarity 0.073541144
WrittenFrequency 0.091116609
WrittenSpokenFrequencyRatio 0.012872673
FamilySize 0.086426850
DerivationalEntropy 0.027755605
InflectionalEntropy 0.050939450
NumberSimplexSynsets 0.116058580
NumberComplexSynsets 0.047009152
LengthInLetters 0.150391668
Ncount 0.021719522
MeanBigramFrequency 0.201570329
FrequencyInitialDiphone 0.118897649
ConspelV 0.122764768
ConspelN 0.118433514
ConphonV 0.033639918
ConphonN 0.054297378
ConfriendsV 0.137877114
ConfriendsN 0.121747881
ConffV -0.011010860
ConffN -0.008732692
ConfbV -0.027020380
ConfbN -0.024414362
NounFrequency 0.034300335
VerbFrequency 0.055821617
FrequencyInitialDiphoneWord 0.978742189
FrequencyInitialDiphoneSyllable 1.000000000
CorrectLexdec 0.057000795
CorrectLexdec
RTlexdec -0.253188184
RTnaming 0.151348043
Familiarity 0.526854585
WrittenFrequency 0.457971849
WrittenSpokenFrequencyRatio 0.008564774
FamilySize 0.360613035
DerivationalEntropy 0.188753214
InflectionalEntropy 0.182065382
NumberSimplexSynsets 0.350077421
NumberComplexSynsets 0.329011088
LengthInLetters 0.046317578
Ncount 0.016048288
MeanBigramFrequency 0.063566285
FrequencyInitialDiphone 0.048680060
ConspelV 0.049342737
ConspelN 0.104328581
ConphonV 0.020750103
ConphonN 0.068788492
ConfriendsV 0.045758455
ConfriendsN 0.124420710
ConffV 0.007290473
ConffN -0.007205900
ConfbV 0.005393638
ConfbN 0.003939158
NounFrequency 0.128263251
VerbFrequency 0.050165423
FrequencyInitialDiphoneWord 0.062039751
FrequencyInitialDiphoneSyllable 0.057000795
CorrectLexdec 1.000000000
corrplot(corr, method = 'ellipse', type = 'upper')
Let’s first compute the correlations between all numeric variables and plot them together with their p-values.
## correlation using "corrplot"
## based on the function `rcorr` from the `Hmisc` package
## Need to change the dataframe into a matrix
## as.matrix() needs no argument inside the pipe; passing `english` again
## triggers a "condition has length > 1" warning (an error in R >= 4.2)
corr <-
  english %>%
  select(where(is.numeric)) %>%
  as.matrix() %>%
  rcorr(type = "pearson")
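To make the link between a correlation and its p-value concrete, here is a minimal base-R sketch of the calculation. `rcorr()` returns a list with the elements `r`, `n` and `P`; the code below only approximates that idea by hand and is not the `Hmisc` implementation. The built-in `mtcars` data and the four columns chosen are stand-ins (an assumption for illustration), used so the example runs without the `english` data.

```r
# Base-R sketch: Pearson correlations and two-sided p-values for numeric columns.
# `mtcars` is a stand-in dataset (assumption); swap in the numeric columns of `english`.
num <- as.matrix(mtcars[, c("mpg", "hp", "wt", "qsec")])

n <- nrow(num)                      # number of rows; mtcars has no missing values
r <- cor(num, method = "pearson")   # correlation matrix

# t statistic for H0: rho = 0, with n - 2 degrees of freedom
# pmax() guards against a tiny negative value of 1 - r^2 on the diagonal
t_stat <- r * sqrt((n - 2) / pmax(1 - r^2, 0))
p <- 2 * pt(-abs(t_stat), df = n - 2)
diag(p) <- NA                       # a variable's correlation with itself is not tested

print(round(r, 2))
print(signif(p, 3))

# Cross-check one pair against cor.test()
ct <- cor.test(num[, "mpg"], num[, "wt"])
print(all.equal(unname(p["mpg", "wt"]), ct$p.value))
```

With `rcorr()`, the same two matrices should be available as `corr$r` and `corr$P`, which is why the result is a list rather than a single matrix.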
print(corr)
RTlexdec
RTlexdec 1.00
RTnaming 0.76
Familiarity -0.44
WrittenFrequency -0.43
WrittenSpokenFrequencyRatio 0.04
FamilySize -0.35
DerivationalEntropy -0.16
InflectionalEntropy -0.09
NumberSimplexSynsets -0.31
NumberComplexSynsets -0.33
LengthInLetters 0.05
Ncount -0.07
MeanBigramFrequency 0.00
FrequencyInitialDiphone -0.07
ConspelV -0.03
ConspelN -0.11
ConphonV -0.02
ConphonN -0.08
ConfriendsV -0.03
ConfriendsN -0.12
ConffV -0.02
ConffN -0.01
ConfbV -0.02
ConfbN -0.02
NounFrequency -0.17
VerbFrequency -0.08
FrequencyInitialDiphoneWord -0.04
FrequencyInitialDiphoneSyllable -0.04
CorrectLexdec -0.25
RTnaming
RTlexdec 0.76
RTnaming 1.00
Familiarity -0.09
WrittenFrequency -0.10
WrittenSpokenFrequencyRatio 0.04
FamilySize -0.09
DerivationalEntropy -0.05
InflectionalEntropy -0.02
NumberSimplexSynsets -0.07
NumberComplexSynsets -0.08
LengthInLetters 0.09
Ncount -0.09
MeanBigramFrequency 0.05
FrequencyInitialDiphone -0.06
ConspelV -0.03
ConspelN -0.03
ConphonV 0.00
ConphonN -0.01
ConfriendsV -0.03
ConfriendsN -0.04
ConffV 0.01
ConffN 0.01
ConfbV 0.02
ConfbN 0.02
NounFrequency -0.04
VerbFrequency -0.02
FrequencyInitialDiphoneWord 0.02
FrequencyInitialDiphoneSyllable 0.03
CorrectLexdec 0.15
Familiarity
RTlexdec -0.44
RTnaming -0.09
Familiarity 1.00
WrittenFrequency 0.79
WrittenSpokenFrequencyRatio -0.19
FamilySize 0.59
DerivationalEntropy 0.22
InflectionalEntropy 0.11
NumberSimplexSynsets 0.51
NumberComplexSynsets 0.51
LengthInLetters -0.08
Ncount 0.10
MeanBigramFrequency 0.03
FrequencyInitialDiphone 0.13
ConspelV 0.07
ConspelN 0.21
ConphonV 0.05
ConphonN 0.16
ConfriendsV 0.05
ConfriendsN 0.21
ConffV 0.06
ConffN 0.03
ConfbV 0.06
ConfbN 0.05
NounFrequency 0.38
VerbFrequency 0.24
FrequencyInitialDiphoneWord 0.09
FrequencyInitialDiphoneSyllable 0.07
CorrectLexdec 0.53
WrittenFrequency
RTlexdec -0.43
RTnaming -0.10
Familiarity 0.79
WrittenFrequency 1.00
WrittenSpokenFrequencyRatio 0.07
FamilySize 0.66
DerivationalEntropy 0.26
InflectionalEntropy -0.04
NumberSimplexSynsets 0.56
NumberComplexSynsets 0.59
LengthInLetters -0.07
Ncount 0.10
MeanBigramFrequency 0.08
FrequencyInitialDiphone 0.17
ConspelV 0.06
ConspelN 0.28
ConphonV 0.08
ConphonN 0.22
ConfriendsV 0.02
ConfriendsN 0.26
ConffV 0.08
ConffN 0.05
ConfbV 0.12
ConfbN 0.10
NounFrequency 0.47
VerbFrequency 0.28
FrequencyInitialDiphoneWord 0.11
FrequencyInitialDiphoneSyllable 0.09
CorrectLexdec 0.46
WrittenSpokenFrequencyRatio
RTlexdec 0.04
RTnaming 0.04
Familiarity -0.19
WrittenFrequency 0.07
WrittenSpokenFrequencyRatio 1.00
FamilySize -0.11
DerivationalEntropy -0.01
InflectionalEntropy -0.12
NumberSimplexSynsets -0.09
NumberComplexSynsets -0.10
LengthInLetters 0.20
Ncount -0.19
MeanBigramFrequency 0.19
FrequencyInitialDiphone 0.02
ConspelV -0.16
ConspelN -0.06
ConphonV -0.03
ConphonN -0.03
ConfriendsV -0.14
ConfriendsN -0.06
ConffV -0.03
ConffN 0.00
ConfbV 0.09
ConfbN 0.07
NounFrequency 0.01
VerbFrequency -0.10
FrequencyInitialDiphoneWord 0.00
FrequencyInitialDiphoneSyllable 0.01
CorrectLexdec 0.01
FamilySize
RTlexdec -0.35
RTnaming -0.09
Familiarity 0.59
WrittenFrequency 0.66
WrittenSpokenFrequencyRatio -0.11
FamilySize 1.00
DerivationalEntropy 0.69
InflectionalEntropy 0.10
NumberSimplexSynsets 0.59
NumberComplexSynsets 0.65
LengthInLetters -0.12
Ncount 0.17
MeanBigramFrequency 0.00
FrequencyInitialDiphone 0.13
ConspelV 0.11
ConspelN 0.25
ConphonV 0.05
ConphonN 0.16
ConfriendsV 0.08
ConfriendsN 0.24
ConffV 0.08
ConffN 0.06
ConfbV 0.05
ConfbN 0.04
NounFrequency 0.42
VerbFrequency 0.11
FrequencyInitialDiphoneWord 0.10
FrequencyInitialDiphoneSyllable 0.09
CorrectLexdec 0.36
DerivationalEntropy
RTlexdec -0.16
RTnaming -0.05
Familiarity 0.22
WrittenFrequency 0.26
WrittenSpokenFrequencyRatio -0.01
FamilySize 0.69
DerivationalEntropy 1.00
InflectionalEntropy -0.05
NumberSimplexSynsets 0.22
NumberComplexSynsets 0.33
LengthInLetters -0.10
Ncount 0.12
MeanBigramFrequency -0.02
FrequencyInitialDiphone 0.08
ConspelV 0.05
ConspelN 0.14
ConphonV 0.00
ConphonN 0.07
ConfriendsV 0.03
ConfriendsN 0.13
ConffV 0.04
ConffN 0.04
ConfbV 0.01
ConfbN 0.01
NounFrequency 0.17
VerbFrequency -0.02
FrequencyInitialDiphoneWord 0.03
FrequencyInitialDiphoneSyllable 0.03
CorrectLexdec 0.19
InflectionalEntropy
RTlexdec -0.09
RTnaming -0.02
Familiarity 0.11
WrittenFrequency -0.04
WrittenSpokenFrequencyRatio -0.12
FamilySize 0.10
DerivationalEntropy -0.05
InflectionalEntropy 1.00
NumberSimplexSynsets 0.40
NumberComplexSynsets 0.01
LengthInLetters 0.05
Ncount 0.00
MeanBigramFrequency 0.02
FrequencyInitialDiphone -0.03
ConspelV 0.14
ConspelN 0.05
ConphonV 0.08
ConphonN 0.03
ConfriendsV 0.13
ConfriendsN 0.07
ConffV 0.01
ConffN 0.01
ConfbV 0.00
ConfbN -0.01
NounFrequency -0.11
VerbFrequency 0.09
FrequencyInitialDiphoneWord 0.05
FrequencyInitialDiphoneSyllable 0.05
CorrectLexdec 0.18
NumberSimplexSynsets
RTlexdec -0.31
RTnaming -0.07
Familiarity 0.51
WrittenFrequency 0.56
WrittenSpokenFrequencyRatio -0.09
FamilySize 0.59
DerivationalEntropy 0.22
InflectionalEntropy 0.40
NumberSimplexSynsets 1.00
NumberComplexSynsets 0.52
LengthInLetters -0.01
Ncount 0.11
MeanBigramFrequency 0.05
FrequencyInitialDiphone 0.06
ConspelV 0.17
ConspelN 0.23
ConphonV 0.07
ConphonN 0.14
ConfriendsV 0.15
ConfriendsN 0.25
ConffV 0.04
ConffN 0.02
ConfbV 0.02
ConfbN 0.00
NounFrequency 0.24
VerbFrequency 0.19
FrequencyInitialDiphoneWord 0.13
FrequencyInitialDiphoneSyllable 0.12
CorrectLexdec 0.35
NumberComplexSynsets
RTlexdec -0.33
RTnaming -0.08
Familiarity 0.51
WrittenFrequency 0.59
WrittenSpokenFrequencyRatio -0.10
FamilySize 0.65
DerivationalEntropy 0.33
InflectionalEntropy 0.01
NumberSimplexSynsets 0.52
NumberComplexSynsets 1.00
LengthInLetters -0.12
Ncount 0.14
MeanBigramFrequency -0.02
FrequencyInitialDiphone 0.10
ConspelV 0.07
ConspelN 0.19
ConphonV 0.05
ConphonN 0.14
ConfriendsV 0.04
ConfriendsN 0.18
ConffV 0.08
ConffN 0.06
ConfbV 0.05
ConfbN 0.05
NounFrequency 0.35
VerbFrequency 0.09
FrequencyInitialDiphoneWord 0.06
FrequencyInitialDiphoneSyllable 0.05
CorrectLexdec 0.33
LengthInLetters
RTlexdec 0.05
RTnaming 0.09
Familiarity -0.08
WrittenFrequency -0.07
WrittenSpokenFrequencyRatio 0.20
FamilySize -0.12
DerivationalEntropy -0.10
InflectionalEntropy 0.05
NumberSimplexSynsets -0.01
NumberComplexSynsets -0.12
LengthInLetters 1.00
Ncount -0.63
MeanBigramFrequency 0.79
FrequencyInitialDiphone -0.06
ConspelV -0.23
ConspelN -0.17
ConphonV -0.20
ConphonN -0.21
ConfriendsV -0.19
ConfriendsN -0.16
ConffV -0.02
ConffN 0.01
ConfbV -0.04
ConfbN -0.07
NounFrequency -0.04
VerbFrequency -0.08
FrequencyInitialDiphoneWord 0.16
FrequencyInitialDiphoneSyllable 0.15
CorrectLexdec 0.05
Ncount
RTlexdec -0.07
RTnaming -0.09
Familiarity 0.10
WrittenFrequency 0.10
WrittenSpokenFrequencyRatio -0.19
FamilySize 0.17
DerivationalEntropy 0.12
InflectionalEntropy 0.00
NumberSimplexSynsets 0.11
NumberComplexSynsets 0.14
LengthInLetters -0.63
Ncount 1.00
MeanBigramFrequency -0.39
FrequencyInitialDiphone 0.14
ConspelV 0.47
ConspelN 0.35
ConphonV 0.21
ConphonN 0.19
ConfriendsV 0.44
ConfriendsN 0.34
ConffV 0.08
ConffN 0.07
ConfbV -0.04
ConfbN -0.04
NounFrequency 0.04
VerbFrequency 0.05
FrequencyInitialDiphoneWord 0.01
FrequencyInitialDiphoneSyllable 0.02
CorrectLexdec 0.02
MeanBigramFrequency
RTlexdec 0.00
RTnaming 0.05
Familiarity 0.03
WrittenFrequency 0.08
WrittenSpokenFrequencyRatio 0.19
FamilySize 0.00
DerivationalEntropy -0.02
InflectionalEntropy 0.02
NumberSimplexSynsets 0.05
NumberComplexSynsets -0.02
LengthInLetters 0.79
Ncount -0.39
MeanBigramFrequency 1.00
FrequencyInitialDiphone 0.32
ConspelV -0.09
ConspelN 0.06
ConphonV -0.12
ConphonN -0.06
ConfriendsV -0.08
ConfriendsN 0.05
ConffV 0.07
ConffN 0.11
ConfbV 0.00
ConfbN -0.02
NounFrequency 0.04
VerbFrequency -0.05
FrequencyInitialDiphoneWord 0.21
FrequencyInitialDiphoneSyllable 0.20
CorrectLexdec 0.06
FrequencyInitialDiphone
RTlexdec -0.07
RTnaming -0.06
Familiarity 0.13
WrittenFrequency 0.17
WrittenSpokenFrequencyRatio 0.02
FamilySize 0.13
DerivationalEntropy 0.08
InflectionalEntropy -0.03
NumberSimplexSynsets 0.06
NumberComplexSynsets 0.10
LengthInLetters -0.06
Ncount 0.14
MeanBigramFrequency 0.32
FrequencyInitialDiphone 1.00
ConspelV -0.06
ConspelN 0.03
ConphonV -0.01
ConphonN 0.01
ConfriendsV -0.05
ConfriendsN 0.04
ConffV 0.00
ConffN 0.00
ConfbV 0.03
ConfbN 0.02
NounFrequency 0.10
VerbFrequency 0.06
FrequencyInitialDiphoneWord 0.13
FrequencyInitialDiphoneSyllable 0.12
CorrectLexdec 0.05
ConspelV
RTlexdec -0.03
RTnaming -0.03
Familiarity 0.07
WrittenFrequency 0.06
WrittenSpokenFrequencyRatio -0.16
FamilySize 0.11
DerivationalEntropy 0.05
InflectionalEntropy 0.14
NumberSimplexSynsets 0.17
NumberComplexSynsets 0.07
LengthInLetters -0.23
Ncount 0.47
MeanBigramFrequency -0.09
FrequencyInitialDiphone -0.06
ConspelV 1.00
ConspelN 0.64
ConphonV 0.54
ConphonN 0.42
ConfriendsV 0.92
ConfriendsN 0.62
ConffV 0.24
ConffN 0.17
ConfbV 0.05
ConfbN 0.03
NounFrequency -0.02
VerbFrequency 0.06
FrequencyInitialDiphoneWord 0.12
FrequencyInitialDiphoneSyllable 0.12
CorrectLexdec 0.05
ConspelN
RTlexdec -0.11
RTnaming -0.03
Familiarity 0.21
WrittenFrequency 0.28
WrittenSpokenFrequencyRatio -0.06
FamilySize 0.25
DerivationalEntropy 0.14
InflectionalEntropy 0.05
NumberSimplexSynsets 0.23
NumberComplexSynsets 0.19
LengthInLetters -0.17
Ncount 0.35
MeanBigramFrequency 0.06
FrequencyInitialDiphone 0.03
ConspelV 0.64
ConspelN 1.00
ConphonV 0.38
ConphonN 0.65
ConfriendsV 0.56
ConfriendsN 0.88
ConffV 0.28
ConffN 0.34
ConfbV 0.14
ConfbN 0.15
NounFrequency 0.12
VerbFrequency 0.13
FrequencyInitialDiphoneWord 0.12
FrequencyInitialDiphoneSyllable 0.12
CorrectLexdec 0.10
ConphonV
RTlexdec -0.02
RTnaming 0.00
Familiarity 0.05
WrittenFrequency 0.08
WrittenSpokenFrequencyRatio -0.03
FamilySize 0.05
DerivationalEntropy 0.00
InflectionalEntropy 0.08
NumberSimplexSynsets 0.07
NumberComplexSynsets 0.05
LengthInLetters -0.20
Ncount 0.21
MeanBigramFrequency -0.12
FrequencyInitialDiphone -0.01
ConspelV 0.54
ConspelN 0.38
ConphonV 1.00
ConphonN 0.67
ConfriendsV 0.53
ConfriendsN 0.38
ConffV 0.04
ConffN 0.06
ConfbV 0.74
ConfbN 0.61
NounFrequency -0.01
VerbFrequency 0.06
FrequencyInitialDiphoneWord 0.03
FrequencyInitialDiphoneSyllable 0.03
CorrectLexdec 0.02
ConphonN
RTlexdec -0.08
RTnaming -0.01
Familiarity 0.16
WrittenFrequency 0.22
WrittenSpokenFrequencyRatio -0.03
FamilySize 0.16
DerivationalEntropy 0.07
InflectionalEntropy 0.03
NumberSimplexSynsets 0.14
NumberComplexSynsets 0.14
LengthInLetters -0.21
Ncount 0.19
MeanBigramFrequency -0.06
FrequencyInitialDiphone 0.01
ConspelV 0.42
ConspelN 0.65
ConphonV 0.67
ConphonN 1.00
ConfriendsV 0.39
ConfriendsN 0.65
ConffV 0.08
ConffN 0.10
ConfbV 0.57
ConfbN 0.67
NounFrequency 0.09
VerbFrequency 0.10
FrequencyInitialDiphoneWord 0.05
FrequencyInitialDiphoneSyllable 0.05
CorrectLexdec 0.07
ConfriendsV
RTlexdec -0.03
RTnaming -0.03
Familiarity 0.05
WrittenFrequency 0.02
WrittenSpokenFrequencyRatio -0.14
FamilySize 0.08
DerivationalEntropy 0.03
InflectionalEntropy 0.13
NumberSimplexSynsets 0.15
NumberComplexSynsets 0.04
LengthInLetters -0.19
Ncount 0.44
MeanBigramFrequency -0.08
FrequencyInitialDiphone -0.05
ConspelV 0.92
ConspelN 0.56
ConphonV 0.53
ConphonN 0.39
ConfriendsV 1.00
ConfriendsN 0.65
ConffV -0.12
ConffN -0.10
ConfbV 0.01
ConfbN -0.01
NounFrequency -0.02
VerbFrequency 0.00
FrequencyInitialDiphoneWord 0.13
FrequencyInitialDiphoneSyllable 0.14
CorrectLexdec 0.05
ConfriendsN
RTlexdec -0.12
RTnaming -0.04
Familiarity 0.21
WrittenFrequency 0.26
WrittenSpokenFrequencyRatio -0.06
FamilySize 0.24
DerivationalEntropy 0.13
InflectionalEntropy 0.07
NumberSimplexSynsets 0.25
NumberComplexSynsets 0.18
LengthInLetters -0.16
Ncount 0.34
MeanBigramFrequency 0.05
FrequencyInitialDiphone 0.04
ConspelV 0.62
ConspelN 0.88
ConphonV 0.38
ConphonN 0.65
ConfriendsV 0.65
ConfriendsN 1.00
ConffV 0.01
ConffN 0.02
ConfbV 0.08
ConfbN 0.09
NounFrequency 0.12
VerbFrequency 0.12
FrequencyInitialDiphoneWord 0.12
FrequencyInitialDiphoneSyllable 0.12
CorrectLexdec 0.12
ConffV
RTlexdec -0.02
RTnaming 0.01
Familiarity 0.06
WrittenFrequency 0.08
WrittenSpokenFrequencyRatio -0.03
FamilySize 0.08
DerivationalEntropy 0.04
InflectionalEntropy 0.01
NumberSimplexSynsets 0.04
NumberComplexSynsets 0.08
LengthInLetters -0.02
Ncount 0.08
MeanBigramFrequency 0.07
FrequencyInitialDiphone 0.00
ConspelV 0.24
ConspelN 0.28
ConphonV 0.04
ConphonN 0.08
ConfriendsV -0.12
ConfriendsN 0.01
ConffV 1.00
ConffN 0.82
ConfbV 0.07
ConfbN 0.07
NounFrequency 0.04
VerbFrequency 0.12
FrequencyInitialDiphoneWord 0.01
FrequencyInitialDiphoneSyllable -0.01
CorrectLexdec 0.01
ConffN
RTlexdec -0.01
RTnaming 0.01
Familiarity 0.03
WrittenFrequency 0.05
WrittenSpokenFrequencyRatio 0.00
FamilySize 0.06
DerivationalEntropy 0.04
InflectionalEntropy 0.01
NumberSimplexSynsets 0.02
NumberComplexSynsets 0.06
LengthInLetters 0.01
Ncount 0.07
MeanBigramFrequency 0.11
FrequencyInitialDiphone 0.00
ConspelV 0.17
ConspelN 0.34
ConphonV 0.06
ConphonN 0.10
ConfriendsV -0.10
ConfriendsN 0.02
ConffV 0.82
ConffN 1.00
ConfbV 0.11
ConfbN 0.09
NounFrequency 0.01
VerbFrequency 0.08
FrequencyInitialDiphoneWord 0.00
FrequencyInitialDiphoneSyllable -0.01
CorrectLexdec -0.01
ConfbV
RTlexdec -0.02
RTnaming 0.02
Familiarity 0.06
WrittenFrequency 0.12
WrittenSpokenFrequencyRatio 0.09
FamilySize 0.05
DerivationalEntropy 0.01
InflectionalEntropy 0.00
NumberSimplexSynsets 0.02
NumberComplexSynsets 0.05
LengthInLetters -0.04
Ncount -0.04
MeanBigramFrequency 0.00
FrequencyInitialDiphone 0.03
ConspelV 0.05
ConspelN 0.14
ConphonV 0.74
ConphonN 0.57
ConfriendsV 0.01
ConfriendsN 0.08
ConffV 0.07
ConffN 0.11
ConfbV 1.00
ConfbN 0.84
NounFrequency 0.02
VerbFrequency 0.05
FrequencyInitialDiphoneWord -0.02
FrequencyInitialDiphoneSyllable -0.03
CorrectLexdec 0.01
ConfbN
RTlexdec -0.02
RTnaming 0.02
Familiarity 0.05
WrittenFrequency 0.10
WrittenSpokenFrequencyRatio 0.07
FamilySize 0.04
DerivationalEntropy 0.01
InflectionalEntropy -0.01
NumberSimplexSynsets 0.00
NumberComplexSynsets 0.05
LengthInLetters -0.07
Ncount -0.04
MeanBigramFrequency -0.02
FrequencyInitialDiphone 0.02
ConspelV 0.03
ConspelN 0.15
ConphonV 0.61
ConphonN 0.67
ConfriendsV -0.01
ConfriendsN 0.09
ConffV 0.07
ConffN 0.09
ConfbV 0.84
ConfbN 1.00
NounFrequency 0.03
VerbFrequency 0.03
FrequencyInitialDiphoneWord -0.02
FrequencyInitialDiphoneSyllable -0.02
CorrectLexdec 0.00
NounFrequency
RTlexdec -0.17
RTnaming -0.04
Familiarity 0.38
WrittenFrequency 0.47
WrittenSpokenFrequencyRatio 0.01
FamilySize 0.42
DerivationalEntropy 0.17
InflectionalEntropy -0.11
NumberSimplexSynsets 0.24
NumberComplexSynsets 0.35
LengthInLetters -0.04
Ncount 0.04
MeanBigramFrequency 0.04
FrequencyInitialDiphone 0.10
ConspelV -0.02
ConspelN 0.12
ConphonV -0.01
ConphonN 0.09
ConfriendsV -0.02
ConfriendsN 0.12
ConffV 0.04
ConffN 0.01
ConfbV 0.02
ConfbN 0.03
NounFrequency 1.00
VerbFrequency 0.00
FrequencyInitialDiphoneWord 0.05
FrequencyInitialDiphoneSyllable 0.03
CorrectLexdec 0.13
VerbFrequency
RTlexdec -0.08
RTnaming -0.02
Familiarity 0.24
WrittenFrequency 0.28
WrittenSpokenFrequencyRatio -0.10
FamilySize 0.11
DerivationalEntropy -0.02
InflectionalEntropy 0.09
NumberSimplexSynsets 0.19
NumberComplexSynsets 0.09
LengthInLetters -0.08
Ncount 0.05
MeanBigramFrequency -0.05
FrequencyInitialDiphone 0.06
ConspelV 0.06
ConspelN 0.13
ConphonV 0.06
ConphonN 0.10
ConfriendsV 0.00
ConfriendsN 0.12
ConffV 0.12
ConffN 0.08
ConfbV 0.05
ConfbN 0.03
NounFrequency 0.00
VerbFrequency 1.00
FrequencyInitialDiphoneWord 0.07
FrequencyInitialDiphoneSyllable 0.06
CorrectLexdec 0.05
FrequencyInitialDiphoneWord
RTlexdec -0.04
RTnaming 0.02
Familiarity 0.09
WrittenFrequency 0.11
WrittenSpokenFrequencyRatio 0.00
FamilySize 0.10
DerivationalEntropy 0.03
InflectionalEntropy 0.05
NumberSimplexSynsets 0.13
NumberComplexSynsets 0.06
LengthInLetters 0.16
Ncount 0.01
MeanBigramFrequency 0.21
FrequencyInitialDiphone 0.13
ConspelV 0.12
ConspelN 0.12
ConphonV 0.03
ConphonN 0.05
ConfriendsV 0.13
ConfriendsN 0.12
ConffV 0.01
ConffN 0.00
ConfbV -0.02
ConfbN -0.02
NounFrequency 0.05
VerbFrequency 0.07
FrequencyInitialDiphoneWord 1.00
FrequencyInitialDiphoneSyllable 0.98
CorrectLexdec 0.06
FrequencyInitialDiphoneSyllable
RTlexdec -0.04
RTnaming 0.03
Familiarity 0.07
WrittenFrequency 0.09
WrittenSpokenFrequencyRatio 0.01
FamilySize 0.09
DerivationalEntropy 0.03
InflectionalEntropy 0.05
NumberSimplexSynsets 0.12
NumberComplexSynsets 0.05
LengthInLetters 0.15
Ncount 0.02
MeanBigramFrequency 0.20
FrequencyInitialDiphone 0.12
ConspelV 0.12
ConspelN 0.12
ConphonV 0.03
ConphonN 0.05
ConfriendsV 0.14
ConfriendsN 0.12
ConffV -0.01
ConffN -0.01
ConfbV -0.03
ConfbN -0.02
NounFrequency 0.03
VerbFrequency 0.06
FrequencyInitialDiphoneWord 0.98
FrequencyInitialDiphoneSyllable 1.00
CorrectLexdec 0.06
CorrectLexdec
RTlexdec -0.25
RTnaming 0.15
Familiarity 0.53
WrittenFrequency 0.46
WrittenSpokenFrequencyRatio 0.01
FamilySize 0.36
DerivationalEntropy 0.19
InflectionalEntropy 0.18
NumberSimplexSynsets 0.35
NumberComplexSynsets 0.33
LengthInLetters 0.05
Ncount 0.02
MeanBigramFrequency 0.06
FrequencyInitialDiphone 0.05
ConspelV 0.05
ConspelN 0.10
ConphonV 0.02
ConphonN 0.07
ConfriendsV 0.05
ConfriendsN 0.12
ConffV 0.01
ConffN -0.01
ConfbV 0.01
ConfbN 0.00
NounFrequency 0.13
VerbFrequency 0.05
FrequencyInitialDiphoneWord 0.06
FrequencyInitialDiphoneSyllable 0.06
CorrectLexdec 1.00
n= 4568
P
RTlexdec
RTlexdec
RTnaming 0.0000
Familiarity 0.0000
WrittenFrequency 0.0000
WrittenSpokenFrequencyRatio 0.0071
FamilySize 0.0000
DerivationalEntropy 0.0000
InflectionalEntropy 0.0000
NumberSimplexSynsets 0.0000
NumberComplexSynsets 0.0000
LengthInLetters 0.0008
Ncount 0.0000
MeanBigramFrequency 0.8588
FrequencyInitialDiphone 0.0000
ConspelV 0.0263
ConspelN 0.0000
ConphonV 0.1417
ConphonN 0.0000
ConfriendsV 0.0822
ConfriendsN 0.0000
ConffV 0.2650
ConffN 0.7012
ConfbV 0.1281
ConfbN 0.2103
NounFrequency 0.0000
VerbFrequency 0.0000
FrequencyInitialDiphoneWord 0.0039
FrequencyInitialDiphoneSyllable 0.0164
CorrectLexdec 0.0000
RTnaming
RTlexdec 0.0000
RTnaming
Familiarity 0.0000
WrittenFrequency 0.0000
WrittenSpokenFrequencyRatio 0.0134
FamilySize 0.0000
DerivationalEntropy 0.0008
InflectionalEntropy 0.1351
NumberSimplexSynsets 0.0000
NumberComplexSynsets 0.0000
LengthInLetters 0.0000
Ncount 0.0000
MeanBigramFrequency 0.0011
FrequencyInitialDiphone 0.0001
ConspelV 0.0890
ConspelN 0.0187
ConphonV 0.9367
ConphonN 0.3317
ConfriendsV 0.0608
ConfriendsN 0.0029
ConffV 0.5923
ConffN 0.4408
ConfbV 0.1954
ConfbN 0.2503
NounFrequency 0.0035
VerbFrequency 0.0965
FrequencyInitialDiphoneWord 0.1662
FrequencyInitialDiphoneSyllable 0.0691
CorrectLexdec 0.0000
Familiarity
RTlexdec 0.0000
RTnaming 0.0000
Familiarity
WrittenFrequency 0.0000
WrittenSpokenFrequencyRatio 0.0000
FamilySize 0.0000
DerivationalEntropy 0.0000
InflectionalEntropy 0.0000
NumberSimplexSynsets 0.0000
NumberComplexSynsets 0.0000
LengthInLetters 0.0000
Ncount 0.0000
MeanBigramFrequency 0.0453
FrequencyInitialDiphone 0.0000
ConspelV 0.0000
ConspelN 0.0000
ConphonV 0.0003
ConphonN 0.0000
ConfriendsV 0.0019
ConfriendsN 0.0000
ConffV 0.0001
ConffN 0.0465
ConfbV 0.0001
ConfbN 0.0004
NounFrequency 0.0000
VerbFrequency 0.0000
FrequencyInitialDiphoneWord 0.0000
FrequencyInitialDiphoneSyllable 0.0000
CorrectLexdec 0.0000
WrittenFrequency
RTlexdec 0.0000
RTnaming 0.0000
Familiarity 0.0000
WrittenFrequency
WrittenSpokenFrequencyRatio 0.0000
FamilySize 0.0000
DerivationalEntropy 0.0000
InflectionalEntropy 0.0062
NumberSimplexSynsets 0.0000
NumberComplexSynsets 0.0000
LengthInLetters 0.0000
Ncount 0.0000
MeanBigramFrequency 0.0000
FrequencyInitialDiphone 0.0000
ConspelV 0.0000
ConspelN 0.0000
ConphonV 0.0000
ConphonN 0.0000
ConfriendsV 0.1469
ConfriendsN 0.0000
ConffV 0.0000
ConffN 0.0007
ConfbV 0.0000
ConfbN 0.0000
NounFrequency 0.0000
VerbFrequency 0.0000
FrequencyInitialDiphoneWord 0.0000
FrequencyInitialDiphoneSyllable 0.0000
CorrectLexdec 0.0000
WrittenSpokenFrequencyRatio
RTlexdec 0.0071
RTnaming 0.0134
Familiarity 0.0000
WrittenFrequency 0.0000
WrittenSpokenFrequencyRatio
FamilySize 0.0000
DerivationalEntropy 0.4578
InflectionalEntropy 0.0000
NumberSimplexSynsets 0.0000
NumberComplexSynsets 0.0000
LengthInLetters 0.0000
Ncount 0.0000
MeanBigramFrequency 0.0000
FrequencyInitialDiphone 0.1599
ConspelV 0.0000
ConspelN 0.0001
ConphonV 0.0187
ConphonN 0.0806
ConfriendsV 0.0000
ConfriendsN 0.0002
ConffV 0.0708
ConffN 0.8181
ConfbV 0.0000
ConfbN 0.0000
NounFrequency 0.3990
VerbFrequency 0.0000
FrequencyInitialDiphoneWord 0.8591
FrequencyInitialDiphoneSyllable 0.3844
CorrectLexdec 0.5628
FamilySize
RTlexdec 0.0000
RTnaming 0.0000
Familiarity 0.0000
WrittenFrequency 0.0000
WrittenSpokenFrequencyRatio 0.0000
FamilySize
DerivationalEntropy 0.0000
InflectionalEntropy 0.0000
NumberSimplexSynsets 0.0000
NumberComplexSynsets 0.0000
LengthInLetters 0.0000
Ncount 0.0000
MeanBigramFrequency 0.9431
FrequencyInitialDiphone 0.0000
ConspelV 0.0000
ConspelN 0.0000
ConphonV 0.0007
ConphonN 0.0000
ConfriendsV 0.0000
ConfriendsN 0.0000
ConffV 0.0000
ConffN 0.0000
ConfbV 0.0007
ConfbN 0.0093
NounFrequency 0.0000
VerbFrequency 0.0000
FrequencyInitialDiphoneWord 0.0000
FrequencyInitialDiphoneSyllable 0.0000
CorrectLexdec 0.0000
DerivationalEntropy
RTlexdec 0.0000
RTnaming 0.0008
Familiarity 0.0000
WrittenFrequency 0.0000
WrittenSpokenFrequencyRatio 0.4578
FamilySize 0.0000
DerivationalEntropy
InflectionalEntropy 0.0006
NumberSimplexSynsets 0.0000
NumberComplexSynsets 0.0000
LengthInLetters 0.0000
Ncount 0.0000
MeanBigramFrequency 0.1591
FrequencyInitialDiphone 0.0000
ConspelV 0.0018
ConspelN 0.0000
ConphonV 0.8523
ConphonN 0.0000
ConfriendsV 0.0506
ConfriendsN 0.0000
ConffV 0.0055
ConffN 0.0064
ConfbV 0.6008
ConfbN 0.4386
NounFrequency 0.0000
VerbFrequency 0.1825
FrequencyInitialDiphoneWord 0.0463
FrequencyInitialDiphoneSyllable 0.0607
CorrectLexdec 0.0000
InflectionalEntropy
RTlexdec 0.0000
RTnaming 0.1351
Familiarity 0.0000
WrittenFrequency 0.0062
WrittenSpokenFrequencyRatio 0.0000
FamilySize 0.0000
DerivationalEntropy 0.0006
InflectionalEntropy
NumberSimplexSynsets 0.0000
NumberComplexSynsets 0.7057
LengthInLetters 0.0004
Ncount 0.8260
MeanBigramFrequency 0.0939
FrequencyInitialDiphone 0.0198
ConspelV 0.0000
ConspelN 0.0015
ConphonV 0.0000
ConphonN 0.0338
ConfriendsV 0.0000
ConfriendsN 0.0000
ConffV 0.3309
ConffN 0.4672
ConfbV 0.8466
ConfbN 0.6084
NounFrequency 0.0000
VerbFrequency 0.0000
FrequencyInitialDiphoneWord 0.0004
FrequencyInitialDiphoneSyllable 0.0006
CorrectLexdec 0.0000
NumberSimplexSynsets
RTlexdec 0.0000
RTnaming 0.0000
Familiarity 0.0000
WrittenFrequency 0.0000
WrittenSpokenFrequencyRatio 0.0000
FamilySize 0.0000
DerivationalEntropy 0.0000
InflectionalEntropy 0.0000
NumberSimplexSynsets
NumberComplexSynsets 0.0000
LengthInLetters 0.6672
Ncount 0.0000
MeanBigramFrequency 0.0003
FrequencyInitialDiphone 0.0001
ConspelV 0.0000
ConspelN 0.0000
ConphonV 0.0000
ConphonN 0.0000
ConfriendsV 0.0000
ConfriendsN 0.0000
ConffV 0.0142
ConffN 0.1237
ConfbV 0.1447
ConfbN 0.9889
NounFrequency 0.0000
VerbFrequency 0.0000
FrequencyInitialDiphoneWord 0.0000
FrequencyInitialDiphoneSyllable 0.0000
CorrectLexdec 0.0000
NumberComplexSynsets
RTlexdec 0.0000
RTnaming 0.0000
Familiarity 0.0000
WrittenFrequency 0.0000
WrittenSpokenFrequencyRatio 0.0000
FamilySize 0.0000
DerivationalEntropy 0.0000
InflectionalEntropy 0.7057
NumberSimplexSynsets 0.0000
NumberComplexSynsets
LengthInLetters 0.0000
Ncount 0.0000
MeanBigramFrequency 0.1107
FrequencyInitialDiphone 0.0000
ConspelV 0.0000
ConspelN 0.0000
ConphonV 0.0012
ConphonN 0.0000
ConfriendsV 0.0107
ConfriendsN 0.0000
ConffV 0.0000
ConffN 0.0001
ConfbV 0.0004
ConfbN 0.0009
NounFrequency 0.0000
VerbFrequency 0.0000
FrequencyInitialDiphoneWord 0.0000
FrequencyInitialDiphoneSyllable 0.0015
CorrectLexdec 0.0000
LengthInLetters
RTlexdec 0.0008
RTnaming 0.0000
Familiarity 0.0000
WrittenFrequency 0.0000
WrittenSpokenFrequencyRatio 0.0000
FamilySize 0.0000
DerivationalEntropy 0.0000
InflectionalEntropy 0.0004
NumberSimplexSynsets 0.6672
NumberComplexSynsets 0.0000
LengthInLetters
Ncount 0.0000
MeanBigramFrequency 0.0000
FrequencyInitialDiphone 0.0000
ConspelV 0.0000
ConspelN 0.0000
ConphonV 0.0000
ConphonN 0.0000
ConfriendsV 0.0000
ConfriendsN 0.0000
ConffV 0.1935
ConffN 0.4670
ConfbV 0.0068
ConfbN 0.0000
NounFrequency 0.0169
VerbFrequency 0.0000
FrequencyInitialDiphoneWord 0.0000
FrequencyInitialDiphoneSyllable 0.0000
CorrectLexdec 0.0017
Ncount
RTlexdec 0.0000
RTnaming 0.0000
Familiarity 0.0000
WrittenFrequency 0.0000
WrittenSpokenFrequencyRatio 0.0000
FamilySize 0.0000
DerivationalEntropy 0.0000
InflectionalEntropy 0.8260
NumberSimplexSynsets 0.0000
NumberComplexSynsets 0.0000
LengthInLetters 0.0000
Ncount
MeanBigramFrequency 0.0000
FrequencyInitialDiphone 0.0000
ConspelV 0.0000
ConspelN 0.0000
ConphonV 0.0000
ConphonN 0.0000
ConfriendsV 0.0000
ConfriendsN 0.0000
ConffV 0.0000
ConffN 0.0000
ConfbV 0.0149
ConfbN 0.0044
NounFrequency 0.0153
VerbFrequency 0.0003
FrequencyInitialDiphoneWord 0.5939
FrequencyInitialDiphoneSyllable 0.1422
CorrectLexdec 0.2782
MeanBigramFrequency
RTlexdec 0.8588
RTnaming 0.0011
Familiarity 0.0453
WrittenFrequency 0.0000
WrittenSpokenFrequencyRatio 0.0000
FamilySize 0.9431
DerivationalEntropy 0.1591
InflectionalEntropy 0.0939
NumberSimplexSynsets 0.0003
NumberComplexSynsets 0.1107
LengthInLetters 0.0000
Ncount 0.0000
MeanBigramFrequency
FrequencyInitialDiphone 0.0000
ConspelV 0.0000
ConspelN 0.0000
ConphonV 0.0000
ConphonN 0.0000
ConfriendsV 0.0000
ConfriendsN 0.0009
ConffV 0.0000
ConffN 0.0000
ConfbV 0.8417
ConfbN 0.1865
NounFrequency 0.0034
VerbFrequency 0.0019
FrequencyInitialDiphoneWord 0.0000
FrequencyInitialDiphoneSyllable 0.0000
CorrectLexdec 0.0000
FrequencyInitialDiphone
RTlexdec 0.0000
RTnaming 0.0001
Familiarity 0.0000
WrittenFrequency 0.0000
WrittenSpokenFrequencyRatio 0.1599
FamilySize 0.0000
DerivationalEntropy 0.0000
InflectionalEntropy 0.0198
NumberSimplexSynsets 0.0001
NumberComplexSynsets 0.0000
LengthInLetters 0.0000
Ncount 0.0000
MeanBigramFrequency 0.0000
FrequencyInitialDiphone
ConspelV 0.0002
ConspelN 0.0364
ConphonV 0.3361
ConphonN 0.4821
ConfriendsV 0.0008
ConfriendsN 0.0144
ConffV 0.9908
ConffN 0.8845
ConfbV 0.0239
ConfbN 0.1957
NounFrequency 0.0000
VerbFrequency 0.0002
FrequencyInitialDiphoneWord 0.0000
FrequencyInitialDiphoneSyllable 0.0000
CorrectLexdec 0.0010
ConspelV
RTlexdec 0.0263
RTnaming 0.0890
Familiarity 0.0000
WrittenFrequency 0.0000
WrittenSpokenFrequencyRatio 0.0000
FamilySize 0.0000
DerivationalEntropy 0.0018
InflectionalEntropy 0.0000
NumberSimplexSynsets 0.0000
NumberComplexSynsets 0.0000
LengthInLetters 0.0000
Ncount 0.0000
MeanBigramFrequency 0.0000
FrequencyInitialDiphone 0.0002
ConspelV
ConspelN 0.0000
ConphonV 0.0000
ConphonN 0.0000
ConfriendsV 0.0000
ConfriendsN 0.0000
ConffV 0.0000
ConffN 0.0000
ConfbV 0.0008
ConfbN 0.0656
NounFrequency 0.2518
VerbFrequency 0.0000
FrequencyInitialDiphoneWord 0.0000
FrequencyInitialDiphoneSyllable 0.0000
CorrectLexdec 0.0008
ConspelN
RTlexdec 0.0000
RTnaming 0.0187
Familiarity 0.0000
WrittenFrequency 0.0000
WrittenSpokenFrequencyRatio 0.0001
FamilySize 0.0000
DerivationalEntropy 0.0000
InflectionalEntropy 0.0015
NumberSimplexSynsets 0.0000
NumberComplexSynsets 0.0000
LengthInLetters 0.0000
Ncount 0.0000
MeanBigramFrequency 0.0000
FrequencyInitialDiphone 0.0364
ConspelV 0.0000
ConspelN
ConphonV 0.0000
ConphonN 0.0000
ConfriendsV 0.0000
ConfriendsN 0.0000
ConffV 0.0000
ConffN 0.0000
ConfbV 0.0000
ConfbN 0.0000
NounFrequency 0.0000
VerbFrequency 0.0000
FrequencyInitialDiphoneWord 0.0000
FrequencyInitialDiphoneSyllable 0.0000
CorrectLexdec 0.0000
ConphonV
RTlexdec 0.1417
RTnaming 0.9367
Familiarity 0.0003
WrittenFrequency 0.0000
WrittenSpokenFrequencyRatio 0.0187
FamilySize 0.0007
DerivationalEntropy 0.8523
InflectionalEntropy 0.0000
NumberSimplexSynsets 0.0000
NumberComplexSynsets 0.0012
LengthInLetters 0.0000
Ncount 0.0000
MeanBigramFrequency 0.0000
FrequencyInitialDiphone 0.3361
ConspelV 0.0000
ConspelN 0.0000
ConphonV
ConphonN 0.0000
ConfriendsV 0.0000
ConfriendsN 0.0000
ConffV 0.0074
ConffN 0.0000
ConfbV 0.0000
ConfbN 0.0000
NounFrequency 0.5535
VerbFrequency 0.0000
FrequencyInitialDiphoneWord 0.0432
FrequencyInitialDiphoneSyllable 0.0230
CorrectLexdec 0.1609
ConphonN
RTlexdec 0.0000
RTnaming 0.3317
Familiarity 0.0000
WrittenFrequency 0.0000
WrittenSpokenFrequencyRatio 0.0806
FamilySize 0.0000
DerivationalEntropy 0.0000
InflectionalEntropy 0.0338
NumberSimplexSynsets 0.0000
NumberComplexSynsets 0.0000
LengthInLetters 0.0000
Ncount 0.0000
MeanBigramFrequency 0.0000
FrequencyInitialDiphone 0.4821
ConspelV 0.0000
ConspelN 0.0000
ConphonV 0.0000
ConphonN
ConfriendsV 0.0000
ConfriendsN 0.0000
ConffV 0.0000
ConffN 0.0000
ConfbV 0.0000
ConfbN 0.0000
NounFrequency 0.0000
VerbFrequency 0.0000
FrequencyInitialDiphoneWord 0.0003
FrequencyInitialDiphoneSyllable 0.0002
CorrectLexdec 0.0000
ConfriendsV
RTlexdec 0.0822
RTnaming 0.0608
Familiarity 0.0019
WrittenFrequency 0.1469
WrittenSpokenFrequencyRatio 0.0000
FamilySize 0.0000
DerivationalEntropy 0.0506
InflectionalEntropy 0.0000
NumberSimplexSynsets 0.0000
NumberComplexSynsets 0.0107
LengthInLetters 0.0000
Ncount 0.0000
MeanBigramFrequency 0.0000
FrequencyInitialDiphone 0.0008
ConspelV 0.0000
ConspelN 0.0000
ConphonV 0.0000
ConphonN 0.0000
ConfriendsV
ConfriendsN 0.0000
ConffV 0.0000
ConffN 0.0000
ConfbV 0.4597
ConfbN 0.4688
NounFrequency 0.1293
VerbFrequency 0.7902
FrequencyInitialDiphoneWord 0.0000
FrequencyInitialDiphoneSyllable 0.0000
CorrectLexdec 0.0020
ConfriendsN
RTlexdec 0.0000
RTnaming 0.0029
Familiarity 0.0000
WrittenFrequency 0.0000
WrittenSpokenFrequencyRatio 0.0002
FamilySize 0.0000
DerivationalEntropy 0.0000
InflectionalEntropy 0.0000
NumberSimplexSynsets 0.0000
NumberComplexSynsets 0.0000
LengthInLetters 0.0000
Ncount 0.0000
MeanBigramFrequency 0.0009
FrequencyInitialDiphone 0.0144
ConspelV 0.0000
ConspelN 0.0000
ConphonV 0.0000
ConphonN 0.0000
ConfriendsV 0.0000
ConfriendsN
ConffV 0.6587
ConffN 0.1567
ConfbV 0.0000
ConfbN 0.0000
NounFrequency 0.0000
VerbFrequency 0.0000
FrequencyInitialDiphoneWord 0.0000
FrequencyInitialDiphoneSyllable 0.0000
CorrectLexdec 0.0000
ConffV
RTlexdec 0.2650
RTnaming 0.5923
Familiarity 0.0001
WrittenFrequency 0.0000
WrittenSpokenFrequencyRatio 0.0708
FamilySize 0.0000
DerivationalEntropy 0.0055
InflectionalEntropy 0.3309
NumberSimplexSynsets 0.0142
NumberComplexSynsets 0.0000
LengthInLetters 0.1935
Ncount 0.0000
MeanBigramFrequency 0.0000
FrequencyInitialDiphone 0.9908
ConspelV 0.0000
ConspelN 0.0000
ConphonV 0.0074
ConphonN 0.0000
ConfriendsV 0.0000
ConfriendsN 0.6587
ConffV
ConffN 0.0000
ConfbV 0.0000
ConfbN 0.0000
NounFrequency 0.0131
VerbFrequency 0.0000
FrequencyInitialDiphoneWord 0.6937
FrequencyInitialDiphoneSyllable 0.4569
CorrectLexdec 0.6223
ConffN
RTlexdec 0.7012
RTnaming 0.4408
Familiarity 0.0465
WrittenFrequency 0.0007
WrittenSpokenFrequencyRatio 0.8181
FamilySize 0.0000
DerivationalEntropy 0.0064
InflectionalEntropy 0.4672
NumberSimplexSynsets 0.1237
NumberComplexSynsets 0.0001
LengthInLetters 0.4670
Ncount 0.0000
MeanBigramFrequency 0.0000
FrequencyInitialDiphone 0.8845
ConspelV 0.0000
ConspelN 0.0000
ConphonV 0.0000
ConphonN 0.0000
ConfriendsV 0.0000
ConfriendsN 0.1567
ConffV 0.0000
ConffN
ConfbV 0.0000
ConfbN 0.0000
NounFrequency 0.4869
VerbFrequency 0.0000
FrequencyInitialDiphoneWord 0.8069
FrequencyInitialDiphoneSyllable 0.5551
CorrectLexdec 0.6263
ConfbV
RTlexdec 0.1281
RTnaming 0.1954
Familiarity 0.0001
WrittenFrequency 0.0000
WrittenSpokenFrequencyRatio 0.0000
FamilySize 0.0007
DerivationalEntropy 0.6008
InflectionalEntropy 0.8466
NumberSimplexSynsets 0.1447
NumberComplexSynsets 0.0004
LengthInLetters 0.0068
Ncount 0.0149
MeanBigramFrequency 0.8417
FrequencyInitialDiphone 0.0239
ConspelV 0.0008
ConspelN 0.0000
ConphonV 0.0000
ConphonN 0.0000
ConfriendsV 0.4597
ConfriendsN 0.0000
ConffV 0.0000
ConffN 0.0000
ConfbV
ConfbN 0.0000
NounFrequency 0.1428
VerbFrequency 0.0006
FrequencyInitialDiphoneWord 0.1784
FrequencyInitialDiphoneSyllable 0.0678
CorrectLexdec 0.7155
ConfbN
RTlexdec 0.2103
RTnaming 0.2503
Familiarity 0.0004
WrittenFrequency 0.0000
WrittenSpokenFrequencyRatio 0.0000
FamilySize 0.0093
DerivationalEntropy 0.4386
InflectionalEntropy 0.6084
NumberSimplexSynsets 0.9889
NumberComplexSynsets 0.0009
LengthInLetters 0.0000
Ncount 0.0044
MeanBigramFrequency 0.1865
FrequencyInitialDiphone 0.1957
ConspelV 0.0656
ConspelN 0.0000
ConphonV 0.0000
ConphonN 0.0000
ConfriendsV 0.4688
ConfriendsN 0.0000
ConffV 0.0000
ConffN 0.0000
ConfbV 0.0000
ConfbN
NounFrequency 0.0882
VerbFrequency 0.0259
FrequencyInitialDiphoneWord 0.2019
FrequencyInitialDiphoneSyllable 0.0990
CorrectLexdec 0.7901
NounFrequency
RTlexdec 0.0000
RTnaming 0.0035
Familiarity 0.0000
WrittenFrequency 0.0000
WrittenSpokenFrequencyRatio 0.3990
FamilySize 0.0000
DerivationalEntropy 0.0000
InflectionalEntropy 0.0000
NumberSimplexSynsets 0.0000
NumberComplexSynsets 0.0000
LengthInLetters 0.0169
Ncount 0.0153
MeanBigramFrequency 0.0034
FrequencyInitialDiphone 0.0000
ConspelV 0.2518
ConspelN 0.0000
ConphonV 0.5535
ConphonN 0.0000
ConfriendsV 0.1293
ConfriendsN 0.0000
ConffV 0.0131
ConffN 0.4869
ConfbV 0.1428
ConfbN 0.0882
NounFrequency
VerbFrequency 0.8332
FrequencyInitialDiphoneWord 0.0013
FrequencyInitialDiphoneSyllable 0.0204
CorrectLexdec 0.0000
VerbFrequency
RTlexdec 0.0000
RTnaming 0.0965
Familiarity 0.0000
WrittenFrequency 0.0000
WrittenSpokenFrequencyRatio 0.0000
FamilySize 0.0000
DerivationalEntropy 0.1825
InflectionalEntropy 0.0000
NumberSimplexSynsets 0.0000
NumberComplexSynsets 0.0000
LengthInLetters 0.0000
Ncount 0.0003
MeanBigramFrequency 0.0019
FrequencyInitialDiphone 0.0002
ConspelV 0.0000
ConspelN 0.0000
ConphonV 0.0000
ConphonN 0.0000
ConfriendsV 0.7902
ConfriendsN 0.0000
ConffV 0.0000
ConffN 0.0000
ConfbV 0.0006
ConfbN 0.0259
NounFrequency 0.8332
VerbFrequency
FrequencyInitialDiphoneWord 0.0000
FrequencyInitialDiphoneSyllable 0.0002
CorrectLexdec 0.0007
FrequencyInitialDiphoneWord
RTlexdec 0.0039
RTnaming 0.1662
Familiarity 0.0000
WrittenFrequency 0.0000
WrittenSpokenFrequencyRatio 0.8591
FamilySize 0.0000
DerivationalEntropy 0.0463
InflectionalEntropy 0.0004
NumberSimplexSynsets 0.0000
NumberComplexSynsets 0.0000
LengthInLetters 0.0000
Ncount 0.5939
MeanBigramFrequency 0.0000
FrequencyInitialDiphone 0.0000
ConspelV 0.0000
ConspelN 0.0000
ConphonV 0.0432
ConphonN 0.0003
ConfriendsV 0.0000
ConfriendsN 0.0000
ConffV 0.6937
ConffN 0.8069
ConfbV 0.1784
ConfbN 0.2019
NounFrequency 0.0013
VerbFrequency 0.0000
FrequencyInitialDiphoneWord
FrequencyInitialDiphoneSyllable 0.0000
CorrectLexdec 0.0000
FrequencyInitialDiphoneSyllable
RTlexdec 0.0164
RTnaming 0.0691
Familiarity 0.0000
WrittenFrequency 0.0000
WrittenSpokenFrequencyRatio 0.3844
FamilySize 0.0000
DerivationalEntropy 0.0607
InflectionalEntropy 0.0006
NumberSimplexSynsets 0.0000
NumberComplexSynsets 0.0015
LengthInLetters 0.0000
Ncount 0.1422
MeanBigramFrequency 0.0000
FrequencyInitialDiphone 0.0000
ConspelV 0.0000
ConspelN 0.0000
ConphonV 0.0230
ConphonN 0.0002
ConfriendsV 0.0000
ConfriendsN 0.0000
ConffV 0.4569
ConffN 0.5551
ConfbV 0.0678
ConfbN 0.0990
NounFrequency 0.0204
VerbFrequency 0.0002
FrequencyInitialDiphoneWord 0.0000
FrequencyInitialDiphoneSyllable
CorrectLexdec 0.0001
CorrectLexdec
RTlexdec 0.0000
RTnaming 0.0000
Familiarity 0.0000
WrittenFrequency 0.0000
WrittenSpokenFrequencyRatio 0.5628
FamilySize 0.0000
DerivationalEntropy 0.0000
InflectionalEntropy 0.0000
NumberSimplexSynsets 0.0000
NumberComplexSynsets 0.0000
LengthInLetters 0.0017
Ncount 0.2782
MeanBigramFrequency 0.0000
FrequencyInitialDiphone 0.0010
ConspelV 0.0008
ConspelN 0.0000
ConphonV 0.1609
ConphonN 0.0000
ConfriendsV 0.0020
ConfriendsN 0.0000
ConffV 0.6223
ConffN 0.6263
ConfbV 0.7155
ConfbN 0.7901
NounFrequency 0.0000
VerbFrequency 0.0007
FrequencyInitialDiphoneWord 0.0000
FrequencyInitialDiphoneSyllable 0.0001
CorrectLexdec
# use corrplot to obtain a nice correlation plot!
corrplot(corr$r, p.mat = corr$P,
addCoef.col = "black", diag = FALSE, type = "upper", tl.srt = 55)
english %>%
  group_by(AgeSubject) %>%
  summarise(mean = mean(RTlexdec),
            sd = sd(RTlexdec))
Up to now, we have looked at descriptive statistics: summaries of the data and pairwise correlations (with their p values).
We are now interested in looking at group differences.
The basic idea of a linear model is to fit a regression to the data. We have an outcome (or dependent variable) and a predictor (or independent variable). The formula of a linear model is written outcome ~ predictor, which can be read as “outcome as a function of the predictor”. We can add “1” to specify an intercept explicitly, but the intercept is included in the model by default.
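As a minimal sketch of the formula notation, the following fits the same model with and without the explicit “1”. The data frame `toy` and its columns are hypothetical and simulated for illustration; they are not part of the `english` dataset.

```r
# Hypothetical toy data: an outcome and a two-level predictor
set.seed(1)
toy <- data.frame(
  group   = factor(rep(c("a", "b"), each = 20)),
  outcome = c(rnorm(20, mean = 5), rnorm(20, mean = 6))
)

# Intercept added implicitly
m_implicit <- lm(outcome ~ group, data = toy)

# Intercept written out with "1"
m_explicit <- lm(outcome ~ 1 + group, data = toy)

# Both calls estimate the same two coefficients
all.equal(coef(m_implicit), coef(m_explicit))
```

Writing `0 +` (or `- 1`) in the formula instead would remove the intercept.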
english2 <- english %>%
  mutate(AgeSubject = factor(AgeSubject, levels = c("young", "old")))
mdl.lm <- english2 %>%
  lm(RTlexdec ~ AgeSubject, data = .)
#lm(RTlexdec ~ AgeSubject, data = english)
mdl.lm #also print(mdl.lm)
Call:
lm(formula = RTlexdec ~ AgeSubject, data = .)
Coefficients:
(Intercept) AgeSubjectold
6.4392 0.2217
summary(mdl.lm)
Call:
lm(formula = RTlexdec ~ AgeSubject, data = .)
Residuals:
     Min       1Q   Median       3Q      Max
-0.25776 -0.08339 -0.01669  0.06921  0.52685
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)   6.439237   0.002324 2771.03   <2e-16 ***
AgeSubjectold 0.221721   0.003286   67.47   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.1111 on 4566 degrees of freedom
Multiple R-squared: 0.4992, Adjusted R-squared: 0.4991
F-statistic: 4552 on 1 and 4566 DF, p-value: < 2.2e-16
# from library(broom)
tidy(mdl.lm) %>%
select(term, estimate) %>%
mutate(estimate = round(estimate, 3))
mycoefE <- tidy(mdl.lm) %>% pull(estimate)
Obtaining mean values from our model
#young (reference level: the intercept)
mycoefE[1]
[1] 6.439237
#old (intercept + coefficient for AgeSubjectold)
mycoefE[1] + mycoefE[2]
[1] 6.660958
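To see why these sums give group means, here is a small sketch on simulated data (the data frame `toy` and its values are hypothetical). With treatment contrasts, the intercept is the mean of the reference level, and the intercept plus the coefficient is the mean of the other level; the code checks this against means computed directly from the data.

```r
# Hypothetical toy data with "young" as the reference level
set.seed(2)
toy <- data.frame(
  age = factor(rep(c("young", "old"), each = 50), levels = c("young", "old")),
  rt  = c(rnorm(50, mean = 6.4, sd = 0.1), rnorm(50, mean = 6.6, sd = 0.1))
)
m <- lm(rt ~ age, data = toy)
b <- coef(m)

# Means implied by the model coefficients
model_means <- c(young = unname(b[1]), old = unname(b[1] + b[2]))

# Raw group means computed directly from the data
raw_means <- tapply(toy$rt, toy$age, mean)

model_means
raw_means
```

The two vectors should agree up to floating-point error, because a model with a single categorical predictor reproduces the group means.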
We can also obtain a nice table of our model summary, using the package knitr or xtable.
kable(summary(mdl.lm)$coef, digits = 3)
 | Estimate | Std. Error | t value | Pr(>|t|) |
---|---|---|---|---|
(Intercept) | 6.439 | 0.002 | 2771.027 | 0 |
AgeSubjectold | 0.222 | 0.003 | 67.468 | 0 |
The same table from the tidy output:
mdl.lmT <- tidy(mdl.lm)
kable(mdl.lmT, digits = 3)
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | 6.439 | 0.002 | 2771.027 | 0 |
AgeSubjectold | 0.222 | 0.003 | 67.468 | 0 |
Let us dissect the model. Calling “str” on it shows everything that is stored in our linear model object; we can then access individual pieces of information from the model.
str(mdl.lm)
List of 13
$ coefficients : Named num [1:2] 6.439 0.222
..- attr(*, "names")= chr [1:2] "(Intercept)" "AgeSubjectold"
$ residuals : Named num [1:4568] 0.1045 -0.0416 -0.1343 -0.015 0.0114 ...
..- attr(*, "names")= chr [1:4568] "1" "2" "3" "4" ...
$ effects : Named num [1:4568] -442.7013 7.4927 -0.1352 -0.0159 0.0105 ...
..- attr(*, "names")= chr [1:4568] "(Intercept)" "AgeSubjectold" "" "" ...
$ rank : int 2
$ fitted.values: Named num [1:4568] 6.44 6.44 6.44 6.44 6.44 ...
..- attr(*, "names")= chr [1:4568] "1" "2" "3" "4" ...
$ assign : int [1:2] 0 1
$ qr :List of 5
..$ qr : num [1:4568, 1:2] -67.587 0.0148 0.0148 0.0148 0.0148 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:4568] "1" "2" "3" "4" ...
.. .. ..$ : chr [1:2] "(Intercept)" "AgeSubjectold"
.. ..- attr(*, "assign")= int [1:2] 0 1
.. ..- attr(*, "contrasts")=List of 1
.. .. ..$ AgeSubject: chr "contr.treatment"
..$ qraux: num [1:2] 1.01 1.01
..$ pivot: int [1:2] 1 2
..$ tol : num 1e-07
..$ rank : int 2
..- attr(*, "class")= chr "qr"
$ df.residual : int 4566
$ contrasts :List of 1
..$ AgeSubject: chr "contr.treatment"
$ xlevels :List of 1
..$ AgeSubject: chr [1:2] "young" "old"
$ call : language lm(formula = RTlexdec ~ AgeSubject, data = .)
$ terms :Classes 'terms', 'formula' language RTlexdec ~ AgeSubject
.. ..- attr(*, "variables")= language list(RTlexdec, AgeSubject)
.. ..- attr(*, "factors")= int [1:2, 1] 0 1
.. .. ..- attr(*, "dimnames")=List of 2
.. .. .. ..$ : chr [1:2] "RTlexdec" "AgeSubject"
.. .. .. ..$ : chr "AgeSubject"
.. ..- attr(*, "term.labels")= chr "AgeSubject"
.. ..- attr(*, "order")= int 1
.. ..- attr(*, "intercept")= int 1
.. ..- attr(*, "response")= int 1
.. ..- attr(*, ".Environment")=<environment: 0x000001e11015d748>
.. ..- attr(*, "predvars")= language list(RTlexdec, AgeSubject)
.. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "factor"
.. .. ..- attr(*, "names")= chr [1:2] "RTlexdec" "AgeSubject"
$ model :'data.frame': 4568 obs. of 2 variables:
..$ RTlexdec : num [1:4568] 6.54 6.4 6.3 6.42 6.45 ...
..$ AgeSubject: Factor w/ 2 levels "young","old": 1 1 1 1 1 1 1 1 1 1 ...
..- attr(*, "terms")=Classes 'terms', 'formula' language RTlexdec ~ AgeSubject
.. .. ..- attr(*, "variables")= language list(RTlexdec, AgeSubject)
.. .. ..- attr(*, "factors")= int [1:2, 1] 0 1
.. .. .. ..- attr(*, "dimnames")=List of 2
.. .. .. .. ..$ : chr [1:2] "RTlexdec" "AgeSubject"
.. .. .. .. ..$ : chr "AgeSubject"
.. .. ..- attr(*, "term.labels")= chr "AgeSubject"
.. .. ..- attr(*, "order")= int 1
.. .. ..- attr(*, "intercept")= int 1
.. .. ..- attr(*, "response")= int 1
.. .. ..- attr(*, ".Environment")=<environment: 0x000001e11015d748>
.. .. ..- attr(*, "predvars")= language list(RTlexdec, AgeSubject)
.. .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "factor"
.. .. .. ..- attr(*, "names")= chr [1:2] "RTlexdec" "AgeSubject"
- attr(*, "class")= chr "lm"
coef(mdl.lm)
(Intercept) AgeSubjectold
6.4392366 0.2217215
## same as
## mdl.lm$coefficients
What if I want to obtain the “Intercept”? Or the coefficient for AgeSubjectold? What if I want the full row for AgeSubjectold?
coef(mdl.lm)[1] # same as mdl.lm$coefficients[1]
(Intercept)
6.439237
coef(mdl.lm)[2] # same as mdl.lm$coefficients[2]
AgeSubjectold
0.2217215
summary(mdl.lm)$coefficients[2, ] # full row
Estimate Std. Error
0.22172146 0.00328631
t value Pr(>|t|)
67.46820211 0.00000000
summary(mdl.lm)$coefficients[2, 4] #for p value
[1] 0
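The p value above prints as 0 only because it is too small to be represented as a double, not because it is exactly zero. A minimal sketch, assuming we only need its order of magnitude, is to recompute the tail probability on the log scale. The t value and residual degrees of freedom below are copied by hand from the summary(mdl.lm) output.

```r
# Values copied from the summary(mdl.lm) output
t_value <- 67.46820211
df_res  <- 4566

# Two-sided p value computed directly: too small for a double
p_direct <- 2 * pt(abs(t_value), df = df_res, lower.tail = FALSE)

# Same quantity on the natural-log scale, which avoids the underflow
log_p <- log(2) + pt(abs(t_value), df = df_res, lower.tail = FALSE, log.p = TRUE)

# Base-10 exponent for reporting, i.e. p is roughly 10^log10_p
log10_p <- log_p / log(10)

p_direct
log10_p
```

In a write-up it is usually enough to report such a value as p < .001 or p < 2.2e-16.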
What about residuals (the difference between each observed value and the value fitted by the model) and fitted values? Plotting them lets us check how closely the residuals follow a normal distribution and whether their spread changes across the fitted values.
hist(residuals(mdl.lm))
qqnorm(residuals(mdl.lm)); qqline(residuals(mdl.lm))
plot(fitted(mdl.lm), residuals(mdl.lm), cex = 4)
AIC(mdl.lm) # Akaike's Information Criterion, lower values are better
[1] -7110.962
BIC(mdl.lm) # Bayesian Information Criterion, lower values are better
[1] -7091.682
logLik(mdl.lm) # log likelihood
'log Lik.' 3558.481 (df=3)
Or use the following from broom
glance(mdl.lm)
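To see where these numbers come from, here is a minimal sketch that recomputes AIC and BIC by hand from the log likelihood. The three input values are copied from the output above rather than recomputed from the data, and the variable names are only illustrative.

```r
# Values copied from the output above (not recomputed from the data)
log_lik <- 3558.481   # logLik(mdl.lm)
k       <- 3          # estimated parameters: intercept, slope, residual variance
n       <- 4568       # number of observations

aic <- -2 * log_lik + 2 * k        # Akaike's Information Criterion
bic <- -2 * log_lik + log(n) * k   # Bayesian Information Criterion

round(c(AIC = aic, BIC = bic), 3)
```

Both criteria start from the same -2 × log likelihood and differ only in the penalty for the number of parameters: 2 per parameter for AIC, log(n) per parameter for BIC.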
Are the above informative? Not directly. To test the overall significance of the model, we fit a null model (aka intercept only) and compare the two models.
mdl.lm.Null <- english %>%
  lm(RTlexdec ~ 1, data = .)
mdl.comp <- anova(mdl.lm.Null, mdl.lm)
mdl.comp
Analysis of Variance Table
Model 1: RTlexdec ~ 1
Model 2: RTlexdec ~ AgeSubject
Res.Df RSS Df Sum of Sq
1 4567 112.456
2 4566 56.314 1 56.141
F Pr(>F)
1
2 4552 < 2.2e-16 ***
---
Signif. codes:
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’
0.05 ‘.’ 0.1 ‘ ’ 1
The results show that adding the variable “AgeSubject” improves the model fit. We can write this as follows: Model comparison showed that the addition of AgeSubject improved the model fit when compared with an intercept-only model (F(1, 4566) = 4552, p < 2.2e-16).
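The F statistic in the table can be reproduced by hand from the two residual sums of squares. The sketch below uses the values printed in the ANOVA table above (so it is subject to their rounding); the variable names are illustrative.

```r
# Values copied from the ANOVA table above
rss_null <- 112.456   # residual sum of squares, intercept-only model
rss_full <- 56.314    # residual sum of squares, model with AgeSubject
df_null  <- 4567      # residual degrees of freedom, null model
df_full  <- 4566      # residual degrees of freedom, full model

df_num <- df_null - df_full
# Reduction in RSS per added parameter, relative to the remaining residual variance
f_stat <- ((rss_null - rss_full) / df_num) / (rss_full / df_full)
p_val  <- pf(f_stat, df_num, df_full, lower.tail = FALSE)
# Proportion of variance explained by adding AgeSubject
r_sq   <- 1 - rss_full / rss_null

c(F = f_stat, p = p_val, R2 = r_sq)
```

The same two sums of squares also give the Multiple R-squared reported by summary(): about half of the variance in RTlexdec is accounted for by AgeSubject.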
Let’s plot our fitted values but only for the trend line
english %>%
  ggplot(aes(x = AgeSubject, y = RTlexdec))+
geom_boxplot()+
theme_bw() + theme(text = element_text(size = 15))+
geom_smooth(aes(x = as.numeric(AgeSubject), y = predict(mdl.lm)), method = "lm", color = "blue") +
labs(x = "Age", y = "RTLexDec", title = "Boxplot and predicted trend line", subtitle = "with ggplot2")
`geom_smooth()` using formula 'y ~ x'
This overlays the linear trend predicted by our model on boxplots of the original data. Because the model has a single two-level factor, the trend line simply connects the two group means.
We can also plot the predicted means and linear trend
english %>%
  ggplot(aes(x = AgeSubject, y = predict(mdl.lm)))+
geom_boxplot(color = "blue") +
theme_bw() + theme(text = element_text(size = 15)) +
geom_smooth(aes(x = as.numeric(AgeSubject), y = predict(mdl.lm)), method = "lm", color = "blue") +
labs(x = "Age", y = "RTLexDec", title = "Predicted means and trend line", subtitle = "with ggplot2")
`geom_smooth()` using formula 'y ~ x'
We can also plot the actual data, the predicted means and linear trend
english %>%
  ggplot(aes(x = AgeSubject, y = RTlexdec))+
geom_boxplot() +
geom_boxplot(aes(x = AgeSubject, y = predict(mdl.lm)), color = "blue") +
theme_bw() + theme(text = element_text(size = 15)) +
geom_smooth(aes(x = as.numeric(AgeSubject), y = predict(mdl.lm)), method = "lm", color = "blue") +
labs(x = "Age", y = "RTLexDec", title = "Boxplot raw data, predicted means (in blue) and trend line", subtitle = "with ggplot2")
`geom_smooth()` using formula 'y ~ x'
We can use the p values generated from our linear model to add significance levels to a plot. We reuse the code from above, add the significance level, and keep the trend line.
english %>%
  ggplot(aes(x = AgeSubject, y = RTlexdec))+
geom_boxplot() +
geom_boxplot(aes(x = AgeSubject, y = predict(mdl.lm)), color = "blue") +
theme_bw() + theme(text = element_text(size = 15)) +
geom_smooth(aes(x = as.numeric(AgeSubject), y = predict(mdl.lm)), method = "lm", color = "blue") +
labs(x = "Age", y = "RTLexDec", title = "Boxplot raw data, predicted means (in blue) and trend line", subtitle = "with significance testing") +
geom_signif(comparisons = list(c("old", "young")),
map_signif_level = TRUE, test = function(a, b) {
list(p.value = summary(mdl.lm)$coefficients[2, 4])})
`geom_smooth()` using formula 'y ~ x'
When we have three or more levels in our predictor, we can use pairwise comparisons, with corrections for multiple comparisons, to evaluate the differences between each pair of levels.
summary(mdl.lm)
Call:
lm(formula = RTlexdec ~ AgeSubject, data = .)
Residuals:
Min 1Q Median
-0.25776 -0.08339 -0.01669
3Q Max
0.06921 0.52685
Coefficients:
Estimate
(Intercept) 6.439237
AgeSubjectold 0.221721
Std. Error t value
(Intercept) 0.002324 2771.03
AgeSubjectold 0.003286 67.47
Pr(>|t|)
(Intercept) <2e-16 ***
AgeSubjectold <2e-16 ***
---
Signif. codes:
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’
0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.1111 on 4566 degrees of freedom
Multiple R-squared: 0.4992, Adjusted R-squared: 0.4991
F-statistic: 4552 on 1 and 4566 DF, p-value: < 2.2e-16
mdl.lm %>% emmeans(pairwise ~ AgeSubject, adjust = "fdr") -> mdl.emmeans
mdl.emmeans
$emmeans
AgeSubject emmean SE df
young 6.439 0.002324 4566
old 6.661 0.002324 4566
lower.CL upper.CL
6.435 6.444
6.656 6.666
Confidence level used: 0.95
$contrasts
contrast estimate SE
young - old -0.222 0.00329
df t.ratio p.value
4566 -67.468 <.0001
How to interpret the output? Discuss with your neighbour and share with the group.
Hint… Look at the emmeans values for each level of our factor “AgeSubject” and the contrasts.
Linear models require a numeric outcome, but the predictors can be either numeric or factors. We can have more than one predictor, although this complicates the interpretation of the results.
english %>%
  lm(RTlexdec ~ AgeSubject * WordCategory, data = .) %>%
summary()
Call:
lm(formula = RTlexdec ~ AgeSubject * WordCategory, data = .)
Residuals:
Min 1Q Median
-0.25079 -0.08273 -0.01516
3Q Max
0.06940 0.52285
Coefficients:
Estimate
(Intercept) 6.664955
AgeSubjectyoung -0.220395
WordCategoryV -0.010972
AgeSubjectyoung:WordCategoryV -0.003642
Std. Error
(Intercept) 0.002911
AgeSubjectyoung 0.004116
WordCategoryV 0.004822
AgeSubjectyoung:WordCategoryV 0.006820
t value
(Intercept) 2289.950
AgeSubjectyoung -53.545
WordCategoryV -2.275
AgeSubjectyoung:WordCategoryV -0.534
Pr(>|t|)
(Intercept) <2e-16
AgeSubjectyoung <2e-16
WordCategoryV 0.0229
AgeSubjectyoung:WordCategoryV 0.5934
(Intercept) ***
AgeSubjectyoung ***
WordCategoryV *
AgeSubjectyoung:WordCategoryV
---
Signif. codes:
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’
0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.1109 on 4564 degrees of freedom
Multiple R-squared: 0.5008, Adjusted R-squared: 0.5005
F-statistic: 1526 on 3 and 4564 DF, p-value: < 2.2e-16
And with an Anova
english %>%
  lm(RTlexdec ~ AgeSubject * WordCategory, data = .) %>%
anova()
Analysis of Variance Table
Response: RTlexdec
Df
AgeSubject 1
WordCategory 1
AgeSubject:WordCategory 1
Residuals 4564
Sum Sq
AgeSubject 56.141
WordCategory 0.173
AgeSubject:WordCategory 0.004
Residuals 56.138
Mean Sq
AgeSubject 56.141
WordCategory 0.173
AgeSubject:WordCategory 0.004
Residuals 0.012
F value
AgeSubject 4564.2810
WordCategory 14.0756
AgeSubject:WordCategory 0.2851
Residuals
Pr(>F)
AgeSubject < 2.2e-16
WordCategory 0.0001778
AgeSubject:WordCategory 0.5933724
Residuals
AgeSubject ***
WordCategory ***
AgeSubject:WordCategory
Residuals
---
Signif. codes:
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’
0.05 ‘.’ 0.1 ‘ ’ 1
The results above tell us that both main effects, AgeSubject and WordCategory, are statistically significant, whereas their interaction is not (p = 0.59).
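One way to make sense of the interaction model is to rebuild the four predicted cell means from its coefficients. The sketch below copies the estimates from the summary output above and assumes, based on the coefficient names, that the reference levels are “old” for AgeSubject and “N” for WordCategory; the object names are illustrative.

```r
# Coefficients copied from the summary output above
b <- c(intercept = 6.664955, young = -0.220395, V = -0.010972, young_V = -0.003642)

# Each cell mean is the intercept plus the terms that apply to that cell
cell_means <- c(
  old_N   = b[["intercept"]],
  young_N = b[["intercept"]] + b[["young"]],
  old_V   = b[["intercept"]] + b[["V"]],
  young_V = b[["intercept"]] + b[["young"]] + b[["V"]] + b[["young_V"]]
)
round(cell_means, 3)

# Effect of WordCategory (V vs N) within each age group
c(old = b[["V"]], young = b[["V"]] + b[["young_V"]])
```

The interaction term is the difference between the two WordCategory effects. Here it is tiny compared with the age effect, which matches its non-significant p value.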
Here we will look at an example when the outcome is binary. This simulated data is structured as follows. We asked one participant to listen to 165 sentences, and to judge whether these are “grammatical” or “ungrammatical”. There were 105 sentences that were “grammatical” and 60 “ungrammatical”. This fictitious example can apply in any other situation. Let’s think Geography: 165 lands: 105 “flat” and 60 “non-flat”, etc. This applies to any case where you need to “categorise” the outcome into two groups.
Let’s load in the data and do some basic summaries
grammatical <- read_csv("grammatical.csv")
Rows: 165 Columns: 2
-- Column specification ---------
Delimiter: ","
chr (2): grammaticality, resp...
i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.
grammatical
str(grammatical)
spec_tbl_df [165 x 2] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ grammaticality: chr [1:165] "grammatical" "grammatical" "grammatical" "grammatical" ...
$ response : chr [1:165] "yes" "yes" "yes" "yes" ...
- attr(*, "spec")=
.. cols(
.. grammaticality = col_character(),
.. response = col_character()
.. )
- attr(*, "problems")=<externalptr>
head(grammatical)
Let’s run a first GLM (Generalised Linear Model). For a binary outcome we use the family “binomial”, which assumes that the outcome follows a binomial distribution. In general, results from a Logistic Regression are close to what we get from SDT (see above).
Before fitting the model, we set the reference level for both response and grammaticality. With the coding below, the reference level is a “no” response to the “ungrammatical” category, so the coefficients describe how “yes” responses change when moving to the “grammatical” category.
The results below show the log odds for our model.
grammatical <- grammatical %>%
  mutate(response = factor(response, levels = c("no", "yes")),
         grammaticality = factor(grammaticality, levels = c("ungrammatical", "grammatical")))
grammatical %>%
  group_by(grammaticality, response) %>%
  table()
response
grammaticality no yes
ungrammatical 50 10
grammatical 5 100
mdl.glm <- grammatical %>%
  glm(response ~ grammaticality, data = ., family = binomial)
summary(mdl.glm)
Call:
glm(formula = response ~ grammaticality, family = binomial, data = .)
Deviance Residuals:
Min 1Q Median
-2.4676 -0.6039 0.3124
3Q Max
0.3124 1.8930
Coefficients:
Estimate
(Intercept) -1.6094
grammaticalitygrammatical 4.6052
Std. Error
(Intercept) 0.3464
grammaticalitygrammatical 0.5744
z value
(Intercept) -4.646
grammaticalitygrammatical 8.017
Pr(>|z|)
(Intercept) 3.38e-06
grammaticalitygrammatical 1.08e-15
(Intercept) ***
grammaticalitygrammatical ***
---
Signif. codes:
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’
0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 210.050 on 164 degrees of freedom
Residual deviance: 94.271 on 163 degrees of freedom
AIC: 98.271
Number of Fisher Scoring iterations: 5
tidy(mdl.glm) %>%
select(term, estimate) %>%
mutate(estimate = round(estimate, 3))
# to only get the coefficients
mycoef2 <- tidy(mdl.glm) %>% pull(estimate)
The results show that moving from “ungrammatical” to “grammatical” sentences increases the log odds of a “yes” response by 4.6052. The intercept shows that for “ungrammatical” sentences the log odds of a “yes” response are -1.6094. The log odds of a “yes” response to a “grammatical” sentence are therefore -1.6094 + 4.6052 = 2.9958.
Log odds can be exponentiated to talk about the odds of an event. For our model above, the odds of an “ungrammatical” sentence receiving a “yes” response are a mere 0.2 (10 “yes” against 50 “no”); the odds of a “grammatical” sentence receiving a “yes” response are 20 (100 “yes” against 5 “no”), i.e., the odds are 100 times higher.
exp(mycoef2[1])
[1] 0.2
exp(mycoef2[1] + mycoef2[2])
[1] 20
If you want to talk about proportions rather than odds, we can transform our log odds into probabilities with plogis(). This shows that the predicted probability of a “yes” response is about 17% for “ungrammatical” sentences and about 95% for “grammatical” sentences.
plogis(mycoef2[1])
[1] 0.1666667
plogis(mycoef2[1] + mycoef2[2])
[1] 0.952381
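Because this model has a single two-level predictor, its coefficients can be recovered directly from the four counts in the cross-tabulation. The sketch below does this by hand, using the counts copied from the table above; the variable names are illustrative.

```r
# Counts copied from the cross-tabulation above
yes_ungram <- 10;  no_ungram <- 50
yes_gram   <- 100; no_gram   <- 5

intercept <- log(yes_ungram / no_ungram)          # log odds of "yes", ungrammatical
slope     <- log(yes_gram / no_gram) - intercept  # change in log odds for grammatical

log_odds <- c(ungrammatical = intercept, grammatical = intercept + slope)

# The same information on three scales
data.frame(
  log_odds    = log_odds,
  odds        = exp(log_odds),
  probability = plogis(log_odds)
)
```

Reading across a row shows how the three scales relate: exp() turns log odds into odds, and plogis() turns log odds into probabilities.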
grammatical <- grammatical %>%
  mutate(prob = predict(mdl.glm, type = "response"))
grammatical %>%
  ggplot(aes(x = as.numeric(grammaticality), y = prob)) +
geom_point() +
geom_smooth(method = "glm",
method.args = list(family = "binomial"),
se = T) + theme_bw(base_size = 20)+
labs(y = "Probability", x = "")+
coord_cartesian(ylim = c(0,1))+
scale_x_discrete(limits = c("Ungrammatical", "Grammatical"))
`geom_smooth()` using formula 'y ~ x'
In this example, we will run a GLM model using a similar technique to that used in Al-Tamimi (2017) and Baumann & Winter (2018). We use the package languageR and the dataset english.
In the model above, we used the equation lm(RTlexdec ~ AgeSubject). We were interested in examining the impact of the age of the subject on reaction time in a lexical decision task. In this section, we are interested in understanding how well reaction time differentiates the participants based on their age. We use AgeSubject as our outcome and RTlexdec as our predictor, using the equation glm(AgeSubject ~ RTlexdec). We could use RTlexdec as is, but due to a possible quasi-separation, and the fact that we may want to compare coefficients using multiple acoustic metrics, we will z-score our predictor. Below we run two models, with and without z-scoring.
For the glm model, we need to specify family = "binomial".
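To see what z-scoring does and does not change in a logistic regression, here is a small sketch on simulated data. It does not use the english dataset: the predictor x, the outcome y and the generating coefficients are made up for illustration only.

```r
set.seed(1)
n <- 200
x <- rnorm(n, mean = 6.5, sd = 0.15)       # simulated predictor (hypothetical values)
y <- rbinom(n, 1, plogis(-65 + 10 * x))    # simulated binary outcome
z <- as.numeric(scale(x))                  # z-score: (x - mean(x)) / sd(x)

m_raw <- glm(y ~ x, family = "binomial")   # raw predictor
m_z   <- glm(y ~ z, family = "binomial")   # z-scored predictor

# The fitted probabilities are the same in both models
all.equal(unname(fitted(m_raw)), unname(fitted(m_z)), tolerance = 1e-4)

# The slopes differ by a factor of sd(x)
c(raw_slope_times_sd = unname(coef(m_raw)[2]) * sd(x),
  z_slope            = unname(coef(m_z)[2]))

# With a z-scored predictor, the intercept is the log odds at the mean of x
plogis(unname(coef(m_z)[1]))
```

Z-scoring is a linear rescaling, so it changes the units of the coefficients but not the predictions. The intercept of the raw model is the log odds at x = 0, which can be far outside the observed range, whereas the intercept of the z-scored model refers to the mean of the predictor.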
mdl.glm2 <- english2 %>%
  glm(AgeSubject ~ RTlexdec, data = ., family = "binomial")
tidy(mdl.glm2) %>%
select(term, estimate) %>%
mutate(estimate = round(estimate, 3))
# to only get the coefficients
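mycoef2 <- tidy(mdl.glm2) %>% pull(estimate)
As before, we can transform our log odds into probabilities. Note that the intercept is the log odds when RTlexdec equals 0, a value far outside the observed range, which is why the probabilities below are vanishingly small.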
plogis(mycoef2[1])
[1] 1.368844e-56
plogis(mycoef2[1] + mycoef2[2])
[1] 4.678715e-48
english2 <- english2 %>%
  mutate(prob = predict(mdl.glm2, type = "response"))
english2 %>%
  ggplot(aes(x = as.numeric(AgeSubject), y = prob)) +
geom_point() +
geom_smooth(method = "glm",
method.args = list(family = "binomial"),
se = T) + theme_bw(base_size = 20)+
labs(y = "Probability", x = "")+
coord_cartesian(ylim = c(0,1))+
scale_x_discrete(limits = c("Young", "Old"))
`geom_smooth()` using formula 'y ~ x'
The plot above shows how the two groups differ using a glm. The results point to an overall increase in the predicted probability when moving from the “Young” to the “Old” group. Let’s use z-scoring next.
english2 <- english2 %>%
  mutate(`RTlexdec_z` = scale(RTlexdec, center = TRUE, scale = TRUE))
english2['RTlexdec_z'] <- as.data.frame(scale(english2$RTlexdec))
mdl.glm3 <- english2 %>%
  glm(AgeSubject ~ RTlexdec_z, data = ., family = "binomial")
tidy(mdl.glm3) %>%
select(term, estimate) %>%
mutate(estimate = round(estimate, 3))
# to only get the coefficients
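mycoef2 <- tidy(mdl.glm3) %>% pull(estimate)
We again transform our log odds into probabilities. After z-scoring, the intercept corresponds to the mean reaction time, and adding the coefficient corresponds to a reaction time one standard deviation above the mean.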
plogis(mycoef2[1])
[1] 0.5192147
plogis(mycoef2[1] + mycoef2[2])
[1] 0.959313
english2 <- english2 %>%
  mutate(prob = predict(mdl.glm3, type = "response"))
english2 %>%
  ggplot(aes(x = as.numeric(AgeSubject), y = prob)) +
geom_point() +
geom_smooth(method = "glm",
method.args = list(family = "binomial"),
se = T) + theme_bw(base_size = 20)+
labs(y = "Probability", x = "")+
coord_cartesian(ylim = c(0,1))+
scale_x_discrete(limits = c("Young", "Old"))
`geom_smooth()` using formula 'y ~ x'
We obtain the exact same plot, but the model estimates are different. Let’s use another type of prediction: predicted probabilities over a grid of z-scored values.
z_vals <- seq(-3, 3, 0.01)

dfPredNew <- data.frame(RTlexdec_z = z_vals)

## store the predicted probabilities for each value of RTlexdec_z
pp <- cbind(dfPredNew, prob = predict(mdl.glm3, newdata = dfPredNew, type = "response"))

pp %>%
  ggplot(aes(x = RTlexdec_z, y = prob)) +
geom_point() +
theme_bw(base_size = 20)+
labs(y = "Probability", x = "")+
coord_cartesian(ylim = c(0,1))+
scale_x_continuous(breaks = c(-3, -2, -1, 0, 1, 2, 3))
This time the plot shows the predicted probability as a continuous S-shaped curve over the range of z-scored reaction times, rather than one column of points per group.
We are generally interested in performance, i.e., whether we have “accurately” categorised the outcome or not, and at the same time we want to evaluate the biases in our responses. When deciding on categories, we are usually biased in our selection.
Let’s ask the question: How many of you have a Mac laptop and how many a Windows laptop? For those with a Mac, what was the main reason for choosing it? Are you biased in any way by your decision?
To correct for these biases, we use some measures from Signal Detection Theory to obtain estimates that are not influenced by the biases.
Let’s do some stats on this
| | Yes | No | Total |
|---|---|---|---|
| Grammatical (Yes Actual) | TP = 100 | FN = 5 | (Yes Actual) 105 |
| Ungrammatical (No Actual) | FP = 10 | TN = 50 | (No Actual) 60 |
| Total | (Yes Response) 110 | (No Response) 55 | 165 |
grammatical <- grammatical %>%
  mutate(response = factor(response, levels = c("yes", "no")),
         grammaticality = factor(grammaticality, levels = c("grammatical", "ungrammatical")))
## TP = True Positive (Hit); FP = False Positive; FN = False Negative; TN = True Negative
TP <- nrow(grammatical %>%
             filter(grammaticality == "grammatical" &
                      response == "yes"))
FN <- nrow(grammatical %>%
             filter(grammaticality == "grammatical" &
                      response == "no"))
FP <- nrow(grammatical %>%
             filter(grammaticality == "ungrammatical" &
                      response == "yes"))
TN <- nrow(grammatical %>%
             filter(grammaticality == "ungrammatical" &
                      response == "no"))
TP
[1] 100
FN
[1] 5
FP
[1] 10
TN
[1] 50
Total <- nrow(grammatical)
Total
[1] 165
(TP+TN)/Total # accuracy
[1] 0.9090909
(FP+FN)/Total # error, also 1-accuracy
[1] 0.09090909
# When stimulus = yes, how many times response = yes?
TP/(TP+FN) # also True Positive Rate or Sensitivity
[1] 0.952381
# When stimulus = no, how many times response = yes?
FP/(FP+TN) # False Positive Rate
[1] 0.1666667
# When stimulus = no, how many times response = no?
TN/(FP+TN) # True Negative Rate or Specificity
[1] 0.8333333
# When subject responds "yes" how many times is (s)he correct?
TP/(TP+FP) # precision
[1] 0.9090909
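The individual ratios above can be gathered into one small helper. This is only a sketch: classification_rates() is a hypothetical function written for this example (it is not part of any package), and it uses the standard definitions, in which sensitivity is the true positive rate and specificity is the true negative rate.

```r
# Hypothetical helper: classification rates from the four cells of a confusion table
classification_rates <- function(TP, FN, FP, TN) {
  total <- TP + FN + FP + TN
  c(
    accuracy    = (TP + TN) / total,
    error       = (FP + FN) / total,   # 1 - accuracy
    sensitivity = TP / (TP + FN),      # true positive rate (hit rate)
    specificity = TN / (TN + FP),      # true negative rate
    false_alarm = FP / (FP + TN),      # false positive rate, 1 - specificity
    precision   = TP / (TP + FP)
  )
}

# Counts copied from the table above
rates <- classification_rates(TP = 100, FN = 5, FP = 10, TN = 50)
round(rates, 4)
```

Keeping the four counts as named arguments makes it harder to mix up the cells when the same rates are needed for another table.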
# getting dprime (or the sensitivity index); beta (bias criterion, 0-1, lower=increase in "yes"); Aprime (estimate of discriminability, 0-1, 1=good discrimination; 0 at chance); bppd (b prime prime d, -1 to 1; 0 = no bias, negative = tendency to respond "yes", positive = tendency to respond "no"); c (index of bias, equals to SD)
#(see also https://www.r-bloggers.com/compute-signal-detection-theory-indices-with-r/amp/)
psycho::dprime(TP, FP, FN, TN,
n_targets = TP+FN,
n_distractors = FP+TN,
adjust=F)
$dprime
[1] 2.635813
$beta
[1] 0.3970026
$aprime
[1] 0.9419643
$bppd
[1] -0.5076923
$c
[1] -0.3504848
The most important of the measures above is d-prime. It models the difference between the rate of “True Positive” responses and the rate of “False Positive” responses in standard units (z-scores). The formula can be written as:
d' (d prime) = Z(True Positive Rate) - Z(False Positive Rate)
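The formula can be checked by hand with qnorm(), which converts a rate into a z-score. The sketch below uses the counts from the table above; the formulas used for c and beta are the usual Signal Detection Theory definitions, stated here as an assumption about how they are computed.

```r
# Counts copied from the table above
TP <- 100; FN <- 5; FP <- 10; TN <- 50

hit_rate <- TP / (TP + FN)   # true positive rate
fa_rate  <- FP / (FP + TN)   # false positive (false alarm) rate

z_hit <- qnorm(hit_rate)     # z-score of the hit rate
z_fa  <- qnorm(fa_rate)      # z-score of the false alarm rate

d_prime   <- z_hit - z_fa                  # sensitivity index
criterion <- -(z_hit + z_fa) / 2           # c, bias in standard units
beta      <- exp((z_fa^2 - z_hit^2) / 2)   # likelihood ratio at the criterion

round(c(dprime = d_prime, c = criterion, beta = beta), 4)
```

A negative c (and a beta below 1) indicates a tendency to respond “yes”, which matches the table: there are 110 “yes” responses for only 105 grammatical sentences.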
The code below demonstrates the links between our GLM model and what we obtained above from SDT. The table of predictions shows that our GLM reproduces counts that are identical to our initial data setup: compare the table here with the table above. Once we have created our table of outcomes, we can compute percent correct, the specificity, the sensitivity, the Kappa score, etc. Each function yields the actual value together with the SD that is related to variations in responses.
## predict(mdl.glm)>0.5 is identical to
## predict(glm(response~grammaticality,data=grammatical,family = binomial),type="response")
grammatical <- grammatical %>%
  mutate(response = factor(response, levels = c("yes", "no")),
         grammaticality = factor(grammaticality, levels = c("grammatical", "ungrammatical")))
mdl.glm.C <- grammatical %>%
  glm(response ~ grammaticality, data = .,family = binomial)
tbl.glm <- table(grammatical$response, predict(mdl.glm.C, type = "response")>0.5)
colnames(tbl.glm) <- c("grammatical", "ungrammatical")
tbl.glm
grammatical ungrammatical
yes 100 10
no 5 50
PresenceAbsence::pcc(tbl.glm)
PresenceAbsence::specificity(tbl.glm)
PresenceAbsence::sensitivity(tbl.glm)
###etc..
If you look at the results from SDT above, these results are the same as the following
Accuracy: (TP+TN)/Total (0.9090909)
True Positive Rate (or Sensitivity) TP/(TP+FN) (0.952381)
True Negative Rate (or Specificity) TN/(FP+TN) (0.8333333)
The values obtained here match those obtained from SDT. For d prime, the difference stems from the use of the logit link of the binomial family. With a probit link, one obtains the same value (see here for more details). A probit model expresses differences in the outcome in z-scores, i.e., as changes in standard units. Here it models, in z-scores, the change in responses between “ungrammatical” and “grammatical” sentences. These are the same conceptual underpinnings as d-prime from Signal Detection Theory.
## d prime
psycho::dprime(TP, FP, FN, TN,
n_targets = TP+FN,
n_distractors = FP+TN,
adjust=F)$dprime
[1] 2.635813
## GLM with probit
coef(glm(response ~ grammaticality, data = grammatical, family = binomial(probit)))[2]
grammaticalityungrammatical
2.635813
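The equivalence can be reproduced without the original file by rebuilding the 165 trials from the counts in the table. This is a sketch under that assumption: the data frame dat and the model name mdl_probit are made up for the example, and the factor levels follow the first coding used above (“no” and “ungrammatical” as reference levels).

```r
# Rebuild the 165 trials from the counts in the table above
dat <- data.frame(
  grammaticality = factor(rep(c("grammatical", "ungrammatical"), times = c(105, 60)),
                          levels = c("ungrammatical", "grammatical")),
  response = factor(c(rep(c("yes", "no"), times = c(100, 5)),    # grammatical trials
                      rep(c("yes", "no"), times = c(10, 50))),   # ungrammatical trials
                    levels = c("no", "yes"))
)

# Logistic regression with a probit link instead of the default logit link
mdl_probit <- glm(response ~ grammaticality, data = dat,
                  family = binomial(link = "probit"))

probit_slope <- unname(coef(mdl_probit)[2])
d_prime      <- qnorm(100 / 105) - qnorm(10 / 60)   # Z(hit rate) - Z(false alarm rate)

c(probit_slope = probit_slope, d_prime = d_prime)
```

With a probit link the slope is the difference between the two z-transformed “yes” rates, which is exactly how d-prime is defined.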
If your outcome does not follow a binomial distribution, e.g., it is multinomial (three or more response categories) or Poisson (count data), then you need a model with a different family or link: glm with family = poisson handles count data, while ordered categories call for a cumulative link model.
Cumulative link models work well with rating data. Ratings are inherently ordered, 1, 2, … n, and we expect to observe an increase (or decrease) in overall ratings from 1 to n. To demonstrate this, we will use an example using the package “ordinal”. Data were from a rating experiment where six participants rated the percept of nasality in the production of particular consonants in Arabic. The data came from nine producing subjects. The ratings were from 1 to 5. This example can apply to any study, e.g., rating grammaticality of sentences, rating how positive the sentiments are in an article, interview responses, etc.
We start by importing the data and processing it. We change the reference level in the predictor.
rating <- read_csv("rating.csv")
New names:
* `` -> ...1
Rows: 405 Columns: 6
-- Column specification ---------
Delimiter: ","
chr (4): Context, Subject, It...
dbl (2): ...1, Response
i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.
rating
rating <- rating %>%
  mutate(Response = factor(Response),
Context = factor(Context)) %>%
mutate(Context = relevel(Context, "isolation"))
rating
We run our first clm model as a simple model, i.e., with no random effects
mdl.clm <- rating %>%
  clm(Response ~ Context, data = .)
summary(mdl.clm)
formula: Response ~ Context
data: .
Coefficients:
Estimate Std. Error
Context3--3 -0.1384 0.5848
Context3-n 3.5876 0.4721
Context3-o -0.4977 0.3859
Context7-n 2.3271 0.5079
Context7-o 0.2904 0.4002
Contextn-3 2.8957 0.6685
Contextn-7 2.2678 0.4978
Contextn-n 2.8697 0.4317
Contextn-o 3.5152 0.4397
Contexto-3 -0.2540 0.4017
Contexto-7 -0.6978 0.3769
Contexto-n 2.9640 0.4159
Contexto-o -0.6147 0.3934
z value Pr(>|z|)
Context3--3 -0.237 0.8130
Context3-n 7.600 2.96e-14 ***
Context3-o -1.290 0.1971
Context7-n 4.582 4.60e-06 ***
Context7-o 0.726 0.4680
Contextn-3 4.331 1.48e-05 ***
Contextn-7 4.556 5.22e-06 ***
Contextn-n 6.647 2.99e-11 ***
Contextn-o 7.994 1.30e-15 ***
Contexto-3 -0.632 0.5272
Contexto-7 -1.851 0.0641 .
Contexto-n 7.126 1.03e-12 ***
Contexto-o -1.562 0.1182
---
Signif. codes:
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’
0.05 ‘.’ 0.1 ‘ ’ 1
Threshold coefficients:
Estimate Std. Error z value
1|2 -1.4615 0.2065 -7.077
2|3 0.4843 0.1824 2.655
3|4 1.5492 0.2044 7.578
4|5 3.1817 0.2632 12.089
We can evaluate whether “Context” improves the model fit by comparing a null model with our model. Here, “Context” clearly improves the model fit.
mdl.clm.Null <- rating %>%
  clm(Response ~ 1, data = .)
anova(mdl.clm, mdl.clm.Null)
Likelihood ratio tests of cumulative link models:
no.par AIC
mdl.clm.Null 4 1281.1
mdl.clm 17 1086.3
logLik LR.stat df
mdl.clm.Null -636.56
mdl.clm -526.16 220.8 13
Pr(>Chisq)
mdl.clm.Null
mdl.clm < 2.2e-16 ***
---
Signif. codes:
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’
0.05 ‘.’ 0.1 ‘ ’ 1
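The likelihood ratio test above can be reproduced by hand from the two log likelihoods. The sketch below copies the values from the table (so it inherits their rounding); the variable names are illustrative.

```r
# Values copied from the likelihood ratio test above
loglik_null <- -636.56; npar_null <- 4    # thresholds only
loglik_full <- -526.16; npar_full <- 17   # thresholds + 13 Context coefficients

lr_stat <- 2 * (loglik_full - loglik_null)   # likelihood ratio statistic
df      <- npar_full - npar_null             # extra parameters in the full model
p_val   <- pchisq(lr_stat, df = df, lower.tail = FALSE)

# AIC = -2 * log likelihood + 2 * number of parameters
aic <- -2 * c(null = loglik_null, full = loglik_full) + 2 * c(npar_null, npar_full)

c(LR = lr_stat, df = df, p = p_val)
round(aic, 1)
```

The statistic is compared with a chi-squared distribution whose degrees of freedom equal the number of parameters added by “Context”.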
As a way to interpret the model, we can look at the coefficients and make sense of the results. A CLM is a logistic model with a cumulative link. The “Coefficients” are the estimates for each level of the fixed effect relative to the reference level (“isolation”); the “Threshold coefficients” are the cut-points between adjacent response categories. For the former, a negative coefficient indicates a shift towards lower ratings and a positive coefficient indicates a shift towards higher ratings. The p values indicate whether each level differs significantly from the reference level. For the “Threshold coefficients”, the cut-points 1|2, 2|3, 3|4 and 4|5 increase steadily, reflecting the ordering of the ratings from 1 to 5.
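To make the thresholds more concrete, the sketch below turns them into predicted probabilities for each rating. It assumes the default logit link of clm, under which P(rating ≤ j) = plogis(threshold_j − coefficient), and copies the estimates from the summary above. The helper rating_probs() is hypothetical, and “n-n” is just one context picked for illustration.

```r
# Estimates copied from the model summary above
theta <- c("1|2" = -1.4615, "2|3" = 0.4843, "3|4" = 1.5492, "4|5" = 3.1817)
beta  <- c(isolation = 0, "n-n" = 2.8697)   # the reference level has a coefficient of 0

# Hypothetical helper: probability of each rating for a given coefficient b
rating_probs <- function(b) {
  cum <- c(plogis(theta - b), 1)    # cumulative probabilities P(rating <= j)
  setNames(diff(c(0, cum)), 1:5)    # differences give P(rating = j)
}

probs <- sapply(beta, rating_probs)
round(probs, 3)
```

A positive coefficient moves probability mass from the low ratings to the high ratings: for the “n-n” context a rating of 5 is far more likely than in isolation.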
We use a modified version of a plotting function that allows us to visualise the effects. For this, we use the base R plotting functions. The version below is without confidence intervals.
par(oma=c(1, 0, 0, 3),mgp=c(2, 1, 0))
xlimNas = c(min(mdl.clm$beta), max(mdl.clm$beta))
ylimNas = c(0,1)
plot(0,0,xlim=xlimNas, ylim=ylimNas, type="n", ylab=expression(Probability), xlab="", xaxt = "n",main="Predicted curves - Nasalisation",cex=2,cex.lab=1.5,cex.main=1.5,cex.axis=1.5)
axis(side = 1, at = c(0,mdl.clm$beta),labels = levels(rating$Context), las=2,cex=2,cex.lab=1.5,cex.axis=1.5)
xsNas = seq(xlimNas[1], xlimNas[2], length.out=100)
lines(xsNas, plogis(mdl.clm$Theta[1] - xsNas), col='black')
lines(xsNas, plogis(mdl.clm$Theta[2] - xsNas)-plogis(mdl.clm$Theta[1] - xsNas), col='red')
lines(xsNas, plogis(mdl.clm$Theta[3] - xsNas)-plogis(mdl.clm$Theta[2] - xsNas), col='green')
lines(xsNas, plogis(mdl.clm$Theta[4] - xsNas)-plogis(mdl.clm$Theta[3] - xsNas), col='orange')
lines(xsNas, 1-(plogis(mdl.clm$Theta[4] - xsNas)), col='blue')
abline(v=c(0,mdl.clm$beta),lty=3)
abline(h=0, lty="dashed")
abline(h=0.2, lty="dashed")
abline(h=0.4, lty="dashed")
abline(h=0.6, lty="dashed")
abline(h=0.8, lty="dashed")
abline(h=1, lty="dashed")
legend(par('usr')[2], par('usr')[4], bty='n', xpd=NA,lty=1, col=c("black", "red", "green", "orange", "blue"),
legend=c("Oral", "2", "3", "4", "Nasal"),cex=0.75)
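Here is an attempt to add confidence bands to these plots by shifting each threshold up and down by a fraction of its standard error. Note that a conventional 95% interval uses ±1.96 × SE, whereas the code below divides the SE by 1.96, so the bands are narrower than a 95% interval. This is an experimental attempt and any feedback is welcome!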
par(oma=c(1, 0, 0, 3),mgp=c(2, 1, 0))
xlimNas = c(min(mdl.clm$beta), max(mdl.clm$beta))
ylimNas = c(0,1)
plot(0,0,xlim=xlimNas, ylim=ylimNas, type="n", ylab=expression(Probability), xlab="", xaxt = "n",main="Predicted curves - Nasalisation",cex=2,cex.lab=1.5,cex.main=1.5,cex.axis=1.5)
axis(side = 1, at = c(0,mdl.clm$beta),labels = levels(rating$Context), las=2,cex=2,cex.lab=1.5,cex.axis=1.5)
xsNas = seq(xlimNas[1], xlimNas[2], length.out=100)

#+CI
lines(xsNas, plogis(mdl.clm$Theta[1]+(summary(mdl.clm)$coefficient[,2][[1]]/1.96) - xsNas), col='black')
lines(xsNas, plogis(mdl.clm$Theta[2]+(summary(mdl.clm)$coefficient[,2][[2]]/1.96) - xsNas)-plogis(mdl.clm$Theta[1]+(summary(mdl.clm)$coefficient[,2][[1]]/1.96) - xsNas), col='red')
lines(xsNas, plogis(mdl.clm$Theta[3]+(summary(mdl.clm)$coefficient[,2][[3]]/1.96) - xsNas)-plogis(mdl.clm$Theta[2]+(summary(mdl.clm)$coefficient[,2][[2]]/1.96) - xsNas), col='green')
lines(xsNas, plogis(mdl.clm$Theta[4]+(summary(mdl.clm)$coefficient[,2][[4]]/1.96) - xsNas)-plogis(mdl.clm$Theta[3]+(summary(mdl.clm)$coefficient[,2][[3]]/1.96) - xsNas), col='orange')
lines(xsNas, 1-(plogis(mdl.clm$Theta[4]+(summary(mdl.clm)$coefficient[,2][[4]]/1.96) - xsNas)), col='blue')
#-CI
lines(xsNas, plogis(mdl.clm$Theta[1]-(summary(mdl.clm)$coefficient[,2][[1]]/1.96) - xsNas), col='black')
lines(xsNas, plogis(mdl.clm$Theta[2]-(summary(mdl.clm)$coefficient[,2][[2]]/1.96) - xsNas)-plogis(mdl.clm$Theta[1]-(summary(mdl.clm)$coefficient[,2][[1]]/1.96) - xsNas), col='red')
lines(xsNas, plogis(mdl.clm$Theta[3]-(summary(mdl.clm)$coefficient[,2][[3]]/1.96) - xsNas)-plogis(mdl.clm$Theta[2]-(summary(mdl.clm)$coefficient[,2][[2]]/1.96) - xsNas), col='green')
lines(xsNas, plogis(mdl.clm$Theta[4]-(summary(mdl.clm)$coefficient[,2][[4]]/1.96) - xsNas)-plogis(mdl.clm$Theta[3]-(summary(mdl.clm)$coefficient[,2][[3]]/1.96) - xsNas), col='orange')
lines(xsNas, 1-(plogis(mdl.clm$Theta[4]-(summary(mdl.clm)$coefficient[,2][[4]]/1.96) - xsNas)), col='blue')
# fill area around CI using c(x, rev(x)), c(y2, rev(y1))
polygon(c(xsNas, rev(xsNas)),
c(plogis(mdl.clm$Theta[1]+(summary(mdl.clm)$coefficient[,2][[1]]/1.96) - xsNas), rev(plogis(mdl.clm$Theta[1]-(summary(mdl.clm)$coefficient[,2][[1]]/1.96) - xsNas))), col = "gray90")
polygon(c(xsNas, rev(xsNas)),
c(plogis(mdl.clm$Theta[2]+(summary(mdl.clm)$coefficient[,2][[2]]/1.96) - xsNas)-plogis(mdl.clm$Theta[1]+(summary(mdl.clm)$coefficient[,2][[1]]/1.96) - xsNas), rev(plogis(mdl.clm$Theta[2]-(summary(mdl.clm)$coefficient[,2][[2]]/1.96) - xsNas)-plogis(mdl.clm$Theta[1]-(summary(mdl.clm)$coefficient[,2][[1]]/1.96) - xsNas))), col = "gray90")
polygon(c(xsNas, rev(xsNas)),
c(plogis(mdl.clm$Theta[3]+(summary(mdl.clm)$coefficient[,2][[3]]/1.96) - xsNas)-plogis(mdl.clm$Theta[2]+(summary(mdl.clm)$coefficient[,2][[2]]/1.96) - xsNas), rev(plogis(mdl.clm$Theta[3]-(summary(mdl.clm)$coefficient[,2][[3]]/1.96) - xsNas)-plogis(mdl.clm$Theta[2]-(summary(mdl.clm)$coefficient[,2][[2]]/1.96) - xsNas))), col = "gray90")
polygon(c(xsNas, rev(xsNas)),
c(plogis(mdl.clm$Theta[4]+(summary(mdl.clm)$coefficient[,2][[4]]/1.96) - xsNas)-plogis(mdl.clm$Theta[3]+(summary(mdl.clm)$coefficient[,2][[3]]/1.96) - xsNas), rev(plogis(mdl.clm$Theta[4]-(summary(mdl.clm)$coefficient[,2][[4]]/1.96) - xsNas)-plogis(mdl.clm$Theta[3]-(summary(mdl.clm)$coefficient[,2][[3]]/1.96) - xsNas))), col = "gray90")
polygon(c(xsNas, rev(xsNas)),
c(1-(plogis(mdl.clm$Theta[4]-(summary(mdl.clm)$coefficient[,2][[4]]/1.96) - xsNas)), rev(1-(plogis(mdl.clm$Theta[4]+(summary(mdl.clm)$coefficient[,2][[4]]/1.96) - xsNas)))), col = "gray90")
lines(xsNas, plogis(mdl.clm$Theta[1] - xsNas), col='black')
lines(xsNas, plogis(mdl.clm$Theta[2] - xsNas)-plogis(mdl.clm$Theta[1] - xsNas), col='red')
lines(xsNas, plogis(mdl.clm$Theta[3] - xsNas)-plogis(mdl.clm$Theta[2] - xsNas), col='green')
lines(xsNas, plogis(mdl.clm$Theta[4] - xsNas)-plogis(mdl.clm$Theta[3] - xsNas), col='orange')
lines(xsNas, 1-(plogis(mdl.clm$Theta[4] - xsNas)), col='blue')
abline(v=c(0,mdl.clm$beta),lty=3)
abline(h=0, lty="dashed")
abline(h=0.2, lty="dashed")
abline(h=0.4, lty="dashed")
abline(h=0.6, lty="dashed")
abline(h=0.8, lty="dashed")
abline(h=1, lty="dashed")
legend(par('usr')[2], par('usr')[4], bty='n', xpd=NA,lty=1, col=c("black", "red", "green", "orange", "blue"),
legend=c("Oral", "2", "3", "4", "Nasal"),cex=0.75)
Let’s generate a new dataframe that we will use later on for our mixed models
## Courtesy of Bodo Winter
set.seed(666)
#we create 6 subjects
subjects <- paste0('S', 1:6)
#here we add repetitions within speakers
subjects <- rep(subjects, each = 20)
items <- paste0('Item', 1:20)
#below repeats
items <- rep(items, 6)
#below is to generate random numbers that are log values
logFreq <- round(rexp(20)*5, 2)
#below we are repeating the logFreq 6 times to fit with the number of speakers and items
logFreq <- rep(logFreq, 6)
xdf <- data.frame(subjects, items, logFreq)
#below removes the individual variables we had created because they are already in the dataframe
rm(subjects, items, logFreq)

xdf$Intercept <- 300
submeans <- rep(rnorm(6, sd = 40), 20)
#sort make the means for each subject is the same...
submeans <- sort(submeans)
xdf$submeans <- submeans
#we create the same thing for items... we allow the items mean to vary between words...
itsmeans <- rep(rnorm(20, sd = 20), 6)
xdf$itsmeans <- itsmeans
xdf$error <- rnorm(120, sd = 20)
#here we create an effect column,
#here for each logFreq, we have a decrease of -5 of that particular logFreq
xdf$effect <- -5 * xdf$logFreq

xdf$dur <- xdf$Intercept + xdf$submeans + xdf$itsmeans + xdf$error + xdf$effect
#below is to subset the data and get only a few columns.. the -c(4:8) removes the columns 4 to 8..
xreal <- xdf[,-c(4:8)]
head(xreal)
rm(xdf, submeans, itsmeans)
Let’s start by doing a correlation test and plotting the data. Our results show that there is a negative correlation between duration and LogFrequency, and the plot shows this decrease.
corrMixed <- as.matrix(xreal[-c(1:2)]) %>%
  rcorr(type="pearson")
print(corrMixed)
logFreq dur
logFreq 1.00 -0.54
dur -0.54 1.00
n= 120
P
logFreq dur
logFreq 0
dur 0
corrplot(corrMixed$r, method = "circle", type = "upper", tl.srt = 45,
addCoef.col = "black", diag = FALSE,
p.mat = corrMixed$p, sig.level = 0.05)
ggplot.xreal <- xreal %>%
  ggplot(aes(x = logFreq, y = dur)) +
geom_point()+ theme_bw(base_size = 20) +
labs(y = "Duration", x = "Frequency (Log)") +
geom_smooth(method = lm, se=F)
ggplot.xreal
`geom_smooth()` using formula 'y ~ x'
Let’s run a simple linear model on the data. As we can see below, there are some issues with the “simple” linear model: we had set our SD for subjects to be 40, but this was picked up as 120 (see histogram of residuals). The QQ Plot is not “normal”.
mdl.lm.xreal <- xreal %>%
  lm(dur ~ logFreq, data = .)
summary(mdl.lm.xreal)
Call:
lm(formula = dur ~ logFreq, data = .)
Residuals:
Min 1Q Median 3Q
-94.322 -35.465 -4.364 33.020
Max
123.955
Coefficients:
Estimate Std. Error
(Intercept) 337.9730 6.2494
logFreq -5.4601 0.7846
t value Pr(>|t|)
(Intercept) 54.081 < 2e-16 ***
logFreq -6.959 2.06e-10 ***
---
Signif. codes:
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’
0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 48.29 on 118 degrees of freedom
Multiple R-squared: 0.291, Adjusted R-squared: 0.285
F-statistic: 48.43 on 1 and 118 DF, p-value: 2.057e-10
hist(residuals(mdl.lm.xreal))
qqnorm(residuals(mdl.lm.xreal)); qqline(residuals(mdl.lm.xreal))
plot(fitted(mdl.lm.xreal), residuals(mdl.lm.xreal), cex = 4)
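One way to see why the residual standard error of the simple linear model is so large: the model has no terms for subjects or items, so their variation ends up in the residuals. The sketch below is only an illustration. It assumes the three simulated sources of variation (SD of 40 for subjects, 20 for items and 20 for the error term) are independent, in which case their variances add up.

```r
# SDs used when simulating the data
sd_subjects <- 40
sd_items    <- 20
sd_error    <- 20

# Independent sources of variation add up on the variance scale
sd_total <- sqrt(sd_subjects^2 + sd_items^2 + sd_error^2)
sd_total
```

The theoretical value (about 49) is close to the residual standard error of 48.29 reported by the linear model; the two are not expected to match exactly because the data are a single random sample.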
Our Linear Mixed Effects Model will take into account the random effects we added and also our model specifications. We use a Maximum Likelihood estimate (REML = FALSE) as this is what we will use for model comparison. The Linear Mixed Model reflects our model specifications: the SD of our subjects is picked up correctly (38.36, against the 40 we simulated). The fixed-effect estimates are the same as in our linear model above, but their standard errors are larger. The coefficient for the “Intercept” is 337.973 and the coefficient for LogFrequency is -5.460. This indicates that for each unit increase in LogFrequency, duration decreases by 5.460 (ms).
mdl.lmer.xreal <- xreal %>%
  lmer(dur ~ logFreq +(1|subjects) + (1|items), data = ., REML = FALSE)
summary(mdl.lmer.xreal)
Linear mixed model fit by
maximum likelihood . t-tests
use Satterthwaite's method [
lmerModLmerTest]
Formula:
dur ~ logFreq + (1 | subjects) + (1 | items)
Data: .
AIC BIC logLik
1105.8 1119.8 -547.9
deviance df.resid
1095.8 115
Scaled residuals:
Min 1Q Median
-2.06735 -0.60675 0.07184
3Q Max
0.61122 2.39854
Random effects:
Groups Name Variance
items (Intercept) 589.8
subjects (Intercept) 1471.7
Residual 284.0
Std.Dev.
24.29
38.36
16.85
Number of obs: 120, groups:
items, 20; subjects, 6
Fixed effects:
Estimate Std. Error
(Intercept) 337.973 17.587
logFreq -5.460 1.004
df t value
(Intercept) 9.126 19.218
logFreq 19.215 -5.436
Pr(>|t|)
(Intercept) 1.08e-08 ***
logFreq 2.92e-05 ***
---
Signif. codes:
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’
0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr)
logFreq -0.322
hist(residuals(mdl.lmer.xreal))
qqnorm(residuals(mdl.lmer.xreal)); qqline(residuals(mdl.lmer.xreal))
plot(fitted(mdl.lmer.xreal), residuals(mdl.lmer.xreal), cex = 4)
This second model adds a by-subject random slope. Random slopes allow the effect of a predictor to vary across the levels of a random effect (here, the effect of logFreq can differ from one subject to the next). An intercept-only model assumes the same, averaged, effect for all our participants.
mdl.lmer.xreal.2 <- xreal %>%
  lmer(dur ~ logFreq + (logFreq|subjects) + (1|items), data = ., REML = FALSE)
boundary (singular) fit: see ?isSingular
summary(mdl.lmer.xreal.2)
Linear mixed model fit by
maximum likelihood . t-tests
use Satterthwaite's method [
lmerModLmerTest]
Formula:
dur ~ logFreq + (logFreq | subjects) + (1 | items)
Data: .
AIC BIC logLik
1109.5 1129.0 -547.7
deviance df.resid
1095.5 113
Scaled residuals:
Min 1Q Median 3Q
-2.1087 -0.6067 0.0623 0.5828
Max
2.4564
Random effects:
Groups Name Variance
items (Intercept) 5.897e+02
subjects (Intercept) 1.400e+03
logFreq 2.902e-02
Residual 2.829e+02
Std.Dev. Corr
24.2838
37.4229
0.1704 1.00
16.8196
Number of obs: 120, groups:
items, 20; subjects, 6
Fixed effects:
Estimate Std. Error
(Intercept) 337.973 17.245
logFreq -5.460 1.007
df t value
(Intercept) 9.093 19.598
logFreq 19.361 -5.424
Pr(>|t|)
(Intercept) 9.50e-09 ***
logFreq 2.92e-05 ***
---
Signif. codes:
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’
0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr)
logFreq -0.267
optimizer (nloptwrap) convergence code: 0 (OK)
boundary (singular) fit: see ?isSingular
hist(residuals(mdl.lmer.xreal.2))
qqnorm(residuals(mdl.lmer.xreal.2)); qqline(residuals(mdl.lmer.xreal.2))
plot(fitted(mdl.lmer.xreal.2), residuals(mdl.lmer.xreal.2), cex = 4)
But where are our p values? The lme4 developers decided not to include p values due to various issues with estimating df. (The summaries above do show p values, but only because lmerTest is loaded and its lmer masks the one from lme4.) What we can do instead is to compare models. We need to create a null model to allow for significance testing. As expected, our predictor contributes significantly to the model fit (χ²(1) = 17.89, p < 0.001).
mdl.lmer.xreal.Null <- xreal %>%
  lmer(dur ~ 1 + (logFreq|subjects) + (1|items), data = ., REML = FALSE)
boundary (singular) fit: see ?isSingular
anova(mdl.lmer.xreal.Null, mdl.lmer.xreal.2)
Data: .
Models:
mdl.lmer.xreal.Null: dur ~ 1 + (logFreq | subjects) + (1 | items)
mdl.lmer.xreal.2: dur ~ logFreq + (logFreq | subjects) + (1 | items)
npar AIC
mdl.lmer.xreal.Null 6 1125.4
mdl.lmer.xreal.2 7 1109.5
BIC
mdl.lmer.xreal.Null 1142.1
mdl.lmer.xreal.2 1129.0
logLik
mdl.lmer.xreal.Null -556.68
mdl.lmer.xreal.2 -547.73
deviance
mdl.lmer.xreal.Null 1113.4
mdl.lmer.xreal.2 1095.5
Chisq Df
mdl.lmer.xreal.Null
mdl.lmer.xreal.2 17.892 1
Pr(>Chisq)
mdl.lmer.xreal.Null
mdl.lmer.xreal.2 2.339e-05
mdl.lmer.xreal.Null
mdl.lmer.xreal.2 ***
---
Signif. codes:
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’
0.05 ‘.’ 0.1 ‘ ’ 1
Also, do we really need random slopes? From the result below, we don’t seem to need random slopes at all, given that adding random slopes does not improve the model fit (χ²(2) = 0.38, p = 0.83). I always recommend testing this. Most of the time I keep random slopes.
anova(mdl.lmer.xreal, mdl.lmer.xreal.2)
Data: .
Models:
mdl.lmer.xreal: dur ~ logFreq + (1 | subjects) + (1 | items)
mdl.lmer.xreal.2: dur ~ logFreq + (logFreq | subjects) + (1 | items)
npar AIC
mdl.lmer.xreal 5 1105.8
mdl.lmer.xreal.2 7 1109.5
BIC logLik
mdl.lmer.xreal 1119.8 -547.92
mdl.lmer.xreal.2 1129.0 -547.73
deviance Chisq
mdl.lmer.xreal 1095.8
mdl.lmer.xreal.2 1095.5 0.3788
Df Pr(>Chisq)
mdl.lmer.xreal
mdl.lmer.xreal.2 2 0.8274
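The two model comparisons above are likelihood ratio tests, and the arithmetic behind them can be checked by hand. The sketch below uses the rounded log-likelihoods and parameter counts printed in the two tables; lrt is a hypothetical helper written for this illustration, not a function from lme4. Because the inputs are rounded, the results differ slightly from the printed values.

```r
# Log-likelihoods and parameter counts as printed by anova() above
ll_null  <- -556.68; npar_null  <- 6   # mdl.lmer.xreal.Null
ll_slope <- -547.73; npar_slope <- 7   # mdl.lmer.xreal.2
ll_int   <- -547.92; npar_int   <- 5   # mdl.lmer.xreal

# Hypothetical helper: likelihood ratio test for two nested models
lrt <- function(ll_small, ll_big, df) {
  chisq <- 2 * (ll_big - ll_small)
  c(Chisq = chisq, Df = df, p = pchisq(chisq, df = df, lower.tail = FALSE))
}

# Fixed effect of logFreq: null model vs. random-slope model
lrt_fixed <- lrt(ll_null, ll_slope, npar_slope - npar_null)
# Random slope: intercept-only model vs. random-slope model
lrt_slope <- lrt(ll_int, ll_slope, npar_slope - npar_int)

lrt_fixed
lrt_slope
```

The test statistic is twice the difference in log-likelihood, and the degrees of freedom are the difference in the number of parameters (1 for the fixed effect, 2 for the random slope and its correlation with the intercept).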
But if you are really (really!!!) obsessed by p values, then you can also use lmerTest. BUT use it only after comparing models to evaluate the contribution of predictors.
mdl.lmer.xreal.lmerTest <- xreal %>%
  lmer(dur ~ logFreq + (logFreq|subjects) + (1|items), data = ., REML = TRUE)
boundary (singular) fit: see ?isSingular
summary(mdl.lmer.xreal.lmerTest)
Linear mixed model fit by
REML. t-tests use
Satterthwaite's method [
lmerModLmerTest]
Formula:
dur ~ logFreq + (logFreq | subjects) + (1 | items)
Data: .
REML criterion at convergence:
1086.1
Scaled residuals:
Min 1Q Median
-2.09691 -0.60118 0.06418
3Q Max
0.58483 2.46245
Random effects:
Groups Name Variance
items (Intercept) 629.5679
subjects (Intercept) 1651.2357
logFreq 0.0342
Residual 282.8593
Std.Dev. Corr
25.0912
40.6354
0.1849 1.00
16.8184
Number of obs: 120, groups:
items, 20; subjects, 6
Fixed effects:
Estimate Std. Error
(Intercept) 337.973 18.526
logFreq -5.460 1.038
df t value
(Intercept) 7.396 18.24
logFreq 18.136 -5.26
Pr(>|t|)
(Intercept) 2.03e-07 ***
logFreq 5.18e-05 ***
---
Signif. codes:
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’
0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr)
logFreq -0.250
optimizer (nloptwrap) convergence code: 0 (OK)
boundary (singular) fit: see ?isSingular
detach("package:lmerTest", unload = TRUE)
Our final model uses REML (or Restricted Maximum Likelihood Estimate of Variance Component) to estimate the model.
mdl.lmer.xreal.Full <- xreal %>%
  lmer(dur ~ logFreq + (logFreq|subjects) + (1|items), data = ., REML = TRUE)
boundary (singular) fit: see ?isSingular
summary(mdl.lmer.xreal.Full)
Linear mixed model fit by REML [
lmerMod]
Formula:
dur ~ logFreq + (logFreq | subjects) + (1 | items)
Data: .
REML criterion at convergence:
1086.1
Scaled residuals:
Min 1Q Median
-2.09691 -0.60118 0.06418
3Q Max
0.58483 2.46245
Random effects:
Groups Name Variance
items (Intercept) 629.5679
subjects (Intercept) 1651.2357
logFreq 0.0342
Residual 282.8593
Std.Dev. Corr
25.0912
40.6354
0.1849 1.00
16.8184
Number of obs: 120, groups:
items, 20; subjects, 6
Fixed effects:
Estimate Std. Error
(Intercept) 337.973 18.526
logFreq -5.460 1.038
t value
(Intercept) 18.24
logFreq -5.26
Correlation of Fixed Effects:
(Intr)
logFreq -0.250
optimizer (nloptwrap) convergence code: 0 (OK)
boundary (singular) fit: see ?isSingular
anova(mdl.lmer.xreal.Full)
Analysis of Variance Table
npar Sum Sq Mean Sq
logFreq 1 7826.9 7826.9
F value
logFreq 27.671
hist(residuals(mdl.lmer.xreal.Full))
qqnorm(residuals(mdl.lmer.xreal.Full)); qqline(residuals(mdl.lmer.xreal.Full))
plot(fitted(mdl.lmer.xreal.Full), residuals(mdl.lmer.xreal.Full), cex = 4)
coef(mdl.lmer.xreal.Full)
$items
(Intercept) logFreq
Item1 352.3567 -5.460115
Item10 331.7618 -5.460115
Item11 324.7269 -5.460115
Item12 350.2318 -5.460115
Item13 353.1174 -5.460115
Item14 311.8355 -5.460115
Item15 354.0591 -5.460115
Item16 353.9389 -5.460115
Item17 288.7843 -5.460115
Item18 362.4702 -5.460115
Item19 338.1424 -5.460115
Item2 325.1855 -5.460115
Item20 359.7414 -5.460115
Item3 370.1804 -5.460115
Item4 302.4265 -5.460115
Item5 350.0499 -5.460115
Item6 338.9482 -5.460115
Item7 362.8402 -5.460115
Item8 295.5943 -5.460115
Item9 333.0693 -5.460115
$subjects
(Intercept) logFreq
S1 314.4694 -5.567073
S2 303.9037 -5.615155
S3 314.2920 -5.567881
S4 318.4282 -5.549058
S5 373.3006 -5.299350
S6 403.4443 -5.162175
attr(,"class")
[1] "coef.mer"
fixef(mdl.lmer.xreal.Full)
(Intercept) logFreq
337.973044 -5.460115
fixef(mdl.lmer.xreal.Full)[1]
(Intercept)
337.973
fixef(mdl.lmer.xreal.Full)[2]
logFreq
-5.460115
coef(mdl.lmer.xreal.Full)$`subjects`[1]
coef(mdl.lmer.xreal.Full)$`subjects`[2]
coef(mdl.lmer.xreal.Full)$`items`[1]
coef(mdl.lmer.xreal.Full)$`items`[2]
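The values returned by coef are the fixed effects plus each group's random deviation, so the deviations can be recovered by subtraction. The sketch below does this for the subjects, using the numbers printed above (typed in by hand for illustration; the data frame subj and its column names are made up for this example).

```r
# Fixed effects as printed by fixef() above
fixed_intercept <- 337.973044
fixed_slope     <- -5.460115

# By-subject coefficients as printed by coef(...)$subjects
subj <- data.frame(
  subject   = paste0("S", 1:6),
  intercept = c(314.4694, 303.9037, 314.2920, 318.4282, 373.3006, 403.4443),
  slope     = c(-5.567073, -5.615155, -5.567881, -5.549058, -5.299350, -5.162175)
)

# coef() = fixed effect + random deviation, so subtracting the
# fixed effects recovers each subject's deviation
subj$intercept_dev <- subj$intercept - fixed_intercept
subj$slope_dev     <- subj$slope - fixed_slope
subj

# Correlation between the two sets of deviations
cor(subj$intercept_dev, subj$slope_dev)
```

The deviations are almost perfectly correlated, which matches the Corr of 1.00 in the random effects table and is what the singular fit message is pointing at: the random slope adds no information beyond the random intercept.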
In general, I use the prediction from my final model in any plots. To generate this, we can use the following
xreal <- xreal %>%
  mutate(Pred_Dur = predict(mdl.lmer.xreal.Full))
xreal %>%
  ggplot(aes(x = logFreq, y = Pred_Dur)) +
geom_point() + theme_bw(base_size = 20) +
labs(y = "Duration", x = "Frequency (Log)", title = "Predicted") +
geom_smooth(method = lm, se = F) + coord_cartesian(ylim = c(200,450))
`geom_smooth()` using formula 'y ~ x'
## original plot
xreal %>%
  ggplot(aes(x = logFreq , y = dur)) +
geom_point() + theme_bw(base_size = 20)+
labs(y = "Duration", x = "Frequency (Log)", title = "Original")+
geom_smooth(method = lm, se = F) + coord_cartesian(ylim = c(200,450))
`geom_smooth()` using formula 'y ~ x'
The code above used Linear Mixed Effects Modelling, where the outcome was numeric. In some cases (as we have seen above), we may have a binary outcome, count data, an outcome with more than two categories, or rating data.
The code below gives you an idea of how to specify these models:
## Binomial family
## lme4::glmer(outcome~predictor(s)+(1|subject)+(1|items)..., data=data, family=binomial)
## Poisson family
## lme4::glmer(outcome~predictor(s)+(1|subject)+(1|items)..., data=data, family=poisson)
## Multinomial family
## a bit complicated as there is a need to use Bayesian approaches, see e.g.,
## glmmADMB
## mixcat
## MCMCglmm
## see https://gist.github.com/casallas/8263818
## Rating data, use following
## ordinal::clmm(outcome~predictor(s)+(1|subject)+(1|items)..., data=data)
## Remember to test for random effects and whether slopes are needed.
dfPharV2 <- read_csv("dfPharV2.csv")
Rows: 402 Columns: 24
-- Column specification ---------
Delimiter: ","
chr (1): context
dbl (23): CPP, Energy, H1A1c,...
i Use `spec()` to retrieve the full column specification for this data.
i Specify the column types or set `show_col_types = FALSE` to quiet this message.
dfPharV2
dfPharV2 <- dfPharV2 %>%
  mutate(context = factor(context, levels = c("Non-Guttural", "Guttural")))
We use the package FactoMineR to run our PCA. We use all acoustic measures as predictors and context as our qualitative (supplementary) variable.
pcaDat1 <- PCA(dfPharV2,
               quali.sup = 1, graph = TRUE,
               scale.unit = TRUE, ncp = 5)
Warning: ggrepel: 5 unlabeled data points (too many overlaps). Consider increasing max.overlaps
Based on the summary of results, we observe that the first 5 dimensions account for 88.3% of the variance in the data (the first 2 alone account for 63.2%). Each of the first 4 contributes more than 5% of the variance individually, and the fifth just under 5% (4.97%).
summary(pcaDat1)
Call:
PCA(X = dfPharV2, scale.unit = TRUE, ncp = 5, quali.sup = 1,
graph = TRUE)
Eigenvalues
Dim.1
Variance 8.660
% of var. 37.652
Cumulative % of var. 37.652
Dim.2
Variance 5.868
% of var. 25.512
Cumulative % of var. 63.164
Dim.3
Variance 2.625
% of var. 11.412
Cumulative % of var. 74.577
Dim.4
Variance 2.017
% of var. 8.769
Cumulative % of var. 83.346
Dim.5
Variance 1.143
% of var. 4.972
Cumulative % of var. 88.317
Dim.6
Variance 0.749
% of var. 3.259
Cumulative % of var. 91.576
Dim.7
Variance 0.549
% of var. 2.388
Cumulative % of var. 93.964
Dim.8
Variance 0.419
% of var. 1.820
Cumulative % of var. 95.784
Dim.9
Variance 0.284
% of var. 1.235
Cumulative % of var. 97.019
Dim.10
Variance 0.220
% of var. 0.957
Cumulative % of var. 97.976
Dim.11
Variance 0.141
% of var. 0.612
Cumulative % of var. 98.588
Dim.12
Variance 0.110
% of var. 0.479
Cumulative % of var. 99.067
Dim.13
Variance 0.087
% of var. 0.377
Cumulative % of var. 99.444
Dim.14
Variance 0.057
% of var. 0.246
Cumulative % of var. 99.690
Dim.15
Variance 0.032
% of var. 0.138
Cumulative % of var. 99.828
Dim.16
Variance 0.015
% of var. 0.066
Cumulative % of var. 99.894
Dim.17
Variance 0.012
% of var. 0.054
Cumulative % of var. 99.948
Dim.18
Variance 0.008
% of var. 0.036
Cumulative % of var. 99.984
Dim.19
Variance 0.002
% of var. 0.010
Cumulative % of var. 99.994
Dim.20
Variance 0.001
% of var. 0.006
Cumulative % of var. 100.000
Dim.21
Variance 0.000
% of var. 0.000
Cumulative % of var. 100.000
Dim.22
Variance 0.000
% of var. 0.000
Cumulative % of var. 100.000
Dim.23
Variance 0.000
% of var. 0.000
Cumulative % of var. 100.000
Individuals (the 10 first)
Dist Dim.1
1 | 4.230 | -3.646
2 | 5.604 | -4.717
3 | 6.376 | -5.102
4 | 5.258 | -4.034
5 | 3.640 | -3.132
6 | 5.729 | -4.601
7 | 6.871 | -4.756
8 | 5.728 | -3.279
9 | 6.642 | -5.052
10 | 6.294 | -3.773
ctr cos2
1 0.382 0.743 |
2 0.639 0.709 |
3 0.748 0.640 |
4 0.467 0.589 |
5 0.282 0.740 |
6 0.608 0.645 |
7 0.650 0.479 |
8 0.309 0.328 |
9 0.733 0.578 |
10 0.409 0.359 |
Dim.2 ctr
1 -0.047 0.000
2 0.420 0.007
3 0.843 0.030
4 1.004 0.043
5 0.687 0.020
6 0.274 0.003
7 -2.241 0.213
8 -1.258 0.067
9 -1.599 0.108
10 -3.037 0.391
cos2 Dim.3
1 0.000 | 1.467
2 0.006 | 1.767
3 0.017 | 1.056
4 0.036 | 1.171
5 0.036 | 1.400
6 0.002 | 0.740
7 0.106 | 2.535
8 0.048 | 2.704
9 0.058 | 1.378
10 0.233 | 1.921
ctr cos2
1 0.204 0.120 |
2 0.296 0.099 |
3 0.106 0.027 |
4 0.130 0.050 |
5 0.186 0.148 |
6 0.052 0.017 |
7 0.609 0.136 |
8 0.693 0.223 |
9 0.180 0.043 |
10 0.350 0.093 |
Variables (the 10 first)
Dim.1 ctr
CPP | 0.524 3.166
Energy | -0.571 3.768
H1A1c | 0.671 5.198
H1A2c | -0.114 0.150
H1A3c | -0.237 0.647
H1H2c | 0.866 8.653
H2H4c | -0.528 3.221
H2KH5Kc | -0.116 0.157
H42Kc | -0.380 1.668
HNR05 | 0.911 9.579
cos2 Dim.2
CPP 0.274 | 0.236
Energy 0.326 | -0.127
H1A1c 0.450 | 0.045
H1A2c 0.013 | -0.874
H1A3c 0.056 | 0.591
H1H2c 0.749 | -0.159
H2H4c 0.279 | 0.116
H2KH5Kc 0.014 | -0.709
H42Kc 0.144 | 0.809
HNR05 0.830 | 0.044
ctr cos2
CPP 0.945 0.055 |
Energy 0.274 0.016 |
H1A1c 0.034 0.002 |
H1A2c 13.012 0.764 |
H1A3c 5.946 0.349 |
H1H2c 0.432 0.025 |
H2H4c 0.229 0.013 |
H2KH5Kc 8.569 0.503 |
H42Kc 11.158 0.655 |
HNR05 0.033 0.002 |
Dim.3 ctr
CPP 0.312 3.699
Energy 0.559 11.905
H1A1c 0.610 14.177
H1A2c 0.149 0.844
H1A3c 0.000 0.000
H1H2c 0.080 0.246
H2H4c 0.127 0.612
H2KH5Kc -0.589 13.223
H42Kc 0.209 1.660
HNR05 0.297 3.359
cos2
CPP 0.097 |
Energy 0.312 |
H1A1c 0.372 |
H1A2c 0.022 |
H1A3c 0.000 |
H1H2c 0.006 |
H2H4c 0.016 |
H2KH5Kc 0.347 |
H42Kc 0.044 |
HNR05 0.088 |
Supplementary categories
Dist Dim.1
Non-Guttural | 0.713 | 0.116
Guttural | 0.880 | -0.143
cos2 v.test
Non-Guttural 0.026 0.875 |
Guttural 0.026 -0.875 |
Dim.2 cos2
Non-Guttural 0.561 0.618
Guttural -0.691 0.618
v.test Dim.3
Non-Guttural 5.147 | -0.207
Guttural -5.147 | 0.256
cos2 v.test
Non-Guttural 0.085 -2.846 |
Guttural 0.085 2.846 |
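The percentages in the eigenvalue table can be reproduced by hand. The sketch below uses the eigenvalues of the first five dimensions as printed above. It assumes, as is the case with scale.unit = TRUE, that every variable has a variance of 1, so the total variance equals the number of quantitative variables (23 here).

```r
# Eigenvalues of the first five dimensions, as printed by summary(pcaDat1)
eig <- c(Dim.1 = 8.660, Dim.2 = 5.868, Dim.3 = 2.625, Dim.4 = 2.017, Dim.5 = 1.143)

# With standardised variables, total variance = number of variables
n_vars <- 23

pct_var <- 100 * eig / n_vars   # percentage of variance per dimension
cum_var <- cumsum(pct_var)      # cumulative percentage of variance
round(rbind(pct_var, cum_var), 2)
```

Small differences in the last decimal place against the printed table come from the eigenvalues being rounded to three decimals.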
Below, we look at the contributions of the main 5 dimensions.
dimdesc(pcaDat1, axes = 1:5, proba = 0.05)
$Dim.1
$quanti
correlation
F0Bark 0.9471236
HNR25 0.9346079
soe 0.9284685
HNR35 0.9248084
HNR05 0.9108131
HNR15 0.8954107
H1H2c 0.8656253
H1A1c 0.6709032
CPP 0.5236515
H1A2c -0.1137865
H2KH5Kc -0.1164455
H1A3c -0.2366755
H42Kc -0.3800759
Z1mnZ0 -0.4268127
A1mnA2 -0.4963663
H2H4c -0.5281247
Energy -0.5712695
A1mnA3 -0.6262636
SHR -0.6393517
p.value
F0Bark 1.429483e-199
HNR25 1.131178e-181
soe 3.755961e-174
HNR35 5.569479e-170
HNR05 8.807770e-156
HNR15 1.220387e-142
H1H2c 3.099776e-122
H1A1c 6.724616e-54
CPP 1.098679e-29
H1A2c 2.250563e-02
H2KH5Kc 1.952233e-02
H1A3c 1.595378e-06
H42Kc 2.903273e-15
Z1mnZ0 3.150556e-19
A1mnA2 2.148508e-26
H2H4c 2.967330e-30
Energy 3.377633e-36
A1mnA3 3.576749e-45
SHR 1.394910e-47
attr(,"class")
[1] "condes" "list"
$Dim.2
$quanti
correlation
A2mnA3 0.9488972
Z2mnZ1 0.8611635
H42Kc 0.8091541
H1A3c 0.5906500
A1mnA3 0.4113725
CPP 0.2355070
HNR15 0.2195716
Z4mnZ3 0.1585383
H2H4c 0.1158268
Energy -0.1267753
Z1mnZ0 -0.1396388
SHR -0.1585845
H1H2c -0.1591314
HNR35 -0.1742803
H2KH5Kc -0.7090710
A1mnA2 -0.7561523
H1A2c -0.8738035
Z3mnZ2 -0.9727026
p.value
A2mnA3 1.862448e-202
Z2mnZ1 1.326843e-119
H42Kc 2.107251e-94
H1A3c 3.640784e-39
A1mnA3 7.545605e-18
CPP 1.801379e-06
HNR15 8.870306e-06
Z4mnZ3 1.427779e-03
H2H4c 2.018384e-02
Energy 1.095302e-02
Z1mnZ0 5.035009e-03
SHR 1.423139e-03
H1H2c 1.369225e-03
HNR35 4.475517e-04
H2KH5Kc 1.143697e-62
A1mnA2 1.141274e-75
H1A2c 2.589277e-127
Z3mnZ2 7.067910e-256
$quali
R2 p.value
context 0.06606739 1.736324e-07
$category
Estimate
context=Non-Guttural 0.6260556
context=Guttural -0.6260556
p.value
context=Non-Guttural 1.736324e-07
context=Guttural 1.736324e-07
attr(,"class")
[1] "condes" "list"
$Dim.3
$quanti
correlation
Z1mnZ0 0.8535706
H1A1c 0.6100189
Energy 0.5590167
CPP 0.3115902
HNR05 0.2969497
H42Kc 0.2087229
F0Bark 0.1702103
Z4mnZ3 0.1495364
H1A2c 0.1488647
H2H4c 0.1267249
A2mnA3 -0.1051933
HNR35 -0.2151263
HNR15 -0.2372089
A1mnA2 -0.2407206
HNR25 -0.2492222
Z2mnZ1 -0.3697651
A1mnA3 -0.4089538
H2KH5Kc -0.5891288
p.value
Z1mnZ0 2.491707e-115
H1A1c 2.445703e-42
Energy 2.020458e-34
CPP 1.684891e-10
HNR05 1.256987e-09
H42Kc 2.460412e-05
F0Bark 6.100220e-04
Z4mnZ3 2.649038e-03
H1A2c 2.770444e-03
H2H4c 1.098501e-02
A2mnA3 3.499609e-02
HNR35 1.355901e-05
HNR15 1.509029e-06
A1mnA2 1.042740e-06
HNR25 4.161621e-07
Z2mnZ1 1.802391e-14
A1mnA3 1.222863e-17
H2KH5Kc 6.329691e-39
$quali
R2 p.value
context 0.02019995 0.004300124
$category
Estimate
context=Guttural 0.2315316
context=Non-Guttural -0.2315316
p.value
context=Guttural 0.004300124
context=Non-Guttural 0.004300124
attr(,"class")
[1] "condes" "list"
$Dim.4
$quanti
correlation p.value
Z4mnZ3 0.8760843 8.580651e-129
H1A3c 0.7256396 6.212261e-67
A1mnA3 0.4631838 9.034769e-23
H1A2c 0.2978370 1.116376e-09
SHR 0.2447883 6.747376e-07
A2mnA3 0.1991310 5.808302e-05
A1mnA2 0.1779271 3.371233e-04
H1H2c 0.1679119 7.244124e-04
HNR35 0.1435091 3.934983e-03
HNR05 0.1197102 1.633473e-02
H1A1c 0.1181063 1.783878e-02
Z1mnZ0 0.1123235 2.430980e-02
Z3mnZ2 0.1060653 3.350479e-02
F0Bark 0.1056122 3.427281e-02
CPP -0.1843072 2.026259e-04
Energy -0.2139211 1.518911e-05
Z2mnZ1 -0.2771545 1.597946e-08
$quali
R2 p.value
context 0.056498 1.434538e-06
$category
Estimate
context=Non-Guttural 0.3394259
context=Guttural -0.3394259
p.value
context=Non-Guttural 1.434538e-06
context=Guttural 1.434538e-06
attr(,"class")
[1] "condes" "list"
$Dim.5
$quanti
correlation p.value
H2H4c 0.67917208 1.098695e-55
CPP 0.58268792 6.373444e-38
HNR15 0.24837172 4.569075e-07
HNR25 0.18018109 2.821810e-04
HNR05 0.16392456 9.710453e-04
HNR35 0.15076346 2.439664e-03
SHR 0.09962685 4.590553e-02
F0Bark -0.13227856 7.916814e-03
H42Kc -0.15107672 2.388688e-03
H1H2c -0.33610840 4.506374e-12
attr(,"class")
[1] "condes" "list"
$call
$call$num.var
[1] 1
$call$proba
[1] 0.05
$call$weights
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[15] 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[29] 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[43] 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[57] 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[71] 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[85] 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[99] 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[113] 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[127] 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[141] 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[155] 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[169] 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[183] 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[197] 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[211] 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[225] 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[239] 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[253] 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[267] 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[281] 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[295] 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[309] 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[323] 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[337] 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[351] 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[365] 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[379] 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[393] 1 1 1 1 1 1 1 1 1 1
$call$X
We look next at the contribution of the top 10 predictors on each of the 5 dimensions
fviz_contrib(pcaDat1, choice = "var", axes = 1, top = 10)
fviz_contrib(pcaDat1, choice = "var", axes = 2, top = 10)
fviz_contrib(pcaDat1, choice = "var", axes = 3, top = 10)
fviz_contrib(pcaDat1, choice = "var", axes = 4, top = 10)
fviz_contrib(pcaDat1, choice = "var", axes = 5, top = 10)
fviz_pca_ind(pcaDat1, col.ind = "cos2",
gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"),
repel = TRUE # Avoid text overlapping (slow if many points)
)
Warning: ggrepel: 397 unlabeled data points (too many overlaps). Consider increasing max.overlaps
fviz_pca_biplot(pcaDat1, repel = TRUE, habillage = dfPharV2$context, addEllipses = TRUE, title = "MSA - Biplot")
Warning: ggrepel: 401 unlabeled data points (too many overlaps). Consider increasing max.overlaps
Warning: ggrepel: 6 unlabeled data points (too many overlaps). Consider increasing max.overlaps
fviz_pca_biplot(pcaDat1, axes = c(3, 4), repel = TRUE, habillage = dfPharV2$context, addEllipses = TRUE, title = "MCA - Biplot")
Warning: ggrepel: 397 unlabeled data points (too many overlaps). Consider increasing max.overlaps
Warning: ggrepel: 17 unlabeled data points (too many overlaps). Consider increasing max.overlaps
fviz_pca_ind(pcaDat1,
label = "none", # hide individual labels
habillage = dfPharV2$context, # color by groups
addEllipses = TRUE # Concentration ellipses
)
coord <- pcaDat1$quali.sup$coord[1:2,0]
coord
Non-Guttural
Guttural
#
with(pcaDat1, {
  s3d <- scatterplot3d(pcaDat1$quali.sup$coord[,1], pcaDat1$quali.sup$coord[,2], pcaDat1$quali.sup$coord[,3], # x y and z axis
                       color=c("blue", "red"), pch=19, # filled blue and red circles
                       type="h", # vertical lines to the x-y plane
                       main="PCA 3-D Scatterplot",
                       xlab="Dim1(37.7%)",
                       ylab="",
                       zlab="Dim3(11.4%)"
                       #xlim = c(-1.5, 1.5), ylim = c(-1.5, 1.5), zlim = c(-0.8, 0.8)
  )
  s3d.coords <- s3d$xyz.convert(pcaDat1$quali.sup$coord[,1], pcaDat1$quali.sup$coord[,2], pcaDat1$quali.sup$coord[,3]) # convert 3D coords to 2D projection
  text(s3d.coords$x, s3d.coords$y, # x and y coordinates
       labels=row.names(coord), col = c("blue", "red"), # text to plot
       cex=1, pos=4) # default text size, placed to the right of the points
})
# add the label of the y axis (Dim2) by hand, rotated to follow the axis
dims <- par("usr")
x <- dims[1]+ 0.8*diff(dims[1:2])
y <- dims[3]+ 0.08*diff(dims[3:4])
text(x, y, "Dim2(25.5%)", srt = 25,col="black")
Decision trees are a statistical tool that uses combinations of predictors to identify patterns in the data and provides a classification accuracy for the model.
The decision tree used is based on conditional inference trees, which look at each predictor and split the data into multiple nodes (branches) through recursive partitioning in a tree-structured regression model. Each node is also split into leaves (difference between levels of outcome).
Decision trees via ctree do the following:
Let’s see this in an example using the same dataset. To understand what the decision tree is doing, we will dissect it, by creating one tree with one predictor and move to the next.
We run a GLM with context as our outcome, and Z2-Z1 as our predictor. We want to evaluate whether the two classes can be separated when using the acoustic metric Z2-Z1. Context has two levels, and this will be considered as a binomial distribution.
mdl.glm.Z2mnZ1 <- dfPharV2 %>%
  glm(context ~ Z2mnZ1, data = ., family = binomial)
summary(mdl.glm.Z2mnZ1)
Call:
glm(formula = context ~ Z2mnZ1, family = binomial, data = .)
Deviance Residuals:
Min 1Q Median
-1.2879 -1.1358 -0.8703
3Q Max
1.1538 1.4998
Coefficients:
Estimate Std. Error
(Intercept) 0.50112 0.23036
Z2mnZ1 -0.12281 0.03621
z value Pr(>|z|)
(Intercept) 2.175 0.029605 *
Z2mnZ1 -3.391 0.000696 ***
---
Signif. codes:
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’
0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 552.89 on 401 degrees of freedom
Residual deviance: 541.01 on 400 degrees of freedom
AIC: 545.01
Number of Fisher Scoring iterations: 4
tidy(mdl.glm.Z2mnZ1) %>%
select(term, estimate) %>%
mutate(estimate = round(estimate, 3))
# to only get the coefficients
mycoef2 <- tidy(mdl.glm.Z2mnZ1) %>% pull(estimate)
The result above shows that the intercept (0.501) is the log odds of the guttural class when Z2-Z1 is 0 Bark, and that each unit (1 Bark) increase in Z2-Z1 yields a statistically significant decrease of 0.123 in the log odds of the guttural class. We can convert these log odds to probabilities using plogis.
# probability of guttural when Z2-Z1 = 0
plogis(mycoef2[1])
[1] 0.622722
# probability of guttural when Z2-Z1 = 1
plogis(mycoef2[1] + mycoef2[2])
[1] 0.5934644
This shows that the predicted probability of the guttural class drops from 62% to 59% when Z2-Z1 increases from 0 to 1 Bark. These are predicted probabilities, not classification accuracies. Let’s continue with this model further.
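To make the use of plogis more concrete, the sketch below converts log odds to predicted probabilities for a few values of Z2-Z1, using the rounded coefficients from the model summary above. It assumes that the modelled class is the guttural one, which follows from Guttural being the second level of context. The values in z are arbitrary and chosen only for illustration.

```r
# Coefficients as printed by summary(mdl.glm.Z2mnZ1)
b0 <- 0.50112    # intercept: log odds of guttural at Z2-Z1 = 0
b1 <- -0.12281   # change in log odds per 1 Bark increase in Z2-Z1

# Predicted probability of the guttural class for a few Z2-Z1 values
z <- c(0, 1, 4, 7, 10)
data.frame(Z2mnZ1 = z, p_guttural = round(plogis(b0 + b1 * z), 3))

# Value of Z2-Z1 at which the predicted probability is exactly 0.5
boundary <- -b0 / b1
boundary
```

The boundary (about 4.08 Bark) is the point where a 0.5 threshold switches the predicted class: lower values of Z2-Z1 are predicted as guttural and higher values as non-guttural.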
As above, we obtain predictions from the model. Because we are using a numeric predictor, we need to assign a threshold for the predict function. The threshold can be thought of as telling the predict function to assign any predictions lower than 50% to one group, and any higher to another.
pred.glm.Z2mnZ1 <- predict(mdl.glm.Z2mnZ1, type = "response")>0.5
tbl.glm.Z2mnZ1 <- table(pred.glm.Z2mnZ1, dfPharV2$context)
rownames(tbl.glm.Z2mnZ1) <- c("Non-Guttural", "Guttural")
tbl.glm.Z2mnZ1
pred.glm.Z2mnZ1 Non-Guttural Guttural
   Non-Guttural          167       75
   Guttural               55      105
# from PresenceAbsence
PresenceAbsence::pcc(tbl.glm.Z2mnZ1)
PresenceAbsence::specificity(tbl.glm.Z2mnZ1)
PresenceAbsence::sensitivity(tbl.glm.Z2mnZ1)
roc.glm.Z2mnZ1 <- pROC::roc(dfPharV2$context, as.numeric(pred.glm.Z2mnZ1))
Setting levels: control = Non-Guttural, case = Guttural
Setting direction: controls < cases
roc.glm.Z2mnZ1
Call:
roc.default(response = dfPharV2$context, predictor = as.numeric(pred.glm.Z2mnZ1))
Data: as.numeric(pred.glm.Z2mnZ1) in 222 controls (dfPharV2$context Non-Guttural) < 180 cases (dfPharV2$context Guttural).
Area under the curve: 0.6678
pROC::plot.roc(roc.glm.Z2mnZ1, legacy.axes = TRUE)
The model above was able to explain the difference between the two classes with an accuracy of 67.7%. It has a slightly low specificity (0.58) to detect gutturals, but a slightly high sensitivity (0.75) to reject the non-gutturals. Looking at the confusion matrix, we observe that both groups were relatively accurately identified, but we have relatively large errors (or confusions): 75 of the 180 gutturals and 55 of the 222 non-gutturals were misclassified. The AUC is at 0.67, which is not too high.
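The accuracy, sensitivity, specificity and AUC reported for this model can all be recomputed from the four cells of the confusion matrix. The sketch below does this in base R with the counts from the table above; the object names are made up for this illustration.

```r
# Confusion matrix from above: rows = predicted, columns = observed
cm <- matrix(c(167, 55, 75, 105), nrow = 2,
             dimnames = list(predicted = c("Non-Guttural", "Guttural"),
                             observed  = c("Non-Guttural", "Guttural")))

# Overall proportion of correct classifications
accuracy <- sum(diag(cm)) / sum(cm)

# Proportion of each observed class that was predicted correctly
prop_non_guttural <- cm["Non-Guttural", "Non-Guttural"] / sum(cm[, "Non-Guttural"])
prop_guttural     <- cm["Guttural", "Guttural"] / sum(cm[, "Guttural"])

# With a single 0.5 cut-off the ROC curve has one interior point,
# so its area is the mean of the two proportions
auc <- (prop_non_guttural + prop_guttural) / 2

round(c(accuracy = accuracy, non_guttural = prop_non_guttural,
        guttural = prop_guttural, auc = auc), 4)
```

The proportion for the non-gutturals corresponds to the sensitivity reported in the text (0.75) and the proportion for the gutturals to the specificity (0.58).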
Let’s continue with GLM to evaluate it further. We start by running a correlation test to evaluate issues with GLM.
## from the package party
set.seed(123456)
tree1 <- dfPharV2 %>%
  ctree(
    context ~ Z2mnZ1,
    data = .)
print(tree1)
Conditional inference tree with 4 terminal nodes
Response: context
Input: Z2mnZ1
Number of observations: 402
1) Z2mnZ1 <= 9.551456; criterion = 0.999, statistic = 11.678
2) Z2mnZ1 <= 6.779068; criterion = 1, statistic = 12.368
3) Z2mnZ1 <= 4.004879; criterion = 1, statistic = 56.773
4)* weights = 157
3) Z2mnZ1 > 4.004879
5)* weights = 106
2) Z2mnZ1 > 6.779068
6)* weights = 64
1) Z2mnZ1 > 9.551456
7)* weights = 75
plot(tree1, main = "Conditional Inference Tree ")
How to interpret this figure? Let’s look at mean values and a plot for this variable. This is the difference between F2 and F1 using the Bark scale. Because gutturals are produced within the pharynx (regardless of where), the prediction is that a high F1 and a low F2 will be the acoustic correlates related to this constriction location. The closeness between these formants yields a lower Z2-Z1. Hence, the prediction is as follows: the smaller the difference, the more pharyngeal-like constriction these consonants have (all else being equal!). Let’s compute the mean/median and plot the difference between the two contexts.
dfPharV2 %>%
  group_by(context) %>%
  summarise(mean = mean(Z2mnZ1),
            median = median(Z2mnZ1),
            count = n())
dfPharV2 %>%
  ggplot(aes(x = context, y = Z2mnZ1)) +
  geom_boxplot()
The table above reports the mean and median of Z2-Z1 for both levels of context and the plots show the difference between the two. We have a total of 180 cases in the guttural, and 222 in the non-guttural. When considering the conditional inference tree output, various splits were obtained. The first is any value higher than 9.55 being assigned to the non-guttural class (around 98% of 75 cases). Then, with anything lower than or equal to 9.55, a second split was obtained with a threshold of 6.78: higher values were assigned to guttural (around 98% of 64 cases), and lower values were split again with a threshold of 4 Bark. In this third split, values lower than or equal to 4 Bark were assigned to the guttural (around 70% of 157 cases) and values higher than 4 Bark to the non-guttural (around 90% of 106 cases).
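The splits described above amount to a set of cut-points on Z2-Z1. The sketch below writes them out as a small function, using the thresholds printed by the tree and the class labels given in the description. The function classify_z2z1 is a hypothetical helper for illustration only; it is not part of party and it ignores the class proportions within each node.

```r
# Hypothetical helper: assign a class from the Z2-Z1 thresholds of tree1,
# with labels following the majority classes described in the text
classify_z2z1 <- function(z) {
  cut(z,
      breaks = c(-Inf, 4.004879, 6.779068, 9.551456, Inf),
      labels = c("Guttural (node 4)", "Non-Guttural (node 5)",
                 "Guttural (node 6)", "Non-Guttural (node 7)"),
      right = TRUE)   # intervals are (lower, upper], matching the "<=" splits
}

# Arbitrary example values, one per terminal node
classify_z2z1(c(3.2, 5.0, 8.1, 11.4))
```

Writing the tree this way makes it clear that a single predictor is used three times, which is how the tree captures a pattern that is not a simple "higher versus lower" difference.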
Dissecting the tree like this allows interpretation of the output. In this example, this is quite a complex case and ctree allowed us to fine-tune the different patterns seen with Z2-Z1. Now let’s look at the full dataset to make sense of how the combination of predictors contributes to the difference.
set.seed(123456)
fit <- dfPharV2 %>%
  ctree(
    context ~ .,
    data = .)
print(fit)
Conditional inference tree with 8 terminal nodes
Response: context
Inputs: CPP, Energy, H1A1c, H1A2c, H1A3c, H1H2c, H2H4c, H2KH5Kc, H42Kc, HNR05, HNR15, HNR25, HNR35, SHR, soe, Z1mnZ0, Z2mnZ1, Z3mnZ2, Z4mnZ3, F0Bark, A1mnA2, A1mnA3, A2mnA3
Number of observations: 402
1) A2mnA3 <= -13.78; criterion = 1, statistic = 42.329
2) Z4mnZ3 <= 1.592125; criterion = 1, statistic = 40.991
3) H2H4c <= -8.396333; criterion = 0.993, statistic = 13.141
4)* weights = 8
3) H2H4c > -8.396333
5)* weights = 100
2) Z4mnZ3 > 1.592125
6) Energy <= 2.8295; criterion = 0.999, statistic = 16.923
7)* weights = 25
6) Energy > 2.8295
8)* weights = 10
1) A2mnA3 > -13.78
9) H1H2c <= 10.27167; criterion = 0.953, statistic = 9.458
10) SHR <= 0.1566667; criterion = 1, statistic = 18.337
11)* weights = 99
10) SHR > 0.1566667
12) H1H2c <= 0.7411667; criterion = 0.972, statistic = 10.449
13)* weights = 103
12) H1H2c > 0.7411667
14)* weights = 30
9) H1H2c > 10.27167
15)* weights = 27
plot(fit, main = "Conditional Inference Tree")
How to interpret this complex decision tree?
Let’s obtain the mean value for each predictor grouped by context. Discuss some of the patterns.
dfPharV2 %>%
  group_by(context) %>%
summarize_all(list(mean = mean))
We started with context as our outcome, and all 23 acoustic measures as predictors. A total of 8 terminal nodes were identified with multiple binary splits in their leaves, allowing separation of the two categories. Looking specifically at the output, we observe a few things.
The first node was based on A2*-A3*, detecting a difference between non-gutturals and gutturals. For the first binary split, a threshold of -13.78 Bark was used (mean non-guttural = -7.86; mean guttural = -14.58). Then, for values lower than or equal to this threshold, a second split was performed using Z4-Z3 (mean non-guttural = 1.67; mean guttural = 1.43) with a threshold of 1.59; for values lower than or equal to that, another binary split used H2*-H4*, etc…
Once done, the ctree provides multiple binary splits into guttural or non-guttural.
Any possible issues/interesting patterns you can identify? Look at the interactions between predictors.
Let’s obtain some predictions from the model and evaluate how successful it is in dealing with the data.
set.seed(123456)
pred.ctree <- predict(fit)
tbl.ctree <- table(pred.ctree, dfPharV2$context)
tbl.ctree
pred.ctree     Non-Guttural Guttural
  Non-Guttural          194       41
  Guttural               28      139
PresenceAbsence::pcc(tbl.ctree)
PresenceAbsence::specificity(tbl.ctree)
PresenceAbsence::sensitivity(tbl.ctree)
roc.ctree <- pROC::roc(dfPharV2$context, as.numeric(pred.ctree))
Setting levels: control = Non-Guttural, case = Guttural
Setting direction: controls < cases
roc.ctree
Call:
roc.default(response = dfPharV2$context, predictor = as.numeric(pred.ctree))
Data: as.numeric(pred.ctree) in 222 controls (dfPharV2$context Non-Guttural) < 180 cases (dfPharV2$context Guttural).
Area under the curve: 0.823
::plot.roc(roc.ctree, legacy.axes = TRUE) pROC
This full model has a classification accuracy of 82.8%, which is not bad. It has a moderate specificity of 0.77 (at detecting the gutturals) but a high sensitivity of 0.87 (at detecting the non-gutturals). The ROC curve shows the relationship between the two, with an AUC of 0.823.
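The figures quoted above can be recomputed by hand from the confusion table. The sketch below uses base R only and assumes, as the reported values imply, that Non-Guttural is treated as the "presence" class (sensitivity) and Guttural as the "absence" class (specificity). The object names are illustrative and not part of the original analysis.

```r
# Confusion table reported above: rows = predicted, columns = observed
tbl <- matrix(c(194, 28, 41, 139), nrow = 2,
              dimnames = list(predicted = c("Non-Guttural", "Guttural"),
                              observed  = c("Non-Guttural", "Guttural")))

accuracy    <- sum(diag(tbl)) / sum(tbl)   # correct predictions / all predictions
sensitivity <- tbl["Non-Guttural", "Non-Guttural"] / sum(tbl[, "Non-Guttural"])
specificity <- tbl["Guttural", "Guttural"] / sum(tbl[, "Guttural"])

# With hard class labels the ROC curve has a single operating point,
# so the AUC is the mean of sensitivity and specificity
auc_hard <- (sensitivity + specificity) / 2

round(c(accuracy = accuracy, sensitivity = sensitivity,
        specificity = specificity, auc = auc_hard), 3)
```

Because the predictions are class labels rather than probabilities, the AUC here is simply the balanced accuracy; feeding predicted probabilities to the ROC function would usually give a different (and more informative) curve.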
One important issue is that the trees we grew above are biased. They are based on the full dataset, which means they are very likely to overfit the data. We did not add any random selection and we only grew one tree each time. If you think about it, is it possible that we obtained such results simply by chance?
What if we add some randomness in the process of creating a conditional inference tree?
We change a small option in ctree to allow for random selection of variables, to mimic what Random Forests will do. We use controls to specify mtry = 5, which is the rounded square root of the number of predictors.
set.seed(123456)
fit1 <- dfPharV2 %>%
  ctree(
    context ~ .,
    data = .,
    controls = ctree_control(mtry = 5))
plot(fit1, main = "Conditional Inference Tree")
pred.ctree1 <- predict(fit1)
tbl.ctree1 <- table(pred.ctree1, dfPharV2$context)
tbl.ctree1
pred.ctree1    Non-Guttural Guttural
  Non-Guttural          214       82
  Guttural                8       98
PresenceAbsence::pcc(tbl.ctree1)
PresenceAbsence::specificity(tbl.ctree1)
PresenceAbsence::sensitivity(tbl.ctree1)
roc.ctree1 <- pROC::roc(dfPharV2$context, as.numeric(pred.ctree1))
Setting levels: control = Non-Guttural, case = Guttural
Setting direction: controls < cases
roc.ctree1
Call:
roc.default(response = dfPharV2$context, predictor = as.numeric(pred.ctree1))
Data: as.numeric(pred.ctree1) in 222 controls (dfPharV2$context Non-Guttural) < 180 cases (dfPharV2$context Guttural).
Area under the curve: 0.7542
pROC::plot.roc(roc.ctree1, legacy.axes = TRUE)
Can you compare results between you and discuss what is going on?
When adding one random selection process to our ctree, we allow it to obtain more robust predictions. We could even go further and grow multiple small trees with a portion of the datapoints (e.g., 100 rows, 200 rows). When doing these multiple random selections, you are growing multiple trees that are decorrelated from each other. These become independent trees, and one can combine the results of these trees to come up with clear predictions.
This is how Random Forests work. You would start from a dataset, then grow multiple trees, vary number of observations used (nrow), and number of predictors used (mtry), adjust branches, and depth of nodes and at the end, combine the results in a forest. You can also run permutation tests to evaluate contributions of each predictor to the outcome. This is the beauty of Random Forests. They do all of these steps automatically at once for you!
As their name indicates, a Random Forest is a forest of trees implemented through bagging ensemble algorithms. Each tree has multiple branches (nodes), and will provide predictions based on recursive partitioning of the data. Then, using the predictions from the multiple grown trees, Random Forests will create averaged predictions and come up with prediction accuracy, etc.
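The steps just described (subsample the rows, subsample the predictors, grow a tree, combine the votes) can be sketched in a few lines of base R. This is a toy illustration on simulated data, not what party or ranger do internally: the "trees" are single-split stumps, and the data, the helper functions fit_stump and predict_stump, and all settings are hypothetical.

```r
set.seed(1)
# Simulated data (hypothetical): 200 rows, 6 numeric predictors, 2 classes
n <- 200; p <- 6
X <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("x", 1:p)))
y <- ifelse(X[, 1] + 0.5 * X[, 2] + rnorm(n, sd = 0.5) > 0, "A", "B")

# One-split "tree" (a stump): best threshold on the best of the offered predictors
fit_stump <- function(X, y, vars) {
  best <- list(acc = -1)
  for (v in vars) {
    for (cut in quantile(X[, v], probs = seq(0.1, 0.9, 0.1))) {
      pred <- ifelse(X[, v] > cut, "A", "B")
      acc  <- mean(pred == y)
      flip <- acc < 0.5                 # allow the reverse labelling
      if (flip) acc <- 1 - acc
      if (acc > best$acc) best <- list(var = v, cut = cut, flip = flip, acc = acc)
    }
  }
  best
}
predict_stump <- function(s, X) {
  pred <- ifelse(X[, s$var] > s$cut, "A", "B")
  if (s$flip) ifelse(pred == "A", "B", "A") else pred
}

ntree <- 100; mtry <- round(sqrt(p)); fraction <- 0.632
votes_A <- numeric(n); n_oob <- numeric(n)
for (b in seq_len(ntree)) {
  inbag <- sample(n, floor(fraction * n))   # rows: subsample without replacement
  vars  <- sample(p, mtry)                  # predictors: random subset
  s     <- fit_stump(X[inbag, , drop = FALSE], y[inbag], vars)
  oob   <- setdiff(seq_len(n), inbag)       # rows this tree never saw
  votes_A[oob] <- votes_A[oob] + (predict_stump(s, X[oob, , drop = FALSE]) == "A")
  n_oob[oob]   <- n_oob[oob] + 1
}
forest_pred  <- ifelse(votes_A / n_oob > 0.5, "A", "B")  # majority vote over trees
oob_accuracy <- mean(forest_pred == y)
oob_accuracy
```

Each stump is weak on its own, and many of them never see the informative predictors at all, yet the majority vote over trees that did not see a given row is typically clearly better than chance. Real implementations grow full trees and draw a new predictor subset at every split.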
There are multiple packages that one can use to grow Random Forests:
randomForest: the original implementation of Random Forests.
party and partykit: use conditional inference trees as base learners.
ranger: a reimplementation of Random Forests; faster and more flexible than the original implementation.
The first implementation of Random Forests is widely used in research. One of the issues in this first implementation is that it favoured specific types of predictors (e.g., categorical predictors, predictors with multiple cut-offs, etc.). Random Forests grown via conditional inference trees, as implemented in party, guard against this bias, but they are computationally demanding. Random Forests grown via permutation tests, as implemented in ranger, speed up the computations and can mimic the unbiased selection process.
We start by declaring parallel computing on your device. This is essential to run these complex computations. The code below detects the number of cores available on your machine (8 in the output shown) and registers them for parallel computing. The models here are not too complex, but if you try to increase the complexity of your computations, you will need parallel computing.
set.seed(123456)
#Declare parallel computing
ncores <- availableCores()
cat(paste0("Number of cores available for model calculations set to ", ncores, "."))
Number of cores available for model calculations set to 8.
registerDoFuture()
makeClusterPSOCK(ncores)
Socket cluster with 8 nodes where 8 nodes are on host ‘localhost’ (R version 4.1.2 (2021-11-01), platform x86_64-w64-mingw32)
plan(multisession)
ncores
system
     8
# below we register our random number generator. This will mostly be used within the tidymodels below. This allows replication of the results
# below to suppress any warnings from doFuture
options(doFuture.rng.onMisuse = "ignore")
Random Forests grown via conditional inference trees are different from the original implementation. They offer an unbiased selection process that guards against overfitting of the data. There are various points we need to consider in growing the forest, including the number of trees and predictors to use each time. Let us run our first Random Forest via conditional inference trees. To make sure the code runs as fast as it can, we use a very low number of trees: only 100. It is well known that the more trees you grow, the more confidence you have in the results, as model estimation will be more stable. In this example, I would easily go with 500 trees.
To grow the forest, we use the function cforest. We use all of the dataset for the moment. We need to specify a few options within controls:
ntree = 100: number of trees to grow. Default = 500.
mtry = round(sqrt(23)): number of predictors to use each time. Default is 5, but specifying it is advised to account for the structure of the data.
By default, cforest_unbiased has two additional important options that are used for an unbiased selection process. WARNING: you should not change these unless you know what you are doing. Also, by default, the data are split into a training and a testing set: training uses roughly 2/3 of the data; testing uses the remaining 1/3.
replace = FALSE: use subsampling with or without replacement. Default is FALSE, i.e., use subsets of the data without replacing these.
fraction = 0.632: use 63.2% of the data in each split.
set.seed(123456)
mdl.cforest <- dfPharV2 %>%
  cforest(context ~ ., data = .,
          controls = cforest_unbiased(ntree = 100,
                                      mtry = round(sqrt(23))))
To obtain predictions from the model, we use the predict function and add OOB = TRUE. This uses the out-of-bag sample (i.e., roughly 1/3 of the data).
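To see where the "1/3" comes from, here is a minimal base R sketch of subsampling without replacement at fraction = 0.632 for 100 trees on 402 rows (the sizes reported in this section). How party rounds the subsample size internally is an assumption here; floor() is used for illustration.

```r
set.seed(1)
n        <- 402     # number of rows in the dataset, as in the output above
ntree    <- 100     # number of trees, as in the cforest call
fraction <- 0.632

# For each tree, draw the in-bag rows without replacement; the rest are out-of-bag
oob_count <- integer(n)
for (b in seq_len(ntree)) {
  inbag <- sample(n, floor(fraction * n))
  oob_count[-inbag] <- oob_count[-inbag] + 1L
}

1 - floor(fraction * n) / n   # share of rows that are out-of-bag for any single tree
range(oob_count)              # how often each row was out-of-bag across the forest
```

Every row ends up out-of-bag for a sizeable subset of the trees, so an OOB prediction for a row is the combined vote of only those trees that never saw it during training.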
set.seed(123456)
pred.cforest <- predict(mdl.cforest, OOB = TRUE)
tbl.cforest <- table(pred.cforest, dfPharV2$context)
tbl.cforest
pred.cforest   Non-Guttural Guttural
  Non-Guttural          203       40
  Guttural               19      140
PresenceAbsence::pcc(tbl.cforest)
PresenceAbsence::specificity(tbl.cforest)
PresenceAbsence::sensitivity(tbl.cforest)
roc.cforest <- pROC::roc(dfPharV2$context, as.numeric(pred.cforest))
Setting levels: control = Non-Guttural, case = Guttural
Setting direction: controls < cases
roc.cforest
Call:
roc.default(response = dfPharV2$context, predictor = as.numeric(pred.cforest))
Data: as.numeric(pred.cforest) in 222 controls (dfPharV2$context Non-Guttural) < 180 cases (dfPharV2$context Guttural).
Area under the curve: 0.8461
pROC::plot.roc(roc.cforest, legacy.axes = TRUE)
Compared with the 82.8% classification accuracy we obtained using ctree on our full dataset above (model 1), here we obtain 85.3% (343 of 402), an increase of 2.5 percentage points. Compared with the 77.6% (312 of 402) from model 2, the ctree with random selection of predictors, we have an increase of 7.7 percentage points in classification accuracy.
We can test whether there is a statistically significant difference between our ctree and cforest models. Using the ROC curves, roc.test conducts a non-parametric Z test of significance (DeLong's test) on the correlated ROC curves. The results show that the improvement of the cforest model over the first ctree model is not statistically significant (Z = -1.01, p = 0.31), whereas the improvement over the ctree with random selection of predictors is (Z = -3.71, p = 0.0002). An improvement is expected because we are growing 100 different trees, with random selection of both predictors and samples, and provide an averaged prediction.
pROC::roc.test(roc.ctree, roc.cforest)
DeLong's test for two correlated ROC curves
data: roc.ctree and roc.cforest
Z = -1.0148, p-value = 0.3102
alternative hypothesis: true difference in AUC is not equal to 0
95 percent confidence interval:
-0.06756458 0.02146848
sample estimates:
AUC of roc1 AUC of roc2
0.8230480 0.8460961
pROC::roc.test(roc.ctree1, roc.cforest)
DeLong's test for two correlated ROC curves
data: roc.ctree1 and roc.cforest
Z = -3.7128, p-value = 0.000205
alternative hypothesis: true difference in AUC is not equal to 0
95 percent confidence interval:
-0.14040087 -0.04338292
sample estimates:
AUC of roc1 AUC of roc2
0.7542042 0.8460961
One important feature in ctree was to show which predictor was used first in splitting the data, followed by the other predictors. We use a similar functionality with cforest to obtain variable importance scores to pinpoint strong and weak predictors.
There are two ways to obtain this:
non-conditional permutation tests (conditional = FALSE)
conditional permutation tests (conditional = TRUE)
The former is generally comparable across packages and provides a normal permutation test; the latter runs a permutation test on a grid defined by the correlation matrix and corrects for possible collinearity. This is similar to a regression analysis, but looks at both main effects and interactions.
You could use the normal varimp as implemented in party, which uses mean decrease in accuracy scores. We will instead obtain variable importance scores via AUC-based permutation tests, as these use both accuracy and errors in the model, using varImpAUC from the varImp package.
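The logic of a (non-conditional) permutation test of variable importance can be sketched in base R: shuffle one predictor, which breaks its link with the outcome, and record how much the prediction quality drops. The example below is a simplified illustration on simulated data; it uses a logistic regression as a stand-in for the forest and mean decrease in accuracy rather than AUC, and all object names are hypothetical.

```r
set.seed(1)
# Simulated data (hypothetical): x1 matters a lot, x2 a little, x3 not at all
n <- 500
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- factor(ifelse(2 * d$x1 + 0.5 * d$x2 + rnorm(n) > 0, "A", "B"))

train <- d[1:350, ]; test <- d[351:500, ]
fit <- glm(y ~ ., data = train, family = binomial)   # stand-in for the forest

accuracy <- function(model, data) {
  # glm models the probability of the second factor level ("B")
  pred <- ifelse(predict(model, newdata = data, type = "response") > 0.5, "B", "A")
  mean(pred == data$y)
}

baseline <- accuracy(fit, test)
importance <- sapply(c("x1", "x2", "x3"), function(v) {
  drops <- replicate(50, {
    shuffled <- test
    shuffled[[v]] <- sample(shuffled[[v]])   # break the link between v and y
    baseline - accuracy(fit, shuffled)
  })
  mean(drops)                                # mean decrease in accuracy
})
sort(importance, decreasing = TRUE)
```

A predictor that carries no information ends up with a score near zero (possibly slightly negative), which is the same pattern one looks for at the bottom of a variable importance plot.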
DANGER ZONE: the conditional permutation test requires a lot of RAM. Unless you have access to a cluster and/or a lot of RAM, do not attempt to run it. We will run the non-conditional version here for demonstration.
set.seed(123456)
VarImp.cforest <- varImp::varImpAUC(mdl.cforest, conditional = FALSE)
Warning: closing unused connection 11 (<-DESKTOP-A9ARQR4:11846)
Warning: closing unused connection 10 (<-DESKTOP-A9ARQR4:11846)
Warning: closing unused connection 9 (<-DESKTOP-A9ARQR4:11846)
Warning: closing unused connection 8 (<-DESKTOP-A9ARQR4:11846)
Warning: closing unused connection 7 (<-DESKTOP-A9ARQR4:11846)
Warning: closing unused connection 6 (<-DESKTOP-A9ARQR4:11846)
Warning: closing unused connection 5 (<-DESKTOP-A9ARQR4:11846)
Warning: closing unused connection 4 (<-DESKTOP-A9ARQR4:11846)
lattice::barchart(sort(VarImp.cforest))
The variable importance scores via non-conditional permutation tests show that A2*-A3* (i.e., energy in mid-high frequencies around F2 and F3) is the most important variable at explaining the difference between gutturals and non-gutturals, followed by Z4-Z3 (pharyngeal constriction), H1*-A3* (energy in mid-high frequency component), Z2-Z1 (degree of compactness), Z3-Z2 (spectral divergence), H1*-A2* (energy in mid frequency component) and Z1-Z0 (degree of openness). All other predictors contribute to the contrast but to varying degrees (from H1*-H2* to H1*-A1*). The last 5 predictors are the least important, and CPP has a mean decrease in accuracy of 0 and can even be ignored.
set.seed(123456)
VarImp.cforest <- varImp::varImpAUC(mdl.cforest, conditional = TRUE)
lattice::barchart(sort(VarImp.cforest))
The party package is powerful at growing Random Forests via conditional inference trees, but is computationally prohibitive when increasing the number of trees and using conditional permutation tests of variable importance scores. We look next at the package ranger due to its speed in computation and flexibility.
The ranger package proposes a reimplementation of the original Random Forests algorithms, written in C++, and allows for parallel computing. It offers more flexibility in terms of model specification.
In the model specification below, there are already a few options we are familiar with, with additional ones described below:
num.trees: number of trees to grow. We use the default value (500).
mtry: number of predictors to use. Default = floor(sqrt(Variables)). For compatibility with party, we use round(sqrt(23)).
replace = FALSE: use subsampling with or without replacement. Default is replace = TRUE, i.e., with replacement.
sample.fraction = 0.632: use 63.2% of the data in each split. Default is the full dataset, i.e., sample.fraction = 1.
importance = "permutation": compute variable importance scores via permutation tests.
scale.permutation.importance = FALSE: whether to scale the permutation importance scores by their standard error. Default is FALSE. Scaling is likely to introduce biases in variable importance estimation.
splitrule = "extratrees": rule used for splitting trees.
num.threads: allow for parallel computing. Here we use all the cores detected above (ncores), but you can specify fewer threads on your computer (or cluster).
We use options 2-7 to make sure we have an unbiased selection process with ranger. You can try running the model below on your own with the defaults to see how the rate of classification increases further, but with the caveat that it has a biased selection process.
set.seed(123456)
mdl.ranger <- dfPharV2 %>%
  ranger(context ~ ., data = ., num.trees = 500, mtry = round(sqrt(23)),
         replace = FALSE, sample.fraction = 0.632,
         importance = "permutation", scale.permutation.importance = FALSE,
         splitrule = "extratrees", num.threads = ncores)
mdl.ranger
Ranger result
Call:
ranger(context ~ ., data = ., num.trees = 500, mtry = round(sqrt(23)), replace = FALSE, sample.fraction = 0.632, importance = "permutation", scale.permutation.importance = FALSE, splitrule = "extratrees", num.threads = ncores)
Type: Classification
Number of trees: 500
Sample size: 402
Number of independent variables: 23
Mtry: 5
Target node size: 1
Variable importance mode: permutation
Splitrule: extratrees
Number of random splits: 1
OOB prediction error: 7.21 %
The results of our Random Forest show an OOB (Out-Of-Bag) error rate of 7.21%, i.e., an accuracy of 92.79%.
Unfortunately, when growing a forest with ranger, predict has no option comparable to OOB = TRUE in cforest to obtain predictions on the OOB sample. We need to hard-code this. We split the data into a training and a testing set. The training set is 2/3 of the data; the testing set is the remaining 1/3.
set.seed(123456)
train.idx <- sample(nrow(dfPharV2), 2/3 * nrow(dfPharV2))
gutt.train <- dfPharV2[train.idx, ]
gutt.test <- dfPharV2[-train.idx, ]
We use the same model specification as above, except that we use the training set and save the forest (with write.forest = TRUE).
set.seed(123456)
mdl.ranger2 <- gutt.train %>%
  ranger(context ~ ., data = ., num.trees = 500, mtry = round(sqrt(23)),
         replace = FALSE, sample.fraction = 0.632,
         importance = "permutation", scale.permutation.importance = FALSE,
         splitrule = "extratrees", num.threads = ncores, write.forest = TRUE)
mdl.ranger2
Ranger result
Call:
ranger(context ~ ., data = ., num.trees = 500, mtry = round(sqrt(23)), replace = FALSE, sample.fraction = 0.632, importance = "permutation", scale.permutation.importance = FALSE, splitrule = "extratrees", num.threads = ncores, write.forest = TRUE)
Type: Classification
Number of trees: 500
Sample size: 268
Number of independent variables: 23
Mtry: 5
Target node size: 1
Variable importance mode: permutation
Splitrule: extratrees
Number of random splits: 1
OOB prediction error: 10.45 %
With the training set, we have an OOB error rate of 10.45%, i.e., an accuracy rate of 89.55%.
For the predictions, we use the testing set as a validation set. This is to be considered as a true reflection of the model. This is unseen data not used in the training set.
set.seed(123456)
pred.ranger2 <- predict(mdl.ranger2, data = gutt.test)
tbl.ranger2 <- table(pred.ranger2$predictions, gutt.test$context)
tbl.ranger2
               Non-Guttural Guttural
  Non-Guttural           68        5
  Guttural                6       55
PresenceAbsence::pcc(tbl.ranger2)
PresenceAbsence::specificity(tbl.ranger2)
PresenceAbsence::sensitivity(tbl.ranger2)
roc.ranger <- pROC::roc(gutt.test$context, as.numeric(pred.ranger2$predictions))
Setting levels: control = Non-Guttural, case = Guttural
Setting direction: controls < cases
roc.ranger
Call:
roc.default(response = gutt.test$context, predictor = as.numeric(pred.ranger2$predictions))
Data: as.numeric(pred.ranger2$predictions) in 74 controls (gutt.test$context Non-Guttural) < 60 cases (gutt.test$context Guttural).
Area under the curve: 0.9178
pROC::plot.roc(roc.ranger, legacy.axes = TRUE)
The classification rate based on the testing set is 91.8% (123 of 134 tokens correctly classified). This is somewhat higher than the one we obtained with cforest. The changes in the settings allow for similarities in the predictions obtained from both party and ranger.
For the variable importance scores, we obtain them from either the training set or the full model above.
set.seed(123456)
lattice::barchart(sort(mdl.ranger2$variable.importance), main = "Variable Importance scores - training set")
lattice::barchart(sort(mdl.ranger$variable.importance), main = "Variable Importance scores - full set")
There are similarities between cforest and ranger, with minor differences. Z2-Z1 is the best predictor at explaining the differences between gutturals and non-gutturals with ranger, followed by Z3-Z2 and then A2*-A3* (the reverse with cforest!). The order of the additional predictors is slightly different between the two models. This is expected, as the cforest model only used 100 trees, whereas the ranger model used 500 trees.
A clear difference between the packages party and ranger is that the former allows for conditional permutation tests for variable importance scores; this is absent from ranger. However, there is a debate in the literature on whether correlated data are harmful within Random Forests. The way Random Forests work, i.e., the randomness in the selection of data points, predictors, splitting rules, etc., allows the trees to be decorrelated from each other. Hence, the conditional permutation tests may not be required. What they offer is to condition variable importance scores on each other (based on correlation tests) to mimic what a multiple regression analysis does (but without suffering from suppression!). Strong predictors will show a major contribution, while weak ones will be squashed, giving them extremely low (or even negative) scores. Within ranger, it is possible to evaluate this by estimating p values associated with each variable importance score. We use the altmann method. See the documentation for more details.
DANGER ZONE: This requires heavy computations. Use all cores on your machine or run it on a cluster. The recommendation is to use at least 100 permutations, i.e., num.permutations = 100. Here, we only use 20 to show the output.
set.seed(123456)
VarImp.pval <- importance_pvalues(mdl.ranger2, method = "altmann",
                                  num.permutations = 20,
                                  formula = context ~ ., data = gutt.train,
                                  num.threads = ncores)
VarImp.pval
importance pvalue
CPP 0.004484848 0.09523810
Energy 0.015979798 0.04761905
H1A1c 0.008363636 0.04761905
H1A2c 0.025292929 0.04761905
H1A3c 0.028080808 0.04761905
H1H2c 0.013313131 0.04761905
H2H4c 0.010747475 0.04761905
H2KH5Kc 0.011595960 0.04761905
H42Kc 0.015939394 0.04761905
HNR05 0.006303030 0.04761905
HNR15 0.012121212 0.04761905
HNR25 0.009353535 0.04761905
HNR35 0.010242424 0.04761905
SHR 0.013737374 0.09523810
soe 0.010767677 0.04761905
Z1mnZ0 0.030060606 0.04761905
Z2mnZ1 0.073070707 0.04761905
Z3mnZ2 0.040181818 0.04761905
Z4mnZ3 0.047171717 0.04761905
F0Bark 0.010646465 0.04761905
A1mnA2 0.014646465 0.04761905
A1mnA3 0.021313131 0.04761905
A2mnA3 0.037555556 0.04761905
Of course, the output above shows variable p values. The lowest, 0.048, is shared by 21 of the 23 predictors; CPP and SHR are at 0.095. Recall that CPP received the lowest variable importance score within ranger and cforest. If you increase the number of permutations to 100 or 200, you will get more confidence in your results and can report the p values.
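The granularity of these p values follows from the number of permutations. Assuming the usual permutation estimate p = (r + 1) / (B + 1), where B is the number of permutations and r the number of permuted importance scores at least as large as the observed one (an assumption, but one that reproduces the values printed above), the arithmetic is:

```r
# Smallest p value attainable with B permutations: no permuted importance
# reaches the observed one, so p = (0 + 1) / (B + 1)
min_pvalue <- function(B) 1 / (B + 1)

round(min_pvalue(20), 8)    # 0.04761905, the lowest value in the output above
round(2 / (20 + 1), 8)      # 0.0952381, one permuted score >= the observed score
min_pvalue(c(100, 200))     # finer resolution with more permutations
```

With only 20 permutations, no predictor can obtain a p value below about 0.048, however strong it is, which is why more permutations are needed before reporting these values.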
In the next part, we look at the tidymodels and introduce their philosophy.
The tidymodels are a bundle of packages used to streamline and simplify the use of machine learning. The tidymodels are not restricted to Random Forests: you can use them to run simple linear models, logistic regressions, PCA, Random Forests, Deep Learning, etc.
The tidymodels’ philosophy is to separate data processing on the training and testing sets, and to use a workflow. Below is a full example of how one can run Random Forests via ranger using the tidymodels.
We start by creating a training and a testing set using the function initial_split. Using strata = context allows the function to take the structure of the data into account and to split it according to the proportions of each group.
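What stratification does can be sketched in base R: sample the same proportion of rows within each class, then pool the indices. This is only an illustration of the idea, not the internals of initial_split; the outcome vector below is simulated with the class sizes reported earlier (222 and 180), and the rounding rule is an assumption.

```r
set.seed(1)
# Hypothetical outcome with the class sizes reported above (222 vs 180)
context <- factor(rep(c("Non-Guttural", "Guttural"), times = c(222, 180)))
prop <- 0.667

# Sample `prop` of the row indices within each class, then pool them
idx_by_class <- split(seq_along(context), context)
train_idx <- unlist(lapply(idx_by_class,
                           function(i) sample(i, round(prop * length(i)))),
                    use.names = FALSE)
test_idx <- setdiff(seq_along(context), train_idx)

c(train = length(train_idx), test = length(test_idx))
prop.table(table(context))              # class shares in the full data
prop.table(table(context[train_idx]))   # nearly identical shares in the training set
```

Without stratification, a random split can by chance over-represent one class in the training set, which matters most when the classes are unbalanced or the dataset is small.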
set.seed(123456)
train_test_split <-
  initial_split(
    data = dfPharV2,
    strata = "context",
    prop = 0.667)
train_test_split
<Analysis/Assess/Total>
<268/134/402>
train_tbl <- train_test_split %>% training()
test_tbl <- train_test_split %>% testing()
We can (if we want to) create a 10-fold cross-validation on the training set. This allows us to fine-tune the training by obtaining the forest with the highest accuracy. This is a clear difference from ranger. While it is not impossible to hard-code that, tidymodels simplifies it for us!
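For reference, hard-coding the fold assignment is short in base R. The sketch below assigns each of the 268 training rows to one of 10 folds; it ignores stratification for simplicity, so it is a simplified stand-in for what vfold_cv does, and the object names are illustrative.

```r
set.seed(1)
n_train <- 268    # size of the training set reported above
v <- 10

# Shuffle fold labels 1..v so that fold sizes differ by at most one row
fold <- sample(rep(seq_len(v), length.out = n_train))

# For fold k: fit on the other nine folds (analysis), evaluate on fold k (assessment)
sizes <- t(sapply(seq_len(v), function(k) {
  c(fold = k, analysis = sum(fold != k), assessment = sum(fold == k))
}))
sizes
```

Each row is used for assessment exactly once and for fitting nine times, so the averaged metrics use all of the training data without ever evaluating a model on rows it was fitted to.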
set.seed(123456)
train_cv <- vfold_cv(train_tbl, v = 10, strata = "context")
Within the model specification, we need to specify multiple options:
recipe: this is the recipe and is related to any data processing one wants to apply on the data.
engine: we need to specify the engine to use. Here we want to run a Random Forest.
tuning: here we can tune our engine.
workflow: here we specify the various steps of the workflow.
When defining the recipe, you need to think of the type of “transformations” you will apply to your data.
step_dummy(): 0s and 1s for binary predictors; or use one-hot encoding with step_dummy(predictor, one_hot = TRUE)
See the documentation of tidymodels for what you can apply!
set.seed(123456)
recipe <-
  train_tbl %>%
  recipe(context ~ .) %>%
  step_center(all_predictors(), -all_outcomes()) %>%
  step_scale(all_predictors(), -all_outcomes()) %>%
  prep()
trainData_baked <- bake(recipe, new_data = train_tbl) # convert the train data to the newly imputed data
trainData_baked
NA
Once we have prepared the recipe, we can bake it to see the changes applied to the data.
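The prepare/bake distinction matters because the centring and scaling parameters are estimated on the training set only and then reused for any other data. A minimal base R sketch of that idea, with simulated values and illustrative object names:

```r
set.seed(1)
# Hypothetical predictor values for a training and a testing set
train_x <- rnorm(268, mean = 10, sd = 3)
test_x  <- rnorm(134, mean = 10, sd = 3)

# "prep": estimate the parameters on the training set only
centre <- mean(train_x)
spread <- sd(train_x)

# "bake": apply the same parameters to any data set
train_baked <- (train_x - centre) / spread
test_baked  <- (test_x - centre) / spread

round(c(mean = mean(train_baked), sd = sd(train_baked)), 3)  # 0 and 1 by construction
round(c(mean = mean(test_baked),  sd = sd(test_baked)), 3)   # close to, not exactly, 0 and 1
```

Estimating the mean and standard deviation on the testing set as well would leak information from the unseen data into the preprocessing, which is what the separation is designed to avoid.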
box_fun_plot = function(data, x, y) {
  ggplot(data = data, aes(x = .data[[x]],
                          y = .data[[y]],
                          fill = .data[[x]])) +
    geom_boxplot() +
    labs(title = y,
         x = x,
         y = y) +
    theme_bw() +
    # theme() comes after theme_bw() so that the legend setting is not overridden
    theme(
      legend.position = "none"
    )
}
# Create vector of predictors
expl <- names(trainData_baked)[-(dim(trainData_baked)[2])]#step_corr
# Loop vector with map
expl_plots_box <- map(expl, ~box_fun_plot(data = trainData_baked, x = "context", y = .x) )
plot_grid(plotlist = expl_plots_box)
We set the engine here as a rand_forest. We specify a classification mode. Then, we set an engine with engine-specific parameters.
set.seed(123456)
engine_tidym <- rand_forest(
    mode = "classification",
    engine = "ranger",
    mtry = tune(),
    trees = tune(),
    min_n = 1
  ) %>%
  set_engine("ranger", importance = "permutation", sample.fraction = 0.632,
             replace = FALSE, write.forest = TRUE, splitrule = "extratrees",
             scale.permutation.importance = FALSE) # we add engine specific settings
If we want to tune the model, then uncomment the lines below. It is important to use an mtry that hovers around round(sqrt(Variables)). If you use all available variables, then your forest is biased as it is able to see all predictors. For the number of trees, low numbers are not great, as you can easily underfit the data and not produce meaningful results. Large numbers are fine, and Random Forests do not overfit (in theory).
The full dataset has around 2000 observations and 23 predictors (well, even more, but let’s ignore that for the moment). I tuned mtry to be between 4 and 6, and trees to be between 1000 and 5000, in 30 steps. In total, with a 10-fold cross-validation, I grew 30 random forests on each fold for a total of 300 Random Forests on the training set! This will of course take a long time to compute on your computer if using one thread, so use parallel computing or a cluster. When running on the cluster with 20 cores, each with 11GB of RAM (220GB in total), it took around 260.442 seconds to run. With less RAM and fewer cores, the code will still run but will take longer.
set.seed(123456)
gridy_tidym <- grid_random(
  mtry() %>% range_set(c(4, 6)),
  trees() %>% range_set(c(1000, 2000)),
  size = 30
)
Now we define the workflow, adding the recipe and the model.
set.seed(123456)
wkfl_tidym <- workflow() %>%
  add_recipe(recipe) %>%
  add_model(engine_tidym)
Here we run the model starting with the workflow, the cross-validation sample, the tuning parameters and asking for specific metrics.
The model below will do the following:
1. Use a 10-fold cross-validation on the training set
2. Tune the hyper-parameters to reach the model with the best predictions
3. Within each fold, grow 30 random forests, for a total of 300 Random Forests, and use a ROC-AUC based search for the best performing model
Of course, you could use a larger size to grow more forests, but this will take longer to run!
The model will run for about 2-3 minutes on an 8-core machine with 32GB of RAM. For demonstration purposes, the tuning of the number of trees is restricted to between 1000 and 2000 trees. This can of course be increased to 5000 trees (or more) depending on the size of the dataset.
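The count of 300 forests follows directly from the size of the grid and the number of folds. As a rough base R sketch (the random candidates below only mimic the tuning ranges; they are not the grid that grid_random produces, which may for instance handle duplicates differently):

```r
set.seed(1)
size <- 30   # number of candidate hyper-parameter combinations
v    <- 10   # number of cross-validation folds

# Random candidates within the tuning ranges used above
grid <- data.frame(mtry  = sample(4:6, size, replace = TRUE),
                   trees = sample(1000:2000, size, replace = TRUE))
head(grid, 3)

n_fits <- nrow(grid) * v   # one forest per candidate per fold
n_fits
```

Doubling either the grid size or the number of folds doubles the number of forests to grow, which is why run time increases quickly when the search is widened.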
set.seed(123456)
system.time(grid_tidym <-
tune_grid(wkfl_tidym,
resamples = train_cv,
grid = gridy_tidym,
metrics = metric_set(accuracy, roc_auc, sens, spec,f_meas, precision, recall),
control = control_grid(save_pred = TRUE, parallel_over = NULL))
)
user system elapsed
2.29 0.36 217.36
print(grid_tidym)
# Tuning results
# 10-fold cross-validation using stratification
We obtain the best performing model from cross-validation, then finalise the workflow by predicting the results on the testing set and obtain the results of the best performing model.
set.seed(123456)
collect_metrics(grid_tidym)
grid_tidym_best <- select_best(grid_tidym, metric = "roc_auc")
grid_tidym_best
wkfl_tidym_best <- finalize_workflow(wkfl_tidym, grid_tidym_best)
wkfl_tidym_final <- last_fit(wkfl_tidym_best, split = train_test_split)
For the results, we can obtain various metrics on the training and testing sets.
percent(show_best(grid_tidym, metric = "accuracy", n = 1)$mean)
[1] "92%"
# Cross-validated training performance
show_best(grid_tidym, metric = "roc_auc", n = 1)$mean
[1] 0.97
show_best(grid_tidym, metric = "sens", n = 1)$mean
[1] 0.9519048
show_best(grid_tidym, metric = "spec", n = 1)$mean
[1] 0.8833333
# Cross-validated training performance
show_best(grid_tidym, metric = "f_meas", n = 1)$mean
[1] 0.930738
# Cross-validated training performance
show_best(grid_tidym, metric = "precision", n = 1)$mean
[1] 0.9120031
# Cross-validated training performance
show_best(grid_tidym, metric = "recall", n = 1)$mean
[1] 0.9519048
wkfl_tidym_final$.metrics
[[1]]
NA
#accuracy
percent(wkfl_tidym_final$.metrics[[1]]$.estimate[[1]])
[1] "90%"
#roc-auc
wkfl_tidym_final$.metrics[[1]]$.estimate[[2]]
[1] 0.9490991
wkfl_tidym_final$.predictions[[1]] %>%
  conf_mat(context, .pred_class) %>%
  pluck(1) %>%
  as_tibble() %>%
  group_by(Truth) %>% # group by Truth to compute percentages
  mutate(prop = percent(prop.table(n))) %>% # calculate percentages row-wise
  ggplot(aes(Prediction, Truth, alpha = prop)) +
  geom_tile(show.legend = FALSE) +
  geom_text(aes(label = prop), colour = "white", alpha = 1, size = 8)
vip(pull_workflow_fit(wkfl_tidym_final$.workflow[[1]]))
Warning: `pull_workflow_fit()` was deprecated in workflows 0.2.3.
Please use `extract_fit_parsnip()` instead.
vip(pull_workflow_fit(wkfl_tidym_final$.workflow[[1]]), num_features = 23)
Warning: `pull_workflow_fit()` was deprecated in workflows 0.2.3.
Please use `extract_fit_parsnip()` instead.
This is an interesting feature that shows how much is gained when looking at various portions of the data. We see a gradual increase in the values. When 50% of the data were tested, around 83% of the cases within the non-guttural class were already identified. The more testing was performed, the more confidence there is in the results; when 84.96% of the data were tested, 100% of the cases were found.
wkfl_tidym_final$.predictions[[1]] %>%
  gain_curve(context, `.pred_Non-Guttural`) %>%
  autoplot()
wkfl_tidym_final$.predictions[[1]] %>%
  roc_curve(context, `.pred_Non-Guttural`) %>%
  autoplot()
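The numbers read off the gain curve can be computed by hand: order the test cases by the predicted probability of the target class, then track the cumulative share of target cases found. The base R sketch below uses simulated scores for a test set of the same size (134 tokens, 74 in the target class); the scores and object names are hypothetical, so the values will differ from those of the fitted model.

```r
set.seed(1)
# Hypothetical test set: 74 target cases and 60 others, with noisy scores
truth <- rep(c(TRUE, FALSE), times = c(74, 60))
score <- c(rbeta(74, 4, 2), rbeta(60, 2, 4))   # predicted probability of the target class

ord <- order(score, decreasing = TRUE)          # test the most confident cases first
pct_tested <- 100 * seq_along(ord) / length(ord)
pct_found  <- 100 * cumsum(truth[ord]) / sum(truth)

# Share of target cases found after testing half of the data
pct_found[which(pct_tested >= 50)[1]]
# Share of the data that must be tested to find every target case
pct_tested[which(pct_found == 100)[1]]
```

A model with no predictive value would find target cases at the same rate as they are tested (50% found at 50% tested); the further the curve rises above that diagonal, the better the ranking of the predictions.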
sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)
Matrix products: default
locale:
[1] LC_COLLATE=English_United Kingdom.1252
[2] LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252
attached base packages:
[1] stats4 grid stats
[4] graphics grDevices utils
[7] datasets methods base
other attached packages:
[1] ordinal_2019.12-10
[2] psycho_0.6.1
[3] cowplot_1.1.1
[4] scatterplot3d_0.3-41
[5] RColorBrewer_1.1-2
[6] factoextra_1.0.7
[7] FactoMineR_2.4
[8] languageR_1.5.0
[9] PresenceAbsence_1.1.9
[10] ggsignif_0.6.3
[11] emmeans_1.7.0
[12] vip_0.3.2
[13] varImp_0.4
[14] measures_0.3
[15] pROC_1.18.0
[16] yardstick_0.0.8
[17] workflowsets_0.1.0
[18] workflows_0.2.4
[19] tune_0.1.6
[20] rsample_0.1.0
[21] recipes_0.1.17
[22] parsnip_0.1.7
[23] modeldata_0.1.1
[24] infer_1.0.0
[25] dials_0.0.10
[26] scales_1.1.1
[27] tidymodels_0.1.4
[28] doFuture_0.12.0
[29] future_1.23.0
[30] foreach_1.5.1
[31] ranger_0.13.1
[32] party_1.3-9
[33] strucchange_1.5-2
[34] sandwich_3.0-1
[35] zoo_1.8-9
[36] modeltools_0.2-23
[37] mvtnorm_1.1-3
[38] lme4_1.1-27.1
[39] Matrix_1.3-4
[40] corrplot_0.90
[41] Hmisc_4.6-0
[42] Formula_1.2-4
[43] survival_3.2-13
[44] lattice_0.20-45
[45] knitr_1.36
[46] broom_0.7.10
[47] forcats_0.5.1
[48] stringr_1.4.0
[49] dplyr_1.0.7
[50] purrr_0.3.4
[51] readr_2.0.2
[52] tidyr_1.1.4
[53] tibble_3.1.5
[54] ggplot2_3.3.5
[55] tidyverse_1.3.1
loaded via a namespace (and not attached):
[1] utf8_1.2.2
[2] tidyselect_1.1.1
[3] htmlwidgets_1.5.4
[4] munsell_0.5.0
[5] codetools_0.2-18
[6] DT_0.19
[7] withr_2.4.2
[8] colorspace_2.0-2
[9] highr_0.9
[10] rstudioapi_0.13
[11] leaps_3.1
[12] listenv_0.8.0
[13] labeling_0.4.2
[14] bit64_4.0.5
[15] DiceDesign_1.9
[16] farver_2.1.0
[17] coda_0.19-4
[18] parallelly_1.28.1
[19] vctrs_0.3.8
[20] generics_0.1.1
[21] TH.data_1.1-0
[22] ipred_0.9-12
[23] xfun_0.27
[24] R6_2.5.1
[25] lhs_1.1.3
[26] assertthat_0.2.1
[27] vroom_1.5.5
[28] multcomp_1.4-17
[29] nnet_7.3-16
[30] gtable_0.3.0
[31] globals_0.14.0
[32] timeDate_3043.102
[33] rlang_0.4.12
[34] splines_4.1.2
[35] rstatix_0.7.0
[36] checkmate_2.0.0
[37] abind_1.4-5
[38] yaml_2.2.1
[39] modelr_0.1.8
[40] backports_1.3.0
[41] tools_4.1.2
[42] lava_1.6.10
[43] ellipsis_0.3.2
[44] Rcpp_1.0.7
[45] plyr_1.8.6
[46] base64enc_0.1-3
[47] ggpubr_0.4.0
[48] rpart_4.1-15
[49] haven_2.4.3
[50] ggrepel_0.9.1
[51] cluster_2.1.2
[52] fs_1.5.0
[53] furrr_0.2.3
[54] magrittr_2.0.1
[55] data.table_1.14.2
[56] openxlsx_4.2.4
[57] reprex_2.0.1
[58] GPfit_1.0-8
[59] matrixStats_0.61.0
[60] hms_1.1.1
[61] evaluate_0.14
[62] xtable_1.8-4
[63] rio_0.5.27
[64] jpeg_0.1-9
[65] readxl_1.3.1
[66] gridExtra_2.3
[67] compiler_4.1.2
[68] crayon_1.4.2
[69] minqa_1.2.4
[70] htmltools_0.5.2
[71] mgcv_1.8-38
[72] tzdb_0.2.0
[73] libcoin_1.0-9
[74] lubridate_1.8.0
[75] DBI_1.1.1
[76] dbplyr_2.1.1
[77] MASS_7.3-54
[78] boot_1.3-28
[79] car_3.0-11
[80] cli_3.1.0
[81] parallel_4.1.2
[82] gower_0.2.2
[83] pkgconfig_2.0.3
[84] flashClust_1.01-2
[85] numDeriv_2016.8-1.1
[86] coin_1.4-2
[87] foreign_0.8-81
[88] xml2_1.3.2
[89] hardhat_0.1.6
[90] estimability_1.3
[91] prodlim_2019.11.13
[92] rvest_1.0.2
[93] digest_0.6.28
[94] rmarkdown_2.11
[95] cellranger_1.1.0
[96] htmlTable_2.3.0
[97] curl_4.3.2
[98] nloptr_1.2.2.2
[99] lifecycle_1.0.1
[100] nlme_3.1-153
[101] jsonlite_1.7.2
[102] carData_3.0-4
[103] fansi_0.5.0
[104] pillar_1.6.4
[105] fastmap_1.1.0
[106] httr_1.4.2
[107] glue_1.4.2
[108] zip_2.2.0
[109] png_0.1-7
[110] iterators_1.0.13
[111] bit_4.0.4
[112] class_7.3-19
[113] stringi_1.7.5
[114] latticeExtra_0.6-29
[115] ucminf_1.1-4
[116] future.apply_1.8.1