This repo contains material for a workshop on Random Forests in phonetics/phonology research
This repo contains the material used in the workshop "Introduction to Random Forests", delivered on 16th June 2021 to Sounds of Language and Speech, Aarhus University's phonetics and phonology (and more) research group.
Here is the necessary material for the session:
In this workshop, we start by briefly introducing predictive modelling for classification. We (re-)introduce Generalised Linear Models (GLM) as a classification tool. We then present the links between GLM and Signal Detection Theory and introduce the notions of accuracy, error, sensitivity, specificity, Area Under the Curve (AUC), and d-prime (d′).
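As a rough illustration of these notions, the sketch below fits a logistic GLM and derives the classification metrics from its confusion matrix. It uses only base R and the built-in `mtcars` data as a stand-in (an assumption for illustration, not the workshop data); the 0.5 threshold and the variable names are likewise illustrative choices.

```r
# Stand-in data: built-in mtcars, NOT the workshop data.
# Outcome: am (0 = automatic, 1 = manual); predictor: wt (weight).
fit <- glm(am ~ wt, data = mtcars, family = binomial)

# Predicted probability of class 1, thresholded at 0.5 (illustrative choice).
prob <- predict(fit, type = "response")
pred <- factor(as.integer(prob > 0.5), levels = c(0, 1))
obs  <- factor(mtcars$am, levels = c(0, 1))

# Confusion matrix: rows = predicted, columns = observed.
cm <- table(predicted = pred, observed = obs)
tp <- cm["1", "1"]; tn <- cm["0", "0"]
fp <- cm["1", "0"]; fn <- cm["0", "1"]

accuracy    <- (tp + tn) / sum(cm)
error       <- 1 - accuracy
sensitivity <- tp / (tp + fn)   # hit rate
specificity <- tn / (tn + fp)   # 1 - false-alarm rate

# d-prime: z(hit rate) - z(false-alarm rate).
# It is infinite if either rate is exactly 0 or 1.
dprime <- qnorm(sensitivity) - qnorm(1 - specificity)

# AUC via the rank (Mann-Whitney) formulation.
r   <- rank(prob)
n1  <- sum(obs == "1"); n0 <- sum(obs == "0")
auc <- (sum(r[obs == "1"]) - n1 * (n1 + 1) / 2) / (n1 * n0)

round(c(accuracy = accuracy, error = error, sensitivity = sensitivity,
        specificity = specificity, dprime = dprime, auc = auc), 3)
```

Sensitivity and specificity depend on the chosen threshold, whereas the AUC summarises performance across all thresholds.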
We then discuss issues with GLM on correlated data and introduce decision trees as a possible solution. We grow our first tree using the Conditional Inference Trees framework. We use real data from my current research on the phonetic basis of the guttural natural class in Levantine Arabic, with acoustic predictors: formant Bark differences and voice quality measures obtained from VoiceSauce. We look at how to interpret a tree based on one predictor, before using multiple predictors to evaluate differences between groups. We report accuracy, sensitivity, specificity, and AUC for each tree. We then try to mimic how Random Forests work using decision trees, before introducing Random Forests themselves.
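A minimal sketch of growing a conditional inference tree with `ctree()` from the party package, first with one predictor and then with several. The two-class subset of the built-in `iris` data is a stand-in (an assumption for illustration, not the Levantine Arabic data), and `acc()` is a hypothetical helper defined here.

```r
library(party)

# Stand-in data: two iris species, NOT the workshop data.
dat <- droplevels(subset(iris, Species != "setosa"))

# Tree grown on a single predictor.
tree_one <- ctree(Species ~ Petal.Length, data = dat)

# Tree grown on all available predictors.
tree_all <- ctree(Species ~ ., data = dat)

# Hypothetical helper: proportion of correctly classified training cases.
acc <- function(tree, data) mean(predict(tree) == data$Species)

acc_one <- acc(tree_one, dat)
acc_all <- acc(tree_all, dat)

# Confusion matrix for the multi-predictor tree.
table(predicted = predict(tree_all), observed = dat$Species)

# plot(tree_all)  # uncomment to inspect the splits and terminal nodes
```

Sensitivity and specificity can be read off the confusion matrix by treating one of the two species as the target class.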
For Random Forests, we will use two frameworks, party and ranger.
We use specific settings in ranger to mimic the unbiased selection process implemented in the package party.
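The exact ranger settings are not listed in this README. As a hedged sketch, and purely as an assumption, one way to approximate the unbiased setup of party is to subsample without replacement (`replace = FALSE` with `sample.fraction = 0.632`) and to use permutation importance. The built-in `iris` data is again a stand-in, not the workshop data.

```r
library(ranger)

# Assumed settings (not confirmed by this README):
# subsampling without replacement plus permutation importance.
rf <- ranger(
  Species ~ .,
  data            = iris,          # stand-in data
  num.trees       = 500,
  replace         = FALSE,         # sample without replacement
  sample.fraction = 0.632,         # fraction of cases drawn per tree
  importance      = "permutation", # permutation-based variable importance
  seed            = 2021           # for reproducibility
)

# Out-of-bag misclassification rate.
rf$prediction.error

# Predictors ranked by permutation importance.
sort(rf$variable.importance, decreasing = TRUE)
```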
At the end, we introduce tidymodels and briefly discuss its philosophy. We use ranger as an engine and showcase how tidymodels facilitates the use of machine learning.
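A minimal sketch of a tidymodels workflow with ranger as the engine. The built-in `iris` data, the 80/20 split, and the object names are assumptions for illustration; the workshop code may be organised differently.

```r
library(tidymodels)

set.seed(2021)

# Stand-in data, split into training and testing sets.
split <- initial_split(iris, prop = 0.8, strata = Species)
train <- training(split)
test  <- testing(split)

# Model specification: a random forest fitted by the ranger engine.
rf_spec <- rand_forest(trees = 500) %>%
  set_engine("ranger", importance = "permutation") %>%
  set_mode("classification")

# Workflow: bundle the formula and the model specification.
rf_wf <- workflow() %>%
  add_formula(Species ~ .) %>%
  add_model(rf_spec)

rf_fit <- fit(rf_wf, data = train)

# Predict on held-out data and compute accuracy.
preds <- predict(rf_fit, new_data = test) %>%
  bind_cols(test)

accuracy(preds, truth = Species, estimate = .pred_class)
```

Because the model specification is separate from the engine, switching to another engine should only require changing the `set_engine()` call.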
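The following R code installs (if needed) and loads the required packages:

    # Packages used in the workshop
    requiredPackages <- c('tidyverse', 'broom', 'knitr', 'corrplot', 'psycho',
                          'PresenceAbsence', 'party', 'ranger', 'tidymodels', 'pROC', 'varImp',
                          'lattice', 'vip', 'doFuture', 'doRNG', 'parallelly')

    # Install any package that cannot be loaded, then attach it
    for (p in requiredPackages) {
      if (!require(p, character.only = TRUE)) install.packages(p, dependencies = TRUE)
      library(p, character.only = TRUE)
    }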