Statistics for Linguists using R - Quantitative and qualitative approaches

4.5 Cumulative Logit Link Mixed-effects Models

These models are well suited to rating data. Ratings are inherently ordered (1, 2, …, n), and we expect to observe an increase (or decrease) in the overall ratings from 1 to n. To demonstrate this, we will work through an example using the package “ordinal”.
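
Concretely, a cumulative logit mixed-effects model (in the parameterisation used by the “ordinal” package) models the probability of a response being at or below each rating category j. A minimal sketch of the formula, with crossed random intercepts for subjects and items, is:

$$\text{logit}\, P(Y_i \le j) = \theta_j - x_i^\top \beta - u_{Subject[i]} - u_{Item[i]}, \qquad j = 1, \dots, n-1$$

where the $\theta_j$ are ordered thresholds between adjacent rating categories, $\beta$ contains the fixed-effect coefficients, and the $u$ terms are the random intercepts (dropped in a model without random effects).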

We use two datasets. We ran models on both of these previously; however, we did not then take into account the fact that this subset of the full dataset contains multiple producing speakers and items.

4.5.1 Ratings of percept of nasality

The first dataset comes from a Likert-scale rating experiment in which six participants rated the percept of nasality in the production of particular consonants in Arabic. The data come from nine producing subjects. Ratings ranged from 1 to 5, with 1 reflecting an oral percept and 5 a nasal percept.

4.5.1.1 Importing and pre-processing

We start by importing the data and processing it. We change the reference level of the predictor to “isolation”.

rating <- read_csv("data/rating.csv")[-1]
rating
## # A tibble: 405 × 5
##    Response Context   Subject Item    Rater
##       <dbl> <chr>     <chr>   <chr>   <chr>
##  1        2 n-3       p04     noo3-w  R01  
##  2        4 isolation p04     noo3-v  R01  
##  3        2 o-3       p04     djuu3-w R01  
##  4        4 isolation p04     djuu3-v R01  
##  5        3 n-7       p04     nuu7-w  R01  
##  6        3 isolation p04     nuu7-v  R01  
##  7        1 3--3      p04     3oo3-w  R01  
##  8        2 isolation p04     3oo3-v  R01  
##  9        2 o-7       p04     loo7-w  R01  
## 10        1 o-3       p04     bii3-w  R01  
## # ℹ 395 more rows
rating <- rating %>% 
  mutate(Response = factor(Response),
         Context = factor(Context),
         Subject = factor(Subject),
         Item = factor(Item)) %>% 
  mutate(Context = relevel(Context, "isolation"))
rating %>% 
  head(10)
## # A tibble: 10 × 5
##    Response Context   Subject Item    Rater
##    <fct>    <fct>     <fct>   <fct>   <chr>
##  1 2        n-3       p04     noo3-w  R01  
##  2 4        isolation p04     noo3-v  R01  
##  3 2        o-3       p04     djuu3-w R01  
##  4 4        isolation p04     djuu3-v R01  
##  5 3        n-7       p04     nuu7-w  R01  
##  6 3        isolation p04     nuu7-v  R01  
##  7 1        3--3      p04     3oo3-w  R01  
##  8 2        isolation p04     3oo3-v  R01  
##  9 2        o-7       p04     loo7-w  R01  
## 10 1        o-3       p04     bii3-w  R01
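
Before fitting mixed-effects models, it is worth checking how many levels each grouping factor has (the model summaries below report 9 subjects and 45 items). A minimal sketch, assuming the tidyverse is loaded as in the package-loading section:

# number of distinct subjects, items and raters in the data
rating %>% 
  summarise(across(c(Subject, Item, Rater), n_distinct))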

4.5.1.2 Model specifications

4.5.1.2.1 No random effects

We run our first clm model as a simple model, i.e., with no random effects.

system.time(mdl.clm <- rating %>% 
  clm(Response ~ Context, data = .))
##    user  system elapsed 
##    0.01    0.00    0.00
summary(mdl.clm)
## formula: Response ~ Context
## data:    .
## 
##  link  threshold nobs logLik  AIC     niter max.grad cond.H 
##  logit flexible  405  -526.16 1086.31 5(0)  3.61e-09 1.3e+02
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## Context3--3  -0.1384     0.5848  -0.237   0.8130    
## Context3-n    3.5876     0.4721   7.600 2.96e-14 ***
## Context3-o   -0.4977     0.3859  -1.290   0.1971    
## Context7-n    2.3271     0.5079   4.582 4.60e-06 ***
## Context7-o    0.2904     0.4002   0.726   0.4680    
## Contextn-3    2.8957     0.6685   4.331 1.48e-05 ***
## Contextn-7    2.2678     0.4978   4.556 5.22e-06 ***
## Contextn-n    2.8697     0.4317   6.647 2.99e-11 ***
## Contextn-o    3.5152     0.4397   7.994 1.30e-15 ***
## Contexto-3   -0.2540     0.4017  -0.632   0.5272    
## Contexto-7   -0.6978     0.3769  -1.851   0.0641 .  
## Contexto-n    2.9640     0.4159   7.126 1.03e-12 ***
## Contexto-o   -0.6147     0.3934  -1.562   0.1182    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Threshold coefficients:
##     Estimate Std. Error z value
## 1|2  -1.4615     0.2065  -7.077
## 2|3   0.4843     0.1824   2.655
## 3|4   1.5492     0.2044   7.578
## 4|5   3.1817     0.2632  12.089
4.5.1.2.2 Random effects 1 - Intercepts only

We run our first clmm model with random intercepts only, i.e., by-subject and by-item random intercepts.

system.time(mdl.clmm.Int <- rating %>% 
  clmm(Response ~ Context + (1|Subject) + (1|Item), data = .))
##    user  system elapsed 
##    2.97    0.13    3.18
summary(mdl.clmm.Int)
## Cumulative Link Mixed Model fitted with the Laplace approximation
## 
## formula: Response ~ Context + (1 | Subject) + (1 | Item)
## data:    .
## 
##  link  threshold nobs logLik  AIC     niter      max.grad cond.H 
##  logit flexible  405  -520.89 1079.79 1395(2832) 7.38e-04 1.2e+02
## 
## Random effects:
##  Groups  Name        Variance  Std.Dev. 
##  Item    (Intercept) 1.387e-13 3.724e-07
##  Subject (Intercept) 1.942e-01 4.407e-01
## Number of groups:  Item 45,  Subject 9 
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## Context3--3  -0.1634     0.5798  -0.282   0.7781    
## Context3-n    3.6779     0.4804   7.657 1.91e-14 ***
## Context3-o   -0.5156     0.3873  -1.331   0.1831    
## Context7-n    2.3775     0.5185   4.585 4.54e-06 ***
## Context7-o    0.3279     0.4046   0.810   0.4177    
## Contextn-3    3.0361     0.6677   4.547 5.43e-06 ***
## Contextn-7    2.3598     0.4925   4.792 1.65e-06 ***
## Contextn-n    2.9633     0.4339   6.830 8.52e-12 ***
## Contextn-o    3.6644     0.4495   8.153 3.56e-16 ***
## Contexto-3   -0.2772     0.4000  -0.693   0.4883    
## Contexto-7   -0.7334     0.3800  -1.930   0.0536 .  
## Contexto-n    3.0672     0.4220   7.268 3.65e-13 ***
## Contexto-o   -0.6505     0.4015  -1.620   0.1052    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Threshold coefficients:
##     Estimate Std. Error z value
## 1|2  -1.5141     0.2554  -5.928
## 2|3   0.5077     0.2358   2.153
## 3|4   1.6039     0.2538   6.319
## 4|5   3.2921     0.3072  10.718
4.5.1.2.3 Random effects 2 - Intercepts and Slopes

We run our second clmm model with random intercepts and random slopes, i.e., adding by-subject random slopes for Context. Because this model takes a long time to run, it is worth adding an if condition: if the model was run previously, simply load the saved rds file rather than re-fitting it (a minimal sketch of this pattern is given after the model output below).

system.time(mdl.clmm.Slope <- rating %>% 
                                       clmm(Response ~ Context + (Context|Subject) + (1|Item), data = .))
##    user  system elapsed 
##  719.67   29.11  754.31
summary(mdl.clmm.Slope)
## Cumulative Link Mixed Model fitted with the Laplace approximation
## 
## formula: Response ~ Context + (Context | Subject) + (1 | Item)
## data:    .
## 
##  link  threshold nobs logLik  AIC     niter         max.grad cond.H
##  logit flexible  405  -492.63 1231.27 51581(203526) 3.05e-01 NaN   
## 
## Random effects:
##  Groups  Name        Variance Std.Dev. Corr                              
##  Item    (Intercept) 0.05821  0.2413                                     
##  Subject (Intercept) 0.67506  0.8216                                     
##          Context3--3 2.16140  1.4702   -0.738                            
##          Context3-n  4.31779  2.0779   -0.216  0.542                     
##          Context3-o  1.26733  1.1258   -0.655  0.980  0.619              
##          Context7-n  4.23790  2.0586   -0.532  0.734  0.342  0.711       
##          Context7-o  1.48091  1.2169   -0.797  0.740  0.021  0.671  0.425
##          Contextn-3  7.73428  2.7811   -0.196  0.639  0.810  0.734  0.155
##          Contextn-7  4.65365  2.1572   -0.234  0.489  0.510  0.528 -0.124
##          Contextn-n  3.33897  1.8273   -0.632  0.877  0.649  0.904  0.397
##          Contextn-o  2.12414  1.4574   -0.719  0.627  0.000  0.593  0.128
##          Contexto-3  0.97831  0.9891   -0.108  0.431  0.418  0.478 -0.228
##          Contexto-7  1.42516  1.1938   -0.879  0.695  0.162  0.618  0.443
##          Contexto-n  2.60347  1.6135   -0.866  0.845  0.361  0.783  0.570
##          Contexto-o  1.74444  1.3208   -0.883  0.616  0.265  0.532  0.729
##                                                          
##                                                          
##                                                          
##                                                          
##                                                          
##                                                          
##                                                          
##                                                          
##   0.280                                                  
##   0.516  0.819                                           
##   0.723  0.852  0.817                                    
##   0.909  0.396  0.623  0.739                             
##   0.420  0.820  0.960  0.744  0.593                      
##   0.937  0.241  0.479  0.702  0.808  0.307               
##   0.916  0.423  0.554  0.824  0.765  0.395  0.965        
##   0.548 -0.015 -0.095  0.387  0.337 -0.272  0.719  0.724 
## Number of groups:  Item 45,  Subject 9 
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## Context3--3  -0.3011        NaN     NaN      NaN
## Context3-n    4.5714        NaN     NaN      NaN
## Context3-o   -0.6035        NaN     NaN      NaN
## Context7-n    2.4888        NaN     NaN      NaN
## Context7-o    0.3444        NaN     NaN      NaN
## Contextn-3    4.1471        NaN     NaN      NaN
## Contextn-7    2.8862        NaN     NaN      NaN
## Contextn-n    3.5184        NaN     NaN      NaN
## Contextn-o    4.3856        NaN     NaN      NaN
## Contexto-3   -0.3771        NaN     NaN      NaN
## Contexto-7   -0.8558        NaN     NaN      NaN
## Contexto-n    3.5827        NaN     NaN      NaN
## Contexto-o   -0.6766        NaN     NaN      NaN
## 
## Threshold coefficients:
##     Estimate Std. Error z value
## 1|2  -1.7329        NaN     NaN
## 2|3   0.5451        NaN     NaN
## 3|4   1.8206        NaN     NaN
## 4|5   3.9534        NaN     NaN
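
A minimal sketch of the caching pattern mentioned above, assuming a writable “outputs” folder and the (hypothetical) file name “mdl.clmm.Slope.rds”:

# path where the fitted model is cached (assumed; adjust to your project)
mdl_file <- "outputs/mdl.clmm.Slope.rds"
if (file.exists(mdl_file)) {
  # the model was fitted on a previous knit: just reload it
  mdl.clmm.Slope <- readRDS(mdl_file)
} else {
  # fit the (slow) model once and save it for future knits
  mdl.clmm.Slope <- clmm(Response ~ Context + (Context | Subject) + (1 | Item),
                         data = rating)
  saveRDS(mdl.clmm.Slope, mdl_file)
}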

4.5.1.3 Testing significance

We can evaluate whether “Context” improves the model fit by comparing a null model with our model. As the comparisons below show, “Context” clearly improves the fit.

mdl.clm.Null <- rating %>% 
  clm(Response ~ 1, data = .)
4.5.1.3.1 Null vs no random
anova(mdl.clm, mdl.clm.Null)
## Likelihood ratio tests of cumulative link models:
##  
##              formula:           link: threshold:
## mdl.clm.Null Response ~ 1       logit flexible  
## mdl.clm      Response ~ Context logit flexible  
## 
##              no.par    AIC  logLik LR.stat df Pr(>Chisq)    
## mdl.clm.Null      4 1281.1 -636.56                          
## mdl.clm          17 1086.3 -526.16   220.8 13  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
4.5.1.3.2 No random vs Random Intercepts
anova(mdl.clm, mdl.clmm.Int)
## Likelihood ratio tests of cumulative link models:
##  
##              formula:                                        link: threshold:
## mdl.clm      Response ~ Context                              logit flexible  
## mdl.clmm.Int Response ~ Context + (1 | Subject) + (1 | Item) logit flexible  
## 
##              no.par    AIC  logLik LR.stat df Pr(>Chisq)   
## mdl.clm          17 1086.3 -526.16                         
## mdl.clmm.Int     19 1079.8 -520.89  10.525  2   0.005182 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
4.5.1.3.3 Random Intercepts vs Random Slopes
anova(mdl.clmm.Int, mdl.clmm.Slope)
## Likelihood ratio tests of cumulative link models:
##  
##                formula:                                              link:
## mdl.clmm.Int   Response ~ Context + (1 | Subject) + (1 | Item)       logit
## mdl.clmm.Slope Response ~ Context + (Context | Subject) + (1 | Item) logit
##                threshold:
## mdl.clmm.Int   flexible  
## mdl.clmm.Slope flexible  
## 
##                no.par    AIC  logLik LR.stat  df Pr(>Chisq)
## mdl.clmm.Int       19 1079.8 -520.89                       
## mdl.clmm.Slope    123 1231.3 -492.63  56.521 104          1

The model comparisons above show that random intercepts are enough in our case: by-subject random slopes are not needed, i.e., subjects appear to behave similarly with respect to the items. (In our publication, by-rater random slopes for Context were needed.)

4.5.1.4 Model’s fit

print(tab_model(mdl.clmm.Int, file = paste0("outputs/mdl.clmm.Int.html")))
htmltools::includeHTML("outputs/mdl.clmm.Int.html")
                               Response
Predictors                     Odds Ratios   CI               p
1|2                            0.22          0.13 – 0.36      <0.001
2|3                            1.66          1.05 – 2.64      0.031
3|4                            4.97          3.02 – 8.18      <0.001
4|5                            26.90         14.73 – 49.11    <0.001
Context [3--3]                 0.85          0.27 – 2.65      0.778
Context [3-n]                  39.56         15.43 – 101.43   <0.001
Context [3-o]                  0.60          0.28 – 1.28      0.183
Context [7-n]                  10.78         3.90 – 29.78     <0.001
Context [7-o]                  1.39          0.63 – 3.07      0.418
Context [n-3]                  20.82         5.63 – 77.07     <0.001
Context [n-7]                  10.59         4.03 – 27.80     <0.001
Context [n-n]                  19.36         8.27 – 45.32     <0.001
Context [n-o]                  39.03         16.17 – 94.19    <0.001
Context [o-3]                  0.76          0.35 – 1.66      0.488
Context [o-7]                  0.48          0.23 – 1.01      0.054
Context [o-n]                  21.48         9.39 – 49.12     <0.001
Context [o-o]                  0.52          0.24 – 1.15      0.105
Random Effects
σ2                             3.29
τ00 Item                       0.00
τ00 Subject                    0.19
N Subject                      9
N Item                         45
Observations                   405
Marginal R2 / Conditional R2   0.453 / NA
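
The odds ratios reported by tab_model are simply the exponentiated model coefficients (which are on the log-odds scale). As a quick sanity check, here is a minimal sketch assuming mdl.clmm.Int from above:

exp(mdl.clmm.Int$beta[["Contextn-n"]])
# this should match the odds ratio of 19.36 reported for Context [n-n] above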

4.5.1.5 Interpreting a cumulative model

As a way to interpret the model, we can look at the coefficients and make sense of the results. A CLM is a logistic model with a cumulative link: the “Coefficients” are the estimates for each level of the fixed effect (relative to the reference level), and the “Threshold coefficients” are the cut-points of the ordinal response. For the former, a negative coefficient indicates a negative association with the response and a positive coefficient a positive association, with the p values indicating whether each level differs significantly from the reference. The threshold coefficients 1|2, 2|3, 3|4 and 4|5 mark the boundaries between adjacent ratings and increase from rating 1 to 5.
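
To make this concrete, here is a minimal sketch (assuming mdl.clmm.Int fitted above) that converts the estimate for one context (here “n-n”) into predicted probabilities for each rating category, for an “average” subject and item (random effects at zero):

theta   <- mdl.clmm.Int$Theta                   # thresholds 1|2, 2|3, 3|4, 4|5
beta_nn <- mdl.clmm.Int$beta[["Contextn-n"]]    # estimate for context n-n (log-odds)
cum_p   <- c(plogis(theta - beta_nn), 1)        # P(Response <= k) for k = 1..5
cat_p   <- diff(c(0, cum_p))                    # P(Response == k) for k = 1..5
round(cat_p, 3)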

4.5.1.6 Plotting

4.5.1.6.1 No confidence intervals

To visualise the effects, we use a modified version of a plotting function built on base R graphics. The version below is without confidence intervals.

# set up margins and an empty plot area: the x axis spans the range of the
# fixed-effect estimates (log-odds scale), the y axis the probabilities 0-1
par(oma=c(1, 0, 0, 3),mgp=c(2, 1, 0))
xlimNas = c(min(mdl.clmm.Int$beta), max(mdl.clmm.Int$beta))
ylimNas = c(0,1)
plot(0,0,xlim=xlimNas, ylim=ylimNas, type="n", ylab=expression(Probability), xlab="", xaxt = "n",main="Predicted curves - Nasalisation",cex=2,cex.lab=1.5,cex.main=1.5,cex.axis=1.5)
# label the x axis with the contexts, placed at their estimated coefficients
axis(side = 1, at = c(0,mdl.clmm.Int$beta),labels = levels(rating$Context), las=2,cex=2,cex.lab=1.5,cex.axis=1.5)
xsNas = seq(xlimNas[1], xlimNas[2], length.out=100)
# each curve is the probability of one rating category: the difference
# between adjacent cumulative probabilities plogis(Theta[k] - x)
lines(xsNas, plogis(mdl.clmm.Int$Theta[1] - xsNas), col='black')
lines(xsNas, plogis(mdl.clmm.Int$Theta[2] - xsNas)-plogis(mdl.clmm.Int$Theta[1] - xsNas), col='red')
lines(xsNas, plogis(mdl.clmm.Int$Theta[3] - xsNas)-plogis(mdl.clmm.Int$Theta[2] - xsNas), col='green')
lines(xsNas, plogis(mdl.clmm.Int$Theta[4] - xsNas)-plogis(mdl.clmm.Int$Theta[3] - xsNas), col='orange')
lines(xsNas, 1-(plogis(mdl.clmm.Int$Theta[4] - xsNas)), col='blue')
# dotted vertical lines mark each context's estimate; dashed horizontal lines form a grid
abline(v=c(0,mdl.clmm.Int$beta),lty=3)
abline(h=0, lty="dashed")
abline(h=0.2, lty="dashed")
abline(h=0.4, lty="dashed")
abline(h=0.6, lty="dashed")
abline(h=0.8, lty="dashed")
abline(h=1, lty="dashed")

legend(par('usr')[2], par('usr')[4], bty='n', xpd=NA,lty=1, col=c("black", "red", "green", "orange", "blue"), 
       legend=c("Oral", "2", "3", "4", "Nasal"),cex=0.75)

4.5.1.6.2 With confidence intervals

Here is an attempt to add approximate confidence bands to these plots, based on the standard errors of the threshold coefficients. This is an experimental attempt and any feedback is welcome! Note that the code below shifts each threshold by SE/1.96, which gives a narrower band than a conventional 95% interval (±1.96 × SE).

par(oma=c(1, 0, 0, 3),mgp=c(2, 1, 0))
xlimNas = c(min(mdl.clmm.Int$beta), max(mdl.clmm.Int$beta))
ylimNas = c(0,1)
plot(0,0,xlim=xlimNas, ylim=ylimNas, type="n", ylab=expression(Probability), xlab="", xaxt = "n",main="Predicted curves - Nasalisation",cex=2,cex.lab=1.5,cex.main=1.5,cex.axis=1.5)
axis(side = 1, at = c(0,mdl.clmm.Int$beta),labels = levels(rating$Context), las=2,cex=2,cex.lab=1.5,cex.axis=1.5)
xsNas = seq(xlimNas[1], xlimNas[2], length.out=100)


#+CI 
lines(xsNas, plogis(mdl.clmm.Int$Theta[1]+(summary(mdl.clmm.Int)$coefficient[,2][[1]]/1.96) - xsNas), col='black')
lines(xsNas, plogis(mdl.clmm.Int$Theta[2]+(summary(mdl.clmm.Int)$coefficient[,2][[2]]/1.96) - xsNas)-plogis(mdl.clmm.Int$Theta[1]+(summary(mdl.clmm.Int)$coefficient[,2][[1]]/1.96) - xsNas), col='red')
lines(xsNas, plogis(mdl.clmm.Int$Theta[3]+(summary(mdl.clmm.Int)$coefficient[,2][[3]]/1.96) - xsNas)-plogis(mdl.clmm.Int$Theta[2]+(summary(mdl.clmm.Int)$coefficient[,2][[2]]/1.96) - xsNas), col='green')
lines(xsNas, plogis(mdl.clmm.Int$Theta[4]+(summary(mdl.clmm.Int)$coefficient[,2][[4]]/1.96) - xsNas)-plogis(mdl.clmm.Int$Theta[3]+(summary(mdl.clmm.Int)$coefficient[,2][[3]]/1.96) - xsNas), col='orange')
lines(xsNas, 1-(plogis(mdl.clmm.Int$Theta[4]+(summary(mdl.clmm.Int)$coefficient[,2][[4]]/1.96) - xsNas)), col='blue')

#-CI 
lines(xsNas, plogis(mdl.clmm.Int$Theta[1]-(summary(mdl.clmm.Int)$coefficient[,2][[1]]/1.96) - xsNas), col='black')
lines(xsNas, plogis(mdl.clmm.Int$Theta[2]-(summary(mdl.clmm.Int)$coefficient[,2][[2]]/1.96) - xsNas)-plogis(mdl.clmm.Int$Theta[1]-(summary(mdl.clmm.Int)$coefficient[,2][[1]]/1.96) - xsNas), col='red')
lines(xsNas, plogis(mdl.clmm.Int$Theta[3]-(summary(mdl.clmm.Int)$coefficient[,2][[3]]/1.96) - xsNas)-plogis(mdl.clmm.Int$Theta[2]-(summary(mdl.clmm.Int)$coefficient[,2][[2]]/1.96) - xsNas), col='green')
lines(xsNas, plogis(mdl.clmm.Int$Theta[4]-(summary(mdl.clmm.Int)$coefficient[,2][[4]]/1.96) - xsNas)-plogis(mdl.clmm.Int$Theta[3]-(summary(mdl.clmm.Int)$coefficient[,2][[3]]/1.96) - xsNas), col='orange')
lines(xsNas, 1-(plogis(mdl.clmm.Int$Theta[4]-(summary(mdl.clmm.Int)$coefficient[,2][[4]]/1.96) - xsNas)), col='blue')

## fill area around CI using c(x, rev(x)), c(y2, rev(y1))
polygon(c(xsNas, rev(xsNas)),
        c(plogis(mdl.clmm.Int$Theta[1]+(summary(mdl.clmm.Int)$coefficient[,2][[1]]/1.96) - xsNas), rev(plogis(mdl.clmm.Int$Theta[1]-(summary(mdl.clmm.Int)$coefficient[,2][[1]]/1.96) - xsNas))), col = "gray90")

polygon(c(xsNas, rev(xsNas)),
        c(plogis(mdl.clmm.Int$Theta[2]+(summary(mdl.clmm.Int)$coefficient[,2][[2]]/1.96) - xsNas)-plogis(mdl.clmm.Int$Theta[1]+(summary(mdl.clmm.Int)$coefficient[,2][[1]]/1.96) - xsNas), rev(plogis(mdl.clmm.Int$Theta[2]-(summary(mdl.clmm.Int)$coefficient[,2][[2]]/1.96) - xsNas)-plogis(mdl.clmm.Int$Theta[1]-(summary(mdl.clmm.Int)$coefficient[,2][[1]]/1.96) - xsNas))), col = "gray90")


polygon(c(xsNas, rev(xsNas)),
        c(plogis(mdl.clmm.Int$Theta[3]+(summary(mdl.clmm.Int)$coefficient[,2][[3]]/1.96) - xsNas)-plogis(mdl.clmm.Int$Theta[2]+(summary(mdl.clmm.Int)$coefficient[,2][[2]]/1.96) - xsNas), rev(plogis(mdl.clmm.Int$Theta[3]-(summary(mdl.clmm.Int)$coefficient[,2][[3]]/1.96) - xsNas)-plogis(mdl.clmm.Int$Theta[2]-(summary(mdl.clmm.Int)$coefficient[,2][[2]]/1.96) - xsNas))), col = "gray90")

polygon(c(xsNas, rev(xsNas)),
        c(plogis(mdl.clmm.Int$Theta[4]+(summary(mdl.clmm.Int)$coefficient[,2][[4]]/1.96) - xsNas)-plogis(mdl.clmm.Int$Theta[3]+(summary(mdl.clmm.Int)$coefficient[,2][[3]]/1.96) - xsNas), rev(plogis(mdl.clmm.Int$Theta[4]-(summary(mdl.clmm.Int)$coefficient[,2][[4]]/1.96) - xsNas)-plogis(mdl.clmm.Int$Theta[3]-(summary(mdl.clmm.Int)$coefficient[,2][[3]]/1.96) - xsNas))), col = "gray90")

        
polygon(c(xsNas, rev(xsNas)),
        c(1-(plogis(mdl.clmm.Int$Theta[4]-(summary(mdl.clmm.Int)$coefficient[,2][[4]]/1.96) - xsNas)), rev(1-(plogis(mdl.clmm.Int$Theta[4]+(summary(mdl.clmm.Int)$coefficient[,2][[4]]/1.96) - xsNas)))), col = "gray90")       

lines(xsNas, plogis(mdl.clmm.Int$Theta[1] - xsNas), col='black')
lines(xsNas, plogis(mdl.clmm.Int$Theta[2] - xsNas)-plogis(mdl.clmm.Int$Theta[1] - xsNas), col='red')
lines(xsNas, plogis(mdl.clmm.Int$Theta[3] - xsNas)-plogis(mdl.clmm.Int$Theta[2] - xsNas), col='green')
lines(xsNas, plogis(mdl.clmm.Int$Theta[4] - xsNas)-plogis(mdl.clmm.Int$Theta[3] - xsNas), col='orange')
lines(xsNas, 1-(plogis(mdl.clmm.Int$Theta[4] - xsNas)), col='blue')
abline(v=c(0,mdl.clmm.Int$beta),lty=3)

abline(h=0, lty="dashed")
abline(h=0.2, lty="dashed")
abline(h=0.4, lty="dashed")
abline(h=0.6, lty="dashed")
abline(h=0.8, lty="dashed")
abline(h=1, lty="dashed")


legend(par('usr')[2], par('usr')[4], bty='n', xpd=NA,lty=1, col=c("black", "red", "green", "orange", "blue"), 
       legend=c("Oral", "2", "3", "4", "Nasal"),cex=0.75)

Check if the results are different between our initial model (with clm) and our new model (with clmm).
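
One quick way to do this (a sketch, assuming both models from above) is to put the fixed-effect estimates side by side and see how much they change once the random effects are added:

round(cbind(clm = mdl.clm$beta, clmm = mdl.clmm.Int$beta), 3)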

4.5.2 Subjective estimates of the weight of the referents of 81 English nouns.

This dataset comes from the languageR package. It contains subjective estimates of the weight of the referents of 81 English nouns. The dataset is a little complex: multiple subjects each rated the 81 nouns, each noun belongs to one of two classes (animal or plant), and the subjects are either male or female.

We can model these data in various ways. Here we explore whether the ratings given to a particular word differ depending on whether its class is animal or plant, and whether males rated the nouns differently from females; we also model the contribution of frequency. As a random effect, we only use the Subject-by-Word combination (Subject:Word).

4.5.2.1 Importing and pre-processing

weightRatings <- weightRatings %>%
  mutate(Rating = factor(Rating),
         Subject = factor(Subject),
         Sex = factor(Sex),
         Word = factor(Word),
         Class = factor(Class))
weightRatings %>% 
  head(10)
##    Subject Rating Trial Sex       Word Frequency  Class
## 1       A1      5     1   F      horse  7.771910 animal
## 2       A1      1     2   F    gherkin  2.079442  plant
## 3       A1      3     3   F   hedgehog  3.637586 animal
## 4       A1      1     4   F        bee  5.700444 animal
## 5       A1      1     5   F     peanut  4.595120  plant
## 6       A1      2     6   F       pear  4.727388  plant
## 7       A1      3     7   F  pineapple  3.988984  plant
## 8       A1      2     8   F       frog  5.129899 animal
## 9       A1      1     9   F blackberry  4.060443  plant
## 10      A1      3    10   F     pigeon  5.262690 animal

4.5.2.2 Model specifications

4.5.2.2.1 No random effects

We run our first model as a simple clm, i.e., with no random effects.

system.time(mdl.clm.1 <- weightRatings %>% 
  clm(Rating ~ Class * Sex  * Frequency, data = .))
##    user  system elapsed 
##    0.01    0.00    0.01
summary(mdl.clm.1)
4.5.2.2.2 Random effects 1 - Intercepts only

We now add random intercepts for the Subject:Word combinations.

system.time(mdl.clmm.Int.1 <- weightRatings %>% 
  clmm(Rating ~ Class * Sex  * Frequency + (1|Subject:Word), data = .))
##    user  system elapsed 
##   14.61    0.13   14.80
summary(mdl.clmm.Int.1)
4.5.2.2.3 Random effects 2 - Intercepts and Slopes

We then add random intercepts and random slopes for Class, both by the Subject:Word combination.

system.time(mdl.clmm.Slope.1 <- weightRatings %>% 
  clmm(Rating ~ Class * Sex  * Frequency + (Class|Subject:Word), data = .))
##    user  system elapsed 
##   42.14    1.61   44.05
summary(mdl.clmm.Slope.1)

4.5.2.3 Testing significance

We can evaluate whether our predictors (Class, Sex and Frequency, with their interactions) improve the model fit by comparing a null model with our full model. As the first comparison below shows, they clearly do.

mdl.clm.Null.1 <- weightRatings %>% 
  clm(Rating ~ 1, data = .)
4.5.2.3.1 Null vs no random
anova(mdl.clm.1, mdl.clm.Null.1)
## Likelihood ratio tests of cumulative link models:
##  
##                formula:                         link: threshold:
## mdl.clm.Null.1 Rating ~ 1                       logit flexible  
## mdl.clm.1      Rating ~ Class * Sex * Frequency logit flexible  
## 
##                no.par    AIC  logLik LR.stat df Pr(>Chisq)    
## mdl.clm.Null.1      6 5430.2 -2709.1                          
## mdl.clm.1          13 4800.2 -2387.1  644.05  7  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
4.5.2.3.2 No random vs Random Intercepts
anova(mdl.clm.1, mdl.clmm.Int.1)
## Likelihood ratio tests of cumulative link models:
##  
##                formula:                                              link:
## mdl.clm.1      Rating ~ Class * Sex * Frequency                      logit
## mdl.clmm.Int.1 Rating ~ Class * Sex * Frequency + (1 | Subject:Word) logit
##                threshold:
## mdl.clm.1      flexible  
## mdl.clmm.Int.1 flexible  
## 
##                no.par    AIC  logLik LR.stat df Pr(>Chisq)
## mdl.clm.1          13 4800.2 -2387.1                      
## mdl.clmm.Int.1     14 4801.5 -2386.8  0.6602  1     0.4165
4.5.2.3.3 Random Intercepts vs Random Slope
anova(mdl.clmm.Int.1, mdl.clmm.Slope.1)
## Likelihood ratio tests of cumulative link models:
##  
##                  formula:                                                 
## mdl.clmm.Int.1   Rating ~ Class * Sex * Frequency + (1 | Subject:Word)    
## mdl.clmm.Slope.1 Rating ~ Class * Sex * Frequency + (Class | Subject:Word)
##                  link: threshold:
## mdl.clmm.Int.1   logit flexible  
## mdl.clmm.Slope.1 logit flexible  
## 
##                  no.par    AIC  logLik LR.stat df Pr(>Chisq)    
## mdl.clmm.Int.1       14 4801.5 -2386.8                          
## mdl.clmm.Slope.1     16 4789.3 -2378.7  16.198  2  0.0003039 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The model comparisons above show a different pattern from the previous dataset: the random intercept for the Subject:Word combination does not significantly improve on the simple clm (p = 0.42), whereas adding by-Subject:Word random slopes for Class does improve the fit (p < 0.001). For illustration, we nevertheless continue with the random-intercept model in the sections below.

4.5.2.4 Model’s fit

print(tab_model(mdl.clmm.Int.1, file = paste0("outputs/mdl.clmm.Int.1.html")))
htmltools::includeHTML("outputs/mdl.clmm.Int.1.html")
                                      Rating
Predictors                            Odds Ratios   CI              p
1|2                                   0.42          0.35 – 0.50     <0.001
2|3                                   1.76          1.54 – 2.01     <0.001
3|4                                   4.67          4.15 – 5.26     <0.001
4|5                                   9.59          8.57 – 10.72    <0.001
5|6                                   22.51         22.50 – 22.52   <0.001
6|7                                   50.70         50.68 – 50.72   <0.001
Class [plant]                         0.31          0.16 – 0.60     <0.001
Sex [M]                               0.51          0.51 – 0.51     <0.001
Frequency                             1.27          1.27 – 1.28     <0.001
Class [plant] × Sex [M]               1.47          0.43 – 4.98     0.535
Class [plant] × Frequency             0.76          0.66 – 0.88     <0.001
Sex [M] × Frequency                   1.08          1.08 – 1.08     <0.001
Class [plant] × Sex [M] × Frequency   0.92          0.70 – 1.21     0.553
N Subject                             20
N Word                                81
Observations                          1620

4.5.2.5 Interpreting a cumulative model

As before, we can interpret the model by looking at the coefficients: the “Coefficients” are the estimates for the fixed effects (here Class, Sex, Frequency and their interactions) and the “Threshold coefficients” are the cut-points of the ordinal response. A negative coefficient indicates a negative association with the rating and a positive one a positive association, with the p values indicating significance. The threshold coefficients 1|2 through 6|7 mark the boundaries between adjacent ratings and increase from rating 1 to 7.

4.5.2.6 Plotting

4.5.2.6.1 No confidence intervals

As before, we visualise the effects with the modified base R plotting function; the version below is without confidence intervals.

par(oma = c(4, 0, 0, 3), mgp = c(2, 1, 0))
xlim  =  c(min(mdl.clmm.Int.1$beta), max(mdl.clmm.Int.1$beta))
ylim  =  c(0, 1)
plot(0, 0, xlim = xlim, ylim = ylim, type = "n", ylab = expression(Probability), xlab = "", xaxt = "n", main = "Predicted curves", cex = 2, cex.lab = 1.5, cex.main = 1.5, cex.axis = 1.5)
axis(side = 1, at = mdl.clmm.Int.1$beta, labels = names(mdl.clmm.Int.1$beta), las = 2, cex = 0.75, cex.lab = 0.75, cex.axis = 0.75)
xs  =  seq(xlim[1], xlim[2], length.out = 100)
lines(xs, plogis(mdl.clmm.Int.1$Theta[1] - xs), col = 'black')
lines(xs, plogis(mdl.clmm.Int.1$Theta[2] - xs) - plogis(mdl.clmm.Int.1$Theta[1] - xs), col = 'red')
lines(xs, plogis(mdl.clmm.Int.1$Theta[3] - xs) - plogis(mdl.clmm.Int.1$Theta[2] - xs), col = 'green')
lines(xs, plogis(mdl.clmm.Int.1$Theta[4] - xs) - plogis(mdl.clmm.Int.1$Theta[3] - xs), col = 'orange')
lines(xs, plogis(mdl.clmm.Int.1$Theta[5] - xs) - plogis(mdl.clmm.Int.1$Theta[4] - xs), col = 'yellow')
lines(xs, plogis(mdl.clmm.Int.1$Theta[6] - xs) - plogis(mdl.clmm.Int.1$Theta[5] - xs), col = 'grey')
lines(xs, 1 - (plogis(mdl.clmm.Int.1$Theta[6] - xs)), col = 'blue')
abline(v = c(0,mdl.clmm.Int.1$beta),lty = 3)
abline(h = 0, lty = "dashed")
abline(h = 0.2, lty = "dashed")
abline(h = 0.4, lty = "dashed")
abline(h = 0.6, lty = "dashed")
abline(h = 0.8, lty = "dashed")
abline(h = 1, lty = "dashed")

legend(par('usr')[2], par('usr')[4], bty = 'n', xpd = NA, lty = 1, 
       col = c("black", "red", "green", "orange", "yellow", "grey", "blue"), 
       legend = c("1", "2", "3", "4", "5", "6", "7"), cex = 0.75)

4.5.2.6.2 With confidence intervals

As before, here is the experimental attempt to add approximate confidence bands, based on the standard errors of the threshold coefficients. Any feedback is welcome!

par(oma = c(4, 0, 0, 3), mgp = c(2, 1, 0))
xlim  =  c(min(mdl.clmm.Int.1$beta), max(mdl.clmm.Int.1$beta))
ylim  =  c(0, 1)
plot(0, 0, xlim = xlim, ylim = ylim, type = "n", ylab = expression(Probability), xlab = "", xaxt = "n", main = "Predicted curves", cex = 2, cex.lab = 1.5, cex.main = 1.5, cex.axis = 1.5)
axis(side = 1, at = mdl.clmm.Int.1$beta, labels = names(mdl.clmm.Int.1$beta), las = 2, cex = 0.75, cex.lab = 0.75, cex.axis = 0.75)
xs  =  seq(xlim[1], xlim[2], length.out = 100)


#+CI 
lines(xs, plogis(mdl.clmm.Int.1$Theta[1]+(summary(mdl.clmm.Int.1)$coefficient[,2][[1]]/1.96) - xs), col='black')
lines(xs, plogis(mdl.clmm.Int.1$Theta[2]+(summary(mdl.clmm.Int.1)$coefficient[,2][[2]]/1.96) - xs)-plogis(mdl.clmm.Int.1$Theta[1]+(summary(mdl.clmm.Int.1)$coefficient[,2][[1]]/1.96) - xs), col='red')
lines(xs, plogis(mdl.clmm.Int.1$Theta[3]+(summary(mdl.clmm.Int.1)$coefficient[,2][[3]]/1.96) - xs)-plogis(mdl.clmm.Int.1$Theta[2]+(summary(mdl.clmm.Int.1)$coefficient[,2][[2]]/1.96) - xs), col='green')
lines(xs, plogis(mdl.clmm.Int.1$Theta[4]+(summary(mdl.clmm.Int.1)$coefficient[,2][[4]]/1.96) - xs)-plogis(mdl.clmm.Int.1$Theta[3]+(summary(mdl.clmm.Int.1)$coefficient[,2][[3]]/1.96) - xs), col='orange')
lines(xs, plogis(mdl.clmm.Int.1$Theta[5]+(summary(mdl.clmm.Int.1)$coefficient[,2][[5]]/1.96) - xs)-plogis(mdl.clmm.Int.1$Theta[4]+(summary(mdl.clmm.Int.1)$coefficient[,2][[4]]/1.96) - xs), col='yellow')
lines(xs, plogis(mdl.clmm.Int.1$Theta[6]+(summary(mdl.clmm.Int.1)$coefficient[,2][[6]]/1.96) - xs)-plogis(mdl.clmm.Int.1$Theta[5]+(summary(mdl.clmm.Int.1)$coefficient[,2][[5]]/1.96) - xs), col='grey')
lines(xs, 1-(plogis(mdl.clmm.Int.1$Theta[6]+(summary(mdl.clmm.Int.1)$coefficient[,2][[6]]/1.96) - xs)), col='blue')

#-CI 
lines(xs, plogis(mdl.clmm.Int.1$Theta[1]-(summary(mdl.clmm.Int.1)$coefficient[,2][[1]]/1.96) - xs), col='black')
lines(xs, plogis(mdl.clmm.Int.1$Theta[2]-(summary(mdl.clmm.Int.1)$coefficient[,2][[2]]/1.96) - xs)-plogis(mdl.clmm.Int.1$Theta[1]-(summary(mdl.clmm.Int.1)$coefficient[,2][[1]]/1.96) - xs), col='red')
lines(xs, plogis(mdl.clmm.Int.1$Theta[3]-(summary(mdl.clmm.Int.1)$coefficient[,2][[3]]/1.96) - xs)-plogis(mdl.clmm.Int.1$Theta[2]-(summary(mdl.clmm.Int.1)$coefficient[,2][[2]]/1.96) - xs), col='green')
lines(xs, plogis(mdl.clmm.Int.1$Theta[4]-(summary(mdl.clmm.Int.1)$coefficient[,2][[4]]/1.96) - xs)-plogis(mdl.clmm.Int.1$Theta[3]-(summary(mdl.clmm.Int.1)$coefficient[,2][[3]]/1.96) - xs), col='orange')
lines(xs, plogis(mdl.clmm.Int.1$Theta[5]-(summary(mdl.clmm.Int.1)$coefficient[,2][[5]]/1.96) - xs)-plogis(mdl.clmm.Int.1$Theta[4]-(summary(mdl.clmm.Int.1)$coefficient[,2][[4]]/1.96) - xs), col='yellow')
lines(xs, plogis(mdl.clmm.Int.1$Theta[6]-(summary(mdl.clmm.Int.1)$coefficient[,2][[6]]/1.96) - xs)-plogis(mdl.clmm.Int.1$Theta[5]-(summary(mdl.clmm.Int.1)$coefficient[,2][[5]]/1.96) - xs), col='grey')
lines(xs, 1-(plogis(mdl.clmm.Int.1$Theta[6]-(summary(mdl.clmm.Int.1)$coefficient[,2][[6]]/1.96) - xs)), col='blue')

## fill area around CI using c(x, rev(x)), c(y2, rev(y1))
polygon(c(xs, rev(xs)),
        c(plogis(mdl.clmm.Int.1$Theta[1]+(summary(mdl.clmm.Int.1)$coefficient[,2][[1]]/1.96) - xs), rev(plogis(mdl.clmm.Int.1$Theta[1]-(summary(mdl.clmm.Int.1)$coefficient[,2][[1]]/1.96) - xs))), col = "gray90")

polygon(c(xs, rev(xs)),
        c(plogis(mdl.clmm.Int.1$Theta[2]+(summary(mdl.clmm.Int.1)$coefficient[,2][[2]]/1.96) - xs)-plogis(mdl.clmm.Int.1$Theta[1]+(summary(mdl.clmm.Int.1)$coefficient[,2][[1]]/1.96) - xs), rev(plogis(mdl.clmm.Int.1$Theta[2]-(summary(mdl.clmm.Int.1)$coefficient[,2][[2]]/1.96) - xs)-plogis(mdl.clmm.Int.1$Theta[1]-(summary(mdl.clmm.Int.1)$coefficient[,2][[1]]/1.96) - xs))), col = "gray90")


polygon(c(xs, rev(xs)),
        c(plogis(mdl.clmm.Int.1$Theta[3]+(summary(mdl.clmm.Int.1)$coefficient[,2][[3]]/1.96) - xs)-plogis(mdl.clmm.Int.1$Theta[2]+(summary(mdl.clmm.Int.1)$coefficient[,2][[2]]/1.96) - xs), rev(plogis(mdl.clmm.Int.1$Theta[3]-(summary(mdl.clmm.Int.1)$coefficient[,2][[3]]/1.96) - xs)-plogis(mdl.clmm.Int.1$Theta[2]-(summary(mdl.clmm.Int.1)$coefficient[,2][[2]]/1.96) - xs))), col = "gray90")

polygon(c(xs, rev(xs)),
        c(plogis(mdl.clmm.Int.1$Theta[4]+(summary(mdl.clmm.Int.1)$coefficient[,2][[4]]/1.96) - xs)-plogis(mdl.clmm.Int.1$Theta[3]+(summary(mdl.clmm.Int.1)$coefficient[,2][[3]]/1.96) - xs), rev(plogis(mdl.clmm.Int.1$Theta[4]-(summary(mdl.clmm.Int.1)$coefficient[,2][[4]]/1.96) - xs)-plogis(mdl.clmm.Int.1$Theta[3]-(summary(mdl.clmm.Int.1)$coefficient[,2][[3]]/1.96) - xs))), col = "gray90")

polygon(c(xs, rev(xs)),
        c(plogis(mdl.clmm.Int.1$Theta[5]+(summary(mdl.clmm.Int.1)$coefficient[,2][[5]]/1.96) - xs)-plogis(mdl.clmm.Int.1$Theta[4]+(summary(mdl.clmm.Int.1)$coefficient[,2][[4]]/1.96) - xs), rev(plogis(mdl.clmm.Int.1$Theta[5]-(summary(mdl.clmm.Int.1)$coefficient[,2][[5]]/1.96) - xs)-plogis(mdl.clmm.Int.1$Theta[4]-(summary(mdl.clmm.Int.1)$coefficient[,2][[4]]/1.96) - xs))), col = "gray90")

polygon(c(xs, rev(xs)),
        c(plogis(mdl.clmm.Int.1$Theta[6]+(summary(mdl.clmm.Int.1)$coefficient[,2][[6]]/1.96) - xs)-plogis(mdl.clmm.Int.1$Theta[5]+(summary(mdl.clmm.Int.1)$coefficient[,2][[5]]/1.96) - xs), rev(plogis(mdl.clmm.Int.1$Theta[6]-(summary(mdl.clmm.Int.1)$coefficient[,2][[6]]/1.96) - xs)-plogis(mdl.clmm.Int.1$Theta[5]-(summary(mdl.clmm.Int.1)$coefficient[,2][[5]]/1.96) - xs))), col = "gray90")

        
polygon(c(xs, rev(xs)),
        c(1-(plogis(mdl.clmm.Int.1$Theta[6]-(summary(mdl.clmm.Int.1)$coefficient[,2][[6]]/1.96) - xs)), rev(1-(plogis(mdl.clmm.Int.1$Theta[6]+(summary(mdl.clmm.Int.1)$coefficient[,2][[6]]/1.96) - xs)))), col = "gray90")     



lines(xs, plogis(mdl.clmm.Int.1$Theta[1] - xs), col = 'black')
lines(xs, plogis(mdl.clmm.Int.1$Theta[2] - xs) - plogis(mdl.clmm.Int.1$Theta[1] - xs), col = 'red')
lines(xs, plogis(mdl.clmm.Int.1$Theta[3] - xs) - plogis(mdl.clmm.Int.1$Theta[2] - xs), col = 'green')
lines(xs, plogis(mdl.clmm.Int.1$Theta[4] - xs) - plogis(mdl.clmm.Int.1$Theta[3] - xs), col = 'orange')
lines(xs, plogis(mdl.clmm.Int.1$Theta[5] - xs) - plogis(mdl.clmm.Int.1$Theta[4] - xs), col = 'yellow')
lines(xs, plogis(mdl.clmm.Int.1$Theta[6] - xs) - plogis(mdl.clmm.Int.1$Theta[5] - xs), col = 'grey')
lines(xs, 1 - (plogis(mdl.clmm.Int.1$Theta[6] - xs)), col = 'blue')
abline(v = c(0,mdl.clmm.Int.1$beta),lty = 3)
abline(h = 0, lty = "dashed")
abline(h = 0.2, lty = "dashed")
abline(h = 0.4, lty = "dashed")
abline(h = 0.6, lty = "dashed")
abline(h = 0.8, lty = "dashed")
abline(h = 1, lty = "dashed")

legend(par('usr')[2], par('usr')[4], bty = 'n', xpd = NA, lty = 1, 
       col = c("black", "red", "green", "orange", "yellow", "grey", "blue"), 
       legend = c("1", "2", "3", "4", "5", "6", "7"), cex = 0.75)

Check if the results are different between our initial model (with clm) and our new model (with clmm).