1 Loading packages

## Use the code below to check whether all required packages are installed. Any package that is missing will be installed and then loaded. If every package is already installed, you can load them directly with library() instead.
requiredPackages = c('tidyverse', 'languageR')
for(p in requiredPackages){
  if(!require(p,character.only = TRUE)) install.packages(p)
  library(p,character.only = TRUE)
}
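
If both packages are already installed, the loop above is not needed and a direct load is enough. A minimal sketch, assuming tidyverse and languageR are already installed on your machine:

```r
# Load the packages directly (assumes they are already installed)
library(tidyverse)
library(languageR)

# Check that both packages are now attached to the search path
loaded <- c("package:tidyverse", "package:languageR") %in% search()
loaded
```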

2 The Tidyverse

2.1 Introduction

The Tidyverse is a family of packages that share a common design and make working with data in R faster and more consistent.

You first need to install it (if you haven’t already done so) and then load it. To install it, use Tools > Install Packages in RStudio, or run install.packages("tidyverse"). To load a package, use the library() function.

Look at how many packages are loaded with the Tidyverse. The startup messages tell you which packages are attached and which functions are in conflict (i.e., Tidyverse functions that share a name with, and therefore mask, functions from other packages). If you want to use the original function, simply write package_name::function().
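
As an illustration of such a conflict, dplyr::filter() masks stats::filter() once dplyr is loaded. The sketch below assumes dplyr is installed; the data frame `toy` and its values are made up for illustration only.

```r
library(dplyr)

# Toy data frame, made up for illustration only
toy <- data.frame(id = 1:5, score = c(10, 20, 30, 40, 50))

# After loading dplyr, filter() refers to dplyr::filter(), which keeps matching rows
kept <- filter(toy, score > 25)

# The masked function is still available with the package_name:: prefix.
# stats::filter() applies a linear filter (here a 3-point moving sum) to a series.
moving_sum <- stats::filter(1:5, rep(1, 3))

kept
moving_sum
```

The same function name therefore does two unrelated jobs, and the package prefix is what removes the ambiguity.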

2.2 Visualisation

In the tidyverse, the package for making elegant plots is called ggplot2. Building a plot works a lot like chaining steps with pipes, but since ggplot2 was originally designed as a separate package, before the pipe existed, it uses + instead of %>% to add layers.

2.2.1 First steps

2.2.1.1 Empty plot area

Let’s produce a basic plot with nothing drawn on it. This is the empty plotting area that ggplot2 creates. We then need to add layers on top of it to show our data.

english %>% 
  ggplot() +
  theme_bw()

2.2.1.2 Adding x and y values

Let’s add the x and y values from our dataset: x = subjective familiarity rating, y = RT in the visual lexical decision task.

english %>% 
  ggplot(aes(x = Familiarity, 
             y = RTlexdec)) +
  theme_bw()

The axes are now scaled and labelled, but still no data are drawn. We need to tell ggplot2 which geometric object (geom) to use for plotting.

2.2.1.3 Adding geoms

Geoms are the geometric objects built into ggplot2 (points, lines, bars, and so on); each one produces a different type of plot.

english %>% 
  ggplot(aes(x = Familiarity, 
             y = RTlexdec)) +
  theme_bw() +
  geom_point()
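
Because each + returns a new plot object, a plot can also be stored in a variable and extended step by step. A minimal sketch, assuming ggplot2 and languageR are installed; the names `base_plot` and `scatter` are hypothetical and chosen only for this example.

```r
library(ggplot2)
data(english, package = "languageR")

# Store the empty plot: data and aesthetics only, no geom yet
base_plot <- ggplot(english, aes(x = Familiarity, y = RTlexdec)) +
  theme_bw()

# Add a layer of points; base_plot itself is left unchanged
scatter <- base_plot + geom_point()

# Count the layers in each object to see what the geom added
n_before <- length(base_plot$layers)
n_after <- length(scatter$layers)

scatter
```

Counting layers this way makes it visible that an empty plot has no layers, and that each geom adds exactly one.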

2.2.1.4 Adding line of best fit

We will add a line of best fit. This is used to evaluate the presence or absence of a linear relationship between two numeric variables.

english %>% 
  ggplot(aes(x = Familiarity, 
             y = RTlexdec)) +
  theme_bw() +
  geom_point() +
  geom_smooth(method = "lm") # line of best fit based on the lm() method
`geom_smooth()` using formula 'y ~ x'

The result shows a nice negative correlation! RT in the lexical decision task decreases as the familiarity rating increases.
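
The visual impression can be checked numerically. The sketch below, which assumes languageR is installed, computes the Pearson correlation and the slope of the straight line that geom_smooth(method = "lm") draws; it is a quick sanity check rather than a full analysis.

```r
data(english, package = "languageR")

# Pearson correlation between familiarity rating and lexical decision RT
r <- cor(english$Familiarity, english$RTlexdec)

# Fit the same simple linear regression that the trend line is based on
fit <- lm(RTlexdec ~ Familiarity, data = english)
slope <- coef(fit)[["Familiarity"]]

r      # a negative value indicates a negative linear relationship
slope  # a negative slope means RT decreases as familiarity increases
```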

We can ask, are there differences related to the word category, i.e., verb vs noun?

2.2.1.5 By word category

We change colour by the levels of word category:

english %>% 
  ggplot(aes(x = Familiarity, 
             y = RTlexdec,
             colour = WordCategory)) + # add colour to the base aesthetics
  theme_bw() +
  geom_point() +
  geom_smooth(method = "lm")
`geom_smooth()` using formula 'y ~ x'

2.2.1.6 Making final touches

Let’s add a title and a subtitle, change the x and y labels, increase the text size, move the legend, and change the colours and labels of the categories.

english %>% 
  ggplot(aes(x = Familiarity, 
             y = RTlexdec,
             colour = WordCategory)) + # add colour to the base aesthetics
  theme_bw() +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(x = "Familiarity rating", y = "RT Lexical Decision", title = "Familiarity rating vs RT in a lexical decision task", subtitle = "with a trend line") + # add labels
  theme(text = element_text(size = 15)) + # increase text size throughout the plot
  theme(legend.position = "bottom", legend.title = element_blank()) + # remove legend title and change position
  scale_color_manual(labels = c("Nouns", "Verbs"), values = c("blue", "red")) # change colours and names of legend
`geom_smooth()` using formula 'y ~ x'

To choose colours, you can use the colourpicker addin in RStudio. Run colours() to list all colour names built into R, and prefer palettes that are colour-blind friendly.
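
One possible way to get colour-blind-friendly colours without extra packages is the Okabe-Ito palette shipped with base R. This sketch assumes R 4.0.0 or later (for palette.colors()) and that ggplot2 and languageR are installed; the choice of the second and third colours is arbitrary.

```r
library(ggplot2)
data(english, package = "languageR")

# Take two colours from the Okabe-Ito palette, skipping the first one (black)
cb_colours <- unname(palette.colors(n = 3, palette = "Okabe-Ito"))[2:3]

p_cb <- ggplot(english, aes(x = Familiarity, y = RTlexdec, colour = WordCategory)) +
  theme_bw() +
  geom_point() +
  scale_colour_manual(labels = c("Nouns", "Verbs"), values = cb_colours)

p_cb
```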

2.2.2 Activity on your own 3

Work through a few examples to plot data.

2.3 Additional plots

We looked above at one type of plot (with points). Many other types of plots are available.

2.3.1 A bar plot

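geom_bar() counts the number of observations at each value of x and draws one bar per value. Because RTlexdec is continuous, the bars tend to be very thin; the histogram in section 2.3.2 groups values into bins instead.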

english %>%
  ggplot(aes(x = RTlexdec, 
             colour = AgeSubject)) +
  theme_bw() +
  geom_bar()

And another view with error bars! This is a nice example that shows how you can chain multiple steps with the pipe:

  • Group by age of subject
  • Compute mean and SD
  • Use ggplot2 syntax to plot a barplot and error bars

english %>%
  group_by(AgeSubject) %>%
  summarise(
    sd = sd(RTlexdec),
    RTlexdecM = mean(RTlexdec)
  ) %>% 
  ggplot(aes(x = AgeSubject, 
             y = RTlexdecM)) +
  theme_bw() +
  geom_col(fill = "lightgray", color = "black") +
  geom_errorbar(aes(ymin = RTlexdecM-sd, ymax = RTlexdecM+sd), width = 0.2)
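
It can help to look at the intermediate summary table before it is passed on to ggplot(). A minimal sketch of the first two steps only, assuming dplyr and languageR are installed; `rt_summary` is a hypothetical name.

```r
library(dplyr)
data(english, package = "languageR")

# Steps 1 and 2 of the chain: group by age, then compute mean and SD
rt_summary <- english %>%
  group_by(AgeSubject) %>%
  summarise(
    RTlexdecM = mean(RTlexdec),
    sd = sd(RTlexdec)
  )

# One row per age group; these are the values that the bars and error bars display
rt_summary
```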

2.3.2 A histogram

A histogram shows the distribution of a numeric variable, here RTlexdec for each age group.

english %>%
  ggplot(aes(x = RTlexdec, 
             colour = AgeSubject)) +
  theme_bw() +
  geom_histogram(fill = "white") +
  scale_color_manual(values = c("red", "blue"))
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
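
The message above suggests choosing the bin width explicitly. A minimal sketch, assuming ggplot2 and languageR are installed; the value 0.05 is an arbitrary choice for illustration, not a recommendation.

```r
library(ggplot2)
data(english, package = "languageR")

# Same histogram, but with an explicit bin width instead of the default 30 bins
p_hist <- ggplot(english, aes(x = RTlexdec, colour = AgeSubject)) +
  theme_bw() +
  geom_histogram(binwidth = 0.05, fill = "white") +
  scale_color_manual(values = c("red", "blue"))

p_hist
```

Trying a few different values of binwidth is a quick way to see how much the apparent shape depends on the binning.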

2.3.3 A density plot

A density plot also shows the distribution of the variable. We see that the two age groups have different means. We can superimpose the density plot on the histogram or draw the density plot on its own.

# histogram and density plot
english %>%
  ggplot(aes(x = RTlexdec, 
             colour = AgeSubject)) +
  theme_bw() +
  geom_histogram(aes(y = after_stat(density)), fill = "white") + # after_stat() replaces the deprecated ..density.. notation
  scale_color_manual(values = c("red", "blue")) +
  geom_density()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

# density plot only
english %>%
  ggplot(aes(x = RTlexdec, 
             colour = AgeSubject)) +
  theme_bw() +
  geom_density()

2.3.4 A boxplot

A boxplot summarises a distribution with the median, the quartiles (25% and 75%), the whiskers, and outliers. Looking at the medians, we see a clear difference between the two distributions.

english %>%
  ggplot(aes(x = AgeSubject, 
             y = RTlexdec)) +
  theme_bw() +
  geom_boxplot()
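
The values that the box itself is drawn from can also be computed directly. A minimal sketch, assuming dplyr and languageR are installed; the column names `q25`, `med`, `q75` and `iqr` are hypothetical.

```r
library(dplyr)
data(english, package = "languageR")

# Quartiles, median and interquartile range of RTlexdec per age group
box_stats <- english %>%
  group_by(AgeSubject) %>%
  summarise(
    q25 = quantile(RTlexdec, 0.25),
    med = median(RTlexdec),
    q75 = quantile(RTlexdec, 0.75),
    iqr = IQR(RTlexdec)
  )

box_stats
```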

2.3.5 A violin plot

A violin plot shows the full shape of each distribution as a mirrored density curve. Unlike the boxplot, it does not mark the median, quartiles, or outliers by default. We again see a clear difference between the two distributions.

english %>%
  ggplot(aes(x = AgeSubject, 
             y = RTlexdec)) +
  theme_bw() +
  geom_violin()
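
One common way to get both views at once is to draw a narrow boxplot on top of the violin. A minimal sketch, assuming ggplot2 and languageR are installed; the width of 0.1 is an arbitrary choice.

```r
library(ggplot2)
data(english, package = "languageR")

# Violin for the shape of the distribution, narrow boxplot for median and quartiles
p_violin <- ggplot(english, aes(x = AgeSubject, y = RTlexdec)) +
  theme_bw() +
  geom_violin() +
  geom_boxplot(width = 0.1)

p_violin
```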

2.4 Facet_grid

The plots we used so far allowed us to plot data as a function of one categorical variable, e.g., AgeSubject. What if we wanted to show the different patterns emerging when combining AgeSubject (old vs young), WordCategory (Noun or Verb), CV (Consonant or Vowel) and Voice (Voiced and Voiceless)? What if we also wanted to modify the labels and the order of levels of variables?

We will start slowly below to show how we can combine two categorical variables and then extend this to additional ones.

2.4.1 Two categorical variables

2.4.1.1 First steps

Here we obtain a boxplot with two categorical variables, AgeSubject and WordCategory.

english %>%
  ggplot(aes(x = AgeSubject, 
             y = RTlexdec)) +
  theme_bw() +
  geom_boxplot() +
  facet_grid(~ WordCategory)

2.4.1.2 Changing order of levels within a variable and its labels

What would you do to change both the order of levels within a variable and its labels? We want to change the order for AgeSubject to Young vs Old (rather than old vs young) and change the labels of WordCategory from N vs V to Noun vs Verb.

Work on this with your peers. Answer is below!

2.4.1.2.1 Activity on your own 1
english %>%
  ggplot(aes(x = AgeSubject, 
             y = RTlexdec)) +
  theme_bw() +
  geom_boxplot() +
  facet_grid(~ WordCategory)

2.4.1.2.2 Answer
english %>%
  mutate(AgeSubject = factor(AgeSubject, levels = c("young", "old"), labels = c("Young", "Old")),
         WordCategory = factor(WordCategory, labels = c("Noun", "Verb"))) %>% 
  ggplot(aes(x = AgeSubject, 
             y = RTlexdec)) +
  theme_bw() +
  geom_boxplot() +
  facet_grid(~ WordCategory)
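
An alternative to factor() is the forcats package (part of the Tidyverse), which names the old and new labels explicitly. A minimal sketch on small made-up vectors, assuming forcats is installed; `age` and `word_category` are toy stand-ins for the real columns.

```r
library(forcats)

# Toy factors, made up for illustration only
age <- factor(c("old", "young", "old", "young"))
word_category <- factor(c("N", "V", "V", "N"))

# Move "young" to the front, then rename both levels (new_label = "old_label")
age_reordered <- fct_recode(fct_relevel(age, "young"), Young = "young", Old = "old")

# Rename the levels without changing their order
category_relabelled <- fct_recode(word_category, Noun = "N", Verb = "V")

levels(age_reordered)
levels(category_relabelled)
```

Because each new label is paired with its old one, this form does not depend on remembering the current order of the levels.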

2.4.2 Three or more categorical variables

Let us obtain a boxplot with four categorical variables: AgeSubject, WordCategory, CV and Voice. We still need to change the labels. We can also add margins = TRUE to add panels that pool the data across each faceting variable (labelled (all)), and scales = "free" to let the axis limits vary across panels.

Of course, this figure is complex enough that it needs careful interpretation, but it shows how facet_grid can bring in more categorical variables. This visualisation suggests that the 4-way combination adds little: in every panel we see the same clear difference between “Young” and “Old” participants, with “Young” being faster than “Old” participants.

english %>%
  mutate(AgeSubject = factor(AgeSubject, levels = c("young", "old"), labels = c("Young", "Old")),
         WordCategory = factor(WordCategory, labels = c("Noun", "Verb")),
         CV = factor(CV, labels = c("Consonant", "Vowel"))) %>% 
  ggplot(aes(x = AgeSubject, 
             y = RTlexdec)) +
  theme_bw() +
  geom_boxplot() +
  facet_grid(CV + Voice ~ WordCategory, margins = TRUE, scales = "free")
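
With several faceting variables, the panel strips can become hard to read. One option is the label_both labeller from ggplot2, which prints the variable name next to each level. A minimal sketch, assuming ggplot2 and languageR are installed and leaving the factor labels as they are in the raw data:

```r
library(ggplot2)
data(english, package = "languageR")

# label_both shows "variable: level" in each strip instead of the level alone
p_facets <- ggplot(english, aes(x = AgeSubject, y = RTlexdec)) +
  theme_bw() +
  geom_boxplot() +
  facet_grid(CV + Voice ~ WordCategory, labeller = label_both)

p_facets
```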

2.4.3 Comparing two numeric outcomes

What if we want to compare reaction time in the lexical decision task (RTlexdec) with reaction time in naming (RTnaming)? We want to see whether there are differences related to AgeSubject and WordCategory. We use pivot_longer here to change the format of our table from wide to long, and then change the labels and use facet_grid.

english %>%
  select(RTlexdec, RTnaming, AgeSubject, WordCategory) %>% 
  pivot_longer(cols = c(RTlexdec, RTnaming),
               names_to = "variable",
               values_to = "values") %>% 
  mutate(AgeSubject = factor(AgeSubject, levels = c("young", "old"), labels = c("Young", "Old")),
         WordCategory = factor(WordCategory, labels = c("Noun", "Verb"))) %>% 
  ggplot(aes(x = variable, 
             y = values)) +
  theme_bw() +
  geom_boxplot() +
  facet_grid(AgeSubject ~ WordCategory, margins = TRUE, scales = "free")
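
To see what pivot_longer does to the shape of the table, it helps to try it on a tiny example first. The sketch below assumes tidyr is installed; the data frame `wide` and its values are made up for illustration only.

```r
library(tidyr)

# Toy wide table: one row per word, one column per task (values are made up)
wide <- data.frame(
  word = c("a", "b"),
  RTlexdec = c(6.5, 6.7),
  RTnaming = c(6.1, 6.3)
)

# Stack the two RT columns: their names go to "variable", their contents to "values"
long <- pivot_longer(wide,
                     cols = c(RTlexdec, RTnaming),
                     names_to = "variable",
                     values_to = "values")

long
```

Each original row becomes two rows, one per task, which is what allows the task to be mapped to the x-axis.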

2.5 Exporting images

When you use R Markdown, your figures are already embedded within the generated output. If you are using an R script and/or want to add the figure to a different document, you can use the following code:

jpeg(filename = "test.jpeg", width = 15, height = 15, units = "cm", res = 300)

english %>%
  select(RTlexdec, RTnaming, AgeSubject, WordCategory) %>% 
  pivot_longer(cols = c(RTlexdec, RTnaming),
               names_to = "variable",
               values_to = "values") %>% 
  mutate(AgeSubject = factor(AgeSubject, levels = c("young", "old"), labels = c("Young", "Old")),
         WordCategory = factor(WordCategory, labels = c("Noun", "Verb"))) %>% 
  ggplot(aes(x = variable, 
             y = values)) +
  theme_bw() +
  geom_boxplot() +
  facet_grid(AgeSubject ~ WordCategory, margins = TRUE, scales = "free")
dev.off()
null device 
          1 

The image is automatically saved into your working directory and you can import it into your Word document.

You can use any graphics device to save the output: JPEG, PNG, PDF, TIFF, etc. From an R script, you can also run the plotting code so that the image appears in the “Plots” area, then click on Export to save the image.
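
ggplot2 also provides ggsave(), which saves a plot object without opening and closing a device by hand. A minimal sketch, assuming ggplot2 and languageR are installed; it writes to a temporary folder so that nothing in your working directory is overwritten, and the file name is only an example.

```r
library(ggplot2)
data(english, package = "languageR")

# Store the plot in an object so it can be passed to ggsave()
p_save <- ggplot(english, aes(x = AgeSubject, y = RTlexdec)) +
  theme_bw() +
  geom_boxplot()

# The file extension (.png here) determines the output format
out_file <- file.path(tempdir(), "test.png")
ggsave(filename = out_file, plot = p_save,
       width = 15, height = 15, units = "cm", dpi = 300)

file.exists(out_file)
```

Replace `out_file` with a plain file name such as "test.png" to save into the working directory instead.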

2.6 Conclusion

As you can see, visualisations in R using the Tidyverse provide you with many options and you can explore these further.

In the next section, we start with initial inferential statistics.

2.7 Going further

See the ggplot2 reference documentation for a full list of geoms. This will help you in thinking about visualisation.

See the ggplot2 extensions gallery for additional packages that enhance plots.

3 End of the session

This is the end of the third session. We used the Tidyverse packages to manipulate objects. We then obtained basic summaries and basic plots, and looked at how to build a plot from scratch.

Next week, we will look at basic inferential statistics.

4 Session info

sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252 
[2] LC_CTYPE=English_United Kingdom.1252   
[3] LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
 [1] languageR_1.5.0 forcats_0.5.1   stringr_1.4.0   dplyr_1.0.7    
 [5] purrr_0.3.4     readr_2.0.2     tidyr_1.1.4     tibble_3.1.5   
 [9] ggplot2_3.3.5   tidyverse_1.3.1

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.7            lubridate_1.8.0       lattice_0.20-45      
 [4] assertthat_0.2.1      digest_0.6.28         utf8_1.2.2           
 [7] R6_2.5.1              cellranger_1.1.0      backports_1.3.0      
[10] reprex_2.0.1          evaluate_0.14         httr_1.4.2           
[13] pillar_1.6.4          rlang_0.4.12          readxl_1.3.1         
[16] rstudioapi_0.13       minqa_1.2.4           jquerylib_0.1.4      
[19] nloptr_1.2.2.2        Matrix_1.3-4          rmarkdown_2.11       
[22] labeling_0.4.2        splines_4.1.2         lme4_1.1-27.1        
[25] munsell_0.5.0         broom_0.7.10          compiler_4.1.2       
[28] modelr_0.1.8          xfun_0.27             pkgconfig_2.0.3      
[31] PresenceAbsence_1.1.9 mgcv_1.8-38           htmltools_0.5.2      
[34] tidyselect_1.1.1      fansi_0.5.0           crayon_1.4.2         
[37] tzdb_0.2.0            dbplyr_2.1.1          withr_2.4.2          
[40] MASS_7.3-54           psycho_0.6.1          grid_4.1.2           
[43] nlme_3.1-153          jsonlite_1.7.2        gtable_0.3.0         
[46] lifecycle_1.0.1       DBI_1.1.1             magrittr_2.0.1       
[49] scales_1.1.1          cli_3.1.0             stringi_1.7.5        
[52] farver_2.1.0          fs_1.5.0              xml2_1.3.2           
[55] bslib_0.3.1           ellipsis_0.3.2        generics_0.1.1       
[58] vctrs_0.3.8           boot_1.3-28           tools_4.1.2          
[61] glue_1.4.2            hms_1.1.1             fastmap_1.1.0        
[64] yaml_2.2.1            colorspace_2.0-2      rvest_1.0.2          
[67] knitr_1.36            haven_2.4.3           sass_0.4.0           
