Loading packages
## Use the code below to check whether you have all required packages installed. Any that are missing will be installed, and all of them are then loaded. If you already have every package installed, you can load them directly with library().
requiredPackages <- c("tidyverse", "languageR")
for (p in requiredPackages) {
  # install the package only if it cannot be loaded, then load it
  if (!require(p, character.only = TRUE)) install.packages(p)
  library(p, character.only = TRUE)
}
The Tidyverse
Introduction
The Tidyverse is a family of packages used to speed up the use of R.
You first need to install it (if you haven’t already done so) and then load it. To install, use Tools > Install packages or install.packages(), then add tidyverse. To load a package, use the library() function.
Look at how many packages are loaded with the Tidyverse. The messages you see are telling you which packages are loaded and which functions are in conflict (i.e., functions from other packages that share a name with a function in the Tidyverse and are masked by it). If you want to use the original function, simply call it as package_name::function.
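As a minimal sketch of the package_name::function syntax, the example below uses filter(), which exists both in dplyr and in the stats package. It assumes dplyr is installed and uses the built-in mtcars dataset purely for illustration; the object names four_cyl, x and smoothed are made up for this example.

```r
# Assumes dplyr (part of the tidyverse) is installed; mtcars ships with R.
library(dplyr)

# dplyr::filter() keeps the rows of a data frame that match a condition.
four_cyl <- dplyr::filter(mtcars, cyl == 4)
nrow(four_cyl)

# The masked function from the stats package is still reachable through
# its full name: here it computes a centred 3-point moving average of x.
x <- c(1, 2, 3, 4, 5)
smoothed <- stats::filter(x, rep(1 / 3, 3))
smoothed
```

Writing the package name explicitly makes it clear to a reader which of the two functions is intended, whatever the loading order of the packages.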
Visualisation
In the tidyverse, the package for making elegant plots is called ggplot2. It works a lot like how pipes work, but since it was originally designed as a separate package, it uses + instead of %>%.
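To make the difference between the two operators concrete, here is a small sketch, assuming ggplot2, dplyr and languageR are installed. The pipe passes the data into ggplot(), while + adds components to the plot; the names p_piped and p_direct are only illustrative.

```r
library(ggplot2)
library(dplyr)      # provides the %>% pipe
library(languageR)  # provides the english dataset

# The pipe hands the dataset to ggplot(); + then adds a theme to the plot.
p_piped <- english %>%
  ggplot() +
  theme_bw()

# The same plot written without the pipe.
p_direct <- ggplot(data = english) +
  theme_bw()
```

Both objects hold the same data; the piped form is convenient when the data are first transformed with other tidyverse functions.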
First steps
Empty plot area
Let’s produce a basic plot with nothing drawn on it. This is the basic plotting area in R. We then need to add layers on top of it to show our plot.
english %>%
ggplot() +
theme_bw()
Adding x and y values
Let’s add the x and y values from our dataset: x = subjective familiarity rating, y = RT in the visual lexical decision task.
english %>%
ggplot(aes(x = Familiarity,
y = RTlexdec)) +
theme_bw()
The plot area is still empty: the axes are now set up for the two variables, but no data are drawn. We need to tell ggplot2 to add a geometric function for plotting.
Adding geoms
Geoms are integrated within ggplot2 to obtain various types of plots.
english %>%
ggplot(aes(x = Familiarity,
y = RTlexdec)) +
theme_bw() +
geom_point()
Adding line of best fit
We will add a line of best fit. This helps to evaluate the presence or absence of a relationship between two numeric variables.
english %>%
ggplot(aes(x = Familiarity,
y = RTlexdec)) +
theme_bw() +
geom_point() +
geom_smooth(method = "lm") # line of best fit based on the lm() method
`geom_smooth()` using formula 'y ~ x'
The result shows a nice negative correlation! RT lexical decision decreases when familiarity rating increases.
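The direction of the trend can also be checked numerically. The sketch below, which assumes languageR is installed, computes the Pearson correlation and the slope of the same linear model that geom_smooth(method = "lm") draws; the names r, fit and slope are illustrative only.

```r
library(languageR)  # provides the english dataset

# Pearson correlation between familiarity and lexical decision RT
r <- cor(english$Familiarity, english$RTlexdec, use = "complete.obs")

# Linear model behind the line of best fit; the slope gives the change
# in RTlexdec for a one-unit increase in Familiarity
fit <- lm(RTlexdec ~ Familiarity, data = english)
slope <- coef(fit)[["Familiarity"]]

r
slope
```

A negative correlation and a negative slope both correspond to the downward line seen in the plot.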
We can ask, are there differences related to the word category, i.e., verb vs noun?
By word category
We change colour by the levels of word category.
english %>%
ggplot(aes(x = Familiarity,
y = RTlexdec,
colour = WordCategory)) + # add colour to the base aesthetics
theme_bw() +
geom_point() +
geom_smooth(method = "lm")
`geom_smooth()` using formula 'y ~ x'
Making final touches
Let’s add a title and a subtitle, change the x and y labels, increase the text size of the plot, and change the colours of the categories.
english %>%
ggplot(aes(x = Familiarity,
y = RTlexdec,
colour = WordCategory)) + # add colour to the base aesthetics
theme_bw() +
geom_point() +
geom_smooth(method = "lm") +
labs(x = "Familiarity rating", y = "RT Lexical Decision", title = "Familiarity rating vs RT in a lexical decision task", subtitle = "with a trend line") + # add labels
theme(text = element_text(size = 15)) + # increase the text size across the plot
theme(legend.position = "bottom", legend.title = element_blank()) + # remove legend title and change position
scale_color_manual(labels = c("Nouns", "Verbs"), values = c("blue", "red")) # change colours and names of legend
`geom_smooth()` using formula 'y ~ x'
To choose colours, use the colourpicker addin from above. See this link for a full list of the colours available. Use colours that are colour-blind friendly (see here).
Activity on your own 3
Work through a few examples to plot data
Additional plots
We looked above at one example of plots (with points). We could use additional types of plots.
A bar plot
This shows a bar plot of the dataset. By default, geom_bar() counts the number of observations at each value of x.
english %>%
ggplot(aes(x = RTlexdec,
colour = AgeSubject)) +
theme_bw() +
geom_bar()
And another view with error bars! This is a nice example that shows how you can chain multiple steps with the pipe:
- Group by age of subject
- Compute the mean and SD
- Use ggplot2 syntax to plot a bar plot with error bars
english %>%
group_by(AgeSubject) %>%
summarise(
sd = sd(RTlexdec),
RTlexdecM = mean(RTlexdec)
) %>%
ggplot(aes(x = AgeSubject,
y = RTlexdecM)) +
theme_bw() +
geom_col(fill = "lightgray", color = "black") +
geom_errorbar(aes(ymin = RTlexdecM-sd, ymax = RTlexdecM+sd), width = 0.2)
A histogram
A histogram looks at the distribution of a numeric variable by counting the observations that fall into each bin.
english %>%
ggplot(aes(x = RTlexdec,
colour = AgeSubject)) +
theme_bw() +
geom_histogram(fill = "white") +
scale_color_manual(values = c("red", "blue"))
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
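The message above is a reminder that the default of 30 bins is rarely the best choice. As a sketch, the bin width can be set explicitly with the binwidth argument of geom_histogram(); the value 0.05 used here is an arbitrary choice for illustration, not a recommendation, and the example assumes ggplot2 and languageR are installed.

```r
library(ggplot2)
library(languageR)  # provides the english dataset

# Same histogram as above, but with an explicit bin width
# (0.05 is an illustrative value on the scale of RTlexdec)
p_hist <- ggplot(english, aes(x = RTlexdec, colour = AgeSubject)) +
  theme_bw() +
  geom_histogram(binwidth = 0.05, fill = "white") +
  scale_color_manual(values = c("red", "blue"))

p_hist
```

Trying a few different widths is a quick way to see how much the apparent shape of the distribution depends on the binning.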
A density plot
This looks at the distribution of the variable as a smooth curve. We see that the two age groups have different means. We can superpose the density plot on top of the histogram or have the density plot on its own.
# histogram and density plot
english %>%
ggplot(aes(x = RTlexdec,
colour = AgeSubject)) +
theme_bw() +
geom_histogram(aes(y = after_stat(density)), fill = "white") + # after_stat() replaces the older ..density.. notation
scale_color_manual(values = c("red", "blue")) +
geom_density()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# density plot only
english %>%
ggplot(aes(x = RTlexdec,
colour = AgeSubject)) +
theme_bw() +
geom_density()
A boxplot
This allows you to see various information, including the median, the quartiles (25% and 75%) and outliers. Note that a boxplot does not show the SD. Looking at the medians, we see a clear difference between the two distributions.
english %>%
ggplot(aes(x = AgeSubject,
y = RTlexdec)) +
theme_bw() +
geom_boxplot()
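The values that a boxplot draws can also be computed directly. The sketch below, assuming dplyr and languageR are installed, obtains the quartiles, the median and the interquartile range of RTlexdec for each age group; the column names q25, med, q75 and iqr are chosen here for illustration.

```r
library(dplyr)
library(languageR)  # provides the english dataset

box_stats <- english %>%
  group_by(AgeSubject) %>%
  summarise(
    q25 = quantile(RTlexdec, 0.25),  # lower edge of the box
    med = median(RTlexdec),          # line inside the box
    q75 = quantile(RTlexdec, 0.75),  # upper edge of the box
    iqr = IQR(RTlexdec)              # height of the box
  )

box_stats
```

Comparing the med column across the two rows gives the same information as comparing the central lines of the two boxes.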
A Violin plot
A violin plot shows the full shape of each distribution as a mirrored density curve. Unlike the boxplot, it does not mark the median, quartiles or outliers by default. Comparing the two violins, we again see a clear difference between the two distributions.
english %>%
ggplot(aes(x = AgeSubject,
y = RTlexdec)) +
theme_bw() +
geom_violin()
Facet_grid
The plots we used so far allowed us to plot data as a function of one categorical variable, e.g., AgeSubject. What if we wanted to show the different patterns emerging when combining AgeSubject (old vs young), WordCategory (Noun or Verb), CV (Consonant or Vowel) and Voice (Voiced and Voiceless)? What if we also wanted to modify the labels and order of levels of variables?
We will start slowly below to show how we can combine two categorical variables and then extend this to additional ones.
Two categorical variables
First steps
Here we obtain a boxplot with two categorical variables, AgeSubject and WordCategory.
english %>%
ggplot(aes(x = AgeSubject,
y = RTlexdec)) +
theme_bw() +
geom_boxplot() +
facet_grid(~ WordCategory)
Changing order of levels within a variable and its labels
What would you do to change both the order of levels within a variable and its labels? We want to change the order for AgeSubject to be Young vs Old (rather than old vs young) and change the labels of WordCategory from N vs V to Noun vs Verb.
Work on this with your peers. Answer is below!
Activity on your own 1
english %>%
ggplot(aes(x = AgeSubject,
y = RTlexdec)) +
theme_bw() +
geom_boxplot() +
facet_grid(~ WordCategory)
Answer
english %>%
mutate(AgeSubject = factor(AgeSubject, levels = c("young", "old"), labels = c("Young", "Old")),
WordCategory = factor(WordCategory, labels = c("Noun", "Verb"))) %>%
ggplot(aes(x = AgeSubject,
y = RTlexdec)) +
theme_bw() +
geom_boxplot() +
facet_grid(~ WordCategory)
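The factor() call in the answer does two things at once, which a small base R sketch can make visible. The vector age below is made-up data for illustration: levels sets the order of the categories, and labels renames them in that same order.

```r
# Hypothetical toy vector, standing in for a column such as AgeSubject
age <- c("old", "young", "young", "old")

# Without further arguments, levels are sorted alphabetically
age_default <- factor(age)
levels(age_default)

# levels = sets the order; labels = renames the levels in that order
age_relabelled <- factor(age,
                         levels = c("young", "old"),
                         labels = c("Young", "Old"))
levels(age_relabelled)
as.character(age_relabelled)
```

Because ggplot2 draws categories in the order of the factor levels, changing the levels is what moves "Young" before "Old" on the x-axis.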
Three or more categorical variables
Let us obtain a boxplot with four categorical variables: AgeSubject, WordCategory, CV and Voice. We still need to change names. We can also add margins = TRUE to add extra panels that pool the data across all categories (labelled (all)). We can also use scales = "free" to let the limits of the y-axis vary between rows of panels.
Of course this figure is so complex that it needs a lot of interpretation. But it allows you to see how we can use facet_grid to get more categorical variables in. This visualisation suggests that the pattern does not change across the combinations of the four variables: in every panel, we have clear differences between “Young” and “Old” participants, with “Young” being faster than “Old” participants.
english %>%
mutate(AgeSubject = factor(AgeSubject, levels = c("young", "old"), labels = c("Young", "Old")),
WordCategory = factor(WordCategory, labels = c("Noun", "Verb")),
CV = factor(CV, labels = c("Consonant", "Vowel"))) %>%
ggplot(aes(x = AgeSubject,
y = RTlexdec)) +
theme_bw() +
geom_boxplot() +
facet_grid(CV + Voice ~ WordCategory, margins = TRUE, scales = "free")
Comparing two numeric outcomes
What if we want to compare reaction time for the lexical decision task (RTlexdec) with reaction time for naming (RTnaming)? We want to see if there are differences related to AgeSubject and WordCategory. We use pivot_longer here to change the format of our table from wide to long, then change names and use facet_grid.
english %>%
select(RTlexdec, RTnaming, AgeSubject, WordCategory) %>%
pivot_longer(cols = c(RTlexdec, RTnaming),
names_to = "variable",
values_to = "values") %>%
mutate(AgeSubject = factor(AgeSubject, levels = c("young", "old"), labels = c("Young", "Old")),
WordCategory = factor(WordCategory, labels = c("Noun", "Verb"))) %>%
ggplot(aes(x = variable,
y = values)) +
theme_bw() +
geom_boxplot() +
facet_grid(AgeSubject ~ WordCategory, margins = TRUE, scales = "free")
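To see what pivot_longer does to the table, here is a sketch on a tiny made-up data frame, assuming dplyr and tidyr are installed. The words and values in wide are hypothetical and are not taken from the english dataset.

```r
library(dplyr)  # provides %>% and tibble()
library(tidyr)  # provides pivot_longer()

# Hypothetical wide table: one row per word, one column per RT measure
wide <- tibble(
  word     = c("a", "b"),
  RTlexdec = c(6.5, 6.7),
  RTnaming = c(6.1, 6.3)
)

# Long table: one row per word and measure
long <- wide %>%
  pivot_longer(cols = c(RTlexdec, RTnaming),
               names_to = "variable",
               values_to = "values")

long
```

The two RT columns become two rows per word, with the old column name stored in variable. This is the shape ggplot2 needs in order to place both measures on the same x-axis.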
Exporting images
When you use Rmarkdown, your figures are already embedded within the generated output. If you are using an R script and/or want to add the figure in a different document, you can use the following code:
jpeg(filename = "test.jpeg", width = 15, height = 15, units = "cm", res = 300)
english %>%
select(RTlexdec, RTnaming, AgeSubject, WordCategory) %>%
pivot_longer(cols = c(RTlexdec, RTnaming),
names_to = "variable",
values_to = "values") %>%
mutate(AgeSubject = factor(AgeSubject, levels = c("young", "old"), labels = c("Young", "Old")),
WordCategory = factor(WordCategory, labels = c("Noun", "Verb"))) %>%
ggplot(aes(x = variable,
y = values)) +
theme_bw() +
geom_boxplot() +
facet_grid(AgeSubject ~ WordCategory, margins = TRUE, scales = "free")
dev.off()
null device
1
The image is automatically saved into your working directory and you can import it into your Word document.
You can use any device to save the output: JPEG, PNG, PDF, TIFF, etc. From an R script, you can run the code and the image will appear within the “Plots” area. Simply click on export and you will be able to save the image.
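An alternative to opening and closing a device by hand is the ggsave() function from ggplot2, sketched below under the assumption that ggplot2 and languageR are installed. The plot is stored in an object first; the file is written to a temporary folder here only so that the example leaves no files behind, and in practice you would give a file name in your working directory.

```r
library(ggplot2)
library(languageR)  # provides the english dataset

# Store the plot in an object instead of printing it
p_box <- ggplot(english, aes(x = AgeSubject, y = RTlexdec)) +
  theme_bw() +
  geom_boxplot()

# Illustrative output location; replace with e.g. "test.png"
out_file <- file.path(tempdir(), "test.png")

# The file extension determines the format; size and resolution
# match the jpeg() call above
ggsave(filename = out_file, plot = p_box,
       width = 15, height = 15, units = "cm", dpi = 300)

file.exists(out_file)
```

Changing the extension of the file name (for example to .pdf or .tiff) is enough to switch the output format.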
Conclusion
As you can see, visualisations in R using the Tidyverse provide you with many options and you can explore these further.
See here for a full list of geoms. This will help you in thinking about visualisation.
See extensions to ggplot2 here for additional plugins to enhance plots.
In the next section, we start with initial inferential statistics.
End of the session
This is the end of the third session. We used the Tidyverse package to manipulate objects. We then obtained basic summaries and basic plots, and looked at how to build a plot from scratch.
Next week, we will look at basic inferential statistics.
session info
R version 4.1.2 (2021-11-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)
Matrix products: default
locale:
[1] LC_COLLATE=English_United Kingdom.1252
[2] LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base
other attached packages:
[1] languageR_1.5.0 forcats_0.5.1 stringr_1.4.0 dplyr_1.0.7
[5] purrr_0.3.4 readr_2.0.2 tidyr_1.1.4 tibble_3.1.5
[9] ggplot2_3.3.5 tidyverse_1.3.1
loaded via a namespace (and not attached):
[1] Rcpp_1.0.7 lubridate_1.8.0 lattice_0.20-45
[4] assertthat_0.2.1 digest_0.6.28 utf8_1.2.2
[7] R6_2.5.1 cellranger_1.1.0 backports_1.3.0
[10] reprex_2.0.1 evaluate_0.14 httr_1.4.2
[13] pillar_1.6.4 rlang_0.4.12 readxl_1.3.1
[16] rstudioapi_0.13 minqa_1.2.4 jquerylib_0.1.4
[19] nloptr_1.2.2.2 Matrix_1.3-4 rmarkdown_2.11
[22] labeling_0.4.2 splines_4.1.2 lme4_1.1-27.1
[25] munsell_0.5.0 broom_0.7.10 compiler_4.1.2
[28] modelr_0.1.8 xfun_0.27 pkgconfig_2.0.3
[31] PresenceAbsence_1.1.9 mgcv_1.8-38 htmltools_0.5.2
[34] tidyselect_1.1.1 fansi_0.5.0 crayon_1.4.2
[37] tzdb_0.2.0 dbplyr_2.1.1 withr_2.4.2
[40] MASS_7.3-54 psycho_0.6.1 grid_4.1.2
[43] nlme_3.1-153 jsonlite_1.7.2 gtable_0.3.0
[46] lifecycle_1.0.1 DBI_1.1.1 magrittr_2.0.1
[49] scales_1.1.1 cli_3.1.0 stringi_1.7.5
[52] farver_2.1.0 fs_1.5.0 xml2_1.3.2
[55] bslib_0.3.1 ellipsis_0.3.2 generics_0.1.1
[58] vctrs_0.3.8 boot_1.3-28 tools_4.1.2
[61] glue_1.4.2 hms_1.1.1 fastmap_1.1.0
[64] yaml_2.2.1 colorspace_2.0-2 rvest_1.0.2
[67] knitr_1.36 haven_2.4.3 sass_0.4.0
