## Use the code below to check whether you have all required packages installed. Any missing packages will be installed automatically; if you already have all packages installed, the loop simply loads them.
requiredPackages = c('tidyverse', 'languageR')
for(p in requiredPackages){
  if(!require(p, character.only = TRUE)) install.packages(p)
  library(p, character.only = TRUE)
}
Tidyverse
The Tidyverse is a family of packages used to speed up the use of R. You need to first install it (if you haven’t already done so) and then load it. To install, use Tools > Install packages or install.packages() and then add tidyverse. To load a package, use the library() function.
Look at how many packages are installed within the Tidyverse. The messages you see tell you which packages are loaded and which functions are in conflict (i.e., functions from other packages that are masked by the Tidyverse). If you want to use the original function, simply add package_name::function.
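For example, once the Tidyverse is loaded, dplyr’s filter() masks stats::filter(). A minimal sketch of how :: picks the right one (using the built-in mtcars dataset purely for illustration):

```r
library(dplyr)

# dplyr's filter(): keep rows that satisfy a condition
small_cars <- dplyr::filter(mtcars, cyl == 4)

# stats' filter(): apply a moving-average filter to a numeric series
smoothed <- stats::filter(1:10, rep(1/3, 3))

nrow(small_cars)  # 11 four-cylinder cars
```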
The difference between base R and the Tidyverse’s way of doing things is that base R can sometimes be more complex, while the Tidyverse is more straightforward and allows you to “see” within a dataframe easily. You need to learn how to use the “pipe” from magrittr, which is part of the Tidyverse.
Pipes are written in R as %>% (note the percentage signs before and after the greater-than sign). To demonstrate what pipes do, have a look at the following pseudocode. You can use a keyboard shortcut to insert a pipe: Ctrl+Shift+M (for Mac users, Cmd+Shift+M).
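The pseudocode itself did not survive in the text above; a sketch of the idea (first_step() etc. are placeholder names, not real functions) is:

```r
# without the pipe: calls are nested and read inside-out
result <- third_step(second_step(first_step(data)))

# with the pipe: the same chain reads top-to-bottom,
# each line passing its output on to the next
result <- data %>%
  first_step() %>%
  second_step() %>%
  third_step()
```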
Since R version 4.1.0, there is a native pipe |>. It does almost the same thing as %>%. We will still use %>%, as it is integrated within the Tidyverse.
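A quick check that the two pipes behave the same here, sketched on the built-in mtcars dataset (an assumption for illustration; the native pipe needs R >= 4.1.0):

```r
library(dplyr)

# the magrittr pipe, loaded with the Tidyverse
with_magrittr <- mtcars %>% filter(cyl == 4) %>% nrow()

# the native pipe, built into R since version 4.1.0
with_native <- mtcars |> filter(cyl == 4) |> nrow()

identical(with_magrittr, with_native)  # TRUE
```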
Below are two code chunks showing how to subset the dataframe, using base R and using the pipe from the magrittr package.
With base R, we always need to refer to the dataset twice: once at the beginning, and then again to look into the dataset and select a variable.
word <- c("a", "the", "lamp", "not", "jump", "it", "coffee", "walk", "on")
freq <- c(500, 600, 7, 200, 30, 450, 130, 33, 300) # note this is completely made up!!
functionword <- c("y", "y", "n", "y", "n", "y", "n", "n", "y")
length <- c(1, 3, 4, 3, 4, 2, 6, 4, 2)
df <- as.data.frame(cbind(word, freq, functionword, length))
rm(word, freq, functionword, length)
df$functionword <- as.character(df$functionword)
df$functionword[df$functionword == "y"] <- "yes"
df$functionword[df$functionword == "n"] <- "no"
df$functionword <- as.factor(df$functionword)
df
df_Yes1 <- df[which(df$functionword == 'yes'),]
df_Yes1
With the pipe, you only need to specify the dataset once: by adding the pipe, you can already look into the dataset and select the variable you need.
df_Yes1_pipe_tidy <- df %>% filter(functionword == 'yes')
df_Yes1_pipe_tidy
And this is with the base R pipe (combined with code from the Tidyverse family)
df_Yes1_pipe_base <- df |> filter(functionword == 'yes')
df_Yes1_pipe_base
As you can see, using the pipe (either the Tidyverse one or the base R one) is a quick and easy way to carry out various operations. Out of convenience, and because we will use other packages integrated within the Tidyverse, we will use its pipe.
ReCap:

- %>% is called a “pipe”
- the result of one line is passed on as the data argument of the next line

We will use the pipe with the Tidyverse to obtain summaries. We will use an R built-in dataset. Type data() to see the full list of datasets installed by default in R. You can use data(package = .packages(all.available = TRUE)) to see all datasets installed within all packages.
Here is a list of all available datasets
data()
data(package = .packages(all.available = TRUE))
We will use the dataset english from the package languageR. This is a package that contains many linguistically oriented datasets. See details of the dataset here, or by typing ?languageR::english (or simply ?english if the package is already loaded) in the console.
You can load the dataset after loading the package. Simply refer to it by its name.
?english
To see the dataset, run the code below to visualise it.
english %>%
  View()
# or without pipe
View(english)
We can use str()
to look at the structure of the dataset. Here we have a relatively large dataset with 4568 observations (=rows) and 36 variables (=columns).
english %>%
  str()
'data.frame': 4568 obs. of 36 variables:
$ RTlexdec : num 6.54 6.4 6.3 6.42 6.45 ...
$ RTnaming : num 6.15 6.25 6.14 6.13 6.2 ...
$ Familiarity : num 2.37 4.43 5.6 3.87 3.93 3.27 3.73 5.67 3.1 4.43 ...
$ Word : Factor w/ 2197 levels "ace","act","add",..: 467 2124 1838 1321 1302 1347 434 468 15 1632 ...
$ AgeSubject : Factor w/ 2 levels "old","young": 2 2 2 2 2 2 2 2 2 2 ...
$ WordCategory : Factor w/ 2 levels "N","V": 1 1 1 1 1 1 1 1 1 1 ...
$ WrittenFrequency : num 3.91 4.52 6.51 5.02 4.89 ...
$ WrittenSpokenFrequencyRatio : num 1.022 0.35 2.089 -0.526 -1.045 ...
$ FamilySize : num 1.39 1.39 1.61 1.95 2.2 ...
$ DerivationalEntropy : num 0.141 0.427 0.062 0.43 0.359 ...
$ InflectionalEntropy : num 0.0211 0.942 1.4434 0 1.7539 ...
$ NumberSimplexSynsets : num 0.693 1.099 2.485 1.099 2.485 ...
$ NumberComplexSynsets : num 0 0 1.95 2.64 2.48 ...
$ LengthInLetters : int 3 5 6 4 4 4 4 3 3 5 ...
$ Ncount : int 8 5 0 8 3 9 6 13 3 3 ...
$ MeanBigramFrequency : num 7.04 9.54 9.88 8.31 7.94 ...
$ FrequencyInitialDiphone : num 12 12.6 13.3 12.1 11.9 ...
$ ConspelV : int 10 20 10 5 17 19 10 13 1 7 ...
$ ConspelN : num 3.74 7.87 6.69 6.68 4.76 ...
$ ConphonV : int 41 38 13 6 17 21 13 7 11 14 ...
$ ConphonN : num 8.84 9.78 7.04 3.83 4.76 ...
$ ConfriendsV : int 8 20 10 4 17 19 10 6 0 7 ...
$ ConfriendsN : num 3.3 7.87 6.69 3.53 4.76 ...
$ ConffV : num 0.693 0 0 0.693 0 ...
$ ConffN : num 2.71 0 0 6.63 0 ...
$ ConfbV : num 3.5 2.94 1.39 1.1 0 ...
$ ConfbN : num 8.83 9.61 5.82 2.56 0 ...
$ NounFrequency : int 49 142 565 150 170 125 582 2061 144 522 ...
$ VerbFrequency : int 0 0 473 0 120 280 110 76 4 86 ...
$ CV : Factor w/ 2 levels "C","V": 1 1 1 1 1 1 1 1 2 1 ...
$ Obstruent : Factor w/ 2 levels "cont","obst": 2 2 2 2 2 2 2 2 1 2 ...
$ Frication : Factor w/ 4 levels "burst","frication",..: 1 2 2 1 1 1 1 1 3 2 ...
$ Voice : Factor w/ 2 levels "voiced","voiceless": 1 2 2 2 2 2 1 1 1 2 ...
$ FrequencyInitialDiphoneWord : num 10.13 9.05 12.42 10.05 11.8 ...
$ FrequencyInitialDiphoneSyllable: num 10.41 9.15 13.13 11 12.16 ...
$ CorrectLexdec : int 27 30 30 30 26 28 30 28 25 29 ...
# or without pipe
str(english)
'data.frame': 4568 obs. of 36 variables:
$ RTlexdec : num 6.54 6.4 6.3 6.42 6.45 ...
$ RTnaming : num 6.15 6.25 6.14 6.13 6.2 ...
$ Familiarity : num 2.37 4.43 5.6 3.87 3.93 3.27 3.73 5.67 3.1 4.43 ...
$ Word : Factor w/ 2197 levels "ace","act","add",..: 467 2124 1838 1321 1302 1347 434 468 15 1632 ...
$ AgeSubject : Factor w/ 2 levels "old","young": 2 2 2 2 2 2 2 2 2 2 ...
$ WordCategory : Factor w/ 2 levels "N","V": 1 1 1 1 1 1 1 1 1 1 ...
$ WrittenFrequency : num 3.91 4.52 6.51 5.02 4.89 ...
$ WrittenSpokenFrequencyRatio : num 1.022 0.35 2.089 -0.526 -1.045 ...
$ FamilySize : num 1.39 1.39 1.61 1.95 2.2 ...
$ DerivationalEntropy : num 0.141 0.427 0.062 0.43 0.359 ...
$ InflectionalEntropy : num 0.0211 0.942 1.4434 0 1.7539 ...
$ NumberSimplexSynsets : num 0.693 1.099 2.485 1.099 2.485 ...
$ NumberComplexSynsets : num 0 0 1.95 2.64 2.48 ...
$ LengthInLetters : int 3 5 6 4 4 4 4 3 3 5 ...
$ Ncount : int 8 5 0 8 3 9 6 13 3 3 ...
$ MeanBigramFrequency : num 7.04 9.54 9.88 8.31 7.94 ...
$ FrequencyInitialDiphone : num 12 12.6 13.3 12.1 11.9 ...
$ ConspelV : int 10 20 10 5 17 19 10 13 1 7 ...
$ ConspelN : num 3.74 7.87 6.69 6.68 4.76 ...
$ ConphonV : int 41 38 13 6 17 21 13 7 11 14 ...
$ ConphonN : num 8.84 9.78 7.04 3.83 4.76 ...
$ ConfriendsV : int 8 20 10 4 17 19 10 6 0 7 ...
$ ConfriendsN : num 3.3 7.87 6.69 3.53 4.76 ...
$ ConffV : num 0.693 0 0 0.693 0 ...
$ ConffN : num 2.71 0 0 6.63 0 ...
$ ConfbV : num 3.5 2.94 1.39 1.1 0 ...
$ ConfbN : num 8.83 9.61 5.82 2.56 0 ...
$ NounFrequency : int 49 142 565 150 170 125 582 2061 144 522 ...
$ VerbFrequency : int 0 0 473 0 120 280 110 76 4 86 ...
$ CV : Factor w/ 2 levels "C","V": 1 1 1 1 1 1 1 1 2 1 ...
$ Obstruent : Factor w/ 2 levels "cont","obst": 2 2 2 2 2 2 2 2 1 2 ...
$ Frication : Factor w/ 4 levels "burst","frication",..: 1 2 2 1 1 1 1 1 3 2 ...
$ Voice : Factor w/ 2 levels "voiced","voiceless": 1 2 2 2 2 2 1 1 1 2 ...
$ FrequencyInitialDiphoneWord : num 10.13 9.05 12.42 10.05 11.8 ...
$ FrequencyInitialDiphoneSyllable: num 10.41 9.15 13.13 11 12.16 ...
$ CorrectLexdec : int 27 30 30 30 26 28 30 28 25 29 ...
english %>%
  head()
# or without pipe
head(english)
english %>%
  tail()
# or without pipe
tail(english)
Here, we select a few variables to use. To select variables (or columns), use the function select
english %>%
  select(RTlexdec, RTnaming, Familiarity)
# or without pipe
select(english, RTlexdec, RTnaming, Familiarity)
If we want to select observations, we use the function filter. We will use select to pick particular variables and then filter to select specific observations. This example shows how a pipe chain works, combining multiple functions with pipes
english %>%
  select(RTlexdec, RTnaming, Familiarity, AgeSubject) %>%
  filter(AgeSubject == "old")
# or without pipe
filter(select(english, RTlexdec, RTnaming, Familiarity, AgeSubject), AgeSubject == "old")
Use some of the code above to manipulate the dataframe, but now using code from the Tidyverse. As you will see, once you know how to manipulate a dataset with base R, you can easily apply the same techniques with the Tidyverse. The Tidyverse provides additional ways to manipulate a dataframe.
For example, if I want to check the levels of a variable and change the reference level, I will use the following code
levels(english$AgeSubject)
[1] "old" "young"
To change the levels of AgeSubject, we need to save a new dataset (do not overwrite the original dataset!!). The mutate function means we are manipulating an object.
english2 <- english %>%
  mutate(AgeSubject = factor(AgeSubject, levels = c("young", "old")))
# or without pipe
english2 <- mutate(english, AgeSubject = factor(AgeSubject, levels = c("young", "old")))
levels(english2$AgeSubject)
[1] "young" "old"
You can change the reference value by using fct_relevel. This is useful if you have many levels in one of the factors you are working with and you simply need to change the reference.
english2 <- english %>%
  mutate(AgeSubject = fct_relevel(AgeSubject, "old"))
# or without pipe
english2 <- mutate(english, AgeSubject = fct_relevel(AgeSubject, "old"))
levels(english2$AgeSubject)
[1] "old" "young"
The Tidyverse contains many functions that are useful for data manipulation. We will look at additional ones next week.
Use any of the other factors and try to change its levels and/or its reference level.
Sometimes, you may have a dataset that comes in a wide format (i.e., columns contain data from participants) and you want to change it to a long format (i.e., each row contains one observation, with a minimal number of columns). Let’s look at the functions pivot_longer and pivot_wider.
Let’s use the english dataset to transform it from wide to long.
english %>%
  select(Word, RTlexdec, RTnaming, Familiarity) %>%
pivot_longer(cols = c(RTlexdec, RTnaming, Familiarity), # you can also add index, i.e., 2:4
names_to = "variable",
values_to = "values")
# or without pipe
pivot_longer(select(english, Word, RTlexdec, RTnaming, Familiarity),
cols = c(RTlexdec, RTnaming, Familiarity), # you can also add index, i.e., 2:4
names_to = "variable",
values_to = "values")
Let’s use the same code as above and change it from long format back to wide format. pivot_wider allows you to go back to the original dataset. You will need to use unnest to get all rows in the correct place. Try without it to see the result.
english %>%
  select(Word, RTlexdec, RTnaming, Familiarity) %>%
pivot_longer(cols = c(RTlexdec, RTnaming, Familiarity), # you can also add index, i.e., 2:4
names_to = "variable",
values_to = "values") %>%
pivot_wider(names_from = "variable",
values_from = "values")
Warning: Values are not uniquely identified; output will contain list-cols.
* Use `values_fn = list` to suppress this warning.
* Use `values_fn = length` to identify where the duplicates arise
* Use `values_fn = {summary_fun}` to summarise duplicates
# or without pipe
pivot_wider(pivot_longer(select(english, Word, RTlexdec, RTnaming, Familiarity),
cols = c(RTlexdec, RTnaming, Familiarity), # you can also add index, i.e., 2:4
names_to = "variable",
values_to = "values"),
names_from = "variable",
values_from = "values")
Warning: Values are not uniquely identified; output will contain list-cols.
* Use `values_fn = list` to suppress this warning.
* Use `values_fn = length` to identify where the duplicates arise
* Use `values_fn = {summary_fun}` to summarise duplicates
But wait, where are the results? They are stored in lists. We need to use the function unnest() to obtain the full results.
english %>%
  select(Word, RTlexdec, RTnaming, Familiarity) %>%
pivot_longer(cols = c(RTlexdec, RTnaming, Familiarity), # you can also add index, i.e., 2:4
names_to = "variable",
values_to = "values") %>%
pivot_wider(names_from = "variable",
values_from = "values") %>%
unnest()
Warning: Values are not uniquely identified; output will contain list-cols.
* Use `values_fn = list` to suppress this warning.
* Use `values_fn = length` to identify where the duplicates arise
* Use `values_fn = {summary_fun}` to summarise duplicates
Warning: `cols` is now required when using unnest().
Please use `cols = c(RTlexdec, RTnaming, Familiarity)`
# or without pipe
unnest(pivot_wider(pivot_longer(select(english, Word, RTlexdec, RTnaming, Familiarity),
cols = c(RTlexdec, RTnaming, Familiarity), # you can also add index, i.e., 2:4
names_to = "variable",
values_to = "values"),
names_from = "variable",
values_from = "values"))
Warning: Values are not uniquely identified; output will contain list-cols.
* Use `values_fn = list` to suppress this warning.
* Use `values_fn = length` to identify where the duplicates arise
* Use `values_fn = {summary_fun}` to summarise duplicates
Warning: `cols` is now required when using unnest().
Please use `cols = c(RTlexdec, RTnaming, Familiarity)`
Ah, that is better. But we get warnings. What do the warnings tell us? These are simple warnings, not errors, and you can follow the suggestions the Tidyverse makes. The first warning tells us that the results are shown as list-columns (which is what we are after). The second warning tells us to pass a specific cols specification to unnest().
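Following those suggestions, a quieter version of the same round-trip can be sketched on a toy dataframe (toy, with each word appearing twice like the repeated words in english, is made up for illustration):

```r
library(dplyr)
library(tidyr)

# a made-up dataframe where each word appears twice, as in english
toy <- tibble(word     = c("arm", "arm", "bark", "bark"),
              variable = "RTlexdec",
              values   = c(6.5, 6.4, 6.3, 6.2))

toy_wide <- toy %>%
  pivot_wider(names_from = "variable",
              values_from = "values",
              values_fn = list) %>%   # be explicit: duplicates become list-columns
  unnest(cols = RTlexdec)             # supplying cols avoids the second warning

toy_wide  # back to four rows, one value per row
```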
We can use summary() to obtain basic summaries of the dataset. For numeric variables, this gives you the minimum, maximum, mean, median, and 1st and 3rd quartiles; for factors/characters, it gives counts. If there are missing values, you will also get the number of NAs. Look at the summaries of the dataset below.
english %>%
  summary()
RTlexdec RTnaming Familiarity Word
Min. :6.205 Min. :6.022 Min. :1.100 arm : 4
1st Qu.:6.426 1st Qu.:6.149 1st Qu.:3.000 barge : 4
Median :6.550 Median :6.342 Median :3.700 bark : 4
Mean :6.550 Mean :6.323 Mean :3.796 bear : 4
3rd Qu.:6.653 3rd Qu.:6.490 3rd Qu.:4.570 beef : 4
Max. :7.188 Max. :6.696 Max. :6.970 bind : 4
(Other):4544
AgeSubject WordCategory WrittenFrequency WrittenSpokenFrequencyRatio
old :2284 N:2904 Min. : 0.000 Min. :-6.55393
young:2284 V:1664 1st Qu.: 3.761 1st Qu.:-0.07402
Median : 4.832 Median : 0.68118
Mean : 5.021 Mean : 0.67763
3rd Qu.: 6.247 3rd Qu.: 1.44146
Max. :11.357 Max. : 5.63071
FamilySize DerivationalEntropy InflectionalEntropy
Min. :0.6931 Min. :0.00000 Min. :0.0000
1st Qu.:1.0986 1st Qu.:0.03932 1st Qu.:0.7442
Median :1.7918 Median :0.41097 Median :1.0982
Mean :1.8213 Mean :0.54089 Mean :1.1186
3rd Qu.:2.3026 3rd Qu.:0.89323 3rd Qu.:1.6325
Max. :5.5175 Max. :5.20728 Max. :2.4514
NumberSimplexSynsets NumberComplexSynsets LengthInLetters
Min. :0.000 Min. :0.000 Min. :2.000
1st Qu.:1.099 1st Qu.:0.000 1st Qu.:4.000
Median :1.609 Median :1.386 Median :4.000
Mean :1.708 Mean :1.568 Mean :4.342
3rd Qu.:2.197 3rd Qu.:2.565 3rd Qu.:5.000
Max. :4.357 Max. :6.111 Max. :7.000
Ncount MeanBigramFrequency FrequencyInitialDiphone
Min. : 0.000 Min. : 5.390 Min. : 4.143
1st Qu.: 2.000 1st Qu.: 8.100 1st Qu.:11.277
Median : 5.000 Median : 8.559 Median :12.023
Mean : 6.266 Mean : 8.490 Mean :11.963
3rd Qu.: 9.000 3rd Qu.: 8.973 3rd Qu.:12.697
Max. :22.000 Max. :10.283 Max. :14.654
ConspelV ConspelN ConphonV ConphonN
Min. : 0.00 Min. : 0.000 Min. : 0.00 Min. : 0.000
1st Qu.: 6.00 1st Qu.: 4.519 1st Qu.:10.00 1st Qu.: 5.268
Median :11.00 Median : 5.710 Median :16.00 Median : 6.340
Mean :11.71 Mean : 5.605 Mean :18.26 Mean : 6.318
3rd Qu.:17.00 3rd Qu.: 6.997 3rd Qu.:24.00 3rd Qu.: 7.491
Max. :32.00 Max. :10.492 Max. :66.00 Max. :10.600
ConfriendsV ConfriendsN ConffV ConffN
Min. : 0.00 Min. : 0.000 Min. :0.0000 Min. : 0.000
1st Qu.: 4.00 1st Qu.: 4.159 1st Qu.:0.0000 1st Qu.: 0.000
Median :10.00 Median : 5.487 Median :0.0000 Median : 0.000
Mean :10.42 Mean : 5.265 Mean :0.4109 Mean : 1.308
3rd Qu.:15.00 3rd Qu.: 6.642 3rd Qu.:0.6931 3rd Qu.: 1.386
Max. :31.00 Max. :10.303 Max. :3.3322 Max. :10.347
ConfbV ConfbN NounFrequency
Min. :0.0000 Min. : 0.000 Min. : 0.00
1st Qu.:0.6931 1st Qu.: 0.000 1st Qu.: 28.75
Median :1.3863 Median : 4.143 Median : 108.00
Mean :1.5570 Mean : 3.890 Mean : 600.19
3rd Qu.:2.5649 3rd Qu.: 6.242 3rd Qu.: 424.75
Max. :4.1897 Max. :10.600 Max. :35351.00
VerbFrequency CV Obstruent Frication
Min. : 0.0 C:4446 cont:1068 burst :1840
1st Qu.: 0.0 V: 122 obst:3500 frication:1660
Median : 30.0 long : 88
Mean : 881.0 short : 980
3rd Qu.: 164.2
Max. :242066.0
Voice FrequencyInitialDiphoneWord
voiced :2060 Min. : 3.091
voiceless:2508 1st Qu.: 9.557
Median :10.517
Mean :10.359
3rd Qu.:11.320
Max. :13.925
FrequencyInitialDiphoneSyllable CorrectLexdec
Min. : 3.367 Min. : 1.00
1st Qu.:10.000 1st Qu.:27.00
Median :10.972 Median :29.00
Mean :10.789 Mean :27.05
3rd Qu.:11.703 3rd Qu.:30.00
Max. :13.930 Max. :30.00
english %>%
  summarise(count = n(),
range_RTlexdec = range(RTlexdec),
mean_RTlexdec = mean(RTlexdec),
sd_RTlexdec = sd(RTlexdec),
var_RTlexdec = var(RTlexdec),
min_RTlexdec = min(RTlexdec),
max_RTlexdec = max(RTlexdec),
quart1_RTlexdec = quantile(RTlexdec, 0.25),
quart3_RTlexdec = quantile(RTlexdec, 0.75),
median_RTlexdec = median(RTlexdec))
As you can see, we can use summarise to obtain summaries of the dataset. We asked here for the mean, sd, variance, minimum and maximum values, etc. In the dataset english, we have many numeric variables, and if we want to obtain summaries for all of the numeric variables, we can use summarise_all.
If you want to add another level of summaries, e.g., for length, you can either add them one by one (each with a new variable name) or use summarise_all to do that for you. We need to select only the numeric variables first; the function to do this is where(is.numeric). If you do not use it, you will get an error message.
english %>%
  select(where(is.numeric)) %>%
summarise_all(funs(mean = mean, sd = sd, var = var, min = min, max = max,
range = range, median = median, Q1 = quantile(., probs = 0.25), Q3 = quantile(., probs = 0.75)))
Warning: `funs()` was deprecated in dplyr 0.8.0.
Please use a list of either functions or lambdas:
# Simple named list:
list(mean = mean, median = median)
# Auto named with `tibble::lst()`:
tibble::lst(mean, median)
# Using lambdas
list(~ mean(., trim = .2), ~ median(., na.rm = TRUE))
This warning is displayed once every 8 hours.
Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.
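As the warning says, funs() is deprecated; in current dplyr the same idea is written with across() and a named list of functions. A sketch using the built-in mtcars dataset so it is self-contained (the same pattern applies to english; range() is left out here because it returns two values per column):

```r
library(dplyr)

num_summaries <- mtcars %>%
  select(where(is.numeric)) %>%
  summarise(across(everything(),
                   list(mean = mean, sd = sd, var = var,
                        min = min, max = max, median = median,
                        Q1 = ~ quantile(.x, probs = 0.25),
                        Q3 = ~ quantile(.x, probs = 0.75))))

# one row of results; columns are named like mpg_mean, mpg_sd, ...
num_summaries$mpg_mean
```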
As you can see, in this example we have a chain of commands in the Tidyverse. We can continue to add commands each time we want to investigate something in particular: keep adding pipes and commands. The most important point is that the dataset english did not change at all. If you want to create a new dataset with the results, simply use the assignment operator <- at the beginning or -> at the end and give a name to the new dataset.
What if you want to obtain all results summarised by a specific grouping? Let’s obtain the results grouped by the levels of AgeSubject.
english %>%
  group_by(AgeSubject) %>%
summarise(count = n(),
range_RTlexdec = range(RTlexdec),
mean_RTlexdec = mean(RTlexdec),
sd_RTlexdec = sd(RTlexdec),
var_RTlexdec = var(RTlexdec),
min_RTlexdec = min(RTlexdec),
max_RTlexdec = max(RTlexdec),
quart1_RTlexdec = quantile(RTlexdec, 0.25),
quart3_RTlexdec = quantile(RTlexdec, 0.75),
median_RTlexdec = median(RTlexdec))
`summarise()` has grouped output by 'AgeSubject'. You can override using the `.groups` argument.
What if you want to obtain all results summarised by multiple groupings? Let’s obtain the results grouped by the levels of AgeSubject, WordCategory, and Voice, and save the output.
english %>%
  group_by(AgeSubject, WordCategory, Voice) %>%
summarise(count = n(),
range_RTlexdec = range(RTlexdec),
mean_RTlexdec = mean(RTlexdec),
sd_RTlexdec = sd(RTlexdec),
var_RTlexdec = var(RTlexdec),
min_RTlexdec = min(RTlexdec),
max_RTlexdec = max(RTlexdec),
quart1_RTlexdec = quantile(RTlexdec, 0.25),
quart3_RTlexdec = quantile(RTlexdec, 0.75),
median_RTlexdec = median(RTlexdec)) -> dfMeans
`summarise()` has grouped output by 'AgeSubject', 'WordCategory', 'Voice'. You can override using the `.groups` argument.
dfMeans
Use any of the numeric variables in the dataset and obtain summaries.
This is the end of the second session. We looked at the various object types and created a dataframe from scratch. We did some manipulations of the dataframe: creating a new variable, renaming a column, deleting one, and changing the levels of a variable. We used the Tidyverse to manipulate objects. We then obtained basic summaries and basic plots.
Next week, we will continue with the Tidyverse to manipulate the data further and obtain additional plots.
sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)
Matrix products: default
locale:
[1] LC_COLLATE=English_United Kingdom.1252
[2] LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base
other attached packages:
[1] languageR_1.5.0 forcats_0.5.1 stringr_1.4.0 dplyr_1.0.7
[5] purrr_0.3.4 readr_2.0.2 tidyr_1.1.4 tibble_3.1.5
[9] ggplot2_3.3.5 tidyverse_1.3.1
loaded via a namespace (and not attached):
[1] Rcpp_1.0.7 lubridate_1.8.0 lattice_0.20-45
[4] assertthat_0.2.1 digest_0.6.28 utf8_1.2.2
[7] R6_2.5.1 cellranger_1.1.0 backports_1.3.0
[10] reprex_2.0.1 evaluate_0.14 httr_1.4.2
[13] pillar_1.6.4 rlang_0.4.12 readxl_1.3.1
[16] rstudioapi_0.13 minqa_1.2.4 jquerylib_0.1.4
[19] nloptr_1.2.2.2 Matrix_1.3-4 rmarkdown_2.11
[22] labeling_0.4.2 splines_4.1.2 lme4_1.1-27.1
[25] munsell_0.5.0 broom_0.7.10 compiler_4.1.2
[28] modelr_0.1.8 xfun_0.27 pkgconfig_2.0.3
[31] PresenceAbsence_1.1.9 mgcv_1.8-38 htmltools_0.5.2
[34] tidyselect_1.1.1 fansi_0.5.0 crayon_1.4.2
[37] tzdb_0.2.0 dbplyr_2.1.1 withr_2.4.2
[40] MASS_7.3-54 psycho_0.6.1 grid_4.1.2
[43] nlme_3.1-153 jsonlite_1.7.2 gtable_0.3.0
[46] lifecycle_1.0.1 DBI_1.1.1 magrittr_2.0.1
[49] scales_1.1.1 cli_3.1.0 stringi_1.7.5
[52] farver_2.1.0 fs_1.5.0 xml2_1.3.2
[55] bslib_0.3.1 ellipsis_0.3.2 generics_0.1.1
[58] vctrs_0.3.8 boot_1.3-28 tools_4.1.2
[61] glue_1.4.2 hms_1.1.1 fastmap_1.1.0
[64] yaml_2.2.1 colorspace_2.0-2 rvest_1.0.2
[67] knitr_1.36 haven_2.4.3 sass_0.4.0