In this third section you’ll continue working with R-objects. We’ll also introduce some basic R-functions.
At the end of this section, you’ll be able to perform simple, reproducible statistical analyses in R (how scientific!).
So far, we have only discussed objects with a single numeric value. A key characteristic of objects containing (only) numeric values is that they can be used for all sorts of numerical calculations (sum, multiply, divide, …). An R-object may also contain so-called “character values”. Character values are text entries. A character value is entered into R by placing the text between quotation marks, either double ("") or single ('').
Type in the editor window and run these lines:
character_object1 <- 'this is a character value'
character_object2 <- "this is also a character value"
character_object1
character_object2
The console window should display, on two separate lines:
[1] "this is a character value"
[1] "this is also a character value"
Notice that R displays both values with double quotation marks (""), even though the first one was entered with single quotation marks ('').
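As a small check of this point, the following sketch compares a single-quoted and a double-quoted version of the same text. The object names quoted_single and quoted_double are made up for this illustration; identical() and class() are base R functions.

```r
# Hypothetical object names, used only for this illustration
quoted_single <- 'coffee'
quoted_double <- "coffee"

# Both kinds of quotation marks create exactly the same character value
identical(quoted_single, quoted_double)

# class() reports what kind of value an object holds
class(quoted_single)
```

The first line of output should be TRUE, and the second should report "character".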
Type in the editor window and run without quotation marks:
this is a character value
Due to the absence of quotation marks, R returns the following error message:
Error: unexpected symbol in "this is"
Finally, try:
character_object1 + 1
Clearly, numerical computations cannot be done with character values. R returns the following error message:
Error in character_object1 + 1 : non-numeric argument to binary operator
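The same error appears when a number is typed between quotation marks, because R then treats it as text. A minimal sketch, using the made-up object name number_as_text and the base R function as.numeric() (not otherwise covered in this section) to turn the text back into a number:

```r
# A number between quotation marks is stored as a character value
number_as_text <- "5"
class(number_as_text)

# as.numeric() converts the text to a numeric value, so calculations work again
as.numeric(number_as_text) + 1
```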
So far, we have only discussed objects with a single numeric value or character value: a single data point. Multiple data points can be combined into a single object as a vector. In short, a vector contains a sequence of values. To create a vector, use the function c() (short for combine or concatenate).
Suppose we measured coffee intake for five individuals (hereafter referred to as ‘subjects’): 1, 3, 2, 5 and 3 units (e.g. cups) of coffee. The first individual drank 1 cup of coffee, the second 3 cups, etc. Type these data (in the editor window) and run:
c(1, 3, 2, 5, 3)
The console window should display:
[1] 1 3 2 5 3
The coffee data can be stored as a vector in an object called coffee. Type and run:
coffee <- c(1, 3, 2, 5, 3)
coffee
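A vector keeps its values in the order in which they were entered, so each position corresponds to one subject. The sketch below illustrates this with two things not yet introduced in this section: the base R function length() and square brackets for selecting a single element.

```r
coffee <- c(1, 3, 2, 5, 3)

# Number of values (subjects) stored in the vector
length(coffee)

# Coffee intake of the second subject
coffee[2]
```

This should display 5 (the number of subjects) and 3 (the cups of coffee of the second subject).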
Now suppose we want to calculate caffeine intake from the units of coffee. For simplicity, assume we operate in a highly controlled environment: each unit of coffee contains exactly 112.5 milligrams (mg) of caffeine (and for those who like to be really precise: each unit of coffee was consumed in its entirety). We can create a new object containing the caffeine intake in mg by multiplying each element of the (vector) object coffee by 112.5. We name this new object caffeine. Type and run:
caffeine <- coffee * 112.5
caffeine
The console window should display:
[1] 112.5 337.5 225.0 562.5 337.5
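The multiplication is applied to each element of coffee separately. As a quick check (a sketch; caffeine_by_hand is a made-up name), the same values can be computed one by one and compared using the base R function all.equal():

```r
coffee <- c(1, 3, 2, 5, 3)
caffeine <- coffee * 112.5

# The same calculation written out for every subject separately
caffeine_by_hand <- c(1 * 112.5, 3 * 112.5, 2 * 112.5, 5 * 112.5, 3 * 112.5)

# TRUE if both vectors hold the same values
all.equal(caffeine, caffeine_by_hand)
```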
Suppose we have also recorded the gender of the five subjects: the first two are male and the remaining three are female. We can create an object gender with male and female as possible (character) values. Type and run:
gender <- c('male', 'male', 'female', 'female', 'female')
gender
In 3.2. we used the function c() to create a vector. Almost everything in R is done through functions! Here we will highlight some essential functions for simple computations.
- sqrt(), calculates the square root of a (set of) numeric value(s);
- log(), calculates the natural logarithm of a (set of) numeric value(s);
- round(), rounds a (set of) numeric value(s), by default to the nearest integer (whole number);
- sum(), calculates the sum of a set of numeric values;
- mean(), calculates the mean of a set of numeric values;
- hist(), draws a histogram from a set of numeric values;
- table(), creates a frequency table from a set of numeric or character values.

Use the caffeine and gender objects from the previous subsection. Type and run:
sqrt(caffeine)
log(caffeine)
round(caffeine)
sum(caffeine)
mean(caffeine)
hist(caffeine)
table(gender)
The console window should display:
[1] 10.60660 18.37117 15.00000 23.71708 18.37117
[1] 4.722953 5.821566 5.416100 6.332391 5.821566
[1] 112 338 225 562 338
[1] 1575
[1] 315
gender
female   male
     3      2
A histogram should also be visible in the window to the right of the console.
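Two details of round() may be worth a closer look. In the rounded output, 112.5 becomes 112 and 562.5 becomes 562, while 337.5 becomes 338: values exactly halfway between two integers are rounded to the nearest even number (as described in R's help page for round()) rather than always upwards. In addition, round() accepts a digits argument to keep a chosen number of decimals. A short sketch:

```r
caffeine <- c(1, 3, 2, 5, 3) * 112.5

# Halfway values go to the nearest even integer
round(c(0.5, 1.5, 2.5))    # 0 2 2

# Keep two decimals instead of rounding to whole numbers
round(sqrt(caffeine), digits = 2)
```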
When performing real data analyses, an R script shows all analysis steps taken and thus provides important information for you and your co-researchers. It is good practice to annotate your R script, so that you and your co-researchers will be able to understand your analyses when looking back at them after a while. Make your work reproducible! You can annotate your R-script using the # sign. All text placed to the right of this sign is ignored by R (i.e. not used for making computations).
Type and run:
# Store data in objects
coffee <- c(1, 3, 2, 5, 3)
gender <- c('male', 'male', 'female', 'female', 'female')
# Show content of 'coffee' and 'gender'
coffee
gender
# Calculate caffeine
caffeine <- coffee * 112.5
# Show content of 'caffeine'
caffeine
# Run various functions
sqrt(caffeine) # the square root
log(caffeine) # the natural logarithm
round(caffeine) # round to nearest integer
sum(caffeine) # the sum
mean(caffeine) # the mean
hist(caffeine) # draw a histogram
table(gender) # counts of males and females
Suppose we have measurements of the heights of 8 individuals (in cm):
Height |
---|
184.0, 174.2, 166.6, 193.2, 173.8, 166.4, 175.4, 183.3. |
Perform a simple descriptive analysis, including:
- the median (median());
- the standard deviation (sd()).

Write a script to perform these steps. Annotate the script in such a way that someone else who is not familiar with R will be able to understand the steps you have taken to perform this analysis.
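As a hint for the two functions named above, here is a sketch on the coffee data from earlier in this section (not on the heights, so the exercise itself is left to you). median() returns the middle value of the sorted data and sd() returns the sample standard deviation.

```r
coffee <- c(1, 3, 2, 5, 3)

# Middle value of the sorted data: 1 2 3 3 5
median(coffee)

# Sample standard deviation (spread of the values around the mean)
sd(coffee)
```

For these data the median is 3 and the standard deviation is roughly 1.48.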
The solution to this exercise can be found here.