3. First R session continued

In this third section you’ll continue working with R-objects. We’ll also introduce some basic R-functions.

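Section structure:

  • 3.1. Using character values
  • 3.2. Creating vectors
  • 3.3. Using basic R functions
  • 3.4. Annotate your R script
  • Exercise 1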

At the end of this section, you’ll be able to perform simple, reproducible statistical analyses in R (how scientific!).


3.1. Using character values

So far, we have only discussed objects with a single numeric value. A key characteristic of objects containing (only) numeric values is that they can be used for all sorts of numerical calculations (sum, multiply, divide,…). An R-object may also contain so-called “character values”. Character values are text entries. A character value is entered into R by enclosing it in quotation marks: "" or ''.

Type in the editor window and run these lines:

character_object1 <- 'this is a character value'
character_object2 <- "this is also a character value"
character_object1
character_object2

The console window should display [1] "this is a character value" and [1] "this is also a character value" on two separate lines. Notice that R displays both values with double quotation marks "", even though the first one was entered with single quotation marks ''.
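
As a quick check, here is a minimal sketch (the object names numeric_object, single_quoted and double_quoted are made up for illustration). The function class() reports what kind of value an object holds, and identical() confirms that single and double quotation marks produce exactly the same character value:

```r
# Hypothetical example objects, not part of the exercise
numeric_object <- 5
single_quoted <- 'some text'
double_quoted <- "some text"

class(numeric_object)   # "numeric"
class(single_quoted)    # "character"

# The type of quotation mark does not change the stored value
identical(single_quoted, double_quoted)   # TRUE
```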

Type the following in the editor window, without quotation marks, and run it:

this is a character value

Due to the absence of quotation marks, R returns the following error message:

Error: unexpected symbol in "this is"

Finally, try:

character_object1 + 1

Clearly, numerical computations cannot be done with character values. R returns the following error message:

Error in character_object1 + 1 : non-numeric argument to binary operator
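
A related pitfall, sketched here with made-up object names: a number typed between quotation marks is stored as a character value. It cannot be used in calculations until it is converted, for example with as.numeric():

```r
# Hypothetical objects for illustration
number_as_text <- "10"          # quotation marks make this a character value
is.numeric(number_as_text)      # FALSE
is.character(number_as_text)    # TRUE

# Convert the text to a number before calculating with it
real_number <- as.numeric(number_as_text)
real_number + 1                 # 11
```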


3.2. Creating vectors

So far, we have only discussed objects with a single numeric value or character value: a single data point. Multiple data points can be combined into a single object as a vector. In short, a vector contains a sequence of values. To create a vector use the function c() (short for concatenate).

Suppose we measured coffee intake for five individuals (hereafter referred to as ‘subjects’): 1, 3, 2, 5 and 3 units (e.g. cups) of coffee. The first individual drank 1 cup of coffee, the second 3 cups, etc. Type these data (in the editor window) and run:

c(1, 3, 2, 5, 3)

The console window should display:

[1] 1 3 2 5 3

The coffee data can be stored as a vector in an object called coffee. Type and run:

coffee <- c(1, 3, 2, 5, 3)
coffee

Now suppose we want to calculate caffeine intake from the units of coffee. For simplicity, assume we operate in a highly controlled environment: each unit of coffee contains exactly 112.5 milligrams (mg) of caffeine (and for those who really like to be precise: each unit of coffee was consumed in its entirety). We can create a new object containing the caffeine intake in mg by multiplying each element of the (vector) object coffee by 112.5. We name this new object caffeine. Type and run:

caffeine <- coffee * 112.5
caffeine

The console window should display:

[1] 112.5 337.5 225.0 562.5 337.5
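
The multiplication above works element by element. A minimal sketch to verify this uses length() to count the elements of a vector and square brackets to pick out a single element (neither has been introduced in this section yet, so treat them as a preview):

```r
coffee <- c(1, 3, 2, 5, 3)
caffeine <- coffee * 112.5

length(caffeine)                   # 5: one caffeine value per subject
coffee[2]                          # 3: cups drunk by the second subject
caffeine[2]                        # 337.5: the same subject's caffeine intake in mg
coffee[2] * 112.5 == caffeine[2]   # TRUE
```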

Suppose we have also recorded the gender of the five subjects: the first two are male and the rest are female. We can create an object gender with male and female as possible (character) values. Type and run:

gender <- c('male', 'male', 'female', 'female', 'female')
gender
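
One caveat worth knowing: a vector holds values of a single type. In the sketch below (the object mixed is made up for illustration), combining a number with a character value turns the number into text:

```r
# Hypothetical vector mixing a number and a character value
mixed <- c(1, 'male')
mixed          # both elements are now character values: "1" and "male"
class(mixed)   # "character"
```

This is why numeric data and character data are best kept in separate objects, as done with coffee and gender above.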


3.3. Using basic R functions

In 3.2. we used the function c() to create a vector. Almost everything in R is done through functions! Here we will highlight some essential functions for simple computations.

  • sqrt(), calculates the square root of a (set of) numeric value(s);
  • log(), calculates the natural logarithm of a (set of) numeric value(s);
  • round(), rounds a (set of) numeric value(s), by default to the nearest integer (whole number);
  • sum(), calculates the sum of a set of numeric values;
  • mean(), calculates the mean of a set of numeric values;
  • hist(), draws a histogram from a set of numeric values;
  • table(), creates a frequency table from a set of numeric or character values.
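
Several of these functions accept optional arguments. A minimal sketch with made-up input values: round() takes a digits argument and log() takes a base argument. Note also that round() resolves exact halves by going to the even digit (the IEC 60559 convention described in R's help page for round()), so 112.5 is rounded to 112 while 337.5 is rounded to 338:

```r
round(3.14159, digits = 2)   # 3.14
log(100, base = 10)          # 2
log(exp(1))                  # 1: the default base is e (natural logarithm)

# Exact halves are rounded to the nearest even integer
round(c(0.5, 1.5, 2.5))      # 0 2 2
round(c(112.5, 337.5))       # 112 338
```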

Use the caffeine and gender objects from the previous subsection. Type and run:

sqrt(caffeine)
log(caffeine)
round(caffeine)
sum(caffeine)
mean(caffeine)
hist(caffeine)
table(gender)

The console window should display:

[1] 10.60660 18.37117 15.00000 23.71708 18.37117
[1] 4.722953 5.821566 5.416100 6.332391 5.821566
[1] 112 338 225 562 338
[1] 1575
[1] 315
gender
female   male 
     3      2 

A histogram should also be visible in the window to the right of the console.


3.4. Annotate your R script

When performing real data analyses, an R script shows all analysis steps taken and thus provides important information for you and your co-researchers. It is good practice to annotate your R script, so that you and your co-researchers will be able to understand your analyses when looking back at them after a while. Make your work reproducible! You can annotate your R script using the # sign. All text to the right of this sign on the same line is ignored by R (i.e. not used for making computations).

Type and run:

# Store data in objects
coffee <- c(1, 3, 2, 5, 3)
gender <- c('male', 'male', 'female', 'female', 'female')

# Show content of 'coffee' and 'gender'
coffee
gender

# Calculate caffeine
caffeine <- coffee * 112.5

# Show content of 'caffeine'
caffeine

# Run various functions
sqrt(caffeine)  # the square root
log(caffeine)   # the natural logarithm
round(caffeine) # round to nearest integer
sum(caffeine)   # the sum
mean(caffeine)  # the mean
hist(caffeine)  # draw a histogram
table(gender)   # counts of males and females



Exercise 1

Suppose we have measurements of the heights of 8 individuals (in cm):

Height
184.0, 174.2, 166.6, 193.2, 173.8, 166.4, 175.4, 183.3.

Perform a simple descriptive analysis:

  1. Store the data as a numeric vector;
  2. Calculate the mean and median height (note: the median can be found using the function median());
  3. Describe the variability in the data by the standard deviation (note: the standard deviation can be found using the function sd());
  4. Visualize the data using a histogram.

Write a script to perform these steps. Annotate the script in such a way that someone else who is not familiar with R will be able to understand the steps you have taken to perform this analysis.
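
To see how median() and sd() behave before applying them to the heights, here is a minimal sketch on the coffee data from 3.2 (deliberately not the exercise data):

```r
coffee <- c(1, 3, 2, 5, 3)

median(coffee)   # 3: the middle value after sorting (1, 2, 3, 3, 5)
sd(coffee)       # about 1.48: the sample standard deviation
```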

The solution to this exercise can be found here.