In this third section you’ll continue working with R-objects. We’ll also introduce some basic R-functions.

Section structure:

- 3.1. Using character values
- 3.2. Creating vectors
- 3.3. Using basic R functions
- 3.4. Annotate your R script
- Exercise 1

At the end of this section, you’ll be able to perform simple, reproducible statistical analyses in R (how scientific!).

So far, we have only discussed objects with a single numeric value. A key characteristic of objects containing (only) numeric values is that they can be used for all sorts of numerical calculations (sum, multiply, divide, …). An R-object may also contain so-called “character values”. Character values are text entries. A character value is entered into R by enclosing it in quotation marks: `""` or `''`.

**Type in the editor window and run these lines:**

```
character_object1 <- 'this is a character value'
character_object2 <- "this is also a character value"
character_object1
character_object2
```

The console window should display `[1] "this is a character value"` and `[1] "this is also a character value"` on two separate lines. Notice that R prints both values with double quotation marks `""`, even though the first one was entered with single quotation marks `''`.

**Type in the editor window and run without quotation marks:**

`this is a character value`

Due to the absence of quotation marks, R returns the following error message:

`Error: unexpected symbol in "this is"`

Finally, **try:**

`character_object1 + 1`

Clearly, numerical computations cannot be done with character values. R returns the following error message:

`Error in character_object1 + 1 : non-numeric argument to binary operator`
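To check what kind of value an object holds before doing arithmetic with it, base R's `class()` function can help. The lines below are a minimal sketch; the object names `number_object` and `digits_as_text` are made up for illustration and are not part of this tutorial's script.

```r
# Hypothetical objects, for illustration only
number_object <- 5
digits_as_text <- "5"

class(number_object)   # "numeric"
class(digits_as_text)  # "character"

# Text that consists only of digits can be converted to a number first
as.numeric(digits_as_text) + 1  # 6
```

Note that `as.numeric()` only helps when the text actually represents a number; text such as `'this is a character value'` cannot be converted.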

So far, we have only discussed objects with a single numeric value or character value: a single data point. Multiple data points can be combined into a single object as a **vector**. In short, a vector contains a sequence of values. To create a vector, use the function `c()` (short for **concatenate**).

Suppose we measured **coffee intake** for five individuals (hereafter referred to as ‘subjects’): 1, 3, 2, 5 and 3 units (e.g. cups) of coffee. The first individual drank 1 cup of coffee, the second 3 cups, etc. **Type these data (in the editor window) and run:**

`c(1, 3, 2, 5, 3)`

The console window should display:

`[1] 1 3 2 5 3`

The coffee data can be stored as a vector in an object called `coffee`. **Type and run:**

```
coffee <- c(1, 3, 2, 5, 3)
coffee
```
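Because the order of the values matters (first subject, second subject, and so on), it can be useful to look up individual elements. As a short sketch: square brackets select elements by position, and `length()` reports how many values a vector holds. The `coffee` vector is redefined here so that the block runs on its own.

```r
coffee <- c(1, 3, 2, 5, 3)

length(coffee)    # number of subjects: 5
coffee[1]         # intake of the first subject: 1
coffee[2]         # intake of the second subject: 3
coffee[c(1, 4)]   # first and fourth subjects together: 1 5
```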

Now suppose we want to calculate **caffeine intake** from the units of coffee. For simplicity, assume we operate in a highly controlled environment: each unit of coffee contains exactly 112.5 milligrams (mg) of caffeine (and for those who like to be really precise: each unit of coffee was consumed in its entirety). We can create a new object containing the caffeine intake in mg by multiplying each element of the (vector) object `coffee` by `112.5`. We name this new object `caffeine`. **Type and run:**

```
caffeine <- coffee * 112.5
caffeine
```

The console window should display:

`[1] 112.5 337.5 225.0 562.5 337.5`
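The multiplication is applied to every element of the vector separately, which is why the result again contains five values. The sketch below checks two elements by hand and then, purely as an illustration, converts the result to grams; the object name `caffeine_g` is an assumption made for this example.

```r
coffee <- c(1, 3, 2, 5, 3)
caffeine <- coffee * 112.5

# Each element of 'caffeine' is the matching element of 'coffee' times 112.5
caffeine[1] == 1 * 112.5   # TRUE
caffeine[4] == 5 * 112.5   # TRUE

# Illustrative extra step: convert milligrams to grams (1 g = 1000 mg)
caffeine_g <- caffeine / 1000
caffeine_g
```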

Suppose we have also recorded the gender of the five subjects: the first two are male and the rest are female. We can create an object `gender` with `male` and `female` as possible (character) values. **Type and run:**

```
gender <- c('male', 'male', 'female', 'female', 'female')
gender
```

In 3.2 we used the function `c()` to create a vector. **Almost everything in R is done through functions!** Here we will highlight some essential functions for simple computations.

- `sqrt()`, calculates the **square root** of a (set of) numeric value(s);
- `log()`, calculates the **natural logarithm** of a (set of) numeric value(s);
- `round()`, rounds a (set of) numeric value(s), by default to the nearest **integer** (whole number);
- `sum()`, calculates the sum of a set of numeric values;
- `mean()`, calculates the mean of a set of numeric values;
- `hist()`, draws a **histogram** from a set of numeric values;
- `table()`, creates a frequency table from a set of numeric or character values.

Use the `caffeine` and `gender` objects from the previous subsection. **Type and run:**

```
sqrt(caffeine)
log(caffeine)
round(caffeine)
sum(caffeine)
mean(caffeine)
hist(caffeine)
table(gender)
```

The console window should display:

`[1] 10.60660 18.37117 15.00000 23.71708 18.37117`

`[1] 4.722953 5.821566 5.416100 6.332391 5.821566`

`[1] 112 338 225 562 338`

`[1] 1575`

`[1] 315`

```
gender
female   male
     3      2
```

A histogram should also be visible in the window to the right of the console.
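Two details of the output above are worth a closer look. First, `round(112.5)` gives `112` while `round(337.5)` gives `338`: for a value that lies exactly halfway between two integers, R rounds to the even one. Second, several of these functions accept optional arguments. The sketch below uses the `digits` argument of `round()` and the `base` argument of `log()`; `caffeine` is redefined so that the block runs on its own.

```r
caffeine <- c(1, 3, 2, 5, 3) * 112.5

# Halfway values are rounded to the nearest even integer
round(112.5)  # 112
round(337.5)  # 338

# Keep two decimal places instead of rounding to whole numbers
round(sqrt(caffeine), digits = 2)  # 10.61 18.37 15.00 23.72 18.37

# Base-10 logarithm instead of the natural logarithm
log(caffeine, base = 10)
log10(caffeine)  # shortcut that gives the same result
```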

When performing real data analyses, an R script shows all analysis steps taken and thus provides important information for you and your co-researchers. It is good practice to **annotate** your R script, so that you and your co-researchers will be able to understand your analyses when looking back at them after a while. **Make your work reproducible!** You can annotate your R script using the `#` sign. All text to the right of this sign on the same line is ignored by R (i.e. not used for making computations).

**Type and run:**

```
# Store data in objects
coffee <- c(1, 3, 2, 5, 3)
gender <- c('male', 'male', 'female', 'female', 'female')
# Show content of 'coffee' and 'gender'
coffee
gender
# Calculate caffeine
caffeine <- coffee * 112.5
# Show content of 'caffeine'
caffeine
# Run various functions
sqrt(caffeine) # the square root
log(caffeine) # the natural logarithm
round(caffeine) # round to nearest integer
sum(caffeine) # the sum
mean(caffeine) # the mean
hist(caffeine) # draw a histogram
table(gender) # counts of males and females
```

Suppose we have measurements of the heights of 8 individuals (in cm):

| Height (cm) |
|---|
| 184.0, 174.2, 166.6, 193.2, 173.8, 166.4, 175.4, 183.3 |

Perform a simple descriptive analysis:

- Store the data as a numeric vector;
- Calculate the mean and median height (note: the median can be found using the function `median()`);
- Describe the variability in the data by the **standard deviation** (note: the standard deviation can be found using the function `sd()`);
- Visualize the data using a histogram.

Write a script to perform these steps. Annotate the script in such a way that someone else who is not familiar with R will be able to understand the steps you have taken to perform this analysis.
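If `median()` and `sd()` are new to you, it may help to try them first on the smaller `coffee` vector from earlier in this section. The lines below are only a warm-up sketch, not the solution to the exercise; the last line recomputes the standard deviation by hand to show that `sd()` divides by the number of values minus one.

```r
# Coffee intake of the five subjects (same data as earlier in this section)
coffee <- c(1, 3, 2, 5, 3)

median(coffee)  # middle value after sorting (1 2 3 3 5): 3
sd(coffee)      # sample standard deviation

# The same standard deviation, computed step by step
sqrt(sum((coffee - mean(coffee))^2) / (length(coffee) - 1))
```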

The solution to this exercise can be found **here**.