In this section you will learn how to “import” and “save” data of different file types in R.
Performing reproducible data analyses is hard! An analysis almost always involves many steps (e.g. data manipulation, exploration, modeling, presentation), and you need to keep track of all of them. To do so, it is good practice to store your data and associated files in one directory (i.e. folder) and to make that directory the ‘working directory’: the folder in which R looks for files and saves them by default [1]. Let’s say “C:/R-tutorial” is your working directory. In RStudio you can define the working directory in two ways.
Setting the working directory, method 1: browsing
In the Files pane (the lower-right RStudio panel), click the “…” button, browse to the working directory, and then choose More > Set As Working Directory.
Setting the working directory, method 2: using the setwd() function
Use the setwd() function:
setwd("C:/R-tutorial")
Notice that R uses / to separate the parts of a directory path, instead of \ (the standard on Windows).
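The two points above (setting the working directory and writing paths with /) can be tried out safely with the sketch below. It is only an illustration: a temporary folder stands in for "C:/R-tutorial", and the names project_dir and old_wd are made up for this example.

```r
# Stand-in for "C:/R-tutorial": a temporary folder that exists on any machine
# (an assumption for illustration only).
project_dir <- file.path(tempdir(), "R-tutorial")
dir.create(project_dir, showWarnings = FALSE)

old_wd <- setwd(project_dir)   # setwd() invisibly returns the previous directory
getwd()                        # prints the current working directory

# On Windows, both of these spellings denote the same folder:
#   "C:/R-tutorial"     forward slashes
#   "C:\\R-tutorial"    each backslash doubled (escaped)
# A single backslash before a letter such as R is rejected by R
# as an unrecognized escape.

setwd(old_wd)                  # restore the original working directory
```

Calling getwd() before and after setwd() is a quick way to confirm that R is really looking in the folder you expect.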
To practice loading data in R we will use an illustrative data set on aneurysm. Details about this data set can be found on the website of Ewout Steyerberg: clinical prediction models. In brief, the data set consists of 238 observations (‘rows’) on 8 variables (‘columns’):
##       SEX AGE10 MI CHF ISCHEMIA LUNG RENAL STATUS
##   1: Male   4.3  0   0        0    0     0  Alive
##   2: Male   4.5  0   0        0    0     0  Alive
##   3: Male   4.9  0   0        0    0     0  Alive
##   4: Male   5.0  0   0        0    0     0  Alive
##   5: Male   5.4  0   0        0    0     0  Alive
##  ---
## 234: Male   7.3  1   1        1    1     0   Dead
## 235: Male   8.4  1   1        1    0     0   Dead
## 236: Male   7.6  1   1        1    1     0  Alive
## 237: Male   8.0  1   1        1    1     0  Alive
## 238: Male   8.4  0   1        1    1     1   Dead
A zip file containing the aneurysm data in three formats (.csv, .txt and .sav) can be downloaded here.
Unzip the zip-file in your working directory. Make sure all three data files are unpacked.
A very simple data format is the tab-delimited text file. The file aneurysm.txt is an example. To view this file you can open it in a simple text editor such as Notepad. To import the tab-delimited file in R you can use the read.table() function:
df <- read.table(file = "aneurysm.txt", header = TRUE)
Notice that the argument header = TRUE tells R that the first row of the text file contains the variable names. Check in the upper-right RStudio panel whether the imported data set (object df) indeed has 238 rows and 8 columns.
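The same check can be done from the console instead of the RStudio panel. The sketch below is a minimal, self-contained illustration: it writes a tiny tab-delimited stand-in file (three rows copied from the listing above) to a temporary location, so the object name demo and the file tmp_txt are assumptions of this example, not part of the aneurysm download.

```r
# Build a tiny tab-delimited file with the same columns as aneurysm.txt.
# The file is a stand-in created only for this example.
tmp_txt <- tempfile(fileext = ".txt")
writeLines(c(
  "SEX\tAGE10\tMI\tCHF\tISCHEMIA\tLUNG\tRENAL\tSTATUS",
  "Male\t4.3\t0\t0\t0\t0\t0\tAlive",
  "Male\t4.5\t0\t0\t0\t0\t0\tAlive",
  "Male\t7.3\t1\t1\t1\t1\t0\tDead"
), tmp_txt)

demo <- read.table(file = tmp_txt, header = TRUE)

dim(demo)    # number of rows, then number of columns
str(demo)    # column names and the type R assigned to each column
head(demo)   # the first rows of the data
```

For the real data, dim(df) should report 238 rows and 8 columns.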
Another commonly used data format is the comma-separated values (CSV) file. To view this file you can open it in a simple text editor such as Notepad or in a spreadsheet program such as MS Excel. To import the CSV file in R you can use the read.csv() function:
df2 <- read.csv(file = "aneurysm.csv")
Notice that some CSV files use a ‘separation character’ (the character that separates the columns) other than the comma. For instance, columns may be separated by a semicolon. To import such data you can pass the additional argument sep = ";" to the read.csv() function.
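To see what the sep argument changes, the following sketch reads the same semicolon-separated file twice. The file and the object names (semi_file, wrong, right) are made up for this illustration.

```r
# A small semicolon-separated file, created only for this example.
semi_file <- tempfile(fileext = ".csv")
writeLines(c("SEX;AGE10;STATUS",
             "Male;4.3;Alive",
             "Male;7.3;Dead"), semi_file)

# With the default separator (a comma) every line is read as one single field.
wrong <- read.csv(file = semi_file)

# Telling read.csv() about the semicolon splits the lines into three columns.
right <- read.csv(file = semi_file, sep = ";")

ncol(wrong)
ncol(right)
```

If an imported data set unexpectedly has only one column, a wrong separator is a likely cause. R also provides read.csv2(), a variant of read.csv() whose defaults are sep = ";" and dec = "," (a comma as the decimal mark).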
Data sets are often stored in file formats designed for data analysis programs other than R. For instance, data are often stored with a .sav extension for analysis in SPSS. Files with a .sav extension can be imported using the read.spss() function of the foreign package. Before you run the following code, make sure that the foreign package is installed.
library(foreign)
df3 <- read.spss(file = "aneurysm.sav", to.data.frame = TRUE)
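One possible way to make sure the foreign package is available before loading it is sketched below. This is a common pattern rather than the only option; install.packages() needs an internet connection and a configured CRAN mirror.

```r
# Install the foreign package only if it is not available yet.
if (!requireNamespace("foreign", quietly = TRUE)) {
  install.packages("foreign")
}

library(foreign)

# After loading the package, read.spss() can be found by R.
exists("read.spss")
```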
There are many other types of data files that can be imported in R, which we will not discuss here. However, typically, it is simply a matter of finding the right function.
On a similar note, read.table() is a very flexible function that can read in both .txt and .csv files. Sometimes you come across a .txt file with a different separator (e.g. “,” or “tab” or “;”). The following video illustrates how you can read in these types of files by using read.table() and specifying a separator (the .xlsx file that is used in the video can be found here):
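As a small written counterpart, the sketch below uses read.table() with an explicit sep argument on two stand-in files that hold the same data, one comma-separated and one tab-separated. All file and object names are invented for this example.

```r
# The same three lines of data, stored once with commas and once with tabs.
comma_file <- tempfile(fileext = ".txt")
writeLines(c("SEX,AGE10,STATUS",
             "Male,4.3,Alive",
             "Male,7.3,Dead"), comma_file)

tab_file <- tempfile(fileext = ".txt")
writeLines(c("SEX\tAGE10\tSTATUS",
             "Male\t4.3\tAlive",
             "Male\t7.3\tDead"), tab_file)

# The sep argument names the separator; "\t" stands for a tab.
by_comma <- read.table(file = comma_file, header = TRUE, sep = ",")
by_tab   <- read.table(file = tab_file,   header = TRUE, sep = "\t")

# Both imports should contain the same data.
all.equal(by_comma, by_tab)
```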
After adjusting a data set, one may wish to store this new data set as a new file. The format of the new file may or may not be the same as the format of the original file. For instance, data frame df, originally from a tab-delimited text file, can be saved as a CSV file using:
write.csv(x = df, file = "aneurysmNEW.csv", row.names = FALSE)
Check whether the file aneurysmNEW.csv is stored in your working directory.
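The check can also be done in R itself. The sketch below writes a small stand-in data frame to a temporary folder (so that nothing in your working directory is touched) and then confirms that the file exists and can be read back. The names demo, out_file and back are assumptions of this example.

```r
# A small stand-in data frame (not the real aneurysm data).
demo <- data.frame(SEX    = c("Male", "Male"),
                   AGE10  = c(4.3, 7.3),
                   STATUS = c("Alive", "Dead"))

out_file <- file.path(tempdir(), "aneurysmNEW.csv")
write.csv(x = demo, file = out_file, row.names = FALSE)

file.exists(out_file)   # TRUE if the file was written
readLines(out_file)     # the raw lines: a header row followed by the data

# Reading the file back should give the same columns and number of rows.
back <- read.csv(file = out_file)
```

With row.names = FALSE the row numbers are not written to the file; without it, write.csv() adds them as an extra first column.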
Another common format for R data is the .rds file, which stores a single R object:
saveRDS(object = df, file = "aneurysmNEW.rds")
To ‘import’ a file with .rds extension you can use:
df4 <- readRDS(file = "aneurysmNEW.rds")
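One reason to prefer .rds over CSV for intermediate results is that an .rds file keeps the R object exactly as it was, including column types such as factors. The sketch below illustrates this with a small stand-in data frame; all names in it are made up for the example.

```r
# A stand-in data frame with a factor whose levels are in a deliberate order.
demo <- data.frame(
  AGE10  = c(4.3, 7.3),
  STATUS = factor(c("Alive", "Dead"), levels = c("Dead", "Alive"))
)

# Round trip through an .rds file.
rds_file <- tempfile(fileext = ".rds")
saveRDS(object = demo, file = rds_file)
from_rds <- readRDS(file = rds_file)

# Round trip through a CSV file.
csv_file <- tempfile(fileext = ".csv")
write.csv(x = demo, file = csv_file, row.names = FALSE)
from_csv <- read.csv(file = csv_file)

identical(demo, from_rds)   # the .rds copy is the same object
levels(from_rds$STATUS)     # the level order is kept
class(from_csv$STATUS)      # the CSV copy no longer carries the original factor
```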
Sometimes you may want to import data that is stored in a folder that is not the working directory. To do this you can specify the full (absolute) file path:
df5 <- readRDS(file = "C:/R-tutorial/aneurysmNEW.rds")
Another convenient approach is to use the function file.choose(), which allows you to browse for the file:
df6 <- readRDS(file = file.choose())
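When a full path is typed by hand, a small slip in it makes the import fail. A minimal sketch of a more defensive approach is given below: file.path() builds the path from its parts and file.exists() checks it before reading. The folder here is a temporary stand-in for "C:/R-tutorial", and data_dir, rds_path and df_elsewhere are hypothetical names.

```r
# Stand-in for a data folder outside the working directory.
data_dir <- file.path(tempdir(), "R-tutorial-data")
dir.create(data_dir, showWarnings = FALSE)

# file.path() joins folder and file name with "/".
rds_path <- file.path(data_dir, "aneurysmNEW.rds")

# Put a small example object there so that there is something to read.
saveRDS(object = data.frame(AGE10 = c(4.3, 7.3)), file = rds_path)

# Read the file only if it is really there; otherwise stop with a clear message.
if (file.exists(rds_path)) {
  df_elsewhere <- readRDS(file = rds_path)
} else {
  stop("File not found: ", rds_path)
}
```

Unlike file.choose(), which requires clicking through a dialog each time, a path written in the script can be rerun without manual steps.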
[1] Note that this working directory preferably contains various subdirectories such as ‘Design documents’, ‘Data’, ‘Analysis’, ‘Results’ and ‘Manuscript’.