This book is Work in Progress. I appreciate your feedback to make the book better.

Intro to R

An .R file is a text document. Open (and edit) it with a text editor, a web browser or a more sophisticated software like RStudio.1

Click this link that opens an R script in your browser.

Using .R files or scripts offers efficiency, reproducibility, scalability, collaboration, documentation, and flexibility. They allow you to automate tasks, handle large datasets, collaborate with others, document your work, and customize solutions.

R is a calculator

R is a calculator. Use this R demo in the browser to explore basic features of R. Commands in the script.R tab are executed by the Run bottom. It runs the entire script and prints out results in the R Console. This setting is simplified but reflects the procedure in a more complex integrated developer environment (IDE) like RStudio. Test it.


Basic arithmetic operators are:

  • + Addition
  • - Subtraction
  • * Multiplication
  • / Division
  • ^ Exponent

R is more than a calculator

Your Turn: Adjust the code.

If you never saw R before, change the main title to what ever suits you or change the color option col from lightblue to aliceblue.

If you have some experience, order the bars using the sort() command.

Define objects

Define R objects for later use. Objects are case-sensitive (X is different from x). Objects can take any name, but its best to use something that makes sense to you, and will likely make sense to others who may read your code.

Numeric Variables

The standard assignment operator is <-, the equal sign = works as well. The code assigns the object a the value 2 and b the value 3. The sum is 5.

a <- 2
b = 3
a + b
#> [1] 5

Logical Variables

Logical values are TRUE and FALSE. Abbreviations work. You can write T instead of TRUE.

> harvard               <-        TRUE # spacing doesn't matter
> yale      <- FALSE
> princeton <- F # short for FALSE
> # Attention: FALSE=0, TRUE=1
> harvard + 1
#> [1] 2

Although spacing technically doesn't matter in R, there are some best practices to consider.

“Good coding style is like using correct punctuation. You can manage without it, but it sure makes things easier to read.”


Place spaces around all binary operators (=, +, -, <-, etc.). Do not place a space before a comma, but always place one after a comma.

Read more in Google's R Style Guide at Uni Stanford.

String Variables

Text is stored as string or character.

emily  <- 'She is a friend.'     # string / character class / plain text
libby  <- "she is a coworker"    # use ' and " interchangeably
other  <- "people"               # prefer "

Factor Variables

A factor is an ordered categorical variable. c() is a generic function which combines its arguments.

fruit <- factor(c("banana", "apple")  ) # The default ordering is alphabetic
#> [1] banana apple 
#> Levels: apple banana

dose <- factor(c("low", "medium", "high")  ) # The default ordering is alphabetic
#> [1] low    medium high  
#> Levels: high low medium

Factor levels inform about the order of the components, i.e. apple comes before banana and high comes comes before low, than comes medium. Of course, the apple-banana order does not makes any sense, and the high-low-medium order is just wrong. Software cannot know whether an ordering makes sense, that's job of the data scientist. Use the levels option inside the factor() function to tell R the ordering.

dose <- factor(c("low", "medium", "high"), levels = c("low", "medium", "high") ) 
#> [1] low    medium high  
#> Levels: low medium high

Combine objects

# Declare new objects using other variables
c <- a + b + 10

# Open z object or put everything in parentheses
(c <- a + b + 10)
#> [1] 15


Think of a vector as a single column in a spreadsheet.

vectorA <- c(1,2,b)
vectorB <- c(TRUE,TRUE,FALSE)
vectorC <- c(emily, libby, other) 

# Vector Operations
vectorA - vectorB # Vector operation AND auto-change TRUE =1, FALSE=0
#> [1] 0 1 3

Data frame

# think of it conceptually like a spreadsheet
dataDF <- data.frame(numberVec    = vectorA,
                     trueFalseVec = vectorB,
                     stringsVec   = vectorC)

# Examine an entire data frame  
#>   numberVec trueFalseVec        stringsVec
#> 1         1         TRUE  She is a friend.
#> 2         2         TRUE she is a coworker
#> 3         3        FALSE            people
# Declare a new column
dataDF$NewCol <- c(10,9,8)

# Examine with new column
#>   numberVec trueFalseVec        stringsVec NewCol
#> 1         1         TRUE  She is a friend.     10
#> 2         2         TRUE she is a coworker      9
#> 3         3        FALSE            people      8

# Examine a single column
dataDF$numberVec # by name
#> [1] 1 2 3
dataDF[,1] # by index...remember ROWS then COLUMNS
#> [1] 1 2 3

# Examine a single row
dataDF[2,] # by index position
#>   numberVec trueFalseVec        stringsVec NewCol
#> 2         2         TRUE she is a coworker      9

# Examine a single value
dataDF$numberVec[2] # column name, then position (2)
#> [1] 2
dataDF[1,2] #by index row 1, column 2
#> [1] TRUE


There are base R graphs. There are ggplot2 plots.

# Create some variables
x <- 1:10
y1 <- x*x
y2  <- 2*y1

# Create a first line
plot(x, y1, type = "b", frame = FALSE, pch = 19, 
     col = "red", xlab = "x", ylab = "y")
# Add a second line
lines(x, y2, pch = 18, col = "blue", type = "b", lty = 2)
# Add a legend to the plot
legend("topleft", legend=c("Line 1", "Line 2"),
       col=c("red", "blue"), lty = 1:2, cex=0.8)