2019-11-14

Idiomatic R

Of course someone has to write loops. It doesn’t have to be you. - Jenny Bryan

But, if you are like me, you might find loops more intuitive for some tasks.

And, in some cases, a loop is the best tool for the job.

Sequence

Hand-written loops are more verbose and less optimized than built-in functions.

x <- c()
val <- 1
while (val < 2.1) {
  x <- c(x, val)
  val <- val + 0.1
}
x
##  [1] 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0
seq(1, 2, 0.1)
##  [1] 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0

Repeat

x <- c()
for (i in 1:10){
  x <- c(x, "*")
}
x
##  [1] "*" "*" "*" "*" "*" "*" "*" "*" "*" "*"
rep("*", 10)
##  [1] "*" "*" "*" "*" "*" "*" "*" "*" "*" "*"

Print

for (i in 1:5){
  x <- c()
  for (j in 1:i){
    x <- c(x, "*")
  }
  cat(c(x, "\n"), sep = "")
}
## *
## **
## ***
## ****
## *****
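
The same triangle can be printed without any loop. A minimal sketch, assuming R 3.3.0 or later (where base `strrep()` is available); the variable name `stars` is illustrative and not from the original slides.

```r
# strrep() is vectorized over its `times` argument,
# so a single call builds every row of the triangle.
stars <- strrep("*", 1:5)

# Print one row per line.
cat(stars, sep = "\n")
```

Because `strrep()` handles the repetition and `cat()` handles the line breaks, both the inner and outer loops disappear.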

Fill Arrays

Classic use case for loops.

out <- array(dim = c(3, 3, 3))
for (i in 1:3){
  for (j in 1:3){
    for (k in 1:3){
      out[i, j, k] <- i * j + k
    }
  }
}
out[, , 1]
##      [,1] [,2] [,3]
## [1,]    2    3    4
## [2,]    3    5    7
## [3,]    4    7   10
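
When the fill rule is simple arithmetic on the indices, as it is here, a vectorized alternative may be worth considering. A minimal sketch using base `outer()`; the name `out_vec` is hypothetical, and the result is stored as doubles rather than integers, so it matches the loop's values but not necessarily its storage type.

```r
# The inner outer() builds the 3 x 3 matrix of i * j.
# The second outer() adds k along a new third dimension.
out_vec <- outer(outer(1:3, 1:3), 1:3, FUN = "+")

dim(out_vec)
out_vec[, , 1]
```

For fill rules that depend on earlier cells or on external data, the explicit loop remains the clearer choice.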

Manipulate Lists

First, we need to create a list for the next loop example.

rlist <- list(A = data.frame(x = sample(10, 3), y = sample(10, 3)),
              B = data.frame(x = sample(10, 3), y = sample(10, 3)))
rlist
## $A
##    x y
## 1 10 9
## 2  6 5
## 3  3 3
## 
## $B
##   x  y
## 1 3 10
## 2 4  1
## 3 6  4

Manipulate Lists

To create a new column in every data frame in the list, a novice R programmer might start with the following:

rlist[["A"]]$z <- rlist[["A"]]$x + rlist[["A"]]$y
rlist[["B"]]$z <- rlist[["B"]]$x + rlist[["B"]]$y

A reasonable choice for a one-off script when the list has few elements.

Manipulate Lists

However, if the list has many elements,

rlist[["A"]]$z <- rlist[["A"]]$x + rlist[["A"]]$y
rlist[["B"]]$z <- rlist[["B"]]$x + rlist[["B"]]$y
rlist[["C"]]$z <- rlist[["C"]]$x + rlist[["A"]]$y
rlist[["D"]]$z <- rlist[["D"]]$x + rlist[["D"]]$y
rlist[["E"]]$z <- rlist[["E"]]$x + rlist[["E"]]$y
rlist[["F"]]$z <- rlist[["F"]]$x + rlist[["F"]]$y

then it is shorter and less error prone to write a loop.

for (i in names(rlist)){
  rlist[[i]]$z <- rlist[[i]]$x + rlist[[i]]$y
}

Same loop written with mutate (from dplyr package) to add the column.

for (i in names(rlist)){
  rlist[[i]] <- dplyr::mutate(rlist[[i]], z = x + y)
}

Pairing lapply and an anonymous function allows you to skip the loop.

rlist <- lapply(rlist, function(df){df$z <- df$x + df$y; df})

Generate Data

Crude approach to simulate acoustic telemetry data of emigrating juvenile salmon.

sample5 <- function(x){sample(x, 5, replace = TRUE)}

out_list <- list()
for (i in 1:20){
  out_list[[i]] <- 
    data.frame(day = sort(runif(20, 0, 50)),
               river_mile = c(sample5(20:16),
                              sample5(15:11),
                              sample5(10:6),
                              sample5(5:1)))
}
out_df <- dplyr::bind_rows(out_list, .id = "cohort_id")

Generate Data

head(out_df, 10)
##    cohort_id        day river_mile
## 1          1  0.6806792         18
## 2          1  1.6473979         18
## 3          1  5.7034870         18
## 4          1  6.1451043         17
## 5          1  6.5611578         16
## 6          1  8.9997531         11
## 7          1 22.5637265         12
## 8          1 23.4448595         15
## 9          1 23.5409107         12
## 10         1 24.4495831         11

Save Plots

library(ggplot2)
for (i in unique(out_df$cohort_id)){
  out_sub <- dplyr::filter(out_df, cohort_id == i)
  # Store the plot and pass it explicitly rather than relying on last_plot()
  p <- ggplot(out_sub, aes(x = day, y = river_mile)) +
    geom_line() +
    geom_point() +
    labs(title = paste("Cohort", i))
  ggsave(filename = paste0("Cohort_", i, ".png"),
         plot = p,
         width = 6,
         height = 4,
         path = "figures")
}

Write Files

Splitting data frames

for (i in unique(out_df$cohort_id)){
  out_sub <- dplyr::filter(out_df, cohort_id == i)
  write.csv(x = out_sub,
            file = file.path("output", paste0("Cohort_", i, ".csv")),
            row.names = FALSE)
}

Elements of a list

# out_list was filled by position, so it has no names; loop over indices
for (i in seq_along(out_list)){
  write.csv(x = out_list[[i]],
            file = file.path("output", paste0("Cohort_", i, ".csv")),
            row.names = FALSE)
}
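
Base `split()` offers another route from a single data frame to a named list with one data frame per cohort. A minimal sketch under stated assumptions: `demo_df` is a small hypothetical stand-in for `out_df`, and files go to a temporary directory rather than the `output` folder used in the slides.

```r
# Hypothetical stand-in for out_df so the sketch runs on its own.
demo_df <- data.frame(cohort_id = rep(c("1", "2"), each = 3),
                      day = c(1.5, 2.0, 4.2, 0.7, 3.3, 5.1),
                      river_mile = c(20, 18, 15, 19, 12, 6))

# Assumption: write to a temporary directory and create it if needed.
out_dir <- file.path(tempdir(), "output")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

# split() returns a named list, one element per value of cohort_id.
demo_list <- split(demo_df, demo_df$cohort_id)

# Because the list is named, looping over names() works here.
for (i in names(demo_list)){
  write.csv(x = demo_list[[i]],
            file = file.path(out_dir, paste0("Cohort_", i, ".csv")),
            row.names = FALSE)
}
list.files(out_dir)
```

Creating the directory up front avoids the error `write.csv` raises when the target folder does not exist.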

Read Files

fn <- list.files(path = "output", pattern = "\\.csv$", full.names = TRUE)

in_list <- list()
for (i in seq_along(fn)){
  in_list[[i]] <- read.csv(fn[i])
}

Skip the loop with lapply.

in_list <- lapply(fn, read.csv)

Split-Apply-Combine

out_list <- list()
for (i in unique(PlantGrowth$group)){
  pg_sub <- dplyr::filter(PlantGrowth, group == i)
  avg_weight_sub <- mean(pg_sub$weight)
  out_list[[i]] <- data.frame(avg_weight = avg_weight_sub)
}
dplyr::bind_rows(out_list, .id = "group")
##   group avg_weight
## 1  ctrl      5.032
## 2  trt1      4.661
## 3  trt2      5.526

Skip the loop with group_by and summarise from the dplyr package.

library(dplyr)
PlantGrowth %>%
  group_by(group) %>% 
  summarise(avg_weight = mean(weight))
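
If dplyr is not available, base R can do the same split-apply-combine in one call. A minimal sketch using `aggregate()` and `tapply()` on the built-in `PlantGrowth` data; the name `avg_weight` for the `tapply()` result is illustrative.

```r
# aggregate() with a formula returns a data frame of group means.
aggregate(weight ~ group, data = PlantGrowth, FUN = mean)

# tapply() returns the same means, named by group.
avg_weight <- tapply(PlantGrowth$weight, PlantGrowth$group, mean)
avg_weight
```

`aggregate()` is the closer match to the dplyr result because it returns a data frame; `tapply()` is handy when a named set of values is enough.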

Take-away Points

When available, use vectorized functions.

Loops are useful when filling arrays, reading/writing data, and saving plots.

Loops are a natural extension of a procedural style of programming.

Growing a data structure in a loop is inefficient.
Pre-allocate instead.
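
To make the contrast concrete, here is a minimal sketch of growing versus pre-allocating. The names `grown`, `filled`, and `slots` and the size `n` are illustrative choices, not taken from the earlier slides.

```r
n <- 1e4

# Growing: each call to c() copies the whole vector built so far.
grown <- c()
for (i in seq_len(n)){
  grown <- c(grown, i^2)
}

# Pre-allocating: create the full-length vector once, then fill by index.
filled <- numeric(n)
for (i in seq_len(n)){
  filled[i] <- i^2
}

# Both approaches produce the same values.
identical(grown, filled)

# For lists, vector("list", n) pre-allocates n empty slots.
slots <- vector("list", 3)
length(slots)
```

Wrapping each loop in `system.time()` is one way to see the difference on your own machine; the gap typically widens as `n` grows.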

If you don’t find loops intuitive, explore the apply family of functions or the purrr package.
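
As a starting point for the apply family, here is a minimal sketch of three base functions. The variable names are hypothetical; purrr is left out so the example runs without extra packages.

```r
# lapply() always returns a list, one element per input.
squares_list <- lapply(1:3, function(i) i^2)

# vapply() returns a vector and checks that each result
# matches the declared template, here a single numeric value.
squares_vec <- vapply(1:3, function(i) i^2, numeric(1))

# mapply() iterates over several inputs in parallel.
sums <- mapply(function(a, b) a + b, 1:3, 4:6)

squares_vec
sums
```

`vapply()` is often preferred over `sapply()` in scripts because the declared return type makes unexpected results fail early.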

I was just reminded I’ve gone 782 days with no loops written in #rstats. Why do we need to teach these to beginners again? - Miles McBain