Of course someone has to write loops. It doesn’t have to be you. - Jenny Bryan
But, if you are like me, you might find loops more intuitive for some tasks.
And, in some cases, a loop is the best tool for the job.
2019-11-14
Hand-written loops are more verbose and less optimized than built-in functions.
x <- c()
val <- 1
while (val < 2.1) {
  x <- c(x, val)
  val <- val + 0.1
}
x
## [1] 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0
seq(1, 2, 0.1)
## [1] 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0
x <- c()
for (i in 1:10) {
  x <- c(x, "*")
}
x
## [1] "*" "*" "*" "*" "*" "*" "*" "*" "*" "*"
rep("*", 10)
## [1] "*" "*" "*" "*" "*" "*" "*" "*" "*" "*"
for (i in 1:5) {
  x <- c()
  for (j in 1:i) {
    x <- c(x, "*")
  }
  cat(c(x, "\n"), sep = "")
}
## *
## **
## ***
## ****
## *****
for (i in 1:5) {
  cat(c(rep("*", i), "\n"), sep = "")
}
## *
## **
## ***
## ****
## *****
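As a further built-in alternative, base R's `strrep()` is vectorized over its `times` argument, so the same triangle can be sketched without any loop. This is a minimal sketch rather than part of the original slides; the name `stars` is only illustrative.

```r
# strrep() repeats a string; a vector of counts returns one string per count
stars <- strrep("*", 1:5)

# Print one string per line
cat(stars, sep = "\n")
```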
Filling a multi-dimensional array is a classic use case for loops.
out <- array(dim = c(3, 3, 3))
for (i in 1:3) {
  for (j in 1:3) {
    for (k in 1:3) {
      out[i, j, k] <- i * j + k
    }
  }
}
out[, , 1]
##      [,1] [,2] [,3]
## [1,]    2    3    4
## [2,]    3    5    7
## [3,]    4    7   10
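For this particular fill rule, the same array can also be built without loops, which is handy for checking the loop's result. A minimal sketch, assuming base R's `outer()`; the name `out_vec` is illustrative and not from the original.

```r
# Loop version from above, kept compact for comparison
out <- array(dim = c(3, 3, 3))
for (i in 1:3) for (j in 1:3) for (k in 1:3) out[i, j, k] <- i * j + k

# outer(1:3, 1:3) gives the 3 x 3 matrix of products i * j;
# the second outer() adds k along a third dimension
out_vec <- outer(outer(1:3, 1:3), 1:3, "+")

dim(out_vec)
all(out == out_vec)
```

When the fill rule is more involved than simple arithmetic, the nested loop is usually the more readable choice.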
First, we need to create a list for the next loop example.
rlist <- list(
  A = data.frame(x = sample(10, 3), y = sample(10, 3)),
  B = data.frame(x = sample(10, 3), y = sample(10, 3))
)
rlist
## $A
##    x y
## 1 10 9
## 2  6 5
## 3  3 3
##
## $B
##   x  y
## 1 3 10
## 2 4  1
## 3 6  4
To add a new column to every data frame in the list, a novice R programmer might start with the following:
rlist[["A"]]$z <- rlist[["A"]]$x + rlist[["A"]]$y
rlist[["B"]]$z <- rlist[["B"]]$x + rlist[["B"]]$y
This is a reasonable choice for a one-off script when the list has few elements.
However, if the list has many elements,
rlist[["A"]]$z <- rlist[["A"]]$x + rlist[["A"]]$y
rlist[["B"]]$z <- rlist[["B"]]$x + rlist[["B"]]$y
rlist[["C"]]$z <- rlist[["C"]]$x + rlist[["A"]]$y  # uses A's y: an easy copy-paste slip
rlist[["D"]]$z <- rlist[["D"]]$x + rlist[["D"]]$y
rlist[["E"]]$z <- rlist[["E"]]$x + rlist[["E"]]$y
rlist[["F"]]$z <- rlist[["F"]]$x + rlist[["F"]]$y
then it is shorter and less error prone to write a loop.
for (i in names(rlist)){ rlist[[i]]$z <- rlist[[i]]$x + rlist[[i]]$y }
Same loop written with mutate (from the dplyr package) to add the column.
for (i in names(rlist)){ rlist[[i]] <- dplyr::mutate(rlist[[i]], z = x + y) }
Pairing lapply and an anonymous function allows you to skip the loop.
rlist <- lapply(rlist, function(x){x$z = x$x + x$y; x})
Crude approach to simulate acoustic telemetry data of emigrating juvenile salmon.
sample5 <- function(x) {
  sample(x, 5, replace = TRUE)
}
out_list <- list()
for (i in 1:20) {
  out_list[[i]] <- data.frame(
    day = sort(runif(20, 0, 50)),
    river_mile = c(sample5(20:16), sample5(15:11), sample5(10:6), sample5(5:1))
  )
}
out_df <- dplyr::bind_rows(out_list, .id = "cohort_id")
head(out_df, 10)
##    cohort_id        day river_mile
## 1          1  0.6806792         18
## 2          1  1.6473979         18
## 3          1  5.7034870         18
## 4          1  6.1451043         17
## 5          1  6.5611578         16
## 6          1  8.9997531         11
## 7          1 22.5637265         12
## 8          1 23.4448595         15
## 9          1 23.5409107         12
## 10         1 24.4495831         11
library(ggplot2)
for (i in unique(out_df$cohort_id)) {
  out_sub <- dplyr::filter(out_df, cohort_id == i)
  ggplot(out_sub, aes(x = day, y = river_mile)) +
    geom_line() +
    geom_point() +
    labs(title = paste("Cohort", i))
  ggsave(filename = paste0("Cohort_", i, ".png"), width = 6, height = 4, path = "figures")
}
Splitting data frames
for (i in unique(out_df$cohort_id)) {
  out_sub <- dplyr::filter(out_df, cohort_id == i)
  write.csv(x = out_sub,
            file = file.path("output", paste0("Cohort_", i, ".csv")),
            row.names = FALSE)
}
Elements of a list
# out_list was filled by position, so it has no names; loop over its indices
for (i in seq_along(out_list)) {
  write.csv(x = out_list[[i]],
            file = file.path("output", paste0("Cohort_", i, ".csv")),
            row.names = FALSE)
}
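Both loops above assume that the `output` directory already exists. The sketch below creates the directory first, then uses base R's `split()` to produce a named list, so that looping over `names()` works. It uses a small made-up data frame (`toy_df`) and a temporary directory instead of the simulated data; those names are illustrative and not from the original.

```r
# Hypothetical stand-in for out_df: two cohorts with three detections each
toy_df <- data.frame(cohort_id = rep(c("1", "2"), each = 3),
                     day = c(1.5, 2.0, 7.2, 0.4, 3.3, 9.1),
                     river_mile = c(18, 12, 4, 20, 11, 2))

# Write into a temporary directory so the example does not touch the working directory
out_dir <- file.path(tempdir(), "output")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

# split() returns a list named by the values of cohort_id
toy_list <- split(toy_df, toy_df$cohort_id)
for (i in names(toy_list)) {
  write.csv(x = toy_list[[i]],
            file = file.path(out_dir, paste0("Cohort_", i, ".csv")),
            row.names = FALSE)
}

list.files(out_dir, pattern = "\\.csv$")
```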
fn <- list.files(path = "output", pattern = "csv", full.names = TRUE)
in_list <- list()
for (i in seq_along(fn)) {
  in_list[[i]] <- read.csv(fn[i])
}
Skip the loop with lapply.
in_list <- lapply(fn, read.csv)
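After reading, it is often convenient to name each element by its source file and stack everything into one data frame. A minimal, self-contained sketch: it first writes two made-up files to a temporary directory, and the names `tmp_dir` and `in_df` are illustrative rather than from the original.

```r
# Set up: write two small CSV files to a temporary directory
tmp_dir <- file.path(tempdir(), "read_demo")
dir.create(tmp_dir, showWarnings = FALSE)
write.csv(data.frame(day = c(1, 2), river_mile = c(18, 12)),
          file.path(tmp_dir, "Cohort_1.csv"), row.names = FALSE)
write.csv(data.frame(day = c(3, 4), river_mile = c(9, 2)),
          file.path(tmp_dir, "Cohort_2.csv"), row.names = FALSE)

# Anchor the pattern so only files ending in .csv are matched
fn <- list.files(path = tmp_dir, pattern = "\\.csv$", full.names = TRUE)
in_list <- lapply(fn, read.csv)

# Name each element after its file (without directory or extension)
names(in_list) <- tools::file_path_sans_ext(basename(fn))

# Stack the data frames; all files must share the same columns
in_df <- do.call(rbind, in_list)
nrow(in_df)
```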
out_list <- list()
for (i in unique(PlantGrowth$group)) {
  pg_sub <- dplyr::filter(PlantGrowth, group == i)
  avg_weight_sub <- mean(pg_sub$weight)
  out_list[[i]] <- data.frame(avg_weight = avg_weight_sub)
}
dplyr::bind_rows(out_list, .id = "group")
##   group avg_weight
## 1  ctrl      5.032
## 2  trt1      4.661
## 3  trt2      5.526
Skip the loop with group_by and summarise from the dplyr package.
library(dplyr)
PlantGrowth %>% group_by(group) %>% summarise(avg_weight = mean(weight))
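Base R offers a comparable shortcut if you prefer not to depend on dplyr. A minimal sketch using `aggregate()` with its formula interface; the name `avg_by_group` is illustrative.

```r
# Mean weight per group; PlantGrowth ships with R in the datasets package
avg_by_group <- aggregate(weight ~ group, data = PlantGrowth, FUN = mean)
avg_by_group
```

Note that the result column keeps the name `weight` rather than `avg_weight`.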
When available, use vectorized functions.
Loops are useful when filling arrays, reading/writing data, and saving plots.
Loops are a natural extension of a procedural style of programming.
Growing a data structure in a loop is inefficient.
Pre-allocate instead.
If you don’t find loops intuitive, explore the apply family of functions or the purrr package.
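The pre-allocation and vectorization advice above can be sketched as follows. The vector length `n` and the squaring task are arbitrary choices for illustration, not from the original.

```r
n <- 10000

# Growing: c() copies the whole vector on every iteration
grown <- c()
for (i in seq_len(n)) {
  grown <- c(grown, i^2)
}

# Pre-allocating: create the full-length vector once, then fill by index
prealloc <- numeric(n)
for (i in seq_len(n)) {
  prealloc[i] <- i^2
}

# Vectorized: no explicit loop at all
vectorized <- seq_len(n)^2

# All three approaches give the same values
identical(grown, prealloc)
identical(prealloc, vectorized)
```

For lists, `vector("list", n)` plays the same role as `numeric(n)` does here.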
I was just reminded I’ve gone 782 days with no loops written in #rstats. Why do we need to teach these to beginners again? - Miles McBain