Purpose

Create exploratory figures and run statistical analyses for the total pesticide concentration data in water and zooplankton collected from the Sacramento River (SHR) and Toe Drain (STTD) in the summer-fall periods in 2017, 2018 and 2019. These analyses are included in the NDFS contaminants manuscript.

Global code and functions

# Load packages
library(tidyverse)
library(rlang)
library(knitr)
library(patchwork)
library(car)
library(emmeans)
library(visreg)
library(here)
library(conflicted)

# Declare package conflict preferences 
conflicts_prefer(dplyr::filter(), dplyr::lag())
# Check if we are in the correct working directory
i_am("manuscript_contam/notebooks/explore_contam_conc_water_zoop.Rmd")
## here() starts at C:/Repositories/04_IEP_Org/ND-FASTR

Display current versions of R and packages used for this analysis:

devtools::session_info()
## ─ Session info ───────────────────────────────────────────────────────────────
##  setting  value
##  version  R version 4.2.3 (2023-03-15 ucrt)
##  os       Windows 10 x64 (build 19044)
##  system   x86_64, mingw32
##  ui       RTerm
##  language (EN)
##  collate  English_United States.utf8
##  ctype    English_United States.utf8
##  tz       America/Los_Angeles
##  date     2023-09-15
##  pandoc   3.1.1 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
## 
## ─ Packages ───────────────────────────────────────────────────────────────────
##  package      * version  date (UTC) lib source
##  abind          1.4-5    2016-07-21 [1] CRAN (R 4.2.0)
##  bslib          0.4.2    2022-12-16 [1] CRAN (R 4.2.2)
##  cachem         1.0.8    2023-05-01 [1] CRAN (R 4.2.3)
##  callr          3.7.3    2022-11-02 [1] CRAN (R 4.2.2)
##  car          * 3.1-2    2023-03-30 [1] CRAN (R 4.2.3)
##  carData      * 3.0-5    2022-01-06 [1] CRAN (R 4.2.1)
##  cli            3.6.1    2023-03-23 [1] CRAN (R 4.2.3)
##  coda           0.19-4   2020-09-30 [1] CRAN (R 4.2.1)
##  codetools      0.2-19   2023-02-01 [2] CRAN (R 4.2.3)
##  colorspace     2.1-0    2023-01-23 [1] CRAN (R 4.2.2)
##  conflicted   * 1.2.0    2023-02-01 [1] CRAN (R 4.2.2)
##  crayon         1.5.2    2022-09-29 [1] CRAN (R 4.2.1)
##  devtools       2.4.5    2022-10-11 [1] CRAN (R 4.2.1)
##  digest         0.6.31   2022-12-11 [1] CRAN (R 4.2.2)
##  dplyr        * 1.1.2    2023-04-20 [1] CRAN (R 4.2.3)
##  ellipsis       0.3.2    2021-04-29 [1] CRAN (R 4.2.1)
##  emmeans      * 1.8.5    2023-03-08 [1] CRAN (R 4.2.2)
##  estimability   1.4.1    2022-08-05 [1] CRAN (R 4.2.1)
##  evaluate       0.21     2023-05-05 [1] CRAN (R 4.2.3)
##  fansi          1.0.4    2023-01-22 [1] CRAN (R 4.2.2)
##  fastmap        1.1.1    2023-02-24 [1] CRAN (R 4.2.2)
##  forcats      * 1.0.0    2023-01-29 [1] CRAN (R 4.2.2)
##  fs             1.6.2    2023-04-25 [1] CRAN (R 4.2.3)
##  generics       0.1.3    2022-07-05 [1] CRAN (R 4.2.1)
##  ggplot2      * 3.4.2    2023-04-03 [1] CRAN (R 4.2.3)
##  glue           1.6.2    2022-02-24 [1] CRAN (R 4.2.1)
##  gtable         0.3.3    2023-03-21 [1] CRAN (R 4.2.3)
##  here         * 1.0.1    2020-12-13 [1] CRAN (R 4.2.1)
##  hms            1.1.3    2023-03-21 [1] CRAN (R 4.2.3)
##  htmltools      0.5.5    2023-03-23 [1] CRAN (R 4.2.3)
##  htmlwidgets    1.6.2    2023-03-17 [1] CRAN (R 4.2.3)
##  httpuv         1.6.9    2023-02-14 [1] CRAN (R 4.2.2)
##  jquerylib      0.1.4    2021-04-26 [1] CRAN (R 4.2.1)
##  jsonlite       1.8.4    2022-12-06 [1] CRAN (R 4.2.2)
##  knitr        * 1.42     2023-01-25 [1] CRAN (R 4.2.2)
##  later          1.3.0    2021-08-18 [1] CRAN (R 4.2.1)
##  lattice        0.21-8   2023-04-05 [1] CRAN (R 4.2.3)
##  lifecycle      1.0.3    2022-10-07 [1] CRAN (R 4.2.1)
##  lubridate    * 1.9.2    2023-02-10 [1] CRAN (R 4.2.2)
##  magrittr       2.0.3    2022-03-30 [1] CRAN (R 4.2.1)
##  MASS           7.3-58.2 2023-01-23 [2] CRAN (R 4.2.3)
##  Matrix         1.5-4    2023-04-04 [1] CRAN (R 4.2.3)
##  memoise        2.0.1    2021-11-26 [1] CRAN (R 4.2.1)
##  mime           0.12     2021-09-28 [1] CRAN (R 4.2.0)
##  miniUI         0.1.1.1  2018-05-18 [1] CRAN (R 4.2.1)
##  multcomp       1.4-23   2023-03-09 [1] CRAN (R 4.2.2)
##  munsell        0.5.0    2018-06-12 [1] CRAN (R 4.2.1)
##  mvtnorm        1.1-3    2021-10-08 [1] CRAN (R 4.2.0)
##  patchwork    * 1.1.2    2022-08-19 [1] CRAN (R 4.2.1)
##  pillar         1.9.0    2023-03-22 [1] CRAN (R 4.2.3)
##  pkgbuild       1.4.0    2022-11-27 [1] CRAN (R 4.2.2)
##  pkgconfig      2.0.3    2019-09-22 [1] CRAN (R 4.2.1)
##  pkgload        1.3.2    2022-11-16 [1] CRAN (R 4.2.2)
##  prettyunits    1.1.1    2020-01-24 [1] CRAN (R 4.2.1)
##  processx       3.8.1    2023-04-18 [1] CRAN (R 4.2.3)
##  profvis        0.3.7    2020-11-02 [1] CRAN (R 4.2.1)
##  promises       1.2.0.1  2021-02-11 [1] CRAN (R 4.2.1)
##  ps             1.7.5    2023-04-18 [1] CRAN (R 4.2.3)
##  purrr        * 1.0.1    2023-01-10 [1] CRAN (R 4.2.2)
##  R6             2.5.1    2021-08-19 [1] CRAN (R 4.2.1)
##  Rcpp           1.0.10   2023-01-22 [1] CRAN (R 4.2.2)
##  readr        * 2.1.4    2023-02-10 [1] CRAN (R 4.2.2)
##  remotes        2.4.2    2021-11-30 [1] CRAN (R 4.2.1)
##  rlang        * 1.1.1    2023-04-28 [1] CRAN (R 4.2.3)
##  rmarkdown      2.21     2023-03-26 [1] CRAN (R 4.2.3)
##  rprojroot      2.0.3    2022-04-02 [1] CRAN (R 4.2.1)
##  rstudioapi     0.14     2022-08-22 [1] CRAN (R 4.2.1)
##  sandwich       3.0-2    2022-06-15 [1] CRAN (R 4.2.1)
##  sass           0.4.6    2023-05-03 [1] CRAN (R 4.2.3)
##  scales         1.2.1    2022-08-20 [1] CRAN (R 4.2.1)
##  sessioninfo    1.2.2    2021-12-06 [1] CRAN (R 4.2.1)
##  shiny          1.7.4    2022-12-15 [1] CRAN (R 4.2.2)
##  stringi        1.7.12   2023-01-11 [1] CRAN (R 4.2.2)
##  stringr      * 1.5.0    2022-12-02 [1] CRAN (R 4.2.2)
##  survival       3.5-5    2023-03-12 [1] CRAN (R 4.2.3)
##  TH.data        1.1-2    2023-04-17 [1] CRAN (R 4.2.3)
##  tibble       * 3.2.1    2023-03-20 [1] CRAN (R 4.2.3)
##  tidyr        * 1.3.0    2023-01-24 [1] CRAN (R 4.2.2)
##  tidyselect     1.2.0    2022-10-10 [1] CRAN (R 4.2.1)
##  tidyverse    * 2.0.0    2023-02-22 [1] CRAN (R 4.2.2)
##  timechange     0.2.0    2023-01-11 [1] CRAN (R 4.2.2)
##  tzdb           0.4.0    2023-05-12 [1] CRAN (R 4.2.3)
##  urlchecker     1.0.1    2021-11-30 [1] CRAN (R 4.2.1)
##  usethis        2.1.6    2022-05-25 [1] CRAN (R 4.2.1)
##  utf8           1.2.3    2023-01-31 [1] CRAN (R 4.2.2)
##  vctrs          0.6.2    2023-04-19 [1] CRAN (R 4.2.3)
##  visreg       * 2.7.0    2020-06-04 [1] CRAN (R 4.2.1)
##  withr          2.5.0    2022-03-03 [1] CRAN (R 4.2.1)
##  xfun           0.39     2023-04-20 [1] CRAN (R 4.2.3)
##  xtable         1.8-4    2019-04-21 [1] CRAN (R 4.2.1)
##  yaml           2.3.7    2023-01-23 [1] CRAN (R 4.2.2)
##  zoo            1.8-12   2023-04-13 [1] CRAN (R 4.2.3)
## 
##  [1] C:/R/win-library/4.2
##  [2] C:/Program Files/R/R-4.2.3/library
## 
## ──────────────────────────────────────────────────────────────────────────────

Create function to plot model diagnostics:

# Create model diagnostic plots to check assumptions
plot_model_diag <- function(df_data, param_var, unit_label, model, trans = c("none", "log", "sqrt"), ...) {
  trans <- match.arg(trans)
  
  df_data <- df_data %>%
    mutate(
      Residuals = resid(model),
      Fitted = predict(model)
    )
  
  param_name <- as_name(ensym(param_var))
  
  if (trans == "log") {
    unit_label <- paste0("log[", unit_label, "]")
  } else if (trans == "sqrt") {
    unit_label <- paste0("sqrt[", unit_label, "]")
  }
  
  plt_hist <- ggplot(df_data, aes(x = Residuals)) +
    geom_histogram(...) +
    labs(
      title = "Residual Histogram", 
      x = "Residuals"
    ) +
    theme_bw()
  
  plt_qq <- ggplot(df_data, aes(sample = Residuals)) + 
    labs(
      title = "Residual Probability Plot", 
      x = "Normal Quantiles", 
      y = "Residuals"
    ) + 
    stat_qq() + 
    stat_qq_line(linewidth = 1, color = 'red') + 
    theme_bw()
  
  plt_res_fit <- ggplot(df_data, aes(x = Fitted, y = Residuals)) +
    geom_point() +
    labs(
      title = "Residuals vs. Fitted Values", 
      x = (paste0("Predicted ", param_name, " (", unit_label, ")")),
      y = paste0("Residuals (", unit_label, ")")
    ) +
    theme_bw()
  
  plt_obs_fit <- ggplot(df_data, aes(x = Fitted, y = {{ param_var}})) +
    geom_point() +
    geom_abline(slope = 1, intercept = 0, color = "red") +
    labs(
      title = "Observed vs. Fitted Values", 
      x = paste0("Predicted ", param_name, " (", unit_label, ")"),
      y = paste0("Observed ", param_name, " (", unit_label, ")")
    ) +
    theme_bw()
  
  (plt_hist + plt_qq) / (plt_res_fit + plt_obs_fit)
}

Import and Prepare Data

# Define root directory for pesticide data:
fp_pest <- here("manuscript_contam/data/processed")

# Total Pesticide Concentration data:
df_conc_all <- read_rds(file.path(fp_pest, "contam_conc_water_zoop_2017-2019.rds"))

# Pesticide application data:
df_appl <- read_rds(file.path(fp_pest, "pesticide_use_daily_tot_2017-2020.rds"))

# Daily average flow data:
df_davg_flow <- read_rds(here("Water_Quality/Data_Processed/LIS_SR_DailyAvgFlow_2013-2022.rds"))

Prepare the pesticide application data to be joined to the concentration data by calculating monthly totals and standardizing to area of the watershed.

df_appl_c <- df_appl %>% 
  # Calculate monthly totals by region
  summarize(TotalAppl = sum(TotalApplication), .by = c(Region, Year, Month)) %>% 
  mutate(
    # Standardize total application to area (in square miles) of the watershed 
    TotalApplStd = if_else(
      Region == "Sacramento River", 
      TotalAppl/22526.79948,
      TotalAppl/4217.503529
    ),
    # Define station that each region is assigned to
    Station = case_match(
      Region,
      "Toe Drain" ~ "STTD", 
      "Sacramento River" ~ "SHR"
    )
  ) %>% 
  # Lag standardized total application data by one month
  arrange(Region, Year, Month) %>% 
  group_by(Region) %>% 
  mutate(TotalApplStd_lag1 = lag(TotalApplStd, n = 1)) %>% 
  ungroup() %>% 
  # Clean up dataframe
  select(Station, Year, Month, TotalApplStd, TotalApplStd_lag1)

Prepare the flow data to be joined to the concentration data using SR at Freeport flow data for SHR and LIS flow data for STTD. Use Daily Average Net flows for both.

df_davg_flow_c <- df_davg_flow %>% 
  select(-LIS_DailyAvgInstFlow) %>% 
  pivot_longer(
    cols = ends_with("NetFlow"),
    names_to = "Station",
    values_to = "DailyAvgNetFlow"
  ) %>% 
  mutate(
    Station = case_match(
      Station,
      "SR_DailyAvgNetFlow" ~ "SHR",
      "LIS_DailyAvgNetFlow" ~ "STTD"
    )
  )

Join pesticide concentration, application, and flow data together.

df_conc_all_c1 <- df_conc_all %>% 
  mutate(
    Year = year(Date),
    Month = month(Date, label = TRUE)
  ) %>% 
  left_join(df_appl_c, by = join_by(Station, Year, Month)) %>% 
  left_join(df_davg_flow_c, by = join_by(Date, Station))

# Look at all data
df_conc_all_c1 %>% 
  select(
    Station,
    Date,
    Year,
    starts_with("TotalConc"),
    DailyAvgNetFlow,
    starts_with("TotalAppl")
  ) %>% 
  kable()
Station Date Year TotalConc_Zoop TotalConc_Water DailyAvgNetFlow TotalApplStd TotalApplStd_lag1
SHR 2017-08-29 2017 227.19872 389.42531 21200.00 3.379151 18.641841
SHR 2017-09-12 2017 174.19088 396.64632 22100.00 1.440834 3.379151
SHR 2017-09-19 2017 137.99606 147.01557 21500.00 1.440834 3.379151
SHR 2017-10-03 2017 316.69620 80.32957 19400.00 2.573115 1.440834
SHR 2017-10-17 2017 427.37944 47.92240 11700.00 2.573115 1.440834
SHR 2017-11-01 2017 265.57107 97.29594 10400.00 2.758317 2.573115
SHR 2018-07-12 2018 218.84885 188.76610 15700.00 17.692933 37.851786
SHR 2018-07-25 2018 164.99273 262.27653 16500.00 17.692933 37.851786
SHR 2018-08-09 2018 413.08422 668.97693 20200.00 3.639683 17.692933
SHR 2018-08-23 2018 302.75363 564.30313 17500.00 3.639683 17.692933
SHR 2018-09-06 2018 324.04838 317.93755 18000.00 1.435832 3.639683
SHR 2018-09-20 2018 270.17804 151.46644 16500.00 1.435832 3.639683
SHR 2018-10-11 2018 651.37113 98.85206 11600.00 2.469192 1.435832
SHR 2018-10-26 2018 0.00000 75.62678 9060.00 2.469192 1.435832
SHR 2018-11-09 2018 359.01012 198.17201 7850.00 2.253170 2.469192
SHR 2019-05-21 2019 127.90884 445.48078 43200.00 15.849626 5.706071
SHR 2019-06-04 2019 103.08011 215.69136 44500.00 38.770625 15.849626
SHR 2019-06-18 2019 419.33946 505.85984 23500.00 38.770625 15.849626
SHR 2019-07-02 2019 129.36941 507.56875 19300.00 19.143787 38.770625
SHR 2019-07-16 2019 163.66643 374.25063 17300.00 19.143787 38.770625
SHR 2019-08-05 2019 322.10233 337.91232 19700.00 3.012142 19.143787
SHR 2019-08-19 2019 386.21467 554.08375 21000.00 3.012142 19.143787
SHR 2019-09-05 2019 219.43424 423.41095 22500.00 1.578017 3.012142
SHR 2019-09-16 2019 246.22101 331.93137 21400.00 1.578017 3.012142
SHR 2019-09-30 2019 418.09428 148.04025 19000.00 1.578017 3.012142
SHR 2019-10-14 2019 445.06218 70.76964 12600.00 1.125018 1.578017
SHR 2019-10-28 2019 690.50734 90.12275 11500.00 1.125018 1.578017
SHR 2019-11-13 2019 378.31172 115.52704 10100.00 2.724691 1.125018
SHR 2019-11-25 2019 405.88485 186.34375 11000.00 2.724691 1.125018
SHR 2019-12-10 2019 323.05106 558.23809 23700.00 1.373379 2.724691
SHR 2019-12-27 2019 326.21796 253.96279 19500.00 1.373379 2.724691
STTD 2017-08-30 2017 75.15536 7.64 21.114730 69.159300
STTD 2017-09-13 2017 26.92852 108.00 5.969721 21.114730
STTD 2017-09-20 2017 73.05093 -75.10 5.969721 21.114730
STTD 2017-10-04 2017 269.00064 -82.80 6.902682 5.969721
STTD 2017-10-19 2017 62.22037 -47.80 6.902682 5.969721
STTD 2017-11-02 2017 285.62279 -29.70 12.922356 6.902682
STTD 2018-07-11 2018 165.58977 -70.50 66.033924 162.231117
STTD 2018-07-26 2018 100.06627 -108.00 66.033924 162.231117
STTD 2018-08-08 2018 64.42045 -70.30 21.975841 66.033924
STTD 2018-08-22 2018 212.96740 -73.70 21.975841 66.033924
STTD 2018-08-30 2018 686.83756 305.00 21.975841 66.033924
STTD 2018-09-05 2018 1170.56494 337.00 4.200371 21.975841
STTD 2018-09-19 2018 797.15646 387.00 4.200371 21.975841
STTD 2018-10-10 2018 83.18350 -93.40 9.863104 4.200371
STTD 2018-10-25 2018 99.33193 -12.00 9.863104 4.200371
STTD 2018-11-08 2018 213.58742 -22.40 13.370382 9.863104
STTD 2019-05-20 2019 919.34468 362.10000 1270.00 85.276308 39.939688
STTD 2019-06-03 2019 903.77514 1467.63536 825.00 152.302865 85.276308
STTD 2019-06-17 2019 332.82856 899.79820 63.30 152.302865 85.276308
STTD 2019-07-01 2019 230.40000 607.28115 67.332801 152.302865
STTD 2019-07-15 2019 125.70000 532.99378 67.332801 152.302865
STTD 2019-08-07 2019 0.00000 347.24398 -43.50 28.719601 67.332801
STTD 2019-08-21 2019 89.00000 457.27127 -9.89 28.719601 67.332801
STTD 2019-09-04 2019 641.20000 2289.24197 598.00 7.454874 28.719601
STTD 2019-09-18 2019 548.00000 2093.83725 635.00 7.454874 28.719601
STTD 2019-10-02 2019 333.90000 766.80342 -19.60 4.366761 7.454874
STTD 2019-10-16 2019 181.70000 347.25295 -88.70 4.366761 7.454874
STTD 2019-10-28 2019 315.00000 233.26624 -56.50 4.366761 7.454874
STTD 2019-11-14 2019 406.10000 164.59963 -113.00 14.037269 4.366761
STTD 2019-11-26 2019 413.70000 193.53662 6.97 14.037269 4.366761
STTD 2019-12-09 2019 994.50000 1714.24731 300.00 5.704477 14.037269
STTD 2019-12-26 2019 373.20000 577.11950 220.00 5.704477 14.037269

It looks like all of the flow data was available for SHR, but STTD is missing a few flow values in July 2019. We’ll restrict the date ranges to July 1 - Oct 31 for each year since some of the data extends outside of the summer-fall period.

df_conc_all_c2 <- df_conc_all_c1 %>% 
  mutate(Month = as.numeric(Month)) %>%
  filter(Month %in% 7:10)

# Look at filtered data
df_conc_all_c2 %>% 
  select(
    Station,
    Date,
    Year,
    starts_with("TotalConc"),
    DailyAvgNetFlow,
    starts_with("TotalAppl")
  ) %>% 
  kable()
Station Date Year TotalConc_Zoop TotalConc_Water DailyAvgNetFlow TotalApplStd TotalApplStd_lag1
SHR 2017-08-29 2017 227.19872 389.42531 21200.00 3.379151 18.641841
SHR 2017-09-12 2017 174.19088 396.64632 22100.00 1.440834 3.379151
SHR 2017-09-19 2017 137.99606 147.01557 21500.00 1.440834 3.379151
SHR 2017-10-03 2017 316.69620 80.32957 19400.00 2.573115 1.440834
SHR 2017-10-17 2017 427.37944 47.92240 11700.00 2.573115 1.440834
SHR 2018-07-12 2018 218.84885 188.76610 15700.00 17.692933 37.851786
SHR 2018-07-25 2018 164.99273 262.27653 16500.00 17.692933 37.851786
SHR 2018-08-09 2018 413.08422 668.97693 20200.00 3.639683 17.692933
SHR 2018-08-23 2018 302.75363 564.30313 17500.00 3.639683 17.692933
SHR 2018-09-06 2018 324.04838 317.93755 18000.00 1.435832 3.639683
SHR 2018-09-20 2018 270.17804 151.46644 16500.00 1.435832 3.639683
SHR 2018-10-11 2018 651.37113 98.85206 11600.00 2.469192 1.435832
SHR 2018-10-26 2018 0.00000 75.62678 9060.00 2.469192 1.435832
SHR 2019-07-02 2019 129.36941 507.56875 19300.00 19.143787 38.770625
SHR 2019-07-16 2019 163.66643 374.25063 17300.00 19.143787 38.770625
SHR 2019-08-05 2019 322.10233 337.91232 19700.00 3.012142 19.143787
SHR 2019-08-19 2019 386.21467 554.08375 21000.00 3.012142 19.143787
SHR 2019-09-05 2019 219.43424 423.41095 22500.00 1.578017 3.012142
SHR 2019-09-16 2019 246.22101 331.93137 21400.00 1.578017 3.012142
SHR 2019-09-30 2019 418.09428 148.04025 19000.00 1.578017 3.012142
SHR 2019-10-14 2019 445.06218 70.76964 12600.00 1.125018 1.578017
SHR 2019-10-28 2019 690.50734 90.12275 11500.00 1.125018 1.578017
STTD 2017-08-30 2017 75.15536 7.64 21.114730 69.159300
STTD 2017-09-13 2017 26.92852 108.00 5.969721 21.114730
STTD 2017-09-20 2017 73.05093 -75.10 5.969721 21.114730
STTD 2017-10-04 2017 269.00064 -82.80 6.902682 5.969721
STTD 2017-10-19 2017 62.22037 -47.80 6.902682 5.969721
STTD 2018-07-11 2018 165.58977 -70.50 66.033924 162.231117
STTD 2018-07-26 2018 100.06627 -108.00 66.033924 162.231117
STTD 2018-08-08 2018 64.42045 -70.30 21.975841 66.033924
STTD 2018-08-22 2018 212.96740 -73.70 21.975841 66.033924
STTD 2018-08-30 2018 686.83756 305.00 21.975841 66.033924
STTD 2018-09-05 2018 1170.56494 337.00 4.200371 21.975841
STTD 2018-09-19 2018 797.15646 387.00 4.200371 21.975841
STTD 2018-10-10 2018 83.18350 -93.40 9.863104 4.200371
STTD 2018-10-25 2018 99.33193 -12.00 9.863104 4.200371
STTD 2019-07-01 2019 230.40000 607.28115 67.332801 152.302865
STTD 2019-07-15 2019 125.70000 532.99378 67.332801 152.302865
STTD 2019-08-07 2019 0.00000 347.24398 -43.50 28.719601 67.332801
STTD 2019-08-21 2019 89.00000 457.27127 -9.89 28.719601 67.332801
STTD 2019-09-04 2019 641.20000 2289.24197 598.00 7.454874 28.719601
STTD 2019-09-18 2019 548.00000 2093.83725 635.00 7.454874 28.719601
STTD 2019-10-02 2019 333.90000 766.80342 -19.60 4.366761 7.454874
STTD 2019-10-16 2019 181.70000 347.25295 -88.70 4.366761 7.454874
STTD 2019-10-28 2019 315.00000 233.26624 -56.50 4.366761 7.454874

Zooplankton

# Prepare zooplankton concentration data for analysis
df_zoop <- df_conc_all_c2 %>% 
  select(-TotalConc_Water) %>% 
  drop_na() %>% 
  # Remove two samples with total concentrations equal to zero
  filter(TotalConc_Zoop > 0) %>% 
  mutate(
    # Add log-transformed and sqrt-transformed zooplankton concentrations
    TotalConc_Zoop_log = log(TotalConc_Zoop),
    TotalConc_Zoop_sqrt = sqrt(TotalConc_Zoop),
    # Convert year to character for categorical models
    Year = as.character(Year)
  )

Exploratory Plots

Before running the analyses, lets visualize the data.

Boxplots

First, we’ll look at a boxplot comparing total pesticide concentrations in zooplankton between stations.

df_zoop %>% 
  ggplot(aes(x = Station, y = TotalConc_Zoop)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(aes(color = Year), width = 0.3) +
  scale_color_viridis_d(option = "plasma", end = 0.8) +
  ylab("Total pesticide concentration in zooplankton (ng/g)")

SHR has a higher median concentration than STTD, but STTD has some higher values than SHR particularly in 2018. Let’s break apart the boxplot by station and year.

df_zoop %>% 
  ggplot(aes(x = Year, y = TotalConc_Zoop, fill = Year)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(width = 0.2) +
  facet_wrap(vars(Station)) +
  scale_fill_viridis_d(option = "plasma", end = 0.8) +
  ylab("Total pesticide concentration in zooplankton (ng/g)")

Overall, STTD in 2018 had the highest variation in concentrations. Let’s look at a boxplot grouped only by year.

df_zoop %>% 
  ggplot(aes(x = Year, y = TotalConc_Zoop)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(width = 0.3) +
  ylab("Total pesticide concentration in zooplankton (ng/g)")

The median of total concentrations was slightly higher in 2019 than other years when both stations are grouped together.

Seasonal time-series

Now, let’s see how total concentrations varied seasonally using day of year for each station and year.

df_zoop %>% 
  ggplot(aes(x = yday(Date), y = TotalConc_Zoop, color = Station)) +
  geom_point() +
  facet_grid(rows = vars(Year)) +
  scale_color_viridis_d(option = "plasma", end = 0.8) +
  labs(x = "DOY", y = "Total pesticide concentration in zooplankton (ng/g)")

It almost looks like SHR and STTD have opposite seasonal trends.

Concentration vs Flow

Now, let’s take a look at how the total pesticide concentrations in zooplankton vary with daily average net flow for each station.

df_zoop %>% 
  ggplot(aes(x = DailyAvgNetFlow, y = TotalConc_Zoop, color = Station)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  labs(
    x = "Daily average net flow (cfs)",
    y = "Total pesticide concentration in zooplankton (ng/g)"
  ) +
  theme_bw()

df_zoop %>% 
  ggplot(aes(x = DailyAvgNetFlow, y = TotalConc_Zoop, color = Year)) +
  geom_point() +
  scale_color_viridis_d(option = "plasma", end = 0.8) +
  facet_wrap(vars(Station), scales = "free") +
  labs(
    x = "Daily average net flow (cfs)",
    y = "Total pesticide concentration in zooplankton (ng/g)"
  ) +
  theme_bw()

SHR seems to have a negative relationship between concentration and flow, while concentrations at STTD seem to have a positive relationship with flow.

Concentration vs Application

Next, let’s look at the total pesticide concentrations in zooplankton as a function of standardized monthly total application data for each station.

df_zoop %>% 
  ggplot(aes(x = TotalApplStd, y = TotalConc_Zoop, color = Station)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  labs(
    x = "Monthly Total Application (lbs/sq mile)",
    y = "Total pesticide concentration in zooplankton (ng/g)"
  ) +
  theme_bw()

df_zoop %>% 
  ggplot(aes(x = TotalApplStd, y = TotalConc_Zoop, color = Year)) +
  geom_point() +
  scale_color_viridis_d(option = "plasma", end = 0.8) +
  facet_wrap(vars(Station), scales = "free") +
  labs(
    x = "Monthly Total Application (lbs/sq mile)",
    y = "Total pesticide concentration in zooplankton (ng/g)"
  ) +
  theme_bw()

There appears to be a weak negative relationship between pesticide concentrations in zooplankton and monthly application amounts at both stations. I’m not sure that monthly application will be a good predictor. Let’s take a look at whether using the prior month’s application amount makes a difference.

df_zoop %>% 
  ggplot(aes(x = TotalApplStd_lag1, y = TotalConc_Zoop, color = Station)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  labs(
    x = "Monthly Total Application in prior month (lbs/sq mile)",
    y = "Total pesticide concentration in zooplankton (ng/g)"
  ) +
  theme_bw()

df_zoop %>% 
  ggplot(aes(x = TotalApplStd_lag1, y = TotalConc_Zoop, color = Year)) +
  geom_point() +
  scale_color_viridis_d(option = "plasma", end = 0.8) +
  facet_wrap(vars(Station), scales = "free") +
  labs(
    x = "Monthly Total Application in prior month (lbs/sq mile)",
    y = "Total pesticide concentration in zooplankton (ng/g)"
  ) +
  theme_bw()

Hmm, not really…

Histograms

Lastly, let’s take a look at histograms of total pesticide concentrations in zooplankton to look at its distribution. We’ll look at each station separately and then all data grouped together.

df_zoop %>% 
  ggplot(aes(x = TotalConc_Zoop)) +
  geom_histogram(bins = 10) +
  facet_wrap(vars(Station), scales = "free") +
  xlab("Total pesticide concentration in zooplankton (ng/g)")

df_zoop %>% 
  ggplot(aes(x = TotalConc_Zoop)) +
  geom_histogram() +
  xlab("Total pesticide concentration in zooplankton (ng/g)")

We’ll probably need to use transformed zooplankton concentration data for the analyses.

Analyses

Models with Flow and Application

We’ll create linear models of the zooplankton concentrations as a function of daily average net flow and monthly pesticide application amounts for each station separately. We’ll start with the original concentration values, but we may need to transform them. These models will allow us to examine:

  1. Do total pesticide concentrations in zooplankton significantly respond to flow at either or both SHR and STTD?
  2. Do total pesticide concentrations in zooplankton significantly respond to pesticide application at either or both SHR and STTD?

SHR

We’ll start with SHR

Current month application
df_zoop_shr <- df_zoop %>% filter(Station == "SHR")

m_zoop_q_appl_shr <- df_zoop_shr %>% 
  lm(TotalConc_Zoop ~ DailyAvgNetFlow + TotalApplStd, data = .)

Plot the diagnostics

df_zoop_shr %>% plot_model_diag(
  TotalConc_Zoop, 
  "ng/g", 
  m_zoop_q_appl_shr, 
  bins = 10
)

shapiro.test(m_zoop_q_appl_shr$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  m_zoop_q_appl_shr$residuals
## W = 0.94631, p-value = 0.2895

The residuals deviate from a normal distribution according to visual inspection. We’ll re-run the model using the natural log transformation.

m_zoop_q_appl_shr_log <- df_zoop_shr %>% 
  lm(TotalConc_Zoop_log ~ DailyAvgNetFlow + TotalApplStd, data = .)

Plot the diagnostics

df_zoop_shr %>% plot_model_diag(
  TotalConc_Zoop_log, 
  "ng/g", 
  m_zoop_q_appl_shr_log, 
  trans = "log", 
  bins = 10
)

shapiro.test(m_zoop_q_appl_shr_log$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  m_zoop_q_appl_shr_log$residuals
## W = 0.9739, p-value = 0.8169

The model diagnostics look pretty good with the log-transformed values, so we’ll take a closer look at this model.

summary(m_zoop_q_appl_shr_log)
## 
## Call:
## lm(formula = TotalConc_Zoop_log ~ DailyAvgNetFlow + TotalApplStd, 
##     data = .)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.59245 -0.12635 -0.02903  0.17388  0.49120 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      7.364e+00  3.196e-01  23.041 8.26e-15 ***
## DailyAvgNetFlow -8.287e-05  1.712e-05  -4.841 0.000131 ***
## TotalApplStd    -4.318e-02  9.271e-03  -4.657 0.000196 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2726 on 18 degrees of freedom
## Multiple R-squared:  0.7004, Adjusted R-squared:  0.6671 
## F-statistic: 21.04 on 2 and 18 DF,  p-value: 1.944e-05
plt_m_zoop_q_shr <- visreg(m_zoop_q_appl_shr_log, xvar = "DailyAvgNetFlow", gg = TRUE) + 
  theme_bw() + 
  geom_point(size = 1)

plt_m_zoop_appl_shr <- visreg(m_zoop_q_appl_shr_log, xvar = "TotalApplStd", gg = TRUE) + 
  theme_bw() + 
  geom_point(size = 1)

plt_m_zoop_shr <- plt_m_zoop_q_shr + plt_m_zoop_appl_shr
plt_m_zoop_shr

Total pesticide concentrations decrease significantly with flow and monthly pesticide application at the SHR station; however, the relationship with monthly pesticide application is driven by four values with the highest application.

Prior month application

Let’s see if the model is a better fit when we use the prior month’s application amount.

m_zoop_q_appl_pr_shr <- df_zoop_shr %>% 
  lm(TotalConc_Zoop ~ DailyAvgNetFlow + TotalApplStd_lag1, data = .)

Plot the diagnostics

df_zoop_shr %>% plot_model_diag(
  TotalConc_Zoop, 
  "ng/g", 
  m_zoop_q_appl_pr_shr, 
  bins = 10
)

shapiro.test(m_zoop_q_appl_pr_shr$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  m_zoop_q_appl_pr_shr$residuals
## W = 0.91528, p-value = 0.06986

The residuals deviate from a normal distribution according to visual inspection. We’ll re-run the model using the natural log transformation.

m_zoop_q_appl_pr_shr_log <- df_zoop_shr %>% 
  lm(TotalConc_Zoop_log ~ DailyAvgNetFlow + TotalApplStd_lag1, data = .)

Plot the diagnostics

df_zoop_shr %>% plot_model_diag(
  TotalConc_Zoop_log, 
  "ng/g", 
  m_zoop_q_appl_pr_shr_log, 
  trans = "log", 
  bins = 10
)

shapiro.test(m_zoop_q_appl_pr_shr_log$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  m_zoop_q_appl_pr_shr_log$residuals
## W = 0.97348, p-value = 0.8082

The model diagnostics look pretty good with the log-transformed values, so we’ll take a closer look at this model.

summary(m_zoop_q_appl_pr_shr_log)
## 
## Call:
## lm(formula = TotalConc_Zoop_log ~ DailyAvgNetFlow + TotalApplStd_lag1, 
##     data = .)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.62399 -0.22201 -0.02841  0.19220  0.61989 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        7.136e+00  3.768e-01  18.935 2.47e-13 ***
## DailyAvgNetFlow   -7.119e-05  2.066e-05  -3.446  0.00288 ** 
## TotalApplStd_lag1 -1.588e-02  5.192e-03  -3.058  0.00677 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3284 on 18 degrees of freedom
## Multiple R-squared:  0.5653, Adjusted R-squared:  0.517 
## F-statistic:  11.7 on 2 and 18 DF,  p-value: 0.0005545
plt_m_zoop_q_pr_shr <- visreg(m_zoop_q_appl_pr_shr_log, xvar = "DailyAvgNetFlow", gg = TRUE) + 
  theme_bw() + 
  geom_point(size = 1)

plt_m_zoop_appl_pr_shr <- visreg(m_zoop_q_appl_pr_shr_log, xvar = "TotalApplStd_lag1", gg = TRUE) + 
  theme_bw() + 
  geom_point(size = 1)

plt_m_zoop_pr_shr <- plt_m_zoop_q_pr_shr + plt_m_zoop_appl_pr_shr
plt_m_zoop_pr_shr

Again, total pesticide concentrations decrease significantly with flow and monthly pesticide application at the SHR station using the prior month’s application. The relationship with monthly pesticide application isn’t as susceptible to leverage by values with the highest application.

Model selection

Let’s take a look at AIC and BIC values to see which model is better.

AIC(m_zoop_q_appl_shr_log, m_zoop_q_appl_pr_shr_log)
##                          df       AIC
## m_zoop_q_appl_shr_log     4  9.768586
## m_zoop_q_appl_pr_shr_log  4 17.586844
BIC(m_zoop_q_appl_shr_log, m_zoop_q_appl_pr_shr_log)
##                          df      BIC
## m_zoop_q_appl_shr_log     4 13.94668
## m_zoop_q_appl_pr_shr_log  4 21.76493

Both AIC and BIC prefer the model with the current month’s pesticide application.

STTD

Next, we’ll look at STTD

Current month application
df_zoop_sttd <- df_zoop %>% filter(Station == "STTD")

m_zoop_q_appl_sttd <- df_zoop_sttd %>% 
  lm(TotalConc_Zoop ~ DailyAvgNetFlow + TotalApplStd, data = .)

Plot the diagnostics

df_zoop_sttd %>% plot_model_diag(
  TotalConc_Zoop, 
  "ng/g", 
  m_zoop_q_appl_sttd, 
  bins = 10
)

shapiro.test(m_zoop_q_appl_sttd$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  m_zoop_q_appl_sttd$residuals
## W = 0.90656, p-value = 0.05484

The residuals deviate from a normal distribution according to visual inspection. We’ll re-run the model using the natural log transformation.

m_zoop_q_appl_sttd_log <- df_zoop_sttd %>% 
  lm(TotalConc_Zoop_log ~ DailyAvgNetFlow + TotalApplStd, data = .)

Plot the diagnostics

df_zoop_sttd %>% plot_model_diag(
  TotalConc_Zoop_log, 
  "ng/g", 
  m_zoop_q_appl_sttd_log, 
  trans = "log", 
  bins = 10
)

shapiro.test(m_zoop_q_appl_sttd_log$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  m_zoop_q_appl_sttd_log$residuals
## W = 0.92061, p-value = 0.1018

The natural log transformation may be too strong. We’ll re-run the model using the square root-transformation.

m_zoop_q_appl_sttd_sqrt <- df_zoop_sttd %>% 
  lm(TotalConc_Zoop_sqrt ~ DailyAvgNetFlow + TotalApplStd, data = .)

Plot the diagnostics

df_zoop_sttd %>% plot_model_diag(
  TotalConc_Zoop_sqrt, 
  "ng/g", 
  m_zoop_q_appl_sttd_sqrt, 
  trans = "sqrt", 
  bins = 10
)

shapiro.test(m_zoop_q_appl_sttd_sqrt$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  m_zoop_q_appl_sttd_sqrt$residuals
## W = 0.95259, p-value = 0.4081

The model diagnostics look best with this model, so we’ll take a closer look at this one.

summary(m_zoop_q_appl_sttd_sqrt)
## 
## Call:
## lm(formula = TotalConc_Zoop_sqrt ~ DailyAvgNetFlow + TotalApplStd, 
##     data = .)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.1552  -3.4770  -0.8651   4.8538  12.0845 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     13.760969   1.920517   7.165 1.58e-06 ***
## DailyAvgNetFlow  0.025096   0.005822   4.311 0.000474 ***
## TotalApplStd    -0.021261   0.074230  -0.286 0.778020    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.754 on 17 degrees of freedom
## Multiple R-squared:  0.5549, Adjusted R-squared:  0.5025 
## F-statistic:  10.6 on 2 and 17 DF,  p-value: 0.001028
plt_m_zoop_q_sttd <- visreg(m_zoop_q_appl_sttd_sqrt, xvar = "DailyAvgNetFlow", gg = TRUE) + 
  theme_bw() + 
  geom_point(size = 1)

plt_m_zoop_appl_sttd <- visreg(m_zoop_q_appl_sttd_sqrt, xvar = "TotalApplStd", gg = TRUE) + 
  theme_bw() + 
  geom_point(size = 1)

plt_m_zoop_sttd <- plt_m_zoop_q_sttd + plt_m_zoop_appl_sttd
plt_m_zoop_sttd

Total pesticide concentrations increase significantly with flow at the STTD station, but there is no relationship between concentration and monthly pesticide application.

Prior month application

Let’s see if the model is a better fit when we use the prior month’s application amount.

m_zoop_q_appl_pr_sttd <- df_zoop_sttd %>% 
  lm(TotalConc_Zoop ~ DailyAvgNetFlow + TotalApplStd_lag1, data = .)

Plot the diagnostics

df_zoop_sttd %>% plot_model_diag(
  TotalConc_Zoop, 
  "ng/g",
  m_zoop_q_appl_pr_sttd, 
  bins = 10
)

shapiro.test(m_zoop_q_appl_pr_sttd$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  m_zoop_q_appl_pr_sttd$residuals
## W = 0.90557, p-value = 0.05251

The residuals deviate from a normal distribution according to visual inspection. We’ll re-run the model using the natural log transformation.

m_zoop_q_appl_pr_sttd_log <- df_zoop_sttd %>% 
  lm(TotalConc_Zoop_log ~ DailyAvgNetFlow + TotalApplStd_lag1, data = .)

Plot the diagnostics

df_zoop_sttd %>% plot_model_diag(
  TotalConc_Zoop_log, 
  "ng/g", 
  m_zoop_q_appl_pr_sttd_log, 
  trans = "log", 
  bins = 10
)

shapiro.test(m_zoop_q_appl_pr_sttd_log$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  m_zoop_q_appl_pr_sttd_log$residuals
## W = 0.92017, p-value = 0.09981

Again, the natural log transformation may be too strong. We’ll re-run the model using the square root-transformation.

m_zoop_q_appl_pr_sttd_sqrt <- df_zoop_sttd %>% 
  lm(TotalConc_Zoop_sqrt ~ DailyAvgNetFlow + TotalApplStd_lag1, data = .)

Plot the diagnostics

df_zoop_sttd %>% plot_model_diag(
  TotalConc_Zoop_sqrt, 
  "ng/g", 
  m_zoop_q_appl_pr_sttd_sqrt, 
  trans = "sqrt",
  bins = 10
)

shapiro.test(m_zoop_q_appl_pr_sttd_sqrt$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  m_zoop_q_appl_pr_sttd_sqrt$residuals
## W = 0.9525, p-value = 0.4067

The model diagnostics look best with this model, so we’ll take a closer look at this one.

summary(m_zoop_q_appl_pr_sttd_sqrt)
## 
## Call:
## lm(formula = TotalConc_Zoop_sqrt ~ DailyAvgNetFlow + TotalApplStd_lag1, 
##     data = .)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -11.078  -3.453  -1.004   4.880  12.143 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       13.666197   1.855055   7.367  1.1e-06 ***
## DailyAvgNetFlow    0.025367   0.005649   4.490 0.000322 ***
## TotalApplStd_lag1 -0.006553   0.028126  -0.233 0.818554    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.759 on 17 degrees of freedom
## Multiple R-squared:  0.5542, Adjusted R-squared:  0.5017 
## F-statistic: 10.57 on 2 and 17 DF,  p-value: 0.001042
plt_m_zoop_q_pr_sttd <- visreg(m_zoop_q_appl_pr_sttd_sqrt, xvar = "DailyAvgNetFlow", gg = TRUE) + 
  theme_bw() + 
  geom_point(size = 1)

plt_m_zoop_appl_pr_sttd <- visreg(m_zoop_q_appl_pr_sttd_sqrt, xvar = "TotalApplStd_lag1", gg = TRUE) + 
  theme_bw() + 
  geom_point(size = 1)

plt_m_zoop_pr_sttd <- plt_m_zoop_q_pr_sttd + plt_m_zoop_appl_pr_sttd
plt_m_zoop_pr_sttd

The results are the same with this model.

Model selection

Let’s take a look at AIC and BIC values to see which model is better.

AIC(m_zoop_q_appl_sttd_sqrt, m_zoop_q_appl_pr_sttd_sqrt)
##                            df      AIC
## m_zoop_q_appl_sttd_sqrt     4 131.5035
## m_zoop_q_appl_pr_sttd_sqrt  4 131.5360
BIC(m_zoop_q_appl_sttd_sqrt, m_zoop_q_appl_pr_sttd_sqrt)
##                            df      BIC
## m_zoop_q_appl_sttd_sqrt     4 135.4865
## m_zoop_q_appl_pr_sttd_sqrt  4 135.5190

The AIC and BIC values are nearly identical between models with both metrics giving a slight edge to the model with the current month’s pesticide application.

Model with Station and Year

Next, we’ll create a linear model of the zooplankton concentrations as a function of the interaction between station and year. We’ll start with the original concentration values, but we may need to transform them. This model will allow us to examine:

  1. If the interaction between station and year isn’t significant, whether total pesticide concentrations in zooplankton are significantly different among stations or years (main effects).
  2. If the interaction is significant, whether total pesticide concentrations in zooplankton are significantly different among stations for each year or among years for each station.

With interaction

m_zoop_sta_yr <- df_zoop %>% 
  lm(TotalConc_Zoop ~ Station * Year, data = .)

Plot the diagnostics

df_zoop %>% plot_model_diag(
  TotalConc_Zoop, 
  "ng/g", 
  m_zoop_sta_yr
)

shapiro.test(m_zoop_sta_yr$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  m_zoop_sta_yr$residuals
## W = 0.90768, p-value = 0.002817

The residuals deviate from a normal distribution according to the Shapiro-Wilk normality test and visual inspection. Also, the variance among groups isn’t consistent. We’ll re-run the model using the natural log of the concentration values.

m_zoop_sta_yr_log <- df_zoop %>% 
  lm(TotalConc_Zoop_log ~ Station * Year, data = .)

Plot the diagnostics

df_zoop %>% plot_model_diag(
  TotalConc_Zoop_log, 
  "ng/g", 
  m_zoop_sta_yr_log, 
  trans = "log"
)

shapiro.test(m_zoop_sta_yr_log$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  m_zoop_sta_yr_log$residuals
## W = 0.97934, p-value = 0.6506

This is much better than the model with original values. Let’s take a look at the Anova table using type 3 sum of squares for the model using natural log transformation.

ANOVA table
Anova(m_zoop_sta_yr_log, type = 3, contrasts = list(topic = contr.sum, sys = contr.sum))
## Anova Table (Type III tests)
## 
## Response: TotalConc_Zoop_log
##               Sum Sq Df  F value  Pr(>F)    
## (Intercept)  149.373  1 272.8261 < 2e-16 ***
## Station        3.248  1   5.9332 0.02009 *  
## Year           0.230  2   0.2101 0.81154    
## Station:Year   1.887  2   1.7230 0.19334    
## Residuals     19.163 35                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Well, the interaction term is not significant which means we can take a look at the main effects. Before we do this, we’ll remove the interaction from the model since it’s not necessary. We’ll stick with the model using natural log transformation.

Without interaction

m_zoop_sta_yr_log2 <- df_zoop %>% 
  lm(TotalConc_Zoop_log ~ Station + Year, data = .)

df_zoop %>% plot_model_diag(
  TotalConc_Zoop_log, 
  "ng/g", 
  m_zoop_sta_yr_log2, 
  trans = "log"
)

shapiro.test(m_zoop_sta_yr_log2$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  m_zoop_sta_yr_log2$residuals
## W = 0.98495, p-value = 0.8542

The diagnostics still look good with this model. Let’s take a look at the Anova table using type 2 sum of squares because of the unbalanced design.

ANOVA table
Anova(m_zoop_sta_yr_log2, type = 2)
## Anova Table (Type II tests)
## 
## Response: TotalConc_Zoop_log
##            Sum Sq Df F value  Pr(>F)  
## Station    1.8028  1  3.1690 0.08326 .
## Year       3.8276  2  3.3641 0.04546 *
## Residuals 21.0492 37                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The main effect of station is not significant (p = 0.083) meaning that total pesticide concentrations in zooplankton don’t differ between SHR and STTD. However, the main effect of Year is significant, so we can take a look at the pairwise contrasts to see how concentrations differ among the years. We’ll also take a look at the pairwise contrasts for station since it is close to significant.

Contrasts
# Contrasts for Year main effect
em_zoop_yr <- emmeans(m_zoop_sta_yr_log2, specs = "Year")
pairs(em_zoop_yr)
##  contrast            estimate    SE df t.ratio p.value
##  Year2017 - Year2018  -0.6700 0.304 37  -2.201  0.0842
##  Year2017 - Year2019  -0.7474 0.309 37  -2.420  0.0525
##  Year2018 - Year2019  -0.0774 0.274 37  -0.283  0.9570
## 
## Results are averaged over the levels of: Station 
## P value adjustment: tukey method for comparing a family of 3 estimates
visreg(m_zoop_sta_yr_log2, xvar = "Year")

# Contrasts for Station main effect
em_zoop_sta <- emmeans(m_zoop_sta_yr_log2, specs = "Station")
pairs(em_zoop_sta)
##  contrast   estimate    SE df t.ratio p.value
##  SHR - STTD    0.424 0.238 37   1.780  0.0833
## 
## Results are averaged over the levels of: Year
visreg(m_zoop_sta_yr_log2, xvar = "Station")

Pairwise tests indicate that total pesticide concentration in zooplankton in 2019 is marginally significantly higher than 2017, but the p-value is 0.053. Also, the p-value isn’t significant, but SHR is slightly higher than STTD.

Water Analysis

# Prepare water concentration data for analysis
df_water <- df_conc_all_c2 %>% 
  select(-TotalConc_Zoop) %>% 
  # only include 2019 since that was the only year water samples were collected
    # at STTD
  filter(Year == 2019) %>% 
  drop_na() %>% 
  # Add log-transformed and sqrt-transformed water concentrations
  mutate(
    TotalConc_Water_log = log(TotalConc_Water),
    TotalConc_Water_sqrt = sqrt(TotalConc_Water)
  )

Exploratory Plots

Before running the analyses, lets visualize the data.

Boxplots

First, let’s look at a boxplot comparing total pesticide concentrations in water between stations.

df_water %>% 
  ggplot(aes(x = Station, y = TotalConc_Water)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(width = 0.3) +
  ylab("Total pesticide concentration in water (ng/L)")

STTD seems to have higher concentrations with much more variation than SHR.

Seasonal time-series

Now, let’s see how total concentrations varied seasonally in 2019 for each station.

df_water %>% 
  ggplot(aes(x = Date, y = TotalConc_Water, color = Station)) +
  geom_point() +
  scale_color_viridis_d(option = "plasma", end = 0.8) +
  labs(x = "Date", y = "Total pesticide concentration in water (ng/L)")

Concentration vs Flow

Now, let’s take a look at how the total pesticide concentrations in water vary with daily average net flow for each station.

df_water %>% 
  ggplot(aes(x = DailyAvgNetFlow, y = TotalConc_Water, color = Station)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  labs(
    x = "Daily average net flow (cfs)",
    y = "Total pesticide concentration in water (ng/L)"
  ) +
  theme_bw()

df_water %>% 
  ggplot(aes(x = DailyAvgNetFlow, y = TotalConc_Water)) +
  geom_point() +
  facet_wrap(vars(Station), scales = "free") +
  labs(
    x = "Daily average net flow (cfs)",
    y = "Total pesticide concentration in water (ng/L)"
  ) +
  theme_bw()

Both SHR and STTD seem to have positive relationships between concentration and flow, but STTD looks weak and strongly affected by values with the highest flows.

Concentration vs Application

Next, let’s look at the total pesticide concentrations in water as a function of standardized monthly total application data for each station.

df_water %>% 
  ggplot(aes(x = TotalApplStd, y = TotalConc_Water, color = Station)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  labs(
    x = "Monthly Total Application (lbs/sq mile)",
    y = "Total pesticide concentration in water (ng/L)"
  ) +
  theme_bw()

df_water %>% 
  ggplot(aes(x = TotalApplStd, y = TotalConc_Water)) +
  geom_point() +
  scale_color_viridis_d(option = "plasma", end = 0.8) +
  facet_wrap(vars(Station), scales = "free") +
  labs(
    x = "Monthly Total Application (lbs/sq mile)",
    y = "Total pesticide concentration in water (ng/L)"
  ) +
  theme_bw()

There appears to be a weak negative relationship between pesticide concentrations in water and monthly application amounts at STTD and a slightly stronger positive relationship at SHR. I’m not sure that monthly application will be a good predictor. Let’s take a look at whether using the prior month’s application amount makes a difference.

df_water %>% 
  ggplot(aes(x = TotalApplStd_lag1, y = TotalConc_Water, color = Station)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  labs(
    x = "Monthly Total Application in prior month (lbs/sq mile)",
    y = "Total pesticide concentration in water (ng/L)"
  ) +
  theme_bw()

df_water %>% 
  ggplot(aes(x = TotalApplStd_lag1, y = TotalConc_Water)) +
  geom_point() +
  scale_color_viridis_d(option = "plasma", end = 0.8) +
  facet_wrap(vars(Station), scales = "free") +
  labs(
    x = "Monthly Total Application in prior month (lbs/sq mile)",
    y = "Total pesticide concentration in water (ng/L)"
  ) +
  theme_bw()

Hmm, not really. The relationship may be slightly better at SHR using the prior month’s application amounts.

Histograms

Lastly, let’s take a look at histograms of total pesticide concentrations in water to look at its distribution. We’ll look at each station separately and then all data grouped together.

df_water %>% 
  ggplot(aes(x = TotalConc_Water)) +
  geom_histogram(bins = 10) +
  facet_wrap(vars(Station), scales = "free") +
  xlab("Total pesticide concentration in water (ng/L)")

df_water %>% 
  ggplot(aes(x = TotalConc_Water)) +
  geom_histogram() +
  xlab("Total pesticide concentration in water (ng/L)")

We’ll probably need to use transformed water concentration data for the analyses.

Analyses

Models with Flow and Application

We’ll create linear models of the water concentrations as a function of daily average net flow and monthly pesticide application amounts for each station separately. We’ll start with the original concentration values, but we may need to transform them. These models will allow us to examine:

  1. Do total pesticide concentrations in water significantly respond to flow at either or both SHR and STTD?
  2. Do total pesticide concentrations in water significantly respond to pesticide application at either or both SHR and STTD?

SHR

We’ll start with SHR

df_water_shr <- df_water %>% filter(Station == "SHR")

m_water_q_appl_shr <- df_water_shr %>% 
  lm(TotalConc_Water ~ DailyAvgNetFlow + TotalApplStd, data = .)

Plot the diagnostics

df_water_shr %>% plot_model_diag(
  TotalConc_Water, 
  "ng/L",
  m_water_q_appl_shr, 
  bins = 8
)

shapiro.test(m_water_q_appl_shr$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  m_water_q_appl_shr$residuals
## W = 0.93721, p-value = 0.5529

The diagnostics look pretty good with this model, but let’s see if they look better with log transformed values.

m_water_q_appl_shr_log <- df_water_shr %>% 
  lm(TotalConc_Water_log ~ DailyAvgNetFlow + TotalApplStd, data = .)

Plot the diagnostics

df_water_shr %>% plot_model_diag(
  TotalConc_Water_log, 
  "ng/L", 
  m_water_q_appl_shr_log, 
  trans = "log", 
  bins = 8
)

shapiro.test(m_water_q_appl_shr_log$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  m_water_q_appl_shr_log$residuals
## W = 0.96314, p-value = 0.8305

The diagnostics look better with this model, so we’ll take a closer look at it.

summary(m_water_q_appl_shr_log)
## 
## Call:
## lm(formula = TotalConc_Water_log ~ DailyAvgNetFlow + TotalApplStd, 
##     data = .)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.50111 -0.08524 -0.02129  0.15153  0.43396 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)   
## (Intercept)     2.334e+00  5.277e-01   4.423  0.00446 **
## DailyAvgNetFlow 1.632e-04  2.827e-05   5.772  0.00118 **
## TotalApplStd    4.076e-02  1.416e-02   2.878  0.02815 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3062 on 6 degrees of freedom
## Multiple R-squared:  0.879,  Adjusted R-squared:  0.8387 
## F-statistic:  21.8 on 2 and 6 DF,  p-value: 0.00177
plt_m_water_q_shr <- visreg(m_water_q_appl_shr_log, xvar = "DailyAvgNetFlow", gg = TRUE) + 
  theme_bw() + 
  geom_point(size = 1)

plt_m_water_appl_shr <- visreg(m_water_q_appl_shr_log, xvar = "TotalApplStd", gg = TRUE) + 
  theme_bw() + 
  geom_point(size = 1)

plt_m_water_shr <- plt_m_water_q_shr + plt_m_water_appl_shr
plt_m_water_shr

Total pesticide concentrations increase significantly with both flow and monthly pesticide application at the SHR station; however, the relationship with monthly pesticide application is driven by two values with the highest application.

Prior month application

Let’s see if the model is a better fit when we use the prior month’s application amount.

m_water_q_appl_pr_shr <- df_water_shr %>%
  lm(TotalConc_Water ~ DailyAvgNetFlow + TotalApplStd_lag1, data = .)

Plot the diagnostics

df_water_shr %>% plot_model_diag(
  TotalConc_Water, 
  "ng/L", 
  m_water_q_appl_pr_shr, 
  bins = 8
)

shapiro.test(m_water_q_appl_pr_shr$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  m_water_q_appl_pr_shr$residuals
## W = 0.98793, p-value = 0.9926

The model diagnostics look really good with the original values, so we’ll take a closer look at this model.

summary(m_water_q_appl_pr_shr)
## 
## Call:
## lm(formula = TotalConc_Water ~ DailyAvgNetFlow + TotalApplStd_lag1, 
##     data = .)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -126.095  -49.937    1.512   41.715  126.284 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)   
## (Intercept)       -3.270e+02  1.442e+02  -2.267  0.06395 . 
## DailyAvgNetFlow    3.073e-02  7.887e-03   3.897  0.00802 **
## TotalApplStd_lag1  5.716e+00  1.937e+00   2.950  0.02561 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 84.09 on 6 degrees of freedom
## Multiple R-squared:  0.8292, Adjusted R-squared:  0.7723 
## F-statistic: 14.57 on 2 and 6 DF,  p-value: 0.004982
plt_m_water_q_pr_shr <- visreg(m_water_q_appl_pr_shr, xvar = "DailyAvgNetFlow", gg = TRUE) + 
  theme_bw() + 
  geom_point(size = 1)

plt_m_water_appl_pr_shr <- visreg(m_water_q_appl_pr_shr, xvar = "TotalApplStd_lag1", gg = TRUE) +
  theme_bw() + 
  geom_point(size = 1)

plt_m_water_pr_shr <- plt_m_water_q_pr_shr + plt_m_water_appl_pr_shr
plt_m_water_pr_shr

Again, total pesticide concentrations increase significantly with flow and monthly pesticide application at the SHR station using the prior month’s application. The relationship with monthly pesticide application isn’t as susceptible to leverage by values with the highest application.

Model selection

Let’s take a look at AIC and BIC values to see which model is better.

AIC(m_water_q_appl_shr_log, m_water_q_appl_pr_shr)
##                        df        AIC
## m_water_q_appl_shr_log  4   8.586709
## m_water_q_appl_pr_shr   4 109.665709
BIC(m_water_q_appl_shr_log, m_water_q_appl_pr_shr)
##                        df        BIC
## m_water_q_appl_shr_log  4   9.375607
## m_water_q_appl_pr_shr   4 110.454607

Both AIC and BIC strongly prefer the model with the current month’s pesticide application. However, both models aren’t using the same transformation for the response variable. I wonder if this changes if we use log-transformed values for the model with prior month’s application.

m_water_q_appl_pr_shr_log <- df_water_shr %>% 
  lm(TotalConc_Water_log ~ DailyAvgNetFlow + TotalApplStd_lag1, data = .)

Plot the diagnostics

df_water_shr %>% plot_model_diag(
  TotalConc_Water_log, 
  "ng/L",
  m_water_q_appl_pr_shr_log, 
  trans = "log", 
  bins = 8
)

shapiro.test(m_water_q_appl_pr_shr_log$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  m_water_q_appl_pr_shr_log$residuals
## W = 0.93963, p-value = 0.5781

The model diagnostics aren’t as good with the log-transformed values. We’ll take a closer look at this model to see how it compares.

summary(m_water_q_appl_pr_shr_log)
## 
## Call:
## lm(formula = TotalConc_Water_log ~ DailyAvgNetFlow + TotalApplStd_lag1, 
##     data = .)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.40074 -0.05306 -0.03911  0.12387  0.26372 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       2.472e+00  4.053e-01   6.100 0.000885 ***
## DailyAvgNetFlow   1.504e-04  2.216e-05   6.786 0.000501 ***
## TotalApplStd_lag1 2.308e-02  5.443e-03   4.241 0.005435 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2362 on 6 degrees of freedom
## Multiple R-squared:  0.928,  Adjusted R-squared:  0.904 
## F-statistic: 38.65 on 2 and 6 DF,  p-value: 0.0003736
plt_m_water_q_pr_shr2 <- visreg(m_water_q_appl_pr_shr_log, xvar = "DailyAvgNetFlow", gg = TRUE) + 
  theme_bw() + 
  geom_point(size = 1)

plt_m_water_appl_pr_shr2 <- visreg(m_water_q_appl_pr_shr_log, xvar = "TotalApplStd_lag1", gg = TRUE) + 
  theme_bw() + 
  geom_point(size = 1)

plt_m_water_pr_shr2 <- plt_m_water_q_pr_shr2 + plt_m_water_appl_pr_shr2

The results are generally the same. Let’s take a look at AIC and BIC values using both models with log-transformed values.

AIC(m_water_q_appl_shr_log, m_water_q_appl_pr_shr_log)
##                           df      AIC
## m_water_q_appl_shr_log     4 8.586709
## m_water_q_appl_pr_shr_log  4 3.920041
BIC(m_water_q_appl_shr_log, m_water_q_appl_pr_shr_log)
##                           df      BIC
## m_water_q_appl_shr_log     4 9.375607
## m_water_q_appl_pr_shr_log  4 4.708940

Both AIC and BIC now prefer the model with the prior month’s pesticide application, but the difference isn’t as large. Since the diagnostics looked better for the model with the prior month’s pesticide application using the original values, let’s use the prior month model with original values for the model selection procedure. In this case, the best model is the one with the current month’s pesticide application.

STTD

Next, we’ll look at STTD

df_water_sttd <- df_water %>% filter(Station == "STTD")

m_water_q_appl_sttd <- df_water_sttd %>% 
  lm(TotalConc_Water ~ DailyAvgNetFlow + TotalApplStd, data = .)

Plot the diagnostics

df_water_sttd %>% plot_model_diag(
  TotalConc_Water, 
  "ng/L", 
  m_water_q_appl_sttd, 
  bins = 8
)

shapiro.test(m_water_q_appl_sttd$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  m_water_q_appl_sttd$residuals
## W = 0.95545, p-value = 0.7789

The diagnostics look pretty good with the original values, so let’s take a closer look at this model.

summary(m_water_q_appl_sttd)
## 
## Call:
## lm(formula = TotalConc_Water ~ DailyAvgNetFlow + TotalApplStd, 
##     data = .)
## 
## Residuals:
##       1       2       3       4       5       6       7 
##  -10.56   11.26  143.49 -149.01  226.58  -11.64 -210.12 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     613.1212   121.4666   5.048 0.007243 ** 
## DailyAvgNetFlow   2.6242     0.2426  10.816 0.000415 ***
## TotalApplStd     -4.9151     6.9025  -0.712 0.515747    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 186.2 on 4 degrees of freedom
## Multiple R-squared:   0.97,  Adjusted R-squared:  0.9549 
## F-statistic: 64.59 on 2 and 4 DF,  p-value: 0.000902
plt_m_water_q_sttd <- visreg(m_water_q_appl_sttd, xvar = "DailyAvgNetFlow", gg = TRUE) + 
  theme_bw() + 
  geom_point(size = 1)

plt_m_water_appl_sttd <- visreg(m_water_q_appl_sttd, xvar = "TotalApplStd", gg = TRUE) + 
  theme_bw() + 
  geom_point(size = 1)

plt_m_water_sttd <- plt_m_water_q_sttd + plt_m_water_appl_sttd
plt_m_water_sttd

Total pesticide concentrations increase significantly with flow at the STTD station, but there are two values with flows of about 600 cfs that have a lot leverage on the model. The relationship between total pesticide concentration and monthly pesticide application wasn’t significant at STTD.

Prior month application

Let’s see if the model is a better fit when we use the prior month’s application amount.

m_water_q_appl_pr_sttd <- df_water_sttd %>% 
  lm(TotalConc_Water ~ DailyAvgNetFlow + TotalApplStd_lag1, data = .)

Plot the diagnostics

df_water_sttd %>% plot_model_diag(
  TotalConc_Water, 
  "ng/L", 
  m_water_q_appl_pr_sttd, 
  bins = 8
)

shapiro.test(m_water_q_appl_pr_sttd$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  m_water_q_appl_pr_sttd$residuals
## W = 0.95422, p-value = 0.7679

The model diagnostics look really good with the original values, so we’ll take a closer look at this model.

summary(m_water_q_appl_pr_sttd)
## 
## Call:
## lm(formula = TotalConc_Water ~ DailyAvgNetFlow + TotalApplStd_lag1, 
##     data = .)
## 
## Residuals:
##        1        2        3        4        5        6        7 
##   -9.275   11.173  144.338 -149.681  224.672  -10.709 -210.517 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       609.5491   116.9936   5.210 0.006471 ** 
## DailyAvgNetFlow     2.6653     0.2349  11.348 0.000344 ***
## TotalApplStd_lag1  -2.0360     2.8321  -0.719 0.511962    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 186 on 4 degrees of freedom
## Multiple R-squared:   0.97,  Adjusted R-squared:  0.955 
## F-statistic: 64.74 on 2 and 4 DF,  p-value: 0.0008981
plt_m_water_q_pr_sttd <- visreg(m_water_q_appl_pr_sttd, xvar = "DailyAvgNetFlow", gg = TRUE) + 
  theme_bw() + 
  geom_point(size = 1)

plt_m_water_appl_pr_sttd <- visreg(m_water_q_appl_pr_sttd, xvar = "TotalApplStd_lag1", gg = TRUE) + 
  theme_bw() + 
  geom_point(size = 1)

plt_m_water_pr_sttd <- plt_m_water_q_pr_sttd + plt_m_water_appl_pr_sttd
plt_m_water_pr_sttd

The results are the same with this model.

Model selection

Let’s take a look at AIC and BIC values to see which model is better.

AIC(m_water_q_appl_sttd, m_water_q_appl_pr_sttd)
##                        df      AIC
## m_water_q_appl_sttd     4 97.12213
## m_water_q_appl_pr_sttd  4 97.10699
BIC(m_water_q_appl_sttd, m_water_q_appl_pr_sttd)
##                        df      BIC
## m_water_q_appl_sttd     4 96.90577
## m_water_q_appl_pr_sttd  4 96.89063

The AIC and BIC values are nearly identical between models with both metrics giving a slight edge to the model with the prior month’s pesticide application. However, monthly pesticide application isn’t significant in either model.

Station comparison

Next, let’s compare total pesticide concentrations in water between the two stations, SHR and STTD. Since samples were only collected at both stations in 2019, we can run a simple t-test to compare them. The histogram of all water concentrations above showed that they are not normally-distributed. Let’s take a look at the natural log transformed values to see if we can use these for the t-test.

df_water %>% 
  ggplot(aes(x = TotalConc_Water_log)) +
  geom_histogram(bins = 8) +
  xlab("Total pesticide concentration in water log(ng/L)")

df_water %>% 
  ggplot(aes(x = Station, y = TotalConc_Water_log)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(width = 0.3) +
  ylab("Total pesticide concentration in water log(ng/L)")

df_water %>% 
  ggplot(aes(sample = TotalConc_Water_log)) + 
  labs(
    x = "Normal Quantiles", 
    y = "Total pesticide concentration in water log(ng/L)"
  ) + 
  stat_qq() + 
  stat_qq_line(linewidth = 1, color = 'red') + 
  theme_bw()

shapiro.test(df_water$TotalConc_Water_log)
## 
##  Shapiro-Wilk normality test
## 
## data:  df_water$TotalConc_Water_log
## W = 0.93391, p-value = 0.2809

The natural log transformed values seem to be more normally distributed and have relatively equal variances between SHR and STTD. Let’s run Welch’s two-sample t-test to compare the means of total pesticide concentration in water between SHR and STTD.

df_water %>% t.test(TotalConc_Water_log ~ Station, data = .)
## 
##  Welch Two Sample t-test
## 
## data:  TotalConc_Water_log by Station
## t = -2.1709, df = 11.742, p-value = 0.05119
## alternative hypothesis: true difference in means between group SHR and group STTD is not equal to 0
## 95 percent confidence interval:
##  -1.858996839  0.005652187
## sample estimates:
##  mean in group SHR mean in group STTD 
##           5.545113           6.471786

The mean total pesticide concentration in water was marginally significantly higher (p = 0.052) at STTD than SHR during 2019.

Overall Conclusions

Zooplankton

Total pesticide concentrations in Zooplankton:

  • Total pesticide concentrations decreased significantly with flow (p = 0.00013) and monthly pesticide application (p = 0.00020) at the SHR station; however, the relationship with monthly pesticide application was driven by four values with the highest application.
## 
## Call:
## lm(formula = TotalConc_Zoop_log ~ DailyAvgNetFlow + TotalApplStd, 
##     data = .)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.59245 -0.12635 -0.02903  0.17388  0.49120 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      7.364e+00  3.196e-01  23.041 8.26e-15 ***
## DailyAvgNetFlow -8.287e-05  1.712e-05  -4.841 0.000131 ***
## TotalApplStd    -4.318e-02  9.271e-03  -4.657 0.000196 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2726 on 18 degrees of freedom
## Multiple R-squared:  0.7004, Adjusted R-squared:  0.6671 
## F-statistic: 21.04 on 2 and 18 DF,  p-value: 1.944e-05

  • Total pesticide concentrations increased significantly with flow at the STTD station (p = 0.00047), but there was no relationship between concentration and monthly pesticide application (p = 0.78).
## 
## Call:
## lm(formula = TotalConc_Zoop_sqrt ~ DailyAvgNetFlow + TotalApplStd, 
##     data = .)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.1552  -3.4770  -0.8651   4.8538  12.0845 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     13.760969   1.920517   7.165 1.58e-06 ***
## DailyAvgNetFlow  0.025096   0.005822   4.311 0.000474 ***
## TotalApplStd    -0.021261   0.074230  -0.286 0.778020    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.754 on 17 degrees of freedom
## Multiple R-squared:  0.5549, Adjusted R-squared:  0.5025 
## F-statistic:  10.6 on 2 and 17 DF,  p-value: 0.001028

  • Model selection using AIC and BIC preferred the models using the current month’s pesticide application amounts for both SHR and STTD.
  • Total pesticide concentrations didn’t differ significantly between SHR and STTD when controlling for the year when the samples were collected; however, SHR had a higher median concentration than STTD, and the p-value for their pairwise difference was 0.083.
## Anova Table (Type II tests)
## 
## Response: TotalConc_Zoop_log
##            Sum Sq Df F value  Pr(>F)  
## Station    1.8028  1  3.1690 0.08326 .
## Year       3.8276  2  3.3641 0.04546 *
## Residuals 21.0492 37                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##  contrast   estimate    SE df t.ratio p.value
##  SHR - STTD    0.424 0.238 37   1.780  0.0833
## 
## Results are averaged over the levels of: Year

  • Zooplankton concentrations were marginally significantly higher (p = 0.053) in 2019 than 2017 across both stations.
##  contrast            estimate    SE df t.ratio p.value
##  Year2017 - Year2018  -0.6700 0.304 37  -2.201  0.0842
##  Year2017 - Year2019  -0.7474 0.309 37  -2.420  0.0525
##  Year2018 - Year2019  -0.0774 0.274 37  -0.283  0.9570
## 
## Results are averaged over the levels of: Station 
## P value adjustment: tukey method for comparing a family of 3 estimates

Water

Total pesticide concentrations in Water:

  • Total pesticide concentrations increased significantly with both flow (p = 0.0012) and monthly pesticide application (p = 0.028) at the SHR station; however, the relationship with monthly pesticide application is driven by two values with the highest application.
## 
## Call:
## lm(formula = TotalConc_Water_log ~ DailyAvgNetFlow + TotalApplStd, 
##     data = .)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.50111 -0.08524 -0.02129  0.15153  0.43396 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)   
## (Intercept)     2.334e+00  5.277e-01   4.423  0.00446 **
## DailyAvgNetFlow 1.632e-04  2.827e-05   5.772  0.00118 **
## TotalApplStd    4.076e-02  1.416e-02   2.878  0.02815 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3062 on 6 degrees of freedom
## Multiple R-squared:  0.879,  Adjusted R-squared:  0.8387 
## F-statistic:  21.8 on 2 and 6 DF,  p-value: 0.00177

  • Total pesticide concentrations also increased significantly with flow (p = 0.00042) at the STTD station, but there were two values with flows of about 600 cfs that had a lot leverage on the model. The relationship between total pesticide concentration and monthly pesticide application wasn’t significant at STTD (p = 0.52).
## 
## Call:
## lm(formula = TotalConc_Water ~ DailyAvgNetFlow + TotalApplStd, 
##     data = .)
## 
## Residuals:
##       1       2       3       4       5       6       7 
##  -10.56   11.26  143.49 -149.01  226.58  -11.64 -210.12 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     613.1212   121.4666   5.048 0.007243 ** 
## DailyAvgNetFlow   2.6242     0.2426  10.816 0.000415 ***
## TotalApplStd     -4.9151     6.9025  -0.712 0.515747    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 186.2 on 4 degrees of freedom
## Multiple R-squared:   0.97,  Adjusted R-squared:  0.9549 
## F-statistic: 64.59 on 2 and 4 DF,  p-value: 0.000902

  • Both models had a fairly low sample size to work with at 9 and 7 for SHR and STTD, respectively.
  • The mean total pesticide concentration in water was marginally significantly higher (p = 0.052) at STTD than SHR during 2019.
## 
##  Welch Two Sample t-test
## 
## data:  TotalConc_Water_log by Station
## t = -2.1709, df = 11.742, p-value = 0.05119
## alternative hypothesis: true difference in means between group SHR and group STTD is not equal to 0
## 95 percent confidence interval:
##  -1.858996839  0.005652187
## sample estimates:
##  mean in group SHR mean in group STTD 
##           5.545113           6.471786