Purpose

Create exploratory figures and run statistical analyses for the total pesticide concentration data in water and zooplankton collected from the Sacramento River (SHR) and Toe Drain (STTD) in the summer-fall periods in 2017, 2018 and 2019. These analyses are included in the NDFS contaminants manuscript.

Global code and functions

# Load packages
library(tidyverse)
library(rlang)
library(knitr)
library(patchwork)
library(car)
library(emmeans)
library(visreg)
library(here)
library(conflicted)

# Declare package conflict preferences 
conflicts_prefer(dplyr::filter(), dplyr::lag())

# Check if we are in the correct working directory
i_am("manuscript_contam/notebooks/explore_contam_conc_water_zoop.Rmd")

## here() starts at C:/Repositories/04_IEP_Org/ND-FASTR

Display current versions of R and packages used for this analysis:

devtools::session_info()

## ─ Session info ───────────────────────────────────────────────────────────────
##  setting  value
##  version  R version 4.2.3 (2023-03-15 ucrt)
##  os       Windows 10 x64 (build 19044)
##  system   x86_64, mingw32
##  ui       RTerm
##  language (EN)
##  collate  English_United States.utf8
##  ctype    English_United States.utf8
##  tz       America/Los_Angeles
##  date     2023-09-15
##  pandoc   3.1.1 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
## 
## ─ Packages ───────────────────────────────────────────────────────────────────
##  package      * version  date (UTC) lib source
##  abind          1.4-5    2016-07-21 [1] CRAN (R 4.2.0)
##  bslib          0.4.2    2022-12-16 [1] CRAN (R 4.2.2)
##  cachem         1.0.8    2023-05-01 [1] CRAN (R 4.2.3)
##  callr          3.7.3    2022-11-02 [1] CRAN (R 4.2.2)
##  car          * 3.1-2    2023-03-30 [1] CRAN (R 4.2.3)
##  carData      * 3.0-5    2022-01-06 [1] CRAN (R 4.2.1)
##  cli            3.6.1    2023-03-23 [1] CRAN (R 4.2.3)
##  coda           0.19-4   2020-09-30 [1] CRAN (R 4.2.1)
##  codetools      0.2-19   2023-02-01 [2] CRAN (R 4.2.3)
##  colorspace     2.1-0    2023-01-23 [1] CRAN (R 4.2.2)
##  conflicted   * 1.2.0    2023-02-01 [1] CRAN (R 4.2.2)
##  crayon         1.5.2    2022-09-29 [1] CRAN (R 4.2.1)
##  devtools       2.4.5    2022-10-11 [1] CRAN (R 4.2.1)
##  digest         0.6.31   2022-12-11 [1] CRAN (R 4.2.2)
##  dplyr        * 1.1.2    2023-04-20 [1] CRAN (R 4.2.3)
##  ellipsis       0.3.2    2021-04-29 [1] CRAN (R 4.2.1)
##  emmeans      * 1.8.5    2023-03-08 [1] CRAN (R 4.2.2)
##  estimability   1.4.1    2022-08-05 [1] CRAN (R 4.2.1)
##  evaluate       0.21     2023-05-05 [1] CRAN (R 4.2.3)
##  fansi          1.0.4    2023-01-22 [1] CRAN (R 4.2.2)
##  fastmap        1.1.1    2023-02-24 [1] CRAN (R 4.2.2)
##  forcats      * 1.0.0    2023-01-29 [1] CRAN (R 4.2.2)
##  fs             1.6.2    2023-04-25 [1] CRAN (R 4.2.3)
##  generics       0.1.3    2022-07-05 [1] CRAN (R 4.2.1)
##  ggplot2      * 3.4.2    2023-04-03 [1] CRAN (R 4.2.3)
##  glue           1.6.2    2022-02-24 [1] CRAN (R 4.2.1)
##  gtable         0.3.3    2023-03-21 [1] CRAN (R 4.2.3)
##  here         * 1.0.1    2020-12-13 [1] CRAN (R 4.2.1)
##  hms            1.1.3    2023-03-21 [1] CRAN (R 4.2.3)
##  htmltools      0.5.5    2023-03-23 [1] CRAN (R 4.2.3)
##  htmlwidgets    1.6.2    2023-03-17 [1] CRAN (R 4.2.3)
##  httpuv         1.6.9    2023-02-14 [1] CRAN (R 4.2.2)
##  jquerylib      0.1.4    2021-04-26 [1] CRAN (R 4.2.1)
##  jsonlite       1.8.4    2022-12-06 [1] CRAN (R 4.2.2)
##  knitr        * 1.42     2023-01-25 [1] CRAN (R 4.2.2)
##  later          1.3.0    2021-08-18 [1] CRAN (R 4.2.1)
##  lattice        0.21-8   2023-04-05 [1] CRAN (R 4.2.3)
##  lifecycle      1.0.3    2022-10-07 [1] CRAN (R 4.2.1)
##  lubridate    * 1.9.2    2023-02-10 [1] CRAN (R 4.2.2)
##  magrittr       2.0.3    2022-03-30 [1] CRAN (R 4.2.1)
##  MASS           7.3-58.2 2023-01-23 [2] CRAN (R 4.2.3)
##  Matrix         1.5-4    2023-04-04 [1] CRAN (R 4.2.3)
##  memoise        2.0.1    2021-11-26 [1] CRAN (R 4.2.1)
##  mime           0.12     2021-09-28 [1] CRAN (R 4.2.0)
##  miniUI         0.1.1.1  2018-05-18 [1] CRAN (R 4.2.1)
##  multcomp       1.4-23   2023-03-09 [1] CRAN (R 4.2.2)
##  munsell        0.5.0    2018-06-12 [1] CRAN (R 4.2.1)
##  mvtnorm        1.1-3    2021-10-08 [1] CRAN (R 4.2.0)
##  patchwork    * 1.1.2    2022-08-19 [1] CRAN (R 4.2.1)
##  pillar         1.9.0    2023-03-22 [1] CRAN (R 4.2.3)
##  pkgbuild       1.4.0    2022-11-27 [1] CRAN (R 4.2.2)
##  pkgconfig      2.0.3    2019-09-22 [1] CRAN (R 4.2.1)
##  pkgload        1.3.2    2022-11-16 [1] CRAN (R 4.2.2)
##  prettyunits    1.1.1    2020-01-24 [1] CRAN (R 4.2.1)
##  processx       3.8.1    2023-04-18 [1] CRAN (R 4.2.3)
##  profvis        0.3.7    2020-11-02 [1] CRAN (R 4.2.1)
##  promises       1.2.0.1  2021-02-11 [1] CRAN (R 4.2.1)
##  ps             1.7.5    2023-04-18 [1] CRAN (R 4.2.3)
##  purrr        * 1.0.1    2023-01-10 [1] CRAN (R 4.2.2)
##  R6             2.5.1    2021-08-19 [1] CRAN (R 4.2.1)
##  Rcpp           1.0.10   2023-01-22 [1] CRAN (R 4.2.2)
##  readr        * 2.1.4    2023-02-10 [1] CRAN (R 4.2.2)
##  remotes        2.4.2    2021-11-30 [1] CRAN (R 4.2.1)
##  rlang        * 1.1.1    2023-04-28 [1] CRAN (R 4.2.3)
##  rmarkdown      2.21     2023-03-26 [1] CRAN (R 4.2.3)
##  rprojroot      2.0.3    2022-04-02 [1] CRAN (R 4.2.1)
##  rstudioapi     0.14     2022-08-22 [1] CRAN (R 4.2.1)
##  sandwich       3.0-2    2022-06-15 [1] CRAN (R 4.2.1)
##  sass           0.4.6    2023-05-03 [1] CRAN (R 4.2.3)
##  scales         1.2.1    2022-08-20 [1] CRAN (R 4.2.1)
##  sessioninfo    1.2.2    2021-12-06 [1] CRAN (R 4.2.1)
##  shiny          1.7.4    2022-12-15 [1] CRAN (R 4.2.2)
##  stringi        1.7.12   2023-01-11 [1] CRAN (R 4.2.2)
##  stringr      * 1.5.0    2022-12-02 [1] CRAN (R 4.2.2)
##  survival       3.5-5    2023-03-12 [1] CRAN (R 4.2.3)
##  TH.data        1.1-2    2023-04-17 [1] CRAN (R 4.2.3)
##  tibble       * 3.2.1    2023-03-20 [1] CRAN (R 4.2.3)
##  tidyr        * 1.3.0    2023-01-24 [1] CRAN (R 4.2.2)
##  tidyselect     1.2.0    2022-10-10 [1] CRAN (R 4.2.1)
##  tidyverse    * 2.0.0    2023-02-22 [1] CRAN (R 4.2.2)
##  timechange     0.2.0    2023-01-11 [1] CRAN (R 4.2.2)
##  tzdb           0.4.0    2023-05-12 [1] CRAN (R 4.2.3)
##  urlchecker     1.0.1    2021-11-30 [1] CRAN (R 4.2.1)
##  usethis        2.1.6    2022-05-25 [1] CRAN (R 4.2.1)
##  utf8           1.2.3    2023-01-31 [1] CRAN (R 4.2.2)
##  vctrs          0.6.2    2023-04-19 [1] CRAN (R 4.2.3)
##  visreg       * 2.7.0    2020-06-04 [1] CRAN (R 4.2.1)
##  withr          2.5.0    2022-03-03 [1] CRAN (R 4.2.1)
##  xfun           0.39     2023-04-20 [1] CRAN (R 4.2.3)
##  xtable         1.8-4    2019-04-21 [1] CRAN (R 4.2.1)
##  yaml           2.3.7    2023-01-23 [1] CRAN (R 4.2.2)
##  zoo            1.8-12   2023-04-13 [1] CRAN (R 4.2.3)
## 
##  [1] C:/R/win-library/4.2
##  [2] C:/Program Files/R/R-4.2.3/library
## 
## ──────────────────────────────────────────────────────────────────────────────

Create function to plot model diagnostics:

# Create model diagnostic plots to check assumptions
plot_model_diag <- function(df_data, param_var, unit_label, model, trans = c("none", "log", "sqrt"), ...) {
  trans <- match.arg(trans)
  
  df_data <- df_data %>%
    mutate(
      Residuals = resid(model),
      Fitted = predict(model)
    )
  
  param_name <- as_name(ensym(param_var))
  
  if (trans == "log") {
    unit_label <- paste0("log[", unit_label, "]")
  } else if (trans == "sqrt") {
    unit_label <- paste0("sqrt[", unit_label, "]")
  }
  
  plt_hist <- ggplot(df_data, aes(x = Residuals)) +
    geom_histogram(...) +
    labs(
      title = "Residual Histogram", 
      x = "Residuals"
    ) +
    theme_bw()
  
  plt_qq <- ggplot(df_data, aes(sample = Residuals)) + 
    labs(
      title = "Residual Probability Plot", 
      x = "Normal Quantiles", 
      y = "Residuals"
    ) + 
    stat_qq() + 
    stat_qq_line(linewidth = 1, color = 'red') + 
    theme_bw()
  
  plt_res_fit <- ggplot(df_data, aes(x = Fitted, y = Residuals)) +
    geom_point() +
    labs(
      title = "Residuals vs. Fitted Values", 
      x = (paste0("Predicted ", param_name, " (", unit_label, ")")),
      y = paste0("Residuals (", unit_label, ")")
    ) +
    theme_bw()
  
  plt_obs_fit <- ggplot(df_data, aes(x = Fitted, y = {{ param_var}})) +
    geom_point() +
    geom_abline(slope = 1, intercept = 0, color = "red") +
    labs(
      title = "Observed vs. Fitted Values", 
      x = paste0("Predicted ", param_name, " (", unit_label, ")"),
      y = paste0("Observed ", param_name, " (", unit_label, ")")
    ) +
    theme_bw()
  
  (plt_hist + plt_qq) / (plt_res_fit + plt_obs_fit)
}

Import and Prepare Data

# Define root directory for pesticide data:
fp_pest <- here("manuscript_contam/data/processed")

# Total Pesticide Concentration data:
df_conc_all <- read_rds(file.path(fp_pest, "contam_conc_water_zoop_2017-2019.rds"))

# Pesticide application data:
df_appl <- read_rds(file.path(fp_pest, "pesticide_use_daily_tot_2017-2020.rds"))

# Daily average flow data:
df_davg_flow <- read_rds(here("Water_Quality/Data_Processed/LIS_SR_DailyAvgFlow_2013-2022.rds"))

Prepare the pesticide application data to be joined to the concentration data by calculating monthly totals and standardizing to area of the watershed.

df_appl_c <- df_appl %>% 
  # Calculate monthly totals by region
  summarize(TotalAppl = sum(TotalApplication), .by = c(Region, Year, Month)) %>% 
  mutate(
    # Standardize total application to area (in square miles) of the watershed 
    TotalApplStd = if_else(
      Region == "Sacramento River", 
      TotalAppl/22526.79948,
      TotalAppl/4217.503529
    ),
    # Define station that each region is assigned to
    Station = case_match(
      Region,
      "Toe Drain" ~ "STTD", 
      "Sacramento River" ~ "SHR"
    )
  ) %>% 
  # Lag standardized total application data by one month
  arrange(Region, Year, Month) %>% 
  group_by(Region) %>% 
  mutate(TotalApplStd_lag1 = lag(TotalApplStd, n = 1)) %>% 
  ungroup() %>% 
  # Clean up dataframe
  select(Station, Year, Month, TotalApplStd, TotalApplStd_lag1)

Prepare the flow data to be joined to the concentration data using SR at Freeport flow data for SHR and LIS flow data for STTD. Use Daily Average Net flows for both.

df_davg_flow_c <- df_davg_flow %>% 
  select(-LIS_DailyAvgInstFlow) %>% 
  pivot_longer(
    cols = ends_with("NetFlow"),
    names_to = "Station",
    values_to = "DailyAvgNetFlow"
  ) %>% 
  mutate(
    Station = case_match(
      Station,
      "SR_DailyAvgNetFlow" ~ "SHR",
      "LIS_DailyAvgNetFlow" ~ "STTD"
    )
  )

Join pesticide concentration, application, and flow data together.

df_conc_all_c1 <- df_conc_all %>% 
  mutate(
    Year = year(Date),
    Month = month(Date, label = TRUE)
  ) %>% 
  left_join(df_appl_c, by = join_by(Station, Year, Month)) %>% 
  left_join(df_davg_flow_c, by = join_by(Date, Station))

# Look at all data
df_conc_all_c1 %>% 
  select(
    Station,
    Date,
    Year,
    starts_with("TotalConc"),
    DailyAvgNetFlow,
    starts_with("TotalAppl")
  ) %>% 
  kable()

Station	Date	Year	TotalConc_Zoop	TotalConc_Water	DailyAvgNetFlow	TotalApplStd	TotalApplStd_lag1
SHR	2017-08-29	2017	227.19872	389.42531	21200.00	3.379151	18.641841
SHR	2017-09-12	2017	174.19088	396.64632	22100.00	1.440834	3.379151
SHR	2017-09-19	2017	137.99606	147.01557	21500.00	1.440834	3.379151
SHR	2017-10-03	2017	316.69620	80.32957	19400.00	2.573115	1.440834
SHR	2017-10-17	2017	427.37944	47.92240	11700.00	2.573115	1.440834
SHR	2017-11-01	2017	265.57107	97.29594	10400.00	2.758317	2.573115
SHR	2018-07-12	2018	218.84885	188.76610	15700.00	17.692933	37.851786
SHR	2018-07-25	2018	164.99273	262.27653	16500.00	17.692933	37.851786
SHR	2018-08-09	2018	413.08422	668.97693	20200.00	3.639683	17.692933
SHR	2018-08-23	2018	302.75363	564.30313	17500.00	3.639683	17.692933
SHR	2018-09-06	2018	324.04838	317.93755	18000.00	1.435832	3.639683
SHR	2018-09-20	2018	270.17804	151.46644	16500.00	1.435832	3.639683
SHR	2018-10-11	2018	651.37113	98.85206	11600.00	2.469192	1.435832
SHR	2018-10-26	2018	0.00000	75.62678	9060.00	2.469192	1.435832
SHR	2018-11-09	2018	359.01012	198.17201	7850.00	2.253170	2.469192
SHR	2019-05-21	2019	127.90884	445.48078	43200.00	15.849626	5.706071
SHR	2019-06-04	2019	103.08011	215.69136	44500.00	38.770625	15.849626
SHR	2019-06-18	2019	419.33946	505.85984	23500.00	38.770625	15.849626
SHR	2019-07-02	2019	129.36941	507.56875	19300.00	19.143787	38.770625
SHR	2019-07-16	2019	163.66643	374.25063	17300.00	19.143787	38.770625
SHR	2019-08-05	2019	322.10233	337.91232	19700.00	3.012142	19.143787
SHR	2019-08-19	2019	386.21467	554.08375	21000.00	3.012142	19.143787
SHR	2019-09-05	2019	219.43424	423.41095	22500.00	1.578017	3.012142
SHR	2019-09-16	2019	246.22101	331.93137	21400.00	1.578017	3.012142
SHR	2019-09-30	2019	418.09428	148.04025	19000.00	1.578017	3.012142
SHR	2019-10-14	2019	445.06218	70.76964	12600.00	1.125018	1.578017
SHR	2019-10-28	2019	690.50734	90.12275	11500.00	1.125018	1.578017
SHR	2019-11-13	2019	378.31172	115.52704	10100.00	2.724691	1.125018
SHR	2019-11-25	2019	405.88485	186.34375	11000.00	2.724691	1.125018
SHR	2019-12-10	2019	323.05106	558.23809	23700.00	1.373379	2.724691
SHR	2019-12-27	2019	326.21796	253.96279	19500.00	1.373379	2.724691
STTD	2017-08-30	2017	75.15536		7.64	21.114730	69.159300
STTD	2017-09-13	2017	26.92852		108.00	5.969721	21.114730
STTD	2017-09-20	2017	73.05093		-75.10	5.969721	21.114730
STTD	2017-10-04	2017	269.00064		-82.80	6.902682	5.969721
STTD	2017-10-19	2017	62.22037		-47.80	6.902682	5.969721
STTD	2017-11-02	2017	285.62279		-29.70	12.922356	6.902682
STTD	2018-07-11	2018	165.58977		-70.50	66.033924	162.231117
STTD	2018-07-26	2018	100.06627		-108.00	66.033924	162.231117
STTD	2018-08-08	2018	64.42045		-70.30	21.975841	66.033924
STTD	2018-08-22	2018	212.96740		-73.70	21.975841	66.033924
STTD	2018-08-30	2018	686.83756		305.00	21.975841	66.033924
STTD	2018-09-05	2018	1170.56494		337.00	4.200371	21.975841
STTD	2018-09-19	2018	797.15646		387.00	4.200371	21.975841
STTD	2018-10-10	2018	83.18350		-93.40	9.863104	4.200371
STTD	2018-10-25	2018	99.33193		-12.00	9.863104	4.200371
STTD	2018-11-08	2018	213.58742		-22.40	13.370382	9.863104
STTD	2019-05-20	2019	919.34468	362.10000	1270.00	85.276308	39.939688
STTD	2019-06-03	2019	903.77514	1467.63536	825.00	152.302865	85.276308
STTD	2019-06-17	2019	332.82856	899.79820	63.30	152.302865	85.276308
STTD	2019-07-01	2019	230.40000	607.28115		67.332801	152.302865
STTD	2019-07-15	2019	125.70000	532.99378		67.332801	152.302865
STTD	2019-08-07	2019	0.00000	347.24398	-43.50	28.719601	67.332801
STTD	2019-08-21	2019	89.00000	457.27127	-9.89	28.719601	67.332801
STTD	2019-09-04	2019	641.20000	2289.24197	598.00	7.454874	28.719601
STTD	2019-09-18	2019	548.00000	2093.83725	635.00	7.454874	28.719601
STTD	2019-10-02	2019	333.90000	766.80342	-19.60	4.366761	7.454874
STTD	2019-10-16	2019	181.70000	347.25295	-88.70	4.366761	7.454874
STTD	2019-10-28	2019	315.00000	233.26624	-56.50	4.366761	7.454874
STTD	2019-11-14	2019	406.10000	164.59963	-113.00	14.037269	4.366761
STTD	2019-11-26	2019	413.70000	193.53662	6.97	14.037269	4.366761
STTD	2019-12-09	2019	994.50000	1714.24731	300.00	5.704477	14.037269
STTD	2019-12-26	2019	373.20000	577.11950	220.00	5.704477	14.037269

It looks like all of the flow data was available for SHR, but STTD is missing a few flow values in July 2019. We’ll restrict the date ranges to July 1 - Oct 31 for each year since some of the data extends outside of the summer-fall period.

df_conc_all_c2 <- df_conc_all_c1 %>% 
  mutate(Month = as.numeric(Month)) %>%
  filter(Month %in% 7:10)

# Look at filtered data
df_conc_all_c2 %>% 
  select(
    Station,
    Date,
    Year,
    starts_with("TotalConc"),
    DailyAvgNetFlow,
    starts_with("TotalAppl")
  ) %>% 
  kable()

Station	Date	Year	TotalConc_Zoop	TotalConc_Water	DailyAvgNetFlow	TotalApplStd	TotalApplStd_lag1
SHR	2017-08-29	2017	227.19872	389.42531	21200.00	3.379151	18.641841
SHR	2017-09-12	2017	174.19088	396.64632	22100.00	1.440834	3.379151
SHR	2017-09-19	2017	137.99606	147.01557	21500.00	1.440834	3.379151
SHR	2017-10-03	2017	316.69620	80.32957	19400.00	2.573115	1.440834
SHR	2017-10-17	2017	427.37944	47.92240	11700.00	2.573115	1.440834
SHR	2018-07-12	2018	218.84885	188.76610	15700.00	17.692933	37.851786
SHR	2018-07-25	2018	164.99273	262.27653	16500.00	17.692933	37.851786
SHR	2018-08-09	2018	413.08422	668.97693	20200.00	3.639683	17.692933
SHR	2018-08-23	2018	302.75363	564.30313	17500.00	3.639683	17.692933
SHR	2018-09-06	2018	324.04838	317.93755	18000.00	1.435832	3.639683
SHR	2018-09-20	2018	270.17804	151.46644	16500.00	1.435832	3.639683
SHR	2018-10-11	2018	651.37113	98.85206	11600.00	2.469192	1.435832
SHR	2018-10-26	2018	0.00000	75.62678	9060.00	2.469192	1.435832
SHR	2019-07-02	2019	129.36941	507.56875	19300.00	19.143787	38.770625
SHR	2019-07-16	2019	163.66643	374.25063	17300.00	19.143787	38.770625
SHR	2019-08-05	2019	322.10233	337.91232	19700.00	3.012142	19.143787
SHR	2019-08-19	2019	386.21467	554.08375	21000.00	3.012142	19.143787
SHR	2019-09-05	2019	219.43424	423.41095	22500.00	1.578017	3.012142
SHR	2019-09-16	2019	246.22101	331.93137	21400.00	1.578017	3.012142
SHR	2019-09-30	2019	418.09428	148.04025	19000.00	1.578017	3.012142
SHR	2019-10-14	2019	445.06218	70.76964	12600.00	1.125018	1.578017
SHR	2019-10-28	2019	690.50734	90.12275	11500.00	1.125018	1.578017
STTD	2017-08-30	2017	75.15536		7.64	21.114730	69.159300
STTD	2017-09-13	2017	26.92852		108.00	5.969721	21.114730
STTD	2017-09-20	2017	73.05093		-75.10	5.969721	21.114730
STTD	2017-10-04	2017	269.00064		-82.80	6.902682	5.969721
STTD	2017-10-19	2017	62.22037		-47.80	6.902682	5.969721
STTD	2018-07-11	2018	165.58977		-70.50	66.033924	162.231117
STTD	2018-07-26	2018	100.06627		-108.00	66.033924	162.231117
STTD	2018-08-08	2018	64.42045		-70.30	21.975841	66.033924
STTD	2018-08-22	2018	212.96740		-73.70	21.975841	66.033924
STTD	2018-08-30	2018	686.83756		305.00	21.975841	66.033924
STTD	2018-09-05	2018	1170.56494		337.00	4.200371	21.975841
STTD	2018-09-19	2018	797.15646		387.00	4.200371	21.975841
STTD	2018-10-10	2018	83.18350		-93.40	9.863104	4.200371
STTD	2018-10-25	2018	99.33193		-12.00	9.863104	4.200371
STTD	2019-07-01	2019	230.40000	607.28115		67.332801	152.302865
STTD	2019-07-15	2019	125.70000	532.99378		67.332801	152.302865
STTD	2019-08-07	2019	0.00000	347.24398	-43.50	28.719601	67.332801
STTD	2019-08-21	2019	89.00000	457.27127	-9.89	28.719601	67.332801
STTD	2019-09-04	2019	641.20000	2289.24197	598.00	7.454874	28.719601
STTD	2019-09-18	2019	548.00000	2093.83725	635.00	7.454874	28.719601
STTD	2019-10-02	2019	333.90000	766.80342	-19.60	4.366761	7.454874
STTD	2019-10-16	2019	181.70000	347.25295	-88.70	4.366761	7.454874
STTD	2019-10-28	2019	315.00000	233.26624	-56.50	4.366761	7.454874

Zooplankton

# Prepare zooplankton concentration data for analysis
df_zoop <- df_conc_all_c2 %>% 
  select(-TotalConc_Water) %>% 
  drop_na() %>% 
  # Remove two samples with total concentrations equal to zero
  filter(TotalConc_Zoop > 0) %>% 
  mutate(
    # Add log-transformed and sqrt-transformed zooplankton concentrations
    TotalConc_Zoop_log = log(TotalConc_Zoop),
    TotalConc_Zoop_sqrt = sqrt(TotalConc_Zoop),
    # Convert year to character for categorical models
    Year = as.character(Year)
  )

Exploratory Plots

Before running the analyses, lets visualize the data.

Boxplots

First, we’ll look at a boxplot comparing total pesticide concentrations in zooplankton between stations.

df_zoop %>% 
  ggplot(aes(x = Station, y = TotalConc_Zoop)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(aes(color = Year), width = 0.3) +
  scale_color_viridis_d(option = "plasma", end = 0.8) +
  ylab("Total pesticide concentration in zooplankton (ng/g)")

SHR has a higher median concentration than STTD, but STTD has some higher values than SHR particularly in 2018. Let’s break apart the boxplot by station and year.

df_zoop %>% 
  ggplot(aes(x = Year, y = TotalConc_Zoop, fill = Year)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(width = 0.2) +
  facet_wrap(vars(Station)) +
  scale_fill_viridis_d(option = "plasma", end = 0.8) +
  ylab("Total pesticide concentration in zooplankton (ng/g)")

Overall, STTD in 2018 had the highest variation in concentrations. Let’s look at a boxplot grouped only by year.

df_zoop %>% 
  ggplot(aes(x = Year, y = TotalConc_Zoop)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(width = 0.3) +
  ylab("Total pesticide concentration in zooplankton (ng/g)")

The median of total concentrations was slightly higher in 2019 than other years when both stations are grouped together.

Seasonal time-series

Now, let’s see how total concentrations varied seasonally using day of year for each station and year.

df_zoop %>% 
  ggplot(aes(x = yday(Date), y = TotalConc_Zoop, color = Station)) +
  geom_point() +
  facet_grid(rows = vars(Year)) +
  scale_color_viridis_d(option = "plasma", end = 0.8) +
  labs(x = "DOY", y = "Total pesticide concentration in zooplankton (ng/g)")

It almost looks like SHR and STTD have opposite seasonal trends.

Concentration vs Flow

Now, let’s take a look at how the total pesticide concentrations in zooplankton vary with daily average net flow for each station.

df_zoop %>% 
  ggplot(aes(x = DailyAvgNetFlow, y = TotalConc_Zoop, color = Station)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  labs(
    x = "Daily average net flow (cfs)",
    y = "Total pesticide concentration in zooplankton (ng/g)"
  ) +
  theme_bw()

df_zoop %>% 
  ggplot(aes(x = DailyAvgNetFlow, y = TotalConc_Zoop, color = Year)) +
  geom_point() +
  scale_color_viridis_d(option = "plasma", end = 0.8) +
  facet_wrap(vars(Station), scales = "free") +
  labs(
    x = "Daily average net flow (cfs)",
    y = "Total pesticide concentration in zooplankton (ng/g)"
  ) +
  theme_bw()

SHR seems to have a negative relationship between concentration and flow, while concentrations at STTD seem to have a positive relationship with flow.

Concentration vs Application

Next, let’s look at the total pesticide concentrations in zooplankton as a function of standardized monthly total application data for each station.

df_zoop %>% 
  ggplot(aes(x = TotalApplStd, y = TotalConc_Zoop, color = Station)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  labs(
    x = "Monthly Total Application (lbs/sq mile)",
    y = "Total pesticide concentration in zooplankton (ng/g)"
  ) +
  theme_bw()

df_zoop %>% 
  ggplot(aes(x = TotalApplStd, y = TotalConc_Zoop, color = Year)) +
  geom_point() +
  scale_color_viridis_d(option = "plasma", end = 0.8) +
  facet_wrap(vars(Station), scales = "free") +
  labs(
    x = "Monthly Total Application (lbs/sq mile)",
    y = "Total pesticide concentration in zooplankton (ng/g)"
  ) +
  theme_bw()

There appears to be a weak negative relationship between pesticide concentrations in zooplankton and monthly application amounts at both stations. I’m not sure that monthly application will be a good predictor. Let’s take a look at whether using the prior month’s application amount makes a difference.

df_zoop %>% 
  ggplot(aes(x = TotalApplStd_lag1, y = TotalConc_Zoop, color = Station)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  labs(
    x = "Monthly Total Application in prior month (lbs/sq mile)",
    y = "Total pesticide concentration in zooplankton (ng/g)"
  ) +
  theme_bw()

df_zoop %>% 
  ggplot(aes(x = TotalApplStd_lag1, y = TotalConc_Zoop, color = Year)) +
  geom_point() +
  scale_color_viridis_d(option = "plasma", end = 0.8) +
  facet_wrap(vars(Station), scales = "free") +
  labs(
    x = "Monthly Total Application in prior month (lbs/sq mile)",
    y = "Total pesticide concentration in zooplankton (ng/g)"
  ) +
  theme_bw()

Hmm, not really…

Histograms

Lastly, let’s take a look at histograms of total pesticide concentrations in zooplankton to look at its distribution. We’ll look at each station separately and then all data grouped together.

df_zoop %>% 
  ggplot(aes(x = TotalConc_Zoop)) +
  geom_histogram(bins = 10) +
  facet_wrap(vars(Station), scales = "free") +
  xlab("Total pesticide concentration in zooplankton (ng/g)")

df_zoop %>% 
  ggplot(aes(x = TotalConc_Zoop)) +
  geom_histogram() +
  xlab("Total pesticide concentration in zooplankton (ng/g)")

We’ll probably need to use transformed zooplankton concentration data for the analyses.

Analyses

Models with Flow and Application

We’ll create linear models of the zooplankton concentrations as a function of daily average net flow and monthly pesticide application amounts for each station separately. We’ll start with the original concentration values, but we may need to transform them. These models will allow us to examine:

Do total pesticide concentrations in zooplankton significantly respond to flow at either or both SHR and STTD?
Do total pesticide concentrations in zooplankton significantly respond to pesticide application at either or both SHR and STTD?

SHR

We’ll start with SHR

Current month application

df_zoop_shr <- df_zoop %>% filter(Station == "SHR")

m_zoop_q_appl_shr <- df_zoop_shr %>% 
  lm(TotalConc_Zoop ~ DailyAvgNetFlow + TotalApplStd, data = .)

Plot the diagnostics

df_zoop_shr %>% plot_model_diag(
  TotalConc_Zoop, 
  "ng/g", 
  m_zoop_q_appl_shr, 
  bins = 10
)

shapiro.test(m_zoop_q_appl_shr$residuals)

## 
##  Shapiro-Wilk normality test
## 
## data:  m_zoop_q_appl_shr$residuals
## W = 0.94631, p-value = 0.2895

The residuals deviate from a normal distribution according to visual inspection. We’ll re-run the model using the natural log transformation.

m_zoop_q_appl_shr_log <- df_zoop_shr %>% 
  lm(TotalConc_Zoop_log ~ DailyAvgNetFlow + TotalApplStd, data = .)

Plot the diagnostics

df_zoop_shr %>% plot_model_diag(
  TotalConc_Zoop_log, 
  "ng/g", 
  m_zoop_q_appl_shr_log, 
  trans = "log", 
  bins = 10
)

shapiro.test(m_zoop_q_appl_shr_log$residuals)

## 
##  Shapiro-Wilk normality test
## 
## data:  m_zoop_q_appl_shr_log$residuals
## W = 0.9739, p-value = 0.8169

The model diagnostics look pretty good with the log-transformed values, so we’ll take a closer look at this model.

summary(m_zoop_q_appl_shr_log)

## 
## Call:
## lm(formula = TotalConc_Zoop_log ~ DailyAvgNetFlow + TotalApplStd, 
##     data = .)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.59245 -0.12635 -0.02903  0.17388  0.49120 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      7.364e+00  3.196e-01  23.041 8.26e-15 ***
## DailyAvgNetFlow -8.287e-05  1.712e-05  -4.841 0.000131 ***
## TotalApplStd    -4.318e-02  9.271e-03  -4.657 0.000196 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2726 on 18 degrees of freedom
## Multiple R-squared:  0.7004, Adjusted R-squared:  0.6671 
## F-statistic: 21.04 on 2 and 18 DF,  p-value: 1.944e-05

plt_m_zoop_q_shr <- visreg(m_zoop_q_appl_shr_log, xvar = "DailyAvgNetFlow", gg = TRUE) + 
  theme_bw() + 
  geom_point(size = 1)

plt_m_zoop_appl_shr <- visreg(m_zoop_q_appl_shr_log, xvar = "TotalApplStd", gg = TRUE) + 
  theme_bw() + 
  geom_point(size = 1)

plt_m_zoop_shr <- plt_m_zoop_q_shr + plt_m_zoop_appl_shr
plt_m_zoop_shr

Total pesticide concentrations decrease significantly with flow and monthly pesticide application at the SHR station; however, the relationship with monthly pesticide application is driven by four values with the highest application.

Prior month application

Let’s see if the model is a better fit when we use the prior month’s application amount.

m_zoop_q_appl_pr_shr <- df_zoop_shr %>% 
  lm(TotalConc_Zoop ~ DailyAvgNetFlow + TotalApplStd_lag1, data = .)

Plot the diagnostics

df_zoop_shr %>% plot_model_diag(
  TotalConc_Zoop, 
  "ng/g", 
  m_zoop_q_appl_pr_shr, 
  bins = 10
)

shapiro.test(m_zoop_q_appl_pr_shr$residuals)

## 
##  Shapiro-Wilk normality test
## 
## data:  m_zoop_q_appl_pr_shr$residuals
## W = 0.91528, p-value = 0.06986

The residuals deviate from a normal distribution according to visual inspection. We’ll re-run the model using the natural log transformation.

m_zoop_q_appl_pr_shr_log <- df_zoop_shr %>% 
  lm(TotalConc_Zoop_log ~ DailyAvgNetFlow + TotalApplStd_lag1, data = .)

Plot the diagnostics

df_zoop_shr %>% plot_model_diag(
  TotalConc_Zoop_log, 
  "ng/g", 
  m_zoop_q_appl_pr_shr_log, 
  trans = "log", 
  bins = 10
)

shapiro.test(m_zoop_q_appl_pr_shr_log$residuals)

## 
##  Shapiro-Wilk normality test
## 
## data:  m_zoop_q_appl_pr_shr_log$residuals
## W = 0.97348, p-value = 0.8082

The model diagnostics look pretty good with the log-transformed values, so we’ll take a closer look at this model.

summary(m_zoop_q_appl_pr_shr_log)

## 
## Call:
## lm(formula = TotalConc_Zoop_log ~ DailyAvgNetFlow + TotalApplStd_lag1, 
##     data = .)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.62399 -0.22201 -0.02841  0.19220  0.61989 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        7.136e+00  3.768e-01  18.935 2.47e-13 ***
## DailyAvgNetFlow   -7.119e-05  2.066e-05  -3.446  0.00288 ** 
## TotalApplStd_lag1 -1.588e-02  5.192e-03  -3.058  0.00677 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3284 on 18 degrees of freedom
## Multiple R-squared:  0.5653, Adjusted R-squared:  0.517 
## F-statistic:  11.7 on 2 and 18 DF,  p-value: 0.0005545

plt_m_zoop_q_pr_shr <- visreg(m_zoop_q_appl_pr_shr_log, xvar = "DailyAvgNetFlow", gg = TRUE) + 
  theme_bw() + 
  geom_point(size = 1)

plt_m_zoop_appl_pr_shr <- visreg(m_zoop_q_appl_pr_shr_log, xvar = "TotalApplStd_lag1", gg = TRUE) + 
  theme_bw() + 
  geom_point(size = 1)

plt_m_zoop_pr_shr <- plt_m_zoop_q_pr_shr + plt_m_zoop_appl_pr_shr
plt_m_zoop_pr_shr

Again, total pesticide concentrations decrease significantly with flow and monthly pesticide application at the SHR station using the prior month’s application. The relationship with monthly pesticide application isn’t as susceptible to leverage by values with the highest application.

Model selection

Let’s take a look at AIC and BIC values to see which model is better.

AIC(m_zoop_q_appl_shr_log, m_zoop_q_appl_pr_shr_log)

##                          df       AIC
## m_zoop_q_appl_shr_log     4  9.768586
## m_zoop_q_appl_pr_shr_log  4 17.586844

BIC(m_zoop_q_appl_shr_log, m_zoop_q_appl_pr_shr_log)

##                          df      BIC
## m_zoop_q_appl_shr_log     4 13.94668
## m_zoop_q_appl_pr_shr_log  4 21.76493

Both AIC and BIC prefer the model with the current month’s pesticide application.

STTD

Next, we’ll look at STTD

Current month application

df_zoop_sttd <- df_zoop %>% filter(Station == "STTD")

m_zoop_q_appl_sttd <- df_zoop_sttd %>% 
  lm(TotalConc_Zoop ~ DailyAvgNetFlow + TotalApplStd, data = .)

Plot the diagnostics

df_zoop_sttd %>% plot_model_diag(
  TotalConc_Zoop, 
  "ng/g", 
  m_zoop_q_appl_sttd, 
  bins = 10
)

shapiro.test(m_zoop_q_appl_sttd$residuals)

## 
##  Shapiro-Wilk normality test
## 
## data:  m_zoop_q_appl_sttd$residuals
## W = 0.90656, p-value = 0.05484

The residuals deviate from a normal distribution according to visual inspection. We’ll re-run the model using the natural log transformation.

m_zoop_q_appl_sttd_log <- df_zoop_sttd %>% 
  lm(TotalConc_Zoop_log ~ DailyAvgNetFlow + TotalApplStd, data = .)

Plot the diagnostics

df_zoop_sttd %>% plot_model_diag(
  TotalConc_Zoop_log, 
  "ng/g", 
  m_zoop_q_appl_sttd_log, 
  trans = "log", 
  bins = 10
)

shapiro.test(m_zoop_q_appl_sttd_log$residuals)

## 
##  Shapiro-Wilk normality test
## 
## data:  m_zoop_q_appl_sttd_log$residuals
## W = 0.92061, p-value = 0.1018

The natural log transformation may be too strong. We’ll re-run the model using the square root-transformation.

m_zoop_q_appl_sttd_sqrt <- df_zoop_sttd %>% 
  lm(TotalConc_Zoop_sqrt ~ DailyAvgNetFlow + TotalApplStd, data = .)

Plot the diagnostics

df_zoop_sttd %>% plot_model_diag(
  TotalConc_Zoop_sqrt, 
  "ng/g", 
  m_zoop_q_appl_sttd_sqrt, 
  trans = "sqrt", 
  bins = 10
)

shapiro.test(m_zoop_q_appl_sttd_sqrt$residuals)

## 
##  Shapiro-Wilk normality test
## 
## data:  m_zoop_q_appl_sttd_sqrt$residuals
## W = 0.95259, p-value = 0.4081

The model diagnostics look best with this model, so we’ll take a closer look at this one.

summary(m_zoop_q_appl_sttd_sqrt)

## 
## Call:
## lm(formula = TotalConc_Zoop_sqrt ~ DailyAvgNetFlow + TotalApplStd, 
##     data = .)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.1552  -3.4770  -0.8651   4.8538  12.0845 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     13.760969   1.920517   7.165 1.58e-06 ***
## DailyAvgNetFlow  0.025096   0.005822   4.311 0.000474 ***
## TotalApplStd    -0.021261   0.074230  -0.286 0.778020    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.754 on 17 degrees of freedom
## Multiple R-squared:  0.5549, Adjusted R-squared:  0.5025 
## F-statistic:  10.6 on 2 and 17 DF,  p-value: 0.001028

plt_m_zoop_q_sttd <- visreg(m_zoop_q_appl_sttd_sqrt, xvar = "DailyAvgNetFlow", gg = TRUE) + 
  theme_bw() + 
  geom_point(size = 1)

plt_m_zoop_appl_sttd <- visreg(m_zoop_q_appl_sttd_sqrt, xvar = "TotalApplStd", gg = TRUE) + 
  theme_bw() + 
  geom_point(size = 1)

plt_m_zoop_sttd <- plt_m_zoop_q_sttd + plt_m_zoop_appl_sttd
plt_m_zoop_sttd

Total pesticide concentrations increase significantly with flow at the STTD station, but there is no relationship between concentration and monthly pesticide application.

Prior month application

Let’s see if the model is a better fit when we use the prior month’s application amount.

m_zoop_q_appl_pr_sttd <- df_zoop_sttd %>% 
  lm(TotalConc_Zoop ~ DailyAvgNetFlow + TotalApplStd_lag1, data = .)

Plot the diagnostics

df_zoop_sttd %>% plot_model_diag(
  TotalConc_Zoop, 
  "ng/g",
  m_zoop_q_appl_pr_sttd, 
  bins = 10
)

shapiro.test(m_zoop_q_appl_pr_sttd$residuals)

## 
##  Shapiro-Wilk normality test
## 
## data:  m_zoop_q_appl_pr_sttd$residuals
## W = 0.90557, p-value = 0.05251

The residuals deviate from a normal distribution according to visual inspection. We’ll re-run the model using the natural log transformation.

m_zoop_q_appl_pr_sttd_log <- df_zoop_sttd %>% 
  lm(TotalConc_Zoop_log ~ DailyAvgNetFlow + TotalApplStd_lag1, data = .)

Plot the diagnostics

df_zoop_sttd %>% plot_model_diag(
  TotalConc_Zoop_log, 
  "ng/g", 
  m_zoop_q_appl_pr_sttd_log, 
  trans = "log", 
  bins = 10
)

shapiro.test(m_zoop_q_appl_pr_sttd_log$residuals)

## 
##  Shapiro-Wilk normality test
## 
## data:  m_zoop_q_appl_pr_sttd_log$residuals
## W = 0.92017, p-value = 0.09981

Again, the natural log transformation may be too strong. We’ll re-run the model using the square root-transformation.

m_zoop_q_appl_pr_sttd_sqrt <- df_zoop_sttd %>% 
  lm(TotalConc_Zoop_sqrt ~ DailyAvgNetFlow + TotalApplStd_lag1, data = .)

Plot the diagnostics

df_zoop_sttd %>% plot_model_diag(
  TotalConc_Zoop_sqrt, 
  "ng/g", 
  m_zoop_q_appl_pr_sttd_sqrt, 
  trans = "sqrt",
  bins = 10
)

shapiro.test(m_zoop_q_appl_pr_sttd_sqrt$residuals)

## 
##  Shapiro-Wilk normality test
## 
## data:  m_zoop_q_appl_pr_sttd_sqrt$residuals
## W = 0.9525, p-value = 0.4067

The model diagnostics look best with this model, so we’ll take a closer look at this one.

summary(m_zoop_q_appl_pr_sttd_sqrt)

## 
## Call:
## lm(formula = TotalConc_Zoop_sqrt ~ DailyAvgNetFlow + TotalApplStd_lag1, 
##     data = .)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -11.078  -3.453  -1.004   4.880  12.143 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       13.666197   1.855055   7.367  1.1e-06 ***
## DailyAvgNetFlow    0.025367   0.005649   4.490 0.000322 ***
## TotalApplStd_lag1 -0.006553   0.028126  -0.233 0.818554    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.759 on 17 degrees of freedom
## Multiple R-squared:  0.5542, Adjusted R-squared:  0.5017 
## F-statistic: 10.57 on 2 and 17 DF,  p-value: 0.001042

plt_m_zoop_q_pr_sttd <- visreg(m_zoop_q_appl_pr_sttd_sqrt, xvar = "DailyAvgNetFlow", gg = TRUE) + 
  theme_bw() + 
  geom_point(size = 1)

plt_m_zoop_appl_pr_sttd <- visreg(m_zoop_q_appl_pr_sttd_sqrt, xvar = "TotalApplStd_lag1", gg = TRUE) + 
  theme_bw() + 
  geom_point(size = 1)

plt_m_zoop_pr_sttd <- plt_m_zoop_q_pr_sttd + plt_m_zoop_appl_pr_sttd
plt_m_zoop_pr_sttd

The results are the same with this model.

Model selection

Let’s take a look at AIC and BIC values to see which model is better.

AIC(m_zoop_q_appl_sttd_sqrt, m_zoop_q_appl_pr_sttd_sqrt)

##                            df      AIC
## m_zoop_q_appl_sttd_sqrt     4 131.5035
## m_zoop_q_appl_pr_sttd_sqrt  4 131.5360

BIC(m_zoop_q_appl_sttd_sqrt, m_zoop_q_appl_pr_sttd_sqrt)

##                            df      BIC
## m_zoop_q_appl_sttd_sqrt     4 135.4865
## m_zoop_q_appl_pr_sttd_sqrt  4 135.5190

The AIC and BIC values are nearly identical between models with both metrics giving a slight edge to the model with the current month’s pesticide application.

Model with Station and Year

Next, we’ll create a linear model of the zooplankton concentrations as a function of the interaction between station and year. We’ll start with the original concentration values, but we may need to transform them. This model will allow us to examine:

If the interaction between station and year isn’t significant, whether total pesticide concentrations in zooplankton are significantly different among stations or years (main effects).
If the interaction is significant, whether total pesticide concentrations in zooplankton are significantly different among stations for each year or among years for each station.

With interaction

m_zoop_sta_yr <- df_zoop %>% 
  lm(TotalConc_Zoop ~ Station * Year, data = .)

Plot the diagnostics

df_zoop %>% plot_model_diag(
  TotalConc_Zoop, 
  "ng/g", 
  m_zoop_sta_yr
)

shapiro.test(m_zoop_sta_yr$residuals)

## 
##  Shapiro-Wilk normality test
## 
## data:  m_zoop_sta_yr$residuals
## W = 0.90768, p-value = 0.002817

The residuals deviate from a normal distribution according to the Shapiro-Wilk normality test and visual inspection. Also, the variance among groups isn’t consistent. We’ll re-run the model using the natural log of the concentration values.

m_zoop_sta_yr_log <- df_zoop %>% 
  lm(TotalConc_Zoop_log ~ Station * Year, data = .)

Plot the diagnostics

df_zoop %>% plot_model_diag(
  TotalConc_Zoop_log, 
  "ng/g", 
  m_zoop_sta_yr_log, 
  trans = "log"
)

shapiro.test(m_zoop_sta_yr_log$residuals)

## 
##  Shapiro-Wilk normality test
## 
## data:  m_zoop_sta_yr_log$residuals
## W = 0.97934, p-value = 0.6506

This is much better than the model with original values. Let’s take a look at the Anova table using type 3 sum of squares for the model using natural log transformation.

ANOVA table

Anova(m_zoop_sta_yr_log, type = 3, contrasts = list(topic = contr.sum, sys = contr.sum))

## Anova Table (Type III tests)
## 
## Response: TotalConc_Zoop_log
##               Sum Sq Df  F value  Pr(>F)    
## (Intercept)  149.373  1 272.8261 < 2e-16 ***
## Station        3.248  1   5.9332 0.02009 *  
## Year           0.230  2   0.2101 0.81154    
## Station:Year   1.887  2   1.7230 0.19334    
## Residuals     19.163 35                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Well, the interaction term is not significant which means we can take a look at the main effects. Before we do this, we’ll remove the interaction from the model since it’s not necessary. We’ll stick with the model using natural log transformation.

Without interaction

m_zoop_sta_yr_log2 <- df_zoop %>% 
  lm(TotalConc_Zoop_log ~ Station + Year, data = .)

df_zoop %>% plot_model_diag(
  TotalConc_Zoop_log, 
  "ng/g", 
  m_zoop_sta_yr_log2, 
  trans = "log"
)

shapiro.test(m_zoop_sta_yr_log2$residuals)

## 
##  Shapiro-Wilk normality test
## 
## data:  m_zoop_sta_yr_log2$residuals
## W = 0.98495, p-value = 0.8542

The diagnostics still look good with this model. Let’s take a look at the Anova table using type 2 sum of squares because of the unbalanced design.

ANOVA table

Anova(m_zoop_sta_yr_log2, type = 2)

## Anova Table (Type II tests)
## 
## Response: TotalConc_Zoop_log
##            Sum Sq Df F value  Pr(>F)  
## Station    1.8028  1  3.1690 0.08326 .
## Year       3.8276  2  3.3641 0.04546 *
## Residuals 21.0492 37                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The main effect of station is not significant (p = 0.083) meaning that total pesticide concentrations in zooplankton don’t differ between SHR and STTD. However, the main effect of Year is significant, so we can take a look at the pairwise contrasts to see how concentrations differ among the years. We’ll also take a look at the pairwise contrasts for station since it is close to significant.

Contrasts

# Contrasts for Year main effect
em_zoop_yr <- emmeans(m_zoop_sta_yr_log2, specs = "Year")
pairs(em_zoop_yr)

##  contrast            estimate    SE df t.ratio p.value
##  Year2017 - Year2018  -0.6700 0.304 37  -2.201  0.0842
##  Year2017 - Year2019  -0.7474 0.309 37  -2.420  0.0525
##  Year2018 - Year2019  -0.0774 0.274 37  -0.283  0.9570
## 
## Results are averaged over the levels of: Station 
## P value adjustment: tukey method for comparing a family of 3 estimates

visreg(m_zoop_sta_yr_log2, xvar = "Year")

# Contrasts for Station main effect
em_zoop_sta <- emmeans(m_zoop_sta_yr_log2, specs = "Station")
pairs(em_zoop_sta)

##  contrast   estimate    SE df t.ratio p.value
##  SHR - STTD    0.424 0.238 37   1.780  0.0833
## 
## Results are averaged over the levels of: Year

visreg(m_zoop_sta_yr_log2, xvar = "Station")

Pairwise tests indicate that total pesticide concentration in zooplankton in 2019 is marginally significantly higher than 2017, but the p-value is 0.053. Also, the p-value isn’t significant, but SHR is slightly higher than STTD.

Water Analysis

# Prepare water concentration data for analysis
df_water <- df_conc_all_c2 %>% 
  select(-TotalConc_Zoop) %>% 
  # only include 2019 since that was the only year water samples were collected
    # at STTD
  filter(Year == 2019) %>% 
  drop_na() %>% 
  # Add log-transformed and sqrt-transformed water concentrations
  mutate(
    TotalConc_Water_log = log(TotalConc_Water),
    TotalConc_Water_sqrt = sqrt(TotalConc_Water)
  )

Exploratory Plots

Before running the analyses, lets visualize the data.

Boxplots

First, let’s look at a boxplot comparing total pesticide concentrations in water between stations.

df_water %>% 
  ggplot(aes(x = Station, y = TotalConc_Water)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(width = 0.3) +
  ylab("Total pesticide concentration in water (ng/L)")

STTD seems to have higher concentrations with much more variation than SHR.

Seasonal time-series

Now, let’s see how total concentrations varied seasonally in 2019 for each station.

df_water %>% 
  ggplot(aes(x = Date, y = TotalConc_Water, color = Station)) +
  geom_point() +
  scale_color_viridis_d(option = "plasma", end = 0.8) +
  labs(x = "Date", y = "Total pesticide concentration in water (ng/L)")

Concentration vs Flow

Now, let’s take a look at how the total pesticide concentrations in water vary with daily average net flow for each station.

df_water %>% 
  ggplot(aes(x = DailyAvgNetFlow, y = TotalConc_Water, color = Station)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  labs(
    x = "Daily average net flow (cfs)",
    y = "Total pesticide concentration in water (ng/L)"
  ) +
  theme_bw()

df_water %>% 
  ggplot(aes(x = DailyAvgNetFlow, y = TotalConc_Water)) +
  geom_point() +
  facet_wrap(vars(Station), scales = "free") +
  labs(
    x = "Daily average net flow (cfs)",
    y = "Total pesticide concentration in water (ng/L)"
  ) +
  theme_bw()

Both SHR and STTD seem to have positive relationships between concentration and flow, but STTD looks weak and strongly affected by values with the highest flows.

Concentration vs Application

Next, let’s look at the total pesticide concentrations in water as a function of standardized monthly total application data for each station.

df_water %>% 
  ggplot(aes(x = TotalApplStd, y = TotalConc_Water, color = Station)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  labs(
    x = "Monthly Total Application (lbs/sq mile)",
    y = "Total pesticide concentration in water (ng/L)"
  ) +
  theme_bw()

df_water %>% 
  ggplot(aes(x = TotalApplStd, y = TotalConc_Water)) +
  geom_point() +
  scale_color_viridis_d(option = "plasma", end = 0.8) +
  facet_wrap(vars(Station), scales = "free") +
  labs(
    x = "Monthly Total Application (lbs/sq mile)",
    y = "Total pesticide concentration in water (ng/L)"
  ) +
  theme_bw()

There appears to be a weak negative relationship between pesticide concentrations in water and monthly application amounts at STTD and a slightly stronger positive relationship at SHR. I’m not sure that monthly application will be a good predictor. Let’s take a look at whether using the prior month’s application amount makes a difference.

df_water %>% 
  ggplot(aes(x = TotalApplStd_lag1, y = TotalConc_Water, color = Station)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  labs(
    x = "Monthly Total Application in prior month (lbs/sq mile)",
    y = "Total pesticide concentration in water (ng/L)"
  ) +
  theme_bw()

df_water %>% 
  ggplot(aes(x = TotalApplStd_lag1, y = TotalConc_Water)) +
  geom_point() +
  scale_color_viridis_d(option = "plasma", end = 0.8) +
  facet_wrap(vars(Station), scales = "free") +
  labs(
    x = "Monthly Total Application in prior month (lbs/sq mile)",
    y = "Total pesticide concentration in water (ng/L)"
  ) +
  theme_bw()

Hmm, not really. The relationship may be slightly better at SHR using the prior month’s application amounts.

Histograms

Lastly, let’s take a look at histograms of total pesticide concentrations in water to look at its distribution. We’ll look at each station separately and then all data grouped together.

df_water %>% 
  ggplot(aes(x = TotalConc_Water)) +
  geom_histogram(bins = 10) +
  facet_wrap(vars(Station), scales = "free") +
  xlab("Total pesticide concentration in water (ng/L)")

df_water %>% 
  ggplot(aes(x = TotalConc_Water)) +
  geom_histogram() +
  xlab("Total pesticide concentration in water (ng/L)")

We’ll probably need to use transformed water concentration data for the analyses.

Analyses

Models with Flow and Application

We’ll create linear models of the water concentrations as a function of daily average net flow and monthly pesticide application amounts for each station separately. We’ll start with the original concentration values, but we may need to transform them. These models will allow us to examine:

Do total pesticide concentrations in water significantly respond to flow at either or both SHR and STTD?
Do total pesticide concentrations in water significantly respond to pesticide application at either or both SHR and STTD?

SHR

We’ll start with SHR

df_water_shr <- df_water %>% filter(Station == "SHR")

m_water_q_appl_shr <- df_water_shr %>% 
  lm(TotalConc_Water ~ DailyAvgNetFlow + TotalApplStd, data = .)

Plot the diagnostics

df_water_shr %>% plot_model_diag(
  TotalConc_Water, 
  "ng/L",
  m_water_q_appl_shr, 
  bins = 8
)

shapiro.test(m_water_q_appl_shr$residuals)

## 
##  Shapiro-Wilk normality test
## 
## data:  m_water_q_appl_shr$residuals
## W = 0.93721, p-value = 0.5529

The diagnostics look pretty good with this model, but let’s see if they look better with log transformed values.

m_water_q_appl_shr_log <- df_water_shr %>% 
  lm(TotalConc_Water_log ~ DailyAvgNetFlow + TotalApplStd, data = .)

Plot the diagnostics

df_water_shr %>% plot_model_diag(
  TotalConc_Water_log, 
  "ng/L", 
  m_water_q_appl_shr_log, 
  trans = "log", 
  bins = 8
)

shapiro.test(m_water_q_appl_shr_log$residuals)

## 
##  Shapiro-Wilk normality test
## 
## data:  m_water_q_appl_shr_log$residuals
## W = 0.96314, p-value = 0.8305

The diagnostics look better with this model, so we’ll take a closer look at it.

summary(m_water_q_appl_shr_log)

## 
## Call:
## lm(formula = TotalConc_Water_log ~ DailyAvgNetFlow + TotalApplStd, 
##     data = .)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.50111 -0.08524 -0.02129  0.15153  0.43396 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)   
## (Intercept)     2.334e+00  5.277e-01   4.423  0.00446 **
## DailyAvgNetFlow 1.632e-04  2.827e-05   5.772  0.00118 **
## TotalApplStd    4.076e-02  1.416e-02   2.878  0.02815 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3062 on 6 degrees of freedom
## Multiple R-squared:  0.879,  Adjusted R-squared:  0.8387 
## F-statistic:  21.8 on 2 and 6 DF,  p-value: 0.00177

plt_m_water_q_shr <- visreg(m_water_q_appl_shr_log, xvar = "DailyAvgNetFlow", gg = TRUE) + 
  theme_bw() + 
  geom_point(size = 1)

plt_m_water_appl_shr <- visreg(m_water_q_appl_shr_log, xvar = "TotalApplStd", gg = TRUE) + 
  theme_bw() + 
  geom_point(size = 1)

plt_m_water_shr <- plt_m_water_q_shr + plt_m_water_appl_shr
plt_m_water_shr

Total pesticide concentrations increase significantly with both flow and monthly pesticide application at the SHR station; however, the relationship with monthly pesticide application is driven by two values with the highest application.

Prior month application

Let’s see if the model is a better fit when we use the prior month’s application amount.

m_water_q_appl_pr_shr <- df_water_shr %>%
  lm(TotalConc_Water ~ DailyAvgNetFlow + TotalApplStd_lag1, data = .)

Plot the diagnostics

df_water_shr %>% plot_model_diag(
  TotalConc_Water, 
  "ng/L", 
  m_water_q_appl_pr_shr, 
  bins = 8
)

shapiro.test(m_water_q_appl_pr_shr$residuals)

## 
##  Shapiro-Wilk normality test
## 
## data:  m_water_q_appl_pr_shr$residuals
## W = 0.98793, p-value = 0.9926

The model diagnostics look really good with the original values, so we’ll take a closer look at this model.

summary(m_water_q_appl_pr_shr)

## 
## Call:
## lm(formula = TotalConc_Water ~ DailyAvgNetFlow + TotalApplStd_lag1, 
##     data = .)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -126.095  -49.937    1.512   41.715  126.284 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)   
## (Intercept)       -3.270e+02  1.442e+02  -2.267  0.06395 . 
## DailyAvgNetFlow    3.073e-02  7.887e-03   3.897  0.00802 **
## TotalApplStd_lag1  5.716e+00  1.937e+00   2.950  0.02561 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 84.09 on 6 degrees of freedom
## Multiple R-squared:  0.8292, Adjusted R-squared:  0.7723 
## F-statistic: 14.57 on 2 and 6 DF,  p-value: 0.004982

plt_m_water_q_pr_shr <- visreg(m_water_q_appl_pr_shr, xvar = "DailyAvgNetFlow", gg = TRUE) + 
  theme_bw() + 
  geom_point(size = 1)

plt_m_water_appl_pr_shr <- visreg(m_water_q_appl_pr_shr, xvar = "TotalApplStd_lag1", gg = TRUE) +
  theme_bw() + 
  geom_point(size = 1)

plt_m_water_pr_shr <- plt_m_water_q_pr_shr + plt_m_water_appl_pr_shr
plt_m_water_pr_shr

Again, total pesticide concentrations increase significantly with flow and monthly pesticide application at the SHR station using the prior month’s application. The relationship with monthly pesticide application isn’t as susceptible to leverage by values with the highest application.

Model selection

Let’s take a look at AIC and BIC values to see which model is better.

AIC(m_water_q_appl_shr_log, m_water_q_appl_pr_shr)

##                        df        AIC
## m_water_q_appl_shr_log  4   8.586709
## m_water_q_appl_pr_shr   4 109.665709

BIC(m_water_q_appl_shr_log, m_water_q_appl_pr_shr)

##                        df        BIC
## m_water_q_appl_shr_log  4   9.375607
## m_water_q_appl_pr_shr   4 110.454607

Both AIC and BIC strongly prefer the model with the current month’s pesticide application. However, both models aren’t using the same transformation for the response variable. I wonder if this changes if we use log-transformed values for the model with prior month’s application.

m_water_q_appl_pr_shr_log <- df_water_shr %>% 
  lm(TotalConc_Water_log ~ DailyAvgNetFlow + TotalApplStd_lag1, data = .)

Plot the diagnostics

df_water_shr %>% plot_model_diag(
  TotalConc_Water_log, 
  "ng/L",
  m_water_q_appl_pr_shr_log, 
  trans = "log", 
  bins = 8
)

shapiro.test(m_water_q_appl_pr_shr_log$residuals)

## 
##  Shapiro-Wilk normality test
## 
## data:  m_water_q_appl_pr_shr_log$residuals
## W = 0.93963, p-value = 0.5781

The model diagnostics aren’t as good with the log-transformed values. We’ll take a closer look at this model to see how it compares.

summary(m_water_q_appl_pr_shr_log)

## 
## Call:
## lm(formula = TotalConc_Water_log ~ DailyAvgNetFlow + TotalApplStd_lag1, 
##     data = .)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.40074 -0.05306 -0.03911  0.12387  0.26372 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       2.472e+00  4.053e-01   6.100 0.000885 ***
## DailyAvgNetFlow   1.504e-04  2.216e-05   6.786 0.000501 ***
## TotalApplStd_lag1 2.308e-02  5.443e-03   4.241 0.005435 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2362 on 6 degrees of freedom
## Multiple R-squared:  0.928,  Adjusted R-squared:  0.904 
## F-statistic: 38.65 on 2 and 6 DF,  p-value: 0.0003736

plt_m_water_q_pr_shr2 <- visreg(m_water_q_appl_pr_shr_log, xvar = "DailyAvgNetFlow", gg = TRUE) + 
  theme_bw() + 
  geom_point(size = 1)

plt_m_water_appl_pr_shr2 <- visreg(m_water_q_appl_pr_shr_log, xvar = "TotalApplStd_lag1", gg = TRUE) + 
  theme_bw() + 
  geom_point(size = 1)

plt_m_water_pr_shr2 <- plt_m_water_q_pr_shr2 + plt_m_water_appl_pr_shr2

The results are generally the same. Let’s take a look at AIC and BIC values using both models with log-transformed values.

AIC(m_water_q_appl_shr_log, m_water_q_appl_pr_shr_log)

##                           df      AIC
## m_water_q_appl_shr_log     4 8.586709
## m_water_q_appl_pr_shr_log  4 3.920041

BIC(m_water_q_appl_shr_log, m_water_q_appl_pr_shr_log)

##                           df      BIC
## m_water_q_appl_shr_log     4 9.375607
## m_water_q_appl_pr_shr_log  4 4.708940

Both AIC and BIC now prefer the model with the prior month’s pesticide application, but the difference isn’t as large. Since the diagnostics looked better for the model with the prior month’s pesticide application using the original values, let’s use the prior month model with original values for the model selection procedure. In this case, the best model is the one with the current month’s pesticide application.

STTD

Next, we’ll look at STTD

df_water_sttd <- df_water %>% filter(Station == "STTD")

m_water_q_appl_sttd <- df_water_sttd %>% 
  lm(TotalConc_Water ~ DailyAvgNetFlow + TotalApplStd, data = .)

Plot the diagnostics

df_water_sttd %>% plot_model_diag(
  TotalConc_Water, 
  "ng/L", 
  m_water_q_appl_sttd, 
  bins = 8
)

shapiro.test(m_water_q_appl_sttd$residuals)

## 
##  Shapiro-Wilk normality test
## 
## data:  m_water_q_appl_sttd$residuals
## W = 0.95545, p-value = 0.7789

The diagnostics look pretty good with the original values, so let’s take a closer look at this model.

summary(m_water_q_appl_sttd)

## 
## Call:
## lm(formula = TotalConc_Water ~ DailyAvgNetFlow + TotalApplStd, 
##     data = .)
## 
## Residuals:
##       1       2       3       4       5       6       7 
##  -10.56   11.26  143.49 -149.01  226.58  -11.64 -210.12 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     613.1212   121.4666   5.048 0.007243 ** 
## DailyAvgNetFlow   2.6242     0.2426  10.816 0.000415 ***
## TotalApplStd     -4.9151     6.9025  -0.712 0.515747    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 186.2 on 4 degrees of freedom
## Multiple R-squared:   0.97,  Adjusted R-squared:  0.9549 
## F-statistic: 64.59 on 2 and 4 DF,  p-value: 0.000902

plt_m_water_q_sttd <- visreg(m_water_q_appl_sttd, xvar = "DailyAvgNetFlow", gg = TRUE) + 
  theme_bw() + 
  geom_point(size = 1)

plt_m_water_appl_sttd <- visreg(m_water_q_appl_sttd, xvar = "TotalApplStd", gg = TRUE) + 
  theme_bw() + 
  geom_point(size = 1)

plt_m_water_sttd <- plt_m_water_q_sttd + plt_m_water_appl_sttd
plt_m_water_sttd

Total pesticide concentrations increase significantly with flow at the STTD station, but there are two values with flows of about 600 cfs that have a lot leverage on the model. The relationship between total pesticide concentration and monthly pesticide application wasn’t significant at STTD.

Prior month application

Let’s see if the model is a better fit when we use the prior month’s application amount.

m_water_q_appl_pr_sttd <- df_water_sttd %>% 
  lm(TotalConc_Water ~ DailyAvgNetFlow + TotalApplStd_lag1, data = .)

Plot the diagnostics

df_water_sttd %>% plot_model_diag(
  TotalConc_Water, 
  "ng/L", 
  m_water_q_appl_pr_sttd, 
  bins = 8
)

shapiro.test(m_water_q_appl_pr_sttd$residuals)

## 
##  Shapiro-Wilk normality test
## 
## data:  m_water_q_appl_pr_sttd$residuals
## W = 0.95422, p-value = 0.7679

The model diagnostics look really good with the original values, so we’ll take a closer look at this model.

summary(m_water_q_appl_pr_sttd)

## 
## Call:
## lm(formula = TotalConc_Water ~ DailyAvgNetFlow + TotalApplStd_lag1, 
##     data = .)
## 
## Residuals:
##        1        2        3        4        5        6        7 
##   -9.275   11.173  144.338 -149.681  224.672  -10.709 -210.517 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       609.5491   116.9936   5.210 0.006471 ** 
## DailyAvgNetFlow     2.6653     0.2349  11.348 0.000344 ***
## TotalApplStd_lag1  -2.0360     2.8321  -0.719 0.511962    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 186 on 4 degrees of freedom
## Multiple R-squared:   0.97,  Adjusted R-squared:  0.955 
## F-statistic: 64.74 on 2 and 4 DF,  p-value: 0.0008981

plt_m_water_q_pr_sttd <- visreg(m_water_q_appl_pr_sttd, xvar = "DailyAvgNetFlow", gg = TRUE) + 
  theme_bw() + 
  geom_point(size = 1)

plt_m_water_appl_pr_sttd <- visreg(m_water_q_appl_pr_sttd, xvar = "TotalApplStd_lag1", gg = TRUE) + 
  theme_bw() + 
  geom_point(size = 1)

plt_m_water_pr_sttd <- plt_m_water_q_pr_sttd + plt_m_water_appl_pr_sttd
plt_m_water_pr_sttd

The results are the same with this model.

Model selection

Let’s take a look at AIC and BIC values to see which model is better.

AIC(m_water_q_appl_sttd, m_water_q_appl_pr_sttd)

##                        df      AIC
## m_water_q_appl_sttd     4 97.12213
## m_water_q_appl_pr_sttd  4 97.10699

BIC(m_water_q_appl_sttd, m_water_q_appl_pr_sttd)

##                        df      BIC
## m_water_q_appl_sttd     4 96.90577
## m_water_q_appl_pr_sttd  4 96.89063

The AIC and BIC values are nearly identical between models with both metrics giving a slight edge to the model with the prior month’s pesticide application. However, monthly pesticide application isn’t significant in either model.

Station comparison

Next, let’s compare total pesticide concentrations in water between the two stations, SHR and STTD. Since samples were only collected at both stations in 2019, we can run a simple t-test to compare them. The histogram of all water concentrations above showed that they are not normally-distributed. Let’s take a look at the natural log transformed values to see if we can use these for the t-test.

df_water %>% 
  ggplot(aes(x = TotalConc_Water_log)) +
  geom_histogram(bins = 8) +
  xlab("Total pesticide concentration in water log(ng/L)")

df_water %>% 
  ggplot(aes(x = Station, y = TotalConc_Water_log)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(width = 0.3) +
  ylab("Total pesticide concentration in water log(ng/L)")

df_water %>% 
  ggplot(aes(sample = TotalConc_Water_log)) + 
  labs(
    x = "Normal Quantiles", 
    y = "Total pesticide concentration in water log(ng/L)"
  ) + 
  stat_qq() + 
  stat_qq_line(linewidth = 1, color = 'red') + 
  theme_bw()

shapiro.test(df_water$TotalConc_Water_log)

## 
##  Shapiro-Wilk normality test
## 
## data:  df_water$TotalConc_Water_log
## W = 0.93391, p-value = 0.2809

The natural log transformed values seem to be more normally distributed and have relatively equal variances between SHR and STTD. Let’s run Welch’s two-sample t-test to compare the means of total pesticide concentration in water between SHR and STTD.

df_water %>% t.test(TotalConc_Water_log ~ Station, data = .)

## 
##  Welch Two Sample t-test
## 
## data:  TotalConc_Water_log by Station
## t = -2.1709, df = 11.742, p-value = 0.05119
## alternative hypothesis: true difference in means between group SHR and group STTD is not equal to 0
## 95 percent confidence interval:
##  -1.858996839  0.005652187
## sample estimates:
##  mean in group SHR mean in group STTD 
##           5.545113           6.471786

The mean total pesticide concentration in water was marginally significantly higher (p = 0.052) at STTD than SHR during 2019.

Overall Conclusions

Zooplankton

Total pesticide concentrations in Zooplankton:

Total pesticide concentrations decreased significantly with flow (p = 0.00013) and monthly pesticide application (p = 0.00020) at the SHR station; however, the relationship with monthly pesticide application was driven by four values with the highest application.

## 
## Call:
## lm(formula = TotalConc_Zoop_log ~ DailyAvgNetFlow + TotalApplStd, 
##     data = .)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.59245 -0.12635 -0.02903  0.17388  0.49120 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      7.364e+00  3.196e-01  23.041 8.26e-15 ***
## DailyAvgNetFlow -8.287e-05  1.712e-05  -4.841 0.000131 ***
## TotalApplStd    -4.318e-02  9.271e-03  -4.657 0.000196 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2726 on 18 degrees of freedom
## Multiple R-squared:  0.7004, Adjusted R-squared:  0.6671 
## F-statistic: 21.04 on 2 and 18 DF,  p-value: 1.944e-05

Total pesticide concentrations increased significantly with flow at the STTD station (p = 0.00047), but there was no relationship between concentration and monthly pesticide application (p = 0.78).

## 
## Call:
## lm(formula = TotalConc_Zoop_sqrt ~ DailyAvgNetFlow + TotalApplStd, 
##     data = .)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.1552  -3.4770  -0.8651   4.8538  12.0845 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     13.760969   1.920517   7.165 1.58e-06 ***
## DailyAvgNetFlow  0.025096   0.005822   4.311 0.000474 ***
## TotalApplStd    -0.021261   0.074230  -0.286 0.778020    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.754 on 17 degrees of freedom
## Multiple R-squared:  0.5549, Adjusted R-squared:  0.5025 
## F-statistic:  10.6 on 2 and 17 DF,  p-value: 0.001028

Model selection using AIC and BIC preferred the models using the current month’s pesticide application amounts for both SHR and STTD.
Total pesticide concentrations didn’t differ significantly between SHR and STTD when controlling for the year when the samples were collected; however, SHR had a higher median concentration than STTD, and the p-value for their pairwise difference was 0.083.

## Anova Table (Type II tests)
## 
## Response: TotalConc_Zoop_log
##            Sum Sq Df F value  Pr(>F)  
## Station    1.8028  1  3.1690 0.08326 .
## Year       3.8276  2  3.3641 0.04546 *
## Residuals 21.0492 37                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

##  contrast   estimate    SE df t.ratio p.value
##  SHR - STTD    0.424 0.238 37   1.780  0.0833
## 
## Results are averaged over the levels of: Year

Zooplankton concentrations were marginally significantly higher (p = 0.053) in 2019 than 2017 across both stations.

##  contrast            estimate    SE df t.ratio p.value
##  Year2017 - Year2018  -0.6700 0.304 37  -2.201  0.0842
##  Year2017 - Year2019  -0.7474 0.309 37  -2.420  0.0525
##  Year2018 - Year2019  -0.0774 0.274 37  -0.283  0.9570
## 
## Results are averaged over the levels of: Station 
## P value adjustment: tukey method for comparing a family of 3 estimates

Water

Total pesticide concentrations in Water:

Total pesticide concentrations increased significantly with both flow (p = 0.0012) and monthly pesticide application (p = 0.028) at the SHR station; however, the relationship with monthly pesticide application is driven by two values with the highest application.

## 
## Call:
## lm(formula = TotalConc_Water_log ~ DailyAvgNetFlow + TotalApplStd, 
##     data = .)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.50111 -0.08524 -0.02129  0.15153  0.43396 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)   
## (Intercept)     2.334e+00  5.277e-01   4.423  0.00446 **
## DailyAvgNetFlow 1.632e-04  2.827e-05   5.772  0.00118 **
## TotalApplStd    4.076e-02  1.416e-02   2.878  0.02815 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3062 on 6 degrees of freedom
## Multiple R-squared:  0.879,  Adjusted R-squared:  0.8387 
## F-statistic:  21.8 on 2 and 6 DF,  p-value: 0.00177

Total pesticide concentrations also increased significantly with flow (p = 0.00042) at the STTD station, but there were two values with flows of about 600 cfs that had a lot leverage on the model. The relationship between total pesticide concentration and monthly pesticide application wasn’t significant at STTD (p = 0.52).

## 
## Call:
## lm(formula = TotalConc_Water ~ DailyAvgNetFlow + TotalApplStd, 
##     data = .)
## 
## Residuals:
##       1       2       3       4       5       6       7 
##  -10.56   11.26  143.49 -149.01  226.58  -11.64 -210.12 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     613.1212   121.4666   5.048 0.007243 ** 
## DailyAvgNetFlow   2.6242     0.2426  10.816 0.000415 ***
## TotalApplStd     -4.9151     6.9025  -0.712 0.515747    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 186.2 on 4 degrees of freedom
## Multiple R-squared:   0.97,  Adjusted R-squared:  0.9549 
## F-statistic: 64.59 on 2 and 4 DF,  p-value: 0.000902

Both models had a fairly low sample size to work with at 9 and 7 for SHR and STTD, respectively.
The mean total pesticide concentration in water was marginally significantly higher (p = 0.052) at STTD than SHR during 2019.

## 
##  Welch Two Sample t-test
## 
## data:  TotalConc_Water_log by Station
## t = -2.1709, df = 11.742, p-value = 0.05119
## alternative hypothesis: true difference in means between group SHR and group STTD is not equal to 0
## 95 percent confidence interval:
##  -1.858996839  0.005652187
## sample estimates:
##  mean in group SHR mean in group STTD 
##           5.545113           6.471786

NDFS Contaminants Manuscript

Analyses of Total Concentrations in Water and Zooplankton

Dave Bosworth

September 15, 2023

Purpose

Global code and functions

Import and Prepare Data

Zooplankton

Exploratory Plots

Boxplots

Seasonal time-series

Concentration vs Flow

Concentration vs Application

Histograms

Analyses

Models with Flow and Application

SHR

Current month application

Prior month application

Model selection

STTD

Current month application

Prior month application

Model selection

Model with Station and Year

With interaction

ANOVA table

Without interaction

ANOVA table

Contrasts

Water Analysis

Exploratory Plots

Boxplots

Seasonal time-series

Concentration vs Flow

Concentration vs Application

Histograms

Analyses

Models with Flow and Application

SHR

Prior month application

Model selection

STTD

Prior month application

Model selection

Station comparison

Overall Conclusions

Zooplankton

Water