Create exploratory figures and run statistical analyses for the total pesticide concentration data in water and zooplankton collected from the Sacramento River (SHR) and Toe Drain (STTD) in the summer-fall periods in 2017, 2018 and 2019. These analyses are included in the NDFS contaminants manuscript.
# Load packages
library(tidyverse)
library(rlang)
library(knitr)
library(patchwork)
library(car)
library(emmeans)
library(visreg)
library(here)
library(conflicted)
# Declare package conflict preferences
conflicts_prefer(dplyr::filter(), dplyr::lag())
# Check if we are in the correct working directory
i_am("manuscript_contam/notebooks/explore_contam_conc_water_zoop.Rmd")
## here() starts at C:/Repositories/04_IEP_Org/ND-FASTR
Display current versions of R and packages used for this analysis:
devtools::session_info()
## ─ Session info ───────────────────────────────────────────────────────────────
## setting value
## version R version 4.2.3 (2023-03-15 ucrt)
## os Windows 10 x64 (build 19044)
## system x86_64, mingw32
## ui RTerm
## language (EN)
## collate English_United States.utf8
## ctype English_United States.utf8
## tz America/Los_Angeles
## date 2023-09-15
## pandoc 3.1.1 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
##
## ─ Packages ───────────────────────────────────────────────────────────────────
## package * version date (UTC) lib source
## abind 1.4-5 2016-07-21 [1] CRAN (R 4.2.0)
## bslib 0.4.2 2022-12-16 [1] CRAN (R 4.2.2)
## cachem 1.0.8 2023-05-01 [1] CRAN (R 4.2.3)
## callr 3.7.3 2022-11-02 [1] CRAN (R 4.2.2)
## car * 3.1-2 2023-03-30 [1] CRAN (R 4.2.3)
## carData * 3.0-5 2022-01-06 [1] CRAN (R 4.2.1)
## cli 3.6.1 2023-03-23 [1] CRAN (R 4.2.3)
## coda 0.19-4 2020-09-30 [1] CRAN (R 4.2.1)
## codetools 0.2-19 2023-02-01 [2] CRAN (R 4.2.3)
## colorspace 2.1-0 2023-01-23 [1] CRAN (R 4.2.2)
## conflicted * 1.2.0 2023-02-01 [1] CRAN (R 4.2.2)
## crayon 1.5.2 2022-09-29 [1] CRAN (R 4.2.1)
## devtools 2.4.5 2022-10-11 [1] CRAN (R 4.2.1)
## digest 0.6.31 2022-12-11 [1] CRAN (R 4.2.2)
## dplyr * 1.1.2 2023-04-20 [1] CRAN (R 4.2.3)
## ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.2.1)
## emmeans * 1.8.5 2023-03-08 [1] CRAN (R 4.2.2)
## estimability 1.4.1 2022-08-05 [1] CRAN (R 4.2.1)
## evaluate 0.21 2023-05-05 [1] CRAN (R 4.2.3)
## fansi 1.0.4 2023-01-22 [1] CRAN (R 4.2.2)
## fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.2.2)
## forcats * 1.0.0 2023-01-29 [1] CRAN (R 4.2.2)
## fs 1.6.2 2023-04-25 [1] CRAN (R 4.2.3)
## generics 0.1.3 2022-07-05 [1] CRAN (R 4.2.1)
## ggplot2 * 3.4.2 2023-04-03 [1] CRAN (R 4.2.3)
## glue 1.6.2 2022-02-24 [1] CRAN (R 4.2.1)
## gtable 0.3.3 2023-03-21 [1] CRAN (R 4.2.3)
## here * 1.0.1 2020-12-13 [1] CRAN (R 4.2.1)
## hms 1.1.3 2023-03-21 [1] CRAN (R 4.2.3)
## htmltools 0.5.5 2023-03-23 [1] CRAN (R 4.2.3)
## htmlwidgets 1.6.2 2023-03-17 [1] CRAN (R 4.2.3)
## httpuv 1.6.9 2023-02-14 [1] CRAN (R 4.2.2)
## jquerylib 0.1.4 2021-04-26 [1] CRAN (R 4.2.1)
## jsonlite 1.8.4 2022-12-06 [1] CRAN (R 4.2.2)
## knitr * 1.42 2023-01-25 [1] CRAN (R 4.2.2)
## later 1.3.0 2021-08-18 [1] CRAN (R 4.2.1)
## lattice 0.21-8 2023-04-05 [1] CRAN (R 4.2.3)
## lifecycle 1.0.3 2022-10-07 [1] CRAN (R 4.2.1)
## lubridate * 1.9.2 2023-02-10 [1] CRAN (R 4.2.2)
## magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.2.1)
## MASS 7.3-58.2 2023-01-23 [2] CRAN (R 4.2.3)
## Matrix 1.5-4 2023-04-04 [1] CRAN (R 4.2.3)
## memoise 2.0.1 2021-11-26 [1] CRAN (R 4.2.1)
## mime 0.12 2021-09-28 [1] CRAN (R 4.2.0)
## miniUI 0.1.1.1 2018-05-18 [1] CRAN (R 4.2.1)
## multcomp 1.4-23 2023-03-09 [1] CRAN (R 4.2.2)
## munsell 0.5.0 2018-06-12 [1] CRAN (R 4.2.1)
## mvtnorm 1.1-3 2021-10-08 [1] CRAN (R 4.2.0)
## patchwork * 1.1.2 2022-08-19 [1] CRAN (R 4.2.1)
## pillar 1.9.0 2023-03-22 [1] CRAN (R 4.2.3)
## pkgbuild 1.4.0 2022-11-27 [1] CRAN (R 4.2.2)
## pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.2.1)
## pkgload 1.3.2 2022-11-16 [1] CRAN (R 4.2.2)
## prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.2.1)
## processx 3.8.1 2023-04-18 [1] CRAN (R 4.2.3)
## profvis 0.3.7 2020-11-02 [1] CRAN (R 4.2.1)
## promises 1.2.0.1 2021-02-11 [1] CRAN (R 4.2.1)
## ps 1.7.5 2023-04-18 [1] CRAN (R 4.2.3)
## purrr * 1.0.1 2023-01-10 [1] CRAN (R 4.2.2)
## R6 2.5.1 2021-08-19 [1] CRAN (R 4.2.1)
## Rcpp 1.0.10 2023-01-22 [1] CRAN (R 4.2.2)
## readr * 2.1.4 2023-02-10 [1] CRAN (R 4.2.2)
## remotes 2.4.2 2021-11-30 [1] CRAN (R 4.2.1)
## rlang * 1.1.1 2023-04-28 [1] CRAN (R 4.2.3)
## rmarkdown 2.21 2023-03-26 [1] CRAN (R 4.2.3)
## rprojroot 2.0.3 2022-04-02 [1] CRAN (R 4.2.1)
## rstudioapi 0.14 2022-08-22 [1] CRAN (R 4.2.1)
## sandwich 3.0-2 2022-06-15 [1] CRAN (R 4.2.1)
## sass 0.4.6 2023-05-03 [1] CRAN (R 4.2.3)
## scales 1.2.1 2022-08-20 [1] CRAN (R 4.2.1)
## sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.2.1)
## shiny 1.7.4 2022-12-15 [1] CRAN (R 4.2.2)
## stringi 1.7.12 2023-01-11 [1] CRAN (R 4.2.2)
## stringr * 1.5.0 2022-12-02 [1] CRAN (R 4.2.2)
## survival 3.5-5 2023-03-12 [1] CRAN (R 4.2.3)
## TH.data 1.1-2 2023-04-17 [1] CRAN (R 4.2.3)
## tibble * 3.2.1 2023-03-20 [1] CRAN (R 4.2.3)
## tidyr * 1.3.0 2023-01-24 [1] CRAN (R 4.2.2)
## tidyselect 1.2.0 2022-10-10 [1] CRAN (R 4.2.1)
## tidyverse * 2.0.0 2023-02-22 [1] CRAN (R 4.2.2)
## timechange 0.2.0 2023-01-11 [1] CRAN (R 4.2.2)
## tzdb 0.4.0 2023-05-12 [1] CRAN (R 4.2.3)
## urlchecker 1.0.1 2021-11-30 [1] CRAN (R 4.2.1)
## usethis 2.1.6 2022-05-25 [1] CRAN (R 4.2.1)
## utf8 1.2.3 2023-01-31 [1] CRAN (R 4.2.2)
## vctrs 0.6.2 2023-04-19 [1] CRAN (R 4.2.3)
## visreg * 2.7.0 2020-06-04 [1] CRAN (R 4.2.1)
## withr 2.5.0 2022-03-03 [1] CRAN (R 4.2.1)
## xfun 0.39 2023-04-20 [1] CRAN (R 4.2.3)
## xtable 1.8-4 2019-04-21 [1] CRAN (R 4.2.1)
## yaml 2.3.7 2023-01-23 [1] CRAN (R 4.2.2)
## zoo 1.8-12 2023-04-13 [1] CRAN (R 4.2.3)
##
## [1] C:/R/win-library/4.2
## [2] C:/Program Files/R/R-4.2.3/library
##
## ──────────────────────────────────────────────────────────────────────────────
Create function to plot model diagnostics:
# Create model diagnostic plots to check assumptions
plot_model_diag <- function(df_data, param_var, unit_label, model, trans = c("none", "log", "sqrt"), ...) {
trans <- match.arg(trans)
df_data <- df_data %>%
mutate(
Residuals = resid(model),
Fitted = predict(model)
)
param_name <- as_name(ensym(param_var))
if (trans == "log") {
unit_label <- paste0("log[", unit_label, "]")
} else if (trans == "sqrt") {
unit_label <- paste0("sqrt[", unit_label, "]")
}
plt_hist <- ggplot(df_data, aes(x = Residuals)) +
geom_histogram(...) +
labs(
title = "Residual Histogram",
x = "Residuals"
) +
theme_bw()
plt_qq <- ggplot(df_data, aes(sample = Residuals)) +
labs(
title = "Residual Probability Plot",
x = "Normal Quantiles",
y = "Residuals"
) +
stat_qq() +
stat_qq_line(linewidth = 1, color = 'red') +
theme_bw()
plt_res_fit <- ggplot(df_data, aes(x = Fitted, y = Residuals)) +
geom_point() +
labs(
title = "Residuals vs. Fitted Values",
x = (paste0("Predicted ", param_name, " (", unit_label, ")")),
y = paste0("Residuals (", unit_label, ")")
) +
theme_bw()
plt_obs_fit <- ggplot(df_data, aes(x = Fitted, y = {{ param_var}})) +
geom_point() +
geom_abline(slope = 1, intercept = 0, color = "red") +
labs(
title = "Observed vs. Fitted Values",
x = paste0("Predicted ", param_name, " (", unit_label, ")"),
y = paste0("Observed ", param_name, " (", unit_label, ")")
) +
theme_bw()
(plt_hist + plt_qq) / (plt_res_fit + plt_obs_fit)
}
# Define root directory for pesticide data:
fp_pest <- here("manuscript_contam/data/processed")
# Total Pesticide Concentration data:
df_conc_all <- read_rds(file.path(fp_pest, "contam_conc_water_zoop_2017-2019.rds"))
# Pesticide application data:
df_appl <- read_rds(file.path(fp_pest, "pesticide_use_daily_tot_2017-2020.rds"))
# Daily average flow data:
df_davg_flow <- read_rds(here("Water_Quality/Data_Processed/LIS_SR_DailyAvgFlow_2013-2022.rds"))
Prepare the pesticide application data to be joined to the concentration data by calculating monthly totals and standardizing to area of the watershed.
df_appl_c <- df_appl %>%
# Calculate monthly totals by region
summarize(TotalAppl = sum(TotalApplication), .by = c(Region, Year, Month)) %>%
mutate(
# Standardize total application to area (in square miles) of the watershed
TotalApplStd = if_else(
Region == "Sacramento River",
TotalAppl/22526.79948,
TotalAppl/4217.503529
),
# Define station that each region is assigned to
Station = case_match(
Region,
"Toe Drain" ~ "STTD",
"Sacramento River" ~ "SHR"
)
) %>%
# Lag standardized total application data by one month
arrange(Region, Year, Month) %>%
group_by(Region) %>%
mutate(TotalApplStd_lag1 = lag(TotalApplStd, n = 1)) %>%
ungroup() %>%
# Clean up dataframe
select(Station, Year, Month, TotalApplStd, TotalApplStd_lag1)
Prepare the flow data to be joined to the concentration data using SR at Freeport flow data for SHR and LIS flow data for STTD. Use Daily Average Net flows for both.
df_davg_flow_c <- df_davg_flow %>%
select(-LIS_DailyAvgInstFlow) %>%
pivot_longer(
cols = ends_with("NetFlow"),
names_to = "Station",
values_to = "DailyAvgNetFlow"
) %>%
mutate(
Station = case_match(
Station,
"SR_DailyAvgNetFlow" ~ "SHR",
"LIS_DailyAvgNetFlow" ~ "STTD"
)
)
Join pesticide concentration, application, and flow data together.
df_conc_all_c1 <- df_conc_all %>%
mutate(
Year = year(Date),
Month = month(Date, label = TRUE)
) %>%
left_join(df_appl_c, by = join_by(Station, Year, Month)) %>%
left_join(df_davg_flow_c, by = join_by(Date, Station))
# Look at all data
df_conc_all_c1 %>%
select(
Station,
Date,
Year,
starts_with("TotalConc"),
DailyAvgNetFlow,
starts_with("TotalAppl")
) %>%
kable()
Station | Date | Year | TotalConc_Zoop | TotalConc_Water | DailyAvgNetFlow | TotalApplStd | TotalApplStd_lag1 |
---|---|---|---|---|---|---|---|
SHR | 2017-08-29 | 2017 | 227.19872 | 389.42531 | 21200.00 | 3.379151 | 18.641841 |
SHR | 2017-09-12 | 2017 | 174.19088 | 396.64632 | 22100.00 | 1.440834 | 3.379151 |
SHR | 2017-09-19 | 2017 | 137.99606 | 147.01557 | 21500.00 | 1.440834 | 3.379151 |
SHR | 2017-10-03 | 2017 | 316.69620 | 80.32957 | 19400.00 | 2.573115 | 1.440834 |
SHR | 2017-10-17 | 2017 | 427.37944 | 47.92240 | 11700.00 | 2.573115 | 1.440834 |
SHR | 2017-11-01 | 2017 | 265.57107 | 97.29594 | 10400.00 | 2.758317 | 2.573115 |
SHR | 2018-07-12 | 2018 | 218.84885 | 188.76610 | 15700.00 | 17.692933 | 37.851786 |
SHR | 2018-07-25 | 2018 | 164.99273 | 262.27653 | 16500.00 | 17.692933 | 37.851786 |
SHR | 2018-08-09 | 2018 | 413.08422 | 668.97693 | 20200.00 | 3.639683 | 17.692933 |
SHR | 2018-08-23 | 2018 | 302.75363 | 564.30313 | 17500.00 | 3.639683 | 17.692933 |
SHR | 2018-09-06 | 2018 | 324.04838 | 317.93755 | 18000.00 | 1.435832 | 3.639683 |
SHR | 2018-09-20 | 2018 | 270.17804 | 151.46644 | 16500.00 | 1.435832 | 3.639683 |
SHR | 2018-10-11 | 2018 | 651.37113 | 98.85206 | 11600.00 | 2.469192 | 1.435832 |
SHR | 2018-10-26 | 2018 | 0.00000 | 75.62678 | 9060.00 | 2.469192 | 1.435832 |
SHR | 2018-11-09 | 2018 | 359.01012 | 198.17201 | 7850.00 | 2.253170 | 2.469192 |
SHR | 2019-05-21 | 2019 | 127.90884 | 445.48078 | 43200.00 | 15.849626 | 5.706071 |
SHR | 2019-06-04 | 2019 | 103.08011 | 215.69136 | 44500.00 | 38.770625 | 15.849626 |
SHR | 2019-06-18 | 2019 | 419.33946 | 505.85984 | 23500.00 | 38.770625 | 15.849626 |
SHR | 2019-07-02 | 2019 | 129.36941 | 507.56875 | 19300.00 | 19.143787 | 38.770625 |
SHR | 2019-07-16 | 2019 | 163.66643 | 374.25063 | 17300.00 | 19.143787 | 38.770625 |
SHR | 2019-08-05 | 2019 | 322.10233 | 337.91232 | 19700.00 | 3.012142 | 19.143787 |
SHR | 2019-08-19 | 2019 | 386.21467 | 554.08375 | 21000.00 | 3.012142 | 19.143787 |
SHR | 2019-09-05 | 2019 | 219.43424 | 423.41095 | 22500.00 | 1.578017 | 3.012142 |
SHR | 2019-09-16 | 2019 | 246.22101 | 331.93137 | 21400.00 | 1.578017 | 3.012142 |
SHR | 2019-09-30 | 2019 | 418.09428 | 148.04025 | 19000.00 | 1.578017 | 3.012142 |
SHR | 2019-10-14 | 2019 | 445.06218 | 70.76964 | 12600.00 | 1.125018 | 1.578017 |
SHR | 2019-10-28 | 2019 | 690.50734 | 90.12275 | 11500.00 | 1.125018 | 1.578017 |
SHR | 2019-11-13 | 2019 | 378.31172 | 115.52704 | 10100.00 | 2.724691 | 1.125018 |
SHR | 2019-11-25 | 2019 | 405.88485 | 186.34375 | 11000.00 | 2.724691 | 1.125018 |
SHR | 2019-12-10 | 2019 | 323.05106 | 558.23809 | 23700.00 | 1.373379 | 2.724691 |
SHR | 2019-12-27 | 2019 | 326.21796 | 253.96279 | 19500.00 | 1.373379 | 2.724691 |
STTD | 2017-08-30 | 2017 | 75.15536 | 7.64 | 21.114730 | 69.159300 | |
STTD | 2017-09-13 | 2017 | 26.92852 | 108.00 | 5.969721 | 21.114730 | |
STTD | 2017-09-20 | 2017 | 73.05093 | -75.10 | 5.969721 | 21.114730 | |
STTD | 2017-10-04 | 2017 | 269.00064 | -82.80 | 6.902682 | 5.969721 | |
STTD | 2017-10-19 | 2017 | 62.22037 | -47.80 | 6.902682 | 5.969721 | |
STTD | 2017-11-02 | 2017 | 285.62279 | -29.70 | 12.922356 | 6.902682 | |
STTD | 2018-07-11 | 2018 | 165.58977 | -70.50 | 66.033924 | 162.231117 | |
STTD | 2018-07-26 | 2018 | 100.06627 | -108.00 | 66.033924 | 162.231117 | |
STTD | 2018-08-08 | 2018 | 64.42045 | -70.30 | 21.975841 | 66.033924 | |
STTD | 2018-08-22 | 2018 | 212.96740 | -73.70 | 21.975841 | 66.033924 | |
STTD | 2018-08-30 | 2018 | 686.83756 | 305.00 | 21.975841 | 66.033924 | |
STTD | 2018-09-05 | 2018 | 1170.56494 | 337.00 | 4.200371 | 21.975841 | |
STTD | 2018-09-19 | 2018 | 797.15646 | 387.00 | 4.200371 | 21.975841 | |
STTD | 2018-10-10 | 2018 | 83.18350 | -93.40 | 9.863104 | 4.200371 | |
STTD | 2018-10-25 | 2018 | 99.33193 | -12.00 | 9.863104 | 4.200371 | |
STTD | 2018-11-08 | 2018 | 213.58742 | -22.40 | 13.370382 | 9.863104 | |
STTD | 2019-05-20 | 2019 | 919.34468 | 362.10000 | 1270.00 | 85.276308 | 39.939688 |
STTD | 2019-06-03 | 2019 | 903.77514 | 1467.63536 | 825.00 | 152.302865 | 85.276308 |
STTD | 2019-06-17 | 2019 | 332.82856 | 899.79820 | 63.30 | 152.302865 | 85.276308 |
STTD | 2019-07-01 | 2019 | 230.40000 | 607.28115 | 67.332801 | 152.302865 | |
STTD | 2019-07-15 | 2019 | 125.70000 | 532.99378 | 67.332801 | 152.302865 | |
STTD | 2019-08-07 | 2019 | 0.00000 | 347.24398 | -43.50 | 28.719601 | 67.332801 |
STTD | 2019-08-21 | 2019 | 89.00000 | 457.27127 | -9.89 | 28.719601 | 67.332801 |
STTD | 2019-09-04 | 2019 | 641.20000 | 2289.24197 | 598.00 | 7.454874 | 28.719601 |
STTD | 2019-09-18 | 2019 | 548.00000 | 2093.83725 | 635.00 | 7.454874 | 28.719601 |
STTD | 2019-10-02 | 2019 | 333.90000 | 766.80342 | -19.60 | 4.366761 | 7.454874 |
STTD | 2019-10-16 | 2019 | 181.70000 | 347.25295 | -88.70 | 4.366761 | 7.454874 |
STTD | 2019-10-28 | 2019 | 315.00000 | 233.26624 | -56.50 | 4.366761 | 7.454874 |
STTD | 2019-11-14 | 2019 | 406.10000 | 164.59963 | -113.00 | 14.037269 | 4.366761 |
STTD | 2019-11-26 | 2019 | 413.70000 | 193.53662 | 6.97 | 14.037269 | 4.366761 |
STTD | 2019-12-09 | 2019 | 994.50000 | 1714.24731 | 300.00 | 5.704477 | 14.037269 |
STTD | 2019-12-26 | 2019 | 373.20000 | 577.11950 | 220.00 | 5.704477 | 14.037269 |
It looks like all of the flow data was available for SHR, but STTD is missing a few flow values in July 2019. We’ll restrict the date ranges to July 1 - Oct 31 for each year since some of the data extends outside of the summer-fall period.
df_conc_all_c2 <- df_conc_all_c1 %>%
mutate(Month = as.numeric(Month)) %>%
filter(Month %in% 7:10)
# Look at filtered data
df_conc_all_c2 %>%
select(
Station,
Date,
Year,
starts_with("TotalConc"),
DailyAvgNetFlow,
starts_with("TotalAppl")
) %>%
kable()
Station | Date | Year | TotalConc_Zoop | TotalConc_Water | DailyAvgNetFlow | TotalApplStd | TotalApplStd_lag1 |
---|---|---|---|---|---|---|---|
SHR | 2017-08-29 | 2017 | 227.19872 | 389.42531 | 21200.00 | 3.379151 | 18.641841 |
SHR | 2017-09-12 | 2017 | 174.19088 | 396.64632 | 22100.00 | 1.440834 | 3.379151 |
SHR | 2017-09-19 | 2017 | 137.99606 | 147.01557 | 21500.00 | 1.440834 | 3.379151 |
SHR | 2017-10-03 | 2017 | 316.69620 | 80.32957 | 19400.00 | 2.573115 | 1.440834 |
SHR | 2017-10-17 | 2017 | 427.37944 | 47.92240 | 11700.00 | 2.573115 | 1.440834 |
SHR | 2018-07-12 | 2018 | 218.84885 | 188.76610 | 15700.00 | 17.692933 | 37.851786 |
SHR | 2018-07-25 | 2018 | 164.99273 | 262.27653 | 16500.00 | 17.692933 | 37.851786 |
SHR | 2018-08-09 | 2018 | 413.08422 | 668.97693 | 20200.00 | 3.639683 | 17.692933 |
SHR | 2018-08-23 | 2018 | 302.75363 | 564.30313 | 17500.00 | 3.639683 | 17.692933 |
SHR | 2018-09-06 | 2018 | 324.04838 | 317.93755 | 18000.00 | 1.435832 | 3.639683 |
SHR | 2018-09-20 | 2018 | 270.17804 | 151.46644 | 16500.00 | 1.435832 | 3.639683 |
SHR | 2018-10-11 | 2018 | 651.37113 | 98.85206 | 11600.00 | 2.469192 | 1.435832 |
SHR | 2018-10-26 | 2018 | 0.00000 | 75.62678 | 9060.00 | 2.469192 | 1.435832 |
SHR | 2019-07-02 | 2019 | 129.36941 | 507.56875 | 19300.00 | 19.143787 | 38.770625 |
SHR | 2019-07-16 | 2019 | 163.66643 | 374.25063 | 17300.00 | 19.143787 | 38.770625 |
SHR | 2019-08-05 | 2019 | 322.10233 | 337.91232 | 19700.00 | 3.012142 | 19.143787 |
SHR | 2019-08-19 | 2019 | 386.21467 | 554.08375 | 21000.00 | 3.012142 | 19.143787 |
SHR | 2019-09-05 | 2019 | 219.43424 | 423.41095 | 22500.00 | 1.578017 | 3.012142 |
SHR | 2019-09-16 | 2019 | 246.22101 | 331.93137 | 21400.00 | 1.578017 | 3.012142 |
SHR | 2019-09-30 | 2019 | 418.09428 | 148.04025 | 19000.00 | 1.578017 | 3.012142 |
SHR | 2019-10-14 | 2019 | 445.06218 | 70.76964 | 12600.00 | 1.125018 | 1.578017 |
SHR | 2019-10-28 | 2019 | 690.50734 | 90.12275 | 11500.00 | 1.125018 | 1.578017 |
STTD | 2017-08-30 | 2017 | 75.15536 | 7.64 | 21.114730 | 69.159300 | |
STTD | 2017-09-13 | 2017 | 26.92852 | 108.00 | 5.969721 | 21.114730 | |
STTD | 2017-09-20 | 2017 | 73.05093 | -75.10 | 5.969721 | 21.114730 | |
STTD | 2017-10-04 | 2017 | 269.00064 | -82.80 | 6.902682 | 5.969721 | |
STTD | 2017-10-19 | 2017 | 62.22037 | -47.80 | 6.902682 | 5.969721 | |
STTD | 2018-07-11 | 2018 | 165.58977 | -70.50 | 66.033924 | 162.231117 | |
STTD | 2018-07-26 | 2018 | 100.06627 | -108.00 | 66.033924 | 162.231117 | |
STTD | 2018-08-08 | 2018 | 64.42045 | -70.30 | 21.975841 | 66.033924 | |
STTD | 2018-08-22 | 2018 | 212.96740 | -73.70 | 21.975841 | 66.033924 | |
STTD | 2018-08-30 | 2018 | 686.83756 | 305.00 | 21.975841 | 66.033924 | |
STTD | 2018-09-05 | 2018 | 1170.56494 | 337.00 | 4.200371 | 21.975841 | |
STTD | 2018-09-19 | 2018 | 797.15646 | 387.00 | 4.200371 | 21.975841 | |
STTD | 2018-10-10 | 2018 | 83.18350 | -93.40 | 9.863104 | 4.200371 | |
STTD | 2018-10-25 | 2018 | 99.33193 | -12.00 | 9.863104 | 4.200371 | |
STTD | 2019-07-01 | 2019 | 230.40000 | 607.28115 | 67.332801 | 152.302865 | |
STTD | 2019-07-15 | 2019 | 125.70000 | 532.99378 | 67.332801 | 152.302865 | |
STTD | 2019-08-07 | 2019 | 0.00000 | 347.24398 | -43.50 | 28.719601 | 67.332801 |
STTD | 2019-08-21 | 2019 | 89.00000 | 457.27127 | -9.89 | 28.719601 | 67.332801 |
STTD | 2019-09-04 | 2019 | 641.20000 | 2289.24197 | 598.00 | 7.454874 | 28.719601 |
STTD | 2019-09-18 | 2019 | 548.00000 | 2093.83725 | 635.00 | 7.454874 | 28.719601 |
STTD | 2019-10-02 | 2019 | 333.90000 | 766.80342 | -19.60 | 4.366761 | 7.454874 |
STTD | 2019-10-16 | 2019 | 181.70000 | 347.25295 | -88.70 | 4.366761 | 7.454874 |
STTD | 2019-10-28 | 2019 | 315.00000 | 233.26624 | -56.50 | 4.366761 | 7.454874 |
# Prepare zooplankton concentration data for analysis
df_zoop <- df_conc_all_c2 %>%
select(-TotalConc_Water) %>%
drop_na() %>%
# Remove two samples with total concentrations equal to zero
filter(TotalConc_Zoop > 0) %>%
mutate(
# Add log-transformed and sqrt-transformed zooplankton concentrations
TotalConc_Zoop_log = log(TotalConc_Zoop),
TotalConc_Zoop_sqrt = sqrt(TotalConc_Zoop),
# Convert year to character for categorical models
Year = as.character(Year)
)
Before running the analyses, lets visualize the data.
First, we’ll look at a boxplot comparing total pesticide concentrations in zooplankton between stations.
df_zoop %>%
ggplot(aes(x = Station, y = TotalConc_Zoop)) +
geom_boxplot(outlier.shape = NA) +
geom_jitter(aes(color = Year), width = 0.3) +
scale_color_viridis_d(option = "plasma", end = 0.8) +
ylab("Total pesticide concentration in zooplankton (ng/g)")
SHR has a higher median concentration than STTD, but STTD has some higher values than SHR particularly in 2018. Let’s break apart the boxplot by station and year.
df_zoop %>%
ggplot(aes(x = Year, y = TotalConc_Zoop, fill = Year)) +
geom_boxplot(outlier.shape = NA) +
geom_jitter(width = 0.2) +
facet_wrap(vars(Station)) +
scale_fill_viridis_d(option = "plasma", end = 0.8) +
ylab("Total pesticide concentration in zooplankton (ng/g)")
Overall, STTD in 2018 had the highest variation in concentrations. Let’s look at a boxplot grouped only by year.
df_zoop %>%
ggplot(aes(x = Year, y = TotalConc_Zoop)) +
geom_boxplot(outlier.shape = NA) +
geom_jitter(width = 0.3) +
ylab("Total pesticide concentration in zooplankton (ng/g)")
The median of total concentrations was slightly higher in 2019 than other years when both stations are grouped together.
Now, let’s see how total concentrations varied seasonally using day of year for each station and year.
df_zoop %>%
ggplot(aes(x = yday(Date), y = TotalConc_Zoop, color = Station)) +
geom_point() +
facet_grid(rows = vars(Year)) +
scale_color_viridis_d(option = "plasma", end = 0.8) +
labs(x = "DOY", y = "Total pesticide concentration in zooplankton (ng/g)")
It almost looks like SHR and STTD have opposite seasonal trends.
Now, let’s take a look at how the total pesticide concentrations in zooplankton vary with daily average net flow for each station.
df_zoop %>%
ggplot(aes(x = DailyAvgNetFlow, y = TotalConc_Zoop, color = Station)) +
geom_point() +
geom_smooth(method = "lm", formula = y ~ x) +
labs(
x = "Daily average net flow (cfs)",
y = "Total pesticide concentration in zooplankton (ng/g)"
) +
theme_bw()
df_zoop %>%
ggplot(aes(x = DailyAvgNetFlow, y = TotalConc_Zoop, color = Year)) +
geom_point() +
scale_color_viridis_d(option = "plasma", end = 0.8) +
facet_wrap(vars(Station), scales = "free") +
labs(
x = "Daily average net flow (cfs)",
y = "Total pesticide concentration in zooplankton (ng/g)"
) +
theme_bw()
SHR seems to have a negative relationship between concentration and flow, while concentrations at STTD seem to have a positive relationship with flow.
Next, let’s look at the total pesticide concentrations in zooplankton as a function of standardized monthly total application data for each station.
df_zoop %>%
ggplot(aes(x = TotalApplStd, y = TotalConc_Zoop, color = Station)) +
geom_point() +
geom_smooth(method = "lm", formula = y ~ x) +
labs(
x = "Monthly Total Application (lbs/sq mile)",
y = "Total pesticide concentration in zooplankton (ng/g)"
) +
theme_bw()
df_zoop %>%
ggplot(aes(x = TotalApplStd, y = TotalConc_Zoop, color = Year)) +
geom_point() +
scale_color_viridis_d(option = "plasma", end = 0.8) +
facet_wrap(vars(Station), scales = "free") +
labs(
x = "Monthly Total Application (lbs/sq mile)",
y = "Total pesticide concentration in zooplankton (ng/g)"
) +
theme_bw()
There appears to be a weak negative relationship between pesticide concentrations in zooplankton and monthly application amounts at both stations. I’m not sure that monthly application will be a good predictor. Let’s take a look at whether using the prior month’s application amount makes a difference.
df_zoop %>%
ggplot(aes(x = TotalApplStd_lag1, y = TotalConc_Zoop, color = Station)) +
geom_point() +
geom_smooth(method = "lm", formula = y ~ x) +
labs(
x = "Monthly Total Application in prior month (lbs/sq mile)",
y = "Total pesticide concentration in zooplankton (ng/g)"
) +
theme_bw()
df_zoop %>%
ggplot(aes(x = TotalApplStd_lag1, y = TotalConc_Zoop, color = Year)) +
geom_point() +
scale_color_viridis_d(option = "plasma", end = 0.8) +
facet_wrap(vars(Station), scales = "free") +
labs(
x = "Monthly Total Application in prior month (lbs/sq mile)",
y = "Total pesticide concentration in zooplankton (ng/g)"
) +
theme_bw()
Hmm, not really…
Lastly, let’s take a look at histograms of total pesticide concentrations in zooplankton to look at its distribution. We’ll look at each station separately and then all data grouped together.
df_zoop %>%
ggplot(aes(x = TotalConc_Zoop)) +
geom_histogram(bins = 10) +
facet_wrap(vars(Station), scales = "free") +
xlab("Total pesticide concentration in zooplankton (ng/g)")
df_zoop %>%
ggplot(aes(x = TotalConc_Zoop)) +
geom_histogram() +
xlab("Total pesticide concentration in zooplankton (ng/g)")
We’ll probably need to use transformed zooplankton concentration data for the analyses.
We’ll create linear models of the zooplankton concentrations as a function of daily average net flow and monthly pesticide application amounts for each station separately. We’ll start with the original concentration values, but we may need to transform them. These models will allow us to examine:
We’ll start with SHR
df_zoop_shr <- df_zoop %>% filter(Station == "SHR")
m_zoop_q_appl_shr <- df_zoop_shr %>%
lm(TotalConc_Zoop ~ DailyAvgNetFlow + TotalApplStd, data = .)
Plot the diagnostics
df_zoop_shr %>% plot_model_diag(
TotalConc_Zoop,
"ng/g",
m_zoop_q_appl_shr,
bins = 10
)
shapiro.test(m_zoop_q_appl_shr$residuals)
##
## Shapiro-Wilk normality test
##
## data: m_zoop_q_appl_shr$residuals
## W = 0.94631, p-value = 0.2895
The residuals deviate from a normal distribution according to visual inspection. We’ll re-run the model using the natural log transformation.
m_zoop_q_appl_shr_log <- df_zoop_shr %>%
lm(TotalConc_Zoop_log ~ DailyAvgNetFlow + TotalApplStd, data = .)
Plot the diagnostics
df_zoop_shr %>% plot_model_diag(
TotalConc_Zoop_log,
"ng/g",
m_zoop_q_appl_shr_log,
trans = "log",
bins = 10
)
shapiro.test(m_zoop_q_appl_shr_log$residuals)
##
## Shapiro-Wilk normality test
##
## data: m_zoop_q_appl_shr_log$residuals
## W = 0.9739, p-value = 0.8169
The model diagnostics look pretty good with the log-transformed values, so we’ll take a closer look at this model.
summary(m_zoop_q_appl_shr_log)
##
## Call:
## lm(formula = TotalConc_Zoop_log ~ DailyAvgNetFlow + TotalApplStd,
## data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.59245 -0.12635 -0.02903 0.17388 0.49120
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.364e+00 3.196e-01 23.041 8.26e-15 ***
## DailyAvgNetFlow -8.287e-05 1.712e-05 -4.841 0.000131 ***
## TotalApplStd -4.318e-02 9.271e-03 -4.657 0.000196 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2726 on 18 degrees of freedom
## Multiple R-squared: 0.7004, Adjusted R-squared: 0.6671
## F-statistic: 21.04 on 2 and 18 DF, p-value: 1.944e-05
plt_m_zoop_q_shr <- visreg(m_zoop_q_appl_shr_log, xvar = "DailyAvgNetFlow", gg = TRUE) +
theme_bw() +
geom_point(size = 1)
plt_m_zoop_appl_shr <- visreg(m_zoop_q_appl_shr_log, xvar = "TotalApplStd", gg = TRUE) +
theme_bw() +
geom_point(size = 1)
plt_m_zoop_shr <- plt_m_zoop_q_shr + plt_m_zoop_appl_shr
plt_m_zoop_shr
Total pesticide concentrations decrease significantly with flow and monthly pesticide application at the SHR station; however, the relationship with monthly pesticide application is driven by four values with the highest application.
Let’s see if the model is a better fit when we use the prior month’s application amount.
m_zoop_q_appl_pr_shr <- df_zoop_shr %>%
lm(TotalConc_Zoop ~ DailyAvgNetFlow + TotalApplStd_lag1, data = .)
Plot the diagnostics
df_zoop_shr %>% plot_model_diag(
TotalConc_Zoop,
"ng/g",
m_zoop_q_appl_pr_shr,
bins = 10
)
shapiro.test(m_zoop_q_appl_pr_shr$residuals)
##
## Shapiro-Wilk normality test
##
## data: m_zoop_q_appl_pr_shr$residuals
## W = 0.91528, p-value = 0.06986
The residuals deviate from a normal distribution according to visual inspection. We’ll re-run the model using the natural log transformation.
m_zoop_q_appl_pr_shr_log <- df_zoop_shr %>%
lm(TotalConc_Zoop_log ~ DailyAvgNetFlow + TotalApplStd_lag1, data = .)
Plot the diagnostics
df_zoop_shr %>% plot_model_diag(
TotalConc_Zoop_log,
"ng/g",
m_zoop_q_appl_pr_shr_log,
trans = "log",
bins = 10
)
shapiro.test(m_zoop_q_appl_pr_shr_log$residuals)
##
## Shapiro-Wilk normality test
##
## data: m_zoop_q_appl_pr_shr_log$residuals
## W = 0.97348, p-value = 0.8082
The model diagnostics look pretty good with the log-transformed values, so we’ll take a closer look at this model.
summary(m_zoop_q_appl_pr_shr_log)
##
## Call:
## lm(formula = TotalConc_Zoop_log ~ DailyAvgNetFlow + TotalApplStd_lag1,
## data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.62399 -0.22201 -0.02841 0.19220 0.61989
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.136e+00 3.768e-01 18.935 2.47e-13 ***
## DailyAvgNetFlow -7.119e-05 2.066e-05 -3.446 0.00288 **
## TotalApplStd_lag1 -1.588e-02 5.192e-03 -3.058 0.00677 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3284 on 18 degrees of freedom
## Multiple R-squared: 0.5653, Adjusted R-squared: 0.517
## F-statistic: 11.7 on 2 and 18 DF, p-value: 0.0005545
plt_m_zoop_q_pr_shr <- visreg(m_zoop_q_appl_pr_shr_log, xvar = "DailyAvgNetFlow", gg = TRUE) +
theme_bw() +
geom_point(size = 1)
plt_m_zoop_appl_pr_shr <- visreg(m_zoop_q_appl_pr_shr_log, xvar = "TotalApplStd_lag1", gg = TRUE) +
theme_bw() +
geom_point(size = 1)
plt_m_zoop_pr_shr <- plt_m_zoop_q_pr_shr + plt_m_zoop_appl_pr_shr
plt_m_zoop_pr_shr
Again, total pesticide concentrations decrease significantly with flow and monthly pesticide application at the SHR station using the prior month’s application. The relationship with monthly pesticide application isn’t as susceptible to leverage by values with the highest application.
Let’s take a look at AIC and BIC values to see which model is better.
AIC(m_zoop_q_appl_shr_log, m_zoop_q_appl_pr_shr_log)
## df AIC
## m_zoop_q_appl_shr_log 4 9.768586
## m_zoop_q_appl_pr_shr_log 4 17.586844
BIC(m_zoop_q_appl_shr_log, m_zoop_q_appl_pr_shr_log)
## df BIC
## m_zoop_q_appl_shr_log 4 13.94668
## m_zoop_q_appl_pr_shr_log 4 21.76493
Both AIC and BIC prefer the model with the current month’s pesticide application.
Next, we’ll look at STTD
df_zoop_sttd <- df_zoop %>% filter(Station == "STTD")
m_zoop_q_appl_sttd <- df_zoop_sttd %>%
lm(TotalConc_Zoop ~ DailyAvgNetFlow + TotalApplStd, data = .)
Plot the diagnostics
df_zoop_sttd %>% plot_model_diag(
TotalConc_Zoop,
"ng/g",
m_zoop_q_appl_sttd,
bins = 10
)
shapiro.test(m_zoop_q_appl_sttd$residuals)
##
## Shapiro-Wilk normality test
##
## data: m_zoop_q_appl_sttd$residuals
## W = 0.90656, p-value = 0.05484
The residuals deviate from a normal distribution according to visual inspection. We’ll re-run the model using the natural log transformation.
m_zoop_q_appl_sttd_log <- df_zoop_sttd %>%
lm(TotalConc_Zoop_log ~ DailyAvgNetFlow + TotalApplStd, data = .)
Plot the diagnostics
df_zoop_sttd %>% plot_model_diag(
TotalConc_Zoop_log,
"ng/g",
m_zoop_q_appl_sttd_log,
trans = "log",
bins = 10
)
shapiro.test(m_zoop_q_appl_sttd_log$residuals)
##
## Shapiro-Wilk normality test
##
## data: m_zoop_q_appl_sttd_log$residuals
## W = 0.92061, p-value = 0.1018
The natural log transformation may be too strong. We’ll re-run the model using the square root-transformation.
m_zoop_q_appl_sttd_sqrt <- df_zoop_sttd %>%
lm(TotalConc_Zoop_sqrt ~ DailyAvgNetFlow + TotalApplStd, data = .)
Plot the diagnostics
df_zoop_sttd %>% plot_model_diag(
TotalConc_Zoop_sqrt,
"ng/g",
m_zoop_q_appl_sttd_sqrt,
trans = "sqrt",
bins = 10
)
shapiro.test(m_zoop_q_appl_sttd_sqrt$residuals)
##
## Shapiro-Wilk normality test
##
## data: m_zoop_q_appl_sttd_sqrt$residuals
## W = 0.95259, p-value = 0.4081
The model diagnostics look best with this model, so we’ll take a closer look at this one.
summary(m_zoop_q_appl_sttd_sqrt)
##
## Call:
## lm(formula = TotalConc_Zoop_sqrt ~ DailyAvgNetFlow + TotalApplStd,
## data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.1552 -3.4770 -0.8651 4.8538 12.0845
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13.760969 1.920517 7.165 1.58e-06 ***
## DailyAvgNetFlow 0.025096 0.005822 4.311 0.000474 ***
## TotalApplStd -0.021261 0.074230 -0.286 0.778020
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.754 on 17 degrees of freedom
## Multiple R-squared: 0.5549, Adjusted R-squared: 0.5025
## F-statistic: 10.6 on 2 and 17 DF, p-value: 0.001028
plt_m_zoop_q_sttd <- visreg(m_zoop_q_appl_sttd_sqrt, xvar = "DailyAvgNetFlow", gg = TRUE) +
theme_bw() +
geom_point(size = 1)
plt_m_zoop_appl_sttd <- visreg(m_zoop_q_appl_sttd_sqrt, xvar = "TotalApplStd", gg = TRUE) +
theme_bw() +
geom_point(size = 1)
plt_m_zoop_sttd <- plt_m_zoop_q_sttd + plt_m_zoop_appl_sttd
plt_m_zoop_sttd
Total pesticide concentrations increase significantly with flow at the STTD station, but there is no relationship between concentration and monthly pesticide application.
Let’s see if the model is a better fit when we use the prior month’s application amount.
m_zoop_q_appl_pr_sttd <- df_zoop_sttd %>%
lm(TotalConc_Zoop ~ DailyAvgNetFlow + TotalApplStd_lag1, data = .)
Plot the diagnostics
df_zoop_sttd %>% plot_model_diag(
TotalConc_Zoop,
"ng/g",
m_zoop_q_appl_pr_sttd,
bins = 10
)
shapiro.test(m_zoop_q_appl_pr_sttd$residuals)
##
## Shapiro-Wilk normality test
##
## data: m_zoop_q_appl_pr_sttd$residuals
## W = 0.90557, p-value = 0.05251
The residuals deviate from a normal distribution according to visual inspection. We’ll re-run the model using the natural log transformation.
m_zoop_q_appl_pr_sttd_log <- df_zoop_sttd %>%
lm(TotalConc_Zoop_log ~ DailyAvgNetFlow + TotalApplStd_lag1, data = .)
Plot the diagnostics
df_zoop_sttd %>% plot_model_diag(
TotalConc_Zoop_log,
"ng/g",
m_zoop_q_appl_pr_sttd_log,
trans = "log",
bins = 10
)
shapiro.test(m_zoop_q_appl_pr_sttd_log$residuals)
##
## Shapiro-Wilk normality test
##
## data: m_zoop_q_appl_pr_sttd_log$residuals
## W = 0.92017, p-value = 0.09981
Again, the natural log transformation may be too strong. We’ll re-run the model using the square root-transformation.
m_zoop_q_appl_pr_sttd_sqrt <- df_zoop_sttd %>%
lm(TotalConc_Zoop_sqrt ~ DailyAvgNetFlow + TotalApplStd_lag1, data = .)
Plot the diagnostics
df_zoop_sttd %>% plot_model_diag(
TotalConc_Zoop_sqrt,
"ng/g",
m_zoop_q_appl_pr_sttd_sqrt,
trans = "sqrt",
bins = 10
)
shapiro.test(m_zoop_q_appl_pr_sttd_sqrt$residuals)
##
## Shapiro-Wilk normality test
##
## data: m_zoop_q_appl_pr_sttd_sqrt$residuals
## W = 0.9525, p-value = 0.4067
The model diagnostics look best with this model, so we’ll take a closer look at this one.
summary(m_zoop_q_appl_pr_sttd_sqrt)
##
## Call:
## lm(formula = TotalConc_Zoop_sqrt ~ DailyAvgNetFlow + TotalApplStd_lag1,
## data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.078 -3.453 -1.004 4.880 12.143
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13.666197 1.855055 7.367 1.1e-06 ***
## DailyAvgNetFlow 0.025367 0.005649 4.490 0.000322 ***
## TotalApplStd_lag1 -0.006553 0.028126 -0.233 0.818554
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.759 on 17 degrees of freedom
## Multiple R-squared: 0.5542, Adjusted R-squared: 0.5017
## F-statistic: 10.57 on 2 and 17 DF, p-value: 0.001042
plt_m_zoop_q_pr_sttd <- visreg(m_zoop_q_appl_pr_sttd_sqrt, xvar = "DailyAvgNetFlow", gg = TRUE) +
theme_bw() +
geom_point(size = 1)
plt_m_zoop_appl_pr_sttd <- visreg(m_zoop_q_appl_pr_sttd_sqrt, xvar = "TotalApplStd_lag1", gg = TRUE) +
theme_bw() +
geom_point(size = 1)
plt_m_zoop_pr_sttd <- plt_m_zoop_q_pr_sttd + plt_m_zoop_appl_pr_sttd
plt_m_zoop_pr_sttd
The results are the same with this model.
Let’s take a look at AIC and BIC values to see which model is better.
AIC(m_zoop_q_appl_sttd_sqrt, m_zoop_q_appl_pr_sttd_sqrt)
## df AIC
## m_zoop_q_appl_sttd_sqrt 4 131.5035
## m_zoop_q_appl_pr_sttd_sqrt 4 131.5360
BIC(m_zoop_q_appl_sttd_sqrt, m_zoop_q_appl_pr_sttd_sqrt)
## df BIC
## m_zoop_q_appl_sttd_sqrt 4 135.4865
## m_zoop_q_appl_pr_sttd_sqrt 4 135.5190
The AIC and BIC values are nearly identical between models with both metrics giving a slight edge to the model with the current month’s pesticide application.
Next, we’ll create a linear model of the zooplankton concentrations as a function of the interaction between station and year. We’ll start with the original concentration values, but we may need to transform them. This model will allow us to examine:
m_zoop_sta_yr <- df_zoop %>%
lm(TotalConc_Zoop ~ Station * Year, data = .)
Plot the diagnostics
df_zoop %>% plot_model_diag(
TotalConc_Zoop,
"ng/g",
m_zoop_sta_yr
)
shapiro.test(m_zoop_sta_yr$residuals)
##
## Shapiro-Wilk normality test
##
## data: m_zoop_sta_yr$residuals
## W = 0.90768, p-value = 0.002817
The residuals deviate from a normal distribution according to the Shapiro-Wilk normality test and visual inspection. Also, the variance among groups isn’t consistent. We’ll re-run the model using the natural log of the concentration values.
m_zoop_sta_yr_log <- df_zoop %>%
lm(TotalConc_Zoop_log ~ Station * Year, data = .)
Plot the diagnostics
df_zoop %>% plot_model_diag(
TotalConc_Zoop_log,
"ng/g",
m_zoop_sta_yr_log,
trans = "log"
)
shapiro.test(m_zoop_sta_yr_log$residuals)
##
## Shapiro-Wilk normality test
##
## data: m_zoop_sta_yr_log$residuals
## W = 0.97934, p-value = 0.6506
This is much better than the model with original values. Let’s take a look at the Anova table using type 3 sum of squares for the model using natural log transformation.
Anova(m_zoop_sta_yr_log, type = 3, contrasts = list(topic = contr.sum, sys = contr.sum))
## Anova Table (Type III tests)
##
## Response: TotalConc_Zoop_log
## Sum Sq Df F value Pr(>F)
## (Intercept) 149.373 1 272.8261 < 2e-16 ***
## Station 3.248 1 5.9332 0.02009 *
## Year 0.230 2 0.2101 0.81154
## Station:Year 1.887 2 1.7230 0.19334
## Residuals 19.163 35
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Well, the interaction term is not significant which means we can take a look at the main effects. Before we do this, we’ll remove the interaction from the model since it’s not necessary. We’ll stick with the model using natural log transformation.
m_zoop_sta_yr_log2 <- df_zoop %>%
lm(TotalConc_Zoop_log ~ Station + Year, data = .)
df_zoop %>% plot_model_diag(
TotalConc_Zoop_log,
"ng/g",
m_zoop_sta_yr_log2,
trans = "log"
)
shapiro.test(m_zoop_sta_yr_log2$residuals)
##
## Shapiro-Wilk normality test
##
## data: m_zoop_sta_yr_log2$residuals
## W = 0.98495, p-value = 0.8542
The diagnostics still look good with this model. Let’s take a look at the Anova table using type 2 sum of squares because of the unbalanced design.
Anova(m_zoop_sta_yr_log2, type = 2)
## Anova Table (Type II tests)
##
## Response: TotalConc_Zoop_log
## Sum Sq Df F value Pr(>F)
## Station 1.8028 1 3.1690 0.08326 .
## Year 3.8276 2 3.3641 0.04546 *
## Residuals 21.0492 37
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The main effect of station is not significant (p = 0.083) meaning that total pesticide concentrations in zooplankton don’t differ between SHR and STTD. However, the main effect of Year is significant, so we can take a look at the pairwise contrasts to see how concentrations differ among the years. We’ll also take a look at the pairwise contrasts for station since it is close to significant.
# Contrasts for Year main effect
em_zoop_yr <- emmeans(m_zoop_sta_yr_log2, specs = "Year")
pairs(em_zoop_yr)
## contrast estimate SE df t.ratio p.value
## Year2017 - Year2018 -0.6700 0.304 37 -2.201 0.0842
## Year2017 - Year2019 -0.7474 0.309 37 -2.420 0.0525
## Year2018 - Year2019 -0.0774 0.274 37 -0.283 0.9570
##
## Results are averaged over the levels of: Station
## P value adjustment: tukey method for comparing a family of 3 estimates
visreg(m_zoop_sta_yr_log2, xvar = "Year")
# Contrasts for Station main effect
em_zoop_sta <- emmeans(m_zoop_sta_yr_log2, specs = "Station")
pairs(em_zoop_sta)
## contrast estimate SE df t.ratio p.value
## SHR - STTD 0.424 0.238 37 1.780 0.0833
##
## Results are averaged over the levels of: Year
visreg(m_zoop_sta_yr_log2, xvar = "Station")
Pairwise tests indicate that total pesticide concentration in zooplankton in 2019 is marginally significantly higher than 2017, but the p-value is 0.053. Also, the p-value isn’t significant, but SHR is slightly higher than STTD.
# Prepare water concentration data for analysis
df_water <- df_conc_all_c2 %>%
select(-TotalConc_Zoop) %>%
# only include 2019 since that was the only year water samples were collected
# at STTD
filter(Year == 2019) %>%
drop_na() %>%
# Add log-transformed and sqrt-transformed water concentrations
mutate(
TotalConc_Water_log = log(TotalConc_Water),
TotalConc_Water_sqrt = sqrt(TotalConc_Water)
)
Before running the analyses, lets visualize the data.
First, let’s look at a boxplot comparing total pesticide concentrations in water between stations.
df_water %>%
ggplot(aes(x = Station, y = TotalConc_Water)) +
geom_boxplot(outlier.shape = NA) +
geom_jitter(width = 0.3) +
ylab("Total pesticide concentration in water (ng/L)")
STTD seems to have higher concentrations with much more variation than SHR.
Now, let’s see how total concentrations varied seasonally in 2019 for each station.
df_water %>%
ggplot(aes(x = Date, y = TotalConc_Water, color = Station)) +
geom_point() +
scale_color_viridis_d(option = "plasma", end = 0.8) +
labs(x = "Date", y = "Total pesticide concentration in water (ng/L)")
Now, let’s take a look at how the total pesticide concentrations in water vary with daily average net flow for each station.
df_water %>%
ggplot(aes(x = DailyAvgNetFlow, y = TotalConc_Water, color = Station)) +
geom_point() +
geom_smooth(method = "lm", formula = y ~ x) +
labs(
x = "Daily average net flow (cfs)",
y = "Total pesticide concentration in water (ng/L)"
) +
theme_bw()
df_water %>%
ggplot(aes(x = DailyAvgNetFlow, y = TotalConc_Water)) +
geom_point() +
facet_wrap(vars(Station), scales = "free") +
labs(
x = "Daily average net flow (cfs)",
y = "Total pesticide concentration in water (ng/L)"
) +
theme_bw()
Both SHR and STTD seem to have positive relationships between concentration and flow, but STTD looks weak and strongly affected by values with the highest flows.
Next, let’s look at the total pesticide concentrations in water as a function of standardized monthly total application data for each station.
df_water %>%
ggplot(aes(x = TotalApplStd, y = TotalConc_Water, color = Station)) +
geom_point() +
geom_smooth(method = "lm", formula = y ~ x) +
labs(
x = "Monthly Total Application (lbs/sq mile)",
y = "Total pesticide concentration in water (ng/L)"
) +
theme_bw()
df_water %>%
ggplot(aes(x = TotalApplStd, y = TotalConc_Water)) +
geom_point() +
scale_color_viridis_d(option = "plasma", end = 0.8) +
facet_wrap(vars(Station), scales = "free") +
labs(
x = "Monthly Total Application (lbs/sq mile)",
y = "Total pesticide concentration in water (ng/L)"
) +
theme_bw()
There appears to be a weak negative relationship between pesticide concentrations in water and monthly application amounts at STTD and a slightly stronger positive relationship at SHR. I’m not sure that monthly application will be a good predictor. Let’s take a look at whether using the prior month’s application amount makes a difference.
df_water %>%
ggplot(aes(x = TotalApplStd_lag1, y = TotalConc_Water, color = Station)) +
geom_point() +
geom_smooth(method = "lm", formula = y ~ x) +
labs(
x = "Monthly Total Application in prior month (lbs/sq mile)",
y = "Total pesticide concentration in water (ng/L)"
) +
theme_bw()
df_water %>%
ggplot(aes(x = TotalApplStd_lag1, y = TotalConc_Water)) +
geom_point() +
scale_color_viridis_d(option = "plasma", end = 0.8) +
facet_wrap(vars(Station), scales = "free") +
labs(
x = "Monthly Total Application in prior month (lbs/sq mile)",
y = "Total pesticide concentration in water (ng/L)"
) +
theme_bw()
Hmm, not really. The relationship may be slightly better at SHR using the prior month’s application amounts.
Lastly, let’s take a look at histograms of total pesticide concentrations in water to look at its distribution. We’ll look at each station separately and then all data grouped together.
df_water %>%
ggplot(aes(x = TotalConc_Water)) +
geom_histogram(bins = 10) +
facet_wrap(vars(Station), scales = "free") +
xlab("Total pesticide concentration in water (ng/L)")
df_water %>%
ggplot(aes(x = TotalConc_Water)) +
geom_histogram() +
xlab("Total pesticide concentration in water (ng/L)")
We’ll probably need to use transformed water concentration data for the analyses.
We’ll create linear models of the water concentrations as a function of daily average net flow and monthly pesticide application amounts for each station separately. We’ll start with the original concentration values, but we may need to transform them. These models will allow us to examine:
We’ll start with SHR
df_water_shr <- df_water %>% filter(Station == "SHR")
m_water_q_appl_shr <- df_water_shr %>%
lm(TotalConc_Water ~ DailyAvgNetFlow + TotalApplStd, data = .)
Plot the diagnostics
df_water_shr %>% plot_model_diag(
TotalConc_Water,
"ng/L",
m_water_q_appl_shr,
bins = 8
)
shapiro.test(m_water_q_appl_shr$residuals)
##
## Shapiro-Wilk normality test
##
## data: m_water_q_appl_shr$residuals
## W = 0.93721, p-value = 0.5529
The diagnostics look pretty good with this model, but let’s see if they look better with log transformed values.
m_water_q_appl_shr_log <- df_water_shr %>%
lm(TotalConc_Water_log ~ DailyAvgNetFlow + TotalApplStd, data = .)
Plot the diagnostics
df_water_shr %>% plot_model_diag(
TotalConc_Water_log,
"ng/L",
m_water_q_appl_shr_log,
trans = "log",
bins = 8
)
shapiro.test(m_water_q_appl_shr_log$residuals)
##
## Shapiro-Wilk normality test
##
## data: m_water_q_appl_shr_log$residuals
## W = 0.96314, p-value = 0.8305
The diagnostics look better with this model, so we’ll take a closer look at it.
summary(m_water_q_appl_shr_log)
##
## Call:
## lm(formula = TotalConc_Water_log ~ DailyAvgNetFlow + TotalApplStd,
## data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.50111 -0.08524 -0.02129 0.15153 0.43396
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.334e+00 5.277e-01 4.423 0.00446 **
## DailyAvgNetFlow 1.632e-04 2.827e-05 5.772 0.00118 **
## TotalApplStd 4.076e-02 1.416e-02 2.878 0.02815 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3062 on 6 degrees of freedom
## Multiple R-squared: 0.879, Adjusted R-squared: 0.8387
## F-statistic: 21.8 on 2 and 6 DF, p-value: 0.00177
plt_m_water_q_shr <- visreg(m_water_q_appl_shr_log, xvar = "DailyAvgNetFlow", gg = TRUE) +
theme_bw() +
geom_point(size = 1)
plt_m_water_appl_shr <- visreg(m_water_q_appl_shr_log, xvar = "TotalApplStd", gg = TRUE) +
theme_bw() +
geom_point(size = 1)
plt_m_water_shr <- plt_m_water_q_shr + plt_m_water_appl_shr
plt_m_water_shr
Total pesticide concentrations increase significantly with both flow and monthly pesticide application at the SHR station; however, the relationship with monthly pesticide application is driven by two values with the highest application.
Let’s see if the model is a better fit when we use the prior month’s application amount.
m_water_q_appl_pr_shr <- df_water_shr %>%
lm(TotalConc_Water ~ DailyAvgNetFlow + TotalApplStd_lag1, data = .)
Plot the diagnostics
df_water_shr %>% plot_model_diag(
TotalConc_Water,
"ng/L",
m_water_q_appl_pr_shr,
bins = 8
)
shapiro.test(m_water_q_appl_pr_shr$residuals)
##
## Shapiro-Wilk normality test
##
## data: m_water_q_appl_pr_shr$residuals
## W = 0.98793, p-value = 0.9926
The model diagnostics look really good with the original values, so we’ll take a closer look at this model.
summary(m_water_q_appl_pr_shr)
##
## Call:
## lm(formula = TotalConc_Water ~ DailyAvgNetFlow + TotalApplStd_lag1,
## data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -126.095 -49.937 1.512 41.715 126.284
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.270e+02 1.442e+02 -2.267 0.06395 .
## DailyAvgNetFlow 3.073e-02 7.887e-03 3.897 0.00802 **
## TotalApplStd_lag1 5.716e+00 1.937e+00 2.950 0.02561 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 84.09 on 6 degrees of freedom
## Multiple R-squared: 0.8292, Adjusted R-squared: 0.7723
## F-statistic: 14.57 on 2 and 6 DF, p-value: 0.004982
plt_m_water_q_pr_shr <- visreg(m_water_q_appl_pr_shr, xvar = "DailyAvgNetFlow", gg = TRUE) +
theme_bw() +
geom_point(size = 1)
plt_m_water_appl_pr_shr <- visreg(m_water_q_appl_pr_shr, xvar = "TotalApplStd_lag1", gg = TRUE) +
theme_bw() +
geom_point(size = 1)
plt_m_water_pr_shr <- plt_m_water_q_pr_shr + plt_m_water_appl_pr_shr
plt_m_water_pr_shr
Again, total pesticide concentrations increase significantly with flow and monthly pesticide application at the SHR station using the prior month’s application. The relationship with monthly pesticide application isn’t as susceptible to leverage by values with the highest application.
Let’s take a look at AIC and BIC values to see which model is better.
AIC(m_water_q_appl_shr_log, m_water_q_appl_pr_shr)
## df AIC
## m_water_q_appl_shr_log 4 8.586709
## m_water_q_appl_pr_shr 4 109.665709
BIC(m_water_q_appl_shr_log, m_water_q_appl_pr_shr)
## df BIC
## m_water_q_appl_shr_log 4 9.375607
## m_water_q_appl_pr_shr 4 110.454607
Both AIC and BIC strongly prefer the model with the current month’s pesticide application. However, both models aren’t using the same transformation for the response variable. I wonder if this changes if we use log-transformed values for the model with prior month’s application.
m_water_q_appl_pr_shr_log <- df_water_shr %>%
lm(TotalConc_Water_log ~ DailyAvgNetFlow + TotalApplStd_lag1, data = .)
Plot the diagnostics
df_water_shr %>% plot_model_diag(
TotalConc_Water_log,
"ng/L",
m_water_q_appl_pr_shr_log,
trans = "log",
bins = 8
)
shapiro.test(m_water_q_appl_pr_shr_log$residuals)
##
## Shapiro-Wilk normality test
##
## data: m_water_q_appl_pr_shr_log$residuals
## W = 0.93963, p-value = 0.5781
The model diagnostics aren’t as good with the log-transformed values. We’ll take a closer look at this model to see how it compares.
summary(m_water_q_appl_pr_shr_log)
##
## Call:
## lm(formula = TotalConc_Water_log ~ DailyAvgNetFlow + TotalApplStd_lag1,
## data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.40074 -0.05306 -0.03911 0.12387 0.26372
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.472e+00 4.053e-01 6.100 0.000885 ***
## DailyAvgNetFlow 1.504e-04 2.216e-05 6.786 0.000501 ***
## TotalApplStd_lag1 2.308e-02 5.443e-03 4.241 0.005435 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2362 on 6 degrees of freedom
## Multiple R-squared: 0.928, Adjusted R-squared: 0.904
## F-statistic: 38.65 on 2 and 6 DF, p-value: 0.0003736
plt_m_water_q_pr_shr2 <- visreg(m_water_q_appl_pr_shr_log, xvar = "DailyAvgNetFlow", gg = TRUE) +
theme_bw() +
geom_point(size = 1)
plt_m_water_appl_pr_shr2 <- visreg(m_water_q_appl_pr_shr_log, xvar = "TotalApplStd_lag1", gg = TRUE) +
theme_bw() +
geom_point(size = 1)
plt_m_water_pr_shr2 <- plt_m_water_q_pr_shr2 + plt_m_water_appl_pr_shr2
The results are generally the same. Let’s take a look at AIC and BIC values using both models with log-transformed values.
AIC(m_water_q_appl_shr_log, m_water_q_appl_pr_shr_log)
## df AIC
## m_water_q_appl_shr_log 4 8.586709
## m_water_q_appl_pr_shr_log 4 3.920041
BIC(m_water_q_appl_shr_log, m_water_q_appl_pr_shr_log)
## df BIC
## m_water_q_appl_shr_log 4 9.375607
## m_water_q_appl_pr_shr_log 4 4.708940
Both AIC and BIC now prefer the model with the prior month’s pesticide application, but the difference isn’t as large. Since the diagnostics looked better for the model with the prior month’s pesticide application using the original values, let’s use the prior month model with original values for the model selection procedure. In this case, the best model is the one with the current month’s pesticide application.
Next, we’ll look at STTD
df_water_sttd <- df_water %>% filter(Station == "STTD")
m_water_q_appl_sttd <- df_water_sttd %>%
lm(TotalConc_Water ~ DailyAvgNetFlow + TotalApplStd, data = .)
Plot the diagnostics
df_water_sttd %>% plot_model_diag(
TotalConc_Water,
"ng/L",
m_water_q_appl_sttd,
bins = 8
)
shapiro.test(m_water_q_appl_sttd$residuals)
##
## Shapiro-Wilk normality test
##
## data: m_water_q_appl_sttd$residuals
## W = 0.95545, p-value = 0.7789
The diagnostics look pretty good with the original values, so let’s take a closer look at this model.
summary(m_water_q_appl_sttd)
##
## Call:
## lm(formula = TotalConc_Water ~ DailyAvgNetFlow + TotalApplStd,
## data = .)
##
## Residuals:
## 1 2 3 4 5 6 7
## -10.56 11.26 143.49 -149.01 226.58 -11.64 -210.12
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 613.1212 121.4666 5.048 0.007243 **
## DailyAvgNetFlow 2.6242 0.2426 10.816 0.000415 ***
## TotalApplStd -4.9151 6.9025 -0.712 0.515747
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 186.2 on 4 degrees of freedom
## Multiple R-squared: 0.97, Adjusted R-squared: 0.9549
## F-statistic: 64.59 on 2 and 4 DF, p-value: 0.000902
plt_m_water_q_sttd <- visreg(m_water_q_appl_sttd, xvar = "DailyAvgNetFlow", gg = TRUE) +
theme_bw() +
geom_point(size = 1)
plt_m_water_appl_sttd <- visreg(m_water_q_appl_sttd, xvar = "TotalApplStd", gg = TRUE) +
theme_bw() +
geom_point(size = 1)
plt_m_water_sttd <- plt_m_water_q_sttd + plt_m_water_appl_sttd
plt_m_water_sttd
Total pesticide concentrations increase significantly with flow at the STTD station, but there are two values with flows of about 600 cfs that have a lot leverage on the model. The relationship between total pesticide concentration and monthly pesticide application wasn’t significant at STTD.
Let’s see if the model is a better fit when we use the prior month’s application amount.
m_water_q_appl_pr_sttd <- df_water_sttd %>%
lm(TotalConc_Water ~ DailyAvgNetFlow + TotalApplStd_lag1, data = .)
Plot the diagnostics
df_water_sttd %>% plot_model_diag(
TotalConc_Water,
"ng/L",
m_water_q_appl_pr_sttd,
bins = 8
)
shapiro.test(m_water_q_appl_pr_sttd$residuals)
##
## Shapiro-Wilk normality test
##
## data: m_water_q_appl_pr_sttd$residuals
## W = 0.95422, p-value = 0.7679
The model diagnostics look really good with the original values, so we’ll take a closer look at this model.
summary(m_water_q_appl_pr_sttd)
##
## Call:
## lm(formula = TotalConc_Water ~ DailyAvgNetFlow + TotalApplStd_lag1,
## data = .)
##
## Residuals:
## 1 2 3 4 5 6 7
## -9.275 11.173 144.338 -149.681 224.672 -10.709 -210.517
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 609.5491 116.9936 5.210 0.006471 **
## DailyAvgNetFlow 2.6653 0.2349 11.348 0.000344 ***
## TotalApplStd_lag1 -2.0360 2.8321 -0.719 0.511962
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 186 on 4 degrees of freedom
## Multiple R-squared: 0.97, Adjusted R-squared: 0.955
## F-statistic: 64.74 on 2 and 4 DF, p-value: 0.0008981
plt_m_water_q_pr_sttd <- visreg(m_water_q_appl_pr_sttd, xvar = "DailyAvgNetFlow", gg = TRUE) +
theme_bw() +
geom_point(size = 1)
plt_m_water_appl_pr_sttd <- visreg(m_water_q_appl_pr_sttd, xvar = "TotalApplStd_lag1", gg = TRUE) +
theme_bw() +
geom_point(size = 1)
plt_m_water_pr_sttd <- plt_m_water_q_pr_sttd + plt_m_water_appl_pr_sttd
plt_m_water_pr_sttd
The results are the same with this model.
Let’s take a look at AIC and BIC values to see which model is better.
AIC(m_water_q_appl_sttd, m_water_q_appl_pr_sttd)
## df AIC
## m_water_q_appl_sttd 4 97.12213
## m_water_q_appl_pr_sttd 4 97.10699
BIC(m_water_q_appl_sttd, m_water_q_appl_pr_sttd)
## df BIC
## m_water_q_appl_sttd 4 96.90577
## m_water_q_appl_pr_sttd 4 96.89063
The AIC and BIC values are nearly identical between models with both metrics giving a slight edge to the model with the prior month’s pesticide application. However, monthly pesticide application isn’t significant in either model.
Next, let’s compare total pesticide concentrations in water between the two stations, SHR and STTD. Since samples were only collected at both stations in 2019, we can run a simple t-test to compare them. The histogram of all water concentrations above showed that they are not normally-distributed. Let’s take a look at the natural log transformed values to see if we can use these for the t-test.
df_water %>%
ggplot(aes(x = TotalConc_Water_log)) +
geom_histogram(bins = 8) +
xlab("Total pesticide concentration in water log(ng/L)")
df_water %>%
ggplot(aes(x = Station, y = TotalConc_Water_log)) +
geom_boxplot(outlier.shape = NA) +
geom_jitter(width = 0.3) +
ylab("Total pesticide concentration in water log(ng/L)")
df_water %>%
ggplot(aes(sample = TotalConc_Water_log)) +
labs(
x = "Normal Quantiles",
y = "Total pesticide concentration in water log(ng/L)"
) +
stat_qq() +
stat_qq_line(linewidth = 1, color = 'red') +
theme_bw()
shapiro.test(df_water$TotalConc_Water_log)
##
## Shapiro-Wilk normality test
##
## data: df_water$TotalConc_Water_log
## W = 0.93391, p-value = 0.2809
The natural log transformed values seem to be more normally distributed and have relatively equal variances between SHR and STTD. Let’s run Welch’s two-sample t-test to compare the means of total pesticide concentration in water between SHR and STTD.
df_water %>% t.test(TotalConc_Water_log ~ Station, data = .)
##
## Welch Two Sample t-test
##
## data: TotalConc_Water_log by Station
## t = -2.1709, df = 11.742, p-value = 0.05119
## alternative hypothesis: true difference in means between group SHR and group STTD is not equal to 0
## 95 percent confidence interval:
## -1.858996839 0.005652187
## sample estimates:
## mean in group SHR mean in group STTD
## 5.545113 6.471786
The mean total pesticide concentration in water was marginally significantly higher (p = 0.052) at STTD than SHR during 2019.
Total pesticide concentrations in Zooplankton:
##
## Call:
## lm(formula = TotalConc_Zoop_log ~ DailyAvgNetFlow + TotalApplStd,
## data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.59245 -0.12635 -0.02903 0.17388 0.49120
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.364e+00 3.196e-01 23.041 8.26e-15 ***
## DailyAvgNetFlow -8.287e-05 1.712e-05 -4.841 0.000131 ***
## TotalApplStd -4.318e-02 9.271e-03 -4.657 0.000196 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2726 on 18 degrees of freedom
## Multiple R-squared: 0.7004, Adjusted R-squared: 0.6671
## F-statistic: 21.04 on 2 and 18 DF, p-value: 1.944e-05
##
## Call:
## lm(formula = TotalConc_Zoop_sqrt ~ DailyAvgNetFlow + TotalApplStd,
## data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.1552 -3.4770 -0.8651 4.8538 12.0845
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13.760969 1.920517 7.165 1.58e-06 ***
## DailyAvgNetFlow 0.025096 0.005822 4.311 0.000474 ***
## TotalApplStd -0.021261 0.074230 -0.286 0.778020
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.754 on 17 degrees of freedom
## Multiple R-squared: 0.5549, Adjusted R-squared: 0.5025
## F-statistic: 10.6 on 2 and 17 DF, p-value: 0.001028
## Anova Table (Type II tests)
##
## Response: TotalConc_Zoop_log
## Sum Sq Df F value Pr(>F)
## Station 1.8028 1 3.1690 0.08326 .
## Year 3.8276 2 3.3641 0.04546 *
## Residuals 21.0492 37
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## contrast estimate SE df t.ratio p.value
## SHR - STTD 0.424 0.238 37 1.780 0.0833
##
## Results are averaged over the levels of: Year
## contrast estimate SE df t.ratio p.value
## Year2017 - Year2018 -0.6700 0.304 37 -2.201 0.0842
## Year2017 - Year2019 -0.7474 0.309 37 -2.420 0.0525
## Year2018 - Year2019 -0.0774 0.274 37 -0.283 0.9570
##
## Results are averaged over the levels of: Station
## P value adjustment: tukey method for comparing a family of 3 estimates
Total pesticide concentrations in Water:
##
## Call:
## lm(formula = TotalConc_Water_log ~ DailyAvgNetFlow + TotalApplStd,
## data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.50111 -0.08524 -0.02129 0.15153 0.43396
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.334e+00 5.277e-01 4.423 0.00446 **
## DailyAvgNetFlow 1.632e-04 2.827e-05 5.772 0.00118 **
## TotalApplStd 4.076e-02 1.416e-02 2.878 0.02815 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3062 on 6 degrees of freedom
## Multiple R-squared: 0.879, Adjusted R-squared: 0.8387
## F-statistic: 21.8 on 2 and 6 DF, p-value: 0.00177
##
## Call:
## lm(formula = TotalConc_Water ~ DailyAvgNetFlow + TotalApplStd,
## data = .)
##
## Residuals:
## 1 2 3 4 5 6 7
## -10.56 11.26 143.49 -149.01 226.58 -11.64 -210.12
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 613.1212 121.4666 5.048 0.007243 **
## DailyAvgNetFlow 2.6242 0.2426 10.816 0.000415 ***
## TotalApplStd -4.9151 6.9025 -0.712 0.515747
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 186.2 on 4 degrees of freedom
## Multiple R-squared: 0.97, Adjusted R-squared: 0.9549
## F-statistic: 64.59 on 2 and 4 DF, p-value: 0.000902
##
## Welch Two Sample t-test
##
## data: TotalConc_Water_log by Station
## t = -2.1709, df = 11.742, p-value = 0.05119
## alternative hypothesis: true difference in means between group SHR and group STTD is not equal to 0
## 95 percent confidence interval:
## -1.858996839 0.005652187
## sample estimates:
## mean in group SHR mean in group STTD
## 5.545113 6.471786