Purpose

Explore and analyze the continuous chlorophyll data to be included in the NDFS synthesis manuscript. We will attempt to fit multiple models to predict weekly average chlorophyll fluorescence values. All models will only include representative stations for 4 habitat types - upstream (RD22), lower Yolo Bypass (STTD), Cache Slough complex (LIB), and downstream (RVB). At a minimum, the models will contain the two categorical variables - Year and Station - as predictor variables. In some of the models, we will add weekly average flow as a continuous predictor which replaces the categorical predictor - flow action period - in the original analysis. Additionally, we’ll add a GAM smooth for Week number term to account for seasonality in some of the models. After fitting multiple models, we’ll use a model selection process to determine the best one.

Global code and functions

# Load packages
library(tidyverse)
library(scales)
library(knitr)
library(mgcv)
library(car)
library(gratia)
library(ggeffects)
library(emmeans)
library(multcomp)
library(here)
library(conflicted)

# Source functions
source(here("manuscript_synthesis/src/global_functions.R"))

# Declare package conflict preferences
conflicts_prefer(dplyr::filter(), dplyr::lag(), dplyr::select())

Display current versions of R and packages used for this analysis:

devtools::session_info()
## Warning in system2("quarto", "-V", stdout = TRUE, env = paste0("TMPDIR=", :
## running command '"quarto"
## TMPDIR=C:/Users/dboswort/AppData/Local/Temp/Rtmpkr473d/file67c07d185211 -V' had
## status 1
## ─ Session info ───────────────────────────────────────────────────────────────
##  setting  value
##  version  R version 4.5.2 (2025-10-31 ucrt)
##  os       Windows 11 x64 (build 26100)
##  system   x86_64, mingw32
##  ui       RTerm
##  language (EN)
##  collate  English_United States.utf8
##  ctype    English_United States.utf8
##  tz       America/Los_Angeles
##  date     2026-03-03
##  pandoc   3.6.3 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
##  quarto   NA @ C:\\PROGRA~1\\Quarto\\bin\\quarto.exe
## 
## ─ Packages ───────────────────────────────────────────────────────────────────
##  package      * version  date (UTC) lib source
##  abind          1.4-8    2024-09-12 [1] CRAN (R 4.5.0)
##  bslib          0.10.0   2026-01-26 [1] CRAN (R 4.5.2)
##  cachem         1.1.0    2024-05-16 [1] CRAN (R 4.5.0)
##  car          * 3.1-3    2024-09-27 [1] CRAN (R 4.5.0)
##  carData      * 3.0-5    2022-01-06 [1] CRAN (R 4.5.0)
##  cli            3.6.5    2025-04-23 [1] CRAN (R 4.5.0)
##  coda           0.19-4.1 2024-01-31 [1] CRAN (R 4.5.0)
##  codetools      0.2-20   2024-03-31 [2] CRAN (R 4.5.2)
##  conflicted   * 1.2.0    2023-02-01 [1] CRAN (R 4.5.0)
##  devtools       2.4.5    2022-10-11 [1] CRAN (R 4.5.0)
##  digest         0.6.37   2024-08-19 [1] CRAN (R 4.5.0)
##  dplyr        * 1.1.4    2023-11-17 [1] CRAN (R 4.5.0)
##  ellipsis       0.3.2    2021-04-29 [1] CRAN (R 4.5.0)
##  emmeans      * 1.11.1   2025-05-04 [1] CRAN (R 4.5.0)
##  estimability   1.5.1    2024-05-12 [1] CRAN (R 4.5.0)
##  evaluate       1.0.5    2025-08-27 [1] CRAN (R 4.5.1)
##  farver         2.1.2    2024-05-13 [1] CRAN (R 4.5.0)
##  fastmap        1.2.0    2024-05-15 [1] CRAN (R 4.5.0)
##  forcats      * 1.0.0    2023-01-29 [1] CRAN (R 4.5.0)
##  Formula        1.2-5    2023-02-24 [1] CRAN (R 4.5.0)
##  fs           * 1.6.6    2025-04-12 [1] CRAN (R 4.5.0)
##  generics       0.1.4    2025-05-09 [1] CRAN (R 4.5.0)
##  ggeffects    * 2.2.1    2025-03-11 [1] CRAN (R 4.5.0)
##  ggokabeito     0.1.0    2021-10-18 [1] CRAN (R 4.5.0)
##  ggplot2      * 4.0.2    2026-02-03 [1] CRAN (R 4.5.2)
##  glue           1.8.0    2024-09-30 [1] CRAN (R 4.5.0)
##  gratia       * 0.10.0   2024-12-19 [1] CRAN (R 4.5.0)
##  gtable         0.3.6    2024-10-25 [1] CRAN (R 4.5.0)
##  here         * 1.0.1    2020-12-13 [1] CRAN (R 4.5.0)
##  hms            1.1.3    2023-03-21 [1] CRAN (R 4.5.0)
##  htmltools      0.5.8.1  2024-04-04 [1] CRAN (R 4.5.0)
##  htmlwidgets    1.6.4    2023-12-06 [1] CRAN (R 4.5.0)
##  httpuv         1.6.16   2025-04-16 [1] CRAN (R 4.5.0)
##  insight        1.4.4    2025-12-06 [1] CRAN (R 4.5.2)
##  jquerylib      0.1.4    2021-04-26 [1] CRAN (R 4.5.0)
##  jsonlite       2.0.0    2025-03-27 [1] CRAN (R 4.5.0)
##  knitr        * 1.50     2025-03-16 [1] CRAN (R 4.5.0)
##  later          1.4.2    2025-04-08 [1] CRAN (R 4.5.0)
##  lattice        0.22-7   2025-04-02 [1] CRAN (R 4.5.0)
##  lifecycle      1.0.4    2023-11-07 [1] CRAN (R 4.5.0)
##  lubridate    * 1.9.4    2024-12-08 [1] CRAN (R 4.5.0)
##  magrittr       2.0.3    2022-03-30 [1] CRAN (R 4.5.0)
##  MASS         * 7.3-65   2025-02-28 [1] CRAN (R 4.5.0)
##  Matrix         1.7-3    2025-03-11 [1] CRAN (R 4.5.0)
##  memoise        2.0.1    2021-11-26 [1] CRAN (R 4.5.0)
##  mgcv         * 1.9-3    2025-04-04 [1] CRAN (R 4.5.0)
##  mime           0.13     2025-03-17 [1] CRAN (R 4.5.0)
##  miniUI         0.1.2    2025-04-17 [1] CRAN (R 4.5.0)
##  multcomp     * 1.4-28   2025-01-29 [1] CRAN (R 4.5.0)
##  mvnfast        0.2.8    2023-02-23 [1] CRAN (R 4.5.0)
##  mvtnorm      * 1.3-3    2025-01-10 [1] CRAN (R 4.5.0)
##  nlme         * 3.1-168  2025-03-31 [1] CRAN (R 4.5.0)
##  otel           0.2.0    2025-08-29 [1] CRAN (R 4.5.2)
##  patchwork    * 1.3.2    2025-08-25 [1] CRAN (R 4.5.2)
##  pillar         1.11.1   2025-09-17 [1] CRAN (R 4.5.1)
##  pkgbuild       1.4.8    2025-05-26 [1] CRAN (R 4.5.0)
##  pkgconfig      2.0.3    2019-09-22 [1] CRAN (R 4.5.0)
##  pkgload        1.4.0    2024-06-28 [1] CRAN (R 4.5.0)
##  profvis        0.4.0    2024-09-20 [1] CRAN (R 4.5.2)
##  promises       1.5.0    2025-11-01 [1] CRAN (R 4.5.2)
##  purrr        * 1.1.0    2025-07-10 [1] CRAN (R 4.5.1)
##  R6             2.6.1    2025-02-15 [1] CRAN (R 4.5.0)
##  RColorBrewer   1.1-3    2022-04-03 [1] CRAN (R 4.5.0)
##  Rcpp           1.0.14   2025-01-12 [1] CRAN (R 4.5.0)
##  readr        * 2.1.5    2024-01-10 [1] CRAN (R 4.5.0)
##  remotes        2.5.0    2024-03-17 [1] CRAN (R 4.5.0)
##  rlang        * 1.1.6    2025-04-11 [1] CRAN (R 4.5.0)
##  rmarkdown      2.29     2024-11-04 [1] CRAN (R 4.5.0)
##  rprojroot      2.1.1    2025-08-26 [1] CRAN (R 4.5.1)
##  rstudioapi     0.17.1   2024-10-22 [1] CRAN (R 4.5.0)
##  S7             0.2.0    2024-11-07 [1] CRAN (R 4.5.1)
##  sandwich       3.1-1    2024-09-15 [1] CRAN (R 4.5.0)
##  sass           0.4.10   2025-04-11 [1] CRAN (R 4.5.0)
##  scales       * 1.4.0    2025-04-24 [1] CRAN (R 4.5.0)
##  sessioninfo    1.2.3    2025-02-05 [1] CRAN (R 4.5.0)
##  shiny          1.12.1   2025-12-09 [1] CRAN (R 4.5.2)
##  stringi        1.8.7    2025-03-27 [1] CRAN (R 4.5.0)
##  stringr      * 1.5.2    2025-09-08 [1] CRAN (R 4.5.1)
##  survival     * 3.8-3    2024-12-17 [1] CRAN (R 4.5.0)
##  TH.data      * 1.1-3    2025-01-17 [1] CRAN (R 4.5.0)
##  tibble       * 3.3.0    2025-06-08 [1] CRAN (R 4.5.0)
##  tidyr        * 1.3.1    2024-01-24 [1] CRAN (R 4.5.0)
##  tidyselect     1.2.1    2024-03-11 [1] CRAN (R 4.5.0)
##  tidyverse    * 2.0.0    2023-02-22 [1] CRAN (R 4.5.2)
##  timechange     0.3.0    2024-01-18 [1] CRAN (R 4.5.0)
##  tzdb           0.5.0    2025-03-15 [1] CRAN (R 4.5.0)
##  urlchecker     1.0.1    2021-11-30 [1] CRAN (R 4.5.0)
##  usethis        3.2.1    2025-09-06 [1] CRAN (R 4.5.1)
##  vctrs          0.6.5    2023-12-01 [1] CRAN (R 4.5.0)
##  withr          3.0.2    2024-10-28 [1] CRAN (R 4.5.0)
##  xfun           0.53     2025-08-19 [1] CRAN (R 4.5.1)
##  xtable         1.8-4    2019-04-21 [1] CRAN (R 4.5.0)
##  yaml           2.3.10   2024-07-26 [1] CRAN (R 4.5.0)
##  zoo            1.8-14   2025-04-10 [1] CRAN (R 4.5.0)
## 
##  [1] C:/R/win-library/4.5
##  [2] C:/Program Files/R/R-4.5.2/library
##  * ── Packages attached to the search path.
## 
## ──────────────────────────────────────────────────────────────────────────────

Import Data

# Define file path for processed data
fp_data <- here("manuscript_synthesis/data/processed")

# Import weekly average water quality data
df_wq <- readRDS(file.path(fp_data, "wq_week_avg_2013-2019.rds"))

# Import weekly average flow data
df_flow <- readRDS(file.path(fp_data, "flow_week_avg_2013-2019.rds"))

Prepare Data

# Create a vector for the factor order of StationCode
sta_order <- c("RD22", "STTD", "LIB", "RVB")

# We will use LIS flow data for all stations except for RD22
df_flow_c <- bind_rows(
  expand_grid(
    StationCode = c("STTD", "LIB", "RVB"),
    df_flow |> filter(StationCode == "LIS") |> select(-StationCode)
  ),
  df_flow |> filter(StationCode == "RD22")
)


# Prepare chlorophyll and flow data for exploration and analysis
df_chla_c1 <- df_wq %>%
  select(StationCode, Year, Week, Chla) %>%
  drop_na(Chla) %>%
  # Filter to only include representative stations for 4 habitat types - RD22, STTD, LIB, RVB
  filter(StationCode %in% sta_order) %>%
  # Join flow data to chlorophyll data
  left_join(df_flow_c, by = join_by(StationCode, Year, Week)) %>%
  # Remove all NA flow values
  drop_na(Flow) %>%
  mutate(
    # Scale and log transform chlorophyll values
    Chla_log = log(Chla * 1000),
    # Apply factor order to StationCode
    StationCode = factor(StationCode, levels = sta_order),
    # Add a column for Year as a factor for the model
    Year_fct = factor(Year)
  ) %>%
  arrange(StationCode, Year, Week)

Explore sample counts by Station

df_chla_c1 %>%
  summarize(
    min_week = min(Week),
    max_week = max(Week),
    num_samples = n(),
    .by = c(StationCode, Year)
  ) %>%
  arrange(StationCode, Year) %>%
  kable()
StationCode Year min_week max_week num_samples
RD22 2014 39 45 7
RD22 2015 30 45 16
RD22 2016 25 38 14
RD22 2017 28 44 17
RD22 2018 28 45 18
RD22 2019 28 45 18
STTD 2013 33 44 12
STTD 2014 30 45 14
STTD 2015 30 46 17
STTD 2016 25 38 14
STTD 2017 28 39 10
STTD 2018 29 42 14
STTD 2019 30 45 16
LIB 2013 29 46 18
LIB 2014 30 45 16
LIB 2015 27 46 20
LIB 2016 22 38 17
LIB 2017 28 44 17
LIB 2018 33 45 13
LIB 2019 28 45 14
RVB 2013 27 46 20
RVB 2014 30 45 16
RVB 2015 27 46 20
RVB 2016 22 38 16
RVB 2017 28 44 17
RVB 2018 28 45 18
RVB 2019 28 45 18

Looking at the sample counts and date ranges, we’ll only include years 2015-2019 for the analysis.

df_chla_c2 <- df_chla_c1 %>%
  filter(Year %in% 2015:2019) %>%
  mutate(Year_fct = fct_drop(Year_fct))

We’ll create another dataframe that has up to 2 lag variables for chlorophyll to be used in the models to help with serial autocorrelation.

df_chla_c2_lag <- df_chla_c2 %>%
  # Fill in missing weeks for each StationCode-Year combination
  group_by(StationCode, Year) %>%
  # Create lag variables of scaled log transformed chlorophyll values
  mutate(
    lag1 = lag(Chla_log),
    lag2 = lag(Chla_log, 2)
  ) %>%
  ungroup()

Plots

Let’s explore the data with some plots. First, lets plot the data in scatter plots of chlorophyll and flow faceted by Station and grouping all years together.

df_chla_c2 %>%
  ggplot(aes(x = Flow, y = Chla)) +
  geom_point() +
  geom_smooth(formula = "y ~ x") +
  facet_wrap(vars(StationCode), scales = "free") +
  theme_bw()

At first glance, I’m not sure how well flow is going to be able to predict chlorophyll concentrations. At the furthest upstream station - RD22 - chlorophyll appears to be highest at the lowest flows, but the variation is at its maximum at the lowest flows. There may be some dilution effect going on here at the higher flows. At STTD, there does seem to be a modest increase in chlorophyll concentrations at the mid-range flows. There appears to be no effect of flow on chlorophyll at RVB, but the range of chlorophyll concentrations is narrow at this station (between 0 and 5).

Let’s break these scatterplots apart by year to see how these patterns vary annually.

df_chla_c2 %>%
  ggplot(aes(x = Flow, y = Chla)) +
  geom_point() +
  geom_smooth(formula = "y ~ x") +
  facet_wrap(
    vars(StationCode, Year),
    ncol = 5,
    scales = "free",
    labeller = labeller(.multi_line = FALSE)
  ) +
  theme_bw()

The patterns appear to vary annually at each station, which may justify using a 3-way interaction.

Model 1: GAM Model with Flow and 3-way interactions

First, we will attempt to fit a generalized additive model (GAM) to the data set to help account for seasonality in the data. We’ll try running a GAM using a three-way interaction between Year, Weekly Average Flow, and Station, and a cyclic penalized cubic regression spline smooth term for week number to account for seasonality (restricting the k-value to 5 to reduce overfitting). Initially, we’ll run the GAM without accounting for serial autocorrelation.

Initial Model

m_gam_flow3 <- gam(
  Chla_log ~ Year_fct * Flow * StationCode + s(Week,  bs = "cc", k = 5),
  data = df_chla_c2,
  method = "REML",
  knots = list(week = c(0, 52))
)

Lets look at the model summary and diagnostics:

summary(m_gam_flow3)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## Chla_log ~ Year_fct * Flow * StationCode + s(Week, bs = "cc", 
##     k = 5)
## 
## Parametric coefficients:
##                                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                        9.773e+00  1.028e-01  95.030  < 2e-16 ***
## Year_fct2016                      -8.399e-01  1.491e-01  -5.634 4.27e-08 ***
## Year_fct2017                      -9.270e-01  2.299e-01  -4.033 7.10e-05 ***
## Year_fct2018                      -4.121e-01  1.375e-01  -2.997 0.002971 ** 
## Year_fct2019                      -1.387e-01  1.351e-01  -1.027 0.305445    
## Flow                              -1.894e-03  4.583e-04  -4.134 4.71e-05 ***
## StationCodeSTTD                   -1.141e+00  1.296e-01  -8.803  < 2e-16 ***
## StationCodeLIB                    -2.093e+00  1.252e-01 -16.721  < 2e-16 ***
## StationCodeRVB                    -2.139e+00  1.252e-01 -17.086  < 2e-16 ***
## Year_fct2016:Flow                  2.168e-03  7.331e-04   2.957 0.003372 ** 
## Year_fct2017:Flow                  2.903e-02  1.181e-02   2.458 0.014564 *  
## Year_fct2018:Flow                  5.729e-04  6.313e-04   0.908 0.364902    
## Year_fct2019:Flow                 -6.888e-05  5.554e-04  -0.124 0.901398    
## Year_fct2016:StationCodeSTTD       1.211e+00  1.906e-01   6.353 8.48e-10 ***
## Year_fct2017:StationCodeSTTD       1.141e+00  3.131e-01   3.644 0.000319 ***
## Year_fct2018:StationCodeSTTD      -2.362e-01  1.818e-01  -1.299 0.194995    
## Year_fct2019:StationCodeSTTD      -9.179e-01  1.788e-01  -5.133 5.34e-07 ***
## Year_fct2016:StationCodeLIB        1.725e+00  1.836e-01   9.396  < 2e-16 ***
## Year_fct2017:StationCodeLIB        1.032e+00  2.614e-01   3.949 9.94e-05 ***
## Year_fct2018:StationCodeLIB       -1.684e+00  1.831e-01  -9.197  < 2e-16 ***
## Year_fct2019:StationCodeLIB       -5.488e-01  1.782e-01  -3.079 0.002279 ** 
## Year_fct2016:StationCodeRVB        9.322e-01  1.851e-01   5.037 8.48e-07 ***
## Year_fct2017:StationCodeRVB        7.290e-01  2.614e-01   2.788 0.005658 ** 
## Year_fct2018:StationCodeRVB        1.103e-01  1.730e-01   0.638 0.524245    
## Year_fct2019:StationCodeRVB       -6.620e-01  1.728e-01  -3.831 0.000157 ***
## Flow:StationCodeSTTD               5.195e-03  6.951e-04   7.475 9.86e-13 ***
## Flow:StationCodeLIB                3.302e-03  6.787e-04   4.866 1.90e-06 ***
## Flow:StationCodeRVB                2.240e-03  6.787e-04   3.301 0.001087 ** 
## Year_fct2016:Flow:StationCodeSTTD -4.430e-03  1.062e-03  -4.173 4.01e-05 ***
## Year_fct2017:Flow:StationCodeSTTD -2.130e-02  1.319e-02  -1.615 0.107370    
## Year_fct2018:Flow:StationCodeSTTD -9.922e-04  9.345e-04  -1.062 0.289266    
## Year_fct2019:Flow:StationCodeSTTD -1.603e-03  8.273e-04  -1.938 0.053681 .  
## Year_fct2016:Flow:StationCodeLIB  -3.443e-03  1.046e-03  -3.293 0.001116 ** 
## Year_fct2017:Flow:StationCodeLIB  -2.917e-02  1.198e-02  -2.435 0.015531 *  
## Year_fct2018:Flow:StationCodeLIB   1.080e-03  9.380e-04   1.151 0.250505    
## Year_fct2019:Flow:StationCodeLIB  -1.223e-03  8.241e-04  -1.485 0.138753    
## Year_fct2016:Flow:StationCodeRVB  -3.292e-03  1.048e-03  -3.142 0.001855 ** 
## Year_fct2017:Flow:StationCodeRVB  -2.807e-02  1.198e-02  -2.343 0.019847 *  
## Year_fct2018:Flow:StationCodeRVB  -1.138e-03  9.159e-04  -1.242 0.215225    
## Year_fct2019:Flow:StationCodeRVB   3.417e-04  8.116e-04   0.421 0.674054    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##           edf Ref.df     F p-value    
## s(Week) 2.729      3 24.69  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.908   Deviance explained =   92%
## -REML =  233.8  Scale est. = 0.10226   n = 324
appraise(m_gam_flow3)

shapiro.test(residuals(m_gam_flow3))
## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(m_gam_flow3)
## W = 0.98296, p-value = 0.000692
k.check(m_gam_flow3)
##         k'      edf   k-index p-value
## s(Week)  3 2.729041 0.9270647  0.0625
draw(m_gam_flow3, select = 1, residuals = TRUE, rug = FALSE)

plot(m_gam_flow3, pages = 1, all.terms = TRUE)

acf(residuals(m_gam_flow3))

Box.test(residuals(m_gam_flow3), lag = 20, type = 'Ljung-Box')
## 
##  Box-Ljung test
## 
## data:  residuals(m_gam_flow3)
## X-squared = 74.642, df = 20, p-value = 3.125e-08

Besides the Shapiro-Wilk normality test showing that the residuals aren’t normal, the diagnostic plots look pretty good. However, the residuals are autocorrelated.

Model with lag terms

Now, we’ll try to deal with the residual autocorrelation. We’ll run a series of models adding 1 and 2 lag terms and compare how well they correct for autocorrelation.

Lag 1

m_gam_flow3_lag1 <- gam(
  Chla_log ~ Year_fct * Flow * StationCode + s(Week, bs = "cc", k = 5) + lag1,
  data = df_chla_c2_lag,
  method = "REML",
  knots = list(week = c(0, 52))
)

acf(residuals(m_gam_flow3_lag1))

Box.test(residuals(m_gam_flow3_lag1), lag = 20, type = 'Ljung-Box')
## 
##  Box-Ljung test
## 
## data:  residuals(m_gam_flow3_lag1)
## X-squared = 32.024, df = 20, p-value = 0.04305

Lag 2

m_gam_flow3_lag2 <- gam(
  Chla_log ~ Year_fct * Flow * StationCode + s(Week, bs = "cc", k = 5) + lag1 + lag2,
  data = df_chla_c2_lag,
  method = "REML",
  knots = list(week = c(0, 52))
)

acf(residuals(m_gam_flow3_lag2))

Box.test(residuals(m_gam_flow3_lag2), lag = 20, type = 'Ljung-Box')
## 
##  Box-Ljung test
## 
## data:  residuals(m_gam_flow3_lag2)
## X-squared = 21.5, df = 20, p-value = 0.3682

The model with 1 lag term already seems to address the serial autocorrelation. Let’s use AIC to see how they compare.

Compare Models

AIC(m_gam_flow3, m_gam_flow3_lag1, m_gam_flow3_lag2)
##                        df      AIC
## m_gam_flow3      43.93936 222.7229
## m_gam_flow3_lag1 44.80311 143.9646
## m_gam_flow3_lag2 45.70977 136.0963

It looks like the lag2 model has the best fit according to the AIC values. Let’s take a closer look at that one.

Lag 2 model summary

summary(m_gam_flow3_lag2)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## Chla_log ~ Year_fct * Flow * StationCode + s(Week, bs = "cc", 
##     k = 5) + lag1 + lag2
## 
## Parametric coefficients:
##                                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                        6.457e+00  5.381e-01  11.999  < 2e-16 ***
## Year_fct2016                      -6.040e-01  1.522e-01  -3.968 9.60e-05 ***
## Year_fct2017                      -7.041e-01  2.304e-01  -3.056 0.002493 ** 
## Year_fct2018                      -3.195e-01  1.358e-01  -2.353 0.019429 *  
## Year_fct2019                      -7.442e-02  1.324e-01  -0.562 0.574628    
## Flow                              -1.378e-03  4.361e-04  -3.159 0.001788 ** 
## StationCodeSTTD                   -7.493e-01  1.379e-01  -5.435 1.35e-07 ***
## StationCodeLIB                    -1.427e+00  1.606e-01  -8.885  < 2e-16 ***
## StationCodeRVB                    -1.448e+00  1.640e-01  -8.831 2.24e-16 ***
## lag1                               4.422e-01  6.280e-02   7.042 2.00e-11 ***
## lag2                              -1.029e-01  5.679e-02  -1.811 0.071338 .  
## Year_fct2016:Flow                  1.388e-03  6.872e-04   2.020 0.044463 *  
## Year_fct2017:Flow                  2.475e-02  1.293e-02   1.914 0.056806 .  
## Year_fct2018:Flow                  6.146e-04  5.818e-04   1.056 0.291879    
## Year_fct2019:Flow                  9.183e-05  5.150e-04   0.178 0.858623    
## Year_fct2016:StationCodeSTTD       8.625e-01  1.949e-01   4.425 1.47e-05 ***
## Year_fct2017:StationCodeSTTD       8.917e-01  3.137e-01   2.842 0.004863 ** 
## Year_fct2018:StationCodeSTTD      -6.823e-02  1.796e-01  -0.380 0.704387    
## Year_fct2019:StationCodeSTTD      -6.332e-01  1.825e-01  -3.469 0.000619 ***
## Year_fct2016:StationCodeLIB        1.197e+00  2.007e-01   5.962 8.85e-09 ***
## Year_fct2017:StationCodeLIB        7.846e-01  2.598e-01   3.020 0.002802 ** 
## Year_fct2018:StationCodeLIB       -1.223e+00  2.077e-01  -5.888 1.31e-08 ***
## Year_fct2019:StationCodeLIB       -3.807e-01  1.783e-01  -2.135 0.033762 *  
## Year_fct2016:StationCodeRVB        6.358e-01  1.861e-01   3.417 0.000744 ***
## Year_fct2017:StationCodeRVB        5.530e-01  2.569e-01   2.153 0.032349 *  
## Year_fct2018:StationCodeRVB        1.523e-01  1.681e-01   0.906 0.366035    
## Year_fct2019:StationCodeRVB       -4.260e-01  1.730e-01  -2.463 0.014498 *  
## Flow:StationCodeSTTD               3.504e-03  6.822e-04   5.137 5.79e-07 ***
## Flow:StationCodeLIB                2.591e-03  6.383e-04   4.059 6.67e-05 ***
## Flow:StationCodeRVB                1.672e-03  6.328e-04   2.643 0.008770 ** 
## Year_fct2016:Flow:StationCodeSTTD -2.986e-03  9.944e-04  -3.003 0.002953 ** 
## Year_fct2017:Flow:StationCodeSTTD -1.857e-02  1.409e-02  -1.318 0.188898    
## Year_fct2018:Flow:StationCodeSTTD -9.356e-04  8.701e-04  -1.075 0.283361    
## Year_fct2019:Flow:StationCodeSTTD -1.078e-03  7.689e-04  -1.402 0.162296    
## Year_fct2016:Flow:StationCodeLIB  -2.435e-03  9.667e-04  -2.518 0.012442 *  
## Year_fct2017:Flow:StationCodeLIB  -2.506e-02  1.303e-02  -1.923 0.055647 .  
## Year_fct2018:Flow:StationCodeLIB   9.051e-04  8.777e-04   1.031 0.303478    
## Year_fct2019:Flow:StationCodeLIB  -1.210e-03  7.606e-04  -1.590 0.113119    
## Year_fct2016:Flow:StationCodeRVB  -2.092e-03  9.709e-04  -2.155 0.032182 *  
## Year_fct2017:Flow:StationCodeRVB  -2.403e-02  1.303e-02  -1.844 0.066479 .  
## Year_fct2018:Flow:StationCodeRVB  -1.105e-03  8.418e-04  -1.313 0.190473    
## Year_fct2019:Flow:StationCodeRVB  -5.663e-05  7.480e-04  -0.076 0.939710    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##           edf Ref.df     F p-value    
## s(Week) 2.492      3 11.75  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.928   Deviance explained = 93.9%
## -REML = 194.53  Scale est. = 0.081254  n = 284
appraise(m_gam_flow3_lag2)

shapiro.test(residuals(m_gam_flow3_lag2))
## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(m_gam_flow3_lag2)
## W = 0.97077, p-value = 1.533e-05
k.check(m_gam_flow3_lag2)
##         k'      edf   k-index p-value
## s(Week)  3 2.492179 0.9184442   0.065
draw(m_gam_flow3_lag2, select = 1, residuals = TRUE, rug = FALSE)

plot(m_gam_flow3_lag2, pages = 1, all.terms = TRUE)

anova(m_gam_flow3_lag2)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## Chla_log ~ Year_fct * Flow * StationCode + s(Week, bs = "cc", 
##     k = 5) + lag1 + lag2
## 
## Parametric Terms:
##                           df      F  p-value
## Year_fct                   4  6.153 9.94e-05
## Flow                       1  9.978  0.00179
## StationCode                3 30.183  < 2e-16
## lag1                       1 49.587 2.00e-11
## lag2                       1  3.281  0.07134
## Year_fct:Flow              4  2.202  0.06948
## Year_fct:StationCode      12  8.790 7.17e-14
## Flow:StationCode           3 10.049 2.92e-06
## Year_fct:Flow:StationCode 12  2.584  0.00304
## 
## Approximate significance of smooth terms:
##           edf Ref.df     F p-value
## s(Week) 2.492  3.000 11.75  <2e-16

The model diagnostics look pretty good. Note that the 2-way interaction between Year and Flow isn’t significant. We’ll use m_gam_flow3_lag2 in the model selection process.

rm(m_gam_flow3, m_gam_flow3_lag1)

Model 2: GAM Model with Flow and 2-way interactions

Now we’ll try running a GAM using all two-way interactions between Year, Flow, and Station.

Initial Model

m_gam_flow2 <- gam(
  Chla_log ~ (Year_fct + Flow + StationCode)^2 + s(Week, bs = "cc", k = 5),
  data = df_chla_c2,
  method = "REML",
  knots = list(week = c(0, 52))
)

Lets look at the model summary and diagnostics:

summary(m_gam_flow2)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## Chla_log ~ (Year_fct + Flow + StationCode)^2 + s(Week, bs = "cc", 
##     k = 5)
## 
## Parametric coefficients:
##                                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                   9.6785347  0.0976557  99.109  < 2e-16 ***
## Year_fct2016                 -0.5160117  0.1363411  -3.785 0.000187 ***
## Year_fct2017                 -0.3917242  0.1302573  -3.007 0.002864 ** 
## Year_fct2018                 -0.3677308  0.1268303  -2.899 0.004021 ** 
## Year_fct2019                 -0.0645713  0.1259576  -0.513 0.608587    
## Flow                         -0.0011879  0.0003373  -3.522 0.000497 ***
## StationCodeSTTD              -1.0148358  0.1264555  -8.025 2.46e-14 ***
## StationCodeLIB               -1.9983265  0.1229774 -16.250  < 2e-16 ***
## StationCodeRVB               -2.0473550  0.1229570 -16.651  < 2e-16 ***
## Year_fct2016:Flow            -0.0006402  0.0004252  -1.505 0.133281    
## Year_fct2017:Flow             0.0012378  0.0017088   0.724 0.469434    
## Year_fct2018:Flow             0.0003011  0.0003586   0.839 0.401889    
## Year_fct2019:Flow            -0.0006212  0.0003201  -1.941 0.053260 .  
## Year_fct2016:StationCodeSTTD  0.8092426  0.1800566   4.494 1.01e-05 ***
## Year_fct2017:StationCodeSTTD  0.3774412  0.1999526   1.888 0.060060 .  
## Year_fct2018:StationCodeSTTD -0.3032110  0.1740797  -1.742 0.082593 .  
## Year_fct2019:StationCodeSTTD -1.0386379  0.1709584  -6.075 3.83e-09 ***
## Year_fct2016:StationCodeLIB   1.3710722  0.1745109   7.857 7.53e-14 ***
## Year_fct2017:StationCodeLIB   0.5334927  0.1846843   2.889 0.004157 ** 
## Year_fct2018:StationCodeLIB  -1.6107835  0.1741979  -9.247  < 2e-16 ***
## Year_fct2019:StationCodeLIB  -0.6746411  0.1723851  -3.914 0.000113 ***
## Year_fct2016:StationCodeRVB   0.5745856  0.1757651   3.269 0.001208 ** 
## Year_fct2017:StationCodeRVB   0.2096231  0.1846195   1.135 0.257121    
## Year_fct2018:StationCodeRVB   0.0386979  0.1668655   0.232 0.816769    
## Year_fct2019:StationCodeRVB  -0.6665100  0.1664704  -4.004 7.90e-05 ***
## Flow:StationCodeSTTD          0.0036351  0.0003199  11.364  < 2e-16 ***
## Flow:StationCodeLIB           0.0025172  0.0003265   7.709 1.98e-13 ***
## Flow:StationCodeRVB           0.0016994  0.0003164   5.371 1.60e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##           edf Ref.df     F p-value    
## s(Week) 2.713      3 27.04  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.894   Deviance explained = 90.4%
## -REML = 190.64  Scale est. = 0.11725   n = 324
appraise(m_gam_flow2)

shapiro.test(residuals(m_gam_flow2))
## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(m_gam_flow2)
## W = 0.9887, p-value = 0.01281
k.check(m_gam_flow2)
##         k'      edf   k-index p-value
## s(Week)  3 2.713176 0.9526447   0.165
draw(m_gam_flow2, select = 1, residuals = TRUE, rug = FALSE)

plot(m_gam_flow2, pages = 1, all.terms = TRUE)

acf(residuals(m_gam_flow2))

Box.test(residuals(m_gam_flow2), lag = 20, type = 'Ljung-Box')
## 
##  Box-Ljung test
## 
## data:  residuals(m_gam_flow2)
## X-squared = 129.09, df = 20, p-value < 2.2e-16

Besides the Shapiro-Wilk normality test showing that the residuals aren’t normal, the diagnostic plots look really good. However, the residuals are autocorrelated.

Model with lag terms

Now, we’ll try to deal with the residual autocorrelation. We’ll run a series of models adding 1 and 2 lag terms and compare how well they correct for autocorrelation.

Lag 1

m_gam_flow2_lag1 <- gam(
  Chla_log ~ (Year_fct + Flow + StationCode)^2 + s(Week, bs = "cc", k = 5) + lag1,
  data = df_chla_c2_lag,
  method = "REML",
  knots = list(week = c(0, 52))
)

acf(residuals(m_gam_flow2_lag1))

Box.test(residuals(m_gam_flow2_lag1), lag = 20, type = 'Ljung-Box')
## 
##  Box-Ljung test
## 
## data:  residuals(m_gam_flow2_lag1)
## X-squared = 29.001, df = 20, p-value = 0.08773

Lag 2

m_gam_flow2_lag2 <- gam(
  Chla_log ~ (Year_fct + Flow + StationCode)^2 + s(Week, bs = "cc", k = 5) + lag1 + lag2,
  data = df_chla_c2_lag,
  method = "REML",
  knots = list(week = c(0, 52))
)

acf(residuals(m_gam_flow2_lag2))

Box.test(residuals(m_gam_flow2_lag2), lag = 20, type = 'Ljung-Box')
## 
##  Box-Ljung test
## 
## data:  residuals(m_gam_flow2_lag2)
## X-squared = 21.766, df = 20, p-value = 0.3533

The model with 1 lag term already seems to address the serial autocorrelation, but the lag2 model is even better. Let’s use AIC to see how they compare.

Compare Models

AIC(m_gam_flow2, m_gam_flow2_lag1, m_gam_flow2_lag2)
##                        df      AIC
## m_gam_flow2      31.89629 256.5089
## m_gam_flow2_lag1 32.76250 150.1410
## m_gam_flow2_lag2 33.67624 146.9387

Again, it looks like the lag2 model has the best fit according to the AIC values. Let’s take a closer look at that one.

Lag 2 model summary

summary(m_gam_flow2_lag2)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## Chla_log ~ (Year_fct + Flow + StationCode)^2 + s(Week, bs = "cc", 
##     k = 5) + lag1 + lag2
## 
## Parametric coefficients:
##                                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                   5.3662016  0.5043957  10.639  < 2e-16 ***
## Year_fct2016                 -0.3297933  0.1311038  -2.516 0.012510 *  
## Year_fct2017                 -0.2414980  0.1222049  -1.976 0.049228 *  
## Year_fct2018                 -0.2301478  0.1196736  -1.923 0.055593 .  
## Year_fct2019                  0.0109322  0.1177270   0.093 0.926088    
## Flow                         -0.0007624  0.0003112  -2.450 0.014958 *  
## StationCodeSTTD              -0.5388663  0.1257449  -4.285 2.60e-05 ***
## StationCodeLIB               -1.1379329  0.1497928  -7.597 5.98e-13 ***
## StationCodeRVB               -1.1542962  0.1527487  -7.557 7.68e-13 ***
## lag1                          0.5463245  0.0617008   8.854  < 2e-16 ***
## lag2                         -0.1017626  0.0571666  -1.780 0.076266 .  
## Year_fct2016:Flow            -0.0003588  0.0004033  -0.890 0.374502    
## Year_fct2017:Flow             0.0006436  0.0015057   0.427 0.669422    
## Year_fct2018:Flow             0.0003122  0.0003217   0.970 0.332888    
## Year_fct2019:Flow            -0.0003463  0.0002888  -1.199 0.231698    
## Year_fct2016:StationCodeSTTD  0.4844492  0.1705394   2.841 0.004870 ** 
## Year_fct2017:StationCodeSTTD  0.2241941  0.1866201   1.201 0.230749    
## Year_fct2018:StationCodeSTTD -0.1270279  0.1612103  -0.788 0.431460    
## Year_fct2019:StationCodeSTTD -0.6331731  0.1651007  -3.835 0.000159 ***
## Year_fct2016:StationCodeLIB   0.7928280  0.1755833   4.515 9.72e-06 ***
## Year_fct2017:StationCodeLIB   0.3290977  0.1698532   1.938 0.053798 .  
## Year_fct2018:StationCodeLIB  -0.9598305  0.1821609  -5.269 2.95e-07 ***
## Year_fct2019:StationCodeLIB  -0.4498340  0.1628288  -2.763 0.006157 ** 
## Year_fct2016:StationCodeRVB   0.3314845  0.1650767   2.008 0.045706 *  
## Year_fct2017:StationCodeRVB   0.1085885  0.1685921   0.644 0.520104    
## Year_fct2018:StationCodeRVB   0.0609466  0.1532752   0.398 0.691241    
## Year_fct2019:StationCodeRVB  -0.3872854  0.1563890  -2.476 0.013929 *  
## Flow:StationCodeSTTD          0.0020916  0.0003174   6.589 2.57e-10 ***
## Flow:StationCodeLIB           0.0016824  0.0003057   5.503 9.17e-08 ***
## Flow:StationCodeRVB           0.0009565  0.0002892   3.308 0.001079 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##           edf Ref.df     F  p-value    
## s(Week) 2.433      3 10.13 9.24e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.922   Deviance explained = 93.1%
## -REML = 139.63  Scale est. = 0.087477  n = 284
appraise(m_gam_flow2_lag2)

shapiro.test(residuals(m_gam_flow2_lag2))
## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(m_gam_flow2_lag2)
## W = 0.96771, p-value = 5.379e-06
k.check(m_gam_flow2_lag2)
##         k'      edf  k-index p-value
## s(Week)  3 2.433331 0.953256    0.25
draw(m_gam_flow2_lag2, select = 1, residuals = TRUE, rug = FALSE)

plot(m_gam_flow2_lag2, pages = 1, all.terms = TRUE)

anova(m_gam_flow2_lag2)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## Chla_log ~ (Year_fct + Flow + StationCode)^2 + s(Week, bs = "cc", 
##     k = 5) + lag1 + lag2
## 
## Parametric Terms:
##                      df      F  p-value
## Year_fct              4  3.083   0.0167
## Flow                  1  6.004   0.0150
## StationCode           3 22.412 6.81e-13
## lag1                  1 78.401  < 2e-16
## lag2                  1  3.169   0.0763
## Year_fct:Flow         4  1.761   0.1372
## Year_fct:StationCode 12  6.748 1.63e-10
## Flow:StationCode      3 16.615 7.02e-10
## 
## Approximate significance of smooth terms:
##           edf Ref.df     F  p-value
## s(Week) 2.433  3.000 10.13 9.24e-07

The model diagnostics look pretty good. Note that the 2-way interaction between Year and Flow isn’t significant. We’ll use m_gam_flow2_lag2 in the model selection process.

rm(m_gam_flow2, m_gam_flow2_lag1)

Model 3: GAM Model with 2-way interaction between Station and Year but without Flow

Next we’ll try running a GAM using a two-way interaction between Year and Station but not including flow as a predictor.

Initial Model

m_gam_cat2 <- gam(
  Chla_log ~ Year_fct * StationCode + s(Week, bs = "cc", k = 5),
  data = df_chla_c2,
  method = "REML",
  knots = list(week = c(0, 52))
)

Lets look at the model summary and diagnostics:

summary(m_gam_cat2)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## Chla_log ~ Year_fct * StationCode + s(Week, bs = "cc", k = 5)
## 
## Parametric coefficients:
##                              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                   9.50143    0.10544  90.116  < 2e-16 ***
## Year_fct2016                 -0.53966    0.15556  -3.469 0.000598 ***
## Year_fct2017                 -0.22139    0.14665  -1.510 0.132184    
## Year_fct2018                 -0.29804    0.14461  -2.061 0.040157 *  
## Year_fct2019                 -0.14458    0.14461  -1.000 0.318217    
## StationCodeSTTD              -0.76504    0.14652  -5.222 3.31e-07 ***
## StationCodeLIB               -1.79713    0.14133 -12.716  < 2e-16 ***
## StationCodeRVB               -1.85708    0.14133 -13.140  < 2e-16 ***
## Year_fct2016:StationCodeSTTD  0.85168    0.21617   3.940 0.000101 ***
## Year_fct2017:StationCodeSTTD  0.03124    0.22324   0.140 0.888793    
## Year_fct2018:StationCodeSTTD -0.32807    0.20996  -1.563 0.119213    
## Year_fct2019:StationCodeSTTD -0.84839    0.20587  -4.121 4.88e-05 ***
## Year_fct2016:StationCodeLIB   1.41976    0.20751   6.842 4.36e-11 ***
## Year_fct2017:StationCodeLIB   0.26021    0.20194   1.289 0.198535    
## Year_fct2018:StationCodeLIB  -1.59497    0.20924  -7.623 3.26e-13 ***
## Year_fct2019:StationCodeLIB  -0.55049    0.20608  -2.671 0.007968 ** 
## Year_fct2016:StationCodeRVB   0.60877    0.20905   2.912 0.003859 ** 
## Year_fct2017:StationCodeRVB  -0.03003    0.20194  -0.149 0.881876    
## Year_fct2018:StationCodeRVB  -0.02211    0.19905  -0.111 0.911619    
## Year_fct2019:StationCodeRVB  -0.61209    0.19905  -3.075 0.002298 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##           edf Ref.df     F p-value    
## s(Week) 2.515      3 14.88  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =   0.84   Deviance explained = 85.1%
## -REML = 199.37  Scale est. = 0.17683   n = 324
appraise(m_gam_cat2)

shapiro.test(residuals(m_gam_cat2))
## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(m_gam_cat2)
## W = 0.98654, p-value = 0.004115
k.check(m_gam_cat2)
##         k'      edf   k-index p-value
## s(Week)  3 2.515323 0.9266828  0.0925
draw(m_gam_cat2, select = 1, residuals = TRUE, rug = FALSE)

plot(m_gam_cat2, pages = 1, all.terms = TRUE)

acf(residuals(m_gam_cat2))

Box.test(residuals(m_gam_cat2), lag = 20, type = 'Ljung-Box')
## 
##  Box-Ljung test
## 
## data:  residuals(m_gam_cat2)
## X-squared = 304.27, df = 20, p-value < 2.2e-16

Besides the Shapiro-Wilk normality test showing that the residuals aren’t normal, the diagnostic plots look really good. However, the residuals are autocorrelated.

Model with lag terms

Now, we’ll try to deal with the residual autocorrelation. We’ll run a series of models adding 1 and 2 lag terms and compare how well they correct for autocorrelation.

Lag 1

m_gam_cat2_lag1 <- gam(
  Chla_log ~ Year_fct * StationCode + s(Week, bs = "cc", k = 5) + lag1,
  data = df_chla_c2_lag,
  method = "REML",
  knots = list(week = c(0, 52))
)

acf(residuals(m_gam_cat2_lag1))

Box.test(residuals(m_gam_cat2_lag1), lag = 20, type = 'Ljung-Box')
## 
##  Box-Ljung test
## 
## data:  residuals(m_gam_cat2_lag1)
## X-squared = 27.997, df = 20, p-value = 0.1095

Lag 2

m_gam_cat2_lag2 <- gam(
  Chla_log ~ Year_fct * StationCode + s(Week, bs = "cc", k = 5) + lag1 + lag2,
  data = df_chla_c2_lag,
  method = "REML",
  knots = list(week = c(0, 52))
)

acf(residuals(m_gam_cat2_lag2))

Box.test(residuals(m_gam_cat2_lag2), lag = 20, type = 'Ljung-Box')
## 
##  Box-Ljung test
## 
## data:  residuals(m_gam_cat2_lag2)
## X-squared = 14.052, df = 20, p-value = 0.8279

The model with 1 lag term already seems to address the serial autocorrelation, but the lag2 model is even better. Let’s use AIC to see how they compare.

Compare Models

AIC(m_gam_cat2, m_gam_cat2_lag1, m_gam_cat2_lag2)
##                       df      AIC
## m_gam_cat2      23.80557 382.3995
## m_gam_cat2_lag1 24.57483 200.3556
## m_gam_cat2_lag2 25.58907 189.8429

Again, it looks like the lag2 model has the best fit according to the AIC values. Let’s take a closer look at that one.

Lag 2 model summary

summary(m_gam_cat2_lag2)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## Chla_log ~ Year_fct * StationCode + s(Week, bs = "cc", k = 5) + 
##     lag1 + lag2
## 
## Parametric coefficients:
##                               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                   3.810369   0.496266   7.678 3.28e-13 ***
## Year_fct2016                 -0.259782   0.130906  -1.984 0.048253 *  
## Year_fct2017                 -0.093160   0.120489  -0.773 0.440116    
## Year_fct2018                 -0.114755   0.119466  -0.961 0.337660    
## Year_fct2019                 -0.015790   0.118796  -0.133 0.894360    
## StationCodeSTTD              -0.243671   0.125324  -1.944 0.052936 .  
## StationCodeLIB               -0.714169   0.146451  -4.877 1.88e-06 ***
## StationCodeRVB               -0.736391   0.149591  -4.923 1.52e-06 ***
## lag1                          0.753790   0.060094  12.543  < 2e-16 ***
## lag2                         -0.158328   0.060905  -2.600 0.009867 ** 
## Year_fct2016:StationCodeSTTD  0.371220   0.182013   2.040 0.042409 *  
## Year_fct2017:StationCodeSTTD -0.030086   0.186161  -0.162 0.871738    
## Year_fct2018:StationCodeSTTD -0.111226   0.173052  -0.643 0.520964    
## Year_fct2019:StationCodeSTTD -0.401320   0.173509  -2.313 0.021506 *  
## Year_fct2016:StationCodeLIB   0.595813   0.184824   3.224 0.001427 ** 
## Year_fct2017:StationCodeLIB   0.099092   0.165276   0.600 0.549327    
## Year_fct2018:StationCodeLIB  -0.690350   0.189863  -3.636 0.000334 ***
## Year_fct2019:StationCodeLIB  -0.284119   0.171038  -1.661 0.097890 .  
## Year_fct2016:StationCodeRVB   0.252084   0.174056   1.448 0.148740    
## Year_fct2017:StationCodeRVB  -0.038284   0.164998  -0.232 0.816701    
## Year_fct2018:StationCodeRVB   0.001762   0.162307   0.011 0.991349    
## Year_fct2019:StationCodeRVB  -0.279779   0.164629  -1.699 0.090432 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##           edf Ref.df     F  p-value    
## s(Week) 2.164      3 4.538 0.000802 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.907   Deviance explained = 91.5%
## -REML = 108.14  Scale est. = 0.10428   n = 284
appraise(m_gam_cat2_lag2)

shapiro.test(residuals(m_gam_cat2_lag2))
## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(m_gam_cat2_lag2)
## W = 0.95429, p-value = 9.186e-08
k.check(m_gam_cat2_lag2)
##         k'      edf   k-index p-value
## s(Week)  3 2.163534 0.9370215   0.155
draw(m_gam_cat2_lag2, select = 1, residuals = TRUE, rug = FALSE)

plot(m_gam_cat2_lag2, pages = 1, all.terms = TRUE)

anova(m_gam_cat2_lag2)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## Chla_log ~ Year_fct * StationCode + s(Week, bs = "cc", k = 5) + 
##     lag1 + lag2
## 
## Parametric Terms:
##                      df       F  p-value
## Year_fct              4   1.260 0.286020
## StationCode           3  10.211 2.22e-06
## lag1                  1 157.339  < 2e-16
## lag2                  1   6.758 0.009867
## Year_fct:StationCode 12   3.168 0.000307
## 
## Approximate significance of smooth terms:
##           edf Ref.df     F  p-value
## s(Week) 2.164  3.000 4.538 0.000802

The model diagnostics look pretty good but not quite as good as with the initial model. We’ll use m_gam_cat2_lag2 in the model selection process.

rm(m_gam_cat2, m_gam_cat2_lag1)

Model 4: Linear Model with Flow and 3-way interactions

Let’s try the weekly average model as a linear model with a three-way interaction between Year, Weekly Average Flow, and Station but without the smooth term for week number. Initially, we’ll run the model without accounting for serial autocorrelation.

Initial Model

m_lm_flow3 <- lm(Chla_log ~ Year_fct * Flow * StationCode, data = df_chla_c2)

Lets look at the model summary and diagnostics:

summary(m_lm_flow3)
## 
## Call:
## lm(formula = Chla_log ~ Year_fct * Flow * StationCode, data = df_chla_c2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.97374 -0.19702 -0.02017  0.16626  1.33105 
## 
## Coefficients:
##                                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                        9.7615689  0.1149026  84.955  < 2e-16 ***
## Year_fct2016                      -0.7803006  0.1662325  -4.694 4.17e-06 ***
## Year_fct2017                      -1.2750955  0.2533518  -5.033 8.59e-07 ***
## Year_fct2018                      -0.3716521  0.1542539  -2.409 0.016617 *  
## Year_fct2019                      -0.1064280  0.1515132  -0.702 0.482984    
## Flow                              -0.0021678  0.0005067  -4.278 2.58e-05 ***
## StationCodeSTTD                   -1.1482782  0.1453232  -7.902 6.10e-14 ***
## StationCodeLIB                    -2.0659490  0.1403787 -14.717  < 2e-16 ***
## StationCodeRVB                    -2.1117161  0.1403787 -15.043  < 2e-16 ***
## Year_fct2016:Flow                  0.0028066  0.0008103   3.464 0.000615 ***
## Year_fct2017:Flow                  0.0498038  0.0128457   3.877 0.000131 ***
## Year_fct2018:Flow                  0.0004445  0.0007085   0.627 0.530893    
## Year_fct2019:Flow                 -0.0000748  0.0006226  -0.120 0.904452    
## Year_fct2016:StationCodeSTTD       1.2499596  0.2136261   5.851 1.34e-08 ***
## Year_fct2017:StationCodeSTTD       1.7155377  0.3407645   5.034 8.53e-07 ***
## Year_fct2018:StationCodeSTTD      -0.2869259  0.2037980  -1.408 0.160256    
## Year_fct2019:StationCodeSTTD      -0.9572118  0.2007104  -4.769 2.96e-06 ***
## Year_fct2016:StationCodeLIB        1.7614390  0.2054580   8.573 6.59e-16 ***
## Year_fct2017:StationCodeLIB        1.3626781  0.2886090   4.722 3.68e-06 ***
## Year_fct2018:StationCodeLIB       -1.8236205  0.2044106  -8.921  < 2e-16 ***
## Year_fct2019:StationCodeLIB       -0.6043253  0.1995777  -3.028 0.002688 ** 
## Year_fct2016:StationCodeRVB        0.9752324  0.2069080   4.713 3.82e-06 ***
## Year_fct2017:StationCodeRVB        1.0593695  0.2886090   3.671 0.000289 ***
## Year_fct2018:StationCodeRVB        0.0493574  0.1940940   0.254 0.799451    
## Year_fct2019:StationCodeRVB       -0.7029752  0.1938728  -3.626 0.000341 ***
## Flow:StationCodeSTTD               0.0049297  0.0007794   6.325 9.84e-10 ***
## Flow:StationCodeLIB                0.0028763  0.0007599   3.785 0.000187 ***
## Flow:StationCodeRVB                0.0018144  0.0007599   2.388 0.017609 *  
## Year_fct2016:Flow:StationCodeSTTD -0.0043111  0.0011917  -3.618 0.000352 ***
## Year_fct2017:Flow:StationCodeSTTD -0.0352894  0.0145777  -2.421 0.016114 *  
## Year_fct2018:Flow:StationCodeSTTD -0.0007185  0.0010472  -0.686 0.493211    
## Year_fct2019:Flow:StationCodeSTTD -0.0012877  0.0009279  -1.388 0.166310    
## Year_fct2016:Flow:StationCodeLIB  -0.0032468  0.0011734  -2.767 0.006027 ** 
## Year_fct2017:Flow:StationCodeLIB  -0.0483336  0.0131090  -3.687 0.000272 ***
## Year_fct2018:Flow:StationCodeLIB   0.0017105  0.0010495   1.630 0.104231    
## Year_fct2019:Flow:StationCodeLIB  -0.0008137  0.0009237  -0.881 0.379107    
## Year_fct2016:Flow:StationCodeRVB  -0.0031157  0.0011756  -2.650 0.008493 ** 
## Year_fct2017:Flow:StationCodeRVB  -0.0472314  0.0131090  -3.603 0.000371 ***
## Year_fct2018:Flow:StationCodeRVB  -0.0007736  0.0010268  -0.753 0.451830    
## Year_fct2019:Flow:StationCodeRVB   0.0007609  0.0009095   0.837 0.403500    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3591 on 284 degrees of freedom
## Multiple R-squared:  0.8977, Adjusted R-squared:  0.8837 
## F-statistic: 63.92 on 39 and 284 DF,  p-value: < 2.2e-16
df_chla_c2 %>% plot_lm_diag(Chla_log, m_lm_flow3)
## `stat_bin()` using `bins = 30`. Pick better value `binwidth`.

shapiro.test(residuals(m_lm_flow3))
## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(m_lm_flow3)
## W = 0.98191, p-value = 0.0004191
acf(residuals(m_lm_flow3))

Box.test(residuals(m_lm_flow3), lag = 20, type = 'Ljung-Box')
## 
##  Box-Ljung test
## 
## data:  residuals(m_lm_flow3)
## X-squared = 151.84, df = 20, p-value < 2.2e-16

The residuals deviate from a normal distribution according to visual inspection and the Shapiro-Wilk normality test. Also, model definitely has residual autocorrelation as indicated by the ACF plot and the Box-Ljung test.

Model with lag terms

Now, we’ll try to deal with the residual autocorrelation and the non-normal residuals. We’ll run a series of linear models adding 1 and 2 lag terms and compare how well they correct for autocorrelation.

Lag 1

m_lm_flow3_lag1 <- df_chla_c2_lag %>%
  drop_na(Chla_log, lag1) %>%
  lm(Chla_log ~ Year_fct * Flow * StationCode + lag1, data = .)

acf(residuals(m_lm_flow3_lag1))

Box.test(residuals(m_lm_flow3_lag1), lag = 20, type = 'Ljung-Box')
## 
##  Box-Ljung test
## 
## data:  residuals(m_lm_flow3_lag1)
## X-squared = 25.245, df = 20, p-value = 0.1922

Lag 2

m_lm_flow3_lag2 <- df_chla_c2_lag %>%
  drop_na(Chla_log, lag1, lag2) %>%
  lm(Chla_log ~ Year_fct * Flow * StationCode + lag1 + lag2, data = .)

acf(residuals(m_lm_flow3_lag2))

Box.test(residuals(m_lm_flow3_lag2), lag = 20, type = 'Ljung-Box')
## 
##  Box-Ljung test
## 
## data:  residuals(m_lm_flow3_lag2)
## X-squared = 13.923, df = 20, p-value = 0.8344

The model with 1 lag term already has better ACF and Box-Ljung test results than the initial model. Let’s use AIC to see how they compare.

Compare Models

AIC(m_lm_flow3, m_lm_flow3_lag1, m_lm_flow3_lag2)
##                 df      AIC
## m_lm_flow3      41 295.0642
## m_lm_flow3_lag1 42 179.7469
## m_lm_flow3_lag2 43 172.2352

Again, it looks like the lag2 model has the best fit according to the AIC values. Let’s take a closer look at that one.

Lag 2 model summary

summary(m_lm_flow3_lag2)
## 
## Call:
## lm(formula = Chla_log ~ Year_fct * Flow * StationCode + lag1 + 
##     lag2, data = .)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.03686 -0.14697 -0.00613  0.11959  1.00420 
## 
## Coefficients:
##                                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                        5.4975717  0.5381025  10.217  < 2e-16 ***
## Year_fct2016                      -0.5272351  0.1585453  -3.325 0.001020 ** 
## Year_fct2017                      -0.7714772  0.2455699  -3.142 0.001890 ** 
## Year_fct2018                      -0.2640250  0.1445395  -1.827 0.068982 .  
## Year_fct2019                      -0.0528143  0.1412730  -0.374 0.708846    
## Flow                              -0.0014995  0.0004555  -3.292 0.001143 ** 
## StationCodeSTTD                   -0.6512369  0.1458385  -4.465 1.23e-05 ***
## StationCodeLIB                    -1.2126820  0.1650472  -7.347 3.09e-12 ***
## StationCodeRVB                    -1.2247878  0.1681936  -7.282 4.60e-12 ***
## lag1                               0.5301472  0.0652367   8.127 2.28e-14 ***
## lag2                              -0.0907567  0.0601615  -1.509 0.132718    
## Year_fct2016:Flow                  0.0017664  0.0007138   2.475 0.014027 *  
## Year_fct2017:Flow                  0.0299815  0.0135893   2.206 0.028306 *  
## Year_fct2018:Flow                  0.0005670  0.0006222   0.911 0.363076    
## Year_fct2019:Flow                  0.0001914  0.0005492   0.348 0.727804    
## Year_fct2016:StationCodeSTTD       0.7826555  0.2073799   3.774 0.000202 ***
## Year_fct2017:StationCodeSTTD       1.0521849  0.3299013   3.189 0.001614 ** 
## Year_fct2018:StationCodeSTTD      -0.0991403  0.1918570  -0.517 0.605809    
## Year_fct2019:StationCodeSTTD      -0.5693466  0.1949134  -2.921 0.003819 ** 
## Year_fct2016:StationCodeLIB        1.0737971  0.2106794   5.097 6.96e-07 ***
## Year_fct2017:StationCodeLIB        0.8020328  0.2761937   2.904 0.004026 ** 
## Year_fct2018:StationCodeLIB       -1.1176227  0.2212095  -5.052 8.60e-07 ***
## Year_fct2019:StationCodeLIB       -0.3518422  0.1901208  -1.851 0.065442 .  
## Year_fct2016:StationCodeRVB        0.6035354  0.1967708   3.067 0.002406 ** 
## Year_fct2017:StationCodeRVB        0.5994637  0.2727511   2.198 0.028908 *  
## Year_fct2018:StationCodeRVB        0.1036231  0.1796164   0.577 0.564534    
## Year_fct2019:StationCodeRVB       -0.3803675  0.1848640  -2.058 0.040705 *  
## Flow:StationCodeSTTD               0.0029446  0.0007232   4.072 6.32e-05 ***
## Flow:StationCodeLIB                0.0021381  0.0006779   3.154 0.001814 ** 
## Flow:StationCodeRVB                0.0012279  0.0006719   1.827 0.068869 .  
## Year_fct2016:Flow:StationCodeSTTD -0.0025056  0.0010605  -2.363 0.018942 *  
## Year_fct2017:Flow:StationCodeSTTD -0.0193301  0.0149430  -1.294 0.197039    
## Year_fct2018:Flow:StationCodeSTTD -0.0006183  0.0009282  -0.666 0.505995    
## Year_fct2019:Flow:StationCodeSTTD -0.0007664  0.0008211  -0.933 0.351515    
## Year_fct2016:Flow:StationCodeLIB  -0.0020124  0.0010295  -1.955 0.051756 .  
## Year_fct2017:Flow:StationCodeLIB  -0.0296984  0.0137441  -2.161 0.031692 *  
## Year_fct2018:Flow:StationCodeLIB   0.0010813  0.0009386   1.152 0.250423    
## Year_fct2019:Flow:StationCodeLIB  -0.0009727  0.0008129  -1.197 0.232628    
## Year_fct2016:Flow:StationCodeRVB  -0.0016700  0.0010336  -1.616 0.107443    
## Year_fct2017:Flow:StationCodeRVB  -0.0286477  0.0137460  -2.084 0.038203 *  
## Year_fct2018:Flow:StationCodeRVB  -0.0007750  0.0008989  -0.862 0.389463    
## Year_fct2019:Flow:StationCodeRVB   0.0001112  0.0008000   0.139 0.889543    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3051 on 242 degrees of freedom
## Multiple R-squared:  0.929,  Adjusted R-squared:  0.917 
## F-statistic: 77.26 on 41 and 242 DF,  p-value: < 2.2e-16
df_chla_c2_lag %>%
  drop_na(Chla_log, lag1, lag2) %>%
  plot_lm_diag(Chla_log, m_lm_flow3_lag2)
## `stat_bin()` using `bins = 30`. Pick better value `binwidth`.

shapiro.test(residuals(m_lm_flow3_lag2))
## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(m_lm_flow3_lag2)
## W = 0.9644, p-value = 1.828e-06
Anova(m_lm_flow3_lag2, type = 3, contrasts = list(topic = contr.sum, sys = contr.sum))
## Anova Table (Type III tests)
## 
## Response: Chla_log
##                            Sum Sq  Df  F value    Pr(>F)    
## (Intercept)                9.7165   1 104.3787 < 2.2e-16 ***
## Year_fct                   1.8414   4   4.9454 0.0007538 ***
## Flow                       1.0089   1  10.8376 0.0011431 ** 
## StationCode                5.8077   3  20.7963 5.065e-12 ***
## lag1                       6.1477   1  66.0404 2.279e-14 ***
## lag2                       0.2118   1   2.2757 0.1327181    
## Year_fct:Flow              1.0592   4   2.8446 0.0247760 *  
## Year_fct:StationCode       7.2387  12   6.4801 5.404e-10 ***
## Flow:StationCode           1.7700   3   6.3378 0.0003741 ***
## Year_fct:Flow:StationCode  2.3512  12   2.1048 0.0172994 *  
## Residuals                 22.5276 242                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The model diagnostics look okay, but not as good as with the GAM models. We’ll use m_lm_flow3_lag2 in the model selection process.

rm(m_lm_flow3, m_lm_flow3_lag1)

Model 5: Linear Model with Flow and 2-way interactions

Let’s try a linear model using all two-way interactions between Year, Weekly Average Flow, and Station. Initially, we’ll run the model without accounting for serial autocorrelation.

Initial Model

m_lm_flow2 <- lm(Chla_log ~ (Year_fct + Flow + StationCode)^2, data = df_chla_c2)

Lets look at the model summary and diagnostics:

summary(m_lm_flow2)
## 
## Call:
## lm(formula = Chla_log ~ (Year_fct + Flow + StationCode)^2, data = df_chla_c2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.99534 -0.22307 -0.03153  0.20115  1.36296 
## 
## Coefficients:
##                                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                   9.6943716  0.1094244  88.594  < 2e-16 ***
## Year_fct2016                 -0.4689162  0.1527744  -3.069 0.002344 ** 
## Year_fct2017                 -0.4662132  0.1463633  -3.185 0.001600 ** 
## Year_fct2018                 -0.3651423  0.1429793  -2.554 0.011156 *  
## Year_fct2019                 -0.0701638  0.1420134  -0.494 0.621627    
## Flow                         -0.0016930  0.0003628  -4.666 4.65e-06 ***
## StationCodeSTTD              -1.0540539  0.1423626  -7.404 1.38e-12 ***
## StationCodeLIB               -1.9989312  0.1385827 -14.424  < 2e-16 ***
## StationCodeRVB               -2.0471430  0.1385528 -14.775  < 2e-16 ***
## Year_fct2016:Flow             0.0001952  0.0004545   0.430 0.667779    
## Year_fct2017:Flow             0.0036316  0.0018887   1.923 0.055456 .  
## Year_fct2018:Flow             0.0004829  0.0004020   1.201 0.230592    
## Year_fct2019:Flow            -0.0003297  0.0003562  -0.926 0.355401    
## Year_fct2016:StationCodeSTTD  0.8736753  0.2027976   4.308 2.24e-05 ***
## Year_fct2017:StationCodeSTTD  0.5774444  0.2236000   2.582 0.010289 *  
## Year_fct2018:StationCodeSTTD -0.3143766  0.1963384  -1.601 0.110400    
## Year_fct2019:StationCodeSTTD -1.0367475  0.1929058  -5.374 1.56e-07 ***
## Year_fct2016:StationCodeLIB   1.4274612  0.1961874   7.276 3.10e-12 ***
## Year_fct2017:StationCodeLIB   0.6143627  0.2080527   2.953 0.003400 ** 
## Year_fct2018:StationCodeLIB  -1.7055558  0.1958498  -8.708  < 2e-16 ***
## Year_fct2019:StationCodeLIB  -0.6916575  0.1940127  -3.565 0.000424 ***
## Year_fct2016:StationCodeRVB   0.6391321  0.1974165   3.237 0.001343 ** 
## Year_fct2017:StationCodeRVB   0.2879781  0.2080017   1.384 0.167249    
## Year_fct2018:StationCodeRVB   0.0149704  0.1882767   0.080 0.936678    
## Year_fct2019:StationCodeRVB  -0.6628422  0.1878057  -3.529 0.000483 ***
## Flow:StationCodeSTTD          0.0035768  0.0003609   9.910  < 2e-16 ***
## Flow:StationCodeLIB           0.0024150  0.0003682   6.560 2.40e-10 ***
## Flow:StationCodeRVB           0.0015361  0.0003566   4.308 2.24e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3865 on 296 degrees of freedom
## Multiple R-squared:  0.8765, Adjusted R-squared:  0.8652 
## F-statistic: 77.81 on 27 and 296 DF,  p-value: < 2.2e-16
df_chla_c2 %>% plot_lm_diag(Chla_log, m_lm_flow2)
## `stat_bin()` using `bins = 30`. Pick better value `binwidth`.

shapiro.test(residuals(m_lm_flow2))
## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(m_lm_flow2)
## W = 0.9802, p-value = 0.0001911
acf(residuals(m_lm_flow2))

Box.test(residuals(m_lm_flow2), lag = 20, type = 'Ljung-Box')
## 
##  Box-Ljung test
## 
## data:  residuals(m_lm_flow2)
## X-squared = 239.47, df = 20, p-value < 2.2e-16

The residuals deviate from a normal distribution according to visual inspection and the Shapiro-Wilk normality test. Also, model definitely has residual autocorrelation as indicated by the ACF plot and the Box-Ljung test.

Model with lag terms

Now, we’ll try to deal with the residual autocorrelation and the non-normal residuals. We’ll run a series of linear models adding 1 and 2 lag terms and compare how well they correct for autocorrelation.

Lag 1

m_lm_flow2_lag1 <- df_chla_c2_lag %>%
  drop_na(Chla_log, lag1) %>%
  lm(Chla_log ~ (Year_fct + Flow + StationCode)^2 + lag1, data = .)

acf(residuals(m_lm_flow2_lag1))

Box.test(residuals(m_lm_flow2_lag1), lag = 20, type = 'Ljung-Box')
## 
##  Box-Ljung test
## 
## data:  residuals(m_lm_flow2_lag1)
## X-squared = 29.04, df = 20, p-value = 0.08698

Lag 2

m_lm_flow2_lag2 <- df_chla_c2_lag %>%
  drop_na(Chla_log, lag1, lag2) %>%
  lm(Chla_log ~ (Year_fct + Flow + StationCode)^2 + lag1 + lag2, data = .)

acf(residuals(m_lm_flow2_lag2))

Box.test(residuals(m_lm_flow2_lag2), lag = 20, type = 'Ljung-Box')
## 
##  Box-Ljung test
## 
## data:  residuals(m_lm_flow2_lag2)
## X-squared = 16.346, df = 20, p-value = 0.6949

The model with 2 lag terms seems to be okay in terms of serial autocorrelation. Let’s use AIC to see how they compare.

Compare Models

AIC(m_lm_flow2, m_lm_flow2_lag1, m_lm_flow2_lag2)
##                 df      AIC
## m_lm_flow2      29 332.1787
## m_lm_flow2_lag1 30 181.2203
## m_lm_flow2_lag2 31 176.4292

Again, it looks like the lag2 model has the best fit according to the AIC values. Let’s take a closer look at that one.

Lag 2 model summary

summary(m_lm_flow2_lag2)
## 
## Call:
## lm(formula = Chla_log ~ (Year_fct + Flow + StationCode)^2 + lag1 + 
##     lag2, data = .)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.20397 -0.15989 -0.01927  0.11946  0.96043 
## 
## Coefficients:
##                                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                   4.4998924  0.4956888   9.078  < 2e-16 ***
## Year_fct2016                 -0.2943646  0.1351127  -2.179 0.030277 *  
## Year_fct2017                 -0.2628023  0.1284627  -2.046 0.041811 *  
## Year_fct2018                 -0.2050746  0.1259745  -1.628 0.104785    
## Year_fct2019                  0.0077718  0.1241524   0.063 0.950135    
## Flow                         -0.0010099  0.0003095  -3.263 0.001252 ** 
## StationCodeSTTD              -0.4756410  0.1317070  -3.611 0.000367 ***
## StationCodeLIB               -0.9610523  0.1524165  -6.305 1.27e-09 ***
## StationCodeRVB               -0.9695374  0.1550963  -6.251 1.71e-09 ***
## lag1                          0.6313481  0.0631115  10.004  < 2e-16 ***
## lag2                         -0.0943825  0.0599941  -1.573 0.116918    
## Year_fct2016:Flow             0.0002841  0.0003772   0.753 0.451999    
## Year_fct2017:Flow             0.0014882  0.0015620   0.953 0.341608    
## Year_fct2018:Flow             0.0004459  0.0003377   1.320 0.187957    
## Year_fct2019:Flow            -0.0001049  0.0002986  -0.351 0.725789    
## Year_fct2016:StationCodeSTTD  0.4590973  0.1797473   2.554 0.011230 *  
## Year_fct2017:StationCodeSTTD  0.2611445  0.1965932   1.328 0.185255    
## Year_fct2018:StationCodeSTTD -0.1222489  0.1703977  -0.717 0.473766    
## Year_fct2019:StationCodeSTTD -0.5428917  0.1736903  -3.126 0.001980 ** 
## Year_fct2016:StationCodeLIB   0.7212841  0.1828336   3.945 0.000103 ***
## Year_fct2017:StationCodeLIB   0.3150986  0.1796582   1.754 0.080657 .  
## Year_fct2018:StationCodeLIB  -0.8442328  0.1911986  -4.415 1.49e-05 ***
## Year_fct2019:StationCodeLIB  -0.3984735  0.1712576  -2.327 0.020766 *  
## Year_fct2016:StationCodeRVB   0.3436105  0.1728820   1.988 0.047936 *  
## Year_fct2017:StationCodeRVB   0.1211150  0.1783372   0.679 0.497671    
## Year_fct2018:StationCodeRVB   0.0466405  0.1619971   0.288 0.773651    
## Year_fct2019:StationCodeRVB  -0.3254659  0.1649252  -1.973 0.049532 *  
## Flow:StationCodeSTTD          0.0018283  0.0003316   5.514 8.62e-08 ***
## Flow:StationCodeLIB           0.0014442  0.0003202   4.511 9.87e-06 ***
## Flow:StationCodeRVB           0.0007276  0.0003025   2.405 0.016874 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.313 on 254 degrees of freedom
## Multiple R-squared:  0.9216, Adjusted R-squared:  0.9127 
## F-statistic:   103 on 29 and 254 DF,  p-value: < 2.2e-16
df_chla_c2_lag %>%
  drop_na(Chla_log, lag1, lag2) %>%
  plot_lm_diag(Chla_log, m_lm_flow2_lag2)
## `stat_bin()` using `bins = 30`. Pick better value `binwidth`.

shapiro.test(residuals(m_lm_flow2_lag2))
## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(m_lm_flow2_lag2)
## W = 0.96256, p-value = 1.027e-06
Anova(m_lm_flow2_lag2, type = 3, contrasts = list(topic = contr.sum, sys = contr.sum))
## Anova Table (Type III tests)
## 
## Response: Chla_log
##                       Sum Sq  Df  F value    Pr(>F)    
## (Intercept)           8.0720   1  82.4112 < 2.2e-16 ***
## Year_fct              0.9908   4   2.5290  0.041132 *  
## Flow                  1.0431   1  10.6493  0.001252 ** 
## StationCode           4.5471   3  15.4744 2.824e-09 ***
## lag1                  9.8020   1 100.0738 < 2.2e-16 ***
## lag2                  0.2424   1   2.4749  0.116918    
## Year_fct:Flow         0.5183   4   1.3228  0.261880    
## Year_fct:StationCode  5.7798  12   4.9174 2.643e-07 ***
## Flow:StationCode      3.4705   3  11.8107 2.875e-07 ***
## Residuals            24.8788 254                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The model diagnostics look somewhat worse than those for the 3-way interaction model. Note that the 2-way interaction between Year and Flow isn’t significant. We’ll use m_lm_flow2_lag2 in the model selection process.

rm(m_lm_flow2, m_lm_flow2_lag1)

Model 6: Linear Model with 2-way interaction between Station and Year but without Flow

We’ll try running a linear model using a two-way interaction between Year and Station but not including flow as a predictor. Initially, we’ll run the model without accounting for serial autocorrelation.

Initial Model

m_lm_cat2 <- lm(Chla_log ~ Year_fct * StationCode, data = df_chla_c2)

Lets look at the model summary and diagnostics:

summary(m_lm_cat2)
## 
## Call:
## lm(formula = Chla_log ~ Year_fct * StationCode, data = df_chla_c2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.35952 -0.25178 -0.05221  0.27546  1.31786 
## 
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                   9.45473    0.11258  83.981  < 2e-16 ***
## Year_fct2016                 -0.40045    0.16480  -2.430 0.015684 *  
## Year_fct2017                 -0.19510    0.15686  -1.244 0.214519    
## Year_fct2018                 -0.26862    0.15473  -1.736 0.083569 .  
## Year_fct2019                 -0.11515    0.15473  -0.744 0.457316    
## StationCodeSTTD              -0.75644    0.15686  -4.822 2.25e-06 ***
## StationCodeLIB               -1.74964    0.15104 -11.584  < 2e-16 ***
## StationCodeRVB               -1.80959    0.15104 -11.980  < 2e-16 ***
## Year_fct2016:StationCodeSTTD  0.84307    0.23146   3.642 0.000317 ***
## Year_fct2017:StationCodeSTTD  0.10665    0.23835   0.447 0.654872    
## Year_fct2018:StationCodeSTTD -0.35014    0.22440  -1.560 0.119719    
## Year_fct2019:StationCodeSTTD -0.88642    0.22033  -4.023 7.25e-05 ***
## Year_fct2016:StationCodeLIB   1.38154    0.22188   6.227 1.58e-09 ***
## Year_fct2017:StationCodeLIB   0.21272    0.21604   0.985 0.325578    
## Year_fct2018:StationCodeLIB  -1.72136    0.22289  -7.723 1.66e-13 ***
## Year_fct2019:StationCodeLIB  -0.62118    0.22038  -2.819 0.005138 ** 
## Year_fct2016:StationCodeRVB   0.57425    0.22355   2.569 0.010683 *  
## Year_fct2017:StationCodeRVB  -0.07752    0.21604  -0.359 0.719984    
## Year_fct2018:StationCodeRVB  -0.06960    0.21295  -0.327 0.744018    
## Year_fct2019:StationCodeRVB  -0.65957    0.21295  -3.097 0.002135 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4503 on 304 degrees of freedom
## Multiple R-squared:  0.8278, Adjusted R-squared:  0.817 
## F-statistic: 76.92 on 19 and 304 DF,  p-value: < 2.2e-16
df_chla_c2 %>% plot_lm_diag(Chla_log, m_lm_cat2)
## `stat_bin()` using `bins = 30`. Pick better value `binwidth`.

shapiro.test(residuals(m_lm_cat2))
## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(m_lm_cat2)
## W = 0.98776, p-value = 0.007767
acf(residuals(m_lm_cat2))

Box.test(residuals(m_lm_cat2), lag = 20, type = 'Ljung-Box')
## 
##  Box-Ljung test
## 
## data:  residuals(m_lm_cat2)
## X-squared = 374.61, df = 20, p-value < 2.2e-16

Besides the Shapiro-Wilk normality test showing that the residuals aren’t normal, the diagnostic plots look pretty good. However, the residuals are autocorrelated.

Model with lag terms

Now, we’ll try to deal with the residual autocorrelation and the non-normal residuals. We’ll run a series of linear models adding 1 and 2 lag terms and compare how well they correct for autocorrelation.

Lag 1

m_lm_cat2_lag1 <- df_chla_c2_lag %>%
  drop_na(Chla_log, lag1) %>%
  lm(Chla_log ~ Year_fct * StationCode + lag1, data = .)

acf(residuals(m_lm_cat2_lag1))

Box.test(residuals(m_lm_cat2_lag1), lag = 20, type = 'Ljung-Box')
## 
##  Box-Ljung test
## 
## data:  residuals(m_lm_cat2_lag1)
## X-squared = 33.232, df = 20, p-value = 0.03182

Lag 2

m_lm_cat2_lag2 <- df_chla_c2_lag %>%
  drop_na(Chla_log, lag1, lag2) %>%
  lm(Chla_log ~ Year_fct * StationCode + lag1 + lag2, data = .)

acf(residuals(m_lm_cat2_lag2))

Box.test(residuals(m_lm_cat2_lag2), lag = 20, type = 'Ljung-Box')
## 
##  Box-Ljung test
## 
## data:  residuals(m_lm_cat2_lag2)
## X-squared = 16.358, df = 20, p-value = 0.6942

The model with 2 lag terms seems to be okay in terms of serial autocorrelation. Let’s use AIC to see how they compare.

Compare Models

AIC(m_lm_cat2, m_lm_cat2_lag1, m_lm_cat2_lag2)
##                df      AIC
## m_lm_cat2      21 423.8688
## m_lm_cat2_lag1 22 212.6171
## m_lm_cat2_lag2 23 201.4048

Again, it looks like the lag2 model has the best fit according to the AIC values. Let’s take a closer look at that one.

Lag 2 model summary

summary(m_lm_cat2_lag2)
## 
## Call:
## lm(formula = Chla_log ~ Year_fct * StationCode + lag1 + lag2, 
##     data = .)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.32499 -0.17269 -0.01809  0.14998  0.85380 
## 
## Coefficients:
##                               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                   3.372162   0.481718   7.000 2.14e-11 ***
## Year_fct2016                 -0.181269   0.131159  -1.382 0.168132    
## Year_fct2017                 -0.077050   0.123280  -0.625 0.532516    
## Year_fct2018                 -0.087196   0.122108  -0.714 0.475808    
## Year_fct2019                  0.004655   0.121586   0.038 0.969491    
## StationCodeSTTD              -0.204739   0.127949  -1.600 0.110768    
## StationCodeLIB               -0.610005   0.145883  -4.181 3.95e-05 ***
## StationCodeRVB               -0.628436   0.148825  -4.223 3.33e-05 ***
## lag1                          0.797550   0.060317  13.223  < 2e-16 ***
## lag2                         -0.158015   0.061978  -2.550 0.011357 *  
## Year_fct2016:StationCodeSTTD  0.330530   0.186247   1.775 0.077111 .  
## Year_fct2017:StationCodeSTTD -0.018475   0.190484  -0.097 0.922808    
## Year_fct2018:StationCodeSTTD -0.126243   0.177047  -0.713 0.476452    
## Year_fct2019:StationCodeSTTD -0.386387   0.177700  -2.174 0.030572 *  
## Year_fct2016:StationCodeLIB   0.523514   0.186858   2.802 0.005463 ** 
## Year_fct2017:StationCodeLIB   0.063625   0.169164   0.376 0.707135    
## Year_fct2018:StationCodeLIB  -0.674882   0.194480  -3.470 0.000608 ***
## Year_fct2019:StationCodeLIB  -0.297316   0.175374  -1.695 0.091202 .  
## Year_fct2016:StationCodeRVB   0.220236   0.177556   1.240 0.215946    
## Year_fct2017:StationCodeRVB  -0.061200   0.169078  -0.362 0.717671    
## Year_fct2018:StationCodeRVB  -0.025173   0.166265  -0.151 0.879774    
## Year_fct2019:StationCodeRVB  -0.281198   0.168751  -1.666 0.096840 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3312 on 262 degrees of freedom
## Multiple R-squared:  0.9095, Adjusted R-squared:  0.9022 
## F-statistic: 125.3 on 21 and 262 DF,  p-value: < 2.2e-16
df_chla_c2_lag %>%
  drop_na(Chla_log, lag1, lag2) %>%
  plot_lm_diag(Chla_log, m_lm_cat2_lag2)
## `stat_bin()` using `bins = 30`. Pick better value `binwidth`.

shapiro.test(residuals(m_lm_cat2_lag2))
## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(m_lm_cat2_lag2)
## W = 0.95081, p-value = 3.589e-08
Anova(m_lm_cat2_lag2, type = 3, contrasts = list(topic = contr.sum, sys = contr.sum))
## Anova Table (Type III tests)
## 
## Response: Chla_log
##                       Sum Sq  Df  F value    Pr(>F)    
## (Intercept)           5.3755   1  49.0040 2.140e-11 ***
## Year_fct              0.3048   4   0.6947  0.596226    
## StationCode           2.5266   3   7.6776 6.165e-05 ***
## lag1                 19.1789   1 174.8376 < 2.2e-16 ***
## lag2                  0.7130   1   6.5002  0.011357 *  
## Year_fct:StationCode  3.5215  12   2.6752  0.002074 ** 
## Residuals            28.7402 262                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The model diagnostics don’t look that great. However, we’ll use m_lm_cat2_lag2 in the model selection process.

rm(m_lm_cat2, m_lm_cat2_lag1)

Model 7: GAM model using smooths for Flow

Finally, we’ll try running a GAM model using smooths for weekly average flow by Station and Year and a smooth term for week number to account for seasonality. Initially, we’ll run the model without accounting for serial autocorrelation.

Initial Model

m_gam_sflow <- gam(
  Chla_log ~ s(Flow, by = StationCode) + s(Flow, by = Year_fct) + Year_fct * StationCode + s(Week, bs = "cc", k = 5),
  data = df_chla_c2,
  method = "REML",
  knots = list(week = c(0, 52))
)

Lets look at the model summary and diagnostics:

summary(m_gam_sflow)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## Chla_log ~ s(Flow, by = StationCode) + s(Flow, by = Year_fct) + 
##     Year_fct * StationCode + s(Week, bs = "cc", k = 5)
## 
## Parametric coefficients:
##                              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                   9.66544    0.08511 113.562  < 2e-16 ***
## Year_fct2016                 -0.48055    0.17967  -2.675 0.007913 ** 
## Year_fct2017                 -0.40595    0.13255  -3.063 0.002405 ** 
## Year_fct2018                 -0.37617    0.11335  -3.319 0.001023 ** 
## Year_fct2019                 -0.15454    0.11391  -1.357 0.175965    
## StationCodeSTTD              -0.86616    0.12060  -7.182 6.04e-12 ***
## StationCodeLIB               -1.93724    0.11667 -16.605  < 2e-16 ***
## StationCodeRVB               -2.01684    0.11368 -17.742  < 2e-16 ***
## Year_fct2016:StationCodeSTTD  0.71843    0.23175   3.100 0.002129 ** 
## Year_fct2017:StationCodeSTTD  0.39824    0.19354   2.058 0.040537 *  
## Year_fct2018:StationCodeSTTD -0.21645    0.16541  -1.309 0.191730    
## Year_fct2019:StationCodeSTTD -0.89987    0.16826  -5.348 1.83e-07 ***
## Year_fct2016:StationCodeLIB   1.30740    0.22874   5.716 2.76e-08 ***
## Year_fct2017:StationCodeLIB   0.60585    0.17751   3.413 0.000736 ***
## Year_fct2018:StationCodeLIB  -1.57969    0.16418  -9.622  < 2e-16 ***
## Year_fct2019:StationCodeLIB  -0.53646    0.16853  -3.183 0.001618 ** 
## Year_fct2016:StationCodeRVB   0.51588    0.22543   2.288 0.022847 *  
## Year_fct2017:StationCodeRVB   0.25699    0.17314   1.484 0.138846    
## Year_fct2018:StationCodeRVB   0.07229    0.15625   0.463 0.643968    
## Year_fct2019:StationCodeRVB  -0.61271    0.15635  -3.919 0.000112 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                               edf    Ref.df      F  p-value    
## s(Flow):StationCodeRD22 0.0002284 0.0004524  0.070    0.996    
## s(Flow):StationCodeSTTD 2.9039150 3.5937947 48.484  < 2e-16 ***
## s(Flow):StationCodeLIB  2.5890173 3.2052194 27.351  < 2e-16 ***
## s(Flow):StationCodeRVB  1.0000126 1.0000249 42.436  < 2e-16 ***
## s(Flow):Year_fct2015    1.0000124 1.0000242 26.236 7.97e-07 ***
## s(Flow):Year_fct2016    6.4233608 7.2168988  7.122  < 2e-16 ***
## s(Flow):Year_fct2017    1.0000032 1.0000064  0.237    0.627    
## s(Flow):Year_fct2018    1.0000114 1.0000226 17.315 4.22e-05 ***
## s(Flow):Year_fct2019    1.0098281 1.0191588 41.238  < 2e-16 ***
## s(Week)                 2.7381672 3.0000000 31.562  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Rank: 103/104
## R-sq.(adj) =  0.908   Deviance explained = 91.9%
## -REML =  142.3  Scale est. = 0.10241   n = 324
appraise(m_gam_sflow)

shapiro.test(residuals(m_gam_sflow))
## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(m_gam_sflow)
## W = 0.98117, p-value = 0.0002968
k.check(m_gam_sflow)
##                         k'          edf   k-index p-value
## s(Flow):StationCodeRD22  9 0.0002283591 1.0470183  0.7850
## s(Flow):StationCodeSTTD  9 2.9039150301 1.0470183  0.7675
## s(Flow):StationCodeLIB   9 2.5890172673 1.0470183  0.7600
## s(Flow):StationCodeRVB   9 1.0000125887 1.0470183  0.7800
## s(Flow):Year_fct2015     9 1.0000123659 1.0470183  0.8125
## s(Flow):Year_fct2016     9 6.4233608063 1.0470183  0.8175
## s(Flow):Year_fct2017     9 1.0000032061 1.0470183  0.8075
## s(Flow):Year_fct2018     9 1.0000113887 1.0470183  0.7400
## s(Flow):Year_fct2019     9 1.0098280510 1.0470183  0.7950
## s(Week)                  3 2.7381671744 0.9647659  0.2575
concurvity(m_gam_sflow, full = FALSE)$worst
##                                 para s(Flow):StationCodeRD22
## para                    1.000000e+00            2.561680e-01
## s(Flow):StationCodeRD22 2.561680e-01            1.000000e+00
## s(Flow):StationCodeSTTD 7.487812e-02            1.101331e-26
## s(Flow):StationCodeLIB  7.679159e-02            9.271762e-27
## s(Flow):StationCodeRVB  8.759218e-02            8.234788e-27
## s(Flow):Year_fct2015    2.252320e-01            4.236437e-01
## s(Flow):Year_fct2016    1.881476e-01            6.035297e-01
## s(Flow):Year_fct2017    1.882716e-01            3.384493e-01
## s(Flow):Year_fct2018    1.937259e-01            3.067056e-01
## s(Flow):Year_fct2019    2.011684e-01            8.259018e-01
## s(Week)                 7.060792e-32            1.433059e-01
##                         s(Flow):StationCodeSTTD s(Flow):StationCodeLIB
## para                               7.487812e-02           7.679159e-02
## s(Flow):StationCodeRD22            1.074363e-26           9.371473e-27
## s(Flow):StationCodeSTTD            1.000000e+00           4.131051e-28
## s(Flow):StationCodeLIB             5.679579e-28           1.000000e+00
## s(Flow):StationCodeRVB             5.735916e-28           4.420274e-28
## s(Flow):Year_fct2015               2.465837e-01           2.573537e-01
## s(Flow):Year_fct2016               1.543428e-01           2.446523e-01
## s(Flow):Year_fct2017               6.714084e-02           1.873644e-01
## s(Flow):Year_fct2018               1.811522e-01           1.938793e-01
## s(Flow):Year_fct2019               2.906021e-01           2.818269e-01
## s(Week)                            5.160478e-02           7.485617e-02
##                         s(Flow):StationCodeRVB s(Flow):Year_fct2015
## para                              8.759218e-02         2.252320e-01
## s(Flow):StationCodeRD22           8.837323e-27         4.236437e-01
## s(Flow):StationCodeSTTD           3.737793e-28         2.465837e-01
## s(Flow):StationCodeLIB            2.333734e-28         2.573537e-01
## s(Flow):StationCodeRVB            1.000000e+00         2.449126e-01
## s(Flow):Year_fct2015              2.449126e-01         1.000000e+00
## s(Flow):Year_fct2016              1.563972e-01         2.043733e-26
## s(Flow):Year_fct2017              1.839605e-01         1.259481e-13
## s(Flow):Year_fct2018              1.810226e-01         1.362544e-26
## s(Flow):Year_fct2019              2.894419e-01         1.200993e-25
## s(Week)                           7.772559e-02         1.478751e-01
##                         s(Flow):Year_fct2016 s(Flow):Year_fct2017
## para                            1.881476e-01         1.882716e-01
## s(Flow):StationCodeRD22         6.035297e-01         3.384493e-01
## s(Flow):StationCodeSTTD         1.543428e-01         6.714084e-02
## s(Flow):StationCodeLIB          2.446523e-01         1.873644e-01
## s(Flow):StationCodeRVB          1.563972e-01         1.839605e-01
## s(Flow):Year_fct2015            2.238503e-26         4.918031e-14
## s(Flow):Year_fct2016            1.000000e+00         1.534799e-14
## s(Flow):Year_fct2017            1.595872e-14         1.000000e+00
## s(Flow):Year_fct2018            4.730280e-27         1.875936e-14
## s(Flow):Year_fct2019            3.535738e-27         1.950144e-14
## s(Week)                         1.636050e-01         1.689131e-01
##                         s(Flow):Year_fct2018 s(Flow):Year_fct2019      s(Week)
## para                            1.937259e-01         2.011684e-01 6.694329e-32
## s(Flow):StationCodeRD22         3.067056e-01         8.259018e-01 1.433059e-01
## s(Flow):StationCodeSTTD         1.811522e-01         2.906021e-01 5.160478e-02
## s(Flow):StationCodeLIB          1.938793e-01         2.818269e-01 7.485617e-02
## s(Flow):StationCodeRVB          1.810226e-01         2.894419e-01 7.772559e-02
## s(Flow):Year_fct2015            1.358798e-26         1.248024e-25 1.478751e-01
## s(Flow):Year_fct2016            6.224435e-27         2.970225e-27 1.636050e-01
## s(Flow):Year_fct2017            5.521515e-14         6.662451e-14 1.689131e-01
## s(Flow):Year_fct2018            1.000000e+00         3.023201e-27 1.057174e-01
## s(Flow):Year_fct2019            3.103856e-27         1.000000e+00 9.582630e-02
## s(Week)                         1.057174e-01         9.582630e-02 1.000000e+00
draw(m_gam_sflow, select = 10, residuals = TRUE, rug = FALSE)

plot(m_gam_sflow, pages = 1, all.terms = TRUE)

acf(residuals(m_gam_sflow))

Box.test(residuals(m_gam_sflow), lag = 20, type = 'Ljung-Box')
## 
##  Box-Ljung test
## 
## data:  residuals(m_gam_sflow)
## X-squared = 95.118, df = 20, p-value = 9.317e-12

The diagnostic plots look really good. However, the residuals are autocorrelated.

Model with lag terms

Now, we’ll try to deal with the residual autocorrelation. We’ll run a series of models adding 1 and 2 lag terms and compare how well they correct for autocorrelation.

Lag 1

m_gam_sflow_lag1 <- gam(
  Chla_log ~ s(Flow, by = StationCode) + s(Flow, by = Year_fct) + Year_fct * StationCode + s(Week, bs = "cc", k = 5) + lag1,
  data = df_chla_c2_lag,
  method = "REML",
  knots = list(week = c(0, 52))
)

acf(residuals(m_gam_sflow_lag1))

Box.test(residuals(m_gam_sflow_lag1), lag = 20, type = 'Ljung-Box')
## 
##  Box-Ljung test
## 
## data:  residuals(m_gam_sflow_lag1)
## X-squared = 29.405, df = 20, p-value = 0.08009

Lag 2

m_gam_sflow_lag2 <- gam(
  Chla_log ~ s(Flow, by = StationCode) + s(Flow, by = Year_fct) + Year_fct * StationCode + s(Week, bs = "cc", k = 5) + lag1 + lag2,
  data = df_chla_c2_lag,
  method = "REML",
  knots = list(week = c(0, 52))
)

acf(residuals(m_gam_sflow_lag2))

Box.test(residuals(m_gam_sflow_lag2), lag = 20, type = 'Ljung-Box')
## 
##  Box-Ljung test
## 
## data:  residuals(m_gam_sflow_lag2)
## X-squared = 27.718, df = 20, p-value = 0.1163

The model with 2 lag terms seems to be okay in terms of serial autocorrelation. Let’s use AIC to see how they compare.

Compare Models

AIC(m_gam_sflow, m_gam_sflow_lag1, m_gam_sflow_lag2)
##                        df      AIC
## m_gam_sflow      42.98590 224.8171
## m_gam_sflow_lag1 37.30077 142.4951
## m_gam_sflow_lag2 41.10788 138.2065

It looks like the lag2 model has the best fit according to the AIC values. Let’s take a closer look at that one.

Lag 2 model summary

summary(m_gam_sflow_lag2)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## Chla_log ~ s(Flow, by = StationCode) + s(Flow, by = Year_fct) + 
##     Year_fct * StationCode + s(Week, bs = "cc", k = 5) + lag1 + 
##     lag2
## 
## Parametric coefficients:
##                              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                   5.46683    0.49950  10.945  < 2e-16 ***
## Year_fct2016                 -0.42630    0.13257  -3.216 0.001475 ** 
## Year_fct2017                 -0.24476    0.13746  -1.781 0.076209 .  
## Year_fct2018                 -0.23054    0.10997  -2.096 0.037066 *  
## Year_fct2019                 -0.05188    0.10981  -0.472 0.637007    
## StationCodeSTTD              -0.45622    0.11968  -3.812 0.000174 ***
## StationCodeLIB               -1.08806    0.14463  -7.523 9.99e-13 ***
## StationCodeRVB               -1.14879    0.14626  -7.854 1.24e-13 ***
## lag1                          0.51328    0.06069   8.457 2.45e-15 ***
## lag2                         -0.08217    0.05658  -1.452 0.147693    
## Year_fct2016:StationCodeSTTD  0.56750    0.17455   3.251 0.001310 ** 
## Year_fct2017:StationCodeSTTD  0.25948    0.18396   1.411 0.159648    
## Year_fct2018:StationCodeSTTD -0.07260    0.15812  -0.459 0.646556    
## Year_fct2019:StationCodeSTTD -0.52630    0.16644  -3.162 0.001762 ** 
## Year_fct2016:StationCodeLIB   0.86208    0.17920   4.811 2.62e-06 ***
## Year_fct2017:StationCodeLIB   0.36319    0.16785   2.164 0.031444 *  
## Year_fct2018:StationCodeLIB  -0.96612    0.17810  -5.425 1.39e-07 ***
## Year_fct2019:StationCodeLIB  -0.33972    0.16491  -2.060 0.040445 *  
## Year_fct2016:StationCodeRVB   0.39689    0.16777   2.366 0.018776 *  
## Year_fct2017:StationCodeRVB   0.12359    0.16387   0.754 0.451467    
## Year_fct2018:StationCodeRVB   0.08717    0.14887   0.586 0.558732    
## Year_fct2019:StationCodeRVB  -0.34776    0.15317  -2.270 0.024042 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                               edf    Ref.df      F  p-value    
## s(Flow):StationCodeRD22 0.0000145 2.791e-05  0.000 0.500000    
## s(Flow):StationCodeSTTD 2.3190987 2.881e+00 17.469  < 2e-16 ***
## s(Flow):StationCodeLIB  2.3556727 2.929e+00 12.983 5.03e-07 ***
## s(Flow):StationCodeRVB  1.0000426 1.000e+00 12.213 0.000563 ***
## s(Flow):Year_fct2015    1.0000705 1.000e+00  8.929 0.003092 ** 
## s(Flow):Year_fct2016    3.0909633 3.829e+00  4.198 0.003170 ** 
## s(Flow):Year_fct2017    1.0000033 1.000e+00  0.146 0.702257    
## s(Flow):Year_fct2018    1.0000217 1.000e+00  3.413 0.065872 .  
## s(Flow):Year_fct2019    1.3738345 1.631e+00 10.097 0.001214 ** 
## s(Week)                 2.5188386 3.000e+00 13.189  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Rank: 105/106
## R-sq.(adj) =  0.927   Deviance explained = 93.6%
## -REML = 95.296  Scale est. = 0.082211  n = 284
appraise(m_gam_sflow_lag2)

shapiro.test(residuals(m_gam_sflow_lag2))
## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(m_gam_sflow_lag2)
## W = 0.96665, p-value = 3.788e-06
k.check(m_gam_sflow_lag2)
##                         k'          edf   k-index p-value
## s(Flow):StationCodeRD22  9 1.450141e-05 1.0740506  0.8850
## s(Flow):StationCodeSTTD  9 2.319099e+00 1.0740506  0.8550
## s(Flow):StationCodeLIB   9 2.355673e+00 1.0740506  0.8575
## s(Flow):StationCodeRVB   9 1.000043e+00 1.0740506  0.8625
## s(Flow):Year_fct2015     9 1.000070e+00 1.0740506  0.9050
## s(Flow):Year_fct2016     9 3.090963e+00 1.0740506  0.8725
## s(Flow):Year_fct2017     9 1.000003e+00 1.0740506  0.8675
## s(Flow):Year_fct2018     9 1.000022e+00 1.0740506  0.8850
## s(Flow):Year_fct2019     9 1.373835e+00 1.0740506  0.8500
## s(Week)                  3 2.518839e+00 0.9636937  0.2450
concurvity(m_gam_sflow_lag2, full = FALSE)$worst
##                                 para s(Flow):StationCodeRD22
## para                    1.000000e+00            2.570357e-01
## s(Flow):StationCodeRD22 2.570357e-01            1.000000e+00
## s(Flow):StationCodeSTTD 6.467291e-02            6.800983e-27
## s(Flow):StationCodeLIB  6.873111e-02            6.780406e-27
## s(Flow):StationCodeRVB  8.179021e-02            5.058301e-27
## s(Flow):Year_fct2015    2.288070e-01            4.243169e-01
## s(Flow):Year_fct2016    1.865535e-01            6.038788e-01
## s(Flow):Year_fct2017    1.866197e-01            3.603111e-01
## s(Flow):Year_fct2018    1.930861e-01            3.083022e-01
## s(Flow):Year_fct2019    2.019358e-01            8.269739e-01
## s(Week)                 5.166483e-31            1.539601e-01
##                         s(Flow):StationCodeSTTD s(Flow):StationCodeLIB
## para                               6.467291e-02           6.873111e-02
## s(Flow):StationCodeRD22            6.832305e-27           5.903932e-27
## s(Flow):StationCodeSTTD            1.000000e+00           3.215191e-28
## s(Flow):StationCodeLIB             3.701507e-28           1.000000e+00
## s(Flow):StationCodeRVB             5.964115e-28           7.386603e-28
## s(Flow):Year_fct2015               2.480268e-01           2.881836e-01
## s(Flow):Year_fct2016               1.530147e-01           2.425054e-01
## s(Flow):Year_fct2017               6.561655e-02           1.890768e-01
## s(Flow):Year_fct2018               1.807200e-01           1.938443e-01
## s(Flow):Year_fct2019               2.900501e-01           2.806141e-01
## s(Week)                            8.315753e-02           1.163488e-01
##                         s(Flow):StationCodeRVB s(Flow):Year_fct2015
## para                              8.179021e-02         2.288070e-01
## s(Flow):StationCodeRD22           5.535822e-27         4.243169e-01
## s(Flow):StationCodeSTTD           3.328124e-28         2.480268e-01
## s(Flow):StationCodeLIB            3.558085e-28         2.881836e-01
## s(Flow):StationCodeRVB            1.000000e+00         2.467403e-01
## s(Flow):Year_fct2015              2.467403e-01         1.000000e+00
## s(Flow):Year_fct2016              1.556821e-01         2.858411e-26
## s(Flow):Year_fct2017              1.858040e-01         1.650521e-12
## s(Flow):Year_fct2018              1.920115e-01         1.077666e-25
## s(Flow):Year_fct2019              2.889471e-01         5.409089e-27
## s(Week)                           1.030716e-01         1.864721e-01
##                         s(Flow):Year_fct2016 s(Flow):Year_fct2017
## para                            1.865535e-01         1.866197e-01
## s(Flow):StationCodeRD22         6.038788e-01         3.603111e-01
## s(Flow):StationCodeSTTD         1.530147e-01         6.561655e-02
## s(Flow):StationCodeLIB          2.425054e-01         1.890768e-01
## s(Flow):StationCodeRVB          1.556821e-01         1.858040e-01
## s(Flow):Year_fct2015            4.157153e-26         1.978233e-12
## s(Flow):Year_fct2016            1.000000e+00         7.668165e-14
## s(Flow):Year_fct2017            3.566287e-15         1.000000e+00
## s(Flow):Year_fct2018            8.938628e-28         3.281103e-13
## s(Flow):Year_fct2019            1.689988e-27         4.718115e-13
## s(Week)                         2.843514e-01         1.958644e-01
##                         s(Flow):Year_fct2018 s(Flow):Year_fct2019      s(Week)
## para                            1.930861e-01         2.019358e-01 4.097543e-31
## s(Flow):StationCodeRD22         3.083022e-01         8.269739e-01 1.539601e-01
## s(Flow):StationCodeSTTD         1.807200e-01         2.900501e-01 8.315753e-02
## s(Flow):StationCodeLIB          1.938443e-01         2.806141e-01 1.163488e-01
## s(Flow):StationCodeRVB          1.920115e-01         2.889471e-01 1.030716e-01
## s(Flow):Year_fct2015            1.118029e-25         4.642002e-27 1.864721e-01
## s(Flow):Year_fct2016            7.576510e-28         1.173662e-27 2.843514e-01
## s(Flow):Year_fct2017            2.499652e-13         2.531812e-13 1.958644e-01
## s(Flow):Year_fct2018            1.000000e+00         2.888272e-27 1.117878e-01
## s(Flow):Year_fct2019            2.569263e-27         1.000000e+00 1.212121e-01
## s(Week)                         1.117878e-01         1.212121e-01 1.000000e+00
draw(m_gam_sflow_lag2, select = 1:4, residuals = TRUE, rug = FALSE)

draw(m_gam_sflow_lag2, select = 5:9, residuals = TRUE, rug = FALSE)

draw(m_gam_sflow_lag2, select = 10, residuals = TRUE, rug = FALSE)

anova(m_gam_sflow_lag2)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## Chla_log ~ s(Flow, by = StationCode) + s(Flow, by = Year_fct) + 
##     Year_fct * StationCode + s(Week, bs = "cc", k = 5) + lag1 + 
##     lag2
## 
## Parametric Terms:
##                      df      F  p-value
## Year_fct              4  3.423  0.00957
## StationCode           3 23.715 1.61e-13
## lag1                  1 71.518 2.45e-15
## lag2                  1  2.109  0.14769
## Year_fct:StationCode 12  7.112 4.15e-11
## 
## Approximate significance of smooth terms:
##                               edf    Ref.df      F  p-value
## s(Flow):StationCodeRD22 1.450e-05 2.791e-05  0.000 0.500000
## s(Flow):StationCodeSTTD 2.319e+00 2.881e+00 17.469  < 2e-16
## s(Flow):StationCodeLIB  2.356e+00 2.929e+00 12.983 5.03e-07
## s(Flow):StationCodeRVB  1.000e+00 1.000e+00 12.213 0.000563
## s(Flow):Year_fct2015    1.000e+00 1.000e+00  8.929 0.003092
## s(Flow):Year_fct2016    3.091e+00 3.829e+00  4.198 0.003170
## s(Flow):Year_fct2017    1.000e+00 1.000e+00  0.146 0.702257
## s(Flow):Year_fct2018    1.000e+00 1.000e+00  3.413 0.065872
## s(Flow):Year_fct2019    1.374e+00 1.631e+00 10.097 0.001214
## s(Week)                 2.519e+00 3.000e+00 13.189  < 2e-16

The model diagnostics look a little worse than with the initial model but they still look pretty good. We’ll use m_gam_sflow_lag2 in the model selection process.

rm(m_gam_sflow, m_gam_sflow_lag1)

Model selection

Now we’ll compare the seven candidate models with AIC to select the one with the best fit. As a summary, here are the 7 models we are comparing:

  • Model 1 - m_gam_flow3_lag2 - GAM 3-way interactions with s(Week)
    Formula: Chla_log ~ Year_fct * Flow * StationCode + s(Week, bs = “cc”, k = 5) + lag1 + lag2

  • Model 2 - m_gam_flow2_lag2 - GAM 2-way interactions with s(Week)
    Formula: Chla_log ~ (Year_fct + Flow + StationCode)^2 + s(Week, bs = “cc”, k = 5) + lag1 + lag2

  • Model 3 - m_gam_cat2_lag2 - GAM 2-way interaction between Station and Year with s(Week) but without Flow
    Formula: Chla_log ~ Year_fct * StationCode + s(Week, bs = “cc”, k = 5) + lag1 + lag2

  • Model 4 - m_lm_flow3_lag2 - LM 3-way interactions without seasonal term
    Formula: Chla_log ~ Year_fct * Flow * StationCode + lag1 + lag2

  • Model 5 - m_lm_flow2_lag2 - LM 2-way interactions without seasonal term
    Formula: Chla_log ~ (Year_fct + Flow + StationCode)^2 + lag1 + lag2

  • Model 6 - m_lm_cat2_lag2 - LM 2-way interaction between Station and Year but without Flow and seasonal term
    Formula: Chla_log ~ Year_fct * StationCode + lag1 + lag2

  • Model 7 - m_gam_sflow_lag2 - GAM using smooths for Flow with s(Week)
    Formula: Chla_log ~ s(Flow, by = StationCode) + s(Flow, by = Year_fct) + Year_fct * StationCode + s(Week, bs = “cc”, k = 5) + lag1 + lag2
# AIC values
df_m_aic <-
  AIC(
    m_gam_flow3_lag2,
    m_gam_flow2_lag2,
    m_gam_cat2_lag2,
    m_lm_flow3_lag2,
    m_lm_flow2_lag2,
    m_lm_cat2_lag2,
    m_gam_sflow_lag2
  ) %>%
  as_tibble(rownames = "Model") %>%
  mutate(Model_Number = 1:7, .before = Model)

# BIC values
df_m_bic <-
  BIC(
    m_gam_flow3_lag2,
    m_gam_flow2_lag2,
    m_gam_cat2_lag2,
    m_lm_flow3_lag2,
    m_lm_flow2_lag2,
    m_lm_cat2_lag2,
    m_gam_sflow_lag2
  ) %>%
  as_tibble(rownames = "Model")

# Combine AIC and BIC and calculate differences from lowest value
df_m_aic_bic <-
  left_join(df_m_aic, df_m_bic, by = join_by(Model, df)) %>%
  mutate(across(c(AIC, BIC), ~ .x - min(.x), .names = "{.col}_delta")) %>%
  select(starts_with("Model"), df, starts_with("AIC"), starts_with("BIC"))

# Sort by AIC
df_m_aic_bic %>% arrange(AIC)
## # A tibble: 7 × 7
##   Model_Number Model               df   AIC AIC_delta   BIC BIC_delta
##          <int> <chr>            <dbl> <dbl>     <dbl> <dbl>     <dbl>
## 1            1 m_gam_flow3_lag2  45.7  136.      0     303.      33.1
## 2            7 m_gam_sflow_lag2  41.1  138.      2.11  288.      18.4
## 3            2 m_gam_flow2_lag2  33.7  147.     10.8   270.       0  
## 4            4 m_lm_flow3_lag2   43    172.     36.1   329.      59.3
## 5            5 m_lm_flow2_lag2   31    176.     40.3   290.      19.7
## 6            3 m_gam_cat2_lag2   25.6  190.     53.7   283.      13.4
## 7            6 m_lm_cat2_lag2    23    201.     65.3   285.      15.5

Export the AIC/BIC table.

df_m_aic_bic %>%
  arrange(AIC) %>%
  mutate(across(where(is.numeric), ~ paste0(formatC(.x, digits = 1, format = "f"), "##"))) %>%
  write_csv(here("manuscript_synthesis/results/tables/chl_aic_weekly_models.csv"))

According to AIC, Model 1 (GAM 3-way interactions with s(Week)) was the model with the best fit. BIC preferred Model 2 (GAM 2-way interactions with s(Week)). BIC prefers simpler models, so I’m inclined to go with Model 2, especially since we really don’t need to look at the 3-way interaction. Before we proceed with either Models 1 or 2, let’s revisit the model diagnostics for each and take a closer look at how the back-transformed fitted values from the models match the observed values.

Model 1 diagnostics

appraise(m_gam_flow3_lag2)

shapiro.test(residuals(m_gam_flow3_lag2))
## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(m_gam_flow3_lag2)
## W = 0.97077, p-value = 1.533e-05
k.check(m_gam_flow3_lag2)
##         k'      edf   k-index p-value
## s(Week)  3 2.492179 0.9184442   0.065
draw(m_gam_flow3_lag2, select = 1, residuals = TRUE, rug = FALSE)

plot(m_gam_flow3_lag2, pages = 1, all.terms = TRUE)

anova(m_gam_flow3_lag2)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## Chla_log ~ Year_fct * Flow * StationCode + s(Week, bs = "cc", 
##     k = 5) + lag1 + lag2
## 
## Parametric Terms:
##                           df      F  p-value
## Year_fct                   4  6.153 9.94e-05
## Flow                       1  9.978  0.00179
## StationCode                3 30.183  < 2e-16
## lag1                       1 49.587 2.00e-11
## lag2                       1  3.281  0.07134
## Year_fct:Flow              4  2.202  0.06948
## Year_fct:StationCode      12  8.790 7.17e-14
## Flow:StationCode           3 10.049 2.92e-06
## Year_fct:Flow:StationCode 12  2.584  0.00304
## 
## Approximate significance of smooth terms:
##           edf Ref.df     F p-value
## s(Week) 2.492  3.000 11.75  <2e-16

Model 1 observed vs fitted values

df_chla_c2_lag2 <- df_chla_c2_lag %>% drop_na(lag1, lag2)

df_m_gam_flow3_lag2_fit <- df_chla_c2_lag2 %>%
  fitted_values(m_gam_flow3_lag2, data = .) %>%
  mutate(fitted_bt = exp(.fitted) / 1000)

plt_m_gam_flow3_lag2_fit <- df_m_gam_flow3_lag2_fit %>%
  ggplot(aes(x = fitted_bt, y = Chla)) +
  geom_point() +
  geom_abline(slope = 1, intercept = 0, color = "red") +
  theme_bw() +
  labs(
    x = "Back-transformed Fitted Values",
    y = "Observed Values"
  )

plt_m_gam_flow3_lag2_fit

Let’s group by station.

plt_m_gam_flow3_lag2_fit + facet_wrap(vars(StationCode), scales = "free")

Now, group by year.

plt_m_gam_flow3_lag2_fit + facet_wrap(vars(Year_fct), scales = "free")

Model 2 diagnostics

appraise(m_gam_flow2_lag2)

shapiro.test(residuals(m_gam_flow2_lag2))
## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(m_gam_flow2_lag2)
## W = 0.96771, p-value = 5.379e-06
k.check(m_gam_flow2_lag2)
##         k'      edf  k-index p-value
## s(Week)  3 2.433331 0.953256    0.22
draw(m_gam_flow2_lag2, select = 1, residuals = TRUE, rug = FALSE)

plot(m_gam_flow2_lag2, pages = 1, all.terms = TRUE)

anova(m_gam_flow2_lag2)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## Chla_log ~ (Year_fct + Flow + StationCode)^2 + s(Week, bs = "cc", 
##     k = 5) + lag1 + lag2
## 
## Parametric Terms:
##                      df      F  p-value
## Year_fct              4  3.083   0.0167
## Flow                  1  6.004   0.0150
## StationCode           3 22.412 6.81e-13
## lag1                  1 78.401  < 2e-16
## lag2                  1  3.169   0.0763
## Year_fct:Flow         4  1.761   0.1372
## Year_fct:StationCode 12  6.748 1.63e-10
## Flow:StationCode      3 16.615 7.02e-10
## 
## Approximate significance of smooth terms:
##           edf Ref.df     F  p-value
## s(Week) 2.433  3.000 10.13 9.24e-07

Model 2 observed vs fitted values

df_m_gam_flow2_lag2_fit <- df_chla_c2_lag2 %>%
  fitted_values(m_gam_flow2_lag2, data = .) %>%
  mutate(fitted_bt = exp(.fitted) / 1000)

plt_m_gam_flow2_lag2_fit <- df_m_gam_flow2_lag2_fit %>%
  ggplot(aes(x = fitted_bt, y = Chla)) +
  geom_point() +
  geom_abline(slope = 1, intercept = 0, color = "red") +
  theme_bw() +
  labs(
    x = "Back-transformed Fitted Values",
    y = "Observed Values"
  )

plt_m_gam_flow2_lag2_fit

Let’s group by station.

plt_m_gam_flow2_lag2_fit + facet_wrap(vars(StationCode), scales = "free")

Now, group by year.

plt_m_gam_flow2_lag2_fit + facet_wrap(vars(Year_fct), scales = "free")

Everything looks pretty decent with both models. Not perfect, but pretty good given the number of data points. Note that variability does increase as the chlorophyll values increase. Before proceeding with either Model 1 or 2, let’s look more closely at their results.

Model 1 Results

Effect of Flow by Station

# Calculate min and max flows for each station to narrow down x-axis in the plot
df_chla_flow_sta_summ <- df_chla_c2_lag2 %>%
  summarize(
    Flow_min = min(Flow),
    Flow_max = max(Flow),
    .by = c(StationCode)
  ) %>%
  mutate(
    Flow_buffer = (Flow_max - Flow_min) * 0.05,
    Flow_min = Flow_min - Flow_buffer,
    Flow_max = Flow_max + Flow_buffer
  )

# Calculate effects of flow on chlorophyll for each station holding the
  # non-focal variables constant - marginal effects/adjusted predictions
df_gam_flow3_sta_eff <-
  as.data.frame(
    predict_response(
      m_gam_flow3_lag2,
      terms = c("Flow", "StationCode"),
      margin = "marginalmeans"
    ),
    terms_to_colnames = TRUE
  ) %>%
  as_tibble() %>%
  # Narrow down range of flow values for each station
  left_join(df_chla_flow_sta_summ, by = join_by(StationCode)) %>%
  filter(Flow >= Flow_min & Flow <= Flow_max) %>%
  transmute(
    StationCode,
    Flow,
    # Back calculate model fits and confidence levels
    across(c(predicted, conf.low, conf.high), ~ exp(.x) / 1000)
  )

# Create effects plot
plt_gam_flow3_sta_eff <- df_gam_flow3_sta_eff %>%
  ggplot(aes(x = Flow, y = predicted)) +
  geom_point(
    data = df_chla_c2_lag2,
    aes(y = Chla, color = Year_fct),
    alpha = 0.6
  ) +
  geom_line(linewidth = 1) +
  geom_ribbon(aes(ymin = conf.low, ymax = conf.high), alpha = 0.25) +
  facet_wrap(vars(StationCode), scales = "free") +
  theme_bw() +
  labs(
    x = "Flow (cfs)",
    y = expression(Chlorophyll~Fluoresence~(mu*g~L^{-1}))
  ) +
  scale_x_continuous(breaks = breaks_extended(6)) +
  scale_color_viridis_d(name = "Year", option = "C")

plt_gam_flow3_sta_eff

There is a lot of uncertainty in the model results at the highest flows. This seems problematic.

Effect of Flow by Year

# Calculate min and max flows for each station to narrow down x-axis in the plot
df_chla_flow_yr_summ <- df_chla_c2_lag2 %>%
  summarize(
    Flow_min = min(Flow),
    Flow_max = max(Flow),
    .by = c(Year_fct)
  ) %>%
  mutate(
    Flow_buffer = (Flow_max - Flow_min) * 0.05,
    Flow_min = Flow_min - Flow_buffer,
    Flow_max = Flow_max + Flow_buffer
  )

# Calculate effects of flow on chlorophyll for each year holding the
  # non-focal variables constant - marginal effects/adjusted predictions
df_gam_flow3_yr_eff <-
  as.data.frame(
    predict_response(
      m_gam_flow3_lag2,
      terms = c("Flow", "Year_fct"),
      margin = "marginalmeans"
    ),
    terms_to_colnames = TRUE
  ) %>%
  as_tibble() %>%
  # Narrow down range of flow values for each station
  left_join(df_chla_flow_yr_summ, by = join_by(Year_fct)) %>%
  filter(Flow >= Flow_min & Flow <= Flow_max) %>%
  transmute(
    Year_fct,
    Flow,
    # Back calculate model fits and confidence levels
    across(c(predicted, conf.low, conf.high), ~ exp(.x) / 1000)
  )

# Create effects plot
plt_gam_flow3_yr_eff <- df_gam_flow3_yr_eff %>%
  ggplot(aes(x = Flow, y = predicted)) +
  geom_point(
    data = df_chla_c2_lag2,
    aes(y = Chla, color = StationCode),
    alpha = 0.6
  ) +
  geom_line(linewidth = 1) +
  geom_ribbon(aes(ymin = conf.low, ymax = conf.high), alpha = 0.25) +
  facet_wrap(vars(Year_fct), scales = "free") +
  theme_bw() +
  labs(
    x = "Flow (cfs)",
    y = expression(Chlorophyll~Fluoresence~(mu*g~L^{-1}))
  ) +
  scale_x_continuous(breaks = breaks_extended(6)) +
  scale_color_viridis_d(name = "Station", option = "C") +
  theme(
    legend.margin = margin(0, 0, 0, 0),
    legend.position = "inside",
    legend.position.inside = c(0.8, 0.3)
  )

plt_gam_flow3_yr_eff

These results look reasonable.

Station by Year Contrasts

# Estimated marginal means for station by year
em_gam3_sta_yr <- emmeans(m_gam_flow3_lag2, ~ StationCode | Year_fct)

# Tukey post-hoc contrasts
pairs(em_gam3_sta_yr)
## Year_fct = 2015:
##  contrast    estimate     SE  df t.ratio p.value
##  RD22 - STTD   0.5216 0.1200 240   4.332  0.0001
##  RD22 - LIB    1.2589 0.1490 240   8.452  <.0001
##  RD22 - RVB    1.3394 0.1520 240   8.795  <.0001
##  STTD - LIB    0.7372 0.1180 240   6.231  <.0001
##  STTD - RVB    0.8177 0.1210 240   6.763  <.0001
##  LIB - RVB     0.0805 0.0988 240   0.814  0.8477
## 
## Year_fct = 2016:
##  contrast    estimate     SE  df t.ratio p.value
##  RD22 - STTD  -0.1468 0.1200 240  -1.220  0.6149
##  RD22 - LIB    0.2205 0.1170 240   1.884  0.2377
##  RD22 - RVB    0.8395 0.1370 240   6.124  <.0001
##  STTD - LIB    0.3674 0.1140 240   3.225  0.0078
##  STTD - RVB    0.9863 0.1360 240   7.258  <.0001
##  LIB - RVB     0.6190 0.1190 240   5.198  <.0001
## 
## Year_fct = 2017:
##  contrast    estimate     SE  df t.ratio p.value
##  RD22 - STTD   0.8366 0.8440 240   0.991  0.7547
##  RD22 - LIB    2.1026 0.6950 240   3.027  0.0145
##  RD22 - RVB    2.3475 0.7000 240   3.351  0.0051
##  STTD - LIB    1.2660 0.5910 240   2.142  0.1430
##  STTD - RVB    1.5109 0.5940 240   2.544  0.0559
##  LIB - RVB     0.2448 0.2890 240   0.847  0.8321
## 
## Year_fct = 2018:
##  contrast    estimate     SE  df t.ratio p.value
##  RD22 - STTD   0.6507 0.1260 240   5.145  <.0001
##  RD22 - LIB    2.4232 0.2250 240  10.783  <.0001
##  RD22 - RVB    1.2589 0.1480 240   8.503  <.0001
##  STTD - LIB    1.7725 0.1820 240   9.760  <.0001
##  STTD - RVB    0.6082 0.1190 240   5.095  <.0001
##  LIB - RVB    -1.1643 0.1440 240  -8.111  <.0001
## 
## Year_fct = 2019:
##  contrast    estimate     SE  df t.ratio p.value
##  RD22 - STTD   1.2249 0.1440 240   8.506  <.0001
##  RD22 - LIB    1.7182 0.1750 240   9.812  <.0001
##  RD22 - RVB    1.7690 0.1760 240  10.036  <.0001
##  STTD - LIB    0.4934 0.1220 240   4.060  0.0004
##  STTD - RVB    0.5441 0.1170 240   4.653  <.0001
##  LIB - RVB     0.0508 0.1110 240   0.458  0.9680
## 
## P value adjustment: tukey method for comparing a family of 4 estimates
# Create table of contrasts and convert it to a tibble for plot
df_gam3_sta_yr <- em_gam3_sta_yr %>%
  cld(sort = FALSE, Letters = letters) %>%
  as_tibble() %>%
  mutate(
    group = str_remove_all(.group, fixed(" ")),
    # back transform log-transformed results
    across(c(emmean, lower.CL, upper.CL), ~ exp(.x) / 1000)
  ) %>%
  # Add min and max values of observed data to the Tukey post-hoc results and
    # calculate vertical positioning of letters
  left_join(
    df_chla_c2_lag2 %>%
      summarize(
        max_val = max(Chla),
        min_val = min(Chla),
        .by = Year_fct
    ),
    by = join_by(Year_fct)
  ) %>%
  mutate(max_val = if_else(upper.CL > max_val, upper.CL, max_val)) %>%
  group_by(Year_fct) %>%
  mutate(max_val = max(max_val)) %>%
  ungroup() %>%
  mutate(y_pos = max_val + (max_val - min_val) / 10) %>%
  select(
    StationCode,
    Year_fct,
    emmean,
    lower.CL,
    upper.CL,
    group,
    y_pos
  )

# Create boxplot showing Tukey post-hoc results
plt_gam3_sta_yr <- df_gam3_sta_yr %>%
  ggplot(
    aes(
      x = StationCode,
      y = emmean,
      ymin = lower.CL,
      ymax = upper.CL
    )
  ) +
  geom_boxplot(
    data = df_chla_c2_lag2,
    aes(x = StationCode, y = Chla),
    inherit.aes = FALSE
  ) +
  geom_crossbar(color = "grey82", fill = "grey", alpha = 0.7, linewidth = 0.1) +
  geom_point(color = "red") +
  geom_text(aes(y = y_pos, label = group), size = 3.5) +
  facet_wrap(vars(Year_fct), scales = "free_y") +
  xlab("Station") +
  ylab(expression(Chlorophyll~Fluoresence~(mu*g~L^{-1}))) +
  theme_bw()

plt_gam3_sta_yr

Many of the contrasts are significant. The model under predicts some values, and 2017 has a lot of uncertainty.

Year by Station Contrasts

# Estimated marginal means for year by station
em_gam3_yr_sta <- emmeans(m_gam_flow3_lag2, ~ Year_fct | StationCode)

# Tukey post-hoc contrasts
pairs(em_gam3_yr_sta)
## StationCode = RD22:
##  contrast                    estimate    SE  df t.ratio p.value
##  Year_fct2015 - Year_fct2016   0.5138 0.129 240   3.993  0.0008
##  Year_fct2015 - Year_fct2017  -0.9042 0.658 240  -1.374  0.6451
##  Year_fct2015 - Year_fct2018   0.2795 0.116 240   2.405  0.1174
##  Year_fct2015 - Year_fct2019   0.0685 0.115 240   0.594  0.9758
##  Year_fct2016 - Year_fct2017  -1.4180 0.658 240  -2.154  0.2009
##  Year_fct2016 - Year_fct2018  -0.2343 0.118 240  -1.991  0.2736
##  Year_fct2016 - Year_fct2019  -0.4453 0.120 240  -3.709  0.0024
##  Year_fct2017 - Year_fct2018   1.1837 0.658 240   1.799  0.3765
##  Year_fct2017 - Year_fct2019   0.9726 0.656 240   1.483  0.5745
##  Year_fct2018 - Year_fct2019  -0.2111 0.108 240  -1.952  0.2928
## 
## StationCode = STTD:
##  contrast                    estimate    SE  df t.ratio p.value
##  Year_fct2015 - Year_fct2016  -0.1547 0.114 240  -1.359  0.6547
##  Year_fct2015 - Year_fct2017  -0.5892 0.561 240  -1.050  0.8318
##  Year_fct2015 - Year_fct2018   0.4085 0.117 240   3.494  0.0051
##  Year_fct2015 - Year_fct2019   0.7717 0.124 240   6.239  <.0001
##  Year_fct2016 - Year_fct2017  -0.4346 0.561 240  -0.775  0.9375
##  Year_fct2016 - Year_fct2018   0.5632 0.128 240   4.402  0.0002
##  Year_fct2016 - Year_fct2019   0.9263 0.137 240   6.741  <.0001
##  Year_fct2017 - Year_fct2018   0.9978 0.562 240   1.774  0.3911
##  Year_fct2017 - Year_fct2019   1.3609 0.564 240   2.412  0.1156
##  Year_fct2018 - Year_fct2019   0.3631 0.117 240   3.107  0.0179
## 
## StationCode = LIB:
##  contrast                    estimate    SE  df t.ratio p.value
##  Year_fct2015 - Year_fct2016  -0.5246 0.115 240  -4.547  0.0001
##  Year_fct2015 - Year_fct2017  -0.0604 0.218 240  -0.277  0.9987
##  Year_fct2015 - Year_fct2018   1.4438 0.158 240   9.138  <.0001
##  Year_fct2015 - Year_fct2019   0.5278 0.117 240   4.516  0.0001
##  Year_fct2016 - Year_fct2017   0.4641 0.223 240   2.077  0.2333
##  Year_fct2016 - Year_fct2018   1.9684 0.200 240   9.849  <.0001
##  Year_fct2016 - Year_fct2019   1.0523 0.146 240   7.196  <.0001
##  Year_fct2017 - Year_fct2018   1.5042 0.256 240   5.871  <.0001
##  Year_fct2017 - Year_fct2019   0.5882 0.230 240   2.556  0.0820
##  Year_fct2018 - Year_fct2019  -0.9160 0.140 240  -6.545  <.0001
## 
## StationCode = RVB:
##  contrast                    estimate    SE  df t.ratio p.value
##  Year_fct2015 - Year_fct2016   0.0139 0.106 240   0.132  0.9999
##  Year_fct2015 - Year_fct2017   0.1039 0.218 240   0.477  0.9894
##  Year_fct2015 - Year_fct2018   0.1991 0.102 240   1.957  0.2907
##  Year_fct2015 - Year_fct2019   0.4981 0.110 240   4.544  0.0001
##  Year_fct2016 - Year_fct2017   0.0900 0.221 240   0.408  0.9942
##  Year_fct2016 - Year_fct2018   0.1851 0.109 240   1.696  0.4384
##  Year_fct2016 - Year_fct2019   0.4842 0.117 240   4.145  0.0005
##  Year_fct2017 - Year_fct2018   0.0951 0.219 240   0.435  0.9925
##  Year_fct2017 - Year_fct2019   0.3942 0.222 240   1.772  0.3924
##  Year_fct2018 - Year_fct2019   0.2990 0.105 240   2.850  0.0379
## 
## P value adjustment: tukey method for comparing a family of 5 estimates
# Create table of contrasts and convert it to a tibble for plot
df_gam3_yr_sta <- em_gam3_yr_sta %>%
  cld(sort = FALSE, Letters = letters) %>%
  as_tibble() %>%
  mutate(
    group = str_remove_all(.group, fixed(" ")),
    # back transform log-transformed results
    across(c(emmean, lower.CL, upper.CL), ~ exp(.x) / 1000)
  ) %>%
  # Add min and max values of observed data to the Tukey post-hoc results and
    # calculate vertical positioning of letters
  left_join(
    df_chla_c2_lag2 %>%
      summarize(
        max_val = max(Chla),
        min_val = min(Chla),
        .by = StationCode
    ),
    by = join_by(StationCode)
  ) %>%
  mutate(max_val = if_else(upper.CL > max_val, upper.CL, max_val)) %>%
  group_by(StationCode) %>%
  mutate(max_val = max(max_val)) %>%
  ungroup() %>%
  mutate(y_pos = max_val + (max_val - min_val) / 10) %>%
  select(
    StationCode,
    Year_fct,
    emmean,
    lower.CL,
    upper.CL,
    group,
    y_pos
  )

# Create boxplot showing Tukey post-hoc results
plt_gam3_yr_sta <- df_gam3_yr_sta %>%
  ggplot(
    aes(
      x = Year_fct,
      y = emmean,
      ymin = lower.CL,
      ymax = upper.CL
    )
  ) +
  geom_boxplot(
    data = df_chla_c2_lag2,
    aes(x = Year_fct, y = Chla),
    inherit.aes = FALSE
  ) +
  geom_crossbar(color = "grey82", fill = "grey", alpha = 0.7, linewidth = 0.1) +
  geom_point(color = "red") +
  geom_text(aes(y = y_pos, label = group), size = 3.5) +
  facet_wrap(vars(StationCode), scales = "free_y") +
  xlab("Year") +
  ylab(expression(Chlorophyll~Fluoresence~(mu*g~L^{-1}))) +
  theme_bw()

plt_gam3_yr_sta

Again, the model under predicts some values, and 2017 has a lot of uncertainty.

Model 2 Results

Effect of Flow by Station

# Calculate effects of flow on chlorophyll for each station holding the
  # non-focal variables constant - marginal effects/adjusted predictions
df_gam_flow2_sta_eff <-
  as.data.frame(
    predict_response(
      m_gam_flow2_lag2,
      terms = c("Flow", "StationCode"),
      margin = "marginalmeans"
    ),
    terms_to_colnames = TRUE
  ) %>%
  as_tibble() %>%
  # Narrow down range of flow values for each station
  left_join(df_chla_flow_sta_summ, by = join_by(StationCode)) %>%
  filter(Flow >= Flow_min & Flow <= Flow_max) %>%
  transmute(
    StationCode,
    Flow,
    # Back calculate model fits and confidence levels
    across(c(predicted, conf.low, conf.high), ~ exp(.x) / 1000)
  )

# Create effects plot
plt_gam_flow2_sta_eff <- df_gam_flow2_sta_eff %>%
  ggplot(aes(x = Flow, y = predicted)) +
  geom_point(
    data = df_chla_c2_lag2,
    aes(y = Chla, color = Year_fct),
    alpha = 0.6
  ) +
  geom_line(linewidth = 1) +
  geom_ribbon(aes(ymin = conf.low, ymax = conf.high), alpha = 0.25) +
  facet_wrap(vars(StationCode), scales = "free") +
  theme_bw() +
  labs(
    x = "Flow (cfs)",
    y = expression(Chlorophyll~Fluoresence~(mu*g~L^{-1}))
  ) +
  scale_x_continuous(breaks = breaks_extended(6)) +
  scale_color_viridis_d(name = "Year", option = "C")

plt_gam_flow2_sta_eff

There is much less uncertainty in the model results at the highest flows when compared to model 1. These results seem more reasonable.

Effect of Flow by Year

# Calculate effects of flow on chlorophyll for each year holding the
  # non-focal variables constant - marginal effects/adjusted predictions
df_gam_flow2_yr_eff <-
  as.data.frame(
    predict_response(
      m_gam_flow2_lag2,
      terms = c("Flow", "Year_fct"),
      margin = "marginalmeans"
    ),
    terms_to_colnames = TRUE
  ) %>%
  as_tibble() %>%
  # Narrow down range of flow values for each station
  left_join(df_chla_flow_yr_summ, by = join_by(Year_fct)) %>%
  filter(Flow >= Flow_min & Flow <= Flow_max) %>%
  transmute(
    Year_fct,
    Flow,
    # Back calculate model fits and confidence levels
    across(c(predicted, conf.low, conf.high), ~ exp(.x) / 1000)
  )

# Create effects plot
plt_gam_flow2_yr_eff <- df_gam_flow2_yr_eff %>%
  ggplot(aes(x = Flow, y = predicted)) +
  geom_point(
    data = df_chla_c2_lag2,
    aes(y = Chla, color = StationCode),
    alpha = 0.6
  ) +
  geom_line(linewidth = 1) +
  geom_ribbon(aes(ymin = conf.low, ymax = conf.high), alpha = 0.25) +
  facet_wrap(vars(Year_fct), scales = "free") +
  theme_bw() +
  labs(
    x = "Flow (cfs)",
    y = expression(Chlorophyll~Fluoresence~(mu*g~L^{-1}))
  ) +
  scale_x_continuous(breaks = breaks_extended(6)) +
  scale_color_viridis_d(name = "Station", option = "C") +
  theme(
    legend.margin = margin(0, 0, 0, 0),
    legend.position = "inside",
    legend.position.inside = c(0.8, 0.3)
  )

plt_gam_flow2_yr_eff

These results look reasonable, but not interesting.

Station by Year Contrasts

# Estimated marginal means for station by year
em_gam2_sta_yr <- emmeans(m_gam_flow2_lag2, ~ StationCode | Year_fct)

# Tukey post-hoc contrasts
pairs(em_gam2_sta_yr)
## Year_fct = 2015:
##  contrast     estimate     SE  df t.ratio p.value
##  RD22 - STTD  0.402971 0.1200 252   3.351  0.0051
##  RD22 - LIB   1.028625 0.1450 252   7.116  <.0001
##  RD22 - RVB   1.092147 0.1480 252   7.382  <.0001
##  STTD - LIB   0.625654 0.1180 252   5.313  <.0001
##  STTD - RVB   0.689176 0.1200 252   5.730  <.0001
##  LIB - RVB    0.063522 0.0994 252   0.639  0.9193
## 
## Year_fct = 2016:
##  contrast     estimate     SE  df t.ratio p.value
##  RD22 - STTD -0.081479 0.1230 252  -0.664  0.9103
##  RD22 - LIB   0.235797 0.1200 252   1.967  0.2035
##  RD22 - RVB   0.760663 0.1400 252   5.435  <.0001
##  STTD - LIB   0.317275 0.1170 252   2.713  0.0357
##  STTD - RVB   0.842141 0.1360 252   6.178  <.0001
##  LIB - RVB    0.524866 0.1210 252   4.349  0.0001
## 
## Year_fct = 2017:
##  contrast     estimate     SE  df t.ratio p.value
##  RD22 - STTD  0.178777 0.1480 252   1.209  0.6218
##  RD22 - LIB   0.699527 0.1470 252   4.767  <.0001
##  RD22 - RVB   0.983559 0.1570 252   6.257  <.0001
##  STTD - LIB   0.520751 0.1430 252   3.636  0.0019
##  STTD - RVB   0.804782 0.1520 252   5.306  <.0001
##  LIB - RVB    0.284032 0.1130 252   2.513  0.0604
## 
## Year_fct = 2018:
##  contrast     estimate     SE  df t.ratio p.value
##  RD22 - STTD  0.529999 0.1260 252   4.206  0.0002
##  RD22 - LIB   1.988455 0.2110 252   9.418  <.0001
##  RD22 - RVB   1.031200 0.1430 252   7.188  <.0001
##  STTD - LIB   1.458457 0.1760 252   8.298  <.0001
##  STTD - RVB   0.501202 0.1220 252   4.120  0.0003
##  LIB - RVB   -0.957255 0.1420 252  -6.751  <.0001
## 
## Year_fct = 2019:
##  contrast     estimate     SE  df t.ratio p.value
##  RD22 - STTD  1.036144 0.1400 252   7.381  <.0001
##  RD22 - LIB   1.478459 0.1690 252   8.769  <.0001
##  RD22 - RVB   1.479432 0.1680 252   8.831  <.0001
##  STTD - LIB   0.442315 0.1230 252   3.582  0.0023
##  STTD - RVB   0.443289 0.1180 252   3.767  0.0012
##  LIB - RVB    0.000974 0.1140 252   0.009  1.0000
## 
## P value adjustment: tukey method for comparing a family of 4 estimates
# Create table of contrasts and convert it to a tibble for plot
df_gam2_sta_yr <- em_gam2_sta_yr %>%
  cld(sort = FALSE, Letters = letters) %>%
  as_tibble() %>%
  mutate(
    group = str_remove_all(.group, fixed(" ")),
    # back transform log-transformed results
    across(c(emmean, lower.CL, upper.CL), ~ exp(.x) / 1000)
  ) %>%
  # Add min and max values of observed data to the Tukey post-hoc results and
    # calculate vertical positioning of letters
  left_join(
    df_chla_c2_lag2 %>%
      summarize(
        max_val = max(Chla),
        min_val = min(Chla),
        .by = Year_fct
    ),
    by = join_by(Year_fct)
  ) %>%
  mutate(max_val = if_else(upper.CL > max_val, upper.CL, max_val)) %>%
  group_by(Year_fct) %>%
  mutate(max_val = max(max_val)) %>%
  ungroup() %>%
  mutate(y_pos = max_val + (max_val - min_val) / 10) %>%
  select(
    StationCode,
    Year_fct,
    emmean,
    lower.CL,
    upper.CL,
    group,
    y_pos
  )

# Create boxplot showing Tukey post-hoc results
plt_gam2_sta_yr <- df_gam2_sta_yr %>%
  ggplot(
    aes(
      x = StationCode,
      y = emmean,
      ymin = lower.CL,
      ymax = upper.CL
    )
  ) +
  geom_boxplot(
    data = df_chla_c2_lag2,
    aes(x = StationCode, y = Chla),
    inherit.aes = FALSE
  ) +
  geom_crossbar(color = "grey82", fill = "grey", alpha = 0.7, linewidth = 0.1) +
  geom_point(color = "red") +
  geom_text(aes(y = y_pos, label = group), size = 3.5) +
  facet_wrap(vars(Year_fct), scales = "free_y") +
  xlab("Station") +
  ylab(expression(Chlorophyll~Fluoresence~(mu*g~L^{-1}))) +
  theme_bw()

plt_gam2_sta_yr

Many of the contrasts are significant. The model under predicts RD22, but the uncertainty looks reasonable for all values.

Year by Station Contrasts

# Estimated marginal means for year by station
em_gam2_yr_sta <- emmeans(m_gam_flow2_lag2, ~ Year_fct | StationCode)

# Tukey post-hoc contrasts
pairs(em_gam2_yr_sta)
## StationCode = RD22:
##  contrast                    estimate    SE  df t.ratio p.value
##  Year_fct2015 - Year_fct2016   0.3531 0.124 252   2.855  0.0373
##  Year_fct2015 - Year_fct2017   0.1997 0.140 252   1.428  0.6100
##  Year_fct2015 - Year_fct2018   0.2099 0.113 252   1.856  0.3443
##  Year_fct2015 - Year_fct2019   0.0116 0.112 252   0.103  1.0000
##  Year_fct2016 - Year_fct2017  -0.1534 0.139 252  -1.104  0.8042
##  Year_fct2016 - Year_fct2018  -0.1432 0.116 252  -1.233  0.7324
##  Year_fct2016 - Year_fct2019  -0.3415 0.118 252  -2.902  0.0326
##  Year_fct2017 - Year_fct2018   0.0102 0.134 252   0.076  1.0000
##  Year_fct2017 - Year_fct2019  -0.1881 0.135 252  -1.396  0.6307
##  Year_fct2018 - Year_fct2019  -0.1983 0.107 252  -1.851  0.3467
## 
## StationCode = STTD:
##  contrast                    estimate    SE  df t.ratio p.value
##  Year_fct2015 - Year_fct2016  -0.1313 0.118 252  -1.117  0.7972
##  Year_fct2015 - Year_fct2017  -0.0245 0.192 252  -0.128  0.9999
##  Year_fct2015 - Year_fct2018   0.3369 0.120 252   2.810  0.0422
##  Year_fct2015 - Year_fct2019   0.6447 0.125 252   5.156  <.0001
##  Year_fct2016 - Year_fct2017   0.1068 0.193 252   0.554  0.9814
##  Year_fct2016 - Year_fct2018   0.4682 0.131 252   3.586  0.0037
##  Year_fct2016 - Year_fct2019   0.7761 0.138 252   5.616  <.0001
##  Year_fct2017 - Year_fct2018   0.3614 0.199 252   1.817  0.3660
##  Year_fct2017 - Year_fct2019   0.6693 0.206 252   3.252  0.0113
##  Year_fct2018 - Year_fct2019   0.3078 0.120 252   2.564  0.0804
## 
## StationCode = LIB:
##  contrast                    estimate    SE  df t.ratio p.value
##  Year_fct2015 - Year_fct2016  -0.4397 0.116 252  -3.805  0.0017
##  Year_fct2015 - Year_fct2017  -0.1294 0.174 252  -0.744  0.9459
##  Year_fct2015 - Year_fct2018   1.1697 0.154 252   7.603  <.0001
##  Year_fct2015 - Year_fct2019   0.4614 0.119 252   3.865  0.0013
##  Year_fct2016 - Year_fct2017   0.3103 0.181 252   1.717  0.4251
##  Year_fct2016 - Year_fct2018   1.6094 0.192 252   8.389  <.0001
##  Year_fct2016 - Year_fct2019   0.9011 0.146 252   6.170  <.0001
##  Year_fct2017 - Year_fct2018   1.2991 0.216 252   6.002  <.0001
##  Year_fct2017 - Year_fct2019   0.5908 0.190 252   3.102  0.0181
##  Year_fct2018 - Year_fct2019  -0.7083 0.139 252  -5.096  <.0001
## 
## StationCode = RVB:
##  contrast                    estimate    SE  df t.ratio p.value
##  Year_fct2015 - Year_fct2016   0.0216 0.108 252   0.201  0.9996
##  Year_fct2015 - Year_fct2017   0.0911 0.174 252   0.525  0.9848
##  Year_fct2015 - Year_fct2018   0.1489 0.104 252   1.437  0.6042
##  Year_fct2015 - Year_fct2019   0.3989 0.111 252   3.601  0.0035
##  Year_fct2016 - Year_fct2017   0.0695 0.178 252   0.390  0.9951
##  Year_fct2016 - Year_fct2018   0.1273 0.112 252   1.135  0.7876
##  Year_fct2016 - Year_fct2019   0.3772 0.119 252   3.183  0.0141
##  Year_fct2017 - Year_fct2018   0.0578 0.176 252   0.329  0.9975
##  Year_fct2017 - Year_fct2019   0.3078 0.180 252   1.711  0.4289
##  Year_fct2018 - Year_fct2019   0.2499 0.108 252   2.315  0.1434
## 
## P value adjustment: tukey method for comparing a family of 5 estimates
# Create table of contrasts and convert it to a tibble for plot
df_gam2_yr_sta <- em_gam2_yr_sta %>%
  cld(sort = FALSE, Letters = letters) %>%
  as_tibble() %>%
  mutate(
    group = str_remove_all(.group, fixed(" ")),
    # back transform log-transformed results
    across(c(emmean, lower.CL, upper.CL), ~ exp(.x) / 1000)
  ) %>%
  # Add min and max values of observed data to the Tukey post-hoc results and
    # calculate vertical positioning of letters
  left_join(
    df_chla_c2_lag2 %>%
      summarize(
        max_val = max(Chla),
        min_val = min(Chla),
        .by = StationCode
    ),
    by = join_by(StationCode)
  ) %>%
  mutate(max_val = if_else(upper.CL > max_val, upper.CL, max_val)) %>%
  group_by(StationCode) %>%
  mutate(max_val = max(max_val)) %>%
  ungroup() %>%
  mutate(y_pos = max_val + (max_val - min_val) / 10) %>%
  select(
    StationCode,
    Year_fct,
    emmean,
    lower.CL,
    upper.CL,
    group,
    y_pos
  )

# Create boxplot showing Tukey post-hoc results
plt_gam2_yr_sta <- df_gam2_yr_sta %>%
  ggplot(
    aes(
      x = Year_fct,
      y = emmean,
      ymin = lower.CL,
      ymax = upper.CL
    )
  ) +
  geom_boxplot(
    data = df_chla_c2_lag2,
    aes(x = Year_fct, y = Chla),
    inherit.aes = FALSE
  ) +
  geom_crossbar(color = "grey82", fill = "grey", alpha = 0.7, linewidth = 0.1) +
  geom_point(color = "red") +
  geom_text(aes(y = y_pos, label = group), size = 3.5) +
  facet_wrap(vars(StationCode), scales = "free_y") +
  xlab("Year") +
  ylab(expression(Chlorophyll~Fluoresence~(mu*g~L^{-1}))) +
  theme_bw()

plt_gam2_yr_sta

Again, the model under predicts RD22, but the uncertainty looks reasonable for all values.

Conclusions

After looking at the results more closely from Models 1 and 2, I think we should use Model 2 (GAM 2-way interactions with s(Week)) because it’s the simpler model, supported by BIC, and the uncertainty around model predictions is less than with Model 1.