Some material adapted from: https://ourcodingclub.github.io/tutorials/mixed-models/#six
As we discussed in the powerpoint, mixed models are linear models where some of the factors are “fixed” or “explanatory” variables that you want to study. Are variables are known as “random” or “grouping” variables that you need to account for, but you don’t really care about them.
Additionally, the data for our random effect is just a sample of all the possibilities: with unlimited time and funding we might have sampled every cage possible, every school in the country, every chocolate in the box), but we usually tend to generalise results to a whole population based on representative sampling. We don’t care about estimating how much better pupils in school A have done compared to pupils in school B, but we know that their respective teachers might be a reason why their scores would be different, and we’d like to know how much variation is attributable to this when we predict scores for pupils in school Z. For the fish, we don’t really care whether cage 1 is better than cage 2, but we need to know what the cage effect is.
Let’s try this out with data on smelt cages.
Our study questions: * Does site impact fish growth? * Does season impact fish growth?
cagedata = read_csv("SmeltCageData.csv")
## Rows: 1162 Columns: 9
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (6): Deployment, Period, Site, ID, Cage, CageRep
## dbl (3): FL_cm, Weight_g, K
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
str(cagedata)
## spc_tbl_ [1,162 × 9] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ Deployment: chr [1:1162] "Winter" "Winter" "Winter" "Winter" ...
## $ Period : chr [1:1162] "Post" "Post" "Post" "Post" ...
## $ Site : chr [1:1162] "DWSC" "DWSC" "DWSC" "DWSC" ...
## $ ID : chr [1:1162] "Winter_122" "Winter_123" "Winter_124" "Winter_125" ...
## $ FL_cm : num [1:1162] 6.2 6.1 6 6 6.6 6.9 6.2 6.2 6.5 6.4 ...
## $ Weight_g : num [1:1162] 1.2 1.01 1.44 0.91 1.8 1.86 1.37 1.48 1.33 1.32 ...
## $ Cage : chr [1:1162] "B" "B" "B" "B" ...
## $ K : num [1:1162] 0.504 0.445 0.667 0.421 0.626 0.566 0.575 0.621 0.484 0.504 ...
## $ CageRep : chr [1:1162] "Winter_DWSC_B" "Winter_DWSC_B" "Winter_DWSC_B" "Winter_DWSC_B" ...
## - attr(*, "spec")=
## .. cols(
## .. Deployment = col_character(),
## .. Period = col_character(),
## .. Site = col_character(),
## .. ID = col_character(),
## .. FL_cm = col_double(),
## .. Weight_g = col_double(),
## .. Cage = col_character(),
## .. K = col_double(),
## .. CageRep = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
#Let's take a look at some of these variables
table(cagedata$Deployment, cagedata$Cage)
##
## A B B3 B4 B5 C Cage A Cage B Cage C D
## Fall 162 167 30 30 30 159 0 0 0 0
## Summer 0 0 29 30 29 0 39 39 48 0
## Winter 0 124 0 0 0 122 0 0 0 124
table(cagedata$Deployment, cagedata$Site)
##
## DWSC FCCL RV SM Yolo
## Fall 0 90 161 166 161
## Summer 0 88 126 0 0
## Winter 178 0 192 0 0
table(cagedata$Deployment, cagedata$CageRep)
##
## Fall_FCCL_B3 Fall_FCCL_B4 Fall_FCCL_B5 Fall_RV_A Fall_RV_B Fall_RV_C
## Fall 30 30 30 54 54 53
## Summer 0 0 0 0 0 0
## Winter 0 0 0 0 0 0
##
## Fall_SM_A Fall_SM_B Fall_SM_C Fall_Yolo_A Fall_Yolo_B Fall_Yolo_C
## Fall 56 58 52 52 55 54
## Summer 0 0 0 0 0 0
## Winter 0 0 0 0 0 0
##
## Summer_FCCL_B3 Summer_FCCL_B4 Summer_FCCL_B5 Summer_RV_Cage A
## Fall 0 0 0 0
## Summer 29 30 29 39
## Winter 0 0 0 0
##
## Summer_RV_Cage B Summer_RV_Cage C Winter_DWSC_B Winter_DWSC_C
## Fall 0 0 0 0
## Summer 39 48 0 0
## Winter 0 0 60 58
##
## Winter_DWSC_D Winter_RV_B Winter_RV_C Winter_RV_D
## Fall 0 0 0 0
## Summer 0 0 0 0
## Winter 60 64 64 64
table(cagedata$Site, cagedata$Cage)
##
## A B B3 B4 B5 C Cage A Cage B Cage C D
## DWSC 0 60 0 0 0 58 0 0 0 60
## FCCL 0 0 59 60 59 0 0 0 0 0
## RV 54 118 0 0 0 117 39 39 48 64
## SM 56 58 0 0 0 52 0 0 0 0
## Yolo 52 55 0 0 0 54 0 0 0 0
In this case, cage is nested within Site and Deployment. Each cage designation only makes sense within a deployment and a site. Cage A at Rio Vista in summer is not the same as cage A in Suisun in winter.
Site is crossed with Season. Rio Vista is still Rio Vista whether it is winter or summer.
This is obviously an unbalanced design, and it’s going to be pretty hard to tease anything apart, but we’re going to give it a try.
Before we can model anything, we need to see what the data look like and assess whether we need any transformations.
#histogram of forklength
ggplot(cagedata, aes(x = FL_cm)) + geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 4 rows containing non-finite values (`stat_bin()`).
#remarkably normal
#histogram of condition factor
ggplot(cagedata, aes(x = K)) + geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 4 rows containing non-finite values (`stat_bin()`).
#also looks ggood
I like seeing whether I can pick out trends in the raw data that I can model before I get started.
ggplot(cagedata, aes(x = Site, y = FL_cm)) + geom_boxplot()+
facet_wrap(~Deployment)
## Warning: Removed 4 rows containing non-finite values (`stat_boxplot()`).
From this boxplot, it looks like we might see a difference in fish length between Rio Vista and FCCL in the fall, but maybe not in other seasons. Also, Fall is lower than other seasons.
But how do we show this statistically?
We know fish within a cage are not independent replicates, so we could just average the length within each cage
avesmelt = group_by(cagedata, Site, Deployment, Cage) %>%
summarize(FL = mean(FL_cm, na.rm =T))
## `summarise()` has grouped output by 'Site', 'Deployment'. You can override
## using the `.groups` argument.
lm1 = lm(FL ~ Site + Deployment, data = avesmelt)
summary(lm1)
##
## Call:
## lm(formula = FL ~ Site + Deployment, data = avesmelt)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.297099 -0.055674 0.008967 0.070923 0.259165
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.956921 0.138702 42.948 < 2e-16 ***
## SiteFCCL 0.006845 0.144870 0.047 0.9629
## SiteRV -0.053123 0.118286 -0.449 0.6590
## SiteSM -0.070752 0.161969 -0.437 0.6677
## SiteYolo -0.047196 0.161969 -0.291 0.7743
## DeploymentSummer -0.855155 0.083641 -10.224 1.12e-08 ***
## DeploymentWinter 0.212869 0.110646 1.924 0.0713 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1449 on 17 degrees of freedom
## Multiple R-squared: 0.9177, Adjusted R-squared: 0.8886
## F-statistic: 31.59 on 6 and 17 DF, p-value: 2.588e-08
plot(lm1)
However, when we average all the values of fish within a cage, we lose a lot of information, which is a shame.
A better way, would be to include a random effect of cage (nested within site).
m1 = lmer(FL_cm ~ Site + Deployment + (1|Site/Cage), data = cagedata)
summary(m1)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: FL_cm ~ Site + Deployment + (1 | Site/Cage)
## Data: cagedata
##
## REML criterion at convergence: 1863.3
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.4031 -0.6274 0.0352 0.6226 3.9207
##
## Random effects:
## Groups Name Variance Std.Dev.
## Cage:Site (Intercept) 0.006519 0.08074
## Site (Intercept) 0.052714 0.22960
## Residual 0.283572 0.53251
## Number of obs: 1158, groups: Cage:Site, 19; Site, 5
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 6.023e+00 2.453e-01 1.500e-08 24.554 1.0000
## SiteFCCL -7.512e-02 3.415e-01 1.408e-08 -0.220 1.0000
## SiteRV -1.155e-01 3.353e-01 1.309e-08 -0.344 1.0000
## SiteSM -1.379e-01 3.417e-01 1.412e-08 -0.404 1.0000
## SiteYolo -1.134e-01 3.418e-01 1.413e-08 -0.332 1.0000
## DeploymentSummer -8.277e-01 5.979e-02 4.290e+01 -13.842 <2e-16 ***
## DeploymentWinter 1.465e-01 6.076e-02 1.348e+02 2.410 0.0173 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) StFCCL SiteRV SiteSM SiteYl DplymS
## SiteFCCL -0.713
## SiteRV -0.712 0.514
## SiteSM -0.718 0.512 0.511
## SiteYolo -0.718 0.512 0.511 0.515
## DplymntSmmr -0.064 -0.041 -0.033 0.046 0.046
## DplymntWntr -0.248 0.156 0.103 0.178 0.178 0.258
Comparing the outputs from the two models, the fixed effects of the mixed model looks a lot like the effects of the model with the averages within each cage, however the mixed model also gives us information about the random effects, which is helpful in understanding our data and planing for future studies.
The output from the linear model tells us the significant difference
between the intercept (DWSC and Fall) versus teh other levels of the
factor. If we want to know all the pairwise comparisons, we need to do a
post-hoc. The emmeans
package has ways of doing a bunch of
different post-hocs.
library(emmeans)
#first let's look at the regular linear model
emmeans(lm1, pairwise ~ Deployment)
## $emmeans
## Deployment emmean SE df lower.CL upper.CL
## Fall 5.92 0.0495 17 5.82 6.03
## Summer 5.07 0.0724 17 4.92 5.22
## Winter 6.14 0.0836 17 5.96 6.31
##
## Results are averaged over the levels of: Site
## Confidence level used: 0.95
##
## $contrasts
## contrast estimate SE df t.ratio p.value
## Fall - Summer 0.855 0.0836 17 10.224 <.0001
## Fall - Winter -0.213 0.1106 17 -1.924 0.1624
## Summer - Winter -1.068 0.1106 17 -9.653 <.0001
##
## Results are averaged over the levels of: Site
## P value adjustment: tukey method for comparing a family of 3 estimates
emmeans(lm1, pairwise ~ Site)
## $emmeans
## Site emmean SE df lower.CL upper.CL
## DWSC 5.74 0.1080 17 5.52 5.97
## FCCL 5.75 0.0683 17 5.61 5.89
## RV 5.69 0.0483 17 5.59 5.79
## SM 5.67 0.0996 17 5.46 5.88
## Yolo 5.70 0.0996 17 5.49 5.91
##
## Results are averaged over the levels of: Deployment
## Confidence level used: 0.95
##
## $contrasts
## contrast estimate SE df t.ratio p.value
## DWSC - FCCL -0.00684 0.1449 17 -0.047 1.0000
## DWSC - RV 0.05312 0.1183 17 0.449 0.9908
## DWSC - SM 0.07075 0.1620 17 0.437 0.9917
## DWSC - Yolo 0.04720 0.1620 17 0.291 0.9983
## FCCL - RV 0.05997 0.0836 17 0.717 0.9497
## FCCL - SM 0.07760 0.1106 17 0.701 0.9534
## FCCL - Yolo 0.05404 0.1106 17 0.488 0.9874
## RV - SM 0.01763 0.1106 17 0.159 0.9998
## RV - Yolo -0.00593 0.1106 17 -0.054 1.0000
## SM - Yolo -0.02356 0.1183 17 -0.199 0.9996
##
## Results are averaged over the levels of: Deployment
## P value adjustment: tukey method for comparing a family of 5 estimates
Now let’s look at the mixed model.
emmeans(m1, pairwise ~ Deployment)
## $emmeans
## Deployment emmean SE df lower.CL upper.CL
## Fall 5.93 0.108 3204 5.72 6.15
## Summer 5.11 0.117 834 4.88 5.34
## Winter 6.08 0.115 1449 5.86 6.31
##
## Results are averaged over the levels of: Site
## Degrees-of-freedom method: kenward-roger
## Confidence level used: 0.95
##
## $contrasts
## contrast estimate SE df t.ratio p.value
## Fall - Summer 0.828 0.0615 92.8 13.462 <.0001
## Fall - Winter -0.146 0.0621 265.7 -2.358 0.0498
## Summer - Winter -0.974 0.0760 52.9 -12.814 <.0001
##
## Results are averaged over the levels of: Site
## Degrees-of-freedom method: kenward-roger
## P value adjustment: tukey method for comparing a family of 3 estimates
emmeans(m1, pairwise ~ Site)
## $emmeans
## Site emmean SE df lower.CL upper.CL
## DWSC 5.80 0.241 2268 5.32 6.27
## FCCL 5.72 0.239 2717 5.25 6.19
## RV 5.68 0.233 13533 5.22 6.14
## SM 5.66 0.240 2761 5.19 6.13
## Yolo 5.68 0.240 2768 5.21 6.15
##
## Results are averaged over the levels of: Deployment
## Degrees-of-freedom method: kenward-roger
## Confidence level used: 0.95
##
## $contrasts
## contrast estimate SE df t.ratio p.value
## DWSC - FCCL 0.07512 0.342 2211 0.220 0.9995
## DWSC - RV 0.11549 0.336 4047 0.344 0.9970
## DWSC - SM 0.13791 0.342 2553 0.403 0.9944
## DWSC - Yolo 0.11344 0.342 2556 0.332 0.9974
## FCCL - RV 0.04037 0.334 5361 0.121 1.0000
## FCCL - SM 0.06278 0.338 2709 0.186 0.9997
## FCCL - Yolo 0.03831 0.338 2712 0.113 1.0000
## RV - SM 0.02241 0.335 4965 0.067 1.0000
## RV - Yolo -0.00206 0.335 4971 -0.006 1.0000
## SM - Yolo -0.02447 0.337 2907 -0.073 1.0000
##
## Results are averaged over the levels of: Deployment
## Degrees-of-freedom method: kenward-roger
## P value adjustment: tukey method for comparing a family of 5 estimates
They look pretty darn similar. However, we can use the variance of the random effects from the origional model output to plan our study design for next time.
When you are conducting an experiment, like fish in cages, your fixed and random effects are included in your study design. However, when you are dealing with samples from the environment, you might have to make some choices about which variables to treat as fixed and which to treat as random.
For this demo, we are going to use the Fall Midwater Trawl data from 2000-2010. There is a lot of data, and it’s going to be easier if we do this on just 10 years.
If you haven’t already downloaded the data, grab it here: https://filelib.wildlife.ca.gov/Public/TownetFallMidwaterTrawl/FMWT%20Data/FMWT%201967-2022%20Catch%20Matrix_updated.zip
FMWT = read_csv("FMWT 1967-2022 Catch Matrix_updated.csv")
## Rows: 29244 Columns: 142
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): SampleTimeStart
## dbl (140): Year, SurveyNumber, StationCode, index, StationLat, StationLong,...
## date (1): SampleDate
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
FMWT90 = filter(FMWT, Year>1999, Year <2011)
str(FMWT90)
## spc_tbl_ [5,613 × 142] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ Year : num [1:5613] 2000 2000 2000 2000 2000 2000 2000 2000 2000 2000 ...
## $ SampleDate : Date[1:5613], format: "2000-01-07" "2000-01-07" ...
## $ SurveyNumber : num [1:5613] 7 7 7 7 7 7 7 7 7 7 ...
## $ StationCode : num [1:5613] 804 806 807 808 809 810 811 812 813 814 ...
## $ index : num [1:5613] 1 1 1 1 1 1 1 1 1 1 ...
## $ StationLat : num [1:5613] 38 38 38 38 38.1 ...
## $ StationLong : num [1:5613] -122 -122 -122 -122 -122 ...
## $ StartLat : num [1:5613] NA NA NA NA NA NA NA NA NA NA ...
## $ StartLong : num [1:5613] NA NA NA NA NA NA NA NA NA NA ...
## $ EndLat : num [1:5613] NA NA NA NA NA NA NA NA NA NA ...
## $ EndLong : num [1:5613] NA NA NA NA NA NA NA NA NA NA ...
## $ SampleTimeStart : chr [1:5613] "10:35" "10:12" "10:02" "09:47" ...
## $ DepthBottom : num [1:5613] 35 40 35 35 35 40 50 40 40 40 ...
## $ WaterTemperature : num [1:5613] 9.4 9.4 9.3 9.1 8.9 8.7 8.7 8.7 8.6 8.6 ...
## $ BottomTemperature : num [1:5613] NA NA NA NA NA NA NA NA NA NA ...
## $ ConductivityTop : num [1:5613] 1080 890 701 549 411 355 335 289 256 192 ...
## $ ConductivityBottom : num [1:5613] 1088 927 NA NA NA ...
## $ Turbidity : num [1:5613] NA NA NA NA NA NA NA NA NA NA ...
## $ Secchi : num [1:5613] 1.19 1.24 1.3 1.3 1.3 1.3 1.3 1.3 1.3 1.28 ...
## $ MeterStart : num [1:5613] 324790 313760 302515 0 281130 ...
## $ MeterEnd : num [1:5613] 336597 324790 313760 10072 292959 ...
## $ MeterDiff : num [1:5613] 11807 11030 11245 10072 11829 ...
## $ Volume : num [1:5613] 3395 3172 3233 2896 3401 ...
## $ TideCode : num [1:5613] 2 2 2 2 2 2 2 2 4 4 ...
## $ TowDirectionCode : num [1:5613] 1 1 1 1 1 1 1 1 2 2 ...
## $ WeatherCode : num [1:5613] NA NA NA NA NA NA NA NA NA NA ...
## $ Microcystis : num [1:5613] NA NA NA NA NA NA NA NA NA NA ...
## $ WaveCode : num [1:5613] NA NA NA NA NA NA NA NA NA NA ...
## $ Aequorea spp : num [1:5613] NA NA NA NA NA NA NA NA NA NA ...
## $ American Shad : num [1:5613] 0 0 0 0 0 0 0 0 0 1 ...
## $ Arrow Goby : num [1:5613] 0 0 0 0 0 0 0 0 0 0 ...
## $ Bat Ray : num [1:5613] 0 0 0 0 0 0 0 0 0 0 ...
## $ Bay Goby : num [1:5613] 0 0 0 0 0 0 0 0 0 0 ...
## $ Bay Pipefish : num [1:5613] 0 0 0 0 0 0 0 0 0 0 ...
## $ Big Skate : num [1:5613] 0 0 0 0 0 0 0 0 0 0 ...
## $ Bigscale Logperch : num [1:5613] 0 0 0 0 0 0 0 0 0 0 ...
## $ Black Crappie : num [1:5613] 0 0 0 0 0 0 0 0 0 0 ...
## $ Black Perch : num [1:5613] 0 0 0 0 0 0 0 0 0 0 ...
## $ Bluegill : num [1:5613] 0 0 0 0 0 0 0 0 0 0 ...
## $ Broadnose Sevengill Shark: num [1:5613] 0 0 0 0 0 0 0 0 0 0 ...
## $ Brown Bullhead : num [1:5613] 0 0 0 0 0 0 0 0 0 0 ...
## $ Brown Smoothhound : num [1:5613] 0 0 0 0 0 0 0 0 0 0 ...
## $ California Grunion : num [1:5613] 0 0 0 0 0 0 0 0 0 0 ...
## $ California Halibut : num [1:5613] 0 0 0 0 0 0 0 0 0 0 ...
## $ California Tonguefish : num [1:5613] 0 0 0 0 0 0 0 0 0 0 ...
## $ Chameleon Goby : num [1:5613] 0 0 0 0 0 0 0 0 0 0 ...
## $ Channel Catfish : num [1:5613] 0 0 0 0 0 0 0 0 0 0 ...
## $ Cheekspot Goby : num [1:5613] 0 0 0 0 0 0 0 0 0 0 ...
## $ Chinook Salmon : num [1:5613] 0 0 0 0 0 0 0 0 0 0 ...
## $ Chrysaora fuscensens : num [1:5613] 0 0 0 0 0 0 0 0 0 0 ...
## $ Comb Jelly : num [1:5613] NA NA NA NA NA NA NA NA NA NA ...
## $ Common Carp : num [1:5613] 0 0 0 0 0 0 0 0 0 0 ...
## $ Crangon Shrimp : num [1:5613] 0 0 0 0 0 0 0 0 0 0 ...
## $ Delta Smelt : num [1:5613] 0 0 0 0 0 0 0 0 0 0 ...
## $ Diamond Turbot : num [1:5613] 0 0 0 0 0 0 0 0 0 0 ...
## $ Dwarf Perch : num [1:5613] 0 0 0 0 0 0 0 0 0 0 ...
## $ English Sole : num [1:5613] 0 0 0 0 0 0 0 0 0 0 ...
## $ Flatfish : num [1:5613] 0 0 0 0 0 0 0 0 0 0 ...
## $ Goby (unid) : num [1:5613] 0 0 0 0 0 0 0 0 0 0 ...
## $ Golden Shiner : num [1:5613] 0 0 0 0 0 0 0 0 0 0 ...
## $ Goldfish : num [1:5613] 0 0 0 0 0 0 0 0 0 0 ...
## $ Green Sturgeon : num [1:5613] 0 0 0 0 0 0 0 0 0 0 ...
## $ Green Sunfish : num [1:5613] 0 0 0 0 0 0 0 0 0 0 ...
## $ Grey Smoothhound : num [1:5613] 0 0 0 0 0 0 0 0 0 0 ...
## $ Hitch : num [1:5613] 0 0 0 0 0 0 0 0 0 0 ...
## $ Jacksmelt : num [1:5613] 0 0 0 0 0 0 0 0 0 0 ...
## $ Jellyfish : num [1:5613] 0 0 0 0 0 0 0 0 0 0 ...
## $ Largemouth Bass : num [1:5613] 0 0 0 0 0 0 0 0 0 0 ...
## $ Leopard Shark : num [1:5613] 0 0 0 0 0 0 0 0 0 0 ...
## $ Lingcod : num [1:5613] 0 0 0 0 0 0 0 0 0 0 ...
## $ Lizardfish : num [1:5613] NA NA NA NA NA NA NA NA NA NA ...
## $ Longfin Smelt : num [1:5613] 1 3 4 1 2 1 0 0 0 1 ...
## $ Maeotias : num [1:5613] NA NA NA NA NA NA NA NA NA NA ...
## $ Mississippi Grass Shrimp : num [1:5613] NA NA NA NA NA NA NA NA NA NA ...
## $ Mississippi Silverside : num [1:5613] 0 0 0 0 0 0 0 0 0 0 ...
## $ Moon Jellies : num [1:5613] NA NA NA NA NA NA NA NA NA NA ...
## $ Mud shrimp : num [1:5613] NA NA NA NA NA NA NA NA NA NA ...
## $ Night Smelt : num [1:5613] 0 0 0 0 0 0 0 0 0 0 ...
## $ Northern Anchovy : num [1:5613] 0 0 0 0 0 0 0 0 0 0 ...
## $ Pacific Electric Ray : num [1:5613] 0 0 0 0 0 0 0 0 0 0 ...
## $ Pacific Halibut : num [1:5613] 0 0 0 0 0 0 0 0 0 0 ...
## $ Pacific Herring : num [1:5613] 0 0 0 0 0 0 0 0 0 0 ...
## $ Pacific Lamprey : num [1:5613] 0 0 0 0 0 0 0 0 0 0 ...
## $ Pacific Pompano : num [1:5613] 0 0 0 0 0 0 0 0 0 0 ...
## $ Pacific Sanddab : num [1:5613] 0 0 0 0 0 0 0 0 0 0 ...
## $ Pacific Sardine : num [1:5613] 0 0 0 0 0 0 0 0 0 0 ...
## $ Pacific Staghorn Sculpin : num [1:5613] 0 0 0 0 0 0 0 0 0 0 ...
## $ Pacific Tomcod : num [1:5613] 0 0 0 0 0 0 0 0 0 0 ...
## $ Palaemon spp. : num [1:5613] 0 0 0 0 0 0 0 0 0 0 ...
## $ Pile Perch : num [1:5613] 0 0 0 0 0 0 0 0 0 0 ...
## $ Plainfin Midshipman : num [1:5613] 0 0 0 0 0 0 0 0 0 0 ...
## $ Polyorchis : num [1:5613] 0 0 0 0 0 0 0 0 0 0 ...
## $ Prickly Sculpin : num [1:5613] 0 0 0 0 0 0 0 0 0 0 ...
## $ Rainwater Killifish : num [1:5613] 0 0 0 0 0 0 0 0 0 0 ...
## $ Redear Sunfish : num [1:5613] 0 0 0 0 0 0 0 0 0 0 ...
## $ Riffle Sculpin : num [1:5613] 0 0 0 0 0 0 0 0 0 0 ...
## $ River Lamprey : num [1:5613] 0 0 0 0 0 0 0 0 0 0 ...
## $ Rock Sole : num [1:5613] 0 0 0 0 0 0 0 0 0 0 ...
## $ Rubberlip Seaperch : num [1:5613] 0 0 0 0 0 0 0 0 0 0 ...
## [list output truncated]
## - attr(*, "spec")=
## .. cols(
## .. Year = col_double(),
## .. SampleDate = col_date(format = ""),
## .. SurveyNumber = col_double(),
## .. StationCode = col_double(),
## .. index = col_double(),
## .. StationLat = col_double(),
## .. StationLong = col_double(),
## .. StartLat = col_double(),
## .. StartLong = col_double(),
## .. EndLat = col_double(),
## .. EndLong = col_double(),
## .. SampleTimeStart = col_character(),
## .. DepthBottom = col_double(),
## .. WaterTemperature = col_double(),
## .. BottomTemperature = col_double(),
## .. ConductivityTop = col_double(),
## .. ConductivityBottom = col_double(),
## .. Turbidity = col_double(),
## .. Secchi = col_double(),
## .. MeterStart = col_double(),
## .. MeterEnd = col_double(),
## .. MeterDiff = col_double(),
## .. Volume = col_double(),
## .. TideCode = col_double(),
## .. TowDirectionCode = col_double(),
## .. WeatherCode = col_double(),
## .. Microcystis = col_double(),
## .. WaveCode = col_double(),
## .. `Aequorea spp` = col_double(),
## .. `American Shad` = col_double(),
## .. `Arrow Goby` = col_double(),
## .. `Bat Ray` = col_double(),
## .. `Bay Goby` = col_double(),
## .. `Bay Pipefish` = col_double(),
## .. `Big Skate` = col_double(),
## .. `Bigscale Logperch` = col_double(),
## .. `Black Crappie` = col_double(),
## .. `Black Perch` = col_double(),
## .. Bluegill = col_double(),
## .. `Broadnose Sevengill Shark` = col_double(),
## .. `Brown Bullhead` = col_double(),
## .. `Brown Smoothhound` = col_double(),
## .. `California Grunion` = col_double(),
## .. `California Halibut` = col_double(),
## .. `California Tonguefish` = col_double(),
## .. `Chameleon Goby` = col_double(),
## .. `Channel Catfish` = col_double(),
## .. `Cheekspot Goby` = col_double(),
## .. `Chinook Salmon` = col_double(),
## .. `Chrysaora fuscensens` = col_double(),
## .. `Comb Jelly` = col_double(),
## .. `Common Carp` = col_double(),
## .. `Crangon Shrimp` = col_double(),
## .. `Delta Smelt` = col_double(),
## .. `Diamond Turbot` = col_double(),
## .. `Dwarf Perch` = col_double(),
## .. `English Sole` = col_double(),
## .. Flatfish = col_double(),
## .. `Goby (unid)` = col_double(),
## .. `Golden Shiner` = col_double(),
## .. Goldfish = col_double(),
## .. `Green Sturgeon` = col_double(),
## .. `Green Sunfish` = col_double(),
## .. `Grey Smoothhound` = col_double(),
## .. Hitch = col_double(),
## .. Jacksmelt = col_double(),
## .. Jellyfish = col_double(),
## .. `Largemouth Bass` = col_double(),
## .. `Leopard Shark` = col_double(),
## .. Lingcod = col_double(),
## .. Lizardfish = col_double(),
## .. `Longfin Smelt` = col_double(),
## .. Maeotias = col_double(),
## .. `Mississippi Grass Shrimp` = col_double(),
## .. `Mississippi Silverside` = col_double(),
## .. `Moon Jellies` = col_double(),
## .. `Mud shrimp` = col_double(),
## .. `Night Smelt` = col_double(),
## .. `Northern Anchovy` = col_double(),
## .. `Pacific Electric Ray` = col_double(),
## .. `Pacific Halibut` = col_double(),
## .. `Pacific Herring` = col_double(),
## .. `Pacific Lamprey` = col_double(),
## .. `Pacific Pompano` = col_double(),
## .. `Pacific Sanddab` = col_double(),
## .. `Pacific Sardine` = col_double(),
## .. `Pacific Staghorn Sculpin` = col_double(),
## .. `Pacific Tomcod` = col_double(),
## .. `Palaemon spp.` = col_double(),
## .. `Pile Perch` = col_double(),
## .. `Plainfin Midshipman` = col_double(),
## .. Polyorchis = col_double(),
## .. `Prickly Sculpin` = col_double(),
## .. `Rainwater Killifish` = col_double(),
## .. `Redear Sunfish` = col_double(),
## .. `Riffle Sculpin` = col_double(),
## .. `River Lamprey` = col_double(),
## .. `Rock Sole` = col_double(),
## .. `Rubberlip Seaperch` = col_double(),
## .. `Sacramento Blackfish` = col_double(),
## .. `Sacramento Perch` = col_double(),
## .. `Sacramento Pikeminnow` = col_double(),
## .. `Sacramento Sucker` = col_double(),
## .. `Sand Sole` = col_double(),
## .. `Scrippsia pacifica` = col_double(),
## .. `Shimofuri Goby` = col_double(),
## .. `Shiner Perch` = col_double(),
## .. `Shokihaze Goby` = col_double(),
## .. `Shrimp (unid)` = col_double(),
## .. `Siberian Prawn` = col_double(),
## .. `Silver Surfperch` = col_double(),
## .. `Smelt Family` = col_double(),
## .. `Speckled Sanddab` = col_double(),
## .. `Spiny Dogfish` = col_double(),
## .. Splittail = col_double(),
## .. `Spotfin Surfperch` = col_double(),
## .. `Spotted Bass` = col_double(),
## .. `Starry Flounder` = col_double(),
## .. Steelhead = col_double(),
## .. `Striped Bass age-0` = col_double(),
## .. `Striped Bass age-1` = col_double(),
## .. `Striped Bass age-2` = col_double(),
## .. `Striped Bass age-3+` = col_double(),
## .. `Surf Smelt` = col_double(),
## .. `Threadfin Shad` = col_double(),
## .. `Threespine Stickleback` = col_double(),
## .. Topsmelt = col_double(),
## .. `Tule Perch` = col_double(),
## .. Unid = col_double(),
## .. Wakasagi = col_double(),
## .. `Walleye Surfperch` = col_double(),
## .. Warmouth = col_double(),
## .. `Western Mosquitofish` = col_double(),
## .. `White Catfish` = col_double(),
## .. `White Crappie` = col_double(),
## .. `White Croaker` = col_double(),
## .. `White Sea Bass` = col_double(),
## .. `White Seaperch` = col_double(),
## .. `White Sturgeon` = col_double(),
## .. `Whitebait Smelt` = col_double(),
## .. `Wolf Eel` = col_double(),
## .. `Yellowfin Goby` = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
#A few minor changes to make the data more user-friendly
FMWT90 = mutate(FMWT90, StationCode = as.factor(StationCode), TideCode = as.factor(TideCode), index = as.logical(index),
TotalCatch = rowSums(across(`American Shad`:`Yellowfin Goby`), na.rm =T)) %>%
select(Year, SampleDate, SurveyNumber, StationCode, index, StationLat, StationLong, DepthBottom, WaterTemperature,
ConductivityTop, Turbidity, Secchi, TideCode, Microcystis, TotalCatch)
As always, we want to start by looking at our data.
ggplot(FMWT90, aes(x = TotalCatch)) + geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
ggplot(FMWT90, aes(x = log(TotalCatch +1)))+ geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
This data is a little zero inflated, so we probably want to use some kind of fancy model to do it right, but for the sake of the example, we’ll go with it for now. We’ll definitely want to log-transform it though.
The problem comes when we want to decide which factors are likely to be fixed and which are likely to be random. That is going to be based on our question. We’ll start with
“How does secchi depth impact fish catch”.
I like to start with a quick graph.
FMWT90 = mutate(FMWT90, logcatch = log(TotalCatch+1))
ggplot(FMWT90, aes(x = Secchi, y = logcatch))+ geom_point()+ geom_smooth()
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
It looks like there is definitely a trend there.
However, there are a lot of other factors that probably affect fish abundance too. We should account for them.
Some of these factors we might want to include as explanatory variables (fixed effects), whereas other we might want to include as grouping variables (random effects).
We are unlikely to be interested in the impact of station on it’s own, so let’s definitely put that in the random effects.
m1 = lmer(logcatch ~ Secchi + Year + SurveyNumber + (1|StationCode), data = FMWT90)
summary(m1)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: logcatch ~ Secchi + Year + SurveyNumber + (1 | StationCode)
## Data: FMWT90
##
## REML criterion at convergence: 19145
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.9586 -0.6809 -0.0611 0.6316 3.9458
##
## Random effects:
## Groups Name Variance Std.Dev.
## StationCode (Intercept) 0.5257 0.725
## Residual 1.7364 1.318
## Number of obs: 5547, groups: StationCode, 122
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 1.531e+02 1.246e+01 5.543e+03 12.29 <2e-16 ***
## Secchi -1.332e+00 6.705e-02 4.690e+03 -19.87 <2e-16 ***
## Year -7.425e-02 6.218e-03 5.543e+03 -11.94 <2e-16 ***
## SurveyNumber -2.572e-01 1.232e-02 5.417e+03 -20.86 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) Secchi Year
## Secchi 0.424
## Year -1.000 -0.428
## SurveyNumbr -0.238 0.027 0.233
plot(m1)
But maybe we aren’t really interested in changes over time. Let’s put year and time of year (survey code) as random effects too.
m2 = lmer(logcatch ~ Secchi + (1|Year)+ (1|SurveyNumber)+ (1|StationCode), data = FMWT90)
summary(m2)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula:
## logcatch ~ Secchi + (1 | Year) + (1 | SurveyNumber) + (1 | StationCode)
## Data: FMWT90
##
## REML criterion at convergence: 18917.2
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.2398 -0.6737 -0.0559 0.6141 4.0259
##
## Random effects:
## Groups Name Variance Std.Dev.
## StationCode (Intercept) 0.5283 0.7269
## Year (Intercept) 0.1685 0.4105
## SurveyNumber (Intercept) 0.3656 0.6046
## Residual 1.6486 1.2840
## Number of obs: 5547, groups: StationCode, 122; Year, 11; SurveyNumber, 7
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 2.60017 0.27479 11.38352 9.462 9.82e-07 ***
## Secchi -1.23181 0.06997 4470.55970 -17.605 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## Secchi -0.186
plot(m2)
Let’s compare these two models.
anova(m1, m2)
## refitting model(s) with ML (instead of REML)
## Data: FMWT90
## Models:
## m1: logcatch ~ Secchi + Year + SurveyNumber + (1 | StationCode)
## m2: logcatch ~ Secchi + (1 | Year) + (1 | SurveyNumber) + (1 | StationCode)
## npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
## m1 6 19134 19174 -9561.2 19122
## m2 6 18925 18965 -9456.4 18913 209.41 0
The model with fewer fixed effects has a greater log-likelihood and lower BIC (BIC is better than AIC for mixed models). So, if we are trying to choose between including something as a fixed or random effect, and we aren’t super interested in it’s effects, going with random effects will give you a better model.
https://en.wikipedia.org/wiki/Mixed_model
Greven, S., and T. Kneib. 2010. On the behaviour of marginal and conditional AIC in linear mixed models. Biometrika 97:773-789. 10.1093/biomet/asq042
Zuur AF, Ieno EN, Walker NJ, Saveliev AA, Smith GM. 2009. Mixed Effects Models and Extensions in Ecology with R. New York, NY: Springer.
https://ourcodingclub.github.io/tutorials/mixed-models/#six
https://cran.r-project.org/web/views/MixedModels.html
https://cran.r-project.org/web/packages/lme4/vignettes/lmer.pdf