vignettes/ex6_NAA.Rmd
ex6_NAA.Rmd
In this vignette we walk through an example using the
wham
(WHAM = Woods Hole Assessment Model) package to run a
state-space age-structured stock assessment model. WHAM is a
generalization of code written for Miller et al. (2016)
and Xu et
al. (2018), and in this example we apply WHAM to the same stock,
Southern New England / Mid-Atlantic Yellowtail Flounder.
This is the 6th WHAM example, which blends aspects from Ex
1, Ex
2, and Ex
5. We assume you already have wham
installed. If not,
see the Introduction.
The simpler 1st example is available as a R
script and vignette.
As in example 1:
age_comp = "logistic-normal-miss0"
)As in example 2:
recruit_model = 3
)ecov$process_model = 'ar1'
)ecov$how = 2
)As in example 4:
As in example 5:
Example 6 highlights WHAM’s options for treating the yearly transitions in numbers-at-age (i.e. survival):
Open R and load the wham
package:
# devtools::install_github("timjmiller/wham", dependencies=TRUE)
library(tidyverse)
#> Warning: package 'ggplot2' was built under R version 4.2.3
#> Warning: package 'tibble' was built under R version 4.2.3
#> Warning: package 'tidyr' was built under R version 4.2.3
#> Warning: package 'purrr' was built under R version 4.2.3
#> Warning: package 'dplyr' was built under R version 4.2.3
#> Warning: package 'stringr' was built under R version 4.2.3
library(wham)
For a clean, runnable .R
script, look at
ex6_NAA.R
in the example_scripts
folder of the
wham
package install:
wham.dir <- find.package("wham")
file.path(wham.dir, "example_scripts")
You can run this entire example script with (total run time about 12 minutes):
write.dir <- "choose/where/to/save/output" # otherwise will be saved in working directory
source(file.path(wham.dir, "example_scripts", "ex6_NAA.R"))
Let’s create a directory for this analysis:
# choose a location to save output, otherwise will be saved in working directory
write.dir <- "choose/where/to/save/output"
dir.create(write.dir)
setwd(write.dir)
We need the same ASAP data file as in example
1, and the environmental covariate (Gulf Stream Index, GSI). Let’s
copy ex1_SNEMAYT.dat
and GSI.csv
to our
analysis directory:
wham.dir <- find.package("wham")
file.copy(from=file.path(wham.dir,"extdata","ex1_SNEMAYT.dat"), to=write.dir, overwrite=FALSE)
file.copy(from=file.path(wham.dir,"extdata","GSI.csv"), to=write.dir, overwrite=FALSE)
Confirm you are in the correct directory and it has the required data files:
Read the ASAP3 .dat file into R and convert to input list for wham:
asap3 <- read_asap3_dat("ex1_SNEMAYT.dat")
Load the environmental covariate (Gulf Stream Index, GSI) data into R:
As in example 5, the GSI data file does not have a standard error estimate, either for each yearly observation or one overall value. In such a case, WHAM can estimate the observation error for the environmental covariate, either as one overall value, \(\sigma_{GSI}\), or yearly values as random effects, \(\mathrm{log}\sigma_{{GSI}_y} \sim \mathcal{N}(\mathrm{log}\sigma_{GSI}, \sigma^2_{\sigma_{GSI}})\). In this example we choose the simpler option and estimate one observation error parameter, shared across years.
Now we specify several models with different options for the numbers-at-age (NAA) transitions, i.e. survival:
df.mods <- data.frame(NAA_cor = c('---','iid','ar1_y','iid','ar1_a','ar1_y','2dar1','iid','ar1_y','iid','ar1_a','ar1_y','2dar1'),
NAA_sigma = c('---',rep("rec",2),rep("rec+1",4),rep("rec",2),rep("rec+1",4)),
GSI_how = c(rep(0,7),rep(2,6)), stringsAsFactors=FALSE)
n.mods <- dim(df.mods)[1]
df.mods$Model <- paste0("m",1:n.mods)
df.mods <- df.mods %>% select(Model, everything()) # moves Model to first col
Look at the model table:
df.mods
#> Model NAA_cor NAA_sigma GSI_how
#> 1 m1 --- --- 0
#> 2 m2 iid rec 0
#> 3 m3 ar1_y rec 0
#> 4 m4 iid rec+1 0
#> 5 m5 ar1_a rec+1 0
#> 6 m6 ar1_y rec+1 0
#> 7 m7 2dar1 rec+1 0
#> 8 m8 iid rec 2
#> 9 m9 ar1_y rec 2
#> 10 m10 iid rec+1 2
#> 11 m11 ar1_a rec+1 2
#> 12 m12 ar1_y rec+1 2
#> 13 m13 2dar1 rec+1 2
To specify the options for modeling NAA transitions, include an
optional list argument, NAA_re
, to the
prepare_wham_input
function (see the
prepare_wham_input
help page). ASAP3 does not estimate
random effects, and therefore these options are not specified in the
ASAP data file. By default (NAA_re
is NULL
or
not included), WHAM fits a traditional statistical catch-at-age model
(NAA = predicted NAA for all ages, i.e. survival is deterministic). To
fit a state-space model, we must specify NAA_re
.
NAA_re
is a list with the following entries:
$sigma
: Which ages allow deviations from pred_NAA?
Common options are specified with strings.
"rec"
: Recruitment deviations are random effects,
survival of all other ages is deterministic"rec+1"
: Survival of all ages is stochastic (“full
state space model”), with 2 estimated \(\sigma_a\), one for recruitment and one
shared among other ages$cor
: Correlation structure for the NAA deviations.
Options are:
"iid"
: NAA deviations vary by year and age, but
uncorrelated."ar1_a"
: NAA deviations correlated by age (AR1)."ar1_y"
: NAA deviations correlated by year (AR1)."2dar1"
: NAA deviations correlated by year and age (2D
AR1, as for \(M\) in example 5).Alternatively, you can specify a more complex
NAA_re$sigma
by entering a vector with length equal to the
number of ages, where each entry points to the \(\sigma_a\) to use for that age. For
example, NAA_re$sigma = c(1,2,2,3,3,3)
will estimate three
\(\sigma_a\), with recruitment (age-1)
deviations having their own \(\sigma_R\), ages 2-3 sharing \(\sigma_2\), and ages 4-6 sharing \(\sigma_3\).
For example, to fit model m1
(SCAA):
NAA_re <- NULL # or simply leave out of call to prepare_wham_input
To fit model m3
, recruitment deviations are correlated
random effects:
NAA_re <- list(sigma="rec", cor="ar1_y")
And to fit model m7
, numbers at all ages are random
effects correlated by year AND age:
NAA_re <- list(sigma="rec+1", cor="2dar1")
As described in example
2, the environmental covariate options are fed to
prepare_wham_input
as a list, ecov
. This
example differs from example 2 in that:
ecov$logsigma = "est_1"
estimates the GSI observation
error (\(\sigma_{GSI}\), one overall
value for all years). The other option is "est_re"
to allow
the GSI observation error to have yearly fluctuations (random effects).
The Cold Pool Index in example 2 had yearly observation errors given, so
this was not necessary.ecov$how = 0
estimates the GSI time-series model (AR1)
for models without a GSI-Recruitment effect, in order to compare AIC
with models that do include the effect. Setting
ecov$how = 2
specifies that the GSI affects the
Beverton-Holt \(\beta\) parameter
(“limiting” / carrying capacity effect).ecov$link_model = "linear"
specifies a linear model
linking GSI to \(\beta\).For example, the ecov
list for models
m8
-m13
with the linear GSI-\(\beta\) effect:
ecov <- list(
label = "GSI",
mean = as.matrix(env.dat$GSI),
logsigma = 'est_1', # estimate obs sigma, 1 value shared across years
year = env.dat$year,
use_obs = matrix(1, ncol=1, nrow=dim(env.dat)[1]), # use all obs (=1)
lag = 1, # GSI in year t affects recruitment in year t+1
process_model = "ar1", # GSI modeled as AR1 (random walk would be "rw")
where = "recruit",
how = 2,
link_model = "linear") # linear GSI-Recruitment effect
Note that you can set ecov = NULL
to fit the model
without environmental covariate data, but here we fit the
ecov
data even for models without the GSI effect on
recruitment (m1
-m7
) so that we can compare
them via AIC (need to have the same data in the likelihood). We
accomplish this by setting ecov$how = 0
and
ecov$process_model = "ar1"
.
All models use the same options for expected recruitment (Beverton-Holt stock-recruit function) and selectivity (age-specific, with one or two ages fixed at 1).
for(m in 1:n.mods){
<- list(cor=df.mods[m,"NAA_cor"], sigma=df.mods[m,"NAA_sigma"])
NAA_list if(NAA_list$sigma == '---') NAA_list = NULL
<- list(
ecov label = "GSI",
mean = as.matrix(env.dat$GSI),
logsigma = 'est_1', # estimate obs sigma, 1 value shared across years
year = env.dat$year,
use_obs = matrix(1, ncol=1, nrow=dim(env.dat)[1]), # use all obs (=1)
lag = 1, # GSI in year t affects Rec in year t + 1
process_model = 'ar1', # "rw" or "ar1"
where = where = c("none","recruit")[as.logical(df.mods$GSI_how[m])+1], # GSI affects recruitment
how = df.mods$GSI_how[m], # 0 = no effect (but still fit Ecov to compare AIC), 2 = limiting
link_model = "linear")
<- prepare_wham_input(asap3, recruit_model = 3, # Bev Holt recruitment
input model_name = "Ex 6: Numbers-at-age",
selectivity=list(model=rep("age-specific",3), re=c("none","none","none"),
initial_pars=list(c(0.1,0.5,0.5,1,1,1),c(0.5,0.5,0.5,1,0.5,0.5),c(0.5,0.5,1,1,1,1)),
fix_pars=list(4:6,4,3:6)),
NAA_re = NAA_list,
ecov=ecov,
age_comp = "logistic-normal-miss0") # logistic normal, treat 0 obs as missing
# Fit model
<- fit_wham(input, do.retro=T, do.osa=F)
mod
# Save model
saveRDS(mod, file=paste0(df.mods$Model[m],".rds"))
# If desired, do projections
# mod_proj <- project_wham(mod)
# saveRDS(mod_proj, file=paste0(df.mods$Model[m],"_proj.rds"))
}
Collect all models into a list.
Get model convergence and stats.
opt_conv = 1-sapply(mods, function(x) x$opt$convergence)
ok_sdrep = sapply(mods, function(x) if(x$na_sdrep==FALSE & !is.na(x$na_sdrep)) 1 else 0)
df.mods$conv <- as.logical(opt_conv)
df.mods$pdHess <- as.logical(ok_sdrep)
Only calculate AIC and Mohn’s rho for converged models.
df.mods$runtime <- sapply(mods, function(x) x$runtime)
df.mods$NLL <- sapply(mods, function(x) round(x$opt$objective,3))
not_conv <- !df.mods$conv | !df.mods$pdHess
mods2 <- mods
mods2[not_conv] <- NULL
df.aic.tmp <- as.data.frame(compare_wham_models(mods2, table.opts=list(sort=FALSE, calc.rho=TRUE))$tab)
df.aic <- df.aic.tmp[FALSE,]
ct = 1
for(i in 1:n.mods){
if(not_conv[i]){
df.aic[i,] <- rep(NA,5)
} else {
df.aic[i,] <- df.aic.tmp[ct,]
ct <- ct + 1
}
}
df.aic[,1:2] <- format(round(df.aic[,1:2], 1), nsmall=1)
df.aic[,3:5] <- format(round(df.aic[,3:5], 3), nsmall=3)
df.aic[grep("NA",df.aic$dAIC),] <- "---"
df.mods <- cbind(df.mods, df.aic)
rownames(df.mods) <- NULL
Look at results table.
df.mods
Save results table.
write.csv(df.mods, file="ex6_table.csv",quote=F, row.names=F)
Plot output for models that converged.
mods[[1]]$env$data$recruit_model = 2 # m1 didn't actually fit a Bev-Holt
for(m in which(!not_conv)){
plot_wham_output(mod=mods[[m]], dir.main=file.path(getwd(),paste0("m",m)), out.type='html')
}
Two models had very similar AIC and were overwhelmingly supported
relative to the other models (bold in table below):
m11
(all NAA are random effects with correlation by age,
GSI-Recruitment effect) and m13
(all NAA are random effects
with correlation by age and year, GSI-Recruitment effect).
The SCAA and state-space models with independent NAA deviations had
the lowest runtime. Estimating NAA deviations only for age-1,
i.e. recruitment as random effects, broke the Hessian sparseness, making
models m2
, m3
, m8
, and
m9
the slowest. Adding correlation structure to the NAA
deviations increased runtime roughly by a factor of 2 (comparing models
m4
with m5
-m7
and
m10
with m11
-m13
).
All models except for m4
converged and successfully
inverted the Hessian to produce SE estimates for (fixed effect)
parameters. WHAM stores this information in mod$na_sdrep
(should be FALSE
), mod$sdrep$pdHess
(should be
TRUE
), and mod$opt$convergence
(should be
0
). See stats::nlminb()
and
TMB::sdreport()
for details.
We leave the AIC for model m1
blank because comparing
models in which parameters (here, recruitment deviations) are estimated
as fixed effects versus random effects is messy and marginal AIC is not
appropriate. Still, we include m1
to show 1) that WHAM can
fit this NAA option, and 2) the poor retrospective pattern in the status
quo assessment. Note that m2
is identical to
m1
except that recruitment deviations are random effects
instead of fixed effects. The retrospective patterns are very similar,
but m2
takes 7x longer to run.
Model | NAA_cor | NAA_sigma | GSI_how | Converged | Pos def Hessian | Runtime (min) | NLL | \(\Delta AIC\) | AIC | \(\rho_{R}\) | \(\rho_{SSB}\) | \(\rho_{\overline{F}}\) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
m1 | — | — | — | TRUE | TRUE | 0.16 | -623.530 | — | — | 8.818 | 0.951 | -0.433 |
m2 | iid | rec | — | TRUE | TRUE | 1.36 | -509.511 | 516.9 | -875.0 | 5.054 | 1.023 | -0.437 |
m3 | ar1_y | rec | — | TRUE | TRUE | 1.35 | -527.116 | 483.7 | -908.2 | 3.390 | 0.990 | -0.440 |
m4 | iid | rec+1 | — | TRUE | TRUE | 0.32 | -751.711 | 34.5 | -1357.4 | 0.374 | 0.013 | -0.020 |
m5 | ar1_a | rec+1 | — | TRUE | TRUE | 0.66 | -762.017 | 15.9 | -1376.0 | 0.245 | 0.000 | -0.017 |
m6 | ar1_y | rec+1 | — | TRUE | TRUE | 0.73 | -756.807 | 26.3 | -1365.6 | 0.412 | -0.002 | 0.006 |
m7 | 2dar1 | rec+1 | — | TRUE | TRUE | 0.83 | -764.875 | 12.1 | -1379.8 | 0.256 | -0.011 | 0.000 |
m8 | iid | rec | Limiting | TRUE | FALSE | 0.99 | -522.582 | — | — | — | — | — |
m9 | ar1_y | rec | Limiting | TRUE | FALSE | 1.08 | -531.190 | — | — | — | — | — |
m10 | iid | rec+1 | Limiting | TRUE | TRUE | 0.36 | -761.137 | 17.6 | -1374.3 | 0.277 | -0.002 | -0.003 |
m11 | ar1_a | rec+1 | Limiting | TRUE | TRUE | 0.73 | -770.949 | 0.0 | -1391.9 | 0.201 | -0.012 | -0.001 |
m12 | ar1_y | rec+1 | Limiting | TRUE | TRUE | 0.91 | -763.095 | 15.7 | -1376.2 | 0.272 | -0.011 | 0.013 |
m13 | 2dar1 | rec+1 | Limiting | TRUE | TRUE | 0.95 | -771.763 | 0.4 | -1391.5 | 0.173 | -0.020 | 0.010 |
Estimated survival deviations by age (y-axis) and year (x-axis) for all converged models:
The base model, m1
, had a severe retrospective pattern
for recruitment (very high \(\rho_R\)).
The full state-space model effectively alleviated this, whether or not a
GSI-Recruitment effect was included (\(\rho_R\) for m5-7
and
m10-13
all below 0.4). Adding a GSI-Recruitment link to the
state-space models further reduced \(\rho_R\), and also reduced \(\rho_{SSB}\) and \(\rho_{\overline{F}}\) (although \(|\rho_{SSB}|\) and \(|\rho_{\overline{F}}|\) were negligible for
m5-7
, < 0.05).
Comparing m11
and m13
, the models with
lowest AIC, m13
had slightly lower Mohn’s \(\rho_R\), \(\rho_{SSB}\), and \(\rho_{\overline{F}}\).
The plots below compare retrospective patterns from the base model
(m1
, left) to those from the full model (m13
,
right).
Models with a GSI-Recruitment link estimated higher observation error
and more smoothing of the GSI than those without. Compare the confidence
intervals for m7
(left, no GSI-Recruitment link), to those
for m13
(right, with GSI-Recruitment link).
The state-space models, with or without the GSI-Recruitment link,
estimated similar but slightly exaggerated trends in \(SSB\) and \(F\) compared to the base model. Left: \(SSB\) and \(F\) trends from the base model,
m1
. Right: \(SSB\) and
\(F\) trends from the state-space model
without GSI effect, m7
.
Adding the GSI-Recruitment link to the state-space model did not
impact the probability that the stock was overfished or experiencing
overfishing in the final year, 2016 (m7
w/o GSI, left;
m13
w/ GSI, right).