vignettes/ex04_selectivity.Rmd
ex04_selectivity.Rmd
In this vignette we walk through an example using the
wham
(WHAM = Woods Hole Assessment Model) package to run a
state-space age-structured stock assessment model. WHAM is a
generalization of code written for Miller et al. (2016)
and Xu et
al. (2018), and in this example we apply WHAM to the same stock,
Southern New England / Mid-Atlantic Yellowtail Flounder.
Here we assume you already have wham
installed. If not,
see the Introduction.
This is the 4th wham
example, which builds off model
m4
from example
1:
full state-space model (numbers-at-age are random effects for all
ages, NAA_re = list(sigma='rec+1',cor='iid')
)
logistic normal age compositions, treating observations of zero
as missing (age_comp = "logistic-normal-miss0"
)
random-about-mean recruitment
(recruit_model = 2
)
no environmental covariate (ecov = NULL
)
2 indices
fit to 1973-2016 data
In example 4, we demonstrate the time-varying selectivity options in WHAM for both logistic and age-specific selectivity:
none
: time-constant
iid
: parameter- and year-specific (random effect)
deviations from mean selectivity parameters
ar1
: as above, but estimate correlation across
logistic parameters
ar1_y
: as above, but estimate correlation across
years
2dar1
: as above, but estimate correlation across
both years and parameters
Note that each of these options can be applied to any selectivity block (and therefore fleet/catch or index/survey).
Open R and load the wham
package:
For a clean, runnable .R
script, look at
ex4_selectivity.R
in the example_scripts
folder of the wham
package. You can run this entire example
script with:
wham.dir <- find.package("wham")
source(file.path(wham.dir, "example_scripts", "ex4_selectivity.R"))
Let’s create a directory for this analysis:
# choose a location to save output, otherwise will be saved in working directory
write.dir <- "choose/where/to/save/output" # need to change e.g., tempdir(check=TRUE)
dir.create(write.dir)
setwd(write.dir)
We need the same data files as in example
1. Read in ex1_SNEMAYT.dat
:
asap3 <- read_asap3_dat(file.path(wham.dir,"extdata","ex1_SNEMAYT.dat"))
We are going to run 8 models that differ only in the selectivity options:
# m1-m5 logistic, m6-m9 age-specific
sel_model <- c(rep("logistic",3), rep("age-specific",5))
# time-varying options for each of 3 blocks (b1 = fleet, b2-3 = indices)
sel_re <- list(c("none","none","none"), # m1-m4 logistic
c("iid","none","none"),
# c("ar1","none","none"), #can't get a converged model for logistic with two re and two fe for
c("2dar1","none","none"),
c("none","none","none"), # m4-m8 age-specific
c("iid","none","none"),
c("ar1","none","none"), #will map all age-specific (mean) parameters to one estimated value
c("ar1_y","none","none"),
c("2dar1","none","none"))
n.mods <- length(sel_re)
# summary data frame
df.mods <- data.frame(Model=paste0("m",1:n.mods),
Selectivity=sel_model, # Selectivity model (same for all blocks)
Block_1_re=sapply(sel_re, function(x) x[[1]])) # Block 1 random effects
rownames(df.mods) <- NULL
df.mods
#> Model Selectivity Block_1_re
#> 1 m1 logistic none
#> 2 m2 logistic iid
#> 3 m3 logistic 2dar1
#> 4 m4 age-specific none
#> 5 m5 age-specific iid
#> 6 m6 age-specific ar1
#> 7 m7 age-specific ar1_y
#> 8 m8 age-specific 2dar1
The ASAP data file specifies selectivity options (model, initial
parameter values, which parameters to fix/estimate). WHAM uses these by
default in order to facilitate running ASAP models. To see the currently
specified selectivity options in asap3
:
asap3$dat$sel_block_assign # 1 fleet, all years assigned to block 1
#> NULL
# by default each index gets its own selectivity block (here, blocks 2 and 3)
asap3$dat$sel_block_option # fleet selectivity (1 block), 2 = logistic
#> NULL
asap3$dat$index_sel_option # index selectivity (2 blocks), 2 = logistic
#> NULL
asap3$dat$sel_ini # fleet sel initial values (col1), estimation phase (-1 = fix)
#> NULL
asap3$dat$index_sel_ini # index sel initial values (col1), estimation phase (-1 = fix)
#> NULL
When we specify the WHAM model with
prepare_wham_input()
, we can overwrite the selectivity
options from the ASAP data file with the optional list argument
selectivity
. The selectivity model is chosen via
selectivity$model
:
Model | selectivity$model |
No. Parameters |
---|---|---|
Age-specific | "age-specific" |
n_ages |
Logistic (increasing) | "logistic" |
2 |
Double logistic (dome) | "double-logistic" |
4 |
Logistic (decreasing) | "decreasing-logistic" |
2 |
Regardless of the selectivity model used, we incorporate time-varying selectivity by estimating a mean for each selectivity parameter, \(\mu^{s}_a\), and (random effect) deviations from the mean, \(\delta_{a,y}\). We then estimate the selectivity parameters, \(s_{a,y}\), on the logit-scale with (possibly) lower and upper limits: \[s_{a,y} = \mathrm{lower} + \frac{\mathrm{upper} - \mathrm{lower}}{1 + e^{-(\mu^{s}_a + \delta_{a,y})}}\]
The deviations, \(\boldsymbol{\delta}\), follow a 2-dimensional AR(1) process defined by the parameters \(\sigma^2_s\), \(\rho_a\), and \(\rho_y\): \[\boldsymbol{\delta} \sim \mathrm{MVN}(0,\Sigma)\] \[\Sigma = \sigma^2_s(\mathrm{R}_a \otimes \mathrm{R}_y)\] \[R_{a,a^*} = \rho_a^{\vert a - a^* \vert}\] \[R_{y,y^*} = \rho_y^{\vert y - y^* \vert}\]
Mean selectivity parameters can be initialized at different values
from the ASAP file with selectivity$initial_pars
.
Parameters can be fixed at their initial values by specifying
selectivity$fix_pars
. Finally, we specify any time-varying
(random effects) on selectivity parameters
(selectivity$re
):
selectivity$re |
Deviations from mean | Estimated parameters |
---|---|---|
"none" |
time-constant (no deviation) | |
"iid" |
independent, identically-distributed | \(\sigma^2\) |
"ar1" |
autoregressive-1 (correlated across ages/parameters) | \(\sigma^2\), \(\rho_a\) |
"ar1_y" |
autoregressive-1 (correlated across years) | \(\sigma^2\), \(\rho_y\) |
"2dar1" |
2D AR1 (correlated across both years and ages/parameters) | \(\sigma^2\), \(\rho_a\), \(\rho_y\) |
We will loop over and fit each of the models, but first lets define some initial parameter values and which will be fixed. This is needed specifically for the age-specific selectivity models.
#initial pars for logistic or age-specfic selectivity
initial_pars <- c(rep(list(list(c(2,0.3),c(2,0.3),c(2,0.3))),3), rep(list(list(c(0.1,0.5,0.5,1,1,1),c(0.5,0.5,0.5,1,1,0.5),c(0.5,0.5,1,1,1,1))),5))
#which pars to fix for age-specific selectivity
fix_pars <- c(list(NULL,NULL,NULL), rep(list(list(4:6,4:5,3:6)),5))
Now we can run the above models in a loop:
#make a list of fits to fill in
mods <- list()
for(m in 1:n.mods){
# overwrite initial parameter values in ASAP data file (ex1_SNEMAYT.dat)
input <- prepare_wham_input(asap3, model_name=paste(paste0("Model ",m), sel_model[m], paste(sel_re[[m]], collapse="-"), sep=": "), recruit_model=2,
selectivity=list(model=rep(sel_model[m],3), re=sel_re[[m]], initial_pars=initial_pars[[m]], fix_pars = fix_pars[[m]]),
NAA_re = list(sigma='rec+1',cor='iid'),
age_comp = "logistic-normal-miss0") # logistic normal, treat 0 obs as missing
# fit model
mods[[m]] <- fit_wham(input, do.check=T, do.osa=F, do.retro=F)
}
#save the model fits
for(m in 1:length(mods)) saveRDS(mod[[m]], file=paste0("m",m,".rds"))
Check which models converged.
vign4_conv <- lapply(mods, function(x) capture.output(check_convergence(x)))
for(m in 1:n.mods) cat(paste0("Model ",m,":"), vign4_conv[[m]], "", sep='\n')
#> Model 1:
#> stats:nlminb thinks the model has converged: mod$opt$convergence == 0
#> Maximum gradient component: 6.24e-11
#> Max gradient parameter: log_NAA_sigma
#> TMB:sdreport() was performed successfully for this model
#>
#> Model 2:
#> stats:nlminb thinks the model has converged: mod$opt$convergence == 0
#> Maximum gradient component: 8.81e-13
#> Max gradient parameter: logit_q
#> TMB:sdreport() was performed successfully for this model
#>
#> Model 3:
#> stats:nlminb thinks the model has converged: mod$opt$convergence == 0
#> Maximum gradient component: 2.84e-13
#> Max gradient parameter: logit_q
#> TMB:sdreport() was performed successfully for this model
#>
#> Model 4:
#> stats:nlminb thinks the model has converged: mod$opt$convergence == 0
#> Maximum gradient component: 1.71e-10
#> Max gradient parameter: log_NAA_sigma
#> TMB:sdreport() was performed successfully for this model
#>
#> Model 5:
#> stats:nlminb thinks the model has converged: mod$opt$convergence == 0
#> Maximum gradient component: 5.27e-08
#> Max gradient parameter: catch_paa_pars
#> TMB:sdreport() was performed successfully for this model
#>
#> Model 6:
#> stats:nlminb thinks the model has converged: mod$opt$convergence == 0
#> Maximum gradient component: 5.61e-13
#> Max gradient parameter: logit_q
#> TMB:sdreport() was performed successfully for this model
#>
#> Model 7:
#> stats:nlminb thinks the model has converged: mod$opt$convergence == 0
#> Maximum gradient component: 1.08e-09
#> Max gradient parameter: index_paa_pars
#> TMB:sdreport() was performed successfully for this model
#>
#> Model 8:
#> stats:nlminb thinks the model has converged: mod$opt$convergence == 0
#> Maximum gradient component: 3.58e-09
#> Max gradient parameter: F_pars
#> TMB:sdreport() was performed successfully for this model
Plot output for models that converged (in a subfolder for each model):
# Is Hessian positive definite?
ok_sdrep = sapply(mods, function(x) if(x$na_sdrep==FALSE & !is.na(x$na_sdrep)) 1 else 0)
pdHess <- as.logical(ok_sdrep)
# did stats::nlminb converge?
conv <- sapply(mods, function(x) x$opt$convergence == 0) # 0 means opt converged
conv_mods <- (1:n.mods)[pdHess]
for(m in conv_mods){
#html and png files by default
plot_wham_output(mod=mods[[m]], dir.main=file.path(write.dir,paste0("m",m)))
}
Compare the models using AIC:
df.aic <- as.data.frame(compare_wham_models(mods, table.opts=list(sort=FALSE, calc.rho=FALSE))$tab)
df.aic[!pdHess,] = NA
minAIC <- min(df.aic$AIC, na.rm=T)
df.aic$dAIC <- round(df.aic$AIC - minAIC,1)
df.mods <- cbind(data.frame(Model=paste0("m",1:n.mods), Selectivity=sel_model,
"Block1_re"=sapply(sel_re, function(x) x[[1]]),
"opt_converged"= ifelse(conv, "Yes", "No"),
"pd_hessian"= ifelse(pdHess, "Yes", "No"),
"NLL"=sapply(mods, function(x) round(x$opt$objective,3)),
"Runtime"=sapply(mods, function(x) x$runtime)), df.aic)
rownames(df.mods) <- NULL
df.mods
#> Model Selectivity Block1_re opt_converged pd_hessian NLL Runtime dAIC
#> 1 m1 logistic none Yes Yes -788.748 0.73 54.5
#> 2 m2 logistic iid Yes Yes -811.603 0.87 10.8
#> 3 m3 logistic 2dar1 Yes Yes -818.980 0.89 0.0
#> 4 m4 age-specific none Yes Yes -793.497 0.67 51.0
#> 5 m5 age-specific iid Yes Yes -793.497 0.84 53.0
#> 6 m6 age-specific ar1 Yes Yes -785.042 0.85 67.9
#> 7 m7 age-specific ar1_y Yes Yes -818.610 0.68 4.8
#> 8 m8 age-specific 2dar1 Yes Yes -820.487 0.84 3.0
#> AIC
#> 1 -1449.5
#> 2 -1493.2
#> 3 -1504.0
#> 4 -1453.0
#> 5 -1451.0
#> 6 -1436.1
#> 7 -1499.2
#> 8 -1501.0
Prepare to plot selectivity-at-age for block 1 (fleet).
library(tidyverse)
library(viridis)
selAA <- lapply(mods, function(x) x$report()$selAA[[1]])
sel_mod <- factor(c("Age-specific","Logistic")[sapply(mods, function(x) x$env$data$selblock_models[1])], levels=c("Logistic","Age-specific"))
sel_cor <- factor(c("None","IID","AR1","AR1_y","2D AR1")[sapply(mods, function(x) x$env$data$selblock_models_re[1])], levels=c("None","IID","AR1","AR1_y","2D AR1"))
df.selAA <- data.frame(matrix(NA, nrow=0, ncol=11))
colnames(df.selAA) <- c(paste0("Age_",1:6),"Year","Model","conv","sel_mod","sel_cor")
for(m in 1:n.mods){
df <- as.data.frame(selAA[[m]])
df$Year <- input$years
colnames(df) <- c(paste0("Age_",1:6),"Year")
df$Model <- m
df$conv <- ifelse(df.mods$pd_hessian[m]=="Yes",1,0)
df$sel_mod = sel_mod[m]
df$sel_cor = sel_cor[m]
df.selAA <- rbind(df.selAA, df)
}
df <- df.selAA %>% pivot_longer(-c(Year,Model,conv,sel_mod,sel_cor),
names_to = "Age",
names_prefix = "Age_",
names_transform = list(Age = as.integer),
values_to = "Selectivity")
df$Age <- as.integer(df$Age)
df$sel_mod <- factor(df$sel_mod, levels=c("Logistic","Age-specific"))
df$sel_cor <- factor(df$sel_cor, levels=c("None","IID","AR1","AR1_y","2D AR1"))
df$conv = factor(df$conv)
Now plot selectivity-at-age for block 1 (fleet) in all models.
print(ggplot(df, aes(x=Year, y=Age)) +
geom_tile(aes(fill=Selectivity)) +
geom_label(aes(x=Year, y=Age, label=lab), size=5, alpha=1, #fontface = "bold",
data=data.frame(Year=1975.5, Age=5.8, lab=paste0("m",1:length(mods)), sel_mod=sel_mod, sel_cor=sel_cor)) +
facet_grid(rows=vars(sel_cor), cols=vars(sel_mod)) +
scale_fill_viridis() +
theme_bw() +
scale_x_continuous(expand=c(0,0)) +
scale_y_continuous(expand=c(0,0)))
When fitting age-specific selectivity, oftentimes some of the (mean, \(\mu^s_a\)) selectivity parameters need to be fixed for the model to converge. The specifications used here follow this procedure:
NaN
estimates of their standard error:
mod$parList$logit_selpars # mean sel pars
mod$input$map$sel_repars # if time-varying selectivity turned on
mod$rep$selAA # list of annual selectivity-at-age by block
mod$sdrep # look for sel pars with NaN standard errors