Ex 4: Selectivity with time- and age-varying random effects • wham

In this vignette we walk through an example using the wham (WHAM = Woods Hole Assessment Model) package to run a state-space age-structured stock assessment model. WHAM is a generalization of code written for Miller et al. (2016) and Xu et al. (2018), and in this example we apply WHAM to the same stock, Southern New England / Mid-Atlantic Yellowtail Flounder.

Here we assume you already have wham installed. If not, see the Introduction. This is the 4th wham example, which builds off model m4 from example 1:

full state-space model (numbers-at-age are random effects for all ages, NAA_re = list(sigma='rec+1',cor='iid'))
logistic normal age compositions, treating observations of zero as missing (age_comp = "logistic-normal-miss0")
random-about-mean recruitment (recruit_model = 2)
no environmental covariate (ecov = NULL)
2 indices
fit to 1973-2016 data

In example 4, we demonstrate the time-varying selectivity options in WHAM for both logistic and age-specific selectivity:

none: time-constant
iid: parameter- and year-specific (random effect) deviations from mean selectivity parameters
ar1: as above, but estimate correlation across logistic parameters
ar1_y: as above, but estimate correlation across years
2dar1: as above, but estimate correlation across both years and parameters

Note that each of these options can be applied to any selectivity block (and therefore fleet/catch or index/survey).

1. Load data

Open R and load the wham package:

library(wham)

For a clean, runnable .R script, look at ex4_selectivity.R in the example_scripts folder of the wham package. You can run this entire example script with:

wham.dir <- find.package("wham")
source(file.path(wham.dir, "example_scripts", "ex4_selectivity.R"))

Let’s create a directory for this analysis:

# choose a location to save output, otherwise will be saved in working directory
write.dir <- "choose/where/to/save/output" # need to change e.g., tempdir(check=TRUE)
dir.create(write.dir)
setwd(write.dir)

We need the same data files as in example 1. Read in ex1_SNEMAYT.dat:

asap3 <- read_asap3_dat(file.path(wham.dir,"extdata","ex1_SNEMAYT.dat"))

2. Specify selectivity model options

We are going to run 8 models that differ only in the selectivity options:

# m1-m5 logistic, m6-m9 age-specific
sel_model <- c(rep("logistic",3), rep("age-specific",5))

# time-varying options for each of 3 blocks (b1 = fleet, b2-3 = indices)
sel_re <- list(c("none","none","none"), # m1-m4 logistic
                c("iid","none","none"),
#               c("ar1","none","none"), #can't get a converged model for logistic with two re and two fe for
                c("2dar1","none","none"),
                c("none","none","none"), # m4-m8 age-specific
                c("iid","none","none"),
                c("ar1","none","none"), #will map all age-specific (mean) parameters to one estimated value
                c("ar1_y","none","none"),
                c("2dar1","none","none"))
n.mods <- length(sel_re)

# summary data frame
df.mods <- data.frame(Model=paste0("m",1:n.mods), 
    Selectivity=sel_model, # Selectivity model (same for all blocks)
    Block_1_re=sapply(sel_re, function(x) x[[1]])) # Block 1 random effects
rownames(df.mods) <- NULL

df.mods
#>   Model  Selectivity Block_1_re
#> 1    m1     logistic       none
#> 2    m2     logistic        iid
#> 3    m3     logistic      2dar1
#> 4    m4 age-specific       none
#> 5    m5 age-specific        iid
#> 6    m6 age-specific        ar1
#> 7    m7 age-specific      ar1_y
#> 8    m8 age-specific      2dar1

3. Setup and run models

The ASAP data file specifies selectivity options (model, initial parameter values, which parameters to fix/estimate). WHAM uses these by default in order to facilitate running ASAP models. To see the currently specified selectivity options in asap3:

asap3$dat$sel_block_assign # 1 fleet, all years assigned to block 1
#> NULL
# by default each index gets its own selectivity block (here, blocks 2 and 3)

asap3$dat$sel_block_option # fleet selectivity (1 block), 2 = logistic
#> NULL
asap3$dat$index_sel_option # index selectivity (2 blocks), 2 = logistic
#> NULL

asap3$dat$sel_ini # fleet sel initial values (col1), estimation phase (-1 = fix)
#> NULL
asap3$dat$index_sel_ini # index sel initial values (col1), estimation phase (-1 = fix)
#> NULL

When we specify the WHAM model with prepare_wham_input(), we can overwrite the selectivity options from the ASAP data file with the optional list argument selectivity. The selectivity model is chosen via selectivity$model:

Model	`selectivity$model`	No. Parameters
Age-specific	`"age-specific"`	`n_ages`
Logistic (increasing)	`"logistic"`	2
Double logistic (dome)	`"double-logistic"`	4
Logistic (decreasing)	`"decreasing-logistic"`	2

Regardless of the selectivity model used, we incorporate time-varying selectivity by estimating a mean for each selectivity parameter, $\mu^{s}_a$, and (random effect) deviations from the mean, $\delta_{a,y}$. We then estimate the selectivity parameters, $s_{a,y}$, on the logit-scale with (possibly) lower and upper limits: \[s_{a,y} = \mathrm{lower} + \frac{\mathrm{upper} - \mathrm{lower}}{1 + e^{-(\mu^{s}_a + \delta_{a,y})}}\]

The deviations, $\boldsymbol{\delta}$, follow a 2-dimensional AR(1) process defined by the parameters $\sigma^2_s$, $\rho_a$, and $\rho_y$: \[\boldsymbol{\delta} \sim \mathrm{MVN}(0,\Sigma)\] \[\Sigma = \sigma^2_s(\mathrm{R}_a \otimes \mathrm{R}_y)\] \[R_{a,a^*} = \rho_a^{\vert a - a^* \vert}\] \[R_{y,y^*} = \rho_y^{\vert y - y^* \vert}\]

Mean selectivity parameters can be initialized at different values from the ASAP file with selectivity$initial_pars. Parameters can be fixed at their initial values by specifying selectivity$fix_pars. Finally, we specify any time-varying (random effects) on selectivity parameters (selectivity$re):

`selectivity$re`	Deviations from mean	Estimated parameters
`"none"`	time-constant (no deviation)
`"iid"`	independent, identically-distributed	$\sigma^2$
`"ar1"`	autoregressive-1 (correlated across ages/parameters)	$\sigma^2$, $\rho_a$
`"ar1_y"`	autoregressive-1 (correlated across years)	$\sigma^2$, $\rho_y$
`"2dar1"`	2D AR1 (correlated across both years and ages/parameters)	$\sigma^2$, $\rho_a$, $\rho_y$

We will loop over and fit each of the models, but first lets define some initial parameter values and which will be fixed. This is needed specifically for the age-specific selectivity models.

#initial pars for logistic or age-specfic selectivity
initial_pars <- c(rep(list(list(c(2,0.3),c(2,0.3),c(2,0.3))),3), rep(list(list(c(0.1,0.5,0.5,1,1,1),c(0.5,0.5,0.5,1,1,0.5),c(0.5,0.5,1,1,1,1))),5))
#which pars to fix for age-specific selectivity
fix_pars <- c(list(NULL,NULL,NULL), rep(list(list(4:6,4:5,3:6)),5))

Now we can run the above models in a loop:

#make a list of fits to fill in
mods <- list()

for(m in 1:n.mods){
    # overwrite initial parameter values in ASAP data file (ex1_SNEMAYT.dat)
    input <- prepare_wham_input(asap3, model_name=paste(paste0("Model ",m), sel_model[m], paste(sel_re[[m]], collapse="-"), sep=": "), recruit_model=2,
                selectivity=list(model=rep(sel_model[m],3), re=sel_re[[m]], initial_pars=initial_pars[[m]], fix_pars = fix_pars[[m]]),
                NAA_re = list(sigma='rec+1',cor='iid'),
                age_comp = "logistic-normal-miss0") # logistic normal, treat 0 obs as missing

    # fit model
    mods[[m]] <- fit_wham(input, do.check=T, do.osa=F, do.retro=F) 
}

#save the model fits
for(m in 1:length(mods)) saveRDS(mod[[m]], file=paste0("m",m,".rds"))

4. Model convergence and comparison

Check which models converged.

vign4_conv <- lapply(mods, function(x) capture.output(check_convergence(x)))
for(m in 1:n.mods) cat(paste0("Model ",m,":"), vign4_conv[[m]], "", sep='\n')

#> Model 1:
#> stats:nlminb thinks the model has converged: mod$opt$convergence == 0
#> Maximum gradient component: 6.24e-11 
#> Max gradient parameter: log_NAA_sigma 
#> TMB:sdreport() was performed successfully for this model
#> 
#> Model 2:
#> stats:nlminb thinks the model has converged: mod$opt$convergence == 0
#> Maximum gradient component: 8.81e-13 
#> Max gradient parameter: logit_q 
#> TMB:sdreport() was performed successfully for this model
#> 
#> Model 3:
#> stats:nlminb thinks the model has converged: mod$opt$convergence == 0
#> Maximum gradient component: 2.84e-13 
#> Max gradient parameter: logit_q 
#> TMB:sdreport() was performed successfully for this model
#> 
#> Model 4:
#> stats:nlminb thinks the model has converged: mod$opt$convergence == 0
#> Maximum gradient component: 1.71e-10 
#> Max gradient parameter: log_NAA_sigma 
#> TMB:sdreport() was performed successfully for this model
#> 
#> Model 5:
#> stats:nlminb thinks the model has converged: mod$opt$convergence == 0
#> Maximum gradient component: 5.27e-08 
#> Max gradient parameter: catch_paa_pars 
#> TMB:sdreport() was performed successfully for this model
#> 
#> Model 6:
#> stats:nlminb thinks the model has converged: mod$opt$convergence == 0
#> Maximum gradient component: 5.61e-13 
#> Max gradient parameter: logit_q 
#> TMB:sdreport() was performed successfully for this model
#> 
#> Model 7:
#> stats:nlminb thinks the model has converged: mod$opt$convergence == 0
#> Maximum gradient component: 1.08e-09 
#> Max gradient parameter: index_paa_pars 
#> TMB:sdreport() was performed successfully for this model
#> 
#> Model 8:
#> stats:nlminb thinks the model has converged: mod$opt$convergence == 0
#> Maximum gradient component: 3.58e-09 
#> Max gradient parameter: F_pars 
#> TMB:sdreport() was performed successfully for this model

Plot output for models that converged (in a subfolder for each model):

# Is Hessian positive definite?
ok_sdrep = sapply(mods, function(x) if(x$na_sdrep==FALSE & !is.na(x$na_sdrep)) 1 else 0)
pdHess <- as.logical(ok_sdrep)
# did stats::nlminb converge?
conv <- sapply(mods, function(x) x$opt$convergence == 0) # 0 means opt converged
conv_mods <- (1:n.mods)[pdHess] 
for(m in conv_mods){
    #html and png files by default
    plot_wham_output(mod=mods[[m]], dir.main=file.path(write.dir,paste0("m",m)))
}

Compare the models using AIC:

df.aic <- as.data.frame(compare_wham_models(mods, table.opts=list(sort=FALSE, calc.rho=FALSE))$tab)
df.aic[!pdHess,] = NA
minAIC <- min(df.aic$AIC, na.rm=T)
df.aic$dAIC <- round(df.aic$AIC - minAIC,1)
df.mods <- cbind(data.frame(Model=paste0("m",1:n.mods), Selectivity=sel_model,
                            "Block1_re"=sapply(sel_re, function(x) x[[1]]),
                            "opt_converged"= ifelse(conv, "Yes", "No"),
                            "pd_hessian"= ifelse(pdHess, "Yes", "No"),
                            "NLL"=sapply(mods, function(x) round(x$opt$objective,3)),
                            "Runtime"=sapply(mods, function(x) x$runtime)), df.aic)
rownames(df.mods) <- NULL
df.mods

#>   Model  Selectivity Block1_re opt_converged pd_hessian      NLL Runtime dAIC
#> 1    m1     logistic      none           Yes        Yes -788.748    0.73 54.5
#> 2    m2     logistic       iid           Yes        Yes -811.603    0.87 10.8
#> 3    m3     logistic     2dar1           Yes        Yes -818.980    0.89  0.0
#> 4    m4 age-specific      none           Yes        Yes -793.497    0.67 51.0
#> 5    m5 age-specific       iid           Yes        Yes -793.497    0.84 53.0
#> 6    m6 age-specific       ar1           Yes        Yes -785.042    0.85 67.9
#> 7    m7 age-specific     ar1_y           Yes        Yes -818.610    0.68  4.8
#> 8    m8 age-specific     2dar1           Yes        Yes -820.487    0.84  3.0
#>       AIC
#> 1 -1449.5
#> 2 -1493.2
#> 3 -1504.0
#> 4 -1453.0
#> 5 -1451.0
#> 6 -1436.1
#> 7 -1499.2
#> 8 -1501.0

Prepare to plot selectivity-at-age for block 1 (fleet).

library(tidyverse)
library(viridis)

selAA <- lapply(mods, function(x) x$report()$selAA[[1]])
sel_mod <- factor(c("Age-specific","Logistic")[sapply(mods, function(x) x$env$data$selblock_models[1])], levels=c("Logistic","Age-specific"))
sel_cor <- factor(c("None","IID","AR1","AR1_y","2D AR1")[sapply(mods, function(x) x$env$data$selblock_models_re[1])], levels=c("None","IID","AR1","AR1_y","2D AR1"))
df.selAA <- data.frame(matrix(NA, nrow=0, ncol=11))
colnames(df.selAA) <- c(paste0("Age_",1:6),"Year","Model","conv","sel_mod","sel_cor")
for(m in 1:n.mods){
    df <- as.data.frame(selAA[[m]])
    df$Year <- input$years
    colnames(df) <- c(paste0("Age_",1:6),"Year")
    df$Model <- m
    df$conv <- ifelse(df.mods$pd_hessian[m]=="Yes",1,0)
    df$sel_mod = sel_mod[m]
    df$sel_cor = sel_cor[m]
    df.selAA <- rbind(df.selAA, df)
}
df <- df.selAA |> pivot_longer(-c(Year,Model,conv,sel_mod,sel_cor),
                names_to = "Age",
                names_prefix = "Age_",
                names_transform = list(Age = as.integer),
                values_to = "Selectivity")
df$Age <- as.integer(df$Age)
df$sel_mod <- factor(df$sel_mod, levels=c("Logistic","Age-specific"))
df$sel_cor <- factor(df$sel_cor, levels=c("None","IID","AR1","AR1_y","2D AR1"))
df$conv = factor(df$conv)

Now plot selectivity-at-age for block 1 (fleet) in all models.

print(ggplot(df, aes(x=Year, y=Age)) +
    geom_tile(aes(fill=Selectivity)) +
    geom_label(aes(x=Year, y=Age, label=lab), size=5, alpha=1, #fontface = "bold",
      data=data.frame(Year=1975.5, Age=5.8, lab=paste0("m",1:length(mods)), sel_mod=sel_mod, sel_cor=sel_cor)) +    
    facet_grid(rows=vars(sel_cor), cols=vars(sel_mod)) +
    scale_fill_viridis() +
    theme_bw() +
    scale_x_continuous(expand=c(0,0)) +
    scale_y_continuous(expand=c(0,0)))

A note on convergence

When fitting age-specific selectivity, oftentimes some of the (mean, $\mu^s_a$) selectivity parameters need to be fixed for the model to converge. The specifications used here follow this procedure:

Fit the model without fixing any selectivity parameters.
If the model fails to converge or the hessian is not invertible (i.e. not positive definite), look for mean selectivity parameters that are very close to 0 or 1 (> 5 or < -5 on the logit scale) and/or have NaN estimates of their standard error:

mod$parList$logit_selpars # mean sel pars
mod$input$map$sel_repars # if time-varying selectivity turned on
mod$rep$selAA # list of annual selectivity-at-age by block
mod$sdrep # look for sel pars with NaN standard errors

Re-run the model fixing the worst selectivity-at-age parameter for each block at 0 or 1 as appropriate. In the above age-specific models, we initially just fixed age 4 in block 2. The logit scale selectivity for age 5 for that block was around 20 for at least one model, indicating that they too should be fixed at 1. Sometimes just initializing the worst parameter is enough, without fixing it.
The goal is to find a set of selectivity parameter initial/fixed values that allow all nested models to converge. Fixing parameters should not affect the NLL much, and any model that is a superset of another should not have a greater NLL (indicates not converged to global minimum). The following commands may be helpful:

mod.list <- file.path(getwd(),paste0("m",1:n.mods,".rds"))
mods <- lapply(mod.list, readRDS)
sapply(mods, function(x) check_convergence(x))
sapply(mods, function(x) x$opt$obj) # get NLL 
lapply(mods, function(x) x$parList$logit_selpars)
lapply(mods, function(x) x$input$map$sel_repars)