In this vignette we walk through an example using the wham (WHAM = Woods Hole Assessment Model) package to run a state-space age-structured stock assessment model. WHAM is a generalization of code written for Miller et al. (2016) and Xu et al. (2018), and in this example we apply WHAM to the same stock, Southern New England / Mid-Atlantic Yellowtail Flounder.

Here we assume you already have wham installed. If not, see the Introduction. This is the 4th wham example, which builds off model m4 from example 1:

  • full state-space model (numbers-at-age are random effects for all ages, NAA_re = list(sigma='rec+1',cor='iid'))

  • logistic normal age compositions, treating observations of zero as missing (input$data$age_comp_model_fleets = 7 and input$data$age_comp_model_indices = 7)

  • random-about-mean recruitment (recruit_model = 2)

  • no environmental covariate (ecov = NULL)

  • 2 indices

  • fit to 1973-2016 data

In example 4, we demonstrate the time-varying selectivity options in WHAM for both logistic and age-specific selectivity:

  • none: time-constant

  • iid: parameter- and year-specific (random effect) deviations from mean selectivity parameters

  • ar1: as above, but estimate correlation across logistic parameters

  • ar1_y: as above, but estimate correlation across years

  • 2dar1: as above, but estimate correlation across both years and parameters

Note that each of these options can be applied to any selectivity block (and therefore fleet/catch or index/survey).

1. Load data

Open R and load the wham package:

library(wham)

For a clean, runnable .R script, look at ex4_selectivity.R in the example_scripts folder of the wham package install:

wham.dir <- find.package("wham")
file.path(wham.dir, "example_scripts")

You can run this entire example script with:

write.dir <- "choose/where/to/save/output" # otherwise will be saved in working directory
source(file.path(wham.dir, "example_scripts", "ex4_selectivity.R"))

Let’s create a directory for this analysis:

# choose a location to save output, otherwise will be saved in working directory
write.dir <- "choose/where/to/save/output" # need to change
dir.create(write.dir)
setwd(write.dir)

We need the same data files as in example 1. Let’s copy ex1_SNEMAYT.dat to our analysis directory:

wham.dir <- find.package("wham")
file.copy(from=file.path(wham.dir,"extdata","ex1_SNEMAYT.dat"), to=write.dir, overwrite=FALSE)

Confirm you are in the correct directory and it has the required data files:

list.files()
#>  [1] "CPI.csv"                "ex1_SNEMAYT.dat"        "ex1_test_results.rds"  
#>  [4] "ex2_SNEMAYT.dat"        "ex2_test_results.rds"   "ex4_test_results.rds"  
#>  [7] "ex5_summerflounder.dat" "ex5_test_results.rds"   "ex6_test_results.rds"  
#> [10] "GSI.csv"

Read the ASAP3 .dat file into R and convert to input list for wham:

asap3 <- read_asap3_dat("ex1_SNEMAYT.dat")

2. Specify selectivity model options

We are going to run 9 models that differ only in the selectivity options:

# m1-m5 logistic, m6-m9 age-specific
sel_model <- c(rep("logistic",5), rep("age-specific",4))

# time-varying options for each of 3 blocks (b1 = fleet, b2-3 = indices)
sel_re <- list(c("none","none","none"), # m1-m5 logistic
                c("iid","none","none"),
                c("ar1","none","none"),
                c("ar1_y","none","none"),
                c("2dar1","none","none"),
                c("none","none","none"), # m6-m9 age-specific
                c("iid","none","none"),
                c("ar1_y","none","none"),
                c("2dar1","none","none"))
n.mods <- length(sel_re)

# summary data frame
df.mods <- data.frame(Model=paste0("m",1:n.mods),
    Selectivity=sel_model, # Selectivity model (same for all blocks)
    Block1_re=sapply(sel_re, function(x) x[[1]])) # Block 1 random effects
rownames(df.mods) <- NULL

df.mods
#>   Model  Selectivity Block1_re
#> 1    m1     logistic      none
#> 2    m2     logistic       iid
#> 3    m3     logistic       ar1
#> 4    m4     logistic     ar1_y
#> 5    m5     logistic     2dar1
#> 6    m6 age-specific      none
#> 7    m7 age-specific       iid
#> 8    m8 age-specific     ar1_y
#> 9    m9 age-specific     2dar1

3. Setup and run models

The ASAP data file specifies selectivity options (model, initial parameter values, which parameters to fix/estimate). WHAM uses these by default in order to facilitate running ASAP models. To see the currently specified selectivity options in asap3:

asap3$dat$sel_block_assign # 1 fleet, all years assigned to block 1
#> [[1]]
#>  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#> [39] 1 1 1 1 1 1
# by default each index gets its own selectivity block (here, blocks 2 and 3)

asap3$dat$sel_block_option # fleet selectivity (1 block), 2 = logistic
#> [1] 2
asap3$dat$index_sel_option # index selectivity (2 blocks), 2 = logistic
#> [1] 2 2

asap3$dat$sel_ini # fleet sel initial values (col1), estimation phase (-1 = fix)
#> [[1]]
#>       [,1] [,2] [,3] [,4]
#>  [1,] 0.10    1    0    1
#>  [2,] 1.00   -1    0    1
#>  [3,] 0.42    1    0    1
#>  [4,] 0.58    1    0    1
#>  [5,] 0.74    1    0    1
#>  [6,] 0.90    1    0    1
#>  [7,] 1.00    2    0    1
#>  [8,] 0.90    3    0    1
#>  [9,] 0.50    1    0    1
#> [10,] 0.60    1    0    1
#> [11,] 4.00    1    0    1
#> [12,] 1.10    1    0    1
asap3$dat$index_sel_ini # index sel initial values (col1), estimation phase (-1 = fix)
#> [[1]]
#>       [,1] [,2] [,3] [,4]
#>  [1,] 0.10    1    0    1
#>  [2,] 0.26    1    0    1
#>  [3,] 1.00   -1    0    1
#>  [4,] 0.58    1    0    1
#>  [5,] 0.74    1    0    1
#>  [6,] 0.90    1    0    1
#>  [7,] 1.50    2    0    1
#>  [8,] 0.90    3    0    1
#>  [9,] 0.75    1    0    1
#> [10,] 0.60    1    0    1
#> [11,] 4.00    1    0    1
#> [12,] 1.10    1    0    1
#> 
#> [[2]]
#>       [,1] [,2] [,3] [,4]
#>  [1,] 0.10    1    0    1
#>  [2,] 1.00   -1    0    1
#>  [3,] 0.42    1    0    1
#>  [4,] 0.58    1    0    1
#>  [5,] 0.74    1    0    1
#>  [6,] 0.90    1    0    1
#>  [7,] 1.00    2    0    1
#>  [8,] 0.90    3    0    1
#>  [9,] 0.50    1    0    1
#> [10,] 0.60    1    0    1
#> [11,] 4.00    1    0    1
#> [12,] 1.10    1    0    1

When we specify the WHAM model with prepare_wham_input(), we can overwrite the selectivity options from the ASAP data file with the optional list argument selectivity. The selectivity model is chosen via selectivity$model:

Model selectivity$model No. Parameters
Age-specific "age-specific" n_ages
Logistic (increasing) "logistic" 2
Double logistic (dome) "double-logistic" 4
Logistic (decreasing) "decreasing-logistic" 2

Regardless of the selectivity model used, we incorporate time-varying selectivity by estimating a mean for each selectivity parameter, \(\mu^{s}_a\), and (random effect) deviations from the mean, \(\delta_{a,y}\). We then estimate the selectivity parameters, \(s_{a,y}\), on the logit-scale with (possibly) lower and upper limits: \[s_{a,y} = \mathrm{lower} + \frac{\mathrm{upper} - \mathrm{lower}}{1 + e^{-(\mu^{s}_a + \delta_{a,y})}}\]

The deviations, \(\boldsymbol{\delta}\), follow a 2-dimensional AR(1) process defined by the parameters \(\sigma^2_s\), \(\rho_a\), and \(\rho_y\): \[\boldsymbol{\delta} \sim \mathrm{MVN}(0,\Sigma)\] \[\Sigma = \sigma^2_s(\mathrm{R}_a \otimes \mathrm{R}_y)\] \[R_{a,a^*} = \rho_a^{\vert a - a^* \vert}\] \[R_{y,y^*} = \rho_y^{\vert y - y^* \vert}\]

Mean selectivity parameters can be initialized at different values from the ASAP file with selectivity$initial_pars. Parameters can be fixed at their initial values by specifying selectivity$fix_pars. Finally, we specify any time-varying (random effects) on selectivity parameters (selectivity$re):

selectivity$re Deviations from mean Estimated parameters
"none" time-constant (no deviation)
"iid" independent, identically-distributed \(\sigma^2\)
"ar1" autoregressive-1 (correlated across ages/parameters) \(\sigma^2\), \(\rho_a\)
"ar1_y" autoregressive-1 (correlated across years) \(\sigma^2\), \(\rho_y\)
"2dar1" 2D AR1 (correlated across both years and ages/parameters) \(\sigma^2\), \(\rho_a\), \(\rho_y\)

Now we can run the above models in a loop:

inv.logit <- function(x) exp(x)/(1+exp(x))
for(m in 1:n.mods){
    if(sel_model[m] == "logistic"){ # logistic selectivity
        # overwrite initial parameter values in ASAP data file (ex1_SNEMAYT.dat)
        input <- prepare_wham_input(asap3, model_name=paste(paste0("Model ",m), sel_model[m], paste(sel_re[[m]], collapse="-"), sep=": "), recruit_model=2,
                    selectivity=list(model=rep("logistic",3), re=sel_re[[m]], initial_pars=list(c(1.5,0.2),c(2,0.2),c(2,0.2))),
                    NAA_re = list(sigma='rec+1',cor='iid'))
    } else { # age-specific selectivity
        # # don't fix any ages
        # input <- prepare_wham_input(asap3, model_name=paste(paste0("Model ",m), sel_model[m], paste(sel_re[[m]], collapse="-"), sep=": "), recruit_model=2, 
        #           selectivity=list(model=rep("age-specific",3), re=sel_re[[m]], 
        #               initial_pars=list(rep(0.5,6), rep(0.5,6), rep(0.5,6))),
        #           NAA_re = list(sigma='rec+1',cor='iid'))

        # fix selectivity for one age per block: 4 / 4 / 2
        input <- prepare_wham_input(asap3, model_name=paste(paste0("Model ",m), sel_model[m], paste(sel_re[[m]], collapse="-"), sep=": "), recruit_model=2,
                    selectivity=list(model=rep("age-specific",3), re=sel_re[[m]],
                        initial_pars=list(c(inv.logit(-4),0.5,0.5,1,0.5,0.5),c(0.5,0.5,0.5,1,0.5,0.5),c(0.5,1,0.5,0.5,0.5,0.5)),
                        fix_pars=list(4,4,2)),
                    NAA_re = list(sigma='rec+1',cor='iid'))
    }

    # overwrite age comp model (all models use logistic normal)
    input$data$age_comp_model_indices = rep(7, input$data$n_indices)
    input$data$age_comp_model_fleets = rep(7, input$data$n_fleets)
    input$data$n_age_comp_pars_indices = rep(1, input$data$n_indices)
    input$data$n_age_comp_pars_fleets = rep(1, input$data$n_fleets)
    input$par$index_paa_pars = rep(0, input$data$n_indices)
    input$par$catch_paa_pars = rep(0, input$data$n_fleets)
    input$map = input$map[!(names(input$map) %in% c("index_paa_pars", "catch_paa_pars"))]

    # fit model
    mod <- fit_wham(input, do.osa=F, do.proj=F, do.retro=F)
    saveRDS(mod, file=paste0("m",m,".rds"))
}

4. Model convergence and comparison

Collect all models into a list:

mod.list <- file.path(write.dir,paste0("m",1:n.mods,".rds"))
mods <- lapply(mod.list, readRDS)

Check which models converged. Note that 3 of the 9 models did NOT converge:

sapply(mods, function(x) check_convergence(x))
#> [[1]]
#> [1] "stats:nlminb thinks the model has converged: mod$opt$convergence == 0"
#> [2] "Maximum gradient component: 7.53e-10 "                                
#> [3] "Max gradient parameter: F_devs "                                      
#> [4] "TMB:sdreport() was performed successfully for this model"             
#> 
#> [[2]]
#> [1] "stats:nlminb thinks the model has converged: mod$opt$convergence == 0"
#> [2] "Maximum gradient component: 7.96e-13 "                                
#> [3] "Max gradient parameter: logit_q "                                     
#> [4] "TMB:sdreport() was performed successfully for this model"             
#> 
#> [[3]]
#> [1] "stats:nlminb thinks the model has converged: mod$opt$convergence == 0"
#> [2] "Maximum gradient component: 5.84e-09 "                                
#> [3] "Max gradient parameter: sel_repars "                                  
#> [4] "TMB:sdreport() was performed successfully for this model"             
#> 
#> [[4]]
#> [1] "stats:nlminb thinks the model has NOT converged: mod$opt$convergence != 0"             
#> [2] "Maximum gradient component: NaN "                                                      
#> [3] "Max gradient parameter:  "                                                             
#> [4] "TMB:sdreport() was performed for this model, but it appears hessian was not invertible"
#> 
#> [[5]]
#> [1] "stats:nlminb thinks the model has converged: mod$opt$convergence == 0"
#> [2] "Maximum gradient component: 1.60e-12 "                                
#> [3] "Max gradient parameter: F_devs "                                      
#> [4] "TMB:sdreport() was performed successfully for this model"             
#> 
#> [[6]]
#> [1] "stats:nlminb thinks the model has converged: mod$opt$convergence == 0"
#> [2] "Maximum gradient component: 1.92e-11 "                                
#> [3] "Max gradient parameter: log_F1 "                                      
#> [4] "TMB:sdreport() was performed successfully for this model"             
#> 
#> [[7]]
#> [1] "stats:nlminb thinks the model has NOT converged: mod$opt$convergence != 0"
#> [2] "Maximum gradient component: 1.17e+05 "                                    
#> [3] "Max gradient parameter: logit_selpars "                                   
#> [4] "TMB:sdreport() was performed successfully for this model"                 
#> 
#> [[8]]
#> [1] "stats:nlminb thinks the model has converged: mod$opt$convergence == 0"
#> [2] "Maximum gradient component: 1.38e-06 "                                
#> [3] "Max gradient parameter: log_F1 "                                      
#> [4] "TMB:sdreport() was performed successfully for this model"             
#> 
#> [[9]]
#> [1] "stats:nlminb thinks the model has NOT converged: mod$opt$convergence != 0"             
#> [2] "Maximum gradient component: NaN "                                                      
#> [3] "Max gradient parameter:  "                                                             
#> [4] "TMB:sdreport() was performed for this model, but it appears hessian was not invertible"

Plot output for models that converged (in a subfolder for each model):

conv <- sapply(mods, function(x) x$opt$convergence == 0)
conv_mods <- (1:n.mods)[conv] # 0 means converged
for(m in conv_mods){
    plot_wham_output(mod=mods[[m]], out.type='html', dir.main=file.path(write.dir,paste0("m",m)))
}

Compare the models using AIC:

df.aic <- as.data.frame(compare_wham_models(mods, sort=FALSE, calc.rho=FALSE)$tab)
df.aic[!conv,] = NA # don't get AIC for unconverged models
minAIC <- min(df.aic$AIC, na.rm=T)
df.aic$dAIC <- round(df.aic$AIC - minAIC,1)
df.mods <- cbind(data.frame(Model=paste0("m",1:n.mods), Selectivity=sel_model,
                            "Block1_re"=sapply(sel_re, function(x) x[[1]]),
                            "Converged"= ifelse(conv, "Yes", "No"),
                            "NLL"=sapply(mods, function(x) round(x$opt$objective,3)),
                            "runtime"=sapply(mods, function(x) x$runtime)), df.aic)
rownames(df.mods) <- NULL
df.mods
#>   Model  Selectivity Block1_re Converged       NLL runtime dAIC     AIC
#> 1    m1     logistic      none       Yes  -788.748    0.20 54.5 -1449.5
#> 2    m2     logistic       iid       Yes  -811.603    0.28 10.8 -1493.2
#> 3    m3     logistic       ar1       Yes  -788.748    0.31 58.5 -1445.5
#> 4    m4     logistic     ar1_y        No  -815.888    0.34   NA      NA
#> 5    m5     logistic     2dar1       Yes  -818.980    0.31  0.0 -1504.0
#> 6    m6 age-specific      none       Yes  -806.515    0.19 37.0 -1467.0
#> 7    m7 age-specific       iid        No 98222.812    3.99   NA      NA
#> 8    m8 age-specific     ar1_y       Yes  -811.522    0.31 31.0 -1473.0
#> 9    m9 age-specific     2dar1        No       NaN    1.68   NA      NA

Prepare to plot selectivity-at-age for block 1 (fleet).

library(tidyverse)
library(viridis)

selAA <- lapply(mods, function(x) x$report()$selAA[[1]])
sel_mod <- c("Age-specific","Logistic")[sapply(mods, function(x) x$env$data$selblock_models[1])]
sel_cor <- c("None","IID","AR1","AR1_y","2D AR1")[sapply(mods, function(x) x$env$data$selblock_models_re[1])]
df.selAA <- data.frame(matrix(NA, nrow=0, ncol=11))
colnames(df.selAA) <- c(paste0("Age_",1:6),"Year","Model","conv","sel_mod","sel_cor")
for(m in 1:n.mods){
    df <- as.data.frame(selAA[[m]])
    df$Year <- input$years
    colnames(df) <- c(paste0("Age_",1:6),"Year")
    df$Model <- m
    df$conv <- ifelse(df.mods$Converged[m]=="Yes",1,0)
    df$sel_mod = sel_mod[m]
    df$sel_cor = sel_cor[m]
    df.selAA <- rbind(df.selAA, df)
}
df <- df.selAA %>% pivot_longer(-c(Year,Model,conv,sel_mod,sel_cor),
                names_to = "Age",
                names_prefix = "Age_",
                names_ptypes = list(Age = integer()),
                values_to = "Selectivity")
df$sel_mod <- factor(df$sel_mod, levels=c("Logistic","Age-specific"))
df$sel_cor <- factor(df$sel_cor, levels=c("None","IID","AR1","AR1_y","2D AR1"))

Now plot selectivity-at-age for block 1 (fleet) in all models, with unconverged models pale and converged models solid.

print(ggplot(df, aes(x=Year, y=Age)) +
    geom_tile(aes(fill=Selectivity, alpha=factor(conv))) +
    scale_alpha_discrete(range=c(0.4,1), guide=FALSE) +
    theme_bw() +
    geom_label(aes(x=Year, y=Age, label=lab), size=5, alpha=1, #fontface = "bold",
      data=data.frame(Year=1975.5, Age=5.8, lab=paste0("m",1:length(mods)), sel_mod=sel_mod, sel_cor=sel_cor)) +
    scale_x_continuous(expand=c(0,0)) +
    scale_y_continuous(expand=c(0,0)) +
    facet_grid(rows=vars(sel_cor), cols=vars(sel_mod)) +
    scale_fill_viridis())

A note on convergence

When fitting age-specific selectivity, oftentimes some of the (mean, \(\mu^{s}_a\)) selectivity parameters need to be fixed for the model to converge. The specifications used here follow this procedure:

  1. Fit the model without fixing any selectivity parameters.
  2. If the model fails to converge or the hessian is not invertible (i.e. not positive definite), look for mean selectivity parameters that are very close to 0 or 1 (> 5 or < -5 on the logit scale) and/or have NaN estimates of their standard error:
mod$rep$logit_selpars # mean sel pars
mod$rep$sel_repars # if time-varying selectivity turned on
mod$rep$selAA # selectivity-at-age by block
mod$sdrep # look for sel pars with NaN standard errors
  1. Re-run the model fixing the worst selectivity-at-age parameter for each block at 0 or 1 as appropriate. In the above age-specific models, we initialized and fixed ages 4, 4, and 2 for blocks 1-3, respectively. Sometimes just initializing the worst parameter is enough, without fixing it.
        input <- prepare_wham_input(asap3, model_name=paste(paste0("Model ",m), sel_model[m], paste(sel_re[[m]], collapse="-"), sep=": "), recruit_model=2,
                    selectivity=list(model=rep("age-specific",3), re=sel_re[[m]],
                        initial_pars=list(c(inv.logit(-4),0.5,0.5,1,0.5,0.5),c(0.5,0.5,0.5,1,0.5,0.5),c(0.5,1,0.5,0.5,0.5,0.5)),
                        fix_pars=list(4,4,2)),
                    NAA_re = list(sigma='rec+1',cor='iid'))
  1. The goal is to find a set of selectivity parameter initial/fixed values that allow all nested models to converge. Fixing parameters should not affect the NLL much, and any model that is a superset of another should not have a greater NLL (indicates not converged to global minimum). The following commands may be helpful:
mod.list <- file.path(getwd(),paste0("m",1:n.mods,".rds"))
mods <- lapply(mod.list, readRDS)
sapply(mods, function(x) check_convergence(x))
sapply(mods, function(x) x$opt$obj) # get NLL 
lapply(mods, function(x) x$rep$logit_selpars)
lapply(mods, function(x) x$rep$sel_repars)