After the data file is read into R by read_asap3_dat, this function prepares the data and parameter settings for fit_wham. By default, this will set up a SCAA version like ASAP. As of version 1.0.5, if an asap3 object is not provided, a dummy input is generated with some arbitrary assumptions. The various options described below, such as NAA_re and selectivity, can still be used without the asap3 object.

prepare_wham_input(
  asap3 = NULL,
  model_name = "WHAM for unnamed stock",
  recruit_model = 2,
  ecov = NULL,
  selectivity = NULL,
  M = NULL,
  NAA_re = NULL,
  catchability = NULL,
  age_comp = NULL,
  basic_info = NULL
)

Arguments

asap3

(optional) list containing data and parameters (output from read_asap3_dat)

model_name

character, name of stock/model

recruit_model

numeric, option to specify stock-recruit model (see details)

ecov

(optional) named list of environmental covariate data and parameters (see details)

selectivity

(optional) list specifying selectivity options by block: models, initial values, parameters to fix, and random effects (see details)

M

(optional) list specifying natural mortality options: model, random effects, initial values, and parameters to fix (see details)

NAA_re

(optional) list specifying options for random effect on numbers-at-age, initial numbers at age, and recruitment model (see details)

catchability

(optional) list specifying options for priors and random effects on catchability (see details)

age_comp

(optional) character or named list, specifies age composition model for fleet(s) and indices (see details)

basic_info

(optional) list of basic population information for use when asap3 is not provided (see details)

Value

a named list with the following components:

data

Named list of data, passed to TMB::MakeADFun

par

Named list of parameters, passed to TMB::MakeADFun

map

Named list defining how to optionally collect and fix parameters, passed to TMB::MakeADFun

random

Character vector of parameters to treat as random effects, passed to TMB::MakeADFun

years

Numeric vector of years to fit WHAM model (specified in ASAP3 .dat file)

ages.lab

Character vector of age labels, ending with plus-group (specified in ASAP3 .dat file)

model_name

Character, name of stock/model (specified in call to prepare_wham_input)

Details

recruit_model specifies the stock-recruit model. See wham.cpp for implementation.

= 1

SCAA (without NAA_re option specified) or Random walk (if NAA_re$sigma specified), i.e. predicted recruitment in year i = recruitment in year i-1

= 2

(default) Random about mean, i.e. steepness = 1

= 3

Beverton-Holt

= 4

Ricker

Note: we allow fitting a SCAA model (NAA_re = NULL), which estimates recruitment in every year as separate fixed effect parameters, but in that case no stock-recruit function is estimated. A warning message is printed if recruit_model > 2 and NAA_re = NULL. If you wish to use a stock-recruit function for expected recruitment, choose recruitment deviations as random effects, either only age-1 (NAA_re = list(sigma="rec")) or all ages (NAA_re = list(sigma="rec+1"), "full state-space" model). See below for details on NAA_re specification.

ecov specifies any environmental covariate data and model. Environmental covariate data need not span the same years as the fisheries data. It can be NULL if no environmental data are to be fit. Otherwise, it must be a named list with the following components:

$label

Name(s) of the environmental covariate(s). Used in printing.

$mean

Mean observations (matrix). number of years x number of covariates. Missing values = NA.

$logsigma

Configure observation standard errors. Options:

Matrix of \(log\) standard errors with same dimensions as $mean

Specified values for each time step

log standard errors for each covariate, numeric vector or matrix w/ dim 1 x n.ecov

Specified value the same for all time steps

estimation option (for all covariates). character string:

"est_1": Estimated, one value shared among time steps. "est_re": Estimated value for each time step as random effects with two parameters (mean, var)

list of two elements.

First is the matrix of log standard errors or the vector of single values for each covariate as above. Second is a character vector of estimation options (NA, "est_1","est_re") for each covariate. For covariates with non-NA values, values in the first element are ignored.

$year

Years corresponding to observations (vector of same length as $mean and $logsigma)

$use_obs

T/F (or 0/1) vector/matrix of the same dimension as $mean and $logsigma. Use the observation? Can be used to ignore subsets of the ecov without changing data files.

$lag

integer vector of offsets between the ecov observation/process and their affect on the stock. I.e. if SST in year t affects recruitment in year t + 1, set lag = 1. May also be a list (length=n_Ecov) of vectors (length = 2+n_indices) if multiple effects of one or more Ecovs are modeled.

$process_model

Process model for the ecov time-series. "rw" = random walk, "ar1" = 1st order autoregressive, NA = do not fit

$where

Character vector for where each ecov affects the population. "recruit" = recruitment, "M" = natural mortality, "q" = catchability for indices, "none" = fit ecov process model(s) but without an effect on the population. May also be a list (element for each ecov) of character vectors ("none", "recruit", "M", and/or "q") so each ecov can multiple effects.

$indices

indices that each ecov affects. Must be a list (length = n_Ecov), where each element is a vector of indices (1:n_indices). Must be provided when any of where = "q"

$link_model

vector of (orthogonal polynomial order) models for effects of each ecov on the $where process. Options: "none", "linear" (default) or "poly-x" where x = 2, ... (e.g. "poly-2" specifies a quadratic model, \(b_0 + b_1*ecov + b_2*ecov^2 + ...\)). Or a list (length = n_Ecov) of character vectors (same options) for modeling effects on (1) recruitment, (2) M, and catchabilities for (3) index 1,..., (2+n_indices) index n_indices (length of the vector is 2 + n_indices).

$ages

Ages that each ecov affects. Must be a list of length n.ecov, where each element is a vector of ages. Default = list(1:n.ages) to affect all ages.

$how

How does the ecov affect the $where process? An integer vector (length = n_Ecov). If corresponding $where = "none", then this is ignored. These options are specific to the $where process.

Recruitment options (see Iles & Beverton (1998)):

= 0

none (but fit ecov time-series model in order to compare AIC)

= 1

"controlling" (dens-indep mortality)

= 2

"limiting" (carrying capacity, e.g. ecov determines amount of suitable habitat)

= 3

"lethal" (threshold, i.e. R --> 0 at some ecov value)

= 4

"masking" (metabolic/growth, decreases dR/dS)

= 5

"directive" (e.g. behavioral)

M options:

= 0

none (but fit ecov time-series model in order to compare AIC)

= 1

effect on mean M (shared across ages)

Catchability options:

= 0

none (but fit ecov time-series model in order to compare AIC)

= 1

effect on one or more catchabilities (see $indices))

selectivity specifies options for selectivity, to overwrite existing options specified in the ASAP data file. If NULL, selectivity options from the ASAP data file are used. selectivity is a list with the following entries:

$model

Selectivity model for each block. Vector with length = number of selectivity blocks. Each entry must be one of: "age-specific", "logistic", "double-logistic", or "decreasing-logistic".

$re

Time-varying (random effects) for each block. Vector with length = number of selectivity blocks. If NULL, selectivity parameters in all blocks are constant over time and uncorrelated. Each entry of selectivity$re must be one of the following options, where the selectivity parameters are:

"none"

(default) are constant and uncorrelated

"iid"

vary by year and age/par, but uncorrelated

"ar1"

correlated by age/par (AR1), but not year

"ar1_y"

correlated by year (AR1), but not age/par

"2dar1"

correlated by year and age/par (2D AR1)

$initial_pars

Initial parameter values for each block. List of length = number of selectivity blocks. Each entry must be a vector of length # parameters in the block, i.e. c(2,0.2) for logistic or c(0.5,0.5,0.5,1,1,0.5) for age-specific with 6 ages. Default is to set at middle of parameter range. This is 0.5 for age-specific and n.ages/2 for logistic, double-logistic, and decreasing-logistic.

$fix_pars

Which parameters to fix at initial values. List of length = number of selectivity blocks. E.g. model with 3 age-specific blocks and 6 ages, list(c(4,5),4,c(2,3,4)) will fix ages 4 and 5 in block 1, age 4 in block 2, and ages 2, 3, and 4 in block 3.

$n_selblocks

How many selectivity blocks. Optional. If unspecified and no asap3 object, then this is set to the number of fleets + indices. If specified, ensure other components of selectivity are consistent.

M specifies estimation options for natural mortality and can overwrite M-at-age values specified in the ASAP data file. If NULL, the M-at-age matrix from the ASAP data file is used (M fixed, not estimated). To estimate M-at-age shared/mirrored among some but not all ages, modify input$map$M_a after calling prepare_wham_input (see vignette for more details). M is a list with the following entries:

$model

Natural mortality model options are:

"constant"

(default) estimate a single mean M shared across all ages

"age-specific"

estimate M_a independent for each age

"weight-at-age"

specifies M as a function of weight-at-age, \(M_y,a = exp(b0 + b1*log(W_y,a))\), as in Lorenzen (1996) and Miller & Hyun (2018).

$re

Time- and age-varying (random effects) on M. Note that random effects can only be estimated if mean M-at-age parameters are ($est_ages is not NULL).

"none"

(default) M constant in time and across ages.

"iid"

M varies by year and age, but uncorrelated.

"ar1_a"

M correlated by age (AR1), constant in time.

"ar1_y"

M correlated by year (AR1), constant all ages.

"2dar1"

M correlated by year and age (2D AR1), as in Cadigan (2016).

$initial_means

Initial/mean M-at-age vector, with length = number of ages (if $model = "age-specific") or 1 (if $model = "weight-at-age" or "constant"). If NULL, initial mean M-at-age values are taken from the first row of the MAA matrix in the ASAP data file.

$est_ages

Vector of ages to estimate M (others will be fixed at initial values). E.g. in a model with 6 ages, $est_ages = 5:6 will estimate M for the 5th and 6th ages, and fix M for ages 1-4. If NULL, M at all ages is fixed at M$initial_means (if not NULL) or row 1 of the MAA matrix from the ASAP file (if M$initial_means = NULL).

$logb_prior

(Only if $model = "weight-at-age") TRUE or FALSE (default), should a N(0.305, 0.08) prior be used on log_b? Based on Fig. 1 and Table 1 (marine fish) in Lorenzen (1996).

NAA_re specifies options for random effects on numbers-at-age (NAA, i.e. state-space model or not) If NULL, a traditional statistical catch-at-age model is fit (NAA = pred_NAA for all ages, deterministic). To fit a state-space model, specify NAA_re, a list with the following entries:

$sigma

Which ages allow deviations from pred_NAA? Common options are specified with the strings:

"rec"

Random effects on recruitment (deviations), all other ages deterministic

"rec+1"

"Full state space" model with 2 estimated sigma_a, one for recruitment and one shared among other ages

Alternatively, you can specify a more complex structure by entering a vector with length = n.ages, where each entry points to the NAA_sigma to use for that age. E.g. c(1,2,2,3,3,3) will estimate 3 sigma_a, with recruitment (age-1) deviations having their own sigma_R, ages 2-3 sharing sigma_2, and ages 4-6 sharing sigma_3.

$cor

Correlation structure for the NAA deviations. Options are:

"iid"

NAA deviations vary by year and age, but uncorrelated.

"ar1_a"

NAA deviations correlated by age (AR1).

"ar1_y"

NAA deviations correlated by year (AR1).

"2dar1"

NAA deviations correlated by year and age (2D AR1).

$decouple_recruitment

T/F determining whether correlation structure of recruitment is independent of RE deviations for older ages (default = FALSE). Only applicable for NAA_re$sigma = "rec+1" and correlation across ages is specified. If TRUE and NAA_re$cor = "ar1_a", only deviations for ages>1 have the correlation structure. If TRUE and NAA_re$cor is not "iid" separate year correlation parameters are estimated for recruitment and older ages.

NAA_re also can be used to configure initial numbers at age and recruitment models. The optional associated components of NAA_re are:

$N1_model

Integer determining which way to model the initial numbers at age:

0

(default) age-specific fixed effects parameters

1

2 fixed effects parameters: an initial recruitment and an instantaneous fishing mortality rate to generate an equilibruim abundance at age.

$N1_pars

if N1_model = 0, then these would be the initial values to use for abundance at age in the first year. If N1_model = 1, This would be the initial numbers in the first age class and the equilibrium fishing mortality rate generating the rest of the numbers at age in the first year.

$recruit_model

Integer determining how to model recruitment. Overrides recruit_model argument to prepare_wham_input. Must make sure NAA_re$sigma, NAA_re$cor and ecov are properly specified.

1

SCAA, estimating all recruitements as fixed effects or a random walk if NAA_re$sigma specified

2

estimating a mean recruitment with yearly recruitements as random effects

3

Beverton-Holt stock-recruitment with yearly recruitements as random effects

4

Ricker stock-recruitment with yearly recruitements as random effects

$use_steepness

T/F determining whether to use a steepness parameterization for a stock-recruit relationship. Only used if recruit_model>2

.
$recruit_pars

vector of initial parameters for recruitment model. If use_steepness=F, parameters are "alpha" and "beta" otherwise they are steepness and R0.

catchability specifies options for catchability. If NULL and asap3 is not NULL, a single catchability parameter for each index is used with initial values specified in ASAP file. If both are NULL, initial catchabilities for all indices = 0.3. Otherwise, it is a list with the following optional entries:

$re

Time-varying (random effects) for each index. Vector with length = number of indices. Each entry of catchability$re must be one of the following options:

"none"

(default) are constant

"iid"

vary by year and age/par, but uncorrelated

"ar1"

correlated by year (AR1)

$initial_q

Initial catchabilities for each index. vector length = number of indices. Will override values provided in basic_info$q. If basic_info$q and asap3 are not provided, default q values are 0.3.

$q_lower

Lower bound for catchabilities for each index. vector length = number of indices. For indices with NULL components default lower values are 0.

$q_upper

Upper bound for catchabilities for each index. vector length = number of indices. For indices with NULL components default lower values are 1000.

$prior_sd

vector of NA and standard deviations to use for gaussian prior on logit transform of catchability parameter. Length = number of indices. Indices with non-NA values will have mean logit q as a random effect with mean determined by logit transform of catchability$initial_q and sigma as standard error.

age_comp specifies the age composition models for fleet(s) and indices. If NULL, the multinomial is used because this was the only option in ASAP. The age composition models available are:

"multinomial"

Multinomial. This is the default because it was the only option in ASAP. 0 parameters.

"dir-mult"

Saturating Dirichlet-multinomial, parameterized such that effective-sample-size is a nonlinear and saturating function with respect to input-sample-size. 1 parameter. Effective sample size is estimated by the model (Candy 2008)

"dirichlet-pool0"

Dirichlet, pooling zero observations with adjacent age classes. 1. parameter. See Francis 2014 and Albertsen et al. 2016

"dirichlet-miss0"
Dirichlet, treating zero observations as missing. 1 parameter.
"logistic-normal-miss0"

Logistic normal, treating zero observations as missing. 1 parameter.

"logistic-normal-ar1-miss0"

Logistic normal, treating zero observations as missing. 1 parameter.

"logistic-normal-pool0"

Logistic normal, pooling zero observations with adjacent age classes. 1 parameter. See Schnute and Haigh (2007) and Francis (2014)

.
"logistic-normal-01-infl"

Zero-or-one inflated logistic normal. Inspired by zero-one inflated beta in Ospina and Ferrari (2012). 3 parameters. . No OSA residuals.

"logistic-normal-01-infl-2par"

Zero-one inflated logistic normal where p0 is a function of binomial sample size. 2 parameters. No OSA residuals.

"mvtweedie"

Multivariate-tweedie, where the product of composition proportions and input sample sizes follows a distribution with mean equal to the product of predicted proportions and input sample size, and other parameters define the ratio of effective to input sample size (with is bounded 0 to Inf) and the probability of zeros. 2 parameters. No OSA residuals.

"dir-mult-linear"

Linear Dirichlet-multinomial, parameterized such that effective-sample-size is a linear function with respect to input-sample-size, estimating 1 parameter, \(log(\theta)\), where the ratio of effective and input sample size is approximately \(\theta / (1+\theta)\), i.e., the logistic transformation of the estimated parameter \(log(\theta)\). (Thorson et al. 2017)

The two Dirichlet-multinomial options will only differ when input-sample-size differs among years. In these cases, the linear-Dirichlet multinomial is designed to decrease the effective sample size in each year by approximately the same proportion, while the saturating-Dirichlet multinomial will decrease the years with highest input-sample-size much more than those with lower input-sample-size. One-step-ahead residuals will be calculated for all but the last three options when do.osa=TRUE (Nielsen et al. in prep.). An age composition model needs to be specified for each fleet and index. If you would like all fleets and indices to use the same age composition likelihood, you can simply specify one of the strings above, i.e. age_comp = "logistic-normal-miss0". If you do not want the same age composition model to be used for all fleets and indices, you must specify a named list with the following entries:

$fleets

A vector of the above strings with length = the number of fleets.

$indices

A vector of the above strings with length = the number of indices.

basic_info is an optional list of information that can be used to set up the population and types of observations when there is no asap3 file given. Particularly useful for setting up an operating model to simulate population processes and observations. Also can be useful for setting up the structure of assessment model when asap3 has not been used. The current options are:

$ages

integer vector of ages (years) with the last being a plus group

$years

integer vector of years that the population model spans.

$n_fleets

number of fleets.

$agg_catch

matrix (length(years) x n_fleets) of annual aggregate catches (biomass) for each fleet.

$catch_paa

array (n_fleets x length(years) x n_ages) of each fleet's age composition data (numbers).

$catch_cv

matrix (length(years) x n_fleets) of annual CVs for each fleet's aggregate catch observations.

$catch_Neff

matrix (length(years) x n_fleets) of annual effective sample sizes for each fleet's age composition observation.

$use_catch_paa

0/1 matrix (length(years) x n_fleets) indicated whether to use each fleet's age composition observation.

$selblock_pointer_fleets

integer matrix (length(years) x n_fleets) indicated which selectivity model to use for each fleet each year. Must be consistent with options to selectivity option.

$F

matrix (length(years) x n_fleets) of annual fishing mortality rates for each fleet to initialize the model.

$n_indices

number of indices.

$agg_indices

matrix (length(years) x n_indices) of annual aggregate catches (biomass or number) for each fleet.

$index_paa

array (n_indices x length(years) x n_ages) of each index's age composition data (biomass or number).

$index_cv

matrix (length(years) x n_indices) of annual CVs for each index's aggregate observations.

$index_Neff

matrix (length(years) x n_indices) of annual effective sample sizes for each index's age composition observation.

$units_indices

1/2 matrix (length = n_indices) indicated whether indices are in biomass or numbers, respectively.

$units_index_paa

1/2 matrix (length = n_indices) indicated whether to use each index's age composition observation are in numbers or biomass.

$use_index_paa

0/1 matrix (length(years) x n_indices) indicated whether to use each index's age composition observation.

$selblock_pointer_indices

integer matrix (length(years) x n_indices) indicated which selectivity model to use for each index each year. Must be consistent with options to selectivity option.

$fracyr_indices

matrix (length(years) x n_indices) of annual proportions of the year elapsed when each index is observing the population.

$waa

array ((n_fleets + n_indices + 2) x length(years) x length(ages)) of annual weight at at age for each fleet, each index, total catch, and spawning biomass.

$maturity

matrix (length(years) x length(ages)) of annual maturity at age for estimating spawning biomass.

$fracyr_SSB

vector (1 or length(years)) of yearly proportions (0-1) of the year at which to calculate spawning biomass.

$Fbar_ages

integer vector of ages to use to average total F at age defining fully selected F for reference points. May not be clearly known until a model is fitted.

$q

vector (length(n_indices)) of catchabilities for each of the indices to initialize the model.

$percentSPR

(0-100) percentage of unfished spawning biomass per recruit for determining equilibrium fishing mortality reference point

$percentFXSPR

(0-100) percentage of SPR-based F to use in projections.

$percentFMSY

(0-100) percentage of Fmsy to use in projections.

$XSPR_input_average_years

which years to average inputs to per recruit calculation (selectivity, M, WAA, maturity) for SPR-based reference points. Default is last 5 years (tail(1:length(years),5))

$XSPR_R_avg_yrs

which years to average recruitments for calculating SPR-based SSB reference points. Default is 1:length(years)

$XSPR_R_opt

1(3): use annual R estimates(predictions) for annual SSB_XSPR, 2(4): use average R estimates(predictions). 5: use bias-corrected expected recruitment

$simulate_process_error

T/F vector (length = 5). When simulating from the model, whether to simulate any process errors for NAA, M, selectivity, Ecov, and q. Only used if applicable.

$simulate_observation_error

T/F vector (length = 3). When simulating from the model, whether to simulate catch, index, and ecov observations.

$simulate_period

T/F vector (length = 2). When simulating from the model, whether to simulate base period (model years) and projection period.

$bias_correct_process

T/F. Perform bias correction of log-normal random effects for NAA.

$bias_correct_observation

T/F. Perform bias correction of log-normal observations.

$bias_correct_BRPs

T/F. Perform bias correction of analytic SSB/R and Y/R when there is bias correction of log-normal NAA.

If other arguments to prepare_wham_input are provided such as selectivity, M, and age_comp, the information provided there must be consistent with basic_info. For example the dimensions for number of years, ages, fleets, and indices.

Examples

if (FALSE) {
asap3 = read_asap3_dat("ex1_SNEMAYT.dat")
input = prepare_wham_input(asap3)
mod = fit_wham(input)

# no ASAP3 file, default parameter values and configuration
input = prepare_wham_input()
mod = fit_wham(input, fit = FALSE)
set.seed(8675309)
simdata = mod$simulate(complete=TRUE)
input$data = simdata
fit = fit_wham(input, do.osa = FALSE)
}