R/prepare_wham_input.R
prepare_wham_input.Rd
After the data file is read into R by read_asap3_dat
, this function
prepares the data and parameter settings for fit_wham
.
By default, this will set up a SCAA version like ASAP.
As of version 1.0.5, if an asap3 object is not provided, a dummy input is generated with some arbitrary
assumptions. The various options described below, such as NAA_re
and selectivity
,
can still be used without the asap3 object.
prepare_wham_input(
asap3 = NULL,
model_name = "WHAM for unnamed stock",
recruit_model = 2,
ecov = NULL,
selectivity = NULL,
M = NULL,
NAA_re = NULL,
catchability = NULL,
age_comp = NULL,
basic_info = NULL
)
(optional) list containing data and parameters (output from read_asap3_dat
)
character, name of stock/model
numeric, option to specify stock-recruit model (see details)
(optional) named list of environmental covariate data and parameters (see details)
(optional) list specifying selectivity options by block: models, initial values, parameters to fix, and random effects (see details)
(optional) list specifying natural mortality options: model, random effects, initial values, and parameters to fix (see details)
(optional) list specifying options for random effect on numbers-at-age, initial numbers at age, and recruitment model (see details)
(optional) list specifying options for priors and random effects on catchability (see details)
(optional) character or named list, specifies age composition model for fleet(s) and indices (see details)
(optional) list of basic population information for use when asap3 is not provided (see details)
a named list with the following components:
data
Named list of data, passed to TMB::MakeADFun
par
Named list of parameters, passed to TMB::MakeADFun
map
Named list defining how to optionally collect and fix parameters, passed to TMB::MakeADFun
random
Character vector of parameters to treat as random effects, passed to TMB::MakeADFun
years
Numeric vector of years to fit WHAM model (specified in ASAP3 .dat file)
ages.lab
Character vector of age labels, ending with plus-group (specified in ASAP3 .dat file)
model_name
Character, name of stock/model (specified in call to prepare_wham_input
)
recruit_model
specifies the stock-recruit model. See wham.cpp
for implementation.
SCAA (without NAA_re option specified) or Random walk (if NAA_re$sigma specified), i.e. predicted recruitment in year i = recruitment in year i-1
(default) Random about mean, i.e. steepness = 1
Beverton-Holt
Ricker
Note: we allow fitting a SCAA model (NAA_re = NULL
), which estimates recruitment in every year as separate fixed effect parameters,
but in that case no stock-recruit function is estimated. A warning message is printed if recruit_model > 2
and NAA_re = NULL
.
If you wish to use a stock-recruit function for expected recruitment, choose recruitment deviations as random effects,
either only age-1 (NAA_re = list(sigma="rec")
) or all ages (NAA_re = list(sigma="rec+1")
, "full state-space" model).
See below for details on NAA_re
specification.
ecov
specifies any environmental covariate data and model. Environmental covariate data need not span
the same years as the fisheries data. It can be NULL
if no environmental data are to be fit.
Otherwise, it must be a named list with the following components:
Name(s) of the environmental covariate(s). Used in printing.
Mean observations (matrix). number of years x number of covariates. Missing values = NA.
Configure observation standard errors. Options:
$mean
Specified values for each time step
Specified value the same for all time steps
"est_1"
: Estimated, one value shared among time steps.
"est_re"
: Estimated value for each time step as random effects with two parameters (mean, var)
First is the matrix of log standard errors or the vector of single values for each covariate as above.
Second is a character vector of estimation options (NA
, "est_1"
,"est_re"
) for each covariate.
For covariates with non-NA values, values in the first element are ignored.
Years corresponding to observations (vector of same length as $mean
and $logsigma
)
T/F (or 0/1) vector/matrix of the same dimension as $mean
and $logsigma
.
Use the observation? Can be used to ignore subsets of the ecov without changing data files.
integer vector of offsets between the ecov observation/process and their affect on the stock.
I.e. if SST in year t affects recruitment in year t + 1, set lag = 1
. May also be a list (length=n_Ecov) of vectors (length = 2+n_indices) if multiple effects of one or more Ecovs are modeled.
Process model for the ecov time-series. "rw"
= random walk, "ar1"
= 1st order autoregressive, NA
= do not fit
Character vector for where each ecov affects the population. "recruit"
= recruitment,
"M"
= natural mortality, "q"
= catchability for indices, "none"
= fit ecov process model(s) but without an effect on the
population. May also be a list (element for each ecov) of character vectors ("none", "recruit", "M", and/or "q") so each ecov can multiple effects.
indices that each ecov affects. Must be a list (length = n_Ecov), where each element is a vector of indices (1:n_indices). Must be provided when any of where
= "q"
vector of (orthogonal polynomial order) models for effects of each ecov on the $where
process. Options: "none", "linear" (default) or "poly-x"
where x = 2, ... (e.g. "poly-2" specifies a quadratic model, \(b_0 + b_1*ecov + b_2*ecov^2 + ...\)). Or a list (length = n_Ecov) of character vectors (same options) for modeling
effects on (1) recruitment, (2) M, and catchabilities for (3) index 1,..., (2+n_indices) index n_indices (length of the vector is 2 + n_indices).
Ages that each ecov affects. Must be a list of length n.ecov, where each element is a vector of ages. Default = list(1:n.ages) to affect all ages.
How does the ecov affect the $where
process? An integer vector (length = n_Ecov). If corresponding $where
= "none", then this is ignored.
These options are specific to the $where
process.
none (but fit ecov time-series model in order to compare AIC)
"controlling" (dens-indep mortality)
"limiting" (carrying capacity, e.g. ecov determines amount of suitable habitat)
"lethal" (threshold, i.e. R --> 0 at some ecov value)
"masking" (metabolic/growth, decreases dR/dS)
"directive" (e.g. behavioral)
none (but fit ecov time-series model in order to compare AIC)
effect on mean M (shared across ages)
none (but fit ecov time-series model in order to compare AIC)
effect on one or more catchabilities (see $indices)
)
selectivity
specifies options for selectivity, to overwrite existing options specified in the ASAP data file.
If NULL
, selectivity options from the ASAP data file are used. selectivity
is a list with the following entries:
Selectivity model for each block. Vector with length = number of selectivity blocks. Each entry must be one of: "age-specific", "logistic", "double-logistic", or "decreasing-logistic".
Time-varying (random effects) for each block. Vector with length = number of selectivity blocks.
If NULL
, selectivity parameters in all blocks are constant over time and uncorrelated.
Each entry of selectivity$re
must be one of the following options, where the selectivity parameters are:
(default) are constant and uncorrelated
vary by year and age/par, but uncorrelated
correlated by age/par (AR1), but not year
correlated by year (AR1), but not age/par
correlated by year and age/par (2D AR1)
Initial parameter values for each block. List of length = number of selectivity blocks. Each entry must be a vector of length # parameters in the block, i.e. c(2,0.2)
for logistic or c(0.5,0.5,0.5,1,1,0.5)
for age-specific with 6 ages. Default is to set at middle of parameter range. This is 0.5 for age-specific and n.ages/2 for logistic, double-logistic, and decreasing-logistic.
Which parameters to fix at initial values. List of length = number of selectivity blocks. E.g. model with 3 age-specific blocks and 6 ages, list(c(4,5),4,c(2,3,4))
will fix ages 4 and 5 in block 1, age 4 in block 2, and ages 2, 3, and 4 in block 3.
How many selectivity blocks. Optional. If unspecified and no asap3 object, then this is set to the number of fleets + indices. If specified, ensure other components of selectivity
are consistent.
M
specifies estimation options for natural mortality and can overwrite M-at-age values specified in the ASAP data file.
If NULL
, the M-at-age matrix from the ASAP data file is used (M fixed, not estimated). To estimate M-at-age
shared/mirrored among some but not all ages, modify input$map$M_a
after calling prepare_wham_input
(see vignette for more details). M
is a list with the following entries:
Natural mortality model options are:
(default) estimate a single mean M shared across all ages
estimate M_a independent for each age
specifies M as a function of weight-at-age, \(M_y,a = exp(b0 + b1*log(W_y,a))\), as in Lorenzen (1996) and Miller & Hyun (2018).
Time- and age-varying (random effects) on M. Note that random effects can only be estimated if
mean M-at-age parameters are ($est_ages
is not NULL
).
(default) M constant in time and across ages.
M varies by year and age, but uncorrelated.
M correlated by age (AR1), constant in time.
M correlated by year (AR1), constant all ages.
M correlated by year and age (2D AR1), as in Cadigan (2016).
Initial/mean M-at-age vector, with length = number of ages (if $model = "age-specific"
)
or 1 (if $model = "weight-at-age" or "constant"
). If NULL
, initial mean M-at-age values are taken
from the first row of the MAA matrix in the ASAP data file.
Vector of ages to estimate M (others will be fixed at initial values). E.g. in a model with 6 ages,
$est_ages = 5:6
will estimate M for the 5th and 6th ages, and fix M for ages 1-4. If NULL
,
M at all ages is fixed at M$initial_means
(if not NULL
) or row 1 of the MAA matrix from the ASAP file (if M$initial_means = NULL
).
(Only if $model = "weight-at-age"
) TRUE or FALSE (default), should a N(0.305, 0.08) prior be
used on log_b? Based on Fig. 1 and Table 1 (marine fish) in Lorenzen (1996).
NAA_re
specifies options for random effects on numbers-at-age (NAA, i.e. state-space model or not)
If NULL
, a traditional statistical catch-at-age model is fit (NAA = pred_NAA for all ages, deterministic).
To fit a state-space model, specify NAA_re
, a list with the following entries:
Which ages allow deviations from pred_NAA? Common options are specified with the strings:
Random effects on recruitment (deviations), all other ages deterministic
"Full state space" model with 2 estimated sigma_a
, one for recruitment and one shared among other ages
Alternatively, you can specify a more complex structure by entering a vector with length = n.ages, where each entry points to the
NAA_sigma to use for that age. E.g. c(1,2,2,3,3,3) will estimate 3 sigma_a
, with recruitment (age-1) deviations having their
own sigma_R
, ages 2-3 sharing sigma_2
, and ages 4-6 sharing sigma_3
.
Correlation structure for the NAA deviations. Options are:
NAA deviations vary by year and age, but uncorrelated.
NAA deviations correlated by age (AR1).
NAA deviations correlated by year (AR1).
NAA deviations correlated by year and age (2D AR1).
T/F determining whether correlation structure of recruitment is independent of RE deviations for older ages
(default = FALSE). Only applicable for NAA_re$sigma = "rec+1"
and correlation across ages is specified. If TRUE and NAA_re$cor = "ar1_a"
, only deviations for ages>1
have the correlation structure. If TRUE and NAA_re$cor is not "iid" separate year correlation parameters are estimated for recruitment and older
ages.
NAA_re
also can be used to configure initial numbers at age and recruitment models. The optional associated components of NAA_re
are:
Integer determining which way to model the initial numbers at age:
(default) age-specific fixed effects parameters
2 fixed effects parameters: an initial recruitment and an instantaneous fishing mortality rate to generate an equilibruim abundance at age.
if N1_model = 0, then these would be the initial values to use for abundance at age in the first year. If N1_model = 1, This would be the initial numbers in the first age class and the equilibrium fishing mortality rate generating the rest of the numbers at age in the first year.
Integer determining how to model recruitment. Overrides recruit_model
argument to prepare_wham_input
. Must make sure NAA_re$sigma
, NAA_re$cor
and ecov
are properly specified.
SCAA, estimating all recruitements as fixed effects or a random walk if NAA_re$sigma specified
estimating a mean recruitment with yearly recruitements as random effects
Beverton-Holt stock-recruitment with yearly recruitements as random effects
Ricker stock-recruitment with yearly recruitements as random effects
T/F determining whether to use a steepness parameterization for a stock-recruit relationship. Only used if recruit_model>2
vector of initial parameters for recruitment model. If use_steepness=F, parameters are "alpha" and "beta" otherwise they are steepness and R0.
catchability
specifies options for catchability. If NULL
and asap3
is not NULL, a single catchability parameter for each index is used with initial values specified in ASAP file. If both are NULL, initial catchabilities for all indices = 0.3.
Otherwise, it is a list with the following optional entries:
Time-varying (random effects) for each index. Vector with length = number of indices.
Each entry of catchability$re
must be one of the following options:
(default) are constant
vary by year and age/par, but uncorrelated
correlated by year (AR1)
Initial catchabilities for each index. vector length = number of indices. Will override values provided in basic_info$q
.
If basic_info$q
and asap3
are not provided, default q values are 0.3.
Lower bound for catchabilities for each index. vector length = number of indices. For indices with NULL components default lower values are 0.
Upper bound for catchabilities for each index. vector length = number of indices. For indices with NULL components default lower values are 1000.
vector of NA and standard deviations to use for gaussian prior on logit transform of catchability parameter. Length = number of indices.
Indices with non-NA values will have mean logit q as a random effect with mean determined by logit transform of catchability$initial_q
and
sigma as standard error.
age_comp
specifies the age composition models for fleet(s) and indices.
If NULL
, the multinomial is used because this was the only option in ASAP.
The age composition models available are:
"multinomial"
Multinomial. This is the default because it was the only option in ASAP. 0 parameters.
"dir-mult"
Saturating Dirichlet-multinomial, parameterized such that effective-sample-size is a nonlinear and saturating function with respect to input-sample-size. 1 parameter. Effective sample size is estimated by the model (Candy 2008)
"dirichlet-pool0"
Dirichlet, pooling zero observations with adjacent age classes. 1. parameter. See Francis 2014 and Albertsen et al. 2016
"dirichlet-miss0"
"logistic-normal-miss0"
Logistic normal, treating zero observations as missing. 1 parameter.
"logistic-normal-ar1-miss0"
Logistic normal, treating zero observations as missing. 1 parameter.
"logistic-normal-pool0"
Logistic normal, pooling zero observations with adjacent age classes. 1 parameter. See Schnute and Haigh (2007) and Francis (2014)
"logistic-normal-01-infl"
Zero-or-one inflated logistic normal. Inspired by zero-one inflated beta in Ospina and Ferrari (2012). 3 parameters. . No OSA residuals.
"logistic-normal-01-infl-2par"
Zero-one inflated logistic normal where p0 is a function of binomial sample size. 2 parameters. No OSA residuals.
"mvtweedie"
Multivariate-tweedie, where the product of composition proportions and input sample sizes follows a distribution with mean equal to the product of predicted proportions and input sample size, and other parameters define the ratio of effective to input sample size (with is bounded 0 to Inf) and the probability of zeros. 2 parameters. No OSA residuals.
"dir-mult-linear"
Linear Dirichlet-multinomial, parameterized such that effective-sample-size is a linear function with respect to input-sample-size, estimating 1 parameter, \(log(\theta)\), where the ratio of effective and input sample size is approximately \(\theta / (1+\theta)\), i.e., the logistic transformation of the estimated parameter \(log(\theta)\). (Thorson et al. 2017)
The two Dirichlet-multinomial options will only differ when input-sample-size differs among years. In these cases, the linear-Dirichlet multinomial is designed to decrease the effective sample size in each year by approximately the same proportion, while the saturating-Dirichlet multinomial will decrease the years with highest input-sample-size much more than those with lower input-sample-size.
One-step-ahead residuals will be calculated for all but the last three options when do.osa=TRUE
(Nielsen et al. in prep.). An age composition model needs
to be specified for each fleet and index. If you would like all fleets and indices to use the same age composition likelihood, you
can simply specify one of the strings above, i.e. age_comp = "logistic-normal-miss0"
. If you do not want the same
age composition model to be used for all fleets and indices, you must specify a named list with the following entries:
A vector of the above strings with length = the number of fleets.
A vector of the above strings with length = the number of indices.
basic_info
is an optional list of information that can be used to set up the population and types of observations when there is no asap3 file given. Particularly useful for setting
up an operating model to simulate population processes and observations. Also can be useful for setting up the structure of assessment model when asap3 has not been used.
The current options are:
integer vector of ages (years) with the last being a plus group
integer vector of years that the population model spans.
number of fleets.
matrix (length(years) x n_fleets) of annual aggregate catches (biomass) for each fleet.
array (n_fleets x length(years) x n_ages) of each fleet's age composition data (numbers).
matrix (length(years) x n_fleets) of annual CVs for each fleet's aggregate catch observations.
matrix (length(years) x n_fleets) of annual effective sample sizes for each fleet's age composition observation.
0/1 matrix (length(years) x n_fleets) indicated whether to use each fleet's age composition observation.
integer matrix (length(years) x n_fleets) indicated which selectivity model to use for each fleet each year. Must be consistent with options to selectivity
option.
matrix (length(years) x n_fleets) of annual fishing mortality rates for each fleet to initialize the model.
number of indices.
matrix (length(years) x n_indices) of annual aggregate catches (biomass or number) for each fleet.
array (n_indices x length(years) x n_ages) of each index's age composition data (biomass or number).
matrix (length(years) x n_indices) of annual CVs for each index's aggregate observations.
matrix (length(years) x n_indices) of annual effective sample sizes for each index's age composition observation.
1/2 matrix (length = n_indices) indicated whether indices are in biomass or numbers, respectively.
1/2 matrix (length = n_indices) indicated whether to use each index's age composition observation are in numbers or biomass.
0/1 matrix (length(years) x n_indices) indicated whether to use each index's age composition observation.
integer matrix (length(years) x n_indices) indicated which selectivity model to use for each index each year. Must be consistent with options to selectivity
option.
matrix (length(years) x n_indices) of annual proportions of the year elapsed when each index is observing the population.
array ((n_fleets + n_indices + 2) x length(years) x length(ages)) of annual weight at at age for each fleet, each index, total catch, and spawning biomass.
matrix (length(years) x length(ages)) of annual maturity at age for estimating spawning biomass.
vector (1 or length(years)) of yearly proportions (0-1) of the year at which to calculate spawning biomass.
integer vector of ages to use to average total F at age defining fully selected F for reference points. May not be clearly known until a model is fitted.
vector (length(n_indices)) of catchabilities for each of the indices to initialize the model.
(0-100) percentage of unfished spawning biomass per recruit for determining equilibrium fishing mortality reference point
(0-100) percentage of SPR-based F to use in projections.
(0-100) percentage of Fmsy to use in projections.
which years to average inputs to per recruit calculation (selectivity, M, WAA, maturity) for SPR-based reference points. Default is last 5 years (tail(1:length(years),5))
which years to average recruitments for calculating SPR-based SSB reference points. Default is 1:length(years)
1(3): use annual R estimates(predictions) for annual SSB_XSPR, 2(4): use average R estimates(predictions). 5: use bias-corrected expected recruitment
T/F vector (length = 5). When simulating from the model, whether to simulate any process errors for NAA, M, selectivity, Ecov, and q. Only used if applicable.
T/F vector (length = 3). When simulating from the model, whether to simulate catch, index, and ecov observations.
T/F vector (length = 2). When simulating from the model, whether to simulate base period (model years) and projection period.
T/F. Perform bias correction of log-normal random effects for NAA.
T/F. Perform bias correction of log-normal observations.
T/F. Perform bias correction of analytic SSB/R and Y/R when there is bias correction of log-normal NAA.
If other arguments to prepare_wham_input
are provided such as selectivity
, M
, and age_comp
, the information provided there
must be consistent with basic_info
. For example the dimensions for number of years, ages, fleets, and indices.
if (FALSE) {
asap3 = read_asap3_dat("ex1_SNEMAYT.dat")
input = prepare_wham_input(asap3)
mod = fit_wham(input)
# no ASAP3 file, default parameter values and configuration
input = prepare_wham_input()
mod = fit_wham(input, fit = FALSE)
set.seed(8675309)
simdata = mod$simulate(complete=TRUE)
input$data = simdata
fit = fit_wham(input, do.osa = FALSE)
}