Skip to content
AI KIMOTO edited this page Apr 24, 2020 · 6 revisions

SAM (State-space assessment model)


Current Entry Version: 0.1.0 (Date: 2017-02-20)

First Entry Version: 0.1.0 (Date: 2017-02-20)

First ICCAT Reference

Catalogue Committee

Proposer: George Tserpes

Seconder: Abdelouahed Ben Mhamed

Reviewers:

ICCAT Secretariat

PACKAGE

Download

USER'S GUIDE

The

EXAMPLES & TUTORIALS

SOURCE CODE

UNIT TESTS


DETAILS

Contact details

Anders Nielsen DTU-Aqua an@aqua.dtu.dk

Licence

Installation needs and operating system

All sources available on github. Runs on all common platforms (Linux, Mac, windows)

Description of programs

Language

R, C++, via the R-package TMB.

Compiler needs/stand alone

Depends on the TMB R-package, but all tools are free and open source.

Description

Bootstrap sampling, subsampling, and the jackknife all rely on estimating the variance of a statistic by using the variability between resamples rather than using statistical distributions. This vignette will cover the jackknife and how it is applied to the resampling distribution to generate variance estimates for random forest predictions.

Ordinary Jackknife

The ordinary jackknife is a resampling method useful for estimating the variance or bias of a statistic. The jackknife estimate of a statistic can be found be by repeatedly calculating the statistic, each time leaving one observation from the sample out and averaging all estimates. The variance of the estimate can be found by calculating the variance of the jackknifed estimates:

\begin{equation} \hat{V}{J} = \frac{n-1}{n}\sum\limits{i=1}^{n} \left(\hat{\theta}{(-i)} - \hat{\theta}{(\cdot)}\right)^2 \end{equation}

To fit the state-space assessment model to data and make it easy to produce common outputs from the model.

The basic state-space assessment model (SAM) is described in Nielsen & Berg (2014). The model has been continuously developed and adapted for different stocks (e.g.~to include tagging data and biomass indices). The current implementation (https://github.com/fishfollower/SAM) is now an R-package based on Template Model Builder (TMB) (Kristensen et al. 2016).

\subsubsection*{Model} The model is a state--space model.The states $\alpha$ are the log-transformed stock sizes $\log{N_1},\ldots,\log{N_A}$ and fishing mortalities $\log{F_1},\ldots,\log{F_{A}}$ corresponding to total age specific catches. It is often assumed that some age classes have the same fishing mortality. In any given year $y$ the state is the combined vector $\alpha_y$ $=$ $(\log{N_1},\ldots,\log{N_A},$ $ \log{F_1},\ldots,\log{F_{A}})'$. The transition equation describes the distribution of the next years state from a given state in the current year. The following is assumed: \begin{align*} \alpha_{y}=T(\alpha_{y-1})+\eta_y \end{align*}
The transition function $T$ is where the stock equation and assumptions about stock--recruitment enters the model. The equations are:
\begin{align*} \log N_{1,y}&=SR(\log(N_{,y-1}))&\ \log N_{a,y}&=\log N_{a-1,y-1} - F_{a-1,y-1} - M_{a-1,y-1}\ , \quad &2\leq a < A \ \log N_{A,y}&=\log( N_{A-1,y-1}\exp^{- F_{A-1,y-1} - M_{A-1},y-1} + N_{A,y-1}\exp^{- F_{A,y-1} - M_{A,y-1}})&\ \log F_{a,y}&=\log F_{a,y-1}\ , \quad &1\leq a \leq A \end{align*}
Here $M_{a,y}$ is the age and year specific natural mortality parameter, which is assumed known from outside sources. $F_{a,y}$ is the total fishing mortality. The function `$SR$' is the stock-recruitment relationship assumed.

The prediction noise $\eta$ is assumed to be Gaussian with zero mean, and separate variance parameters. Typically one for recruitment ($\sigma^2_{N_{a=1}}$), one for survival ($\sigma^2_{N_{a&gt;1}}$), one for fishing mortality at age ($\sigma^2_{F}$), but these can be flexibily configured. The $N$-part of $\eta$ is assumed uncorrelated, and the $F$-part can be assumed correlated according to an ar(1) correlation structure, such that $\mbox{cor}(\Delta\log(F_{a,y}),\Delta\log(F_{\tilde{a},y}))=\rho^{|a-\tilde{a}|}$.

The observation part of the state--space model describes the distribution of the observations for a given state $\alpha_y$. Here the vector of all observations from a given year $y$ is denoted $x_y$. The elements of $x_y$ are age-specific log-catches $\log C_{a,y}$ and age-specific log-indices from scientific surveys $\log I^{(s)}{a,y}$. The combined observation equation is: \begin{align*} x_y=O(\alpha_y)+\varepsilon_y \end{align*} The observation function $O$ consists of the catch equations for total catches and scientific surveys. The measurement noise term $\varepsilon_y$ is assumed to be Gaussian. An expanded view of the observation equation becomes: \begin{align*} \log C{a,y} &= \log\left(\frac{F_{a,y}}{Z_{a,y}}(1-e^{-Z_{a,y}})N_{a,y}\right)+\varepsilon^{(c)}{a,y}\ \log I^{(s)}{a,y} &= \log\left(Q^{(s)}a e^{-Z{a,y}\frac{D^{(s)}}{365}}N_{a,y}\right)+\varepsilon^{(s)}{a,y} \end{align*}
Here $Z$ is the total mortality rate $Z
{a,y}=M_{a,y}+F_{a,y}$, $D^{(s)}$ is the number of days into the year where the survey $s$ is conducted, $Q^{(s)}_a$ are model parameters describing catchability coefficients. The variance of $\varepsilon_y$ is setup such that each data source catches, and the four scientific surveys have their own covariance matrix.

Observation uncertainty is important e.g.~to get the relative weighting of the different information sources correct, so a lot of effort has been invested in getting the optimal options into SAM. In Berg and Nielsen (2016) different covariance structures are compared for four ICES stocks. It was found that irregular lattice AR(1) observation correlation structure was optimal for surveys. The covariance structures tested were inspired by a previous study (Berg et al.~2014) of the structures obtained from survey calculations. In the paper Albertsen et al. (2016) 13 different observational likelihood formulations were evaluated for four ICES stocks. It was found that the multivariate log-normal representation was among the optimal in all four cases.

To describe the options available consider a yearly vector $C_y=(C_{a=1,y},\ldots,C_{a=A,y})$ of age specific observations from a fleet (survey or commercial). Assume first that the $\log(C_y)$ is multivariate Gaussian: [ \log(C_y) \sim N(\log(\widehat{C}y), \Sigma) ]
where $\Sigma$ is the covariance matrix, and $\hat{C}y$ is the vector of the usual model predictions. The covariance matrix is specified from a vector of standard deviations $\sigma=(\sigma_1\ldots\sigma_A)$ and a correlation matrix $\rho$ (by $\Sigma{a\tilde{a}}=\sigma_a\sigma
{\tilde{a}}\rho_{a\tilde{a}}$). Four options are available for the correlation $\rho$: Independent ($\rho=I$), auto-regressive of order 1 ($\rho_{a\tilde{a}}=0.5^{\theta|a-\tilde{a}|}, \ \theta&gt;0$)\footnote[$\star$]{This parametrization is equvivalent to the more common $\phi^{|a-\tilde{a}|}$, where $0&lt;\phi&lt;1$}, irregular auto-regressive of order 1 ($\rho_{a\tilde{a}}=0.5^{|\theta_a-\theta_{\tilde{a}}|}, \ \theta_1=0\leq\theta_2\leq\cdots\leq\theta_A$), and unstructured (parameterized by the Cholesky of $\rho$). The options for covariance structure can be set for each fleet individually. In addition it is also possible to supply external weights for each individual observation. This option can be used in two ways. To set the relative weighting, or to actually set the fixed variance of each individual observation.

\subsubsection*{Likelihood and approximation}

The likelihood function for this is set up by first defining the joint likelihood of both random effects (here collected in the $\alpha_y$ states), and the observations (here collected in the $x_y$ vectors). The joint likelihood is: \begin{align*} L(\theta,\alpha,x)=\prod_{y=2}^Y{ \phi(\alpha_y-T(\alpha_{y-1}),\Sigma_\eta)} \prod_{y=1}^Y{ \phi(x_y-O(\alpha_{y}),\Sigma_\varepsilon)} \end{align*} Here $\theta$ is a vector of model parameters. Since the random effects $\alpha$ are not observed inference should be obtain from the marginal likelihood: \begin{align*} L_M(\theta,x)=\int L(\theta,\alpha,x) d\alpha \end{align*} This integral is difficult to calculate directly, so the Laplace approximation is used. The Laplace approximation is derived by first approximating the joint log likelihood $\ell(\theta,\alpha,x)$ by a second order Taylor approximation around the optimum $\hat{\alpha}$ w.r.t.~$\alpha$. The resulting approximated joint log likelihood can then be integrated by recognizing it as a constant term and a term where the integral is know as the normalizing constant from a multivariate Gaussian. The approximation becomes: \begin{align*} \int L(\theta,\alpha,x)d\alpha \approx \sqrt{\frac{(2\pi)^{n}}{\det(-\ell_{\alpha\alpha}''(\theta,\alpha,x)|{\alpha=\hat{\alpha}\theta})}}\exp(\ell(\theta,\hat{\alpha}\theta,x)) \end{align*} Taking the logarithm gives the Laplace approximation of the marginal log likelihood
\begin{align*} \ell_M(\theta,x) = \ell(\theta,\hat{u}
\theta,x)-\frac{1}{2}\log(\det(-\ell_{uu} ''(\theta,u,x)|{u=\hat{u}\theta}))+\frac{n}{2}\log(2\pi) \end{align*}

\subsection*{Required input}

The model use catches at age $C_{a,y}$ and age-specific indices scientific surveys $(I^{(s)}{a,y})$. In addition to the observations on catches and surveys a set of biological parameters are needed, which are: Mean weight in stock $W^{(s)}{a,y}$, mean weight in catch $W^{(c)}{a,y}$, proportion mature $P{a,y}$, and an estimate of natural mortality $M_{a,y}$.

\subsection*{Program outputs}

The SAM package produce tables and figures of all standard assessment output (N, F, SSB, TSB, agerage F, and forecasts).

Data requirements

Program outputs

Diagnostics

History of method and peer review

Validation by programmer

Tests by others

Required input

The model use catches at age $C_{a,y}$ and age-specific indices scientific surveys $(I^{(s)}{a,y})$. In addition to the observations on catches and surveys a set of biological parameters are needed, which are: Mean weight in stock $W^{(s)}{a,y}$, mean weight in catch $W^{(c)}{a,y}$, proportion mature $P{a,y}$, and an estimate of natural mortality $M_{a,y}$.

Program outputs

The SAM package produce tables and figures of all standard assessment output (N, F, SSB, TSB, agerage F, and forecasts).

Diagnostics

The residual calculation procedure in state-space assessment models can be difficult, but is extremely important when evaluating the assumed covariance structure. The standard practice of calculating the residuals (as observed' minus predicted' divided by an estimate of the standard deviation) is strictly only valid for models with purely independent observations. It is not valid for state-space models, where an underlying unobserved process is introducing a correlation structure in the (marginal) distribution of the observations. It is also not valid if the observations are directly assumed to be correlated (e.g. multivariate normal, or multinomial for age compositions). The problem is that the resulting residuals will not become independent.

To get independent residuals the so-called `one-observation-ahead' residuals are computed. The residual for the $n$'th observation is computed by using the first $n-1$ observations to predict the $n$'th. Details can be found in Thygesen et. al. (2017).

Also standard diagnosis like leave-fleet-out and retrospective runs are produced.

Other features

Many convenient features for additional plotting, Jitter analysis (for validating convergence), and build-in simulation from model function.

History of method peer review

The SAM model has been validated by the ICES method working group. It is used as primary assessment model for more than 25 ICES stocks, and as such validated by each expert working group. The model participated and was simulation tested by the big model comparison initiative in Boston in 2012. The paper about the model is peer reviewed.

Steps taken by programmer for validation

Simulation studies are built into the model, so routinely used to validate. The Laplace approximation is validated by parallel implementations as Unscented Kalman Filter and via MCMC.

Tests conducted by others

Validation by comparing to other standard models (XSA, ADAPT, SCAA) done in ICES expert groups and in method groups.

NOTES

ICCAT

AUTHOR

Anders Nielsen

Contact details

DTU-Aqua an@aqua.dtu.dk

References

Albertsen, C.M., A. Nielsen, U.H. Thygesen 2016. Choosing the observational likelihood in state-space stock assessment models. Canadian Journal of Fisheries and Aquatic Sciences.

Berg, C.W., A. Nielsen 2016. Accounting for correlated observations in an age-based state-space stock assessment model. ICES Journal of Marine Science: Journal du Conseil 73 (7), 1788-1797

Berg C. W., A. Nielsen, K. Kristensen 2014. Evaluation of alternative age-based methods for estimating relative abundance from survey data in relation to assessment models. Fisheries Research , 151: 91-99.

Kristensen, K, A. Nielsen, C.W. Berg, H.J. Skaug, B. Bell. 2016. TMB: Automatic differentiation and laplace approximation. Journal of Statistical Software 70 (5), 1-21

Nielsen, A. and C.W. Berg 2014. Estimation of time-varying selectivity in stock assessments using state-space models. Fisheries Research 158, 96-101

Thygesen, U.H., C.M. Albertsen, C.W. Berg, K. Kristensen, and A. Nielsen 2017. Validation of state space models fitted as mixed effects models. (Accepted. EES).

Clone this wiki locally