This repository was archived by the owner on Oct 4, 2022. It is now read-only.

Statistics

R Nugent III edited this page Feb 18, 2022 · 2 revisions

This section of the fda-model wiki documents the statistics library.

Convergence Criteria

The convergence criteria consist of four arguments: the minimum number of iterations, the maximum number of iterations, the number of standard deviations from the mean at which to test for convergence (the Z value), and the tolerance (the acceptable relative error).

Default Criteria

The default values of the arguments are the following:

  • Minimum iterations: 100
  • Maximum iterations: 100,000
  • Z value: 1.96
  • Tolerance: .01
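A minimal sketch of how the four arguments and their defaults might be grouped and validated. The class name `ConvergenceCriteria`, its field names and the validation rules are hypothetical illustrations, not the library's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvergenceCriteria:
    """Hypothetical container for the four convergence arguments."""
    min_iterations: int = 100
    max_iterations: int = 100_000
    z_value: float = 1.96      # standard deviations from the mean at which to test
    tolerance: float = 0.01    # acceptable relative error

    def __post_init__(self) -> None:
        # Assumed sanity checks; the library's own validation may differ.
        if self.min_iterations < 1:
            raise ValueError("min_iterations must be at least 1")
        if self.max_iterations < self.min_iterations:
            raise ValueError("max_iterations must be >= min_iterations")
        if self.z_value <= 0 or self.tolerance <= 0:
            raise ValueError("z_value and tolerance must be positive")


default = ConvergenceCriteria()
custom = ConvergenceCriteria(min_iterations=1_000, tolerance=0.001)
```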

Distributions

The statistics library provides continuous distributions to support Monte Carlo analysis. Each continuous distribution supports four basic operations:

  • Cumulative Distribution Function (CDF)
  • Probability Density Function (PDF)
  • Inverse Cumulative Distribution Function (InverseCDF)
  • Fit
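The four operations can be read as a common interface. The sketch below expresses this as a Python abstract base class. The class name `ContinuousDistribution` and the snake_case method names are hypothetical and do not reproduce the library's actual types.

```python
from abc import ABC, abstractmethod
from typing import Sequence


class ContinuousDistribution(ABC):
    """Hypothetical interface mirroring the four operations listed above."""

    @abstractmethod
    def cdf(self, x: float) -> float:
        """Non-exceedance probability P(X <= x), between 0 and 1."""

    @abstractmethod
    def pdf(self, x: float) -> float:
        """Probability density at x (the slope of the CDF)."""

    @abstractmethod
    def inverse_cdf(self, p: float) -> float:
        """Value whose non-exceedance probability is p."""

    @classmethod
    @abstractmethod
    def fit(cls, sample: Sequence[float]) -> "ContinuousDistribution":
        """Build a distribution from summary statistics of a sample."""
```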

CDF

Returns the probability that a value drawn from the distribution is less than or equal to a provided value (the non-exceedance probability). Results range from 0 to 1.

PDF

Returns the probability density at a provided value. For a continuous distribution this is a density, not a probability (any single exact value has probability zero). It equals the slope (derivative) of the CDF at that value.
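To make the CDF/PDF relationship concrete, the sketch below uses Python's standard-library `statistics.NormalDist` as a stand-in distribution, not the fda-statistics classes. It checks that the PDF matches a finite-difference slope of the CDF. It also shows that a density can exceed 1, so it is not a probability.

```python
from statistics import NormalDist

dist = NormalDist(mu=100.0, sigma=15.0)
x = 110.0

# CDF: non-exceedance probability P(X <= x), always between 0 and 1.
p = dist.cdf(x)

# PDF: the slope of the CDF, estimated here with a central difference.
h = 1e-4
slope = (dist.cdf(x + h) - dist.cdf(x - h)) / (2 * h)
density = dist.pdf(x)

# A narrow distribution has a peak density above 1, so a density is not a probability.
narrow = NormalDist(mu=0.0, sigma=0.1).pdf(0.0)
```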

InverseCDF

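Returns the value that will not be exceeded with a given probability (the non-exceedance probability), which makes it the inverse of the CDF. This is the workhorse of the Monte Carlo method: passing uniformly distributed random probabilities through InverseCDF produces random samples from the distribution.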
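The sketch below illustrates inverse transform sampling. Uniform random probabilities are mapped through an inverse CDF to produce Monte Carlo realizations. It again uses `statistics.NormalDist` as a stand-in distribution. The `draw` helper is hypothetical.

```python
import random
from statistics import NormalDist

dist = NormalDist(mu=100.0, sigma=15.0)   # any distribution with an inverse CDF works
rng = random.Random(42)                    # fixed seed so runs are reproducible


def draw(distribution, generator):
    """One Monte Carlo realization by inverse transform sampling."""
    u = generator.random()
    while u == 0.0:              # inv_cdf needs 0 < p < 1; random() can return 0.0
        u = generator.random()
    return distribution.inv_cdf(u)


samples = [draw(dist, rng) for _ in range(20_000)]
```

About 90% of the samples should fall at or below `dist.inv_cdf(0.9)`, which is what makes the inverse CDF a sampler.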

Fit

When given a sample dataset, the Fit method uses the method of moments: it calculates summary statistics of the dataset and uses them as the parameters of the distribution. This is leveraged heavily to perform an analytical bootstrap.
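A minimal sketch of a method-of-moments fit for a normal distribution. It is followed by a parametric bootstrap: synthetic records of the same length are drawn from the fitted distribution and refit. Reading "analytical bootstrap" as a parametric bootstrap is an assumption. The helpers `fit_normal` and `parametric_bootstrap` are hypothetical.

```python
import random
from statistics import NormalDist, fmean, stdev


def fit_normal(sample):
    """Method of moments: the sample mean and standard deviation become the parameters."""
    return NormalDist(mu=fmean(sample), sigma=stdev(sample))


def parametric_bootstrap(sample, realizations, seed=0):
    """Hypothetical sketch: refit synthetic records drawn from the fitted distribution."""
    rng = random.Random(seed)
    fitted = fit_normal(sample)
    n = len(sample)
    replicates = []
    for _ in range(realizations):
        # Draw a synthetic record via the inverse CDF, keeping p strictly inside (0, 1).
        synthetic = [fitted.inv_cdf(rng.uniform(1e-12, 1 - 1e-12)) for _ in range(n)]
        replicates.append(fit_normal(synthetic))
    return replicates
```

The spread of the refitted parameters across replicates reflects how uncertain the fit is for a record of that length.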

Deterministic

A distribution that returns the same constant value for every probability.

| parameter name | description | notes |
| --- | --- | --- |
| value | the value of the distribution | n/a |
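A minimal sketch of a deterministic distribution, assuming it behaves as a point mass. The CDF is a step from 0 to 1 at the value, and every probability maps back to that value. The class and method names are hypothetical. The PDF is omitted because a point mass has no ordinary density.

```python
class Deterministic:
    """Hypothetical sketch: all probability mass sits at a single value."""

    def __init__(self, value: float) -> None:
        self.value = value

    def cdf(self, x: float) -> float:
        # Step function: 0 below the value, 1 at or above it.
        return 1.0 if x >= self.value else 0.0

    def inverse_cdf(self, p: float) -> float:
        # Every non-exceedance probability maps to the same value.
        if not 0.0 <= p <= 1.0:
            raise ValueError("p must be between 0 and 1")
        return self.value
```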

Uniform

A distribution bounded by a min and a max value, with constant probability density between them.

| parameter name | description | notes |
| --- | --- | --- |
| min | the min of the distribution | must be less than or equal to max |
| max | the max of the distribution | must be greater than or equal to min |
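A minimal sketch of the uniform distribution's CDF, PDF and inverse CDF, enforcing the min/max rule from the table. The class and method names are hypothetical. The degenerate case min == max is handled as a step CDF, and its density is treated as undefined.

```python
class Uniform:
    """Hypothetical sketch of a uniform distribution on [min, max]."""

    def __init__(self, min_value: float, max_value: float) -> None:
        if min_value > max_value:
            raise ValueError("min must be less than or equal to max")
        self.min_value = min_value
        self.max_value = max_value

    def cdf(self, x: float) -> float:
        if x >= self.max_value:
            return 1.0
        if x < self.min_value:
            return 0.0
        return (x - self.min_value) / (self.max_value - self.min_value)

    def pdf(self, x: float) -> float:
        if self.min_value == self.max_value:
            raise ValueError("density is undefined when min equals max")
        inside = self.min_value <= x <= self.max_value
        return 1.0 / (self.max_value - self.min_value) if inside else 0.0

    def inverse_cdf(self, p: float) -> float:
        if not 0.0 <= p <= 1.0:
            raise ValueError("p must be between 0 and 1")
        # Linear map from [0, 1] onto [min, max].
        return self.min_value + p * (self.max_value - self.min_value)
```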

Triangular

A distribution bounded by a min and a max with a most likely value.

| parameter name | description | notes |
| --- | --- | --- |
| min | the min of the distribution | must be less than or equal to max |
| most likely | the most likely value of the distribution | must be less than or equal to max, and greater than or equal to min |
| max | the max of the distribution | must be greater than or equal to min |
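The triangular CDF and inverse CDF have closed forms. The inverse CDF splits at the CDF value of the most likely point, (most likely − min)/(max − min). The sketch below uses these standard textbook formulas. The class and method names are hypothetical.

```python
import math


class Triangular:
    """Hypothetical sketch of a triangular distribution (min, most likely, max)."""

    def __init__(self, min_value, most_likely, max_value):
        if not min_value <= most_likely <= max_value:
            raise ValueError("require min <= most likely <= max")
        self.a, self.c, self.b = min_value, most_likely, max_value

    def cdf(self, x):
        a, c, b = self.a, self.c, self.b
        if x >= b:
            return 1.0
        if x <= a:
            return 0.0
        if x <= c:
            return (x - a) ** 2 / ((b - a) * (c - a))
        return 1.0 - (b - x) ** 2 / ((b - a) * (b - c))

    def inverse_cdf(self, p):
        a, c, b = self.a, self.c, self.b
        if not 0.0 <= p <= 1.0:
            raise ValueError("p must be between 0 and 1")
        if a == b:
            return a  # degenerate: all mass at one point
        split = (c - a) / (b - a)  # CDF value at the most likely point
        if p < split:
            return a + math.sqrt(p * (b - a) * (c - a))
        return b - math.sqrt((1.0 - p) * (b - a) * (b - c))
```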

Normal

Reference for the normal quantile function (inverse CDF) approximation: https://www.researchgate.net/publication/46462650_A_New_Approximation_to_the_Normal_Distribution_Quantile_Function

| parameter name | description | notes |
| --- | --- | --- |
| mean | the mean of the distribution | n/a |
| standard deviation | the standard deviation of the distribution | a warning is issued if the standard deviation is zero |
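A minimal sketch of the zero-standard-deviation warning. The referenced quantile approximation is not reproduced here. Python's `statistics.NormalDist.inv_cdf` is swapped in as the quantile function. Returning the mean when the standard deviation is zero, and rejecting negative standard deviations, are assumptions. The class is hypothetical.

```python
import warnings
from statistics import NormalDist


class Normal:
    """Hypothetical sketch; NormalDist stands in for the referenced approximation."""

    def __init__(self, mean: float, standard_deviation: float) -> None:
        if standard_deviation < 0:
            raise ValueError("standard deviation cannot be negative")  # assumed rule
        if standard_deviation == 0:
            warnings.warn("standard deviation is zero; every quantile equals the mean")
        self.mean = mean
        self.standard_deviation = standard_deviation

    def inverse_cdf(self, p: float) -> float:
        if not 0.0 < p < 1.0:
            raise ValueError("p must be strictly between 0 and 1")
        if self.standard_deviation == 0:
            return self.mean  # degenerate case (assumed behaviour)
        return NormalDist(self.mean, self.standard_deviation).inv_cdf(p)
```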

LogNormal

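A distribution whose logarithm is normally distributed. Values are generated by scaling a standard normal deviate by the standard deviation, shifting it by the mean, and transforming the result out of log space. The standard deviation must be greater than 0.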

| parameter name | description | notes |
| --- | --- | --- |
| mean | the mean of the distribution | n/a |
| standard deviation | the standard deviation of the distribution | must be greater than zero |
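A minimal sketch of the scale, shift and exponentiate construction. It assumes the mean and standard deviation describe the natural logarithms of the variable. The log base and the class name are assumptions, not confirmed library behaviour.

```python
import math
from statistics import NormalDist


class LogNormal:
    """Hypothetical sketch; parameters are assumed to be of the natural logs."""

    def __init__(self, mean: float, standard_deviation: float) -> None:
        if standard_deviation <= 0:
            raise ValueError("standard deviation must be greater than zero")
        self._log_dist = NormalDist(mean, standard_deviation)

    def inverse_cdf(self, p: float) -> float:
        # NormalDist scales and shifts the standard normal deviate; exp leaves log space.
        return math.exp(self._log_dist.inv_cdf(p))

    def cdf(self, x: float) -> float:
        if x <= 0:
            return 0.0
        return self._log_dist.cdf(math.log(x))
```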

LogPearsonIII

https://agupubs.onlinelibrary.wiley.com/doi/epdf/10.1029/WR008i005p01251

| parameter name | description | notes |
| --- | --- | --- |
| mean | the mean of the distribution | must be less than or equal to 5 |
| standard deviation | the standard deviation of the distribution | must be greater than zero and less than 3 |
| skew | the skew of the distribution | must be between -3 and 3 |
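A minimal sketch of a Log Pearson III inverse CDF using the classic Wilson-Hilferty frequency factor, K = (2/g)[(1 + gz/6 − g²/36)³ − 1]. It applies the parameter limits from the table. Base-10 logs and the Wilson-Hilferty approximation itself are assumptions, and the approximation loses accuracy at large absolute skews. Check the reference above for the method the library actually uses. The class is hypothetical.

```python
from statistics import NormalDist

_STANDARD_NORMAL = NormalDist()


class LogPearsonIII:
    """Hypothetical sketch; base-10 logs and Wilson-Hilferty are assumptions."""

    def __init__(self, mean, standard_deviation, skew):
        # Parameter limits copied from the table above (skew bounds taken as inclusive).
        if mean > 5:
            raise ValueError("mean must be less than or equal to 5")
        if not 0 < standard_deviation < 3:
            raise ValueError("standard deviation must be greater than 0 and less than 3")
        if not -3 <= skew <= 3:
            raise ValueError("skew must be between -3 and 3")
        self.mean, self.sd, self.skew = mean, standard_deviation, skew

    def frequency_factor(self, p):
        z = _STANDARD_NORMAL.inv_cdf(p)
        g = self.skew
        if abs(g) < 1e-9:
            return z  # zero skew reduces to the standard normal deviate
        return (2.0 / g) * ((1.0 + g * z / 6.0 - g * g / 36.0) ** 3 - 1.0)

    def inverse_cdf(self, p):
        # p is a non-exceedance probability; the result is back-transformed from log10.
        return 10.0 ** (self.mean + self.frequency_factor(p) * self.sd)
```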

Graphical Relationships

The statistics library estimates the uncertainty about graphical exceedance probability functions consistent with "HEC-FDA Flood Damage Reduction Analysis Technical Reference Uncertainty Estimates for Graphical (Non-Analytic) Frequency Curves CPD-72a," dated October 2014.

Graphical Functions

A graphical relationship is created from an array of exceedance probabilities, an array of corresponding flow or stage values, and an equivalent record length. For good results, at least nine exceedance probabilities spanning the probability space, with nine corresponding flows or stages, are recommended.

The confidence limits of the uncertainty about a graphical exceedance probability function are calculated using the Less Simple Method, which relies on an asymptotically Normal approximation. The variance about each stage or flow is calculated using Equation 6 of CPD-72a. The standard error can be held constant beyond a minimum and a maximum exceedance probability. By default, it is held constant for exceedance probabilities outside the range (0.01, 0.99).
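A minimal sketch of per-coordinate standard errors. It assumes Equation 6 takes the standard asymptotic form for the variance of a sample quantile, Var(x_p) ≈ p(1 − p) / (n f(x_p)²). Here n is the equivalent record length, and the density f is approximated by the finite-difference slope between neighbouring coordinates. Copying the nearest in-range value outside (0.01, 0.99) is one reading of "held constant". Verify both against CPD-72a. The function is hypothetical.

```python
import math


def graphical_standard_errors(exceedance_probs, values, erl, min_p=0.01, max_p=0.99):
    """Hypothetical sketch of Less Simple Method standard errors.

    Assumes Var(x_p) ~= p(1 - p) / (erl * f(x_p)**2), with f(x_p) estimated
    as |dp/dx| from neighbouring coordinates (one-sided at the ends).
    """
    count = len(values)
    raw = []
    for i, p in enumerate(exceedance_probs):
        lo, hi = max(i - 1, 0), min(i + 1, count - 1)
        dx_dp = abs((values[hi] - values[lo])
                    / (exceedance_probs[hi] - exceedance_probs[lo]))
        raw.append(math.sqrt(p * (1.0 - p) / erl) * dx_dp)

    # Hold the standard error constant outside [min_p, max_p] by copying the
    # value at the nearest in-range coordinate.
    inside = [i for i, p in enumerate(exceedance_probs) if min_p <= p <= max_p]
    if not inside:
        return raw
    i_high = max(inside, key=lambda j: exceedance_probs[j])
    i_low = min(inside, key=lambda j: exceedance_probs[j])
    held = list(raw)
    for i, p in enumerate(exceedance_probs):
        if p > max_p:
            held[i] = raw[i_high]
        elif p < min_p:
            held[i] = raw[i_low]
    return held
```

Confidence limits at each coordinate would then follow as value ± z · standard error under the Normal approximation.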

Interpolation

A calculation of the uncertainty about a graphical exceedance probability function produces an array of 173 exceedance probabilities and a corresponding array of normal distributions. The 173 probabilities are those described in the HEC-FDA Version 1.4.3 Release Notes. The graphical relationship is both interpolated and extrapolated to span the probability space of these 173 probabilities. Interpolation and extrapolation are performed on standard normal deviates (z-values) rather than directly on probabilities.
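A minimal sketch of interpolation in z-space. Each exceedance probability is converted to the standard normal deviate of its non-exceedance probability, and values are interpolated linearly against z. Targets beyond the end points are extrapolated along the end segments. Linear interpolation and end-segment extrapolation are assumptions. The 173 target probabilities are not reproduced here, and the function is hypothetical.

```python
from bisect import bisect_left
from statistics import NormalDist

_Z = NormalDist()


def interpolate_in_z(exceedance_probs, values, target_probs):
    """Hypothetical sketch: linear interpolation of values against z = Phi^-1(1 - p)."""
    pairs = sorted(zip((_Z.inv_cdf(1 - p) for p in exceedance_probs), values))
    zs = [z for z, _ in pairs]
    vs = [v for _, v in pairs]
    out = []
    for p in target_probs:
        z = _Z.inv_cdf(1 - p)
        # Choose the bracketing segment; clamping to the end segments extrapolates.
        k = min(max(bisect_left(zs, z), 1), len(zs) - 1)
        z0, z1, v0, v1 = zs[k - 1], zs[k], vs[k - 1], vs[k]
        out.append(v0 + (v1 - v0) * (z - z0) / (z1 - z0))
    return out
```

Working in z-space stretches the tails, so rare events near 0.01 or 0.001 are not compressed into a sliver of the probability axis.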

Monotonicity

The coordinates of the exceedance probability function must be strictly monotonically increasing: stages or flows must increase as exceedance probabilities decrease. Monotonicity must also hold for the minimum and maximum of the distributions. Where a stage or flow would break this, its standard error is revised to equal the standard error of the previous (smaller) stage or flow. Because the stages or flows are strictly increasing, holding the standard error constant from one stage or flow to the next (larger) one guarantees that the minimum and maximum of the distributions are also strictly increasing.
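A minimal sketch of the revision rule. It assumes a distribution's minimum and maximum are its central value ± z_bound × standard error, where `z_bound` is a hypothetical parameter because the document does not state where the distributions are bounded. Revisions are applied in order, so each comparison uses the already-revised standard error of the previous coordinate.

```python
def enforce_monotonic_bounds(means, standard_errors, z_bound):
    """Hypothetical sketch: copy the previous standard error wherever a bound fails to rise."""
    if any(b <= a for a, b in zip(means, means[1:])):
        raise ValueError("stages or flows must be strictly increasing")
    revised = list(standard_errors)
    for i in range(1, len(means)):
        lower_prev = means[i - 1] - z_bound * revised[i - 1]
        upper_prev = means[i - 1] + z_bound * revised[i - 1]
        lower = means[i] - z_bound * revised[i]
        upper = means[i] + z_bound * revised[i]
        if lower <= lower_prev or upper <= upper_prev:
            # Same standard error + larger mean => both bounds strictly increase.
            revised[i] = revised[i - 1]
    return revised
```

For example, with means [10, 20, 30, 40], standard errors [1, 2, 8, 3] and z_bound = 2, the third lower bound (14) would fall below the second (16). Its standard error is therefore reset to 2.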

Histograms

The statistics library can use a histogram to record the frequency of observations. A histogram requires at least a bin width. A minimum value and convergence criteria can also be supplied when constructing it. The histogram can be thought of as an empirical distribution and provides the same functionality as the other distributions in this statistics library.
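A minimal sketch of a histogram built from a bin width and an optional minimum, with bins added as observations arrive. Aligning the first bin to the first observation when no minimum is given, and growing the bin range in both directions, are assumptions. The class is hypothetical.

```python
import math


class Histogram:
    """Hypothetical sketch: fixed-width bins that grow to fit new observations."""

    def __init__(self, bin_width, minimum=None):
        if bin_width <= 0:
            raise ValueError("bin width must be positive")
        self.bin_width = bin_width
        self.minimum = minimum     # lower edge of the first bin
        self.counts = []

    def add_observation(self, x):
        if self.minimum is None:
            # No minimum supplied: align the first bin edge to the first observation.
            self.minimum = math.floor(x / self.bin_width) * self.bin_width
        if x < self.minimum:
            # Prepend empty bins so that x falls inside the range.
            extra = math.ceil((self.minimum - x) / self.bin_width)
            self.counts = [0] * extra + self.counts
            self.minimum -= extra * self.bin_width
        index = max(0, int((x - self.minimum) // self.bin_width))
        if index >= len(self.counts):
            # Append empty bins up to and including the target bin.
            self.counts.extend([0] * (index + 1 - len(self.counts)))
        self.counts[index] += 1

    @property
    def sample_size(self):
        return sum(self.counts)
```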

Methods

The critical methods of a histogram, InverseCDF and CDF, are based on the relative frequencies of the bins. Interpolating within bins gives a more accurate result than using the relative frequencies alone.
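A minimal sketch of CDF and inverse CDF computed from bin counts, interpolating linearly within the bin that contains the target. Linear within-bin interpolation is an assumption about how "interpolation" is done. The standalone functions and their `(minimum, bin_width, counts)` arguments are hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate


def histogram_cdf(minimum, bin_width, counts, x):
    """Non-exceedance probability, interpolating linearly inside the bin."""
    n = sum(counts)
    if n == 0:
        raise ValueError("histogram is empty")
    if x <= minimum:
        return 0.0
    index = int((x - minimum) // bin_width)
    if index >= len(counts):
        return 1.0
    below = sum(counts[:index])
    fraction = (x - minimum - index * bin_width) / bin_width
    return (below + fraction * counts[index]) / n


def histogram_inverse_cdf(minimum, bin_width, counts, p):
    """Value with non-exceedance probability p, interpolating inside the bin."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    cumulative = list(accumulate(counts))
    target = p * cumulative[-1]
    index = min(bisect_left(cumulative, target), len(counts) - 1)
    below = cumulative[index] - counts[index]
    fraction = (target - below) / counts[index] if counts[index] else 0.0
    return minimum + (index + fraction) * bin_width
```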

Convergence

Convergence works the same way in fda-statistics as in HEC-FDA Version 1.4.x, except that convergence is tested at tail values rather than at the mean. See Appendix G of the HEC-FDA User's Manual for more details on how convergence works. This approach uses the same empirical measure of variance as Equation 6 of the technical reference manual. See Section 2.1.5 of the HEC-FDA Technical Reference for more information.
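A minimal sketch of a tail-based convergence loop under stated assumptions. Quantiles are checked at hypothetical tail probabilities (0.05 and 0.95). Their variance is taken as p(1 − p) / (n f(x_p)²), with the density estimated from neighbouring empirical quantiles. A tail passes when z × standard error / |quantile| ≤ tolerance. The tail probabilities, the `delta` and `check_every` parameters, and all function names are hypothetical.

```python
import math
import random


def empirical_quantile(sorted_samples, p):
    # Order-statistic estimate of the p-quantile (no interpolation).
    index = min(int(p * len(sorted_samples)), len(sorted_samples) - 1)
    return sorted_samples[index]


def tail_converged(samples, p, z_value, tolerance, delta=0.01):
    """Relative-error test for the p-quantile, using Var ~= p(1-p) / (n f(x_p)^2)."""
    xs = sorted(samples)
    p_lo, p_hi = max(p - delta, 0.0), min(p + delta, 1.0)
    spread = empirical_quantile(xs, p_hi) - empirical_quantile(xs, p_lo)
    q = empirical_quantile(xs, p)
    if spread <= 0 or q == 0:
        return False  # density or relative error cannot be estimated yet
    density = (p_hi - p_lo) / spread
    standard_error = math.sqrt(p * (1.0 - p) / len(xs)) / density
    return z_value * standard_error / abs(q) <= tolerance


def run_until_converged(draw, min_iterations=100, max_iterations=100_000,
                        z_value=1.96, tolerance=0.01,
                        tails=(0.05, 0.95), check_every=500):
    """Draw realizations until every tail quantile passes, or max_iterations is hit."""
    samples = []
    while len(samples) < max_iterations:
        samples.append(draw())
        n = len(samples)
        if n >= min_iterations and n % check_every == 0:
            if all(tail_converged(samples, p, z_value, tolerance) for p in tails):
                return samples, True
    return samples, False
```

Checking only every `check_every` draws keeps the repeated sorting affordable. The defaults mirror the default convergence criteria listed earlier.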

Threadsafe Histogram

The threadsafe histogram contains logic that allows it to be updated safely during parallel computation, reducing the chance of race conditions.
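A minimal sketch of one way to make histogram updates thread safe. A lock serializes the read-modify-write of each bin count, so concurrent threads cannot lose increments. Using a single `threading.Lock` is an assumption rather than a description of the library's implementation. The class and its methods are hypothetical.

```python
import threading
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


class ThreadsafeHistogram:
    """Hypothetical sketch: a lock guards the shared bin counts."""

    def __init__(self, bin_width, minimum=0.0):
        self.bin_width = bin_width
        self.minimum = minimum
        self._counts = Counter()
        self._lock = threading.Lock()

    def add_observation(self, x):
        index = int((x - self.minimum) // self.bin_width)
        with self._lock:
            # The increment is a read-modify-write, so it runs under the lock.
            self._counts[index] += 1

    def count(self, index):
        with self._lock:
            return self._counts[index]

    def sample_size(self):
        with self._lock:
            return sum(self._counts.values())


hist = ThreadsafeHistogram(bin_width=1.0)
with ThreadPoolExecutor(max_workers=8) as pool:
    list(pool.map(hist.add_observation, [i % 10 + 0.5 for i in range(10_000)]))
```

An alternative design gives each worker its own histogram and merges the counts at the end, which trades memory for less lock contention.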
