Statistics
This section of the fda-model wiki documents the statistics library.
Convergence criteria consist of four arguments: the minimum number of iterations, the maximum number of iterations, the number of standard deviations away from the mean at which to test for convergence (Z value), and the tolerance (the acceptable amount of relative error).
The default values of the arguments are the following:
- Minimum iterations: 100
- Maximum iterations: 100,000
- Z value: 1.96
- Tolerance: 0.01
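
The four arguments and their defaults can be pictured as a small immutable record. The following is a minimal Python sketch for illustration only; the class name, field names, and validation rules are hypothetical and are not taken from the library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvergenceCriteria:
    """Hypothetical container for the four convergence arguments."""

    min_iterations: int = 100
    max_iterations: int = 100_000
    z_value: float = 1.96
    tolerance: float = 0.01

    def __post_init__(self):
        # Assumed sanity checks; the library may validate differently.
        if self.min_iterations > self.max_iterations:
            raise ValueError("min_iterations must not exceed max_iterations")
        if self.tolerance <= 0:
            raise ValueError("tolerance must be positive")


defaults = ConvergenceCriteria()
tighter = ConvergenceCriteria(tolerance=0.005)  # override a single argument
```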
The statistics library provides the ability to construct continuous distributions to support Monte Carlo analysis. A continuous distribution provides the ability to perform four basic functions:
- Cumulative Distribution Function (CDF)
- Probability Density Function (PDF)
- Inverse Cumulative Distribution Function (InverseCDF)
- Fit
The CDF provides the probability that a random value is less than or equal to a provided value (the non-exceedance probability). It produces values between 0 and 1.
The PDF provides the probability density at a provided value, which is the slope (derivative) of the CDF at that value.
The InverseCDF provides the value that will not be exceeded for a given probability (the non-exceedance probability). This is the workhorse of the Monte Carlo method.
When provided a sample dataset, the Fit method uses the method of moments to calculate the summary statistics of the dataset that parameterize the given distribution. This is leveraged heavily to perform an analytical bootstrap.
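
The four functions described above can be illustrated with Python's standard-library `statistics.NormalDist`. This is only an analogy to show how the pieces relate; it is not the fda-model API, and the parameter values and seed are arbitrary.

```python
import random
from statistics import NormalDist

dist = NormalDist(mu=10.0, sigma=2.0)

p = dist.cdf(12.0)        # non-exceedance probability of 12.0
density = dist.pdf(12.0)  # slope of the CDF at 12.0
x = dist.inv_cdf(p)       # inverse CDF recovers 12.0 (up to rounding)

# Monte Carlo sampling: draw uniform probabilities and pass each one
# through the inverse CDF. The bounds keep p strictly inside (0, 1).
rng = random.Random(1234)
sample = [dist.inv_cdf(rng.uniform(1e-12, 1.0 - 1e-12)) for _ in range(5000)]

# Fit: estimate the distribution parameters from the sample moments.
fitted = NormalDist.from_samples(sample)
```

Fitting a distribution to a sample and then sampling from the fitted distribution is the basic loop behind a bootstrap-style analysis.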
A distribution that returns the same constant value across the entire probability range.
| parameter name | description | notes |
|---|---|---|
| value | the value of the distribution | must be less than or equal to max |
A distribution bounded by a min and a max value
| parameter name | description | notes |
|---|---|---|
| min | the min of the distribution | must be less than or equal to max |
| max | the max of the distribution | must be greater than or equal to min |
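
For a distribution with constant density between its bounds, the CDF and inverse CDF reduce to linear interpolation between min and max. A minimal sketch, assuming that reading of the distribution; the function names are hypothetical:

```python
def uniform_cdf(x, lo, hi):
    """Non-exceedance probability of x for a uniform distribution on [lo, hi]."""
    if lo > hi:
        raise ValueError("min must be less than or equal to max")
    if x < lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


def uniform_inverse_cdf(p, lo, hi):
    """Value with non-exceedance probability p on [lo, hi]."""
    if lo > hi:
        raise ValueError("min must be less than or equal to max")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    return lo + p * (hi - lo)
```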
A distribution bounded by a min and a max with a most likely value.
| parameter name | description | notes |
|---|---|---|
| min | the min of the distribution | must be less than or equal to max |
| most likely | the most likely value of the distribution | must be less than or equal to max, and greater than or equal to min |
| max | the max of the distribution | must be greater than or equal to min |
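
The inverse CDF of a distribution with a min, most likely value, and max has a closed form when the density is triangular. The sketch below uses the standard textbook formula; the function names and the handling of the degenerate case (min equal to max) are assumptions, not the library's behavior.

```python
import math


def triangular_cdf(x, lo, mode, hi):
    """Non-exceedance probability of x for a triangular distribution."""
    if not lo <= mode <= hi:
        raise ValueError("require min <= most likely <= max")
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    if x <= mode:
        return (x - lo) ** 2 / ((hi - lo) * (mode - lo))
    return 1.0 - (hi - x) ** 2 / ((hi - lo) * (hi - mode))


def triangular_inverse_cdf(p, lo, mode, hi):
    """Value with non-exceedance probability p."""
    if not lo <= mode <= hi:
        raise ValueError("require min <= most likely <= max")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if hi == lo:
        return lo  # degenerate case: all mass at a single value
    p_mode = (mode - lo) / (hi - lo)  # CDF evaluated at the most likely value
    if p < p_mode:
        return lo + math.sqrt(p * (hi - lo) * (mode - lo))
    return hi - math.sqrt((1.0 - p) * (hi - lo) * (hi - mode))
```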
| parameter name | description | notes |
|---|---|---|
| mean | the mean of the distribution | n/a |
| standard deviation | the standard deviation of the distribution | a warning is issued if the standard deviation is zero |
Returns the log of the standard normal shifted by the mean and scaled by the standard deviation. The standard deviation must be greater than 0.
| parameter name | description | notes |
|---|---|---|
| mean | the mean of the distribution | n/a |
| standard deviation | the standard deviation of the distribution | must be greater than zero |
https://agupubs.onlinelibrary.wiley.com/doi/epdf/10.1029/WR008i005p01251
| parameter name | description | notes |
|---|---|---|
| mean | the mean of the distribution | must be less than or equal to 5 |
| standard deviation | the standard deviation of the distribution | must be greater than zero and less than 3 |
| skew | the skew of the distribution | must be between -3 and 3 |
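
To illustrate how mean, standard deviation, and skew combine into a quantile, the sketch below uses the well-known Wilson-Hilferty approximation for the Pearson Type III frequency factor. This is a swapped-in textbook technique, not necessarily the method the library implements. It also assumes the mean and standard deviation describe base-10 logarithms of the data; all function names are hypothetical.

```python
from statistics import NormalDist


def frequency_factor(z, skew):
    """Wilson-Hilferty approximation of the Pearson III frequency factor K."""
    if abs(skew) < 1e-8:
        return z  # zero skew reduces to the standard normal deviate
    k = skew / 6.0
    # Clamp at zero so K does not pass the distribution bound of -2/skew.
    base = max(1.0 + k * z - k * k, 0.0)
    return (2.0 / skew) * (base ** 3 - 1.0)


def log_pearson3_inverse_cdf(p, mean, standard_deviation, skew):
    """Value with non-exceedance probability p (assumes log10 moments)."""
    z = NormalDist().inv_cdf(p)
    log_value = mean + standard_deviation * frequency_factor(z, skew)
    return 10.0 ** log_value
```

The approximation degrades for large absolute skew, so results near the limits of the skew range should be treated with caution.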
The statistics library contains logic to implement the estimation of uncertainty about graphical exceedance probability functions consistent with "HEC-FDA Flood Damage Reduction Analysis Technical Reference Uncertainty Estimates for Graphical (Non-Analytic) Frequency Curves CPD-72a," dated October 2014.
A graphical relationship is created using an array of exceedance probabilities, an array of flow or stage values, and an equivalent record length. At least nine exceedance probabilities that span the probability space, paired with nine flow or stage values, are recommended for good results.
The confidence limits of the uncertainty about a graphical exceedance probability function are calculated using the Less Simple Method. The Less Simple Method relies on an asymptotically Normal approximation. The variance about a stage or flow is calculated using Equation 6 of CPD-72a. The standard error can be held constant outside a minimum and maximum exceedance probability. By default, the standard error is held constant outside the exceedance probability range (0.01, 0.99).
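
As a rough illustration of an asymptotically Normal standard error for a quantile, the sketch below uses the classical order-statistic result, variance = p(1 - p) / (n f²), with 1/f estimated as the finite-difference slope of the curve. Whether this matches Equation 6 of CPD-72a term for term is an assumption; consult that reference for the authoritative form. The function name, the finite-difference scheme, and the way the standard error is held constant are all hypothetical.

```python
import math


def standard_errors(exceedance_probs, values, record_length,
                    p_lo=0.01, p_hi=0.99):
    """Standard error at each point of a graphical frequency curve."""
    n = len(values)
    raw = []
    for i in range(n):
        # Slope dx/dp from neighboring points (one-sided at the ends).
        j0, j1 = max(i - 1, 0), min(i + 1, n - 1)
        slope = (values[j1] - values[j0]) / (
            exceedance_probs[j1] - exceedance_probs[j0])
        p = exceedance_probs[i]
        raw.append(math.sqrt(p * (1.0 - p) / record_length) * abs(slope))

    # Hold the standard error constant outside [p_lo, p_hi] by reusing the
    # value at the nearest point inside the range.
    inside = [i for i, p in enumerate(exceedance_probs) if p_lo <= p <= p_hi]
    if not inside:
        return raw
    result = []
    for i, p in enumerate(exceedance_probs):
        if p < p_lo or p > p_hi:
            nearest = min(inside, key=lambda k: abs(exceedance_probs[k] - p))
            result.append(raw[nearest])
        else:
            result.append(raw[i])
    return result


errors = standard_errors([0.999, 0.5, 0.1, 0.01, 0.001],
                         [1.0, 5.0, 8.0, 10.0, 12.0], record_length=50)
```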
A calculation of the uncertainty about a graphical exceedance probability function produces an array of 173 exceedance probabilities and a coinciding array of normal distributions. The 173 probabilities are those described in the HEC-FDA Version 1.4.3 Release Notes. The graphical relationship is both extrapolated and interpolated to span the probability space of the 173 probabilities. The interpolation and extrapolation take place using standard normal deviates.
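
Interpolating with standard normal deviates means converting each probability to its z-score and interpolating linearly in that space instead of in probability space. A minimal sketch, assuming exceedance probabilities sorted in decreasing order and linear extrapolation from the end segments; the function names are hypothetical:

```python
from bisect import bisect_left
from statistics import NormalDist

_STANDARD_NORMAL = NormalDist()


def to_deviate(exceedance_probability):
    """Standard normal deviate of the matching non-exceedance probability."""
    return _STANDARD_NORMAL.inv_cdf(1.0 - exceedance_probability)


def interpolate_value(target_probability, exceedance_probs, values):
    """Interpolate (or extrapolate) a stage or flow in z-space.

    exceedance_probs must be strictly decreasing, so the deviates and the
    paired values are increasing.
    """
    deviates = [to_deviate(p) for p in exceedance_probs]
    z = to_deviate(target_probability)
    i = bisect_left(deviates, z)
    # Clamp so the first or last segment is used when extrapolating.
    i = min(max(i, 1), len(deviates) - 1)
    z0, z1 = deviates[i - 1], deviates[i]
    v0, v1 = values[i - 1], values[i]
    return v0 + (v1 - v0) * (z - z0) / (z1 - z0)


curve_probs = [0.5, 0.1, 0.01]
curve_stages = [10.0, 20.0, 30.0]
stage_at_2_percent = interpolate_value(0.02, curve_probs, curve_stages)
```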
The coordinates of the exceedance probability function must be strictly monotonically increasing in stages or flows with decreasing exceedance probabilities. The monotonicity must hold at the minimum and maximum of the distributions. The standard error of a non-monotonically increasing stage or flow is revised to equal the standard error of the previous (smaller) stage or flow. A strictly increasing flow or stage with the standard error held constant for the next (larger) stage or flow results in a strictly increasing minimum and maximum of the distributions.
This statistics library can use a histogram to record the frequency of observations. Constructing a histogram requires at least a bin width; a minimum value and convergence criteria can also be supplied. The histogram can be thought of as an empirical distribution, and has the same functionality as the other distributions in this statistics library.
The critical methods of a histogram (inverse CDF and CDF) are based on the relative frequencies of the bin values. Interpolation between bins is used to get a more accurate result than use of relative frequency alone.
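
The idea of a CDF and inverse CDF built from bin counts, with linear interpolation inside a bin, can be sketched as follows. This is a simplified illustration with a fixed minimum; the class and method names are hypothetical and the library's actual interpolation scheme may differ.

```python
class Histogram:
    """Fixed-width histogram with an interpolated CDF and inverse CDF."""

    def __init__(self, bin_width, minimum=0.0):
        self.bin_width = bin_width
        self.minimum = minimum
        self.counts = []
        self.sample_size = 0

    def add(self, x):
        if x < self.minimum:
            raise ValueError("observation is below the histogram minimum")
        index = int((x - self.minimum) // self.bin_width)
        if index >= len(self.counts):
            # Grow the list of bins to reach the new observation.
            self.counts.extend([0] * (index + 1 - len(self.counts)))
        self.counts[index] += 1
        self.sample_size += 1

    def cdf(self, x):
        if x <= self.minimum:
            return 0.0
        position = (x - self.minimum) / self.bin_width
        index = int(position)
        if index >= len(self.counts):
            return 1.0
        below = sum(self.counts[:index])
        # Assume observations are spread evenly within the bin.
        partial = self.counts[index] * (position - index)
        return (below + partial) / self.sample_size

    def inverse_cdf(self, p):
        target = p * self.sample_size
        cumulative = 0
        for index, count in enumerate(self.counts):
            if count > 0 and cumulative + count >= target:
                fraction = (target - cumulative) / count
                return self.minimum + (index + fraction) * self.bin_width
            cumulative += count
        return self.minimum + len(self.counts) * self.bin_width


histogram = Histogram(bin_width=1.0)
for observation in (0.5, 1.5, 1.6, 2.5):
    histogram.add(observation)
```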
Convergence works the same way in fda-statistics as in HEC-FDA Version 1.4.x except that convergence is tested at tail values rather than the mean. Please see the HEC-FDA User's Manual Appendix G for more details on how convergence works. This approach uses the same empirical measure of variance as described in the technical reference manual, equation 6. Please see the HEC-FDA Technical Reference, Section 2.1.5 for more information.
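
A convergence test at tail values can be sketched as follows: estimate the standard error of a quantile, scale it by the Z value, and compare the resulting relative error with the tolerance. The variance formula used here, p(1 - p) / (n f²), is the classical asymptotic result and is assumed, not confirmed, to correspond to the equation cited above. The function names and the choice of which tail probabilities to check are hypothetical.

```python
import math


def relative_error(p, quantile, density, sample_size, z_value=1.96):
    """Relative half-width of the confidence interval about a quantile.

    quantile must be non-zero; density is the PDF evaluated at the quantile.
    """
    standard_error = math.sqrt(p * (1.0 - p) / sample_size) / density
    return abs(z_value * standard_error / quantile)


def should_stop(iterations, tail_checks, min_iterations=100,
                max_iterations=100_000, z_value=1.96, tolerance=0.01):
    """Decide whether sampling can stop.

    tail_checks is a list of (p, quantile, density) tuples, one per tail.
    """
    if iterations < min_iterations:
        return False
    if iterations >= max_iterations:
        return True  # stop regardless of convergence
    return all(
        relative_error(p, q, f, iterations, z_value) <= tolerance
        for p, q, f in tail_checks
    )
```

In practice the quantile and density at each tail would come from the histogram's inverse CDF and PDF after each batch of iterations.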
The thread-safe histogram contains logic that makes parallel computation safer by reducing the chance of running into a race condition.
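
One common way to guard shared bin counts is to serialize updates with a lock. The sketch below shows that general pattern in Python; it is not the library's implementation, which may use a different synchronization strategy, and all names are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ThreadSafeHistogram:
    """Bin counts guarded by a lock so concurrent adds do not interleave."""

    def __init__(self, bin_width, minimum=0.0):
        self.bin_width = bin_width
        self.minimum = minimum
        self._counts = {}
        self._sample_size = 0
        self._lock = threading.Lock()

    def add(self, x):
        index = int((x - self.minimum) // self.bin_width)
        with self._lock:
            # The read-modify-write on shared state happens under the lock.
            self._counts[index] = self._counts.get(index, 0) + 1
            self._sample_size += 1

    @property
    def sample_size(self):
        with self._lock:
            return self._sample_size

    def counts(self):
        with self._lock:
            return dict(self._counts)  # copy so callers cannot mutate state


def record_all(histogram, observations):
    for x in observations:
        histogram.add(x)


shared = ThreadSafeHistogram(bin_width=1.0)
batches = [[(i * 7 + j) % 10 + 0.5 for j in range(1000)] for i in range(4)]
with ThreadPoolExecutor(max_workers=4) as pool:
    for batch in batches:
        pool.submit(record_all, shared, batch)
# Leaving the with-block waits for every submitted batch to finish.
```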