A worked example showing how to transform sparse data into quantifiable uncertainty models for early-stage design decision-making.
- The Data
- 1. Interval Analysis
- 2. Probability Theory
- 3. Evidence Theory (Dempster-Shafer Theory)
- References
Consider an early-stage conceptual design of an electric vehicle system employing lithium-ion battery (LIB) electric propulsion. This encompasses various applications, including electric ground vehicles, aircraft, and maritime systems. To assess the lifecycle Global Warming Potential (GWP) of the battery system, we focus on the specific energy-based metric (GWP per unit of battery energy capacity, in $kg CO_2e/kWh$). The literature reports the following values:

| GWP Value ($kg CO_2e/kWh$) | Type | Source |
|---|---|---|
| 60-93.2 | Range | (A) Abdelbaky et al.1 |
| 60-150 | Range | (B) Amarakoon et al.2 |
| 120.5-172.9 | Range | (C) André & Hajek3 |
| 170.5 | Scalar | (D) Liberacki et al.4 |
| 115 | Scalar | (E) Pollet et al.5 |
| 72.9 | Scalar | (F) Pontika et al.6 |
The discrepancy among sources introduces epistemic uncertainty, defined as uncertainty due to incomplete knowledge of the system. Crucially, the true lifecycle GWP of the future battery system is a fixed value, albeit unknown at this stage. This distinguishes it from aleatory uncertainty, which arises from inherent variability. The following sections present three approaches to quantify this epistemic uncertainty, translating the sparse literature data into formal uncertainty metrics suitable for integration into decision-making frameworks (e.g., design optimization, lifecycle assessment)7.
Interval analysis provides a fundamental approach to handling non-stochastic uncertainty. It assumes knowledge of a value's bounds — its minimum and maximum — but no information about the distribution of values within those bounds. While we know the range spanned by the data, we have no insight into how values accumulate or cluster within that interval.
We treat every data point as a set: each reported range becomes an interval $[a_i, b_i]$, and each scalar becomes a degenerate interval $[x, x]$. The combined result is the hull of all sources, $[\min_i a_i, \max_i b_i]$.

Using the provided Python script (interval_analysis.py), this results in $[60.0, 172.9]$ $kg CO_2e/kWh$.
Figure 1 illustrates the resulting interval (shown in red) as the union of all individual data ranges, along with the distribution of each data point across the total span (y-axis). Interval analysis provides a robust safety envelope by introducing no additional assumptions about the data. However, this approach is inherently conservative, as it disregards any potential clustering of values within specific sub-ranges.
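The interval step can be sketched in a few lines. This is an illustrative stand-in for interval_analysis.py, not the original script; the source labels and tuple layout are assumptions for readability.

```python
# Each source as (lower, upper); scalars become degenerate intervals [x, x].
sources = {
    "A": (60.0, 93.2),
    "B": (60.0, 150.0),
    "C": (120.5, 172.9),
    "D": (170.5, 170.5),
    "E": (115.0, 115.0),
    "F": (72.9, 72.9),
}

# The combined interval is the hull of all sources:
# [minimum of all lower bounds, maximum of all upper bounds].
lower = min(a for a, _ in sources.values())
upper = max(b for _, b in sources.values())
print(f"GWP interval: [{lower}, {upper}] kg CO2e/kWh")  # [60.0, 172.9]
```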
While Interval Analysis provides the absolute boundaries of our data, it treats all values within those boundaries as equally unknown. However, in decision-making we often assume that there is a central tendency, meaning the true value is more likely to be near the average of reported values than at the extreme edges. To model this, we can use Probability Theory. We discuss two approaches to handle uncertainty with Probability Theory.
Since we are dealing with sparse literature data, we treat each source as a separate probability distribution and aggregate them into a single, equally weighted Mixture Model. In this approach, we make the following assumptions for each literature source:

- The Mean ($\mu_i$): for a range $[a, b]$, the midpoint $(a+b)/2$; for a scalar, the reported value itself.
- The Standard Deviation ($\sigma_i$): for a range, we interpret $[a, b]$ as a 95% confidence interval, giving $\sigma_i = (b-a)/3.92$.
- For scalar values, we assign a nominal relative uncertainty of 5% ($\sigma_i = 0.05\,\mu_i$), since a single reported value still carries model uncertainty.

We combine all six components into an equally weighted normal mixture:

$$f(x) = \frac{1}{6}\sum_{i=1}^{6}\mathcal{N}\!\left(x;\ \mu_i, \sigma_i^2\right)$$

Using the provided Python script (probability_analysis_normal.py), we derive the following statistical model for the Battery GWP (Figure 2):
- Mean ($\mu_{normal}$): 114.45 $kg CO_2e/kWh$. The "consensus" average of all literature.
- Standard Deviation ($\sigma_{normal}$): 37.25 $kg CO_2e/kWh$. A measure of how much the studies disagree.
- 95% CI Lower Bound (2.5% quantile): 65.26 $kg CO_2e/kWh$. The optimistic boundary.
- 95% CI Upper Bound (97.5% quantile): 179.61 $kg CO_2e/kWh$. The conservative boundary.
A large standard deviation (approximately 33% relative to the mean) is a quantitative signal of high epistemic uncertainty. It tells the designer that the literature is significantly divided on the true GWP of the battery system. Notably, the normal distribution extends beyond the physically plausible bounds implied by the raw data, assigning small but non-zero probability to values outside the original reported range. These tails reflect the model's assumption that values near — but outside — the observed extremes remain possible, though with diminishing likelihood.
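The mixture statistics above can be reproduced analytically, without sampling. The sketch below is a stand-in for probability_analysis_normal.py; the 95%-interval reading of ranges and the 5% relative sigma on scalars are inferred assumptions, but they reproduce the reported mean and standard deviation.

```python
import math

# Ranges (sources A, B, C) and scalars (sources D, E, F) from the data table.
ranges = [(60.0, 93.2), (60.0, 150.0), (120.5, 172.9)]
scalars = [170.5, 115.0, 72.9]

# Assumption: each range is a 95% confidence interval -> sigma = width / 3.92.
components = [((a + b) / 2, (b - a) / 3.92) for a, b in ranges]
# Assumption: scalars carry a nominal 5% relative standard deviation.
components += [(x, 0.05 * x) for x in scalars]

w = 1 / len(components)  # equal weights
mu = sum(w * m for m, _ in components)
# Mixture variance = mean of component variances + variance of component means.
var = sum(w * (s**2 + (m - mu)**2) for m, s in components)
print(f"mean = {mu:.2f}, std = {math.sqrt(var):.2f}")  # mean = 114.45, std = 37.25
```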
While the Normal distribution assumes a central tendency (that the middle of a range is more likely), the Uniform Distribution is more conservative. It assumes that for any given study, every value within the reported range is equally likely. This approach is often preferred when we have no evidence to suggest the true value is in the center of a range rather than at the boundaries.
For each range $[a, b]$, the Cumulative Distribution Function (CDF) is a linear ramp:

$$F_i(x) = \begin{cases} 0 & x < a \\ \dfrac{x - a}{b - a} & a \le x \le b \\ 1 & x > b \end{cases}$$

For scalar values, the CDF is a unit step at the reported value. As before, all six sources are combined with equal weights of 1/6.
Using the provided Python script (probability_analysis_uniform.py), we derive the following statistical model (Figure 3):
- Mean ($\mu_{uniform}$): 114.45 $kg CO_2e/kWh$
- Standard Deviation ($\sigma_{uniform}$): 37.47 $kg CO_2e/kWh$
- 95% CI Lower (2.5%): 63.64 $kg CO_2e/kWh$
- 95% CI Upper (97.5%): 170.53 $kg CO_2e/kWh$
While the Normal distribution creates smooth S-curves, the Uniform distribution results in a piecewise linear CDF. The curve is composed of two distinct geometric features that correspond directly to our data: the linear ramps represent the ranges (sources A, B, C), with a constant rate of accumulating probability across those intervals, while the vertical steps represent the scalars (sources D, E, F), where each jump marks a "point of agreement" at which 1/6 (≈16.7%) of the total probability is concentrated at a single, precise value.
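The piecewise linear CDF can be written directly from that description. This is an illustrative stand-in for probability_analysis_uniform.py, under the same equal-weight assumption.

```python
# Ranges (A, B, C) contribute linear ramps; scalars (D, E, F) contribute steps.
ranges = [(60.0, 93.2), (60.0, 150.0), (120.5, 172.9)]
scalars = [170.5, 115.0, 72.9]

def mixture_cdf(x: float) -> float:
    """Equal-weight mixture CDF over all six sources."""
    total = 0.0
    for a, b in ranges:
        # Linear ramp on [a, b], clamped to [0, 1] outside the interval.
        total += min(max((x - a) / (b - a), 0.0), 1.0)
    for v in scalars:
        # Unit step: the scalar's full 1/6 mass sits at a single value.
        total += 1.0 if x >= v else 0.0
    return total / (len(ranges) + len(scalars))

print(mixture_cdf(60.0), mixture_cdf(115.0), mixture_cdf(172.9))
```

Evaluating the CDF at 115, for example, picks up the full mass of source A, the partial ramp of source B, and the steps of sources E and F.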
While Probability Theory forces us to distribute likelihood across a range (even if we don't know the shape), Evidence Theory (also called Dempster-Shafer theory) allows us to measure uncertainty through Belief and Plausibility. In this model, we assign a Basic Belief Assignment (BBA), denoted as $m$, to each body of evidence; with six equally credible sources, each reported interval or scalar receives a mass of $m_i = 1/6$. From these masses, two cumulative measures are derived:
- Cumulative Belief Function (CBF, red): the conservative lower bound. For a given GWP value x, the Belief Bel(x) is the sum of masses of all sources whose entire reported interval lies below x, i.e., the evidence that strictly supports the proposition. It is our "guaranteed" certainty.
- Cumulative Plausibility Function (CPF, blue): the optimistic upper bound. For a given GWP value x, the Plausibility Pl(x) is the sum of masses of all sources whose interval at least partly lies below x, i.e., the evidence that has not yet been ruled out.
Instead of a single CDF curve, Evidence Theory produces two bounding curves that create a Probability Box (P-Box), illustrating how belief and plausibility define a probability interval as lower and upper bounds. Using the provided Python script (evidence_theory.py), we visualize the literature data (Figure 4):
The gray shaded area between the Blue (Plausibility) and Red (Belief) lines is a direct measurement of our lack of knowledge (ignorance). Where the lines are far apart (e.g., between 75 and 120), the literature is either vague or conflicting. Where they pinch together (e.g. at the scalar points), our certainty is higher because the sources provided precise values. A risk-averse designer would look at the Belief curve (Red) to see what can be proven, while a risk-tolerant designer might look at the Plausibility curve (Blue) to see what is possible.
We notice that at approximately 115 $kg CO_2e/kWh$ the two curves pinch together: this is the scalar reported by source E, and it also falls inside the range of source B, making it a point of relative consensus in the literature.
In Evidence Theory, we don't get a single mean or standard deviation as in Probability Theory, because the theory is designed to avoid committing to a single guess. Instead, we obtain an interval of possible means, bounded by the Lower Expected Value ($E_{lower} = \sum_i m_i a_i$) and the Upper Expected Value ($E_{upper} = \sum_i m_i b_i$), computed from the endpoints $a_i$ and $b_i$ of each focal element.

For this dataset, it is the interval [99.82, 129.08] $kg CO_2e/kWh$. This tells the decision-maker: "Based on the evidence, the average GWP is somewhere in this range, and our ignorance prevents us from being more precise."
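The Belief, Plausibility, and expected-value interval follow directly from the focal elements and their masses. This sketch is a stand-in for evidence_theory.py, assuming equal masses of 1/6 and scalars treated as degenerate intervals.

```python
# Six focal elements with equal mass; scalars as degenerate intervals [x, x].
focal = [(60.0, 93.2), (60.0, 150.0), (120.5, 172.9),
         (170.5, 170.5), (115.0, 115.0), (72.9, 72.9)]
m = 1 / len(focal)

def belief(x: float) -> float:
    """Cumulative Belief: mass of sources whose entire interval lies below x."""
    return sum(m for a, b in focal if b <= x)

def plausibility(x: float) -> float:
    """Cumulative Plausibility: mass of sources whose interval starts below x."""
    return sum(m for a, b in focal if a <= x)

# Expected-value interval from the interval endpoints.
e_lower = sum(m * a for a, _ in focal)
e_upper = sum(m * b for _, b in focal)
print(f"Expected value interval: [{e_lower:.2f}, {e_upper:.2f}]")  # [99.82, 129.08]
```

The gap `plausibility(x) - belief(x)` at any x is exactly the gray ignorance band in Figure 4.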
| Method | Estimated Mean / Range | Key Takeaway |
|---|---|---|
| Interval | [60.0, 172.9] | Maximum possible bounds; no assumptions about internal distribution. |
| Probability (Normal) | 114.45 ± 37.25 | Most likely central value with normally distributed tails extending beyond observed data. |
| Probability (Uniform) | 114.45 (95% CI: 63.6–170.5) | Equal likelihood across all reported ranges; piecewise linear CDF. |
| Evidence (D-S) | Expected value: [99.8, 129.1] | Quantifies ignorance via Belief-Plausibility gap; consensus point at 115. |
Footnotes
1. Mohammad Abdelbaky, Lilian Schwich, João Henriques, Bernd Friedrich, Jef R. Peeters, and Wim Dewulf. Global warming potential of lithium-ion battery cell production: Determining influential primary and secondary raw material supply routes. Cleaner Logistics and Supply Chain, 9:100130, December 2023. ↩
2. Shanika Amarakoon, Jay Smith, and Brian Segal. Application of Life-Cycle Assessment to Nanoscale Technology: Lithium-ion Batteries for Electric Vehicles, April 2013. ↩
3. Nicolas André and Manfred Hajek. Robust Environmental Life Cycle Assessment of Electric VTOL Concepts for Urban Air Mobility. In AIAA Aviation 2019 Forum, Dallas, Texas, June 2019. American Institute of Aeronautics and Astronautics. ↩
4. Adam Liberacki, Barbara Trincone, Gabriella Duca, Luigi Aldieri, Concetto Paolo Vinci, and Fabio Carlucci. The Environmental Life Cycle Costs (ELCC) of Urban Air Mobility (UAM) as an input for sustainable urban mobility. Journal of Cleaner Production, 389:136009, February 2023. ↩
5. Félix Pollet, Florent Lutz, Thomas Planès, Scott Delbecq, and Marc Budinger. A generic life cycle assessment tool for overall aircraft design. Applied Energy, 399:126514, December 2025. ↩
6. Evangelia Pontika, Panagiotis Laskaridis, and Phillip J. Ansell. Technology exploration of zero-emission regional aircraft: Why, what, when and how? August 2025. ↩
7. The theoretical framework for uncertainty modeling in this work is based on: Wen Yao, Xiaoqian Chen, Wencai Luo, Michel Van Tooren, and Jian Guo. Review of uncertainty-based multidisciplinary design optimization methods for aerospace vehicles. Progress in Aerospace Sciences, 47(6):450–479, August 2011. ↩



