MaxentDisaggregation R-package

Installation

You can install the development version of MaxentDisaggregation from GitHub with:

# install.packages("devtools")
devtools::install_github("simschul/MaxentDisaggregation")

Note that this package is under active development. Together with co-authors, I am currently preparing a journal article that describes the background of data disaggregation in more detail and shows use cases within the field of Industrial Ecology.

Background: Uncertainty propagation involving data disaggregation

MaxentDisaggregation is an R package that helps you propagate uncertainty when data disaggregation is involved. Data disaggregation usually means splitting one data point into several disaggregates using proxy data. It is a common problem in many research disciplines.

flowchart-elk TD
    %% Define node classes
    classDef Aggregate fill:#eeeee4,color:black,stroke:none;
    classDef DisAgg1 fill:#abdbe3,color:black,stroke:none;
    classDef DisAgg2 fill:#e28743,color:black,stroke:none;
    classDef DisAgg3 fill:#abdbe3,color:black,stroke:none;

    agg("Y_0"):::Aggregate
    disagg1("Y_1=x_1 Y_0"):::DisAgg1
    disagg2("Y_2=x_2 Y_0"):::DisAgg1
    disagg3("Y_3=x_3 Y_0"):::DisAgg1
   

    %% Define connections
    agg  --> disagg1
    agg  --> disagg2
    agg  --> disagg3

Data disaggregation usually involves an aggregate flow $Y_0$, which is known, such as the total amount of steel manufactured in a given time and geography. What we do not know but are interested in are the $K$ disaggregate flows $Y_1,...,Y_K$, such as the different end-use sectors where the manufactured steel ends up. Even though we do not know the values of $Y_1, ..., Y_K$, the model structure commonly demands that the individual $Y_i$ sum to the known aggregate flow $Y_0$, so that the mass, energy, stoichiometric or economic balance of the model is respected:

$$ Y_0 = \sum_{i=1}^{K} Y_i $$

This equation, also called an accounting identity, introduces dependencies (correlations) between the individual disaggregate flows $Y_i$.
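
To see where these dependencies come from, consider a short derivation. It is only a sketch and assumes that the aggregate is held fixed at a value $y_0$. The sum of the disaggregates then has zero variance, so

$$ 0 = \mathrm{Var}\left(\sum_{i=1}^{K} Y_i\right) = \sum_{i=1}^{K} \mathrm{Var}(Y_i) + 2 \sum_{i<j} \mathrm{Cov}(Y_i, Y_j) $$

and therefore

$$ \sum_{i<j} \mathrm{Cov}(Y_i, Y_j) = -\frac{1}{2} \sum_{i=1}^{K} \mathrm{Var}(Y_i) \le 0. $$

As soon as any disaggregate flow is uncertain, the covariances must be negative on balance: a larger $Y_i$ leaves less of the aggregate for the other flows.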

To get estimates for the disaggregate flows, one usually looks for proxy data. These proxy data are used to calculate the shares (ratios/fractions) $x_1, ..., x_K$ of the respective disaggregate units. To allocate the entire aggregate flow without leaving any residual (and thus respect the system balance), those shares need to sum to one:

$$ \sum_{i=1}^{K} x_i = 1 $$

Disaggregate flows are calculated as

$$ y_i = x_i y_0, \quad \forall i \in \{1,...,K\}. $$
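
The point-estimate version of this calculation can be sketched in a few lines of base R. The proxy values and sector names below are made up for illustration and are not taken from any data set; no package function is used.

```r
# Hypothetical proxy data per end-use sector (illustrative values only)
proxy <- c(construction = 20, automotive = 60, machinery = 120)

# Normalise the proxies so that the shares sum to one
x <- proxy / sum(proxy)

# Known aggregate flow (illustrative value)
y_0 <- 100

# Disaggregate flows y_i = x_i * y_0
y <- x * y_0
print(y)

# The accounting identity holds by construction
stopifnot(isTRUE(all.equal(sum(y), y_0)))
```

Because the shares are normalised before the multiplication, the disaggregates always add up to the aggregate, whatever the scale of the proxy data.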

Sampling disaggregates

This package generates a random sample of disaggregates based on the information provided. The aggregate and the shares are sampled independently. The distribution from which to sample is determined internally based on the information provided by the user. This choice of distribution is mostly based on the principle of Maximum Entropy (MaxEnt).
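
The idea of sampling the aggregate and the shares independently and then multiplying them can be illustrated with base R alone. This is a minimal sketch, not the package's implementation: the normal distribution for the aggregate, the Dirichlet concentration factor of 10, and all variable names are assumptions chosen for the example.

```r
set.seed(1)
n <- 1000  # sample size
K <- 3     # number of disaggregates

# Aggregate: normal draws around a best guess (illustrative choice)
agg <- rnorm(n, mean = 100, sd = 5)

# Shares: Dirichlet draws obtained by normalising gamma variates.
# alpha = best-guess shares times an assumed concentration factor of 10.
alpha <- c(0.1, 0.3, 0.6) * 10
g <- matrix(rgamma(n * K, shape = rep(alpha, each = n)), nrow = n, ncol = K)
shares <- g / rowSums(g)

# Disaggregates: each row of shares scaled by the matching aggregate draw
disagg <- shares * agg

# Every sampled row respects the accounting identity
stopifnot(isTRUE(all.equal(rowSums(disagg), agg)))

# Correlations between the disaggregates induced by the identity
print(round(cor(disagg), 2))
```

Each row of `disagg` sums to the corresponding aggregate draw, so the uncertainty of the aggregate and of the shares is propagated jointly rather than flow by flow.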

The aggregate distribution is determined using the following decision tree:

flowchart-elk TD
    MeanDecision{{"Best guess/
    mean available?"}} -- no --> BoundsDecision1{{"Bounds available?"}}
    MeanDecision -- yes --> SDDecision{{"Standard deviation available?"}}
    SDDecision -- yes --> BoundsDecision2{{"Bounds available?"}}
    BoundsDecision2 -- yes --> GeneralBounds{{"General Bounds a,b"}}
    GeneralBounds -- "no,  $$a=0, b=\infty$$" --> LogNorm("LogNormal distribution
    or
    Truncated Normal")
    GeneralBounds -- yes --> TruncNorm("Truncated Normal 
    (Maximum Entropy distribution)")
    BoundsDecision2 -- no --> Normal("Normal distribution")
    SDDecision -- no --> LowerBound0{{"Lower bound = 0?"}}
    LowerBound0 -- yes --> Exponential("Exponential distribution")
    LowerBound0 -- no --> NotImplemented["No MaxEnt solution
    (currently not implemented)"]
    BoundsDecision1 -- yes --> Uniform("Uniform distribution on [a,b]")
    BoundsDecision1 -- no --> GoBackToStart["☠️ !Game Over!
    We suggest to rethink your problem... 🤓"]
     MeanDecision:::decision
     BoundsDecision1:::decision
     SDDecision:::decision
     BoundsDecision2:::decision
     GeneralBounds:::decision
     LogNorm:::distribution
     TruncNorm:::distribution
     Normal:::distribution
     LowerBound0:::decision
     Exponential:::distribution
     NotImplemented:::notimplementednode
     Uniform:::distribution
     GoBackToStart:::notimplementednode
    classDef decision fill:#e28743,color:black,stroke:none
    classDef distribution fill:#abdbe3,color:black,stroke:none
    classDef notimplementednode fill:#eeeee4,color:black,stroke:none
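
The decision tree can also be read as a small function. The sketch below only mirrors the branches of the diagram; the function name `choose_agg_distribution`, its arguments, and the returned labels are hypothetical and do not describe the package's internal code.

```r
# Hypothetical helper: returns the label of the distribution the tree selects
choose_agg_distribution <- function(mean = NULL, sd = NULL, min = NULL, max = NULL) {
  has_mean   <- !is.null(mean)
  has_sd     <- !is.null(sd)
  has_bounds <- !is.null(min) || !is.null(max)
  # Treat a missing bound as unbounded on that side
  lower <- if (is.null(min)) -Inf else min
  upper <- if (is.null(max)) Inf else max

  if (!has_mean) {
    # Without a best guess, only finite bounds [a, b] give a distribution
    if (is.finite(lower) && is.finite(upper)) return("uniform")
    stop("Neither a best guess nor finite bounds available.")
  }
  if (has_sd) {
    if (!has_bounds) return("normal")
    # Special case a = 0, b = Inf
    if (lower == 0 && upper == Inf) return("lognormal or truncated normal")
    return("truncated normal")
  }
  # Mean only: a MaxEnt solution exists when the lower bound is zero
  if (lower == 0) return("exponential")
  stop("No MaxEnt solution implemented for this combination of inputs.")
}

choose_agg_distribution(mean = 100, sd = 5, min = 0)
choose_agg_distribution(min = 10, max = 20)
```

Writing the tree this way makes it explicit that the choice depends only on which pieces of information are supplied, not on their values, apart from the special case of a zero lower bound.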

The shares are sampled from different variants of the Dirichlet distribution:

flowchart-elk TD
    %% Define node classes
    classDef decision fill:#e28743,color:black,stroke:none;
    classDef distribution fill:#abdbe3,color:black,stroke:none;
    classDef explanationnode fill:#eeeee4,color:black,stroke:none;

    MeanDecision{{"Best guess/mean available?"}}:::decision
    SDDecision{{"Standard deviation available?"}}:::decision
    MaxEntDir("Maximum Entropy Dirichlet"):::distribution
    GenDir("Generalised Dirichlet"):::distribution
    NestedDir("Nested Dirichlet"):::distribution
    UniformDir("Uniform Dirichlet"):::distribution
    

    %% Define connections
    MeanDecision -- "no" --> UniformDir
    MeanDecision -- "yes" --> SDDecision
    MeanDecision -- "partially" --> NestedDir
    SDDecision -- "no" --> MaxEntDir
    SDDecision -- "yes" --> GenDir
    SDDecision -- "partially" --> NestedDir
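
The same diagram can be sketched as a function. Encoding "partially available" information as `NA` entries is an assumption made for this example, and the function name `choose_shares_distribution` and its arguments are hypothetical; the package may represent missing information differently.

```r
# Hypothetical helper: returns the Dirichlet variant the diagram selects
choose_shares_distribution <- function(shares = NULL, sds = NULL) {
  # Classify how much information a vector carries: "no", "partially" or "yes"
  availability <- function(v) {
    if (is.null(v) || all(is.na(v))) {
      "no"
    } else if (anyNA(v)) {
      "partially"
    } else {
      "yes"
    }
  }

  switch(availability(shares),
    no = "uniform Dirichlet",
    partially = "nested Dirichlet",
    yes = switch(availability(sds),
      no = "maximum entropy Dirichlet",
      yes = "generalised Dirichlet",
      partially = "nested Dirichlet"
    )
  )
}

choose_shares_distribution(shares = c(0.1, 0.3, 0.6))
choose_shares_distribution(shares = c(0.1, NA, NA))
```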

How to use

Sampling disaggregates

The main function is rdisagg, which creates a random sample of disaggregates based on the information provided:

library(MaxentDisaggregation)
#> Loading required package: truncnorm
#> Loading required package: nloptr
#> Loading required package: gtools
#> Loading required package: data.table
#> 
#> Attaching package: 'MaxentDisaggregation'
#> The following object is masked from 'package:gtools':
#> 
#>     rdirichlet
sample <- rdisagg(n = 1000, mean_0 = 100, sd_0 = 5, min = 0, shares = c(0.1, 0.3, 0.6))
head(sample)
#>           [,1]     [,2]     [,3]
#> [1,] 25.542248 15.44779 54.45482
#> [2,]  2.194530 25.33249 79.60015
#> [3,]  1.481192 16.74706 82.64684
#> [4,] 19.181736 43.81459 45.58907
#> [5,] 18.558886 49.36833 36.33028
#> [6,]  2.633181 38.11875 60.79836

We can plot the marginal histograms of the sample:

hist(sample[,1])

hist(sample[,2])

hist(sample[,3])

The samples are consistent with all information provided. Thus, summing the disaggregate samples should provide an aggregate sample consistent with the information provided (mean: 100, sd: 5):

sample_agg <- rowSums(sample)
hist(sample_agg)

And indeed:

cat('Mean: ', mean(sample_agg), '\n')
#> Mean:  99.91481
cat('SD: ', sd(sample_agg))
#> SD:  5.083025

Sampling aggregates and shares separately

With MaxentDisaggregation you can also sample the aggregate and the shares independently using the ragg and rshares functions:

sample_agg <- ragg(1000, mean = 100, sd = 5)
hist(sample_agg)

sample_shares <- rshares(1000, shares = c(0.1, 0.3, 0.6))
boxplot(sample_shares)
