diff --git a/python/notebooks/us-1994-census.ipynb b/python/notebooks/us-1994-census.ipynb
index 6d4accb..52adc05 100644
--- a/python/notebooks/us-1994-census.ipynb
+++ b/python/notebooks/us-1994-census.ipynb
@@ -21,24 +21,19 @@
"source": [
"# Introduction\n",
"\n",
- "Information and its theory are an important quantity that govern many fields, rangning from communication to machine learning. In this tutorial, we are going to demonstate a novey approach of decomposing information into parts. Each part describes a certain aspect of the relationship of the variables.\n",
+ "This tutorial shows an application of the BROJA information decomposition to the **[US Census 1994][us-census]** income data set. The task is to relate a list of attributes (predictors) to a binary target variable. The attributes include sex, age, race, education level, occupation, and hours-per-week, among others. The target is the yearly income, with values $>50$K and $\\leq50$K.\n",
"\n",
"\n",
- "We start the turotial with dataset description and some preprocessing steps ([Section 2](#Data-Preparation)). Then, we define some auxiliary functions for visualizing results. Then, we talk about the foundation of information theory ([Section 3](#Basic-Information-Theory)) and Information Decomposition ([Section 4](#Information-Decomposition)). Thorough this tutorial, we provide code that implement or compute quantities of current interests."
+ "This tutorial was written by [Pattarawat Chormai](https://pat.chormai.org) and is licensed under [CC-BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/).\n",
+ "\n",
+ "[us-census]: https://archive.ics.uci.edu/ml/datasets/adult"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
- "# Data Preparation\n",
- "\n",
- "In this tutorial, we use **[US Census 1994][us-census]**, a publily available dataset, to demonstate the content of this turorial. The dataset contains individual's attributes, such as race, age, gender, and the level of income ( <= X, > X ). Hence, ML learners use it to train a classifier for predicting the level of income based on other attributes.\n",
- "\n",
- "\n",
- "Here, we are interested in explaing the relationship between these attributes and the income variable; therefore, we only use the training set.\n",
- "\n",
- "[us-census]: https://archive.ics.uci.edu/ml/datasets/adult"
+ "# Data Preparation"
]
},
{
@@ -130,7 +125,7 @@
" \n",
"
\n",
" \n",
- " 0 \n",
+ " 0 \n",
" 39 \n",
" State-gov \n",
" 77516 \n",
@@ -148,7 +143,7 @@
" <=50K \n",
" \n",
" \n",
- " 1 \n",
+ " 1 \n",
" 50 \n",
" Self-emp-not-inc \n",
" 83311 \n",
@@ -166,7 +161,7 @@
" <=50K \n",
" \n",
" \n",
- " 2 \n",
+ " 2 \n",
" 38 \n",
" Private \n",
" 215646 \n",
@@ -184,7 +179,7 @@
" <=50K \n",
" \n",
" \n",
- " 3 \n",
+ " 3 \n",
" 53 \n",
" Private \n",
" 234721 \n",
@@ -202,7 +197,7 @@
" <=50K \n",
" \n",
" \n",
- " 4 \n",
+ " 4 \n",
" 28 \n",
" Private \n",
" 338409 \n",
@@ -283,10 +278,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "As you can see from the above, some attributes (columns) contains a space in the begining. Although these spaces do not\n",
- "affect our computation, it is still good to clean it up. \n",
- "\n",
- "This can been done by finding string columns (stored as `object`) and use Python's `strip` function to remove these prefix spaces."
+ "As you can see from the above, the values of some attributes (columns) begin with a space. Although these spaces do not\n",
+ "affect our computation, it is still good practice to strip them."
]
},
{
@@ -306,7 +299,7 @@
"source": [
"## Discretization\n",
"\n",
- "As you can see from the data exploration part, `age` and `hours-per-week` are continuous. This might be useful for some cases to treat them as they are. In this tutorial, we are interested in only groups of these values. Therefore, we need first need to perform discretization on this values. More precisely,\n",
+ "Some attributes, such as `age` and `hours-per-week`, are continuous. We discretize these attributes as follows:\n",
"\n",
"- We categorize `age` into four groups: [ '<24', '24-35', '36-50', '>50' ], and\n",
"- We group `hours_per_week_group` into two groups: ['<=40', '>40']"
@@ -329,11 +322,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "## Data Distribution\n",
- "\n",
- "Once we have finished cleansing and discretizing the data, we are now ready to instantiate a [dit][dit]'s `Distribution` variable. This variable comes with necessary methods for dealing probabilistic operators, such as marginalization and conditioning.\n",
+ "## Distribution\n",
"\n",
- "[dit]: https://github.com/dit/dit"
+ "The code expects the joint distribution over the predictors and the target to be passed as a [dit](https://github.com/dit/dit) `Distribution`."
]
},
{
@@ -355,7 +346,7 @@
"\n",
"# aliases\n",
"rvs_names = [\n",
- " 'S', # income\n",
+ " 'X', # income\n",
" 'E', # education\n",
" 'G', # sex\n",
" 'R', # race\n",
@@ -394,7 +385,7 @@
"Class: Distribution Alphabet: ('Female', 'Male') for all rvs Base: linear Outcome Class: tuple Outcome Lenght: 1
G p(x) Female 0.33079450876815825 Male 0.6692054912318418
"
],
"text/plain": [
- ""
+ ""
]
},
"execution_count": 10,
@@ -425,7 +416,7 @@
"Base: linear\n",
"Outcome Class: tuple\n",
"Outcome Length: 1\n",
- "RV Names: ('S',)\n",
+ "RV Names: ('X',)\n",
"\n",
"x p(x)\n",
"('<=50K',) 0.8905394113824158\n",
@@ -439,7 +430,7 @@
"Base: linear\n",
"Outcome Class: tuple\n",
"Outcome Length: 1\n",
- "RV Names: ('S',)\n",
+ "RV Names: ('X',)\n",
"\n",
"x p(x)\n",
"('<=50K',) 0.6942634235888022\n",
@@ -449,7 +440,7 @@
],
"source": [
"# conditional probability P(X|G).\n",
- "marginal, cdists = dist_census.condition_on('G', rvs='S')\n",
+ "marginal, cdists = dist_census.condition_on('G', rvs='X')\n",
"\n",
"for i, (c, d) in enumerate(zip(cdists, marginal.zipped())):\n",
" print(\"\")\n",
@@ -465,22 +456,17 @@
"source": [
"# Basic Information Theory\n",
"\n",
- "[note: revise this paragraph, make it relevant to what we're trying to do]\n",
+ "We review some basic definitions from information theory. \n",
"\n",
- "Information theory is a foundation of many fields and technologies. The theory provides rigourous methods that enable us to develop reliable ways of communication between senders and receivers via **noisy** channels. It is also a lens that helps us analyzing relationships between variables in pricipled ways. \n",
- "\n",
- "Some important quantities in Information Theory are:\n",
"\n",
"## Entropy\n",
"Let $X$ be a discrete random variable that takes values in $\\mathcal{X}$ and has probability mass function $p(x) = P\\{X=x\\}, x \\in \\mathcal{X}$.\n",
"\n",
- "**Definition:** Entropy $H(X)$ of a discrete random variable $X$ is defined by\n",
+ "**Definition:** Entropy $H(X)$ of a discrete random variable $X$ is defined as\n",
"\n",
"$$\n",
"H(X) = - \\sum_{x \\in \\mathcal{X} } p(x) \\log_{2} p(x).\n",
- "$$\n",
- "\n",
- "Because of $\\log_2$ in the equation, it is measured in terms of *bits*, and we omit writing the base of the log from now onwards."
+ "$$"
]
},
{
@@ -513,10 +499,10 @@
"\n",
"Let $(X, Y)$ be a pair of discrete random variables with joint distribution $p(x,y)$, $x \\in \\mathcal{X}$ and $y \\in \\mathcal{Y}$.\n",
"\n",
- "**Definition:** Joint Entropy $H(X,Y)$ is definied by\n",
+ "**Definition:** Joint Entropy $H(X,Y)$ is defined as\n",
"\n",
"$$\n",
- "H(X, Y) = - \\sum_{x \\in \\mathcal{X}} \\sum_{y \\in \\mathcal{Y}} p(x, y) \\log p(x, y)\n",
+ "H(X, Y) = - \\sum_{x \\in \\mathcal{X}} \\sum_{y \\in \\mathcal{Y}} p(x, y) \\log p(x, y).\n",
"$$"
]
},
@@ -537,19 +523,19 @@
}
],
"source": [
- "# H(S, G)\n",
- "p_SG = dist_census.marginal('SG')\n",
- "dit.shannon.entropy(p_SG)"
+ "# H(X, G)\n",
+ "p_XG = dist_census.marginal('XG')\n",
+ "dit.shannon.entropy(p_XG)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
- "**Definition:** Conditional Entropy $H(Y|X)$ is defined by \n",
+ "**Definition:** Conditional Entropy $H(Y|X)$ is defined as \n",
"\n",
"$$\n",
- "H(Y|X) = - \\sum_{x \\in \\mathcal{X}} \\sum_{y \\in \\mathcal{Y}} p(x, y) \\log p(y|x)\n",
+ "H(Y|X) = - \\sum_{x \\in \\mathcal{X}} \\sum_{y \\in \\mathcal{Y}} p(x, y) \\log p(y|x).\n",
"$$"
]
},
@@ -570,25 +556,8 @@
}
],
"source": [
- "# H(S|G)\n",
- "dit.shannon.conditional_entropy(dist_census, 'S', 'G') "
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "**Theorem:** Chain Rule of Joint Entropy $H(X, Y)$\n",
- "\n",
- "$$\n",
- "H(X, Y) = H(X) + H(Y|X)\n",
- "$$\n",
- "\n",
- "Proof can be found at [Cover, Thomas M and Thomas, Joy A's Elements of Information Theory, Theorem 2.2.1][element].\n",
- "\n",
- "Below, we verify the theorem computationally.\n",
- "\n",
- "[element]: https://www.wiley.com/en-it/Elements+of+Information+Theory,+2nd+Edition-p-9780471241959"
+ "# H(X|G)\n",
+ "dit.shannon.conditional_entropy(dist_census, 'X', 'G') "
]
},
{
@@ -608,8 +577,8 @@
}
],
"source": [
- "dit.shannon.entropy(p_SG) \\\n",
- " == dit.shannon.entropy(p_G) + dit.shannon.conditional_entropy(dist_census, 'S', 'G') "
+ "dit.shannon.entropy(p_XG) \\\n",
+ " == dit.shannon.entropy(p_G) + dit.shannon.conditional_entropy(dist_census, 'X', 'G') "
]
},
{
@@ -620,10 +589,10 @@
"\n",
"Let $p$ and $q$ be two distributions with probability mass functions $p(x)$ and $q(x)$, respectively. Relative entropy measures how much $p$ differs from $q$; it is not a true distance, since it is not symmetric.\n",
"\n",
- "**Definition:** Relative entropy between two distributions $p$ and $q$ $D(p\\|q)$ is defined by \n",
+ "**Definition:** The relative entropy $D(p\\|q)$ between two distributions $p$ and $q$ is defined as\n",
"\n",
"$$\n",
- "D(p\\|q) = \\sum_{x \\in \\mathcal{X}} p(x) \\log \\frac{p(x)}{q(x)}\n",
+ "D(p\\|q) = \\sum_{x \\in \\mathcal{X}} p(x) \\log \\frac{p(x)}{q(x)}.\n",
"$$\n",
"\n",
"Another name of relative entropy is **Kullback-Leibler** divergence. Important properties of $D(p\\|q)$ are:\n",
@@ -641,23 +610,10 @@
"$$\n",
"\\begin{align*}\n",
"I(X; Y) &= D \\big(p(x, y) \\| p(x)p(y) \\big) \\\\\n",
- "&= \\sum_{x \\in \\mathcal{X}}\\sum_{y \\in \\mathcal{Y}} p(x, y) \\log \\frac{p(x, y)}{p(x)p(y)}\n",
- "\\end{align*}\n",
- "$$\n",
- "\n",
- "If we derive it further, we have"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "$$\n",
- "\\begin{align*}\n",
- "I(X; Y) &= \\sum_{x \\mathcal{X}}\\sum_{y \\in \\mathcal{Y}} p(x, y) \\log \\frac{p(x, y)}{p(x)p(y)} \\\\\n",
+ "&= \\sum_{x \\in \\mathcal{X}}\\sum_{y \\in \\mathcal{Y}} p(x, y) \\log \\frac{p(x, y)}{p(x)p(y)} \\\\\n",
"&= \\sum_{x \\in \\mathcal{X}}\\sum_{y \\in \\mathcal{Y}} p(x, y) \\big( \\log p(x, y) - \\log p(x) - \\log p(y) \\big) \\\\\n",
"&= - H(X, Y) - \\sum_{x \\in \\mathcal{X}}\\sum_{y \\in \\mathcal{Y}} p(x, y) \\log p(x) - \\sum_{x \\in \\mathcal{X}}\\sum_{y \\in \\mathcal{Y}} p(x, y) \\log p(y) \\\\\n",
- "&= H(X) + H(Y) - H(X, Y)\n",
+ "&= H(X) + H(Y) - H(X, Y).\n",
"\\end{align*}\n",
"$$"
]
@@ -679,7 +635,7 @@
}
],
"source": [
- "dit.shannon.mutual_information(dist_census, 'S', 'G')"
+ "dit.shannon.mutual_information(dist_census, 'X', 'G')"
]
},
{
@@ -688,7 +644,7 @@
"source": [
"### Conditional Mutual Information\n",
"\n",
- "**Definition:** For three discrete random variables $X, Y, Z$, the conditional mutual information $I(X; Y |Z)$ is defined by\n",
+ "**Definition:** For three discrete random variables $X, Y, Z$, the conditional mutual information $I(X; Y |Z)$ is defined as\n",
"\n",
"$$\n",
"I(X; Y |Z) = H(X|Z) - H(X|Y,Z).\n",
@@ -712,9 +668,9 @@
}
],
"source": [
- "# I(S, H | O)\n",
- "dit.shannon.conditional_entropy(dist_census, 'S', 'O') - \\\n",
- " dit.shannon.conditional_entropy(dist_census, 'S', 'HO')"
+ "# I(X; H | O) = H(X|O) - H(X|H,O)\n",
+ "dit.shannon.conditional_entropy(dist_census, 'X', 'O') - \\\n",
+ " dit.shannon.conditional_entropy(dist_census, 'X', 'HO')"
]
},
{
@@ -723,10 +679,11 @@
"source": [
"# Information Decomposition\n",
"\n",
- "Let consider the setting that we have three random variables $X, Y, Z$. We are interested in knowing about $X$, but it is not observable. We can only observe the values of $Y$ and $Z$. \n",
+ "Suppose that we have three jointly distributed random variables $X, Y, Z$, and that we are interested in knowing $X$ but can only observe $Y$ and/or $Z$. The total information that $(Y, Z)$ contains about $X$ is the mutual information $I(X; (Y, Z))$.\n",
+ "\n",
+ "\n",
"\n",
"\n",
- "Utimately, we would like to quantify how much we know about $X$ based on the information of $Y$ and $Z$. More precisely, $I(X; (Y, Z))$ is the total information of $X$ that $(Y, Z)$ contains.\n",
"\n",
"\n",
"[Bertschinger et al. (2014)][paper] propose one approach to decompose $I(X; (Y, Z))$ into four quantities:\n",
@@ -743,7 +700,6 @@
"- $CI(X; Y, Z)$ is complementary (synergistic) information that $Y$ and $Z$ have about $X$ when considered together,\n",
"- $UI(X; Y \\backslash Z)$ is unique information that only $Y$ has about $X$ (with respect to $Z$), and vice versa for $UI(X; Z \\backslash Y)$.\n",
"\n",
- "With the formulation above, these four quantities have to be non-negative.\n",
"\n",
"## Shared Information\n",
"Furthermore, we have the following equalities:\n",
@@ -764,17 +720,16 @@
"source": [
"## Co-Information\n",
"\n",
- "**Definition:** Previouly known as **interaction information** [(McGill W. (1994)][mcgill], co-Information $CoI(X;Y;Z)$ is defined by\n",
+ "**Definition:** The co-information, previously known as *interaction information* [(McGill, 1954)][mcgill], is defined as\n",
"\n",
"$$\n",
- "CoI(X;Y;Z) = I(X;Y) - I(X;Y|Z).\n",
+ "\\begin{align*}\n",
+ "CoI(X;Y;Z) &= I(X;Y) - I(X;Y|Z) \\\\\n",
+ "&= SI(X; Y, Z) - CI(X; Y, Z).\n",
+ "\\end{align*}\n",
"$$\n",
"\n",
- "With the chain rule of mutual information, we can write the identity above as\n",
- "\n",
- "$$\n",
- "CoI(X;Y;Z) = SI(X; Y, Z) - CI(X; Y, Z).\n",
- "$$\n",
"\n",
"[mcgill]: https://ieeexplore.ieee.org/abstract/document/1057469\n"
]
@@ -796,8 +751,8 @@
}
],
"source": [
- "# compute CoI(S; E; G) using dit\n",
- "dit.multivariate.coinformation(dist_census, 'SEG')"
+ "# compute CoI(X; E; G) using dit\n",
+ "dit.multivariate.coinformation(dist_census, 'XEG')"
]
},
{
@@ -817,8 +772,8 @@
}
],
"source": [
- "# compute CoI(S; H; G) using dit\n",
- "dit.multivariate.coinformation(dist_census, 'SHG')"
+ "# compute CoI(X; H; G) using dit\n",
+ "dit.multivariate.coinformation(dist_census, 'XHG')"
]
},
{
@@ -827,7 +782,8 @@
"source": [
"## Unique Information\n",
"\n",
- "From above, we have everything in place except $UI(\\cdot)$ that we haven't defined yet. Bertschinger et al. (2014) define the unique information as follows:\n",
+ "To compute the information decomposition, it suffices to specify either a measure for $SI$, for $CI$ or for $UI$.\n",
+ "Bertschinger et al. (2014) define the unique information as follows:\n",
"\n",
"$$\n",
"UI(X; Y \\backslash Z) = \\min_{Q \\in \\Delta_p} I_Q(X; Y|Z), \n",
@@ -848,12 +804,12 @@
"metadata": {},
"outputs": [],
"source": [
- "# find q for S, G, R (income, race, gender)\n",
- "q_SGR = admUI.computeQUI(distSXY = dist_census.marginal('SGR'))\n",
+ "# find q for X, G, R (income, sex, race)\n",
+ "q_XGR = admUI.computeQUI(distSXY = dist_census.marginal('XGR'))\n",
"\n",
"# because computeQUI renames the variables to SXY,\n",
- "# we need to rename them back to SRG\n",
- "q_SGR.set_rv_names(\"SGR\")"
+ "# we need to rename them back to XGR\n",
+ "q_XGR.set_rv_names(\"XGR\")"
]
},
{
@@ -864,10 +820,10 @@
{
"data": {
"text/html": [
- "Class: Distribution Alphabet: (('<=50K', '>50K'), ('Female', 'Male'), ('Amer-Indian-Eskimo', 'Asian-Pac-Islander', 'Black', 'Other', 'White')) Base: linear Outcome Class: tuple Outcome Lenght: 3
S G R p(x) <=50K Female Amer-Indian-Eskimo 0.008247547005324936 <=50K Female Asian-Pac-Islander 0.005815377888949708 <=50K Female Black 0.07930424583381772 <=50K Female Other 0.007555050560623192 <=50K Female White 0.1936633251347222 <=50K Male Amer-Indian-Eskimo 0.0001981395952810085 <=50K Male Asian-Pac-Islander 0.01761756333138691 <=50K Male Black 0.00475336950884044 <=50K Male White 0.44203582369502953 >50K Female Amer-Indian-Eskimo 0.001018361885656181 >50K Female Asian-Pac-Islander 0.0007180555469803716 >50K Female Black 0.009792116811899728 >50K Female Other 0.0007677896847523185 >50K Female White 0.023912637763622506 >50K Male Amer-Indian-Eskimo 8.725526069439487e-05 >50K Male Asian-Pac-Islander 0.007758342599998848 >50K Male Black 0.0020932675154360938 >50K Male White 0.19466173037698395
"
+ "Class: Distribution Alphabet: (('<=50K', '>50K'), ('Female', 'Male'), ('Amer-Indian-Eskimo', 'Asian-Pac-Islander', 'Black', 'Other', 'White')) Base: linear Outcome Class: tuple Outcome Lenght: 3
X G R p(x) <=50K Female Amer-Indian-Eskimo 0.008247547005324936 <=50K Female Asian-Pac-Islander 0.005815377888949708 <=50K Female Black 0.07930424583381772 <=50K Female Other 0.007555050560623192 <=50K Female White 0.1936633251347222 <=50K Male Amer-Indian-Eskimo 0.0001981395952810085 <=50K Male Asian-Pac-Islander 0.01761756333138691 <=50K Male Black 0.00475336950884044 <=50K Male White 0.44203582369502953 >50K Female Amer-Indian-Eskimo 0.001018361885656181 >50K Female Asian-Pac-Islander 0.0007180555469803716 >50K Female Black 0.009792116811899728 >50K Female Other 0.0007677896847523185 >50K Female White 0.023912637763622506 >50K Male Amer-Indian-Eskimo 8.725526069439487e-05 >50K Male Asian-Pac-Islander 0.007758342599998848 >50K Male Black 0.0020932675154360938 >50K Male White 0.19466173037698395
"
],
"text/plain": [
- ""
+ ""
]
},
"execution_count": 21,
@@ -876,7 +832,7 @@
}
],
"source": [
- "q_SGR"
+ "q_XGR"
]
},
{
@@ -896,13 +852,13 @@
}
],
"source": [
- "# H(S|G,R)\n",
- "h_SgGR = dit.shannon.conditional_entropy(q_SGR, 'S', 'GR')\n",
+ "# H(X|G,R)\n",
+ "h_XgGR = dit.shannon.conditional_entropy(q_XGR, 'X', 'GR')\n",
"\n",
- "# UI(S; G \\ R) = I_Q { S; G \\ R }\n",
- "ui_SG_R = dit.shannon.conditional_entropy(q_SGR, 'S', 'R') - h_SgGR\n",
+ "# UI(X; G \\ R) = I_Q(X; G | R) = H_Q(X|R) - H_Q(X|G,R)\n",
+ "ui_XG_R = dit.shannon.conditional_entropy(q_XGR, 'X', 'R') - h_XgGR\n",
"\n",
- "ui_SG_R"
+ "ui_XG_R"
]
},
{
@@ -922,10 +878,10 @@
}
],
"source": [
- "# UI(S; R \\ G) = I_Q { S; R \\ G }\n",
- "ui_SR_G = dit.shannon.conditional_entropy(q_SGR, 'S', 'G') - h_SgGR\n",
+ "# UI(X; R \\ G) = I_Q(X; R | G) = H_Q(X|G) - H_Q(X|G,R)\n",
+ "ui_XR_G = dit.shannon.conditional_entropy(q_XGR, 'X', 'G') - h_XgGR\n",
"\n",
- "ui_SR_G "
+ "ui_XR_G "
]
},
{
@@ -946,11 +902,11 @@
],
"source": [
"# compute shared information\n",
- "si_SGR = dit.shannon.mutual_information(q_SGR, 'S', 'G') - ui_SG_R \n",
- "si_SRG = dit.shannon.mutual_information(q_SGR, 'S', 'R') - ui_SR_G\n",
+ "si_XGR = dit.shannon.mutual_information(q_XGR, 'X', 'G') - ui_XG_R \n",
+ "si_XRG = dit.shannon.mutual_information(q_XGR, 'X', 'R') - ui_XR_G\n",
"\n",
"# sanity check: by the definition of shared information\n",
- "si_SGR == si_SRG"
+ "si_XGR == si_XRG"
]
},
{
@@ -971,8 +927,8 @@
],
"source": [
"# by the definition of co-information\n",
- "ci_SGR = si_SRG - dit.multivariate.coinformation(q_SGR, 'SGR')\n",
- "ci_SGR"
+ "ci_XGR = si_XRG - dit.multivariate.coinformation(q_XGR, 'XGR')\n",
+ "ci_XGR"
]
},
{
@@ -993,14 +949,14 @@
],
"source": [
"# sanity check: by the definition of Bertschinger et al. (2014)'s information decomposition\n",
- "dit.shannon.mutual_information(q_SGR, 'S', 'GR') == ui_SG_R + ui_SR_G + si_SRG + ci_SGR "
+ "dit.shannon.mutual_information(q_XGR, 'X', 'GR') == ui_XG_R + ui_XR_G + si_XRG + ci_XGR "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
- "# Result Visualization"
+ "# The information decomposition for the US census dataset"
]
},
{
@@ -1016,8 +972,8 @@
"latex_labels = [\n",
" '$SI$',\n",
" '$CI$',\n",
- " '$UI(S; Y \\\\backslash Z)$',\n",
- " '$UI(S; Z \\\\backslash Y)$'\n",
+ " '$UI(X; Y \\\\backslash Z)$',\n",
+ " '$UI(X; Z \\\\backslash Y)$'\n",
"]\n",
"\n",
"# define colours for plotting\n",
@@ -1036,7 +992,7 @@
"outputs": [
{
"data": {
- "image/png": "\n",
+ "image/png": "\n",
"text/plain": [
""
]
@@ -1099,11 +1055,11 @@
" {\n",
" \"variables\": [\"sex\", \"race\"],\n",
" \"metrics\": {\n",
- " \"mi\": dit.shannon.mutual_information(q_SGR, 'S', 'RG'),\n",
- " \"si\": si_SGR, \n",
- " \"ci\": ci_SGR,\n",
- " \"ui_0\": ui_SG_R,\n",
- " \"ui_1\": ui_SR_G,\n",
+ " \"mi\": dit.shannon.mutual_information(q_XGR, 'X', 'RG'),\n",
+ " \"si\": si_XGR, \n",
+ " \"ci\": ci_XGR,\n",
+ " \"ui_0\": ui_XG_R,\n",
+ " \"ui_1\": ui_XR_G,\n",
" }\n",
" },\n",
"])"
@@ -1113,7 +1069,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "From the figure above, we can see that `sex(Y)` contains substantial amount of information about `income(S)`, while `race(Z)` does not. Futhermore, the two attributes `sex(Y)` and `race(Z)` also share some information about `income(S)` but not in a complementary manner."
+ "We see that race (Z) conveys no unique information about income (X) w.r.t. sex (Y)."
]
},
{
@@ -1128,9 +1084,7 @@
"- education and race\n",
"- race and occupation\n",
"- age-group and sex\n",
- "- hours-per-week-group and occupation.\n",
- "\n",
- "We first define a function that computes the decomposition for us."
+ "- hours-per-week-group and occupation."
]
},
{
@@ -1162,7 +1116,7 @@
" si_SXY = si_SXY_1\n",
" \n",
" ci_SXY = si_SXY - dit.multivariate.coinformation(P, rvs)\n",
- " i_S_XY = dit.shannon.mutual_information(P, 'S', to) \n",
+ " i_S_XY = dit.shannon.mutual_information(P, 'X', to) \n",
"\n",
" # sanity check\n",
" assert math.isclose(i_S_XY, si_SXY + ci_SXY + ui_SX_Y + ui_SY_X, abs_tol=1e-6), \\\n",
@@ -1180,7 +1134,7 @@
" }\n",
" }\n",
"\n",
- "decomp_S_HO = information_decomposition(dist_census, 'S', 'HO')"
+ "decomp_X_HO = information_decomposition(dist_census, 'X', 'HO')"
]
},
{
@@ -1189,7 +1143,7 @@
"metadata": {},
"outputs": [],
"source": [
- "decomp_S_EG = information_decomposition(dist_census, 'S', 'EG')"
+ "decomp_X_EG = information_decomposition(dist_census, 'X', 'EG')"
]
},
{
@@ -1198,7 +1152,7 @@
"metadata": {},
"outputs": [],
"source": [
- "decomp_S_ER = information_decomposition(dist_census, 'S', 'ER')"
+ "decomp_X_ER = information_decomposition(dist_census, 'X', 'ER')"
]
},
{
@@ -1207,7 +1161,7 @@
"metadata": {},
"outputs": [],
"source": [
- "decomp_S_RO = information_decomposition(dist_census, 'S', 'RO')"
+ "decomp_X_RO = information_decomposition(dist_census, 'X', 'RO')"
]
},
{
@@ -1216,27 +1170,17 @@
"metadata": {},
"outputs": [],
"source": [
- "decomp_S_AG = information_decomposition(dist_census, 'S', 'AG')"
+ "decomp_X_AG = information_decomposition(dist_census, 'X', 'AG')"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
- "outputs": [],
- "source": [
- "# didn't converge, should we remove it?\n",
- "# decomp_S_EO = compute_decomposition_from(dist_census, 'S', ['E', 'O'])"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 35,
- "metadata": {},
"outputs": [
{
"data": {
- "image/png": "\n",
+ "image/png": "\n",
"text/plain": [
""
]
@@ -1249,11 +1193,11 @@
],
"source": [
"plot([\n",
- " decomp_S_EG,\n",
- " decomp_S_ER,\n",
- " decomp_S_RO,\n",
- " decomp_S_AG,\n",
- " decomp_S_HO,\n",
+ " decomp_X_EG,\n",
+ " decomp_X_ER,\n",
+ " decomp_X_RO,\n",
+ " decomp_X_AG,\n",
+ " decomp_X_HO,\n",
"][::-1])"
]
},
@@ -1261,17 +1205,17 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "When looking at `education` and `sex`, we can see that `education` uniquely contains considerable amount of information about `income` with respect to `sex`. The proportion is also large than what they convey about `income` in both shared and complementary aspects. Notably, `education` have large unique information about `income` when considering with `race`, and `occupation` also has a similar relative portion with respect to `race`.\n",
+ "We see that most of the information that race ($Y$) and occupation ($Z$) convey about income ($X$) is unique to occupation. A more classical approach would, of course, be to test for the Markov relation $X-Z-Y$. This Markov relation almost holds, in the sense that $I(X;Y|Z)$ is small. The additional insight from the information decomposition is that $I(X;Y|Z)$ is purely synergistic, since race conveys no unique information about income w.r.t. occupation. The phenomenon is even stronger for the pair (age, sex): the conditional mutual information is (relatively) larger, but sex still conveys no unique information about income w.r.t. age. Education and sex have about equally large shared and synergistic components. Occupation conveys a large amount of unique information about income both w.r.t. hours-per-week and education.\n",
"\n",
- "This way of decomposition provides us a new aspect for analyzing the interactions between variables (predictors and responses) at a granular level."
+ "These observations appear quite reasonable. They illustrate how the decomposition allows us to obtain a fine-grained quantitative analysis of the relationships between the predictors and the target. "
]
}
],
"metadata": {
"kernelspec": {
- "display_name": "broja",
+ "display_name": "vdb-new",
"language": "python",
- "name": "broja"
+ "name": "vdb-new"
},
"language_info": {
"codemirror_mode": {
@@ -1283,7 +1227,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
- "version": "3.7.4"
+ "version": "3.7.6"
},
"toc": {
"nav_menu": {},
@@ -1299,7 +1243,7 @@
"width": "212px"
},
"toc_section_display": "block",
- "toc_window_display": true
+ "toc_window_display": false
}
},
"nbformat": 4,