From fc1a4f4439bbea31e03d4dddbc45b4de4b0e91e1 Mon Sep 17 00:00:00 2001 From: heytitle Date: Thu, 26 Sep 2019 18:06:03 +0200 Subject: [PATCH 1/7] add a tutorial notebook --- python/notebooks/us-1994-census.ipynb | 1329 +++++++++++++++++++++++++ 1 file changed, 1329 insertions(+) create mode 100644 python/notebooks/us-1994-census.ipynb diff --git a/python/notebooks/us-1994-census.ipynb b/python/notebooks/us-1994-census.ipynb new file mode 100644 index 0000000..aa4cea7 --- /dev/null +++ b/python/notebooks/us-1994-census.ipynb @@ -0,0 +1,1329 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "from matplotlib import pyplot as plt\n", + "import numpy as np\n", + "\n", + "import dit\n", + "import admUI\n", + "import math" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## General Structure (will be removed later)\n", + "\n", + "Describe the general structure of this notebook. Ideally, it should contain\n", + "\n", + "- Problem definition\n", + "- Setting\n", + " - Data Preparation\n", + " - some statistics of us-census 1994\n", + " - auxiliary (helper) functions\n", + "\n", + "- Information theory\n", + " - basic quantities\n", + " - Entropy, KL-Divergence, Mutual Information, and some properties (chain rule, ... )\n", + " - information decomposition (BROJA & Pradeep's paper)\n", + " - MI = SI + CI + 2 UI\n", + " - briefly describe how the UI solver works. (simplex...), intuitively\n", + " - detail in appendix?" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Introduction\n", + "\n", + "Information theory underpins many fields, ranging from communication to machine learning. \n", + "\n", + "[figure src-> one receiver]\n", + "\n", + "In this tutorial, we are going to demonstrate a way of decomposing information into parts. 
Each part describes a certain aspect of the relationship of the variables.\n", + "\n", + "[figure src-> two receiver]\n", + "\n", + "We start the tutorial with a description of the dataset and some preprocessing steps ([Section X](#Data-Preparation)). Then, we define some auxiliary functions for visualizing results. Then, we cover the foundations of information theory (Section X) and Information Decomposition. Along the way, we provide code that implements or computes the quantities under discussion." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Data Preparation\n", + "\n", + "In this tutorial, we use **[US Census 1994][us-census]**, a publicly available dataset, as a running example. The dataset contains attributes of individuals, such as race, age, sex, and income level (<=50K or >50K). It is commonly used to train classifiers that predict the income level from the other attributes.\n", + "\n", + "\n", + "Here, we are interested in explaining the relationship between these attributes and the income variable; therefore, we use only the training set.\n", + "\n", + "[us-census]: https://archive.ics.uci.edu/ml/datasets/adult" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "data_url = \"https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data\"" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "column_names = [\n", + " \"age\",\n", + " \"workclass\",\n", + " \"fnlwgt\",\n", + " \"education\",\n", + " \"education-num\",\n", + " \"martital-status\",\n", + " \"occupation\",\n", + " \"relationship\",\n", + " \"race\",\n", + " \"sex\",\n", + " \"capital-gain\",\n", + " \"capital-loss\",\n", + " \"hours-per-week\",\n", + " \"native-country\",\n", + " \"income\",\n", + "]\n", + "\n", + "\n", + "df = pd.read_csv(data_url, names=column_names)" + ] + }, + { + "cell_type": 
"markdown", + "metadata": {}, + "source": [ + "## Cleansing" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
   age         workclass  fnlwgt  education  education-num     martital-status         occupation   relationship   race     sex  capital-gain  capital-loss  hours-per-week  native-country  income
0   39         State-gov   77516  Bachelors             13       Never-married       Adm-clerical  Not-in-family  White    Male          2174             0              40   United-States   <=50K
1   50  Self-emp-not-inc   83311  Bachelors             13  Married-civ-spouse    Exec-managerial        Husband  White    Male             0             0              13   United-States   <=50K
2   38           Private  215646    HS-grad              9            Divorced  Handlers-cleaners  Not-in-family  White    Male             0             0              40   United-States   <=50K
3   53           Private  234721       11th              7  Married-civ-spouse  Handlers-cleaners        Husband  Black    Male             0             0              40   United-States   <=50K
4   28           Private  338409  Bachelors             13  Married-civ-spouse     Prof-specialty           Wife  Black  Female             0             0              40            Cuba   <=50K
\n", + "
" + ], + "text/plain": [ + " age workclass fnlwgt education education-num \\\n", + "0 39 State-gov 77516 Bachelors 13 \n", + "1 50 Self-emp-not-inc 83311 Bachelors 13 \n", + "2 38 Private 215646 HS-grad 9 \n", + "3 53 Private 234721 11th 7 \n", + "4 28 Private 338409 Bachelors 13 \n", + "\n", + " martital-status occupation relationship race sex \\\n", + "0 Never-married Adm-clerical Not-in-family White Male \n", + "1 Married-civ-spouse Exec-managerial Husband White Male \n", + "2 Divorced Handlers-cleaners Not-in-family White Male \n", + "3 Married-civ-spouse Handlers-cleaners Husband Black Male \n", + "4 Married-civ-spouse Prof-specialty Wife Black Female \n", + "\n", + " capital-gain capital-loss hours-per-week native-country income \n", + "0 2174 0 40 United-States <=50K \n", + "1 0 0 13 United-States <=50K \n", + "2 0 0 40 United-States <=50K \n", + "3 0 0 40 United-States <=50K \n", + "4 0 0 40 Cuba <=50K " + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# data exploration\n", + "\n", + "df[:5]" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[39, ' State-gov', 77516, ' Bachelors', 13, ' Never-married',\n", + " ' Adm-clerical', ' Not-in-family', ' White', ' Male', 2174, 0,\n", + " 40, ' United-States', ' <=50K']], dtype=object)" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df[:1].values" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As the output above shows, the string values in some columns begin with a space. Although these spaces do not\n", + "affect our computation, it is still good practice to remove them. \n", + "\n", + "This can be done by finding the string columns (stored with dtype `object`) and applying Python's `str.strip` method to remove the leading spaces."
+ ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "for k, v in zip(column_names, df.dtypes):\n", + " if v == \"object\":\n", + " df[k] = df[k].apply(lambda x: x.strip())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Discretization\n", + "\n", + "As the data exploration above shows, `age` and `hours-per-week` are numeric. In some analyses it is useful to keep them as they are. In this tutorial, we are interested only in groups of these values, so we first need to discretize them. More precisely,\n", + "\n", + "- We categorize `age` into four groups: [ '<24', '24-35', '36-50', '>50' ], and\n", + "- We categorize `hours-per-week` into two groups: ['<=40', '>40']" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "age_groups = ['<24', '24-35', '36-50', '>50']\n", + "df['age-group'] = df.age.apply(lambda x: age_groups[np.digitize(x, [24, 36, 51])])\n", + "\n", + "hours_per_week_group = ['<=40', '>40']\n", + "df['hours-per-week-group'] = df['hours-per-week'].apply(lambda x: hours_per_week_group[0 if x <= 40 else 1])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Data Distribution\n", + "\n", + "Having cleaned and discretized the data, we are ready to instantiate a [dit][dit] `Distribution` object. 
It provides the methods needed for probabilistic operations, such as marginalization and conditioning.\n", + "\n", + "[dit]: https://github.com/dit/dit" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "# select only columns that we will use in this tutorial\n", + "selected_columns = [\n", + " 'income',\n", + " 'education',\n", + " 'sex',\n", + " 'race',\n", + " 'occupation',\n", + " 'age-group',\n", + " 'hours-per-week-group'\n", + "]\n", + "\n", + "# aliases\n", + "rvs_names = [\n", + " 'S', # income\n", + " 'E', # education\n", + " 'G', # sex\n", + " 'R', # race\n", + " 'O', # occupation\n", + " 'A', # age-group\n", + " 'H', # hours-per-week-group\n", + "]\n", + "\n", + "rvs_to_name = dict(zip(rvs_names, selected_columns))" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ + "# take all samples with attributes that we're interested in \n", + "data_array = list(map(lambda r: tuple(r[k] for k in selected_columns), df.to_dict(\"records\")))\n", + "\n", + "# create the empirical distribution: every sample gets the same weight 1/N\n", + "dist_census = dit.Distribution(data_array, [1. / df.shape[0] ] * df.shape[0])\n", + "\n", + "# set the variable aliases on the distribution\n", + "dist_census.set_rv_names(\"\".join(rvs_names))" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
Class: Distribution
Alphabet: ('Female', 'Male') for all rvs
Base: linear
Outcome Class: tuple
Outcome Lenght: 1
G       p(x)
Female  0.33079450876815825
Male    0.6692054912318418
" + ], + "text/plain": [ + "" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# marginal distribution of P(G), G is `sex`\n", + "dist_census.marginal('G')" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "-----------------------\n", + "Marginal: ('Female',) | 0.330795\n", + "P(S|G='Female')\n", + "Class: Distribution\n", + "Alphabet: ('<=50K', '>50K') for all rvs\n", + "Base: linear\n", + "Outcome Class: tuple\n", + "Outcome Length: 1\n", + "RV Names: ('S',)\n", + "\n", + "x p(x)\n", + "('<=50K',) 0.8905394113824158\n", + "('>50K',) 0.10946058861758425\n", + "\n", + "-----------------------\n", + "Marginal: ('Male',) | 0.669205\n", + "P(S|G='Male')\n", + "Class: Distribution\n", + "Alphabet: ('<=50K', '>50K') for all rvs\n", + "Base: linear\n", + "Outcome Class: tuple\n", + "Outcome Length: 1\n", + "RV Names: ('S',)\n", + "\n", + "x p(x)\n", + "('<=50K',) 0.6942634235888022\n", + "('>50K',) 0.3057365764111978\n" + ] + } + ], + "source": [ + "# conditional probability P(S|G).\n", + "marginal, cdists = dist_census.condition_on('G', rvs='S')\n", + "\n", + "for i, (c, d) in enumerate(zip(cdists, marginal.zipped())):\n", + " print(\"\")\n", + " print(\"-----------------------\")\n", + " print(\"Marginal: %s | %f\" % d)\n", + " print(\"P(S|G='%s')\" % (d[0]))\n", + " print(c)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Basic Information Theory\n", + "\n", + "[note: revise this paragraph, make it relevant to what we're trying to do]\n", + "\n", + "Information theory is the foundation of many fields and technologies. The theory provides rigorous methods for designing communication between a source and receivers over **noisy** channels with as few errors as possible. 
\n", + "\n", + "Some important quantities in Information Theory are:\n", + "\n", + "## Entropy\n", + "Let $X$ be a discrete random variable that takes values in $\\mathcal{X}$ and has probability mass function $p(x) = P\\{X=x\\}, x \\in \\mathcal{X} $.\n", + "\n", + "**Definition:** The entropy $H(X)$ of a discrete random variable $X$ is defined by\n", + "\n", + "$$\n", + "H(X) = - \\sum_{x \\in \\mathcal{X} } p(x) \\log_{2} p(x).\n", + "$$\n", + "\n", + "Because of the $\\log_2$ in the equation, entropy is measured in *bits*; from now on we omit the base of the logarithm." + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.9157360598501509" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# H(G)\n", + "p_G = dist_census.marginal('G')\n", + "dit.shannon.entropy(p_G)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Joint Entropy and Conditional Entropy\n", + "\n", + "Let $(X, Y)$ be a pair of discrete random variables with joint distribution $p(x,y)$, $x \\in \\mathcal{X}$ and $y \\in \\mathcal{Y}$.\n", + "\n", + "**Definition:** The joint entropy $H(X,Y)$ is defined by\n", + "\n", + "$$\n", + "H(X, Y) = - \\sum_{x \\in \\mathcal{X}} \\sum_{y \\in \\mathcal{Y}} p(x, y) \\log p(x, y)\n", + "$$" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "1.674948627614043" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# H(S, G)\n", + "p_SG = dist_census.marginal('SG')\n", + "dit.shannon.entropy(p_SG)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Definition:** The conditional entropy $H(Y|X)$ is defined by \n", + "\n", + "$$\n", + "H(Y|X) = - \\sum_{x \\in \\mathcal{X}} \\sum_{y \\in \\mathcal{Y}} p(x, y) \\log 
p(y|x)\n", + "$$" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.7592125677638922" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# H(S|G)\n", + "dit.shannon.conditional_entropy(dist_census, 'S', 'G') " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Theorem:** Chain Rule of Joint Entropy $H(X, Y)$\n", + "\n", + "$$\n", + "H(X, Y) = H(X) + H(Y|X)\n", + "$$\n", + "\n", + "A proof can be found in [Cover and Thomas, Elements of Information Theory, Theorem 2.2.1][element].\n", + "\n", + "Below, we verify the theorem computationally.\n", + "\n", + "[element]: https://www.wiley.com/en-it/Elements+of+Information+Theory,+2nd+Edition-p-9780471241959" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# H(S, G) = H(G) + H(S|G), compared up to floating-point tolerance\n", + "math.isclose(\n", + " dit.shannon.entropy(p_SG),\n", + " dit.shannon.entropy(p_G) + dit.shannon.conditional_entropy(dist_census, 'S', 'G'),\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Relative Entropy\n", + "\n", + "Let $p$ and $q$ be two distributions with probability mass functions $p(x)$ and $q(x)$, respectively. Relative entropy measures how much $q$ differs from $p$; it is not a true distance because it is not symmetric.\n", + "\n", + "**Definition:** The relative entropy $D(p\\|q)$ between two distributions $p$ and $q$ is defined by \n", + "\n", + "$$\n", + "D(p\\|q) = \\sum_{x \\in \\mathcal{X}} p(x) \\log \\frac{p(x)}{q(x)}\n", + "$$\n", + "\n", + "Another name for relative entropy is **Kullback-Leibler** divergence. 
Important properties of $D(p\\|q)$ are:\n", + "- $D(p\\|q) \\neq D(q\\|p)$ in general\n", + "- $D(p\\|q) = 0$ if and only if 
$p=q$ \n", + "- $D(p\\|q) = \\infty $ if there is some $x \\in \\mathcal{X}$ such that $p(x) > 0$ and $q(x) = 0$.\n", + "\n", + "## Mutual Information\n", + "\n", + "Let $X$ and $Y$ be two discrete random variables with joint probability mass function $p(x, y)$ and marginal probability mass functions $p(x)$ and $p(y)$. \n", + "\n", + "\n", + "**Definition:** Mutual information $I(X; Y)$ is the relative entropy between $p(x, y)$ and $p(x)p(y)$: \n", + "\n", + "$$\n", + "\\begin{align*}\n", + "I(X; Y) &= D \\big(p(x, y) \\| p(x)p(y) \\big) \\\\\n", + "&= \\sum_{x \\in \\mathcal{X}}\\sum_{y \\in \\mathcal{Y}} p(x, y) \\log \\frac{p(x, y)}{p(x)p(y)}\n", + "\\end{align*}\n", + "$$\n", + "\n", + "[note: Remove this part below]\n", + "If we continue the derivation, we have \n", + "\n", + "$$\n", + "I(X; Y) = H(X) - H(X|Y)\n", + "$$\n", + "\n", + "Intuitively, $I(X; Y)$ describes the reduction in uncertainty about $X$ when $Y$ is known to us, and vice versa. Please see [note: Appendix X][continue] for the derivation.\n", + "\n", + "[continue]: ..." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 36, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.037171387438320824" + ] + }, + "execution_count": 36, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "dit.shannon.mutual_information(dist_census, 'S', 'G')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Conditional Mutual Information\n", + "\n", + "**Definition:** For three discrete random variables $X, Y, Z$, the conditional mutual information $I(X; Y |Z)$ is defined by\n", + "\n", + "$$\n", + "I(X; Y |Z) = H(X|Z) - H(X|Y,Z).\n", + "$$" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.02187436337675841" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# I(S; H | O) = H(S|O) - H(S|H,O)\n", + "dit.shannon.conditional_entropy(dist_census, 'S', 'O') - \\\n", + " dit.shannon.conditional_entropy(dist_census, 'S', 'HO')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Information Decomposition\n", + "\n", + "Consider a setting with three random variables $X, Y, Z$. We are interested in $X$, but it is not observable; we can only observe the values of $Y$ and $Z$. \n", + "![](https://i.imgur.com/GpHQ6MW.png)\n", + "\n", + "Ultimately, we would like to quantify how much we know about $X$ from $Y$ and $Z$. More precisely, $I(X; (Y, Z))$ is the total information about $X$ that $(Y, Z)$ contains.\n", + "\n", + "\n", + "[Bertschinger et al. 
(2014)][paper] propose one approach to decompose $I(X; (Y, Z))$ into four quantities:\n", + "\n", + "\n", + "![](https://i.imgur.com/c3tEced.png)\n", + "\n", + "$$\n", + "I(X; (Y, Z)) = SI(X; Y, Z) + CI(X; Y, Z) + UI(X; Y \\backslash Z) + UI(X; Z \\backslash Y),\n", + "$$\n", + "\n", + "where \n", + "- $SI(X; Y, Z)$ is the shared information that both $Y$ and $Z$ have about $X$,\n", + "- $CI(X; Y, Z)$ is the complementary (synergistic) information that $Y$ and $Z$ have about $X$ only when considered together,\n", + "- $UI(X; Y \\backslash Z)$ is the unique information that only $Y$ has about $X$ (with respect to $Z$), and vice versa. \n", + "\n", + "Under this formulation, all four quantities are non-negative.\n", + "\n", + "## Shared Information\n", + "Furthermore, we have the following equalities:\n", + "\n", + "$$\n", + "\\begin{align*}\n", + " SI(X; Y, Z) = I(X; Y) - UI(X; Y \\backslash Z) = I(X; Z) - UI(X; Z \\backslash Y) \\\\\n", + "\\end{align*}\n", + "$$\n", + "\n", + "\n", + "[paper]: https://www.mdpi.com/1099-4300/16/4/2161" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Co-Information\n", + "\n", + "**Definition:** Previously known as **interaction information** [note: cite], co-information $CoI(X;Y;Z)$ is defined by\n", + "\n", + "$$\n", + "CoI(X;Y;Z) = I(X;Y) - I(X;Y|Z).\n", + "$$\n", + "\n", + "The chain rule of mutual information gives $I(X; (Y, Z)) = I(X; Z) + I(X; Y|Z)$. 
Combined with the decomposition and $I(X; Z) = SI(X; Y, Z) + UI(X; Z \\backslash Y)$, this yields $I(X; Y|Z) = UI(X; Y \\backslash Z) + CI(X; Y, Z)$, so we can write the definition above as\n", + "\n", + "$$\n", + "CoI(X;Y;Z) = SI(X; Y, Z) - CI(X; Y, Z).\n", + "$$\n" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "-0.00010688124181967851" + ] + }, + "execution_count": 18, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# compute CoI(S; E; G) using dit\n", + "dit.multivariate.coinformation(dist_census, 'SEG')" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": 
[ + "0.012185581661140699" + ] + }, + "execution_count": 19, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# compute CoI(S; H; G) using dit\n", + "dit.multivariate.coinformation(dist_census, 'SHG')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Unique Information\n", + "\n", + "At this point, everything is in place except $UI(\\cdot)$, which we have not yet defined. Bertschinger et al. (2014) define the unique information as follows:\n", + "\n", + "$$\n", + "UI(X; Y \\backslash Z) = \\min_{Q \\in \\Delta_P} I_Q(X; Y|Z), \n", + "$$\n", + "\n", + "where $\\Delta_P$ is the set of joint probability distributions that have the same marginal probability distributions on $(X, Y)$ and $(X, Z)$ as $P \\in \\mathbb{P}_{ \\mathcal{X} \\times \\mathcal{Y} \\times \\mathcal{Z} }$.\n", + "\n", + "Formally, $\\Delta_P$ is\n", + "\n", + "$$\n", + "\\Delta_P = \\{ Q \\in \\mathbb{P}_{ \\mathcal{X} \\times \\mathcal{Y} \\times \\mathcal{Z} }: Q(X, Y) = P(X, Y) \\text{ and } Q(X, Z) = P(X, Z) \\}\n", + "$$\n", + "\n", + "[todo: add figure to illustrate this optimization procedure, maybe in appendix]" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [], + "source": [ + "# find q for S, G, R (income, sex, race)\n", + "q_SGR = admUI.computeQUI(distSXY = dist_census.marginal('SGR'))\n", + "\n", + "# computeQUI renames the variables to S, X, Y,\n", + "# so we rename them back to S, G, R\n", + "q_SGR.set_rv_names(\"SGR\")" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
Class: Distribution
Alphabet: (('<=50K', '>50K'), ('Female', 'Male'), ('Amer-Indian-Eskimo', 'Asian-Pac-Islander', 'Black', 'Other', 'White'))
Base: linear
Outcome Class: tuple
Outcome Lenght: 3
S      G       R                   p(x)
<=50K  Female  Amer-Indian-Eskimo  0.008247547005324936
<=50K  Female  Asian-Pac-Islander  0.005815377888949708
<=50K  Female  Black               0.07930424583381772
<=50K  Female  Other               0.007555050560623192
<=50K  Female  White               0.1936633251347222
<=50K  Male    Amer-Indian-Eskimo  0.0001981395952810085
<=50K  Male    Asian-Pac-Islander  0.01761756333138691
<=50K  Male    Black               0.00475336950884044
<=50K  Male    White               0.44203582369502953
>50K   Female  Amer-Indian-Eskimo  0.001018361885656181
>50K   Female  Asian-Pac-Islander  0.0007180555469803716
>50K   Female  Black               0.009792116811899728
>50K   Female  Other               0.0007677896847523185
>50K   Female  White               0.023912637763622506
>50K   Male    Amer-Indian-Eskimo  8.725526069439487e-05
>50K   Male    Asian-Pac-Islander  0.007758342599998848
>50K   Male    Black               0.0020932675154360938
>50K   Male    White               0.19466173037698395
" + ], + "text/plain": [ + "" + ] + }, + "execution_count": 21, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "q_SGR" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.028813064495489815" + ] + }, + "execution_count": 22, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# H(S|G,R)\n", + "h_SgGR = dit.shannon.conditional_entropy(q_SGR, 'S', 'GR')\n", + "\n", + "# UI(S; G \\ R) = I_Q { S; G \\ R }\n", + "ui_SG_R = dit.shannon.conditional_entropy(q_SGR, 'S', 'R') - h_SgGR\n", + "\n", + "ui_SG_R" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "1.9623869407237038e-05" + ] + }, + "execution_count": 23, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# UI(S; R \\ G) = I_Q { S; R \\ G }\n", + "ui_SR_G = dit.shannon.conditional_entropy(q_SGR, 'S', 'G') - h_SgGR\n", + "\n", + "ui_SR_G " + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 24, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# compute shared information\n", + "si_SGR = dit.shannon.mutual_information(q_SGR, 'S', 'G') - ui_SG_R \n", + "si_SRG = dit.shannon.mutual_information(q_SGR, 'S', 'R') - ui_SR_G\n", + "\n", + "# sanity check: by the definition of shared information\n", + "si_SGR == si_SRG" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.0" + ] + }, + "execution_count": 25, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# by the definition of co-information\n", + "ci_SGR = si_SRG - dit.multivariate.coinformation(q_SGR, 'SGR')\n", + "ci_SGR" + ] + }, + { + "cell_type": "code", + "execution_count": 26, 
+ "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 26, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# sanity check: by the definition of Bertschinger et al. (2014)'s information decomposition\n", + "dit.shannon.mutual_information(q_SGR, 'S', 'GR') == ui_SG_R + ui_SR_G + si_SRG + ci_SGR " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Result Visualization" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": {}, + "outputs": [], + "source": [ + "# define measurement metrics\n", + "metric_keys = ['si', 'ci', 'ui_0', 'ui_1']\n", + "\n", + "# define legend labels\n", + "latex_labels = [\n", + " '$SI$',\n", + " '$CI$',\n", + " '$UI(S; Y \\\\backslash Z)$',\n", + " '$UI(S; Z \\\\backslash Y)$'\n", + "]\n", + "\n", + "# define colours for plotting\n", + "colors = [\n", + " '#27AB93',\n", + " '#FF5716',\n", + " '#D33139',\n", + " '#522F60',\n", + "] " + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "\"\"\"\n", + "Define an auxiliary function that takes a list of dictionaries of the form\n", + "\n", + " {\n", + " \"variables\": [\"variable_0\", \"variable_1\"],\n", + " \"metrics\": {\n", + " \"mi\": ...,\n", + " \"si\": ..., \n", + " \"ci\": ...,\n", + " \"ui_0\": ...,\n", + " \"ui_1\": ...\n", + " }\n", + " },\n", + "\n", + "\"\"\"\n", + "\n", + "\n", + "def plot(data):\n", + " \n", + " # define the size of the figure\n", + " plt.figure(figsize=(6, 0.5*len(data)))\n", + " \n", + " labels = []\n", + " metrics = []\n", + " \n", + " # suffix each variable name with an alias {Y, Z}\n", + " for v in data:\n", + " labels.append(\", \".join(map(lambda p: \"%s (%s)\" % p, zip(v['variables'], ['Y', 'Z']))))\n", + " \n", + " for m in metric_keys:\n", + " mm = []\n", + " # extract the corresponding metric, normalized by the mutual information\n", + " for v in data:\n", + " mm.append(v['metrics'][m]/v['metrics']['mi'])\n", + " metrics.append(mm)\n", + " \n", + "\n", + " left = np.array([0]*len(labels))\n", + " \n", + " # plotting\n", + " for i, kk in enumerate(metric_keys):\n", + " plt.barh(\n", + " labels, metrics[i], align='center', height=.5, left=left, label=latex_labels[i],color=colors[i],\n", + " )\n", + " left = left + metrics[i]\n", + " plt.legend(loc='center left', bbox_to_anchor=(1, 0.5))\n", + "\n", + "plot([\n", + " {\n", + " \"variables\": [\"sex\", \"race\"],\n", + " \"metrics\": {\n", + " \"mi\": dit.shannon.mutual_information(q_SGR, 'S', 'RG'),\n", + " \"si\": si_SGR, \n", + " \"ci\": ci_SGR,\n", + " \"ui_0\": ui_SG_R,\n", + " \"ui_1\": ui_SR_G,\n", + " }\n", + " },\n", + "])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "[note: writing some summary here]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Other variable pairs\n", + "\n", + "In the following, we are going to compute the information decomposition between 
`income` and several pairs of variables, namely\n", + "\n", + "- education and sex\n", + "- education and race\n", + "- race and occupation\n", + "- **education and occupation [note: did not converge]**\n", + "- age-group and sex\n", + "- hours-per-week-group and occupation.\n", + "\n", + "We first define a function that computes the decomposition for us." + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "metadata": {}, + "outputs": [], + "source": [ + "def information_decomposition(dist, src, to=\"\"):\n", + " rvs = src+to\n", + " \n", + " # marginal over the target and the two source variables\n", + " P = dist.marginal(rvs)\n", + " \n", + " variables = P._rvs\n", + " \n", + " # optimizing distribution Q; computeQUI names its variables S, X, Y\n", + " q_SXY = admUI.computeQUI(distSXY = P)\n", + " \n", + " h_SgXY = dit.shannon.conditional_entropy(q_SXY, 'S', 'XY')\n", + " \n", + " # unique information: UI(S; X \\\\ Y) = H_Q(S|Y) - H_Q(S|X,Y), and likewise for Y\n", + " ui_SX_Y = dit.shannon.conditional_entropy(q_SXY, 'S', 'Y') - h_SgXY\n", + " ui_SY_X = dit.shannon.conditional_entropy(q_SXY, 'S', 'X') - h_SgXY\n", + "\n", + " # shared information, computed in both ways: SI = I(S; X) - UI(S; X \\\\ Y) = I(S; Y) - UI(S; Y \\\\ X)\n", + " si_SXY_1 = dit.shannon.mutual_information(q_SXY, 'S', 'X') - ui_SX_Y\n", + " si_SXY_2 = dit.shannon.mutual_information(q_SXY, 'S', 'Y') - ui_SY_X\n", + " \n", + " # sanity check\n", + " assert math.isclose(si_SXY_1, si_SXY_2, abs_tol=1e-6), \"SI_S_XY: %f | %f\" % (si_SXY_1, si_SXY_2)\n", + "\n", + " si_SXY = si_SXY_1\n", + " \n", + " # complementary information: CI = SI - CoI, with CoI computed on the original P\n", + " ci_SXY = si_SXY - dit.multivariate.coinformation(P, rvs)\n", + " i_S_XY = dit.shannon.mutual_information(P, 'S', to) \n", + "\n", + " # sanity check\n", + " assert math.isclose(i_S_XY, si_SXY + ci_SXY + ui_SX_Y + ui_SY_X, abs_tol=1e-6), \\\n", + " \"MI = decompose : %f | %f\" % (i_S_XY, si_SXY + ci_SXY + ui_SX_Y + ui_SY_X)\n", + " \n", + " uis = [ui_SX_Y, ui_SY_X]\n", + " return {\n", + " \"variables\": tuple(map(lambda x: rvs_to_name[x], to)),\n", + " \"metrics\": {\n", + " \"mi\": i_S_XY,\n", + " \"si\": si_SXY, \n", + " \"ci\": ci_SXY,\n", + " \"ui_0\": uis[variables[to[0]]-1] ,\n", + " \"ui_1\": uis[variables[to[1]]-1]\n", + " }\n", + " }\n", + "\n", + "decomp_S_HO = information_decomposition(dist_census, 'S', 'HO')" + ] 
+ }, + { + "cell_type": "code", + "execution_count": 30, + "metadata": {}, + "outputs": [], + "source": [ + "decomp_S_EG = information_decomposition(dist_census, 'S', 'EG')" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "metadata": {}, + "outputs": [], + "source": [ + "decomp_S_ER = information_decomposition(dist_census, 'S', 'ER')" + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "metadata": {}, + "outputs": [], + "source": [ + "decomp_S_RO = information_decomposition(dist_census, 'S', 'RO')" + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "metadata": {}, + "outputs": [], + "source": [ + "decomp_S_AG = information_decomposition(dist_census, 'S', 'AG')" + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "metadata": {}, + "outputs": [], + "source": [ + "# did not converge; can we remove it?\n", + "# decomp_S_EO = information_decomposition(dist_census, 'S', 'EO')" + ] + }, + { + "cell_type": "code", + "execution_count": 37, + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "plot([\n", + " decomp_S_EG,\n", + " decomp_S_ER,\n", + " decomp_S_RO,\n", + " decomp_S_AG,\n", + " decomp_S_HO,\n", + "][::-1])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "[note: interpret the results]" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "broja", + "language": "python", + "name": "broja" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.4" + }, + "toc": { + "nav_menu": {}, + "number_sections": true, + "sideBar": true, + "skip_h1_title": false, + "toc_cell": false, + "toc_position": { + "height": "673px", + "left": "0px", + "right": "1228px", + "top": "110px", + "width": "212px" + }, + "toc_section_display": "block", + "toc_window_display": true + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} From 5fa44d9da54ea2c44ff2f9b0fb61a51c5d2b8847 Mon Sep 17 00:00:00 2001 From: heytitle Date: Thu, 26 Sep 2019 21:23:32 +0200 Subject: [PATCH 2/7] update notebook --- python/notebooks/us-1994-census.ipynb | 88 ++++++++++----------------- 1 file changed, 33 insertions(+), 55 deletions(-) diff --git a/python/notebooks/us-1994-census.ipynb b/python/notebooks/us-1994-census.ipynb index aa4cea7..6d4accb 100644 --- a/python/notebooks/us-1994-census.ipynb +++ b/python/notebooks/us-1994-census.ipynb @@ -15,44 +15,16 @@ "import math" ] }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## General Structure (will be removed later)\n", - "\n", - "describe about the general structuure of this notebook. 
Ideally, it should contain\n", - "\n", - "- Problem definition\n", - "- Setting\n", - " - Data Preparation\n", - " - some statistics of us-census 1994\n", - " - auxiliary functions (helper) functions\n", - "\n", - "- Information theory\n", - " - basic quantities\n", - " - Entropy, KL-Divergence, Mutual Information, and some properites (chain rule, ... )\n", - " - information decomposision (BROJA & Pradeep's paper)\n", - " - MI = SI + CI + 2 UI\n", - " - breifly describe how the UI solver works. (simplex...), intuitively\n", - " - detail in appendix?" - ] - }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Introduction\n", "\n", - "Information and its theory are an important quantity that govern many fields, rangning from communication to machine learning. \n", - "\n", - "[figure src-> one receiver]\n", + "Information and its theory are an important quantity that govern many fields, rangning from communication to machine learning. In this tutorial, we are going to demonstate a novey approach of decomposing information into parts. Each part describes a certain aspect of the relationship of the variables.\n", "\n", - "In this tutorial, we are going to demonstate a way of decomposing information into parts. Each part describes a certain aspect of the relationship of the variables.\n", "\n", - "[figure src-> two receiver]\n", - "\n", - "We start the turotial with dataset description and some preprocessing steps ([Section X](#Data-Preparation)). Then, we define some auxiliary functions for visualizing results. Then, we talk about the foundation of information theory (Section X) and Information Decomposition. Along the way, we have code that implement or compute quantities of current interest." + "We start the turotial with dataset description and some preprocessing steps ([Section 2](#Data-Preparation)). Then, we define some auxiliary functions for visualizing results. 
Then, we talk about the foundation of information theory ([Section 3](#Basic-Information-Theory)) and Information Decomposition ([Section 4](#Information-Decomposition)). Thorough this tutorial, we provide code that implement or compute quantities of current interests." ] }, { @@ -422,7 +394,7 @@ "
Class:Distribution
Alphabet:('Female', 'Male') for all rvs
Base:linear
Outcome Class:tuple
Outcome Lenght:1
Gp(x)
Female0.33079450876815825
Male0.6692054912318418
" ], "text/plain": [ - "" + "" ] }, "execution_count": 10, @@ -495,7 +467,7 @@ "\n", "[note: revise this paragraph, make it relevant to what we're trying to do]\n", "\n", - "Information theory is foundation of many fields and technologies. The theory provides rigourous methods that enable us to develop ways of communication between source and receivers via **noisy** channels with least amount of error. \n", + "Information theory is a foundation of many fields and technologies. The theory provides rigourous methods that enable us to develop reliable ways of communication between senders and receivers via **noisy** channels. It is also a lens that helps us analyzing relationships between variables in pricipled ways. \n", "\n", "Some important quantities in Information Theory are:\n", "\n", @@ -669,25 +641,30 @@ "$$\n", "\\begin{align*}\n", "I(X; Y) &= D \\big(p(x, y) \\| p(x)p(y) \\big) \\\\\n", - "&= \\sum_{x \\mathcal{X}}\\sum_{y \\in \\mathcal{Y}} p(x, y) \\log \\frac{p(x, y)}{p(x)p(y)}\n", + "&= \\sum_{x \\in \\mathcal{X}}\\sum_{y \\in \\mathcal{Y}} p(x, y) \\log \\frac{p(x, y)}{p(x)p(y)}\n", "\\end{align*}\n", "$$\n", "\n", - "[note: Remove this part below]\n", - "If we continute the derivation, we have \n", - "\n", - "$$\n", - "I(X; Y) = H(X) - H(X|Y)\n", + "If we derive it further, we have" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ "$$\n", - "\n", - "Intuitively, $I(X; Y)$ describes the overvall uncertainty of $X$ when $Y$ is known to us or vice versa. Please see [note: Appendix X][continue] for the derivation.\n", - "\n", - "[continue]: ..." 
+ "\\begin{align*}\n", + "I(X; Y) &= \\sum_{x \\mathcal{X}}\\sum_{y \\in \\mathcal{Y}} p(x, y) \\log \\frac{p(x, y)}{p(x)p(y)} \\\\\n", + "&= \\sum_{x \\in \\mathcal{X}}\\sum_{y \\in \\mathcal{Y}} p(x, y) \\big( \\log p(x, y) - \\log p(x) - \\log p(y) \\big) \\\\\n", + "&= - H(X, Y) - \\sum_{x \\in \\mathcal{X}}\\sum_{y \\in \\mathcal{Y}} p(x, y) \\log p(x) - \\sum_{x \\mathcal{X}}\\sum_{y \\in \\mathcal{Y}} p(x, y) \\log p(y) \\\\\n", + "&= H(X) + H(Y) - H(X, Y)\n", + "\\end{align*}\n", + "$$" ] }, { "cell_type": "code", - "execution_count": 36, + "execution_count": 16, "metadata": {}, "outputs": [ { @@ -696,7 +673,7 @@ "0.037171387438320824" ] }, - "execution_count": 36, + "execution_count": 16, "metadata": {}, "output_type": "execute_result" } @@ -787,7 +764,7 @@ "source": [ "## Co-Information\n", "\n", - "**Definition:** Previouly known as **interaction information** [note: cite], co-Information $CoI(X;Y;Z)$ is defined by\n", + "**Definition:** Previouly known as **interaction information** [(McGill W. (1994)][mcgill], co-Information $CoI(X;Y;Z)$ is defined by\n", "\n", "$$\n", "CoI(X;Y;Z) = I(X;Y) - I(X;Y|Z).\n", @@ -797,7 +774,9 @@ "\n", "$$\n", "CoI(X;Y;Z) = SI(X; Y, Z) - CI(X; Y, Z).\n", - "$$\n" + "$$\n", + "\n", + "[mcgill]: https://ieeexplore.ieee.org/abstract/document/1057469\n" ] }, { @@ -860,9 +839,7 @@ "\n", "$$\n", "\\Delta_p = \\{ Q \\in \\mathbb{P}_{ \\mathcal{X} \\times \\mathcal{Y} \\times \\mathcal{Z} }: Q(X, Y) = P(X, Y) \\text{ and } Q(X, Z) = P(X, Z) \\}\n", - "$$\n", - "\n", - "[todo: add figure to illustrate this optimization procedure, maybe in appendix]" + "$$" ] }, { @@ -890,7 +867,7 @@ "
Class:Distribution
Alphabet:(('<=50K', '>50K'), ('Female', 'Male'), ('Amer-Indian-Eskimo', 'Asian-Pac-Islander', 'Black', 'Other', 'White'))
Base:linear
Outcome Class:tuple
Outcome Lenght:3
SGRp(x)
<=50KFemaleAmer-Indian-Eskimo0.008247547005324936
<=50KFemaleAsian-Pac-Islander0.005815377888949708
<=50KFemaleBlack0.07930424583381772
<=50KFemaleOther0.007555050560623192
<=50KFemaleWhite0.1936633251347222
<=50KMaleAmer-Indian-Eskimo0.0001981395952810085
<=50KMaleAsian-Pac-Islander0.01761756333138691
<=50KMaleBlack0.00475336950884044
<=50KMaleWhite0.44203582369502953
>50KFemaleAmer-Indian-Eskimo0.001018361885656181
>50KFemaleAsian-Pac-Islander0.0007180555469803716
>50KFemaleBlack0.009792116811899728
>50KFemaleOther0.0007677896847523185
>50KFemaleWhite0.023912637763622506
>50KMaleAmer-Indian-Eskimo8.725526069439487e-05
>50KMaleAsian-Pac-Islander0.007758342599998848
>50KMaleBlack0.0020932675154360938
>50KMaleWhite0.19466173037698395
" ], "text/plain": [ - "" + "" ] }, "execution_count": 21, @@ -1136,7 +1113,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "[note: writing some summary here]" + "From the figure above, we can see that `sex(Y)` contains substantial amount of information about `income(S)`, while `race(Z)` does not. Futhermore, the two attributes `sex(Y)` and `race(Z)` also share some information about `income(S)` but not in a complementary manner." ] }, { @@ -1150,7 +1127,6 @@ "- education and sex\n", "- education and race\n", "- race and occupation\n", - "- **education and occupation [note: didn't converged]**\n", "- age-group and sex\n", "- hours-per-week-group and occupation.\n", "\n", @@ -1249,13 +1225,13 @@ "metadata": {}, "outputs": [], "source": [ - "# didn't converge, can we remove it?\n", + "# didn't converge, should we remove it?\n", "# decomp_S_EO = compute_decomposition_from(dist_census, 'S', ['E', 'O'])" ] }, { "cell_type": "code", - "execution_count": 37, + "execution_count": 35, "metadata": {}, "outputs": [ { @@ -1285,7 +1261,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "[note: interpret the results]" + "When looking at `education` and `sex`, we can see that `education` uniquely contains considerable amount of information about `income` with respect to `sex`. The proportion is also large than what they convey about `income` in both shared and complementary aspects. Notably, `education` have large unique information about `income` when considering with `race`, and `occupation` also has a similar relative portion with respect to `race`.\n", + "\n", + "This way of decomposition provides us a new aspect for analyzing the interactions between variables (predictors and responses) at a granular level." 
] } ], From 034b5b532e54bb10340b63e6c4e411ee40ff36d7 Mon Sep 17 00:00:00 2001 From: heytitle Date: Fri, 31 Jan 2020 13:10:51 +0100 Subject: [PATCH 3/7] add author details --- python/notebooks/us-1994-census.ipynb | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/python/notebooks/us-1994-census.ipynb b/python/notebooks/us-1994-census.ipynb index 6d4accb..3e8870a 100644 --- a/python/notebooks/us-1994-census.ipynb +++ b/python/notebooks/us-1994-census.ipynb @@ -24,7 +24,9 @@ "Information and its theory are an important quantity that govern many fields, rangning from communication to machine learning. In this tutorial, we are going to demonstate a novey approach of decomposing information into parts. Each part describes a certain aspect of the relationship of the variables.\n", "\n", "\n", - "We start the turotial with dataset description and some preprocessing steps ([Section 2](#Data-Preparation)). Then, we define some auxiliary functions for visualizing results. Then, we talk about the foundation of information theory ([Section 3](#Basic-Information-Theory)) and Information Decomposition ([Section 4](#Information-Decomposition)). Thorough this tutorial, we provide code that implement or compute quantities of current interests." + "We start the turotial with dataset description and some preprocessing steps ([Section 2](#Data-Preparation)). Then, we define some auxiliary functions for visualizing results. Then, we talk about the foundation of information theory ([Section 3](#Basic-Information-Theory)) and Information Decomposition ([Section 4](#Information-Decomposition)). Thorough this tutorial, we provide code that implement or compute quantities of current interests.\n", + "\n", + "This article is written by [Pattarawat Chormai](https://pat.chormai.org) and licensed under [CC-BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/)." 
] }, { From 7b7cbe1191b5cc34a61acffe34530d4ef0bd2c5c Mon Sep 17 00:00:00 2001 From: Johannes Rauh Date: Fri, 31 Jan 2020 22:33:29 +0100 Subject: [PATCH 4/7] Fix some typos --- python/notebooks/us-1994-census.ipynb | 14 ++++++-------- 1 file changed, 6 insertions(+), 8 deletions(-) diff --git a/python/notebooks/us-1994-census.ipynb b/python/notebooks/us-1994-census.ipynb index 3e8870a..175b649 100644 --- a/python/notebooks/us-1994-census.ipynb +++ b/python/notebooks/us-1994-census.ipynb @@ -21,8 +21,7 @@ "source": [ "# Introduction\n", "\n", - "Information and its theory are an important quantity that govern many fields, rangning from communication to machine learning. In this tutorial, we are going to demonstate a novey approach of decomposing information into parts. Each part describes a certain aspect of the relationship of the variables.\n", - "\n", + "Information and its theory are an important quantity that govern many fields, ranging from communication to machine learning. In this tutorial, we are going to demonstate a novel approach of decomposing information into parts. Each part describes a certain aspect of the relationship of the variables.\n", "\n", "We start the turotial with dataset description and some preprocessing steps ([Section 2](#Data-Preparation)). Then, we define some auxiliary functions for visualizing results. Then, we talk about the foundation of information theory ([Section 3](#Basic-Information-Theory)) and Information Decomposition ([Section 4](#Information-Decomposition)). Thorough this tutorial, we provide code that implement or compute quantities of current interests.\n", "\n", @@ -35,7 +34,7 @@ "source": [ "# Data Preparation\n", "\n", - "In this tutorial, we use **[US Census 1994][us-census]**, a publily available dataset, to demonstate the content of this turorial. The dataset contains individual's attributes, such as race, age, gender, and the level of income ( <= X, > X ). 
Hence, ML learners use it to train a classifier for predicting the level of income based on other attributes.\n", + "In this tutorial, we use **[US Census 1994][us-census]**, a publicly available dataset, to demonstate the content of this turorial. The dataset contains individual's attributes, such as race, age, gender, and the level of income ( $\\le$50 k, > 50 k). Hence, ML learners use it to train a classifier for predicting the level of income based on other attributes.\n", "\n", "\n", "Here, we are interested in explaing the relationship between these attributes and the income variable; therefore, we only use the training set.\n", @@ -285,9 +284,8 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "As you can see from the above, some attributes (columns) contains a space in the begining. Although these spaces do not\n", - "affect our computation, it is still good to clean it up. \n", - "\n", + "As you can see from the above, some attributes (columns) contains a space in the beginning. Although these spaces do not\n", + "affect our computation, it is still good to clean it up.\n", "This can been done by finding string columns (stored as `object`) and use Python's `strip` function to remove these prefix spaces." ] }, @@ -766,13 +764,13 @@ "source": [ "## Co-Information\n", "\n", - "**Definition:** Previouly known as **interaction information** [(McGill W. (1994)][mcgill], co-Information $CoI(X;Y;Z)$ is defined by\n", + "**Definition:** Also known as **interaction information** [(McGill W. 
(1994)][mcgill], co-information $CoI(X;Y;Z)$ is defined by\n", "\n", "$$\n", "CoI(X;Y;Z) = I(X;Y) - I(X;Y|Z).\n", "$$\n", "\n", - "With the chain rule of mutual information, we can write the identity above as\n", + "With the definition of information decomposition, we can write the identity above as\n", "\n", "$$\n", "CoI(X;Y;Z) = SI(X; Y, Z) - CI(X; Y, Z).\n", From 13abdbd47f1bfce47caad2cc7b2c41d7445dc94d Mon Sep 17 00:00:00 2001 From: heytitle Date: Sat, 8 Feb 2020 12:36:05 +0100 Subject: [PATCH 5/7] incorporate pradeep's fixes --- python/notebooks/us-1994-census.ipynb | 92 +++++++-------------------- 1 file changed, 24 insertions(+), 68 deletions(-) diff --git a/python/notebooks/us-1994-census.ipynb b/python/notebooks/us-1994-census.ipynb index 175b649..9c312d2 100644 --- a/python/notebooks/us-1994-census.ipynb +++ b/python/notebooks/us-1994-census.ipynb @@ -21,25 +21,19 @@ "source": [ "# Introduction\n", "\n", - "Information and its theory are an important quantity that govern many fields, ranging from communication to machine learning. In this tutorial, we are going to demonstate a novel approach of decomposing information into parts. Each part describes a certain aspect of the relationship of the variables.\n", + "This tutorial shows an application of the BROJA information decomposition on the **[US Census 1994][us-census]** income data set. The task is to relate a list of attributes or predictors with a binary target variable. The attributes include: sex, age, race, education level, occupation, hours-per-week, etc. The target is the yearly income, with values $>50$K and $\\leq50$K. \n", "\n", - "We start the turotial with dataset description and some preprocessing steps ([Section 2](#Data-Preparation)). Then, we define some auxiliary functions for visualizing results. Then, we talk about the foundation of information theory ([Section 3](#Basic-Information-Theory)) and Information Decomposition ([Section 4](#Information-Decomposition)). 
Thorough this tutorial, we provide code that implement or compute quantities of current interests.\n", "\n", - "This article is written by [Pattarawat Chormai](https://pat.chormai.org) and licensed under [CC-BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/)." + "This tutorial is written by [Pattarawat Chormai](https://pat.chormai.org) and licensed under [CC-BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/).\n", + "\n", + "[us-census]: https://archive.ics.uci.edu/ml/datasets/adult" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "# Data Preparation\n", - "\n", - "In this tutorial, we use **[US Census 1994][us-census]**, a publicly available dataset, to demonstate the content of this turorial. The dataset contains individual's attributes, such as race, age, gender, and the level of income ( $\\le$50 k, > 50 k). Hence, ML learners use it to train a classifier for predicting the level of income based on other attributes.\n", - "\n", - "\n", - "Here, we are interested in explaing the relationship between these attributes and the income variable; therefore, we only use the training set.\n", - "\n", - "[us-census]: https://archive.ics.uci.edu/ml/datasets/adult" + "# Data Preparation" ] }, { @@ -285,8 +279,7 @@ "metadata": {}, "source": [ "As you can see from the above, some attributes (columns) contains a space in the beginning. Although these spaces do not\n", - "affect our computation, it is still good to clean it up.\n", - "This can been done by finding string columns (stored as `object`) and use Python's `strip` function to remove these prefix spaces." + "affect our computation, it is still good to clean it up." ] }, { @@ -306,7 +299,7 @@ "source": [ "## Discretization\n", "\n", - "As you can see from the data exploration part, `age` and `hours-per-week` are continuous. This might be useful for some cases to treat them as they are. In this tutorial, we are interested in only groups of these values. 
Therefore, we need first need to perform discretization on this values. More precisely,\n", + "Some attributes such as age and hours-per-week are continuous. We discretize these attributes as follows: \n", "\n", "- We categorize `age` into four groups: [ '<24', '24-35', '36-50', '>50' ], and\n", "- We group `hours_per_week_group` into two groups: ['<=40', '>40']" @@ -329,11 +322,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Data Distribution\n", + "## Distribution\n", "\n", - "Once we have finished cleansing and discretizing the data, we are now ready to instantiate a [dit][dit]'s `Distribution` variable. This variable comes with necessary methods for dealing probabilistic operators, such as marginalization and conditioning.\n", - "\n", - "[dit]: https://github.com/dit/dit" + "The code expects a joint distribution over predictors and target to be passed in dit format." ] }, { @@ -465,22 +456,17 @@ "source": [ "# Basic Information Theory\n", "\n", - "[note: revise this paragraph, make it relevant to what we're trying to do]\n", - "\n", - "Information theory is a foundation of many fields and technologies. The theory provides rigourous methods that enable us to develop reliable ways of communication between senders and receivers via **noisy** channels. It is also a lens that helps us analyzing relationships between variables in pricipled ways. \n", + "We review some basic definitions from information theory. \n", "\n", - "Some important quantities in Information Theory are:\n", "\n", "## Entropy\n", "Let $X$ be a discreate random variable. 
Random variable $X$ takes values from $\\mathcal{X}$ and has probability mass function $p(x) = P\\{X=x\\}, x \\in \\mathcal{X} $.\n", "\n", - "**Definition:** Entropy $H(X)$ of a discrete random variable $X$ is defined by\n", + "**Definition:** Entropy $H(X)$ of a discrete random variable $X$ is defined as\n", "\n", "$$\n", "H(X) = - \\sum_{x \\in \\mathcal{X} } p(x) \\log_{2} p(x).\n", - "$$\n", - "\n", - "Because of $\\log_2$ in the equation, it is measured in terms of *bits*, and we omit writing the base of the log from now onwards." + "$$" ] }, { @@ -513,10 +499,10 @@ "\n", "Let $(X, Y)$ be a pair of discrete random variable with a joint distribution $p(x,y)$, $x \\in \\mathcal{X}$ and $y \\in \\mathcal{Y}$.\n", "\n", - "**Definition:** Joint Entropy $H(X,Y)$ is definied by\n", + "**Definition:** Joint Entropy $H(X,Y)$ is definied as\n", "\n", "$$\n", - "H(X, Y) = - \\sum_{x \\in \\mathcal{X}} \\sum_{y \\in \\mathcal{Y}} p(x, y) \\log p(x, y)\n", + "H(X, Y) = - \\sum_{x \\in \\mathcal{X}} \\sum_{y \\in \\mathcal{Y}} p(x, y) \\log p(x, y).\n", "$$" ] }, @@ -546,10 +532,10 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "**Definition:** Conditional Entropy $H(Y|X)$ is defined by \n", + "**Definition:** Conditional Entropy $H(Y|X)$ is defined as \n", "\n", "$$\n", - "H(Y|X) = - \\sum_{x \\in \\mathcal{X}} \\sum_{y \\in \\mathcal{Y}} p(x, y) \\log p(y|x)\n", + "H(Y|X) = - \\sum_{x \\in \\mathcal{X}} \\sum_{y \\in \\mathcal{Y}} p(x, y) \\log p(y|x).\n", "$$" ] }, @@ -574,23 +560,6 @@ "dit.shannon.conditional_entropy(dist_census, 'S', 'G') " ] }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "**Theorem:** Chain Rule of Joint Entropy $H(X, Y)$\n", - "\n", - "$$\n", - "H(X, Y) = H(X) + H(Y|X)\n", - "$$\n", - "\n", - "Proof can be found at [Cover, Thomas M and Thomas, Joy A's Elements of Information Theory, Theorem 2.2.1][element].\n", - "\n", - "Below, we verify the theorem computationally.\n", - "\n", - "[element]: 
https://www.wiley.com/en-it/Elements+of+Information+Theory,+2nd+Edition-p-9780471241959" - ] - }, { "cell_type": "code", "execution_count": 15, @@ -620,10 +589,10 @@ "\n", "Let $p$ and $q$ be two distributions with probability mass function $p(x)$ and $q(x)$ accordingly. Relative entropy is a distance between $p$ and $q$.\n", "\n", - "**Definition:** Relative entropy between two distributions $p$ and $q$ $D(p\\|q)$ is defined by \n", + "**Definition:** Relative entropy between two distributions $p$ and $q$ $D(p\\|q)$ is defined as \n", "\n", "$$\n", - "D(p\\|q) = \\sum_{x \\in \\mathcal{X}} p(x) \\log \\frac{p(x)}{q(x)}\n", + "D(p\\|q) = \\sum_{x \\in \\mathcal{X}} p(x) \\log \\frac{p(x)}{q(x)}.\n", "$$\n", "\n", "Another name of relative entropy is **Kullback-Leibler** divergence. Important properties of $D(p\\|q)$ are:\n", @@ -641,23 +610,10 @@ "$$\n", "\\begin{align*}\n", "I(X; Y) &= D \\big(p(x, y) \\| p(x)p(y) \\big) \\\\\n", - "&= \\sum_{x \\in \\mathcal{X}}\\sum_{y \\in \\mathcal{Y}} p(x, y) \\log \\frac{p(x, y)}{p(x)p(y)}\n", - "\\end{align*}\n", - "$$\n", - "\n", - "If we derive it further, we have" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "$$\n", - "\\begin{align*}\n", - "I(X; Y) &= \\sum_{x \\mathcal{X}}\\sum_{y \\in \\mathcal{Y}} p(x, y) \\log \\frac{p(x, y)}{p(x)p(y)} \\\\\n", + "&= \\sum_{x \\in \\mathcal{X}}\\sum_{y \\in \\mathcal{Y}} p(x, y) \\log \\frac{p(x, y)}{p(x)p(y)} \\\\\n", "&= \\sum_{x \\in \\mathcal{X}}\\sum_{y \\in \\mathcal{Y}} p(x, y) \\big( \\log p(x, y) - \\log p(x) - \\log p(y) \\big) \\\\\n", "&= - H(X, Y) - \\sum_{x \\in \\mathcal{X}}\\sum_{y \\in \\mathcal{Y}} p(x, y) \\log p(x) - \\sum_{x \\mathcal{X}}\\sum_{y \\in \\mathcal{Y}} p(x, y) \\log p(y) \\\\\n", - "&= H(X) + H(Y) - H(X, Y)\n", + "&= H(X) + H(Y) - H(X, Y).\n", "\\end{align*}\n", "$$" ] @@ -688,7 +644,7 @@ "source": [ "### Contitional Mutual Information\n", "\n", - "**Definition:** For three discrete random variables $X, Y, Z$, the 
conditional mutual information $I(X; Y |Z)$ is defined by\n", + "**Definition:** For three discrete random variables $X, Y, Z$, the conditional mutual information $I(X; Y |Z)$ is defined as\n", "\n", "$$\n", "I(X; Y |Z) = H(X|Z) - H(X|Y,Z).\n", @@ -1269,9 +1225,9 @@ ], "metadata": { "kernelspec": { - "display_name": "broja", + "display_name": "vdb-new", "language": "python", - "name": "broja" + "name": "vdb-new" }, "language_info": { "codemirror_mode": { @@ -1283,7 +1239,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.7.4" + "version": "3.7.6" }, "toc": { "nav_menu": {}, From 75509078cde8cb63518156cac7c86fcbd1c158ff Mon Sep 17 00:00:00 2001 From: heytitle Date: Sat, 8 Feb 2020 12:38:54 +0100 Subject: [PATCH 6/7] fix typo --- python/notebooks/us-1994-census.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/python/notebooks/us-1994-census.ipynb b/python/notebooks/us-1994-census.ipynb index 9c312d2..a16ebcf 100644 --- a/python/notebooks/us-1994-census.ipynb +++ b/python/notebooks/us-1994-census.ipynb @@ -720,7 +720,7 @@ "source": [ "## Co-Information\n", "\n", - "**Definition:** Also known as **interaction information** [(McGill W. (1994)][mcgill], co-information $CoI(X;Y;Z)$ is defined by\n", + "**Definition:** Also known as **interaction information** [(McGill W. 
(1994)][mcgill], co-information $CoI(X;Y;Z)$ is defined as\n", "\n", "$$\n", "CoI(X;Y;Z) = I(X;Y) - I(X;Y|Z).\n", From 319551def697cbd772130d3f7b1c47a35911305d Mon Sep 17 00:00:00 2001 From: heytitle Date: Sat, 8 Feb 2020 22:36:14 +0100 Subject: [PATCH 7/7] incorporate fixes for other sections --- python/notebooks/us-1994-census.ipynb | 178 ++++++++++++-------------- 1 file changed, 83 insertions(+), 95 deletions(-) diff --git a/python/notebooks/us-1994-census.ipynb b/python/notebooks/us-1994-census.ipynb index a16ebcf..52adc05 100644 --- a/python/notebooks/us-1994-census.ipynb +++ b/python/notebooks/us-1994-census.ipynb @@ -125,7 +125,7 @@ " \n", " \n", " \n", - " 0\n", + " 0\n", " 39\n", " State-gov\n", " 77516\n", @@ -143,7 +143,7 @@ " <=50K\n", " \n", " \n", - " 1\n", + " 1\n", " 50\n", " Self-emp-not-inc\n", " 83311\n", @@ -161,7 +161,7 @@ " <=50K\n", " \n", " \n", - " 2\n", + " 2\n", " 38\n", " Private\n", " 215646\n", @@ -179,7 +179,7 @@ " <=50K\n", " \n", " \n", - " 3\n", + " 3\n", " 53\n", " Private\n", " 234721\n", @@ -197,7 +197,7 @@ " <=50K\n", " \n", " \n", - " 4\n", + " 4\n", " 28\n", " Private\n", " 338409\n", @@ -346,7 +346,7 @@ "\n", "# aliases\n", "rvs_names = [\n", - " 'S', # income\n", + " 'X', # income\n", " 'E', # education\n", " 'G', # sex\n", " 'R', # race\n", @@ -385,7 +385,7 @@ "
Class:Distribution
Alphabet:('Female', 'Male') for all rvs
Base:linear
Outcome Class:tuple
Outcome Lenght:1
Gp(x)
Female0.33079450876815825
Male0.6692054912318418
" ], "text/plain": [ - "" + "" ] }, "execution_count": 10, @@ -416,7 +416,7 @@ "Base: linear\n", "Outcome Class: tuple\n", "Outcome Length: 1\n", - "RV Names: ('S',)\n", + "RV Names: ('X',)\n", "\n", "x p(x)\n", "('<=50K',) 0.8905394113824158\n", @@ -430,7 +430,7 @@ "Base: linear\n", "Outcome Class: tuple\n", "Outcome Length: 1\n", - "RV Names: ('S',)\n", + "RV Names: ('X',)\n", "\n", "x p(x)\n", "('<=50K',) 0.6942634235888022\n", @@ -440,7 +440,7 @@ ], "source": [ "# conditional probablity P(S|G).\n", - "marginal, cdists = dist_census.condition_on('G', rvs='S')\n", + "marginal, cdists = dist_census.condition_on('G', rvs='X')\n", "\n", "for i, (c, d) in enumerate(zip(cdists, marginal.zipped())):\n", " print(\"\")\n", @@ -523,9 +523,9 @@ } ], "source": [ - "# H(S, G)\n", - "p_SG = dist_census.marginal('SG')\n", - "dit.shannon.entropy(p_SG)" + "# H(X, G)\n", + "p_XG = dist_census.marginal('XG')\n", + "dit.shannon.entropy(p_XG)" ] }, { @@ -556,8 +556,8 @@ } ], "source": [ - "# H(S|G)\n", - "dit.shannon.conditional_entropy(dist_census, 'S', 'G') " + "# H(X|G)\n", + "dit.shannon.conditional_entropy(dist_census, 'X', 'G') " ] }, { @@ -577,8 +577,8 @@ } ], "source": [ - "dit.shannon.entropy(p_SG) \\\n", - " == dit.shannon.entropy(p_G) + dit.shannon.conditional_entropy(dist_census, 'S', 'G') " + "dit.shannon.entropy(p_XG) \\\n", + " == dit.shannon.entropy(p_G) + dit.shannon.conditional_entropy(dist_census, 'X', 'G') " ] }, { @@ -635,7 +635,7 @@ } ], "source": [ - "dit.shannon.mutual_information(dist_census, 'S', 'G')" + "dit.shannon.mutual_information(dist_census, 'X', 'G')" ] }, { @@ -668,9 +668,9 @@ } ], "source": [ - "# I(S, H | O)\n", - "dit.shannon.conditional_entropy(dist_census, 'S', 'O') - \\\n", - " dit.shannon.conditional_entropy(dist_census, 'S', 'HO')" + "# I(X, H | O)\n", + "dit.shannon.conditional_entropy(dist_census, 'X', 'O') - \\\n", + " dit.shannon.conditional_entropy(dist_census, 'X', 'HO')" ] }, { @@ -679,10 +679,11 @@ "source": [ "# Information 
Decomposition\n", "\n", - "Let consider the setting that we have three random variables $X, Y, Z$. We are interested in knowing about $X$, but it is not observable. We can only observe the values of $Y$ and $Z$. \n", + "Suppose that we have three jointly distributed random variables, $X, Y, Z$. Suppose that we are interested in knowing $X$, but we can only observe $Y$ and / or $Z$.\n", + "\n", + "\n", "![](https://i.imgur.com/GpHQ6MW.png)\n", "\n", - "Utimately, we would like to quantify how much we know about $X$ based on the information of $Y$ and $Z$. More precisely, $I(X; (Y, Z))$ is the total information of $X$ that $(Y, Z)$ contains.\n", "\n", "\n", "[Bertschinger et al. (2014)][paper] proposes one approach to decompose $I(X; (Y, Z))$ into four quantities:\n", @@ -699,7 +700,6 @@ "- $CI(X; Y, Z)$ is complimentary (synergic) information that $Y$ and $Z$ have about $X$ when considering them together,\n", "- $UI(X; Y \\backslash Z)$ is unique information that only $Y$ has about $X$ (in respect to $Z$), and vice versa. \n", "\n", - "With the formulation above, these four quantities have to be non-negative.\n", "\n", "## Shared Information\n", "Furthermore, we have the following equilities:\n", @@ -720,17 +720,16 @@ "source": [ "## Co-Information\n", "\n", - "**Definition:** Also known as **interaction information** [(McGill W. (1994)][mcgill], co-information $CoI(X;Y;Z)$ is defined as\n", + "**Definition:** The coinformation is defined as [(McGill W. 
(1954)][mcgill]\n", "\n", "$$\n", - "CoI(X;Y;Z) = I(X;Y) - I(X;Y|Z).\n", + "\\begin{align*}\n", + "CoI(X;Y;Z) &= I(X;Y) - I(X;Y|Z) \\\\\n", + "&= SI(X; Y, Z) - CI(X; Y, Z)\n", + "\\end{align*}\n", + ".\n", "$$\n", "\n", - "With the definition of information decomposition, we can write the identity above as\n", - "\n", - "$$\n", - "CoI(X;Y;Z) = SI(X; Y, Z) - CI(X; Y, Z).\n", - "$$\n", "\n", "[mcgill]: https://ieeexplore.ieee.org/abstract/document/1057469\n" ] }, @@ -752,8 +751,8 @@ } ], "source": [ - "# compute CoI(S; E; G) using dit\n", - "dit.multivariate.coinformation(dist_census, 'SEG')" + "# compute CoI(X; E; G) using dit\n", + "dit.multivariate.coinformation(dist_census, 'XEG')" ] }, { @@ -773,8 +772,8 @@ } ], "source": [ - "# compute CoI(S; H; G) using dit\n", - "dit.multivariate.coinformation(dist_census, 'SHG')" + "# compute CoI(X; H; G) using dit\n", + "dit.multivariate.coinformation(dist_census, 'XHG')" ] }, { @@ -783,7 +782,8 @@ "source": [ "## Unique Information\n", "\n", - "From above, we have everything in place except $UI(\\cdot)$ that we haven't defined yet. Bertschinger et al. (2014) define the unique information as follows:\n", + "To compute the information decomposition, it suffices to specify either a measure for $SI$, for $CI$ or for $UI$.\n", + "Bertschinger et al. 
(2014) define the unique information as follows:\n", "\n", "$$\n", "UI(X; Y \\backslash Z) = \\min_{Q \\in \\Delta_p} I_Q(X; Y|Z), \n", @@ -804,12 +804,12 @@ "metadata": {}, "outputs": [], "source": [ - "# find q for S, G, R (income, race, gender)\n", - "q_SGR = admUI.computeQUI(distSXY = dist_census.marginal('SGR'))\n", + "# find q for X, G, R (income, race, gender)\n", + "q_XGR = admUI.computeQUI(distSXY = dist_census.marginal('XGR'))\n", "\n", "# due to the fact that computeQUI rename variables to SXY\n", - "# we need to rename them back to SRG\n", - "q_SGR.set_rv_names(\"SGR\")" + "# we need to rename them back to XGR\n", + "q_XGR.set_rv_names(\"XGR\")" ] }, { @@ -820,10 +820,10 @@ { "data": { "text/html": [ - "
Class:Distribution
Alphabet:(('<=50K', '>50K'), ('Female', 'Male'), ('Amer-Indian-Eskimo', 'Asian-Pac-Islander', 'Black', 'Other', 'White'))
Base:linear
Outcome Class:tuple
Outcome Lenght:3
SGRp(x)
<=50KFemaleAmer-Indian-Eskimo0.008247547005324936
<=50KFemaleAsian-Pac-Islander0.005815377888949708
<=50KFemaleBlack0.07930424583381772
<=50KFemaleOther0.007555050560623192
<=50KFemaleWhite0.1936633251347222
<=50KMaleAmer-Indian-Eskimo0.0001981395952810085
<=50KMaleAsian-Pac-Islander0.01761756333138691
<=50KMaleBlack0.00475336950884044
<=50KMaleWhite0.44203582369502953
>50KFemaleAmer-Indian-Eskimo0.001018361885656181
>50KFemaleAsian-Pac-Islander0.0007180555469803716
>50KFemaleBlack0.009792116811899728
>50KFemaleOther0.0007677896847523185
>50KFemaleWhite0.023912637763622506
>50KMaleAmer-Indian-Eskimo8.725526069439487e-05
>50KMaleAsian-Pac-Islander0.007758342599998848
>50KMaleBlack0.0020932675154360938
>50KMaleWhite0.19466173037698395
" + "
Class:Distribution
Alphabet:(('<=50K', '>50K'), ('Female', 'Male'), ('Amer-Indian-Eskimo', 'Asian-Pac-Islander', 'Black', 'Other', 'White'))
Base:linear
Outcome Class:tuple
Outcome Lenght:3
XGRp(x)
<=50KFemaleAmer-Indian-Eskimo0.008247547005324936
<=50KFemaleAsian-Pac-Islander0.005815377888949708
<=50KFemaleBlack0.07930424583381772
<=50KFemaleOther0.007555050560623192
<=50KFemaleWhite0.1936633251347222
<=50KMaleAmer-Indian-Eskimo0.0001981395952810085
<=50KMaleAsian-Pac-Islander0.01761756333138691
<=50KMaleBlack0.00475336950884044
<=50KMaleWhite0.44203582369502953
>50KFemaleAmer-Indian-Eskimo0.001018361885656181
>50KFemaleAsian-Pac-Islander0.0007180555469803716
>50KFemaleBlack0.009792116811899728
>50KFemaleOther0.0007677896847523185
>50KFemaleWhite0.023912637763622506
>50KMaleAmer-Indian-Eskimo8.725526069439487e-05
>50KMaleAsian-Pac-Islander0.007758342599998848
>50KMaleBlack0.0020932675154360938
>50KMaleWhite0.19466173037698395
" ], "text/plain": [ "" + "" ] }, "execution_count": 21, @@ -832,7 +832,7 @@ } ], "source": [ - "q_SGR" + "q_XGR" ] }, { @@ -852,13 +852,13 @@ } ], "source": [ - "# H(S|G,R)\n", - "h_SgGR = dit.shannon.conditional_entropy(q_SGR, 'S', 'GR')\n", + "# H(X|G,R)\n", + "h_XgGR = dit.shannon.conditional_entropy(q_XGR, 'X', 'GR')\n", "\n", - "# UI(S; G \\ R) = I_Q { S; G \\ R }\n", - "ui_SG_R = dit.shannon.conditional_entropy(q_SGR, 'S', 'R') - h_SgGR\n", + "# UI(X; G \\ R) = I_Q(X; G | R) = H_Q(X|R) - H_Q(X|G,R)\n", + "ui_XG_R = dit.shannon.conditional_entropy(q_XGR, 'X', 'R') - h_XgGR\n", "\n", - "ui_SG_R" + "ui_XG_R" ] }, { @@ -878,10 +878,10 @@ } ], "source": [ - "# UI(S; R \\ G) = I_Q { S; R \\ G }\n", - "ui_SR_G = dit.shannon.conditional_entropy(q_SGR, 'S', 'G') - h_SgGR\n", + "# UI(X; R \\ G) = I_Q(X; R | G) = H_Q(X|G) - H_Q(X|G,R)\n", + "ui_XR_G = dit.shannon.conditional_entropy(q_XGR, 'X', 'G') - h_XgGR\n", "\n", - "ui_SR_G " + "ui_XR_G " ] }, { @@ -902,11 +902,11 @@ ], "source": [ "# compute shared information\n", - "si_SGR = dit.shannon.mutual_information(q_SGR, 'S', 'G') - ui_SG_R \n", - "si_SRG = dit.shannon.mutual_information(q_SGR, 'S', 'R') - ui_SR_G\n", + "si_XGR = dit.shannon.mutual_information(q_XGR, 'X', 'G') - ui_XG_R \n", + "si_XRG = dit.shannon.mutual_information(q_XGR, 'X', 'R') - ui_XR_G\n", "\n", "# sanity check: by the definition of shared information\n", - "si_SGR == si_SRG" + "math.isclose(si_XGR, si_XRG, abs_tol=1e-6)" ] }, { @@ -927,8 +927,8 @@ ], "source": [ "# by the definition of co-information\n", - "ci_SGR = si_SRG - dit.multivariate.coinformation(q_SGR, 'SGR')\n", - "ci_SGR" + "ci_XGR = si_XRG - dit.multivariate.coinformation(q_XGR, 'XGR')\n", + "ci_XGR" ] }, { @@ -949,14 +949,14 @@ ], "source": [ "# sanity check: by the definition of Bertschinger et al. 
(2014)'s information decomposition\n", - "dit.shannon.mutual_information(q_SGR, 'S', 'GR') == ui_SG_R + ui_SR_G + si_SRG + ci_SGR " + "math.isclose(dit.shannon.mutual_information(q_XGR, 'X', 'GR'), ui_XG_R + ui_XR_G + si_XRG + ci_XGR, abs_tol=1e-6)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "# Result Visualization" + "# The information decomposition for the US census dataset" ] }, { @@ -972,8 +972,8 @@ "latex_labels = [\n", " '$SI$',\n", " '$CI$',\n", - " '$UI(S; Y \\\\backslash Z)$',\n", - " '$UI(S; Z \\\\backslash Y)$'\n", + " '$UI(X; Y \\\\backslash Z)$',\n", + " '$UI(X; Z \\\\backslash Y)$'\n", "]\n", "\n", "# define colour for ploting\n", @@ -992,7 +992,7 @@ "outputs": [ { "data": { - "image/png": "\n", + "image/png": "\n", "text/plain": [ "
" ] @@ -1055,11 +1055,11 @@ " {\n", " \"variables\": [\"sex\", \"race\"],\n", " \"metrics\": {\n", - " \"mi\": dit.shannon.mutual_information(q_SGR, 'S', 'RG'),\n", - " \"si\": si_SGR, \n", - " \"ci\": ci_SGR,\n", - " \"ui_0\": ui_SG_R,\n", - " \"ui_1\": ui_SR_G,\n", + " \"mi\": dit.shannon.mutual_information(q_XGR, 'X', 'RG'),\n", + " \"si\": si_XGR, \n", + " \"ci\": ci_XGR,\n", + " \"ui_0\": ui_XG_R,\n", + " \"ui_1\": ui_XR_G,\n", " }\n", " },\n", "])" @@ -1069,7 +1069,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "From the figure above, we can see that `sex(Y)` contains substantial amount of information about `income(S)`, while `race(Z)` does not. Futhermore, the two attributes `sex(Y)` and `race(Z)` also share some information about `income(S)` but not in a complementary manner." + "We see that race (Z) conveys no unique information about income (X) w.r.t. sex (Y)." ] }, { @@ -1084,9 +1084,7 @@ "- education and race\n", "- race and occupation\n", "- age-group and sex\n", - "- hours-per-week-group and occupation.\n", - "\n", - "We first define a function that computes the decomposition for us." + "- hours-per-week-group and occupation." 
] }, { @@ -1118,7 +1116,7 @@ " si_SXY = si_SXY_1\n", " \n", " ci_SXY = si_SXY - dit.multivariate.coinformation(P, rvs)\n", - " i_S_XY = dit.shannon.mutual_information(P, 'S', to) \n", + " i_S_XY = dit.shannon.mutual_information(P, 'X', to) \n", "\n", " # sanity check\n", " assert math.isclose(i_S_XY, si_SXY + ci_SXY + ui_SX_Y + ui_SY_X, abs_tol=1e-6), \\\n", @@ -1136,7 +1134,7 @@ " }\n", " }\n", "\n", - "decomp_S_HO = information_decomposition(dist_census, 'S', 'HO')" + "decomp_X_HO = information_decomposition(dist_census, 'X', 'HO')" ] }, { @@ -1145,7 +1143,7 @@ "metadata": {}, "outputs": [], "source": [ - "decomp_S_EG = information_decomposition(dist_census, 'S', 'EG')" + "decomp_X_EG = information_decomposition(dist_census, 'X', 'EG')" ] }, { @@ -1154,7 +1152,7 @@ "metadata": {}, "outputs": [], "source": [ - "decomp_S_ER = information_decomposition(dist_census, 'S', 'ER')" + "decomp_X_ER = information_decomposition(dist_census, 'X', 'ER')" ] }, { @@ -1163,7 +1161,7 @@ "metadata": {}, "outputs": [], "source": [ - "decomp_S_RO = information_decomposition(dist_census, 'S', 'RO')" + "decomp_X_RO = information_decomposition(dist_census, 'X', 'RO')" ] }, { @@ -1172,27 +1170,17 @@ "metadata": {}, "outputs": [], "source": [ - "decomp_S_AG = information_decomposition(dist_census, 'S', 'AG')" + "decomp_X_AG = information_decomposition(dist_census, 'X', 'AG')" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, - "outputs": [], - "source": [ - "# didn't converge, should we remove it?\n", - "# decomp_S_EO = compute_decomposition_from(dist_census, 'S', ['E', 'O'])" - ] - }, - { - "cell_type": "code", - "execution_count": 35, - "metadata": {}, "outputs": [ { "data": { - "image/png": "\n", + "image/png": "\n", "text/plain": [ "
" ] @@ -1205,11 +1193,11 @@ ], "source": [ "plot([\n", - " decomp_S_EG,\n", - " decomp_S_ER,\n", - " decomp_S_RO,\n", - " decomp_S_AG,\n", - " decomp_S_HO,\n", + " decomp_X_EG,\n", + " decomp_X_ER,\n", + " decomp_X_RO,\n", + " decomp_X_AG,\n", + " decomp_X_HO,\n", "][::-1])" ] }, @@ -1217,9 +1205,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "When looking at `education` and `sex`, we can see that `education` uniquely contains considerable amount of information about `income` with respect to `sex`. The proportion is also large than what they convey about `income` in both shared and complementary aspects. Notably, `education` have large unique information about `income` when considering with `race`, and `occupation` also has a similar relative portion with respect to `race`.\n", + "We see that most of the information that race ($Y$) and occupation ($Z$) convey about income ($X$) is unique to occupation. A more classical approach would be, of course, to test for the Markov relation $X-Z-Y$. This Markov relation almost holds, in the sense that $I(X;Y|Z)$ is small. The additional insight from the information decomposition is that $I(X;Y|Z)$ is purely synergistic, since race conveys no unique information about income w.r.t. occupation. The phenomenon is even stronger for the pair (age, sex): the conditional mutual information is (relatively) larger, but sex still conveys no unique information about income w.r.t. age. Education and sex have about equally large shared and synergistic components. Occupation conveys a large amount of unique information about income w.r.t. both hours-per-week and education.\n", "\n", - "This way of decomposition provides us a new aspect for analyzing the interactions between variables (predictors and responses) at a granular level." + "These observations appear quite reasonable. 
They illustrate how the decomposition allows us to obtain a fine-grained quantitative analysis of the relationships between the predictors and the target. " ] } ], @@ -1255,7 +1243,7 @@ "width": "212px" }, "toc_section_display": "block", - "toc_window_display": true + "toc_window_display": false } }, "nbformat": 4,