Add Information Bottleneck #63

Ashvin-Ranjan · 2025-05-14T22:19:34Z

Changes

- Add IB Optimization tools to ULTK
  - Adds `IBStructure` to represent a situation (Meanings and Referents)
  - Adds `IBLanguage` to represent a language (a mapping from Meanings to Expressions)
  - Adds optimization methods from Tishby et al.
- Add tests for IB Optimization
  - Tests for `ib_structure.py`, `ib_language.py`, `ib_optimization.py`, `ib_utils.py`
- Add utils for conversion from ULTK classes to IB classes
- Small fixes for `Meaning` and `Grammar`
  - Added type annotations for `Meaning.__init__`
  - Fixed a bug where an error message in `Grammar` was not a proper format string
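To illustrate the Tishby et al. optimization, here is a minimal NumPy sketch of the self-consistent IB update, q(w|m) proportional to q(w) exp(-beta * KL[p(u|m) || m_hat_w]). It assumes a meaning prior `p_m`, a matrix `p_u_given_m` whose rows are p(u|m), and an encoder `q_w_given_m` whose rows are q(w|m). The function names `kl_rows`, `ib_update`, and `ib_optimize` are hypothetical and are not the PR's API.

```python
import numpy as np


def kl_rows(p, q, eps=1e-300):
    """Return D with D[i, j] = KL(p[i] || q[j]) in nats (0 * log 0 treated as 0)."""
    p3 = p[:, None, :]  # (n_p, 1, n_u)
    q3 = q[None, :, :]  # (1, n_q, n_u)
    log_ratio = np.log(np.where(p3 > 0, p3, 1.0)) - np.log(np.maximum(q3, eps))
    return np.sum(np.where(p3 > 0, p3 * log_ratio, 0.0), axis=-1)


def ib_update(q_w_given_m, p_m, p_u_given_m, beta, eps=1e-300):
    """One self-consistent IB step on the encoder q(w|m)."""
    q_w = p_m @ q_w_given_m                                             # marginal q(w)
    q_m_given_w = (p_m[:, None] * q_w_given_m) / np.maximum(q_w, eps)  # Bayes' rule
    m_hat = q_m_given_w.T @ p_u_given_m                                 # rows: reconstructed meanings
    log_q = np.log(np.maximum(q_w, eps))[None, :] - beta * kl_rows(p_u_given_m, m_hat)
    log_q -= log_q.max(axis=1, keepdims=True)                           # stabilize before exp
    q_new = np.exp(log_q)
    return q_new / q_new.sum(axis=1, keepdims=True)                     # one normalizer per meaning


def ib_optimize(q_init, p_m, p_u_given_m, beta, max_iter=1000, tol=1e-10):
    """Iterate ib_update until the encoder stops changing (or max_iter is reached)."""
    q = q_init
    for _ in range(max_iter):
        q_next = ib_update(q, p_m, p_u_given_m, beta)
        if np.max(np.abs(q_next - q)) < tol:
            return q_next
        q = q_next
    return q


rng = np.random.default_rng(0)
p_m = np.full(4, 0.25)                     # uniform prior over 4 meanings
p_u_given_m = np.eye(4) * 0.9 + 0.025      # 0.925 on the diagonal, 0.025 elsewhere
q = ib_optimize(rng.dirichlet(np.ones(4), size=4), p_m, p_u_given_m, beta=5.0)
```

At beta = 0 the KL term vanishes and every meaning gets the same distribution q(w), which is the "single word" end of the tradeoff.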

Notes

- These new features do not add suboptimal sampling
  - This can be added to the PR if wanted
- These have been tested against the results from Zaslavsky et al.

- Address syntax feedback in grammar.py
- Remove unneeded braces in the file
- Add new types and rewrite functions in likelihood.py
- Added `Datum` type and rewrote proper type signatures
- Rewrote `noise_match` to use `aggregate_individual_likelihoods`
- Add new tests for likelihood functions
- This is to hopefully avoid mysterious errors in the future caused by broken likelihoods

- Created new classes for IB handling
- Created `IBStructure` for system information
- Created `IBLanguage` for individual languages
- Added relevant functions for complexity and accuracy
- Added utils file
- Made QOL edits to `Grammar` and `Universe`
- Most of the code, especially all of the matrix math, is only lightly tested
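A sketch of what complexity and accuracy functions compute, assuming the standard IB definitions: complexity is I(M;W) and accuracy is I(W;U), where p(w, u) is obtained by marginalizing over meanings (W and U are conditionally independent given M). The names and the choice of bits are assumptions, not the PR's code.

```python
import numpy as np


def mutual_information(joint, base=2.0):
    """I(X;Y) for a joint distribution matrix joint[x, y], in units of log base `base`."""
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    mask = joint > 0                       # 0 * log 0 is treated as 0
    ratio = joint[mask] / (px @ py)[mask]  # px @ py is the outer product p(x) p(y)
    return float(np.sum(joint[mask] * np.log(ratio)) / np.log(base))


def complexity(p_m, q_w_given_m):
    """I(M;W), using the joint p(m, w) = p(m) q(w|m)."""
    return mutual_information(p_m[:, None] * q_w_given_m)


def accuracy(p_m, q_w_given_m, p_u_given_m):
    """I(W;U), using p(w, u) = sum_m p(m) q(w|m) p(u|m)."""
    joint_wu = (p_m[:, None] * q_w_given_m).T @ p_u_given_m
    return mutual_information(joint_wu)
```

With four equiprobable meanings, a one-word-per-meaning encoder has complexity log2(4) = 2 bits, while a single-word encoder has zero complexity and zero accuracy.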

- Add expected KL divergence to `IBLanguage`
- Fix math for I(W; U) in `IBLanguage`
- Have confirmed that I(M; U) - I(W; U) == E[D[M; M']]
- Added tests for IB things
- Very rudimentary, will add specific value checks later
- All tests are passing
- Reformatted various files
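A quick numerical check of the identity I(M; U) - I(W; U) == E[D[M; M']] from this commit. The sketch assumes M' is the Bayes-optimal reconstruction m_hat_w(u) = sum_m q(m|w) p(u|m), i.e. p(u|w), and that all distributions are strictly positive (random Dirichlet draws here); helper names are hypothetical. The identity holds because the expected KL expands to H(U|W) - H(U|M).

```python
import numpy as np


def mi_bits(joint):
    """I(X;Y) in bits for a strictly positive joint distribution matrix."""
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    return float(np.sum(joint * np.log2(joint / (px @ py))))


def expected_kl_bits(p_m, q_w_given_m, p_u_given_m):
    """E_{m,w}[KL(p(u|m) || m_hat_w)] in bits, with m_hat_w the Bayes-optimal reconstruction."""
    q_w = p_m @ q_w_given_m
    q_m_given_w = (p_m[:, None] * q_w_given_m) / q_w           # columns are q(m|w)
    m_hat = q_m_given_w.T @ p_u_given_m                       # rows are m_hat_w(u)
    p = p_u_given_m[:, None, :]                               # (n_m, 1, n_u)
    kl = np.sum(p * np.log2(p / m_hat[None, :, :]), axis=-1)  # (n_m, n_w)
    return float(np.sum(p_m[:, None] * q_w_given_m * kl))


rng = np.random.default_rng(1)
p_m = rng.dirichlet(np.ones(5))
p_u_given_m = rng.dirichlet(np.ones(6), size=5)
q_w_given_m = rng.dirichlet(np.ones(3), size=5)

i_mu = mi_bits(p_m[:, None] * p_u_given_m)                    # I(M;U)
i_wu = mi_bits((p_m[:, None] * q_w_given_m).T @ p_u_given_m)  # I(W;U)
gap = i_mu - i_wu
expected_kl = expected_kl_bits(p_m, q_w_given_m, p_u_given_m)
```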

- Allow dropping expressions during recalculation
- Languages end up recalculating to the simplest language over time
- This seems to be partly an issue with the optimization algorithm
- Add calculate optimal
- Add random expression distribution generator
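Dropping expressions can be sketched as pruning the columns of q(w|m) whose marginal q(w) = sum_m p(m) q(w|m) has collapsed to (near) zero, then renormalizing each row. The function name and threshold are hypothetical; the sketch assumes every meaning keeps some mass on at least one surviving expression.

```python
import numpy as np


def drop_unused_expressions(q_w_given_m, p_m, threshold=1e-6):
    """Remove expressions whose marginal q(w) is below `threshold`; renormalize rows.

    Returns the pruned encoder and a boolean mask of the kept expression columns.
    """
    q_w = p_m @ q_w_given_m
    keep = q_w >= threshold
    pruned = q_w_given_m[:, keep]
    return pruned / pruned.sum(axis=1, keepdims=True), keep
```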

- When being created, the transpose was actually being made
- With this fixed, convergence appears to work properly now
- Normalization is still needed, which is really strange
- Tests need to be updated
- Structure should probably ensure that there are no 0s in mu
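The commits flag 0s in `mu` as one of the main issues that stopped optimization. A hypothetical guard, assuming `mu` stores probability distributions along its last axis (its exact shape isn't spelled out here); the function name is illustrative, not the PR's.

```python
import numpy as np


def validate_distributions(mu, atol=1e-9):
    """Raise ValueError unless `mu` is strictly positive and sums to 1 along its last axis."""
    mu = np.asarray(mu, dtype=float)
    if np.any(mu <= 0):
        # The commit notes report 0s in mu as a cause of failed optimization.
        raise ValueError("mu must not contain 0s (or negative values)")
    if not np.allclose(mu.sum(axis=-1), 1.0, atol=atol):
        raise ValueError("each distribution in mu must sum to 1")
    return mu
```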

- Stop checking based on an incorrect metric
- Verify definitions for complexity and accuracy
- Move use of the mutual information function into ib_utils
- Change language and structures from taking in Meanings and Dicts to ndarrays
- There are new helper functions to convert from those into arrays
- This saves on processing time
- Remove test file
- Avoid testing errors until after everything is fully implemented
- TODO: Optimization function appears to be doing the opposite of what is wanted, investigate
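A hypothetical converter from a dict-based mapping to the ndarray representation this commit moves to. The helper name and the `{meaning: {expression: probability}}` input shape are assumptions, not the PR's actual helper functions.

```python
import numpy as np


def mapping_to_matrix(mapping, meanings, expressions):
    """Convert {meaning: {expression: probability}} to an (n_meanings, n_expressions) array.

    Row i corresponds to meanings[i], column j to expressions[j]; missing pairs are 0.
    """
    row = {m: i for i, m in enumerate(meanings)}
    col = {w: j for j, w in enumerate(expressions)}
    matrix = np.zeros((len(meanings), len(expressions)))
    for meaning, dist in mapping.items():
        for expression, prob in dist.items():
            matrix[row[meaning], col[expression]] = prob
    return matrix
```

Fixing the orderings of `meanings` and `expressions` once means the later matrix math is plain array arithmetic, with no dict lookups inside the optimization loop.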

- Fix potential issue with change-of-base log
- Program seems to optimize for the right hyperparameter
- Issue is that it finds local minima too fast
- May be alleviated with log probability (?)
- Move from ultk/language/ib to ultk/ib
- There is little relation to the ultk/language package anymore
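The commit doesn't say what the change-of-base issue was. One common pitfall, assumed here only for illustration, is mixing natural and base-2 logs: a quantity in bits equals the same quantity in nats divided by ln 2, so a KL term in bits inside exp(-beta * D) silently rescales beta by 1/ln 2. A small helper that makes the base explicit (names hypothetical):

```python
import numpy as np


def entropy(p, base=2.0):
    """Shannon entropy of a probability vector, in units of log base `base`."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # 0 * log 0 is treated as 0
    # Change of base: log_b(x) = ln(x) / ln(b)
    return float(-np.sum(p * np.log(p)) / np.log(base))


def nats_to_bits(x):
    """Convert an information quantity from nats to bits."""
    return x / np.log(2.0)
```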

- Confirm that functions work as expected
- Add normalization to the language's reconstructed meanings and expression prior
- This is to avoid floating-point rounding error at small numbers
- Add deterministic annealing function
- TODO: Clean up names and add docstrings
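A deterministic-annealing sketch: solve the IB problem for a sequence of beta values, warm-starting each solve from the previous solution. This version sweeps beta downward from a large value, starting with one expression per meaning; the PR's actual schedule and direction are not stated here, all names are hypothetical, and the sketch assumes every entry of p(u|m) is strictly positive.

```python
import numpy as np


def ib_step(q, p_m, p_u_m, beta, eps=1e-300):
    """One IB update; assumes every entry of p_u_m (rows p(u|m)) is strictly positive."""
    q_w = p_m @ q
    m_hat = ((p_m[:, None] * q) / np.maximum(q_w, eps)).T @ p_u_m  # rows p(u|w)
    log_p = np.log(p_u_m)[:, None, :]
    log_m_hat = np.log(np.maximum(m_hat, eps))[None, :, :]
    kl = np.sum(p_u_m[:, None, :] * (log_p - log_m_hat), axis=-1)  # (n_m, n_w), nats
    log_q = np.log(np.maximum(q_w, eps))[None, :] - beta * kl
    q_new = np.exp(log_q - log_q.max(axis=1, keepdims=True))
    return q_new / q_new.sum(axis=1, keepdims=True)  # normalization guards against rounding drift


def anneal(p_m, p_u_m, betas, n_iter=200):
    """Solve IB for each beta in `betas` (intended to be decreasing), warm-starting each solve."""
    q = np.eye(len(p_m))  # start with one expression per meaning
    solutions = []
    for beta in betas:
        for _ in range(n_iter):
            q = ib_step(q, p_m, p_u_m, beta)
        solutions.append(q.copy())
    return solutions
```

Warm-starting is the point of annealing: each beta begins near the previous optimum instead of a random encoder, which can help against the fast convergence to local minima mentioned in an earlier commit.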

mickeyshi-bah and others added 30 commits May 27, 2023 10:53

Modified the GitHub Actions to automatically run linter and commit as the github-actions bot

914606b

Removed test comment from test_grammar, added bit to README

1e06c0d

Update after linting

7cd42e3

Adding build+pypi publish action

dcaa7fe

Merge branch 'main' of https://github.com/mickeyshi/altk-actions

955e1fd

Added pip install build

e61b068

updated version to hopefully avoid file already exists error

cb9873c

Updated version to new version

e009d83

updated version in pyproject.toml

b1c3161

fix repo url in black.yml

9fe78eb

only publish on release

3053f3c

Merge pull request CLMBRs#18 from mickeyshi/main

2755144

Added GitHub action to automatically push after linting changes. Merging and updating while Mickey's on vacation

full support for pyproject.toml

f5562aa

edit to test lint action

df40baa

auto format only on PR

6d77bba

commit to test black

f16643a

update black for non-detached head

57fcc2f

Automated black formatting

59c20db

Merge pull request CLMBRs#19 from CLMBRs/test-auto-format

f93cd02

commit to test black

Renamed ALTK to ULTK

e170370

Renamed altk to ultk in most files that aren't links to documentation

added some tests for Meaning

1ef48c8

Added equality checks for semantics, added hashing to Referent

1d1f6c0

important bug fix to information.expected_distortion, diagonalized prior

2b51cde

Added tests for language module

dd24070

Changed referents to better match world objects, rather than POS tags

666a62d

Made POS changes to tests

b34dfef

cruft unnecessary cruft from languages

ce78afc

Fixed some ultk pypi publish stuff, renamed information to probability and util to rate_distortion

6e71d3f

Added some language tests, renamed util to probability and information to rate_distortion

bd3aa2f

marked some bugs in information and agent relating to the IB listener

64a2dac

Ashvin-Ranjan and others added 25 commits February 26, 2025 19:06

Fix bug with log_mh_sample

93856f6

- Fixed bug where `log_mh_sample` would return `False` instead of just skipping return

Fix major issue when calculating log probabilities

cf96036

- Subtract node counts instead of adding to get proper likelihood

Fix more issues with acceptance calculation, add weighting

c4e0981

- Fix issue where natural log of node counts was not taken
- Add in two new parameters for weighting the values for `log_mh_accept`
- Add relevant documentation
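For reference, a generic log-space Metropolis-Hastings acceptance rule for subtree-regeneration proposals, where the forward move picks one of the old tree's nodes uniformly and regenerates a subtree. Node counts enter as -log(n) terms, so the reverse/forward ratio contributes log(n_old) - log(n_new). This is a sketch under those assumptions, not ULTK's `log_mh_accept`, and it omits the PR's two weighting parameters; names are hypothetical.

```python
import math
import random


def log_mh_accept_prob(log_post_new, log_post_old, n_nodes_new, n_nodes_old,
                       log_gen_new_subtree, log_gen_old_subtree):
    """Log acceptance probability for a subtree-regeneration Metropolis-Hastings move.

    Forward proposal: choose one of n_nodes_old nodes uniformly, then generate the new subtree.
    Reverse proposal: choose one of n_nodes_new nodes, then generate the old subtree.
    """
    log_q_forward = -math.log(n_nodes_old) + log_gen_new_subtree
    log_q_reverse = -math.log(n_nodes_new) + log_gen_old_subtree
    log_ratio = (log_post_new - log_post_old) + (log_q_reverse - log_q_forward)
    return min(0.0, log_ratio)


def mh_accept(log_alpha, rng=random):
    """Accept with probability exp(log_alpha) by comparing log(U), U ~ Uniform(0, 1]."""
    return math.log(1.0 - rng.random()) <= log_alpha
```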

Fix bug with both log_mh_sample and mh_sample

869fdbf

- The bug came from `mh_generate` editing `old_tree`, which made `mh_sample` wrong
- No changes to `mh_generate`; instead, `old_tree_likelihood` is precalculated from `expr`
- Changes made to both `mh_sample` and `log_mh_sample`
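The fix pattern, sketched with hypothetical names: score the current expression before calling a generator that may mutate its argument, so the old likelihood is not computed on the already-edited tree.

```python
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def score_before_and_after(expr: T, generate: Callable[[T], T],
                           log_likelihood: Callable[[T], float]) -> Tuple[float, float]:
    # Score the current expression first: if `generate` edits its argument in place,
    # scoring afterwards would silently evaluate the already-modified tree.
    old_log_likelihood = log_likelihood(expr)
    new_expr = generate(expr)
    return old_log_likelihood, log_likelihood(new_expr)


# Toy stand-ins: a "tree" is a list of node labels, and the generator mutates its input.
def mutating_generate(tree: List[str]) -> List[str]:
    tree.append("leaf")
    return tree


def size_log_likelihood(tree: List[str]) -> float:
    return -float(len(tree))


old_ll, new_ll = score_before_and_after(["root"], mutating_generate, size_log_likelihood)
```

An alternative is to pass `copy.deepcopy(expr)` to the generator at the cost of copying every tree; the commit instead leaves `mh_generate` unchanged and precomputes the old likelihood.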

Reformat test file

4452135

Start of IB Optimization

81cd169

First attempt at optimization

692c4af

- First attempt at writing the optimization function
- The function appears to be right, but is not producing the desired results

Actual optimization

6062b8f

- Optimization appears to work
- There is a strange normalization step in there which should not be needed, but it breaks without it
- TODO: Add an error requiring that structure.mu has no 0s
- This was one of the main issues which was stopping optimization earlier

Formatting

a223820

Fix major issue with optimizer

fc1c8c6

- Fixed issue where one normalizer was calculated for all meanings
- This fixes the issue where reconstructed qwm needed to be normalized
- Added divergence_array to IBLanguage to reduce time cost
- Optimizer still does not work, but one step closer
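The difference between one normalizer for all meanings and one per meaning, sketched with NumPy: only per-row normalization turns each row of the encoder into a distribution q(.|m). The global variant is assumed to be the pattern this commit describes fixing; function names are hypothetical.

```python
import numpy as np


def normalize_per_meaning(unnormalized):
    """One normalizer per row: each row becomes a distribution q(.|m)."""
    return unnormalized / unnormalized.sum(axis=1, keepdims=True)


def normalize_globally(unnormalized):
    """One normalizer for the whole matrix: rows generally do not sum to 1."""
    return unnormalized / unnormalized.sum()


scores = np.array([[1.0, 3.0], [2.0, 2.0]])
per_row = normalize_per_meaning(scores)    # [[0.25, 0.75], [0.5, 0.5]]
global_norm = normalize_globally(scores)   # each row sums to 0.5, not 1
```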

Add helper function for optimal language

608020d

- Untested
- Allows for multithreading

Merge pull request CLMBRs#61 from Ashvin-Ranjan/main

1b297fa

Add new features for Metropolis-Hastings sampler

Merge branch 'CLMBRs:main' into ib_optimization

b54ee59

Add docstrings

05247c3

- Add docstrings to all functions and classes
- Rename structure.mu to structure.pum
- This is so the name makes more sense

Add tests and fixes across the board

850e32e

- Added IB Tests file
- Made fixes to various files

Fix tests

3121752

- Fix recalculate_language test
- Add missing tests for `ib_language.py`

Update docstrings

2f589dd

- Add docstrings for get_optimal_languages

nathimel force-pushed the main branch 4 times, most recently from b33a0c7 to a17b214, June 24, 2025 23:56

nathimel closed this Jun 25, 2025

8 participants
