Add new features for Metropolis-Hastings sampler #61

Ashvin-Ranjan · 2025-02-16T01:56:18Z

Changes

Adds log_mh_sample, which uses log probabilities to calculate the likelihood
- Both Grammar.log_prior and Grammar.log_probability have been added
Adds 4 new functions and respective documentation
- percent_match: The percentage of matching outputs between tree and expected
- percent_match_unique: The same as percent_match but returns 0 if all items are the same
- noise_match: From Piantadosi et al. Treats the output as having noise applied and calculates the probability accordingly
  - Written for log_mh_sample
- aggregate_individual_likelihoods: Takes a function that returns the probability of a single datum and returns a function that sums the log probabilities over every datum in the dataset
  - Written for log_mh_sample
Add tests for likelihood functions
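
To make the log-probability idea concrete, here is a minimal sketch of an acceptance test done entirely in log space. It is not the code from this PR: the function name `sketch_log_accept`, its parameters, and the two weights are hypothetical, and the correction term (log of the old node count minus log of the new node count) is an assumption based on the usual subtree-regeneration proposal.

```python
import math
import random
from typing import Optional


def sketch_log_accept(
    old_log_likelihood: float,
    new_log_likelihood: float,
    old_node_count: int,
    new_node_count: int,
    likelihood_weight: float = 1.0,  # hypothetical weighting parameter
    node_count_weight: float = 1.0,  # hypothetical weighting parameter
    rng: Optional[random.Random] = None,
) -> bool:
    """Accept or reject a proposed tree using log probabilities only."""
    if rng is None:
        rng = random.Random()
    # log of the likelihood ratio L(new) / L(old)
    log_likelihood_ratio = new_log_likelihood - old_log_likelihood
    # Assumed proposal correction: log(|old|) - log(|new|), using natural logs
    log_node_ratio = math.log(old_node_count) - math.log(new_node_count)
    log_ratio = (
        likelihood_weight * log_likelihood_ratio
        + node_count_weight * log_node_ratio
    )
    if log_ratio >= 0:
        return True  # the proposal is at least as good: always accept
    # Otherwise accept with probability exp(log_ratio), which is below 1 here
    return rng.random() < math.exp(log_ratio)
```

Working with differences of logs avoids multiplying many small probabilities together, which would otherwise underflow to zero on larger datasets.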

Notes

Closes #59 (grammar.generate will occasionally hit recursion limit)
Fixes a bug in mh_sample where copy.deepcopy fails if expr.meaning is not None
Fixes a bug in mh_sample where mh_generate would change old_tree, causing likelihood_func(data, old_tree) to be calculated incorrectly
- old_tree_likelihood is now calculated beforehand using expr
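
The ordering issue behind the second bug can be illustrated with a toy example. Everything here is a stand-in: trees are plain lists, and `proposal_with_side_effect` is a hypothetical function that edits its argument, which is how the bug is described above. It is not the actual `mh_generate`.

```python
def likelihood(data, tree):
    """Toy likelihood: fraction of data points the 'tree' (a lookup list) gets right."""
    return sum(tree[x] == y for x, y in data) / len(data)


def proposal_with_side_effect(tree):
    """Hypothetical proposal step that also edits the tree it was given."""
    new_tree = list(tree)
    new_tree[0] = 1 - new_tree[0]  # the intended proposal
    tree[1] = 1 - tree[1]          # unintended edit of the caller's tree
    return new_tree


data = [(0, 1), (1, 1), (2, 0)]
old_tree = [0, 1, 0]

# Safe ordering: score the current tree before the proposal can touch it.
old_likelihood = likelihood(data, old_tree)
new_tree = proposal_with_side_effect(old_tree)
new_likelihood = likelihood(data, new_tree)

# Unsafe ordering: scoring old_tree afterwards sees the modified tree.
stale_likelihood = likelihood(data, old_tree)

print(old_likelihood, new_likelihood, stale_likelihood)
```

Scoring the old tree up front keeps the acceptance ratio correct without having to change the proposal step itself.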

- Adds 3 new functions and docstrings
  - `percent_match`: The percentage of matching outputs between tree and expected
  - `percent_match_unique`: The same as `percent_match` but returns 0 if all items are the same
  - `noise_match`: From Piantadosi et al. Treats the output as having noise applied and calculates the probability accordingly

shanest

Looks great overall! A few stylistic changes requested here; I would then like to sit down with the noise likelihood function and make sure it's doing what we want.

shanest · 2025-04-28T21:37:59Z

src/ultk/language/grammar/grammar.py

            if the_rule.rhs is None
-            else tuple([self.generate(child_lhs) for child_lhs in the_rule.rhs])
+            else tuple(
+                [


Very minor detail, but you don't need to use [ and ] here (which first constructs a list); tuple accepts a generator expression directly.
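
A small illustration of the point, using a hypothetical `expand` function and rule table rather than the real `Grammar.generate`:

```python
def expand(symbol: str, rules: dict) -> tuple:
    """Recursively expand a symbol using a toy rule table (hypothetical)."""
    children = rules.get(symbol)
    if children is None:
        return (symbol,)
    # tuple() consumes the generator directly; no intermediate list is built
    return (symbol, tuple(expand(child, rules) for child in children))


rules = {"S": ("A", "B"), "A": ("a",)}
tree = expand("S", rules)

# Both spellings give the same tuple; the second skips the temporary list.
with_list = tuple([n * n for n in range(4)])
with_generator = tuple(n * n for n in range(4))
```

The result is identical either way, so this is purely about avoiding an unnecessary allocation and a pair of brackets.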

shanest · 2025-04-28T21:43:22Z

src/ultk/language/grammar/likelihood.py

+        matches = sum([tree(datum[0]) == datum[1] for datum in data])
+        return (len(data) - matches) * (incorrect_chance) + matches * (correct_chance)
+
+    return noise_match_probability


I think it would be good to rewrite noise_match and all_or_nothing in terms of aggregate_individual_likelihoods, since the fundamental thing in each is the likelihood of a single datum. This will also help me reason about noise_match.

Relatedly: maybe a type-def up top of
Datum[T] = tuple[Referent, T]
and then we can do Dataset = Iterable[Datum]
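
A rough sketch of what this suggestion could look like follows. The names ending in `_sketch` are hypothetical, `Referent` is replaced by a stand-in, and the noise model is simplified to a fixed chance per datum, so it should not be read as the merged implementation.

```python
import math
from typing import Any, Callable, Iterable, TypeVar

T = TypeVar("T")
Referent = Any  # stand-in for the library's Referent class
Datum = tuple[Referent, T]
Dataset = Iterable[Datum]


def aggregate_sketch(
    datum_likelihood: Callable[[Datum, Callable], float],
) -> Callable[[Dataset, Callable], float]:
    """Turn a per-datum probability into a whole-dataset log likelihood."""

    def log_likelihood(data: Dataset, tree: Callable) -> float:
        total = 0.0
        for datum in data:
            p = datum_likelihood(datum, tree)
            if p <= 0:
                return -math.inf  # one impossible datum rules out the tree
            total += math.log(p)
        return total

    return log_likelihood


def noise_match_sketch(correct_chance: float = 0.9):
    """Simplified noise model: fixed chance for a match, the rest for a miss."""
    incorrect_chance = 1 - correct_chance

    def datum_likelihood(datum: Datum, tree: Callable) -> float:
        referent, expected = datum
        return correct_chance if tree(referent) == expected else incorrect_chance

    return aggregate_sketch(datum_likelihood)


data = [(1, True), (2, False), (3, True)]
log_likelihood = noise_match_sketch(0.9)
perfect = log_likelihood(data, lambda x: x % 2 == 1)  # matches all three
always_true = log_likelihood(data, lambda x: True)    # misses one datum
```

Written this way, the only thing specific to the noise model is the per-datum function, which is the part that needs to be checked against Piantadosi et al.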

Ashvin-Ranjan · 2025-04-28

I think I can rewrite noise_match in terms of aggregate_individual_likelihoods, but I think all_or_nothing should stay as is because it's not a log probability function. It may also be good to add something, somewhere outside of the docstring, that delineates which functions return log probabilities and which return regular probabilities, but I am not sure what.
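
One possible way to mark the distinction outside of docstrings is with distinct return types. This is only a sketch of the idea: `Prob`, `LogProb`, `to_log`, and `all_or_nothing_sketch` are hypothetical names, not part of the library.

```python
import math
from typing import NewType

# Distinct names for a static type checker; both are plain floats at runtime.
Prob = NewType("Prob", float)
LogProb = NewType("LogProb", float)


def to_log(p: Prob) -> LogProb:
    """Convert a regular probability to a log probability."""
    return LogProb(math.log(p)) if p > 0 else LogProb(-math.inf)


def all_or_nothing_sketch(matches: int, total: int) -> Prob:
    """Regular probability: 1 if every datum matches, otherwise 0."""
    return Prob(1.0 if matches == total else 0.0)


full = to_log(all_or_nothing_sketch(3, 3))
partial = to_log(all_or_nothing_sketch(2, 3))
```

With return types like these, a type checker can flag a regular-probability function being passed where a log-probability one is expected, at no runtime cost.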

- Address syntax feedback in grammar.py
  - Remove unneeded braces in the file
- Add new types and rewrite functions in likelihood.py
  - Added Datum type and rewrote proper type signatures
  - Rewrote `noise_match` to use `aggregate_individual_likelihoods`
- Add new tests for likelihood functions
  - This is to hopefully avoid mysterious errors in the future caused by broken likelihoods

shanest

Looks good to me; thanks for all of this Ash!

Ashvin-Ranjan added 2 commits February 15, 2025 17:41

Run formatter and add more detail to documentation

b9ad2f7

Ashvin-Ranjan marked this pull request as draft February 24, 2025 20:32

Ashvin-Ranjan added 2 commits February 24, 2025 12:37

Add log probability version of mh_sample

3005613

- Adds relevant functions to GrammaticalExpression
- Changes `noise_match` to log probability
- Fixes bug with `Grammar.generate`

Fix bug with copy.deepcopy not working on FrozenDict

f17bdc4

- Added `__deepcopy__` to FrozenDict
- Ran reformatter on all files since github actions does not work on my fork

Ashvin-Ranjan changed the title ~~Add new probability functions for mh_sample~~ Add new features for Metropolis-Hastings sampler Feb 24, 2025

Add new likelihood function for individual likelihood aggregation

bfacb77

- Adds `aggregate_individual_likelihoods`
- Adds relevant documentation for the function
- Ran formatter

Ashvin-Ranjan marked this pull request as ready for review February 25, 2025 22:27

Ashvin-Ranjan added 2 commits February 26, 2025 19:06

Fix bug with log_mh_sample

93856f6

- Fixed bug where `log_mh_sample` would return `False` instead of just skipping return

Fix major issue when calculating log probabilities

cf96036

- Subtract node counts instead of adding to get proper likelihood

Ashvin-Ranjan marked this pull request as draft February 27, 2025 20:42

Fix more issues with acceptance calculation, add weighting

c4e0981

- Fix issue where natural log of node counts was not taken
- Add in two new parameters for weighting the values for `log_mh_accept`
- Add relevant documentation

Ashvin-Ranjan marked this pull request as ready for review March 2, 2025 06:41

Fix bug with both log_mh_sample and mh_sample

869fdbf

- Involved the fact that `mh_generate` would edit `old_tree`, causing `mh_sample` to be wrong
- No changes to `mh_generate`, instead precalculates `old_tree_likelihood` from `expr`
- Changes made both to `mh_sample` and `log_mh_sample`

shanest requested changes Apr 28, 2025

Ashvin-Ranjan added 2 commits April 28, 2025 15:51

Reformat test file

4452135

shanest approved these changes May 13, 2025

shanest merged commit 1b297fa into CLMBRs:main May 13, 2025
1 check failed

nathimel pushed a commit that referenced this pull request Jun 27, 2025

Merge pull request #61 from Ashvin-Ranjan/main

24c3d38

Add new features for Metropolis-Hastings sampler
