@Ashvin-Ranjan Ashvin-Ranjan commented Feb 16, 2025

Changes

  • Adds log_mh_sample, which uses log probabilities to calculate the likelihood
    • Both Grammar.log_prior and Grammar.log_probability have been added
  • Adds 4 new functions and respective documentation
    • percent_match: The percentage of matching outputs between tree and expected
    • percent_match_unique: The same as percent_match but returns 0 if all items are the same
    • noise_match: From Piantadosi et al. Treats the output as having noise applied and calculates the probability accordingly
      • Written for log_mh_sample
    • aggregate_individual_likelihoods: Takes a function that returns the probability of a single datum and returns a function that sums the log probabilities over every datum in the dataset
      • Written for log_mh_sample
  • Add tests for likelihood functions
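
The aggregation idea described above can be sketched roughly as follows. This is only an illustration, not the code from this PR: the names `aggregate_log_likelihood` and `noisy_datum_likelihood`, the `(datum, tree)` argument order of the per-datum function, and the two explicit chance parameters are all assumptions made for the example. The `(data, tree)` order and the `(input, expected output)` datum layout follow the snippets quoted in this thread.

```python
import math
from typing import Any, Callable, Iterable, Tuple

Datum = Tuple[Any, Any]      # (input, expected output), as in tree(datum[0]) == datum[1]
Tree = Callable[[Any], Any]  # stand-in for a callable expression tree


def aggregate_log_likelihood(
    datum_likelihood: Callable[[Datum, Tree], float],
) -> Callable[[Iterable[Datum], Tree], float]:
    """Hypothetical helper: lift a per-datum probability to a dataset log-likelihood."""

    def log_likelihood(data: Iterable[Datum], tree: Tree) -> float:
        total = 0.0
        for datum in data:
            p = datum_likelihood(datum, tree)
            if p <= 0.0:
                # math.log(0) raises, so an impossible datum maps to -inf directly.
                return -math.inf
            total += math.log(p)
        return total

    return log_likelihood


def noisy_datum_likelihood(
    correct_chance: float, incorrect_chance: float
) -> Callable[[Datum, Tree], float]:
    """Hypothetical per-datum noise model: one probability for a match, another for a miss."""

    def likelihood(datum: Datum, tree: Tree) -> float:
        observed_input, expected = datum
        return correct_chance if tree(observed_input) == expected else incorrect_chance

    return likelihood


double = lambda x: 2 * x
data = [(1, 2), (2, 4), (3, 7)]  # the last datum does not match double
log_lik = aggregate_log_likelihood(noisy_datum_likelihood(0.9, 0.1))
result = log_lik(data, double)   # equals 2 * log(0.9) + log(0.1)
```

Keeping the per-datum function in plain probability space and taking the log inside the aggregator is one possible design; the per-datum function could equally return log probabilities itself.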

Notes

  • Closes #59 (grammar.generate will occasionally hit recursion limit)
  • Fixes a bug in mh_sample where copy.deepcopy fails if expr.meaning is not None
  • Fixes a bug in mh_sample where mh_generate would modify old_tree, causing likelihood_func(data, old_tree) to be calculated incorrectly
    • old_tree_likelihood is now calculated beforehand using expr
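
The ordering issue in the last note can be illustrated with a generic log-space Metropolis-Hastings step. This is a minimal sketch under stated assumptions, not the library's `log_mh_sample`: `log_mh_step`, `log_score`, and `propose` are hypothetical names, the proposal is assumed to be symmetric (so the node-count correction mentioned elsewhere in this PR is omitted), and `propose` is assumed to be allowed to mutate its argument.

```python
import copy
import math
import random


def log_mh_step(current, log_score, propose, rng):
    """One hypothetical MH step on log scores, assuming a symmetric proposal."""
    # Score the current state BEFORE proposing, so a mutating proposal
    # cannot corrupt the "old" score.
    old_log_score = log_score(current)
    # Hand the proposal a copy so `current` itself is never modified.
    candidate = propose(copy.deepcopy(current), rng)
    new_log_score = log_score(candidate)

    log_ratio = new_log_score - old_log_score
    # Accept with probability min(1, exp(log_ratio)).
    if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
        return candidate
    return current


def score(state):
    # Toy log score peaked at 0.
    return -(state[0] ** 2)


def step_toward_zero(state, rng):
    state[0] -= 1.0  # mutates its argument in place
    return state


start = [3.0]
accepted = log_mh_step(start, score, step_toward_zero, random.Random(0))
```

Here the move from 3.0 to 2.0 improves the score, so it is always accepted, and `start` is left untouched because the proposal only ever sees a copy.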

- Adds 3 new functions and docstrings
  - `percent_match`: The percentage of matching outputs between tree and expected
  - `percent_match_unique`: The same as `percent_match` but returns 0 if all items are the same
  - `noise_match`: From Piantadosi et al. Treats the output as having noise applied and calculates the probability accordingly
@Ashvin-Ranjan Ashvin-Ranjan marked this pull request as draft February 24, 2025 20:32
- Adds relevant functions to GrammaticalExpression
- Changes `noise_match` to log probability
- Fixes bug with `Grammar.generate`
- Added `__deepcopy__` to FrozenDict
- Ran reformatter on all files since github actions does not work on my fork
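
For context on the `__deepcopy__` change, here is a minimal sketch of why an immutable dict subclass can need one. The `FrozenDict` below is a hypothetical stand-in written for this example, not the class from the repository; it assumes immutability is enforced by raising in `__setitem__`, which is one plausible way for the default `copy.deepcopy` reconstruction (which assigns items one by one) to fail.

```python
import copy


class FrozenDict(dict):
    """Hypothetical immutable dict: item assignment is rejected."""

    def __setitem__(self, key, value):
        raise TypeError("FrozenDict is immutable")

    def __deepcopy__(self, memo):
        # Build the copy through the constructor, which fills the dict
        # without going through the overridden __setitem__.
        result = type(self)(
            (copy.deepcopy(key, memo), copy.deepcopy(value, memo))
            for key, value in self.items()
        )
        memo[id(self)] = result
        return result


original = FrozenDict({"a": [1, 2]})
duplicate = copy.deepcopy(original)
```

The copy is a distinct `FrozenDict` whose nested values are also copied, so later changes to the original's mutable values do not leak into it.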
@Ashvin-Ranjan Ashvin-Ranjan changed the title Add new probability functions for mh_sample Add new features for Metropolis-Hastings sampler Feb 24, 2025
- Adds `aggregate_individual_likelihoods`
- Adds relevant documentation for the function
- Ran formatter
@Ashvin-Ranjan Ashvin-Ranjan marked this pull request as ready for review February 25, 2025 22:27
- Fixed bug where `log_mh_sample` would return `False` instead of just skipping return
- Subtract node counts instead of adding to get proper likelihood
@Ashvin-Ranjan Ashvin-Ranjan marked this pull request as draft February 27, 2025 20:42
- Fix issue where natural log of node counts was not taken
- Add in two new parameters for weighting the values for `log_mh_accept`
  - Add relevant documentation
@Ashvin-Ranjan Ashvin-Ranjan marked this pull request as ready for review March 2, 2025 06:41
- The bug involved `mh_generate` editing `old_tree`, which caused `mh_sample` to compute the wrong likelihood
  - No changes to `mh_generate`, instead precalculates `old_tree_likelihood` from `expr`
  - Changes made both to `mh_sample` and `log_mh_sample`
@shanest shanest left a comment

Looks great overall! A few stylistic changes requested here, and I would then like to sit down with the noise likelihood function and make sure it's doing what we want.

if the_rule.rhs is None
else tuple([self.generate(child_lhs) for child_lhs in the_rule.rhs])
else tuple(
[
Collaborator

Very minor detail, but don't need to use [ and ] here (which first constructs a list)
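
As a small illustration of the point, `tuple()` accepts any iterable, so a generator expression can be passed directly without building an intermediate list. The `expand` function and `rules` mapping below are hypothetical and only loosely modelled on the recursive structure in the snippet above.

```python
def expand(symbol, rules):
    """Hypothetical recursive expansion: rules maps a symbol to its children, or None for a leaf."""
    rhs = rules[symbol]
    if rhs is None:
        return symbol
    # Generator expression: tuple() consumes it directly, no temporary list.
    return (symbol, tuple(expand(child, rules) for child in rhs))


rules = {"S": ("A", "B"), "A": None, "B": ("A",)}
tree = expand("S", rules)
```

The same applies to `sum(...)` over a comprehension: dropping the brackets gives the same result.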

matches = sum([tree(datum[0]) == datum[1] for datum in data])
return (len(data) - matches) * (incorrect_chance) + matches * (correct_chance)

return noise_match_probability
Collaborator

I think it would be good to rewrite noise_match and all_or_nothing in terms of aggregate_individual_likelihoods, since the fundamental thing in each is the likelihood of a single datum. This will also help me reason about noise_match.

Relatedly: maybe a type-def up top of
Datum[T] = tuple[Referent, T]
and then we can do Dataset = Iterable[Datum]
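
One way the suggested aliases might be spelled with the `typing` module is sketched below. `Referent` is replaced by a plain `str` placeholder here, and `count_matches` is a hypothetical helper added only to show the aliases in a signature. The subscripted assignment form `Datum[T] = ...` is not valid Python as written; a `TypeVar`-based alias (or the `type` statement in Python 3.12+) expresses the same idea.

```python
from typing import Callable, Iterable, Tuple, TypeVar

T = TypeVar("T")
Referent = str  # placeholder for the library's referent type (assumption)

Datum = Tuple[Referent, T]     # one observation: (referent, expected value)
Dataset = Iterable[Datum[T]]   # any iterable of such observations


def count_matches(data: Dataset[bool], predict: Callable[[Referent], bool]) -> int:
    """Hypothetical helper: count data whose label agrees with the prediction."""
    return sum(predict(referent) == label for referent, label in data)


examples: Dataset[bool] = [("red", True), ("blue", False), ("green", True)]
matches = count_matches(examples, lambda name: name.startswith("r"))
```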

Contributor Author

I think I can rewrite noise_match in terms of aggregate_individual_likelihoods, but I think all_or_nothing should stay as is because it's not a log probability function. (It may also be good to add something, somewhere outside of the docstring, that delineates which functions use log probability and which use regular probability, but I am not sure what.)

- Address syntax feedback in grammar.py
  - Remove unneeded brackets in the file
- Add new types and rewrite functions in likelihood.py
  - Added Datum type and rewrote proper type signatures
  - Rewrote `noise_match` to use `aggregate_individual_likelihoods`
- Add new tests for likelihood functions
  - This is to hopefully avoid mysterious errors in the future caused by broken likelihoods
@shanest shanest left a comment

Looks good to me; thanks for all of this Ash!

@shanest shanest merged commit 1b297fa into CLMBRs:main May 13, 2025
1 check failed
nathimel pushed a commit that referenced this pull request Jun 27, 2025
Add new features for Metropolis-Hastings sampler