-
Notifications
You must be signed in to change notification settings - Fork 6
Add new features for Metropolis-Hastings sampler #61
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- Adds 3 new functions and docstrings - `percent_match`: The percentage of matching outputs between tree and expected - `percent_match_unique`: The same as `percent_match` but returns 0 if all items are the same - `noise_match`: From Piantadosi et al. Treats output as having noise applied and calculated probability that way
- Adds relevant functions to GrammaticalExpression - Changes `noise_match` to log probability - Fixes bug with `Grammar.generate`
- Added `__deepcopy__` to FrozenDict - Ran reformatter on all files since github actions does not work on my fork
mh_sample- Adds `aggregate_individual_likelihoods` - Adds relevant documentation for the function - Ran formatter
- Fixed bug where `log_mh_sample` would return `False` instead of just skipping return
- Subtract node counts intead of adding to get proper likelihood
- Fix issue where natural log of node counts was not taken - Add in two new parameters for weighting the values for `log_mh_accept` - Add relevant documentation
- Involved the fact that `mh_generate` would edit `old_tree`, causing `mh_sample` to be wrong - No changes to `mh_generate`, instead precalculates `old_tree_likelihood` from `expr` - Changes made both to `mh_sample` and `log_mh_sample`
shanest
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great overall! A few stylistic changes requested here; and would like to then sit down with the noise likelihood function and make sure it's doing what we want
src/ultk/language/grammar/grammar.py
Outdated
| if the_rule.rhs is None | ||
| else tuple([self.generate(child_lhs) for child_lhs in the_rule.rhs]) | ||
| else tuple( | ||
| [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very minor detail, but don't need to use [ and ] here (which first constructs a list)
| matches = sum([tree(datum[0]) == datum[1] for datum in data]) | ||
| return (len(data) - matches) * (incorrect_chance) + matches * (correct_chance) | ||
|
|
||
| return noise_match_probability |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be good to re-write noise_match and all_or_nothing in terms of aggregate_individual_likelidhoods, since the fundamental thing in each is the likelihood of a single datum. This will help me also reason about noise_match.
Relatedly: maybe a type-def up top of
Datum[T] = tuple[Referent, T]
and then we can do Dataset = Iterable[Datum]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I can rewrite noise_match in terms of aggregate_individual_likelihoods but I think all_or_nothing should stay as is because it's not a log probability function (it also may be good to add something which delineates which functions are log probability and which are regular probability somewhere outside of the docstring, but I am not sure what)
- Address syntax feedback in grammar.py - Remove unneeded braces in the file - Add new types and rewrite functions in likelihood.py - Added Datum type and rewrote proper type signatures - Rewrote `noise_match` to use `aggregate_individual_likelihoods` - Add new tests for likelihood functions - This is to hopefully avoid mysterious errors in the future caused by broken likelihoods
shanest
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me; thanks for all of this Ash!
Add new features for Metropolis-Hastings sampler
Changes
log_mh_sample, which uses log probabilities to calculate the likelihoodGrammar.log_priorandGrammar.log_probabilityhave been addedpercent_match: The percentage of matching outputs between tree and expectedpercent_match_unique: The same aspercent_matchbut returns 0 if all items are the samenoise_match: From Piantadosi et al. Treats output as having noise applied and calculated probability that waylog_mh_sampleaggregate_individual_likelihoodstakes in a function which returns the probability of a single datum and returns a function which sums the log probabilities for all datums in the datasetlog_mh_sampleNotes
grammar.generatewill occasionally hit recursion limit #59mh_samplewhere if theexpr.meaningis notNonethencopy.deepcopyfailsmh_samplewheremh_generatewould changeold_tree, causinglikelihood_func(data, old_tree)to be calculated incorrectlyold_tree_likelihoodis now calculated beforehand usingexpr