
WIP: Sampling programs from a given domain #18

Draft
theoxo wants to merge 6 commits into main from sample_progs


theoxo (Collaborator) commented Jan 19, 2022

As discussed yesterday, we would probably benefit from having a way to synthesize test programs from a domain specification. This would allow us to test new features, and in the long term might even prove useful internally too (e.g. using synthesized programs to find library fns).

This is heavily WIP. TODO:

  • MVP which randomly generates some programs from the Simple domain
  • Handle multi-arg functions with nested apps (i.e. classical lambda-calculus style; dreamcoder requires this, right?). No need for this, I suppose, if apps aren't needed.
  • Un-hardcode the domain
  • Un-hardcode terminals - I guess these should be provided by the domain writer too?
  • Sample using a PCFG rather than uniformly - PCFG also needs to be defined by the domain writer...
  • Fix hacks
  • Use getters etc instead of making fields public (I guess this is the Rust style? It's not like Kotlin where getters are redundant, right?)
  • Optimize the code (currently it can generate and print about 20k progs/second on my laptop; depending on the use case, we may need more). There are lots of trivial optimizations, like moving stuff outside of the inner loop, but if we want to get serious about it we'd probably need to do some profiling.
  • Add some checks to discard sampled programs which do not comply with dreamcoder or the domain's expectations (or, even better, prevent such programs from being generated in the first place). For example: no redexes.
  • Address TODOs (Address TODOs (Address TODOs (Address TODOs ...
  • Package everything up into a nice API so that the program sampler can be used within the compressor, for example.
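
The "no redexes" check in the list above could look roughly like the following. This is only a sketch: the `Expr` type and `has_redex` are hypothetical names invented for illustration, not the project's actual expression representation.

```rust
/// Hypothetical lambda-calculus AST, invented for this sketch.
#[derive(Debug, Clone)]
pub enum Expr {
    Var(usize),                // de Bruijn index
    Prim(String),              // primitive supplied by the domain
    Lam(Box<Expr>),            // lambda abstraction
    App(Box<Expr>, Box<Expr>), // application: function, argument
}

/// Returns true if the expression contains a beta-redex, i.e. an
/// application whose function position is directly a lambda.
pub fn has_redex(e: &Expr) -> bool {
    match e {
        Expr::Var(_) | Expr::Prim(_) => false,
        Expr::Lam(body) => has_redex(body),
        Expr::App(f, x) => {
            matches!(**f, Expr::Lam(_)) || has_redex(f) || has_redex(x)
        }
    }
}
```

A sampler could call such a predicate to discard offending programs after the fact; preventing them during generation would instead mean never choosing a lambda for the function position of an application.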

This is still heavily WIP. As of this commit,
lots of things are hardcoded (including the domain)
and/or done in a hacky way. The sampled programs
are not checked for dreamcoder compatibility, either.

another hack bites the dust

The program sampler is now much more modular w.r.t. which domain to use,
supports arbitrary unigram PCFGs, does not hard-code terminals, etc.

This update also removes some needless "pub" modifiers for fields,
instead using getter functions.
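
For readers unfamiliar with the idea, sampling from a unigram PCFG can be sketched as below. This is not the code in this PR: `Node`, `XorShift`, `Production`, `pick`, `sample` and `depth` are all hypothetical names, the hand-rolled generator only exists so the sketch needs no external crates, and the depth cap is an assumed way of forcing termination.

```rust
/// A sampled program tree: a token applied to zero or more children.
#[derive(Debug, Clone, PartialEq)]
pub struct Node {
    pub token: String,
    pub children: Vec<Node>,
}

/// Tiny xorshift64 generator (illustrative only, not for serious use).
pub struct XorShift(u64);

impl XorShift {
    pub fn new(seed: u64) -> Self {
        // xorshift must not start from an all-zero state.
        XorShift(if seed == 0 { 0x9E3779B97F4A7C15 } else { seed })
    }
    pub fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
    /// Uniform float in [0, 1).
    pub fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// One unigram production: (token, arity, unnormalized weight).
pub type Production = (&'static str, usize, f64);

/// Picks a production with probability proportional to its weight.
pub fn pick<'a>(rng: &mut XorShift, prods: &[&'a Production]) -> &'a Production {
    assert!(!prods.is_empty(), "grammar needs at least one terminal");
    let total: f64 = prods.iter().map(|p| p.2).sum();
    let mut r = rng.next_f64() * total;
    for p in prods {
        if r < p.2 {
            return *p;
        }
        r -= p.2;
    }
    // Floating-point slack: fall back to the last production.
    prods[prods.len() - 1]
}

/// Samples a tree top-down. Once `depth_left` reaches 0, only arity-0
/// tokens are eligible, so the recursion always terminates.
pub fn sample(rng: &mut XorShift, grammar: &[Production], depth_left: usize) -> Node {
    let eligible: Vec<&Production> = grammar
        .iter()
        .filter(|p| depth_left > 0 || p.1 == 0)
        .collect();
    let (token, arity, _) = *pick(rng, &eligible);
    let mut children = Vec::with_capacity(arity);
    for _ in 0..arity {
        children.push(sample(rng, grammar, depth_left - 1));
    }
    Node { token: token.to_string(), children }
}

/// Height of the tree, counting a leaf as 1.
pub fn depth(n: &Node) -> usize {
    1 + n.children.iter().map(depth).max().unwrap_or(0)
}
```

"Unigram" here means each token's weight is independent of its parent, which is why a single flat weight table suffices. Restricting the choice to terminals at the depth limit skews the distribution near the leaves; rejection sampling would avoid that at the cost of throughput.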

theoxo (Collaborator, Author) commented Jan 20, 2022

@mlb2251 in 21b94d1 I have extended the Domain trait a bit to make the implementation of the sampler domain-agnostic. An alternative could be to make a new trait, say SampleableDomain, which inherits from Domain and which adds these features. That way, people can still write interpreters/executors for their domains without having to write code that interfaces with the program sampler (if they don't intend to use it). However, the overhead is really quite small, and maybe sampling will become an integral part of the dreamegg workflow anyway. What do you think? It's a minor code change either way so probs not something that needs to be addressed now, but might be good to keep it in mind going forward.
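
The alternative described above, a separate trait that inherits from `Domain`, might look something like this sketch. The method names `non_terminal_tokens_with_arities`, `terminal_tokens` and `pcfg` come from this thread, but their signatures, the `name` method, the `Simple` tokens and `vocabulary_size` are assumptions made up for illustration.

```rust
/// Stand-in for the existing `Domain` trait; the real one has more to it.
pub trait Domain {
    /// Hypothetical method, here only to give the trait some content.
    fn name() -> &'static str;
}

/// Opt-in extension: only domains that want sampling implement this.
pub trait SampleableDomain: Domain {
    fn non_terminal_tokens_with_arities() -> Vec<(&'static str, usize)>;
    fn terminal_tokens() -> Vec<&'static str>;
    /// Unnormalized unigram weight per token (assumed representation).
    fn pcfg() -> Vec<(&'static str, f64)>;
}

/// Example domain; the tokens below are invented for the sketch.
pub struct Simple;

impl Domain for Simple {
    fn name() -> &'static str {
        "simple"
    }
}

impl SampleableDomain for Simple {
    fn non_terminal_tokens_with_arities() -> Vec<(&'static str, usize)> {
        vec![("+", 2)]
    }
    fn terminal_tokens() -> Vec<&'static str> {
        vec!["1", "2"]
    }
    fn pcfg() -> Vec<(&'static str, f64)> {
        vec![("+", 1.0), ("1", 1.0), ("2", 1.0)]
    }
}

/// Sampler-side code asks for the extended bound; interpreter-side code
/// can keep requiring plain `Domain`.
pub fn vocabulary_size<D: SampleableDomain>() -> usize {
    D::terminal_tokens().len() + D::non_terminal_tokens_with_arities().len()
}
```

With this split, a domain that only implements `Domain` still compiles and runs through the interpreter, and simply cannot be passed to anything bounded by `SampleableDomain`.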

mlb2251 (Owner) commented Jan 21, 2022

I think I agree with keeping it in Domain for now. If some part required a lot of work to implement then yeah we'd need this.

And I bet with some refactoring later the same implementation of non_terminal_tokens_with_arities, terminal_tokens, pcfg etc. can be reused by everyone and just live in the Domain definition instead of needing to be implemented separately for each domain. I'm hoping nearly everything can be automatically derived from our define_semantics! macro at some point. If that's able to just expose the terminals + nonterminals/arities for each domain, then that's all you really need for generic implementations of these things, as far as I can tell.
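
One way the "generic implementations" idea could work is with default trait methods derived from a single token table. The sketch below assumes a hypothetical `tokens_with_arities` function standing in for whatever `define_semantics!` might eventually expose; the signatures, the `Arith` domain and the uniform default PCFG are likewise assumptions.

```rust
pub trait Domain {
    /// Single source of truth: every token with its arity.
    /// Hypothetical; assumed to be what a macro could generate.
    fn tokens_with_arities() -> Vec<(&'static str, usize)>;

    /// Terminals are exactly the arity-0 tokens.
    fn terminal_tokens() -> Vec<&'static str> {
        Self::tokens_with_arities()
            .into_iter()
            .filter(|(_, arity)| *arity == 0)
            .map(|(token, _)| token)
            .collect()
    }

    /// Non-terminals are the tokens that take at least one argument.
    fn non_terminal_tokens_with_arities() -> Vec<(&'static str, usize)> {
        Self::tokens_with_arities()
            .into_iter()
            .filter(|(_, arity)| *arity > 0)
            .collect()
    }

    /// Default unigram PCFG: uniform over all tokens. A domain can
    /// override this when it wants a different distribution.
    fn pcfg() -> Vec<(&'static str, f64)> {
        let tokens = Self::tokens_with_arities();
        let weight = 1.0 / tokens.len() as f64;
        tokens.into_iter().map(|(token, _)| (token, weight)).collect()
    }
}

/// Invented example domain: it supplies only the token table.
pub struct Arith;

impl Domain for Arith {
    fn tokens_with_arities() -> Vec<(&'static str, usize)> {
        vec![("+", 2), ("*", 2), ("1", 0), ("x", 0)]
    }
}
```

Under this design a domain author writes one function and gets the three sampler-facing methods for free, which matches the goal of not implementing them separately per domain.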

E.g. the constant 2 will now just be listed as "2" rather than "(2)" in
the generated program. This prevents it from being parsed as a list
sexpr.
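
The printing rule in this commit message can be illustrated with a small renderer. `Node` and `render` are hypothetical names for this sketch and not necessarily how the PR represents or prints programs.

```rust
/// Hypothetical program tree: a token applied to zero or more children.
pub struct Node {
    pub token: String,
    pub children: Vec<Node>,
}

/// Renders a tree as an s-expression. Arity-0 tokens are printed bare,
/// so a constant stays an atom instead of becoming a one-element list.
pub fn render(n: &Node) -> String {
    if n.children.is_empty() {
        n.token.clone()
    } else {
        let args: Vec<String> = n.children.iter().map(render).collect();
        format!("({} {})", n.token, args.join(" "))
    }
}
```

Only nodes with children get parentheses, so a reader of the output can tell an atom from an application by the first character alone.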

theoxo (Collaborator, Author) commented Feb 18, 2022

I suppose we'll hold off on making more progress on this until we have some more domains to try it on.

theoxo (Collaborator, Author) commented Nov 20, 2022

@mlb2251 am I right in thinking this has been superseded entirely by your synthesizer? If so, should we close this PR too?

mlb2251 (Owner) commented Nov 21, 2022

@theoxo let's keep it open for a little bit, I haven't totally looked into this. Right now we can enumerate but not sample in the synthesizer: https://github.com/mlb2251/synthestitch

