Thompson sampling for improved exploration in GFlowNets #374

@josephdviviano

Description

Paper to implement: https://arxiv.org/abs/2306.17693

The method explores high-uncertainty regions by using an ensemble of policy heads with a shared torso. For each trajectory, a randomly chosen head generates the on-policy rollout. The loss is then computed by averaging contributions over heads, where each head is independently included with probability p.
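The bookkeeping described above (random acting head, per-head inclusion with probability p, averaged loss) could be sketched roughly as follows. This is a framework-agnostic illustration, not the paper's or this library's implementation: the names `sample_acting_head`, `sample_inclusion_mask`, `masked_mean_loss`, and `thompson_step`, as well as the `rollout` and `head_loss` callables, are hypothetical. Normalizing by the number of included heads, and returning 0.0 when no head is included, are assumptions.

```python
import random
from typing import Callable, List, Sequence, Tuple


def sample_acting_head(num_heads: int, rng: random.Random) -> int:
    """Pick uniformly at random the head that generates the trajectory."""
    return rng.randrange(num_heads)


def sample_inclusion_mask(num_heads: int, p: float, rng: random.Random) -> List[bool]:
    """Include each head independently with probability p."""
    return [rng.random() < p for _ in range(num_heads)]


def masked_mean_loss(per_head_losses: Sequence[float], mask: Sequence[bool]) -> float:
    """Average the losses of the included heads.

    Assumption: normalize by the number of included heads and return 0.0
    when the mask excludes every head.
    """
    included = [loss for loss, keep in zip(per_head_losses, mask) if keep]
    if not included:
        return 0.0
    return sum(included) / len(included)


def thompson_step(
    num_heads: int,
    p: float,
    rollout: Callable[[int], list],             # hypothetical: head index -> trajectory
    head_loss: Callable[[int, list], float],    # hypothetical: (head index, trajectory) -> loss
    rng: random.Random,
) -> Tuple[int, List[bool], float]:
    """One step: roll out with a random head, then average the masked per-head losses."""
    acting = sample_acting_head(num_heads, rng)
    trajectory = rollout(acting)
    mask = sample_inclusion_mask(num_heads, p, rng)
    losses = [head_loss(k, trajectory) for k in range(num_heads)]
    return acting, mask, masked_mean_loss(losses, mask)
```

In a real implementation the per-head losses would be tensors computed from a shared torso with one output head per ensemble member, so that the gradient of the averaged loss flows through both the included heads and the torso.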

Metadata

Labels

enhancement: New feature or request
