-
Notifications
You must be signed in to change notification settings - Fork 31
Description
Hi, thank you for this amazing work!
I think code generation based on syntax tree is more natural than the left-to-right linear generation. And do you think it's possible to apply your ideas to do mathematical expression recognition, i.e. LaTeX-OCR?
There are already several solutions there, but basically the pipeline is to first using a vision encoder to get vision tokens for the image and then put them into a VLM decoder to do typical autoregressive text/code generation, without using any syntactical information, e.g. there may be un-paired curly brackets in the output.
So I wonder have you think about such application? From my perspective, there may be one major difficulty: the CSG2D program in the paper produces a relatively regular size image (a square) without too much resizing issue to concern, while a LaTeX-rendered image maybe arbitrarily long, which may be hard for the value network to estimate the program edit distance in a consistent way.
Have you explore the influence of such resizing issue?