Possibility on application to LaTeX-OCR (image of arbitrary size)

Hi, thank you for this amazing work!

I think code generation based on a syntax tree is more natural than left-to-right linear generation. Do you think it's possible to apply your ideas to mathematical expression recognition, i.e. LaTeX-OCR?

There are already several solutions, but the pipeline is basically to first use a vision encoder to get vision tokens for the image and then feed them into a VLM decoder for typical autoregressive text/code generation, without using any syntactic information. As a result, the output may, for example, contain unpaired curly brackets.
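To make the bracket problem concrete, here is a minimal sketch of the kind of check that a purely autoregressive decoder does not enforce by construction. The function name `braces_balanced` is hypothetical, and the check is deliberately simplified: it skips escaped characters such as `\{` and `\}` but ignores LaTeX comments and verbatim environments.

```python
def braces_balanced(latex: str) -> bool:
    """Return True if every unescaped '{' has a matching '}'.

    Simplifying assumption: a backslash escapes exactly the next
    character, so '\\{' and '\\}' are not counted as grouping braces.
    Comments ('%') and verbatim environments are not handled.
    """
    depth = 0
    i = 0
    while i < len(latex):
        ch = latex[i]
        if ch == "\\":
            # Skip the escaped character (e.g. the '{' in '\{').
            i += 2
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                # A closing brace appeared before its opening brace.
                return False
        i += 1
    return depth == 0


if __name__ == "__main__":
    print(braces_balanced(r"\frac{a}{b}"))  # well-formed
    print(braces_balanced(r"\frac{a}{b"))   # missing a closing brace
```

A syntax-tree-based generator would not need such a post-hoc check, since unbalanced groups cannot be expressed in the tree in the first place.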

So I wonder whether you have thought about such an application. From my perspective, there may be one major difficulty: the CSG2D programs in the paper produce images of a relatively regular size (a square), with little resizing to worry about, while a LaTeX-rendered image may be arbitrarily long, which may make it hard for the value network to estimate the program edit distance in a consistent way.
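As a rough illustration of the resizing concern, the sketch below computes how an image would be scaled and padded to fit a fixed square canvas while preserving its aspect ratio. Everything here is an assumption made for illustration: the function name `letterbox`, the 128-pixel canvas, and the letterboxing strategy itself are not taken from the paper or the repository.

```python
def letterbox(width: int, height: int, target: int = 128):
    """Fit a (width x height) image into a (target x target) canvas.

    Returns (scale, new_width, new_height, pad_left, pad_top).
    The aspect ratio is preserved and the remainder is padded.
    The 128-pixel default is an arbitrary choice for this sketch.
    """
    scale = target / max(width, height)
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))
    pad_left = (target - new_w) // 2
    pad_top = (target - new_h) // 2
    return scale, new_w, new_h, pad_left, pad_top


if __name__ == "__main__":
    # A square image keeps its full resolution.
    print(letterbox(128, 128))
    # A long single-line formula is shrunk by 8x, so a 64-pixel-tall
    # line of symbols ends up only 8 pixels tall on the canvas.
    print(letterbox(1024, 64))
```

Under these assumptions the scale factor depends on the length of the formula, so the same symbol can appear at very different pixel sizes across inputs, which is the inconsistency I am worried about for the value network.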

Have you explored the influence of such a resizing issue?

Possibility on application to LaTeX-OCR (image of arbitrary size) #12
