LLMs still struggle to reliably extract non-English text from images. This is a major blocker for existing models: over a billion people read and write Mandarin alone, yet most evaluation data for this task is tiny, messy, and English-centric. The core issue is simple: there’s no controlled, scalable way to test how LLMs behave on complex scripts. Existing datasets are scraped, mislabeled, inconsistent, and impossible to tune. You can’t say ‘make this 20% harder’ or systematically test radicals, stroke density, angles, blur, or font variation.
Nono Banana fixes that. The name is a small RL wink (you keep saying ‘no no’ until the model improves), but the tech is the serious part.
The key unlock is that Gemini's Nano Banana Pro can now generate synthetic non-English text images with perfect ground truth baked in. We specify the exact characters, and Nano Banana Pro renders them. That means automatic scoring, effectively unlimited scale, and full control over difficulty.
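Because the exact characters are known before the image is rendered, scoring a model's extraction reduces to comparing two strings. The sketch below assumes character error rate (CER: edit distance divided by ground-truth length) as the metric; the project may score differently, and the function names are hypothetical.

```typescript
// Split into Unicode code points (not UTF-16 units) so each CJK character
// counts as one unit. Whitespace is dropped so spacing is not penalized.
function chars(s: string): string[] {
  return Array.from(s.replace(/\s+/g, ""));
}

// Classic dynamic-programming edit distance (insert, delete, substitute).
function editDistance(a: string[], b: string[]): number {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost, // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Character error rate: edits needed per ground-truth character (0 = exact match).
function characterErrorRate(groundTruth: string, prediction: string): number {
  const gt = chars(groundTruth);
  const pred = chars(prediction);
  if (gt.length === 0) return pred.length === 0 ? 0 : 1;
  return editDistance(gt, pred) / gt.length;
}
```

For example, `characterErrorRate("你好世界", "你好世")` is 0.25 (one dropped character out of four), and answering `这是测试` for the traditional `這是測試` scores 0.75, since three of the four characters differ.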
We prompt Nano Banana with a Mandarin phrase. Because we know the ground truth, evaluating the LLMs is instant. Then we dial up the difficulty: more strokes, nested components, unusual fonts, motion blur, angled lighting, and clutter.
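One way to make the difficulty dial tunable is to map a single integer level to rendering parameters and build the generator prompt from them. This is a minimal sketch under stated assumptions: `RenderSpec`, `specForLevel`, `buildPrompt`, the linear schedule, and the prompt wording are all hypothetical. It covers only rendering knobs (font, angle, blur, clutter); stroke count and nested components would come from choosing harder phrases. It does not call the Gemini API.

```typescript
// Hypothetical render parameters; the real project may expose different knobs.
interface RenderSpec {
  text: string; // exact characters to render: this is the ground truth
  font: string; // font style description passed to the generator
  blur: number; // motion-blur strength, 0 (none) to 1 (heavy)
  angleDeg: number; // text rotation in degrees
  clutter: number; // background clutter, 0 (plain) to 1 (busy)
}

// Illustrative font ladder, from easiest to hardest to read.
const FONTS = [
  "a plain sans-serif print font",
  "a serif (Song/Ming) font",
  "cursive brush calligraphy",
];

// Map an integer difficulty level to render parameters (linear schedule, assumed).
function specForLevel(text: string, level: number, maxLevel = 10): RenderSpec {
  const t = Math.min(Math.max(level, 0), maxLevel) / maxLevel; // normalize to [0, 1]
  return {
    text,
    font: FONTS[Math.min(FONTS.length - 1, Math.floor(t * FONTS.length))],
    blur: t,
    angleDeg: Math.round(t * 45),
    clutter: t,
  };
}

// Build the text prompt for the image generator; the wording is an assumption.
function buildPrompt(spec: RenderSpec): string {
  const parts = [
    `Render exactly this Chinese text, and no other text: "${spec.text}".`,
    `Use ${spec.font}.`,
  ];
  if (spec.angleDeg !== 0) parts.push(`Rotate the text by about ${spec.angleDeg} degrees.`);
  if (spec.blur > 0) parts.push(`Add motion blur (strength ${spec.blur.toFixed(1)} on a 0-1 scale).`);
  if (spec.clutter > 0) parts.push(`Place the text in a cluttered scene (clutter ${spec.clutter.toFixed(1)} on a 0-1 scale).`);
  return parts.join(" ");
}
```

Because every knob scales from one number, ‘make this 20% harder’ becomes raising the level by 2 on this 10-level scale, and the `text` field doubles as the ground truth for scoring.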
This is where reinforcement learning becomes a natural framing for the problem. We treat the LLMs as fixed agents (the models themselves are not trained in this loop) and our generator as the environment. As a model succeeds, the environment escalates complexity; as it fails, we log the precise failure mode: perhaps it drops a radical, confuses traditional and simplified characters, or collapses dense characters.
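The escalate-on-success loop can be sketched as a 1-up/1-down staircase: the level rises after a pass and drops after a failure, so it oscillates around the level where the model passes about half the time, and every failure is logged with a coarse category. The staircase rule, the failure categories, and the five-entry traditional/simplified table are assumptions for illustration (a real system would use a complete mapping, such as OpenCC's conversion data). Detecting a dropped radical specifically would need glyph-level analysis that this sketch does not attempt; here it shows up as a substitution.

```typescript
// Hypothetical failure categories; the project's real taxonomy may differ.
type FailureMode = "dropped_character" | "extra_character" | "script_confusion" | "substitution";

interface FailureRecord {
  level: number;
  groundTruth: string;
  prediction: string;
  mode: FailureMode;
}

// Illustrative traditional -> simplified pairs only; use a complete table in practice.
const TRAD_TO_SIMP: Record<string, string> = { "這": "这", "測": "测", "試": "试", "體": "体", "國": "国" };

function isScriptSwap(a: string, b: string): boolean {
  return TRAD_TO_SIMP[a] === b || TRAD_TO_SIMP[b] === a;
}

// Returns null on an exact match (ignoring whitespace), otherwise a coarse category.
function classifyFailure(groundTruth: string, prediction: string): FailureMode | null {
  const gt = Array.from(groundTruth.replace(/\s+/g, ""));
  const pred = Array.from(prediction.replace(/\s+/g, ""));
  if (pred.length < gt.length) return "dropped_character";
  if (pred.length > gt.length) return "extra_character";
  let sawSwap = false;
  let sawOther = false;
  for (let i = 0; i < gt.length; i++) {
    if (gt[i] === pred[i]) continue;
    if (isScriptSwap(gt[i], pred[i])) sawSwap = true;
    else sawOther = true;
  }
  if (sawOther) return "substitution"; // any non-script error takes precedence
  if (sawSwap) return "script_confusion";
  return null;
}

// 1-up/1-down staircase over integer difficulty levels.
class DifficultyController {
  level: number;
  readonly maxLevel: number;
  readonly failures: FailureRecord[] = [];

  constructor(maxLevel: number, startLevel = 0) {
    this.maxLevel = maxLevel;
    this.level = startLevel;
  }

  // Score one attempt at the current level, log any failure, then move the level.
  record(groundTruth: string, prediction: string): FailureMode | null {
    const mode = classifyFailure(groundTruth, prediction);
    if (mode === null) {
      this.level = Math.min(this.level + 1, this.maxLevel); // pass: escalate
    } else {
      this.failures.push({ level: this.level, groundTruth, prediction, mode });
      this.level = Math.max(this.level - 1, 0); // fail: step back
    }
    return mode;
  }
}
```

Starting at level 0, a correct read of `你好` moves the controller to level 1; answering `这是测试` for `這是測試` is then logged as `script_confusion` at level 1 and the level drops back to 0.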
This process creates a highly targeted dataset of failure points. The output isn’t just a score: it’s a controllable gradient of difficulty and a clean JSON dataset that is well suited to improving a specific model's capabilities through downstream fine-tuning.
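Logged failures can then be written out as a dataset. This sketch assumes JSON Lines (one JSON object per line) and a hypothetical `FailureExample` schema; a single JSON array would serve equally well. Filtering by model gives the targeted slice you would feed into fine-tuning.

```typescript
// Hypothetical schema for one logged failure; field names are illustrative.
interface FailureExample {
  model: string; // which LLM was evaluated
  level: number; // difficulty level at which it failed
  prompt: string; // generator prompt that produced the image
  groundTruth: string; // exact text rendered into the image
  prediction: string; // what the model extracted
  failureMode: string; // e.g. "dropped_character", "script_confusion"
}

// One JSON object per line keeps the dataset appendable and streamable.
function toJsonl(examples: FailureExample[]): string {
  return examples.map((e) => JSON.stringify(e)).join("\n") + (examples.length ? "\n" : "");
}

// Parse a JSONL string back into records, skipping blank lines.
function fromJsonl(text: string): FailureExample[] {
  return text
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as FailureExample);
}

// Keep only one model's failures, e.g. to build a targeted fine-tuning set.
function failuresFor(model: string, examples: FailureExample[]): FailureExample[] {
  return examples.filter((e) => e.model === model);
}
```

`JSON.stringify` leaves CJK characters unescaped by default, so the resulting file stays human-readable.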
The result is a fully automated, highly scalable benchmark for non-English text extraction. No manual labeling and no inconsistent images, just a repeatable, automatically scored way to push models until they break, and to finally understand why.
To get a local copy up and running, follow these simple steps.
You'll need Bun installed on your machine. You can find installation instructions at the official Bun website (https://bun.sh).
- Clone the repo and enter its directory:
git clone https://github.com/your_username_/rl-nano.git
cd rl-nano
- Install the dependencies with Bun:
bun install
Run the development server:
bun dev
Open http://localhost:3000 with your browser to see the result.