agrimsingh/rl-nano

🍌 Nono Banana: A Controllable Benchmark for Non-English LLM Text Recognition

LLMs still struggle to reliably extract non-English text from images. This is a major blocker: over a billion people read and write Mandarin alone, yet most evaluation data for this task is small, messy, and English-centric. The core issue is simple: there is no controlled, scalable way to test how LLMs behave on complex scripts. Existing datasets are scraped, mislabeled, inconsistent, and impossible to tune. You can't say 'make this 20% harder' or systematically test radicals, stroke density, angles, blur, or font variation.

Nono Banana fixes that. The name is a small RL wink (you keep saying 'no no' until the model improves), but the tech is the serious part.

What We Do

The key unlock is that Gemini Nano Banana Pro can now generate synthetic non-English text images with perfect ground truth baked in. We specify the exact characters, and Nano Banana renders them. That means automatic scoring, infinite scale, and full control over difficulty.
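Because the rendered characters are known in advance, scoring can be a plain string comparison. The following is a minimal sketch of one way to do it, using a character-level edit distance; `editDistance` and `scoreExtraction` are hypothetical names for illustration and are not taken from this repository.

```typescript
// Illustrative sketch only: these helpers are assumptions, not the project's code.

/** Levenshtein distance computed over Unicode code points, not UTF-16 units. */
function editDistance(a: string, b: string): number {
  const s = Array.from(a);
  const t = Array.from(b);
  // prev[j] = distance between the first i-1 characters of s and the first j of t.
  let prev: number[] = Array.from({ length: t.length + 1 }, (_, j) => j);
  for (let i = 1; i <= s.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= t.length; j++) {
      const cost = s[i - 1] === t[j - 1] ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[t.length];
}

/** Score in [0, 1], where 1 means the prediction matches the ground truth exactly. */
function scoreExtraction(groundTruth: string, predicted: string): number {
  // Normalize so visually identical strings compare equal.
  const truth = groundTruth.normalize("NFC").trim();
  const pred = predicted.normalize("NFC").trim();
  const longest = Math.max(Array.from(truth).length, Array.from(pred).length);
  if (longest === 0) return 1;
  return 1 - editDistance(truth, pred) / longest;
}
```

Working on code points rather than UTF-16 units keeps the score meaningful for characters outside the Basic Multilingual Plane.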

We prompt Nano Banana with a Mandarin phrase. Because we know the ground truth, evaluating the LLMs is instant. Then we dial up the difficulty: more strokes, nested components, weirder fonts, motion blur, angled lighting, clutter.
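One way to make "dial up the difficulty" concrete is to map a single level number to a set of rendering knobs and fold them into the image prompt. The sketch below assumes a five-step ladder; the `Difficulty` fields, the thresholds, and the prompt wording are all hypothetical, and the actual call to the image model is left out.

```typescript
// Illustrative sketch only: field names, thresholds, and prompt text are assumptions.

type FontStyle = "print" | "handwritten" | "calligraphy";

interface Difficulty {
  level: number;        // 0 (easiest) to 4 (hardest)
  fontStyle: FontStyle;
  motionBlur: boolean;
  tiltDegrees: number;
  clutter: boolean;
}

/** Map a level to concrete rendering knobs, clamping to the 0..4 range. */
function difficultyForLevel(level: number): Difficulty {
  const l = Math.max(0, Math.min(4, Math.floor(level)));
  const fonts: FontStyle[] = ["print", "print", "handwritten", "handwritten", "calligraphy"];
  return {
    level: l,
    fontStyle: fonts[l],
    motionBlur: l >= 2,
    tiltDegrees: l * 10,
    clutter: l >= 3,
  };
}

/** Build a text prompt for the image generator from the phrase and the knobs. */
function buildImagePrompt(phrase: string, d: Difficulty): string {
  const parts = [
    `Render exactly this text, with no other characters: "${phrase}"`,
    `Font style: ${d.fontStyle}.`,
    `Rotate the text by about ${d.tiltDegrees} degrees.`,
  ];
  if (d.motionBlur) parts.push("Apply noticeable motion blur.");
  if (d.clutter) parts.push("Place the text in a visually cluttered scene.");
  return parts.join(" ");
}
```

Keeping the knobs in a typed object rather than in free-form prompt text means each generated image can be stored alongside the exact settings that produced it.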

How It Works: An RL Approach

This is where reinforcement learning becomes the right tool for the problem. We treat the LLMs as fixed agents and our generator as the environment. As a model succeeds, the environment escalates complexity; as it fails, we log the precise failure mode: maybe it drops a radical, confuses traditional vs. simplified characters, or collapses dense characters.

This process creates a highly targeted dataset of failure points. The output isn't just a score: it's a controllable gradient of difficulty and a clean JSON dataset suited to improving a specific model's capabilities through downstream fine-tuning.
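The escalate-on-success, log-on-failure loop can be sketched roughly as follows. Everything here is an assumption made for illustration: the `FailureMode` labels, the three-entry traditional-to-simplified table, and the `Recognizer` callback, which stands in for "render an image at this level and ask the model to read it" and would be asynchronous in practice.

```typescript
// Illustrative sketch only: names and the tiny lookup table are assumptions.

type FailureMode = "script_variant" | "dropped_characters" | "extra_characters" | "substitution";

// Deliberately tiny table; a real one would cover thousands of characters.
const TRAD_TO_SIMP: Record<string, string> = { "學": "学", "國": "国", "語": "语" };

function toSimplified(text: string): string {
  return Array.from(text).map((ch) => TRAD_TO_SIMP[ch] ?? ch).join("");
}

/** Return null on an exact match, otherwise a coarse failure label. */
function classifyFailure(truth: string, predicted: string): FailureMode | null {
  if (truth === predicted) return null;
  if (toSimplified(truth) === toSimplified(predicted)) return "script_variant";
  const t = Array.from(truth).length;
  const p = Array.from(predicted).length;
  if (p < t) return "dropped_characters";
  if (p > t) return "extra_characters";
  return "substitution";
}

interface Episode {
  level: number;
  truth: string;
  predicted: string;
  failure: FailureMode | null;
}

// Stand-in for the generate-image-then-query-model step.
type Recognizer = (truth: string, level: number) => string;

/** Raise the level after each success; hold it and log the failure otherwise. */
function runEpisodes(phrases: string[], recognize: Recognizer, maxLevel = 4): Episode[] {
  let level = 0;
  const log: Episode[] = [];
  for (const truth of phrases) {
    const predicted = recognize(truth, level);
    const failure = classifyFailure(truth, predicted);
    log.push({ level, truth, predicted, failure });
    if (failure === null) level = Math.min(maxLevel, level + 1);
  }
  return log;
}
```

Holding the level after a failure, rather than lowering it, keeps sampling concentrated around the point where the model starts to break; other schedules are equally plausible.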
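The README does not spell out the JSON schema, so the following is only a guess at what a failure record might look like, together with a helper that writes the failures as JSON Lines. The `FailureRecord` fields and the `toFailureJsonl` name are hypothetical.

```typescript
// Illustrative sketch only: the record shape is an assumption, not the project's schema.

interface FailureRecord {
  model: string;              // identifier of the LLM under test
  groundTruth: string;        // characters the generator was asked to render
  predicted: string;          // what the model read back
  level: number;              // difficulty level of the image
  failureMode: string | null; // null when the prediction was correct
}

/** Keep only the failures and serialize them one JSON object per line. */
function toFailureJsonl(records: FailureRecord[]): string {
  return records
    .filter((r) => r.failureMode !== null)
    .map((r) => JSON.stringify(r))
    .join("\n");
}
```

One object per line makes the file easy to stream and append to, which suits a benchmark that keeps generating new cases.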

Why This Matters

The result is a fully automated, infinitely scalable benchmark for non-English text extraction. No manual labeling, no inconsistent images, just a deterministic way to push models until they break, and finally understand why.

Getting Started

To get a local copy up and running, follow these simple steps.

Prerequisites

You'll need Bun installed on your machine. You can find installation instructions at the official Bun website.

Installation

  1. Clone the repo
    git clone https://github.com/your_username_/rl-nano.git
  2. Install dependencies with Bun
    bun install

Usage

Run the development server:

bun dev

Open http://localhost:3000 with your browser to see the result.
