🍌 Nono Banana: A Controllable Benchmark for Non-English LLM Text Recognition

LLMs still struggle to reliably extract non-English text from images. That is a serious gap: over a billion people read and write Mandarin alone, yet most evaluation data for this task is small, noisy, and English-centric. The core issue is simple: there is no controlled, scalable way to test how LLMs behave on complex scripts. Existing datasets are scraped, often mislabeled, inconsistent, and impossible to tune. You can’t say ‘make this 20% harder’ or systematically test radicals, stroke density, angles, blur, or font variation.

Nono Banana fixes that. The name is a small RL wink — you keep saying ‘no no’ until the model improves — but the tech is the serious part.

What We Do

The key unlock is that Gemini Nano Banana Pro can now generate synthetic non-English text images with the ground truth baked in. We specify the exact characters, and Nano Banana renders them. That means automatic scoring, effectively unlimited scale, and full control over difficulty.
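As a rough illustration of what automatic scoring could look like when the target string is known in advance, here is a minimal sketch that computes a character error rate (edit distance divided by ground-truth length). The function names and the choice of metric are assumptions made for this example, not the project's actual scoring code.

```typescript
// Hypothetical scoring helper: names and the CER metric are assumptions, not the repo's API.

/** Split into Unicode code points (after NFC normalization, ignoring whitespace)
 *  so that CJK characters outside the BMP still count as one unit. */
function toChars(text: string): string[] {
  return Array.from(text.normalize("NFC").replace(/\s+/g, ""));
}

/** Levenshtein distance between two character sequences (two-row dynamic programming). */
function editDistance(a: string[], b: string[]): number {
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

/** Character error rate: edits needed per ground-truth character (0 means a perfect read). */
function characterErrorRate(groundTruth: string, predicted: string): number {
  const truth = toChars(groundTruth);
  const guess = toChars(predicted);
  if (truth.length === 0) return guess.length === 0 ? 0 : 1;
  return editDistance(truth, guess) / truth.length;
}
```

For example, `characterErrorRate("你好世界", "你好世")` yields 0.25, since one of the four ground-truth characters was dropped. A graded metric like this is one possible design choice; strict exact-match scoring would also work with known ground truth.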

We prompt Nano Banana with a Mandarin phrase. Because we know the ground truth, evaluating the LLMs is instant. Then we dial up the difficulty: more strokes, nested components, unusual fonts, motion blur, angled lighting, clutter.
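One way to picture the difficulty dials is as a small config object that gets turned into a generation prompt. The sketch below is illustrative only: the `Difficulty` fields, the `buildImagePrompt` name, and the prompt wording are all hypothetical, and it only builds the prompt string rather than calling any image-generation API.

```typescript
// Hypothetical difficulty knobs; the field names and prompt wording are assumptions.
interface Difficulty {
  font: "printed" | "handwritten" | "calligraphic";
  motionBlur: number; // 0 (none) to 1 (heavy)
  tiltDegrees: number; // rotation of the rendered text
  clutter: boolean; // whether to add background clutter
}

/** Build a text prompt asking the image generator to render an exact phrase. */
function buildImagePrompt(phrase: string, d: Difficulty): string {
  const parts: string[] = [
    `Render exactly this text, with no added or missing characters: "${phrase}".`,
    `Font style: ${d.font}.`,
  ];
  if (d.motionBlur > 0) {
    parts.push(`Apply motion blur at about ${Math.round(d.motionBlur * 100)}% strength.`);
  }
  if (d.tiltDegrees !== 0) {
    parts.push(`Tilt the text by ${d.tiltDegrees} degrees.`);
  }
  if (d.clutter) {
    parts.push("Place the text on a cluttered background.");
  }
  return parts.join(" ");
}
```

Because the phrase is passed in verbatim, the same string can be reused later as the ground truth for scoring.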

How It Works: An RL Approach

This is where Reinforcement Learning becomes the perfect tool for this problem. We treat the LLMs as fixed agents and our generator as the environment. As a model succeeds, the environment escalates complexity; as it fails, we log the precise failure mode — maybe it drops a radical, confuses traditional vs simplified, or collapses dense characters.
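The escalate-on-success, log-on-failure loop described above could be sketched roughly as follows. This is a simplified illustration under stated assumptions: `EnvState`, `step`, and the `MAX_LEVEL` cap are hypothetical names, success is treated as an exact string match, and classifying the failure mode (dropped radical, traditional vs simplified confusion, and so on) is left out.

```typescript
// Hypothetical environment state; none of these names come from the repo.
interface FailureLog {
  level: number; // difficulty level at which the model failed
  groundTruth: string; // text the image was generated from
  predicted: string; // text the model extracted
}

interface EnvState {
  level: number;
  failures: FailureLog[];
}

const MAX_LEVEL = 10; // assumed cap on difficulty

/** One environment step: escalate on success, record the failure otherwise. */
function step(state: EnvState, groundTruth: string, predicted: string): EnvState {
  if (predicted === groundTruth) {
    return { level: Math.min(state.level + 1, MAX_LEVEL), failures: state.failures };
  }
  // On failure the level is held steady (an assumption) and the miss is logged.
  const entry: FailureLog = { level: state.level, groundTruth, predicted };
  return { level: state.level, failures: [...state.failures, entry] };
}
```

Starting from `{ level: 1, failures: [] }`, a correct read moves the state to level 2, while a miss such as reading 龍 as 龙 keeps the level unchanged and appends one entry to `failures`.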

This process creates a highly targeted dataset of failure points. The output isn’t just a score — it’s a controllable gradient of difficulty and a clean JSON dataset that is perfect for improving a specific model's capabilities through downstream fine-tuning.
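To make the JSON output concrete, here is a sketch of what an exported failure dataset might look like. The schema is an assumption for illustration (the field names, the `failureMode` labels, and the `version` field are hypothetical); the repository's real export format may differ.

```typescript
// Hypothetical export schema; the actual export file may use different fields.
interface FailureRecord {
  groundTruth: string; // text the image was generated from
  predicted: string; // text the model extracted
  level: number; // difficulty level of the sample
  failureMode: string; // free-form label, e.g. "script_confusion"
}

interface ExportedDataset {
  version: number;
  count: number;
  records: FailureRecord[];
}

/** Serialize failure records into a pretty-printed JSON document. */
function exportDataset(records: FailureRecord[]): string {
  const dataset: ExportedDataset = { version: 1, count: records.length, records };
  return JSON.stringify(dataset, null, 2);
}

/** Count failures per mode, a quick way to see where a model breaks most often. */
function countByMode(records: FailureRecord[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const r of records) {
    counts[r.failureMode] = (counts[r.failureMode] ?? 0) + 1;
  }
  return counts;
}
```

Each record pairs the known target with the model's wrong output, a shape that can be mapped onto input and expected-output pairs for downstream fine-tuning.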

Why This Matters

The result is a fully automated, infinitely scalable benchmark for non-English text extraction. No manual labeling, no inconsistent images, just a deterministic way to push models until they break, and finally understand why.

Getting Started

To get a local copy up and running, follow these simple steps.

Prerequisites

You'll need Bun installed on your machine. You can find installation instructions at the official Bun website.

Installation

Clone the repo

git clone https://github.com/your_username_/rl-nano.git

Install Bun packages
```
bun install
```

Usage

Run the development server:

bun dev

Open http://localhost:3000 with your browser to see the result.
