
Table column detection regression with vLLM ≥0.15 + latest transformers (dots.ocr detects 5 columns as 3) #268

@ELMEHDAOUIAhmed

Description


After upgrading to vLLM 0.15.x and newer versions of transformers, I’m seeing a regression in table extraction accuracy when using dots.ocr.

Previously, tables were detected and extracted correctly, including column structure and table header. With the newer stack, some tables are mis-detected — specifically, 5-column tables are sometimes extracted as only 3 columns.

This behavior did not occur in my earlier tests with older versions (vllm==0.10, transformers==4.51.3).

Expected behavior

Tables should preserve the correct number of columns (e.g., a 5-column table should be extracted as 5 columns).

Actual behavior

Some tables are detected with fewer columns than exist:

5 columns → detected as 3

This leads to merged or misaligned cells and incorrect structured output.
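To make the column loss measurable across versions, a small check like the one below could be run on the table output from both stacks. This is only a sketch: it assumes the table content comes back as an HTML `<table>` string, and `TableColumnCounter` and `column_count` are hypothetical helper names, not part of dots.ocr. It does not handle nested tables or `rowspan`.

```python
from html.parser import HTMLParser


class TableColumnCounter(HTMLParser):
    """Collect the effective column count of every row in an HTML table."""

    def __init__(self):
        super().__init__()
        self.row_widths = []   # one entry per <tr>
        self._current = None   # width of the row being parsed, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._current = 0
        elif tag in ("td", "th") and self._current is not None:
            # A cell with colspan="n" occupies n columns.
            span = dict(attrs).get("colspan") or "1"
            self._current += int(span) if span.isdigit() else 1

    def handle_endtag(self, tag):
        if tag == "tr" and self._current is not None:
            self.row_widths.append(self._current)
            self._current = None


def column_count(table_html: str) -> int:
    """Return the widest row's column count, or 0 if no rows are found."""
    parser = TableColumnCounter()
    parser.feed(table_html)
    parser.close()
    return max(parser.row_widths, default=0)
```

Running `column_count` on the same page's table output under the old and new stacks would show directly whether a 5-column table comes back as 3.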

Environment

vLLM: 0.15

transformers:

dots.ocr: <version/commit>

CUDA: 13.0

GPU: H100 NVL 96GB

Python: 3.12

OS: PyTorch (Vast) docker image based on nvidia/cuda:10.0-cudnn7-devel-ubuntu16.04
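Since a couple of the versions above are still missing, a snippet like this could be run inside the container to capture them exactly. It is a sketch using only the standard library. The package names `vllm`, `transformers` and `torch` are the usual PyPI distribution names, and `package_version` and `environment_report` are hypothetical helpers.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def package_version(name: str) -> str:
    """Return the installed version of a distribution, or a placeholder."""
    try:
        return version(name)
    except PackageNotFoundError:
        return "not installed"


def environment_report(packages=("vllm", "transformers", "torch")) -> dict:
    """Map 'python' and each package name to its version string."""
    report = {"python": platform.python_version()}
    for name in packages:
        report[name] = package_version(name)
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```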

Repro steps

Run dots.ocr with the vLLM backend

Process a document containing multi-column tables (≥5 columns)

Observe incorrect column detection

I wish I had kept examples, but I didn't. When I find some, I will reply in this thread.

vLLM call params

    payload = {
        "model": "model",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": image_base64
                        }
                    },
                    {
                        "type": "text",
                        "text": prompt
                    }
                ]
            }
        ],
        "max_tokens": 8096,
        #"presence_penalty": 0.0,
        #"frequency_penalty": 0.0,
        "repetition_penalty": 1.05,
        "temperature": 0.1,
        "top_p": 1.0,
        "top_k": 0,
        "min_p": 0.0
    }
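For anyone trying to reproduce this, here is a sketch of how a payload like the one above could be built and sent. It rests on several assumptions: the image is passed as a base64 data URL, the server is vLLM's OpenAI-compatible endpoint at `/v1/chat/completions`, and the base URL `http://localhost:8000` is a placeholder. `build_payload` and `post_chat_completion` are hypothetical helpers, and the sampling values are copied from the payload above.

```python
import base64
import json
import urllib.request


def build_payload(image_bytes: bytes, prompt: str, mime: str = "image/png") -> dict:
    """Build a chat-completions payload with the image as a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    image_url = f"data:{mime};base64,{encoded}"  # assumed format of image_base64
    return {
        "model": "model",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image_url}},
                    {"type": "text", "text": prompt},
                ],
            }
        ],
        # Sampling parameters copied from the payload in this issue.
        "max_tokens": 8096,
        "repetition_penalty": 1.05,
        "temperature": 0.1,
        "top_p": 1.0,
        "top_k": 0,
        "min_p": 0.0,
    }


def post_chat_completion(payload: dict, base_url: str = "http://localhost:8000") -> str:
    """POST the payload and return the first choice's text (base_url is assumed)."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

Sending the same image and prompt to servers running the old and new versions, with identical sampling parameters, would help isolate whether the regression comes from the serving stack rather than from the request.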
