After upgrading to vLLM 0.15.x and newer versions of transformers, I’m seeing a regression in table extraction accuracy when using dots.ocr.
Previously, tables were detected and extracted correctly, including column structure and table headers. With the newer stack, some tables are mis-detected: specifically, 5-column tables are sometimes extracted with only 3 columns.
This behavior did not occur in my earlier tests with older vLLM/transformers versions (vllm==0.10, transformers==4.51.3).
Expected behavior
Tables should preserve the correct number of columns (e.g., a 5-column table should be extracted as 5 columns).
Actual behavior
Some tables are detected with fewer columns than exist:
5 columns → detected as 3
This leads to merged or misaligned cells and incorrect structured output.
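One way to quantify the mismatch is to count the columns in each extracted table and compare against the expected count. The sketch below assumes the table comes back as HTML (`<table>`, `<tr>`, `<td>`/`<th>` markup). `count_columns` is a hypothetical helper, not part of dots.ocr or vLLM, and it ignores `rowspan` and nested tables.

```python
from html.parser import HTMLParser


class _RowWidthParser(HTMLParser):
    """Collects the width (in columns) of every <tr> in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.row_widths = []
        self._current = None  # width of the row being parsed, or None

    def flush_row(self):
        # Record the row in progress, if any (covers a missing </tr>).
        if self._current is not None:
            self.row_widths.append(self._current)
            self._current = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.flush_row()
            self._current = 0
        elif tag in ("td", "th") and self._current is not None:
            # A cell with colspan="n" occupies n columns.
            span = dict(attrs).get("colspan") or "1"
            self._current += int(span) if span.isdigit() else 1

    def handle_endtag(self, tag):
        if tag == "tr":
            self.flush_row()


def count_columns(table_html):
    """Return the widest row's column count (0 if no rows are found)."""
    parser = _RowWidthParser()
    parser.feed(table_html)
    parser.close()
    parser.flush_row()
    return max(parser.row_widths, default=0)
```

Running this over the output of both stacks for the same page would show directly whether a table expected to have 5 columns comes back with 3.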
Environment
vLLM: 0.15
transformers:
dots.ocr: <version/commit>
CUDA: 13.0
GPU: H100 NVL 96GB
Python: 3.12
OS: PyTorch (Vast) docker image based on nvidia/cuda:10.0-cudnn7-devel-ubuntu16.04
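To pin down the exact versions in the old and new environments (the transformers entry above is blank), a small script along these lines could be run in each one. It uses only `importlib.metadata` from the standard library. The default package list is an assumption about what is relevant here.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("vllm", "transformers", "torch")):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}=={ver}" if ver else f"{name}: not installed")
```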
Repro steps
Run dots.ocr with vLLM backend
Process document containing multi-column tables (≥5 columns)
Observe incorrect column detection
I did not keep example documents. I will reply in this thread when I find some.
vLLM call params
payload = {
    "model": "model",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": image_base64
                    }
                },
                {
                    "type": "text",
                    "text": prompt
                }
            ]
        }
    ],
    "max_tokens": 8096,
    #"presence_penalty": 0.0,
    #"frequency_penalty": 0.0,
    "repetition_penalty": 1.05,
    "temperature": 0.1,
    "top_p": 1.0,
    "top_k": 0,
    "min_p": 0.0
}
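For completeness, here is a minimal sketch of how `image_base64` might be built and the payload sent. It assumes the vLLM OpenAI-compatible server is reachable at `http://localhost:8000/v1/chat/completions` (host and port are assumptions) and that the page image is a PNG. `to_data_url` and `post_chat` are hypothetical helper names.

```python
import base64
import json
import urllib.request


def to_data_url(image_bytes, mime="image/png"):
    """Encode raw image bytes as a data URL for the image_url field."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def post_chat(payload, url="http://localhost:8000/v1/chat/completions"):
    """POST the payload as JSON and return the first choice's text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=300) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

Sending the same image and prompt through both stacks with an identical payload keeps the comparison like-for-like. Setting `temperature` to 0 for that comparison may help rule out sampling noise.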