fix: compress prompts shorter than iterative_size (#196)#247

Open
ousamabenyounes wants to merge 1 commit into microsoft:main from ousamabenyounes:fix/issue-196

Conversation

@ousamabenyounes

What does this PR do?

Fixes #196

LLMLingua / LongLLMLingua's iterative token-level compression path calls `get_compressed_input(..., end=prompt_len, iterative_size=200)`. When the prompt is shorter than `iterative_size`, `end - iterative_size` goes negative and the line

```python
need_idx[: end - iterative_size] = 1
```

ends up overwriting most of `need_idx`: Python counts the negative endpoint from the right, so every entry except the last `iterative_size - end` is silently set to True. The threshold-based decision gets thrown away, the affected tokens are all marked "keep", and the user sees 1.0x achieved compression on short prompts. The reporter traced this to prompt lengths below ~66 tokens with `iterative_size=200` and even graphed the resulting zig-zag behaviour out to 200 tokens.
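The wraparound is easy to reproduce in isolation. A minimal sketch, using a NumPy boolean array as a stand-in for the real `need_idx` tensor (the buffer length 300 is hypothetical; only the sign of `end - iterative_size` matters):

```python
import numpy as np

iterative_size = 200
end = 66  # short prompt: end == prompt_len < iterative_size

# stand-in for need_idx; all-False means "threshold says drop everything"
need_idx = np.zeros(300, dtype=bool)

# end - iterative_size == -134, so this is need_idx[:-134]:
# the endpoint is counted from the right, and everything except
# the last 134 entries is silently flipped to True ("keep")
need_idx[: end - iterative_size] = True

print(need_idx.sum())  # 166 entries overwritten instead of 0
```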

Fix

Clamp `end` to at least `iterative_size` at the top of `get_compressed_input`:

```python
if end < iterative_size:
    end = iterative_size
```

This turns the two masking writes into:

  • `need_idx[: 0] = 1` → no-op
  • `need_idx[iterative_size:] = 1` → keeps the trailing padding (beyond prompt_len it is a no-op anyway)

so the threshold-based `need_idx` actually takes effect. Long prompts (where `end >= iterative_size` already) are untouched. This is the exact fix suggested by the reporter.
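The effect of the clamp can be checked in isolation. A sketch with a NumPy boolean array standing in for `need_idx` (the buffer length is hypothetical):

```python
import numpy as np

iterative_size = 200
end = 66  # short prompt: end == prompt_len
need_idx = np.zeros(300, dtype=bool)  # threshold decisions: drop everything

# the proposed clamp
if end < iterative_size:
    end = iterative_size

need_idx[: end - iterative_size] = True  # now need_idx[:0] -> no-op
print(need_idx.sum())  # 0: the threshold-based decisions survive
```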

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Was this discussed/approved via a Github issue? Please add a link to it if that's the case — [Bug]: Prompts smaller than iterative_size are not compressed  #196.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests? Yes: `tests/test_issue_196.py` (4 new tests, ~4s, no model download). The tests exercise `get_compressed_input` directly through a `PromptCompressor` shim built with bare `__new__` (no model load), covering: short prompt actually compresses, below-66-token boundary, exactly-at-iterative-size boundary, long-prompt no-regression.
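The boundary cases those tests exercise can be sketched with the clamp extracted into a plain helper (`clamp_end` is a hypothetical name for illustration; the real tests call `get_compressed_input` through the `PromptCompressor` shim):

```python
def clamp_end(end: int, iterative_size: int = 200) -> int:
    # the one-line fix, lifted out for illustration
    return end if end >= iterative_size else iterative_size

assert clamp_end(50) == 200     # short prompt: left slice becomes [:0]
assert clamp_end(66) == 200     # the reporter's ~66-token boundary region
assert clamp_end(200) == 200    # exactly at iterative_size: unchanged
assert clamp_end(1000) == 1000  # long prompt: untouched, no regression
```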

Verification

  • Baseline: 2 tests pass, 3 tests fail with a pre-existing `ValueError: too many values to unpack (expected 2)` in `iterative_compress_prompt` (a transformers DynamicCache API change, addressed in the separate PR "fix: normalize past_key_values across transformers DynamicCache API (#210)" #246).
  • Post-fix: 2 baseline tests still pass + 4 new tests pass, same 3 pre-existing failures — zero new regressions.

Generated by Claude Code
Vibe coded by ousamabenyounes

When LLMLingua/LongLLMLingua iteratively compresses a prompt shorter
than iterative_size (default 200), get_compressed_input is called with
end == prompt_len. The line `need_idx[: end - iterative_size] = 1`
then uses a negative slice endpoint that Python counts from the
right, silently overwriting all but the last iterative_size - end
entries of need_idx with True. The thresholding decision is thrown
away for those tokens and the achieved compression rate collapses
to 1.0x for small prompts.

Clamp end to at least iterative_size at the top of
get_compressed_input so the two masking writes become a no-op on the
left slice and only keep the tail on the right slice, letting the
threshold-based need_idx actually take effect.

Generated by Claude Code
Vibe coded by ousamabenyounes

Co-Authored-By: Claude <noreply@anthropic.com>
