fix: compress prompts shorter than iterative_size (#196)#247

Open
ousamabenyounes wants to merge 1 commit into microsoft:main from ousamabenyounes:fix/issue-196

Conversation

@ousamabenyounes

What does this PR do?

Fixes #196

LLMLingua / LongLLMLingua's iterative token-level compression path calls `get_compressed_input(..., end=prompt_len, iterative_size=200)`. When the prompt is shorter than `iterative_size`, `end - iterative_size` goes negative and the line

```python
need_idx[: end - iterative_size] = 1
```

ends up overwriting most of `need_idx`: Python counts the negative endpoint from the right, so every entry except the last `iterative_size - end` is silently set to True. The threshold-based decision gets thrown away, the affected tokens are all marked "keep", and the user sees 1.0x achieved compression on short prompts. The reporter traced this to prompt lengths below ~66 tokens with `iterative_size=200` and even graphed the resulting zig-zag behaviour out to 200 tokens.
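The wraparound is easy to reproduce in isolation. A minimal sketch, using a NumPy boolean array as a stand-in for the real `need_idx` tensor (the buffer length 300 is hypothetical; only the sign of `end - iterative_size` matters):

```python
import numpy as np

iterative_size = 200
end = 66  # short prompt: end == prompt_len < iterative_size

# stand-in for need_idx; all-False means "threshold says drop everything"
need_idx = np.zeros(300, dtype=bool)

# end - iterative_size == -134, so this is need_idx[:-134]:
# the endpoint is counted from the right, and everything except
# the last 134 entries is silently flipped to True ("keep")
need_idx[: end - iterative_size] = True

print(need_idx.sum())  # 166 entries overwritten instead of 0
```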

Fix

Clamp `end` to at least `iterative_size` at the top of `get_compressed_input`:

```python
if end < iterative_size:
    end = iterative_size
```

This turns the two masking writes into:

  • `need_idx[: 0] = 1` → no-op
  • `need_idx[iterative_size:] = 1` → keeps the trailing padding (beyond prompt_len it is a no-op anyway)

so the threshold-based `need_idx` actually takes effect. Long prompts (where `end >= iterative_size` already) are untouched. This is the exact fix suggested by the reporter.
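The effect of the clamp can be checked in isolation. A sketch with a NumPy boolean array standing in for `need_idx` (the buffer length is hypothetical):

```python
import numpy as np

iterative_size = 200
end = 66  # short prompt: end == prompt_len
need_idx = np.zeros(300, dtype=bool)  # threshold decisions: drop everything

# the proposed clamp
if end < iterative_size:
    end = iterative_size

need_idx[: end - iterative_size] = True  # now need_idx[:0] -> no-op
print(need_idx.sum())  # 0: the threshold-based decisions survive
```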

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Was this discussed/approved via a Github issue? Please add a link to it if that's the case — [Bug]: Prompts smaller than iterative_size are not compressed  #196.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests? Yes: `tests/test_issue_196.py` (4 new tests, ~4s, no model download). The tests exercise `get_compressed_input` directly through a `PromptCompressor` shim built with bare `__new__` (no model load), covering: short prompt actually compresses, below-66-token boundary, exactly-at-iterative-size boundary, long-prompt no-regression.
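The boundary cases those tests exercise can be sketched with the clamp extracted into a plain helper (`clamp_end` is a hypothetical name for illustration; the real tests call `get_compressed_input` through the `PromptCompressor` shim):

```python
def clamp_end(end: int, iterative_size: int = 200) -> int:
    # the one-line fix, lifted out for illustration
    return end if end >= iterative_size else iterative_size

assert clamp_end(50) == 200     # short prompt: left slice becomes [:0]
assert clamp_end(66) == 200     # the reporter's ~66-token boundary region
assert clamp_end(200) == 200    # exactly at iterative_size: unchanged
assert clamp_end(1000) == 1000  # long prompt: untouched, no regression
```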

Verification

  • Baseline: 2 tests pass, 3 tests fail with a pre-existing `ValueError: too many values to unpack (expected 2)` in `iterative_compress_prompt` (a transformers DynamicCache API change, addressed in the separate PR "fix: normalize past_key_values across transformers DynamicCache API (#210)" #246).
  • Post-fix: 2 baseline tests still pass + 4 new tests pass, same 3 pre-existing failures — zero new regressions.

Generated by Claude Code
Vibe coded by ousamabenyounes

When LLMLingua/LongLLMLingua iteratively compresses a prompt shorter
than iterative_size (default 200), get_compressed_input is called with
end == prompt_len. The line `need_idx[: end - iterative_size] = 1`
then uses a negative slice endpoint that Python counts from the
right, silently overwriting all but the last iterative_size - end
entries of need_idx with True. The thresholding decision is thrown
away for those tokens and the achieved compression rate collapses
to 1.0x for small prompts.

Clamp end to at least iterative_size at the top of
get_compressed_input so the two masking writes become a no-op on the
left slice and only keep the tail on the right slice, letting the
threshold-based need_idx actually take effect.

Generated by Claude Code
Vibe coded by ousamabenyounes

Co-Authored-By: Claude <noreply@anthropic.com>
