
How to use torch_npu._npu_flash_attention_qlens #84

@uYanJX

Description

I'm trying to use torch_npu._npu_flash_attention_qlens, but I'm confused about its input parameters (torch_npu 2.5.1). My current call looks like this:

torch_npu._npu_flash_attention_qlens(
    query=query,
    key_cache=self.key_cache,
    value_cache=self.value_cache,
    block_table=block_tables,
    mask=compress_mask,
    seq_len=self.query_lens_tensor_cpu,
    context_lens=self.seq_lens_tensor_cpu,
    num_kv_heads=self.num_kv_heads,
    num_heads=self.num_heads,
    scale_value=self.scale,
    out=output)
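
For context, here is roughly how I build these inputs at the moment as a self-contained snippet. The shapes and dtypes below are my own guesses (in particular the KV-cache layout and the compressed mask), not taken from any documentation, so please correct whatever is wrong:

import torch
import torch_npu

# Everything below reflects my current assumptions, not documented behaviour.
num_seqs = 2                       # sequences in this prefill step
num_heads = 8
num_kv_heads = 8
head_dim = 128
block_size = 128
num_blocks = 16
max_blocks_per_seq = 4
scale = head_dim ** -0.5

query_lens = [3, 5]                # new query tokens per sequence
seq_lens = [10, 12]                # total context length per sequence
num_tokens = sum(query_lens)

# Query for all new tokens, flattened across sequences (my assumed layout).
query = torch.randn(num_tokens, num_heads, head_dim,
                    dtype=torch.float16, device="npu")

# Paged KV cache; I assume [num_blocks, block_size, num_kv_heads, head_dim].
key_cache = torch.randn(num_blocks, block_size, num_kv_heads, head_dim,
                        dtype=torch.float16, device="npu")
value_cache = torch.randn_like(key_cache)

# Per-sequence mapping from logical block index to physical block id.
block_table = torch.randint(0, num_blocks, (num_seqs, max_blocks_per_seq),
                            dtype=torch.int32, device="npu")

# I currently pass the lengths as int32 tensors kept on the CPU.
seq_len = torch.tensor(query_lens, dtype=torch.int32)
context_lens = torch.tensor(seq_lens, dtype=torch.int32)

# Placeholder for the compressed attention mask -- I don't know the exact
# shape or semantics this op expects, which is part of my question.
compress_mask = torch.zeros(num_tokens, max(seq_lens),
                            dtype=torch.float16, device="npu")

output = torch.empty_like(query)

torch_npu._npu_flash_attention_qlens(
    query=query,
    key_cache=key_cache,
    value_cache=value_cache,
    block_table=block_table,
    mask=compress_mask,
    seq_len=seq_len,
    context_lens=context_lens,
    num_kv_heads=num_kv_heads,
    num_heads=num_heads,
    scale_value=scale,
    out=output)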

Could you please provide details on the specific requirements and usage of the query, block_table, mask, seq_len, and context_lens parameters?

Any example code or links to documentation would be helpful. Thanks for your time and help!
