
fp16 inference output is all exclamation marks #48


Description

@spiderlx

Am I missing something that makes inference go wrong when llm_dtype=fp16, or does the model simply not support fp16 inference?

The inference script is demo2.py:

import torch
from model import FunASRNano


def main():
    model_dir = "/workspace/fun-asr-nano/model_bin/Fun-ASR-Nano-2512"
    device = (
        "cuda:0"
        if torch.cuda.is_available()
        else "mps" if torch.backends.mps.is_available() else "cpu"
    )
    m, kwargs = FunASRNano.from_pretrained(model=model_dir, device=device)
    m.eval()

    wav_path = f"/workspace/fun-asr-nano/data_bin/yue.mp3"
    res = m.inference(data_in=[wav_path],llm_dtype='fp16', **kwargs)
    text = res[0][0]["text"]
    print(text)


if __name__ == "__main__":
    main()
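
For a quick side-by-side comparison, the call above can be looped over all three dtypes inside main() (a trivial variation of the script, using only the API shown above):

    for dtype in ("fp32", "bf16", "fp16"):
        res = m.inference(data_in=[wav_path], llm_dtype=dtype, **kwargs)
        print(dtype, res[0][0]["text"])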

model.py is the current latest version of the file; the only change is a print added to the inference_llm function:

def inference_llm(
        self,
        data_in,
        data_lengths=None,
        key: list = None,
        tokenizer=None,
        frontend=None,
        **kwargs,
    ):
        inputs_embeds, contents, batch, source_ids, meta_data = self.inference_prepare(
            data_in, data_lengths, key, tokenizer, frontend, **kwargs
        )
        llm_dtype = kwargs.get("llm_dtype", "fp32")
        if llm_dtype == "fp32":
            llm_dtype = "fp16" if kwargs.get("fp16", False) else llm_dtype
            llm_dtype = "bf16" if kwargs.get("bf16", False) else llm_dtype
        print(llm_dtype)  # added for debugging: report the effective dtype
        device_type = torch.device(kwargs.get("device", "cuda")).type
        with torch.autocast(
            device_type=device_type if device_type in ["cuda", "xpu", "mps"] else "cpu",
            enabled=True if llm_dtype != "fp32" else False,
            dtype=dtype_map[llm_dtype]
        ):
            ...  # rest of the function unchanged
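
One possible cause worth noting here: fp16 and bf16 differ sharply in dynamic range. fp16 tops out at about 65504, while bf16 keeps fp32's 8-bit exponent, so an intermediate activation that overflows under fp16 autocast can survive under bf16. A minimal illustration (my own sketch, unrelated to the model code):

import torch

x = torch.tensor([70000.0])
print(x.to(torch.float16))   # tensor([inf], dtype=torch.float16): 70000 exceeds the fp16 max of ~65504
print(x.to(torch.bfloat16))  # tensor([70144.], dtype=torch.bfloat16): bf16 keeps fp32's exponent range

If any layer of the LLM produces values in this range, fp16 autocast would turn them into inf and the logits into NaN, which would match fp16 failing while fp32 and bf16 work.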

Here is the fp16 inference result:

python /workspace/fun-asr-nano/tmp-workspace/Fun-ASR/demo2.py
WARNING:root:trust_remote_code: True
Loading remote code successfully: model
fp16
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:151645 for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

fp32 and bf16 both produce correct results. Here is the fp32 result:

poetry run python /workspace/fun-asr-nano/tmp-workspace/Fun-ASR/demo2.py
WARNING:root:trust_remote_code: True
Loading remote code successfully: model
fp32
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:151645 for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
呢几个字都表达唔到我想讲嘅意思。

Here is the bf16 result:

poetry run python /workspace/fun-asr-nano/tmp-workspace/Fun-ASR/demo2.py
WARNING:root:trust_remote_code: True
Loading remote code successfully: model
bf16
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:151645 for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
呢几个字都表达唔到我想讲嘅意思。
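
An all-"!" transcript usually means the logits went non-finite: with NaN logits, greedy decoding collapses to token id 0, which in Qwen-style BPE vocabularies reportedly decodes to "!". To narrow down where fp16 overflows first, a forward-hook sketch like the following could help; add_nan_hooks is a hypothetical debugging helper of mine, not part of FunASRNano:

import torch

def add_nan_hooks(model):
    # Hypothetical helper: print the first modules whose outputs stop
    # being finite, to locate where fp16 autocast overflows.
    def make_hook(name):
        def hook(module, args, output):
            t = output[0] if isinstance(output, tuple) else output
            if torch.is_tensor(t) and t.is_floating_point() and not torch.isfinite(t).all():
                print(f"non-finite output from {name} (dtype={t.dtype})")
        return hook
    for name, module in model.named_modules():
        module.register_forward_hook(make_hook(name))

Calling add_nan_hooks(m) before the fp16 m.inference(...) call should report which layer's activations overflow first, i.e. whether the problem is in the audio encoder or in the LLM itself.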

Here is my environment:

pip list
Package                  Version     Editable project location
------------------------ ----------- -------------------------
aliyun-python-sdk-core   2.16.0
aliyun-python-sdk-kms    2.16.5
antlr4-python3-runtime   4.9.3
audioread                3.1.0
decorator                5.2.1
editdistance             0.8.1
filelock                 3.20.1
fsspec                   2025.12.0
fun-asr-nano             0.1.0       /workspace/fun-asr-nano
funasr                   1.2.9
…
huggingface-hub          0.36.0
hydra-core               1.3.2
jieba                    0.42.1
Jinja2                   3.1.6
jmespath                 0.10.0
kaldiio                  2.18.1
librosa                  0.11.0
llvmlite                 0.46.0
mdurl                    0.1.2
modelscope               1.33.0
numba                    0.63.1
numpy                    1.26.4
nvidia-cublas-cu12       12.8.4.1
nvidia-cuda-cupti-cu12   12.8.90
nvidia-cuda-nvrtc-cu12   12.8.93
nvidia-cuda-runtime-cu12 12.8.90
nvidia-cudnn-cu12        9.10.2.21
nvidia-cufft-cu12        11.3.3.83
nvidia-cufile-cu12       1.13.1.3
nvidia-curand-cu12       10.3.9.90
nvidia-cusolver-cu12     11.7.3.90
nvidia-cusparse-cu12     12.5.8.93
nvidia-cusparselt-cu12   0.7.1
nvidia-nccl-cu12         2.27.3
nvidia-nvjitlink-cu12    12.8.93
nvidia-nvtx-cu12         12.8.90
sentencepiece            0.2.1
setuptools               80.9.0
shellingham              1.5.4
soundfile                0.13.1
soxr                     1.0.0
sympy                    1.14.0
tensorboardX             2.6.4
threadpoolctl            3.6.0
tokenizers               0.22.1
torch                    2.8.0
torch-complex            0.4.4
torchaudio               2.8.0
tqdm                     4.67.1
transformers             4.57.3
triton                   3.4.0
typer                    0.20.1
typing_extensions        4.15.0
umap-learn               0.5.9.post2
urllib3                  2.6.2

So, am I missing something that causes fp16 inference to fail, or does the model simply not support fp16 inference?
