Am I missing something that makes inference go wrong with llm_dtype=fp16, or does the model simply not support fp16 inference?
The inference script is demo2.py:
import torch

from model import FunASRNano


def main():
    model_dir = "/workspace/fun-asr-nano/model_bin/Fun-ASR-Nano-2512"
    device = (
        "cuda:0"
        if torch.cuda.is_available()
        else "mps" if torch.backends.mps.is_available() else "cpu"
    )
    m, kwargs = FunASRNano.from_pretrained(model=model_dir, device=device)
    m.eval()

    wav_path = "/workspace/fun-asr-nano/data_bin/yue.mp3"
    res = m.inference(data_in=[wav_path], llm_dtype="fp16", **kwargs)
    text = res[0][0]["text"]
    print(text)


if __name__ == "__main__":
    main()
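As a side note, judging from the dtype-selection logic in inference_llm quoted further down, the same dtypes can apparently also be requested through boolean flags while llm_dtype stays at its "fp32" default. A sketch only, not verified:

# Per the inference_llm logic below: with llm_dtype left at "fp32",
# fp16=True (or bf16=True) should select the same autocast dtype as
# passing llm_dtype="fp16" (or "bf16") explicitly.
res_fp16 = m.inference(data_in=[wav_path], fp16=True, **kwargs)
res_bf16 = m.inference(data_in=[wav_path], bf16=True, **kwargs)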
model.py is the current latest version; my only change is an added print in the inference_llm function:
def inference_llm(
    self,
    data_in,
    data_lengths=None,
    key: list = None,
    tokenizer=None,
    frontend=None,
    **kwargs,
):
    inputs_embeds, contents, batch, source_ids, meta_data = self.inference_prepare(
        data_in, data_lengths, key, tokenizer, frontend, **kwargs
    )

    llm_dtype = kwargs.get("llm_dtype", "fp32")
    if llm_dtype == "fp32":
        llm_dtype = "fp16" if kwargs.get("fp16", False) else llm_dtype
        llm_dtype = "bf16" if kwargs.get("bf16", False) else llm_dtype
    print(llm_dtype)

    device_type = torch.device(kwargs.get("device", "cuda")).type
    with torch.autocast(
        device_type=device_type if device_type in ["cuda", "xpu", "mps"] else "cpu",
        enabled=True if llm_dtype != "fp32" else False,
        dtype=dtype_map[llm_dtype],
    ):
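The autocast dtype is taken from dtype_map, so fp16 and bf16 differ only in the torch dtype handed to autocast. A minimal sketch of why that difference can matter (my own illustration; dtype_map's actual contents are not shown in the snippet, so the mapping below is an assumption):

import torch

# Assumed mapping; dtype_map itself is not visible in the snippet above.
dtype_map = {"fp32": torch.float32, "fp16": torch.float16, "bf16": torch.bfloat16}

for name, dt in dtype_map.items():
    info = torch.finfo(dt)
    print(f"{name}: max={info.max:.3e}, tiny={info.tiny:.3e}, eps={info.eps:.3e}")

# fp16 saturates near 6.55e4, while bf16 keeps fp32's exponent range
# (max ~3.39e38). Activations that overflow to inf under fp16 autocast
# can therefore still be finite under bf16, which would match fp16
# failing while fp32 and bf16 work.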
Here is the fp16 inference output:
python /workspace/fun-asr-nano/tmp-workspace/Fun-ASR/demo2.py
WARNING:root:trust_remote_code: True
Loading remote code successfully: model
fp16
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:151645 for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
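The degenerate output looks like an fp16 numeric overflow inside the LLM forward pass rather than a decoding problem. A hypothetical check (helper and attribute names are mine; it assumes the underlying LLM is a Hugging Face causal LM that accepts inputs_embeds, which the attention-mask warnings suggest):

import torch

def logits_finite_under(llm, inputs_embeds, dtype):
    # Hypothetical helper: one forward pass under autocast at `dtype`,
    # reporting whether any logit overflowed to inf or became nan.
    with torch.no_grad(), torch.autocast(device_type="cuda", dtype=dtype):
        logits = llm(inputs_embeds=inputs_embeds).logits
    print(dtype, "inf:", torch.isinf(logits).any().item(),
          "nan:", torch.isnan(logits).any().item())

# e.g. inside inference_llm, right after inference_prepare (self.llm is a
# guess at the attribute name):
# logits_finite_under(self.llm, inputs_embeds, torch.float16)
# logits_finite_under(self.llm, inputs_embeds, torch.bfloat16)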
Both fp32 and bf16 inference produce normal results. Here is the fp32 output:
poetry run python /workspace/fun-asr-nano/tmp-workspace/Fun-ASR/demo2.py
WARNING:root:trust_remote_code: True
Loading remote code successfully: model
fp32
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:151645 for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
呢几个字都表达唔到我想讲嘅意思。
And here is the bf16 output:
poetry run python /workspace/fun-asr-nano/tmp-workspace/Fun-ASR/demo2.py
WARNING:root:trust_remote_code: True
Loading remote code successfully: model
bf16
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:151645 for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
呢几个字都表达唔到我想讲嘅意思。
Here is my environment:
pip list
Package Version Editable project location
------------------------ ----------- -------------------------
aliyun-python-sdk-core 2.16.0
aliyun-python-sdk-kms 2.16.5
antlr4-python3-runtime 4.9.3
audioread 3.1.0
decorator 5.2.1
editdistance 0.8.1
filelock 3.20.1
fsspec 2025.12.0
fun-asr-nano 0.1.0 /workspace/fun-asr-nano
funasr 1.2.9
…
huggingface-hub 0.36.0
hydra-core 1.3.2
jieba 0.42.1
Jinja2 3.1.6
jmespath 0.10.0
kaldiio 2.18.1
librosa 0.11.0
llvmlite 0.46.0
mdurl 0.1.2
modelscope 1.33.0
numba 0.63.1
numpy 1.26.4
nvidia-cublas-cu12 12.8.4.1
nvidia-cuda-cupti-cu12 12.8.90
nvidia-cuda-nvrtc-cu12 12.8.93
nvidia-cuda-runtime-cu12 12.8.90
nvidia-cudnn-cu12 9.10.2.21
nvidia-cufft-cu12 11.3.3.83
nvidia-cufile-cu12 1.13.1.3
nvidia-curand-cu12 10.3.9.90
nvidia-cusolver-cu12 11.7.3.90
nvidia-cusparse-cu12 12.5.8.93
nvidia-cusparselt-cu12 0.7.1
nvidia-nccl-cu12 2.27.3
nvidia-nvjitlink-cu12 12.8.93
nvidia-nvtx-cu12 12.8.90
sentencepiece 0.2.1
setuptools 80.9.0
shellingham 1.5.4
soundfile 0.13.1
soxr 1.0.0
sympy 1.14.0
tensorboardX 2.6.4
threadpoolctl 3.6.0
tokenizers 0.22.1
torch 2.8.0
torch-complex 0.4.4
torchaudio 2.8.0
tqdm 4.67.1
transformers 4.57.3
triton 3.4.0
typer 0.20.1
typing_extensions 4.15.0
umap-learn 0.5.9.post2
urllib3 2.6.2