In llama2_split, at line 69, the original code is
causal_mask = self._update_causal_mask(attention_mask, inputs_embeds, cache_position, past_seen_tokens)
There are two questions about this call. First, shouldn't it also pass an output_attentions parameter? Second, past_seen_tokens is an int, but the fourth argument to _update_causal_mask should be a Cache object (past_key_values).
Did the author miss this here?
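For reference, a minimal sketch of what the corrected call might look like, assuming the _update_causal_mask signature from recent transformers releases (which takes the past_key_values Cache plus an output_attentions flag); the surrounding variable names are taken from the usual LlamaModel.forward context and may differ in llama2_split:

```python
# Hypothetical corrected form of line 69 (a sketch, not the author's code).
# past_seen_tokens is derived from the Cache, but the Cache itself is what
# _update_causal_mask expects as its fourth argument.
past_seen_tokens = past_key_values.get_seq_length() if past_key_values is not None else 0
causal_mask = self._update_causal_mask(
    attention_mask,
    inputs_embeds,
    cache_position,
    past_key_values,    # the Cache object, not the int past_seen_tokens
    output_attentions,  # the flag missing from the quoted call
)
```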