Motivation
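Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads (https://github.com/FasterDecoding/Medusa). Medusa adds extra decoding heads on top of the base model that predict several future tokens in parallel; the base model then verifies these candidates in a single forward pass, so more than one token can be accepted per decoding step. Because this reduces the number of sequential decoding steps, the method can substantially improve inference speed.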
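As a rough illustration of the draft-then-verify idea behind Medusa (extra heads guess upcoming tokens, the base model keeps the longest prefix of guesses that matches its own greedy choice), here is a minimal toy sketch. It is a simplification, not the Medusa repository's API: the function names (`medusa_greedy_step`, `generate`, `toy_base_next`, `toy_heads`) are hypothetical, a single proposal function stands in for all heads, and only one candidate per position is checked. The actual method uses several top-k candidates per head arranged in a tree with tree attention, and also offers a "typical acceptance" scheme instead of exact greedy matching.

```python
from typing import Callable, List, Tuple

Token = int
# Hypothetical toy stand-ins for models: functions from a token prefix to token(s).
NextToken = Callable[[List[Token]], Token]
Proposals = Callable[[List[Token]], List[Token]]


def medusa_greedy_step(prefix: List[Token], base_next: NextToken,
                       head_proposals: Proposals) -> List[Token]:
    """Run one decoding step and return the tokens it produces (always at least 1)."""
    draft = head_proposals(prefix)  # guessed tokens for the next few positions
    accepted: List[Token] = []
    ctx = list(prefix)
    # A real implementation scores prefix + draft in ONE batched forward pass;
    # the sequential calls below compute the same greedy targets, for clarity.
    for guess in draft:
        target = base_next(ctx)
        if guess != target:
            accepted.append(target)  # first mismatch: keep the base model's token
            return accepted
        accepted.append(guess)
        ctx.append(guess)
    accepted.append(base_next(ctx))  # every guess matched: one extra base token
    return accepted


def generate(prefix: List[Token], n_new: int, base_next: NextToken,
             head_proposals: Proposals) -> Tuple[List[Token], int]:
    """Decode n_new tokens; also return how many decoding steps were needed."""
    out = list(prefix)
    steps = 0
    while len(out) - len(prefix) < n_new:
        out.extend(medusa_greedy_step(out, base_next, head_proposals))
        steps += 1
    return out[: len(prefix) + n_new], steps


def toy_base_next(ctx: List[Token]) -> Token:
    """Toy 'base model': the next token is always the previous token plus one, mod 10."""
    return (ctx[-1] + 1) % 10


def toy_heads(ctx: List[Token]) -> List[Token]:
    """Toy 'heads': the first two guesses are right, the third is deliberately wrong."""
    last = ctx[-1]
    return [(last + 1) % 10, (last + 2) % 10, (last + 4) % 10]


tokens, steps = generate([3], 6, toy_base_next, toy_heads)
print(tokens, steps)  # [3, 4, 5, 6, 7, 8, 9] 2
```

Plain greedy decoding with `toy_base_next` would need 6 steps to produce the same 6 tokens; here each step yields 3 tokens, so 2 steps suffice. Because every accepted token must equal the base model's greedy choice, the output is identical to greedy decoding; only the number of sequential steps changes.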
Related resources
https://github.com/FasterDecoding/Medusa