Could you take a look at the following news?
https://www.reddit.com/r/singularity/comments/1c8kdbo/emad_llama_3_70b_just_a_casual_3000_tokenssecond/
https://developer.nvidia.com/blog/boost-llama-3-3-70b-inference-throughput-3x-with-nvidia-tensorrt-llm-speculative-decoding/
Alternatively, could we tune different parameters (e.g., batch size) to adjust the throughput results on our side? Thanks.
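Since the question is about how batch size moves the tokens/second numbers, here is a minimal sketch of a throughput sweep. The engine call is a stand-in (`fake_engine` is hypothetical, not a real TensorRT-LLM or any library API); swap in your actual batched-generation call. The toy cost model only assumes that per-step cost grows sub-linearly with batch size, which is why larger batches show higher aggregate tokens/second.

```python
import time

def measure_throughput(run_batch, batch_size, new_tokens=128):
    """Return decoded tokens/second for one batched generation call."""
    start = time.perf_counter()
    run_batch(batch_size, new_tokens)  # one call decoding new_tokens for every sequence
    elapsed = time.perf_counter() - start
    return (batch_size * new_tokens) / elapsed

def fake_engine(batch_size, new_tokens):
    # Hypothetical stand-in for a real inference engine call.
    # Assumed cost model: each decode step gets slightly more expensive
    # with batch size, but far less than linearly, so batching helps.
    time.sleep(0.001 * new_tokens * (1 + 0.1 * batch_size))

if __name__ == "__main__":
    for bs in (1, 8, 32):
        tps = measure_throughput(fake_engine, bs, new_tokens=16)
        print(f"batch={bs:3d}  ~{tps:,.0f} tok/s")
```

With a real engine, the same loop lets you compare batch sizes (and other knobs such as speculative-decoding settings) under identical prompt and output lengths, so the numbers are comparable to the ones in the linked posts.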