AWS benchmarks show that speculative decoding with vLLM on Trainium2 reduces inter-token latency by up to 3x for decode-heavy workloads.
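
For context, speculative decoding pairs a small draft model with the large target model so that several draft tokens can be verified in a single forward pass, which is what drives down inter-token latency. Below is a minimal sketch of enabling it in vLLM; the `speculative_config` keyword and both model names are assumptions (vLLM's speculative-decoding parameters have changed across releases, these are not the models AWS benchmarked, and Trainium2/Neuron-specific device setup is omitted):

```python
# Hedged sketch: speculative decoding in vLLM with a small draft model.
# Parameter names and models are assumptions, not the AWS benchmark setup.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-70B-Instruct",  # large target model (placeholder)
    speculative_config={
        # Small draft model that proposes tokens for the target to verify.
        "model": "meta-llama/Llama-3.1-8B-Instruct",  # placeholder
        # Number of draft tokens proposed per verification step.
        "num_speculative_tokens": 5,
    },
)

# Decode-heavy request: short prompt, long generation.
outputs = llm.generate(
    ["Summarize speculative decoding in one paragraph."],
    SamplingParams(max_tokens=256),
)
print(outputs[0].outputs[0].text)
```

The speedup depends on how often the target model accepts the draft tokens, so gains are largest when the draft model is well matched to the target and the workload is dominated by decode steps rather than prefill.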