RESEARCH
Accelerating PayPal's Commerce Agent with Speculative Decoding: An Empirical Study on EAGLE3 with Fine-Tuned Nemotron Models
arXiv cs.LG · April 23, 2026
This paper evaluates speculative decoding with EAGLE3 as an inference-time optimization for PayPal's Commerce Agent, which is powered by fine-tuned Nemotron models. The study reports a 22–49% increase in throughput and an 18–33% reduction in latency, with no additional hardware cost.
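As a rough illustration of why speculative decoding speeds up inference without changing outputs, here is a minimal sketch of generic greedy draft-then-verify decoding. It is not EAGLE3 and not PayPal's implementation. EAGLE-family methods draft with a lightweight head that reads the target model's internal features rather than with a separate small model, and production systems verify drafts in a single batched forward pass. The functions `speculative_decode`, `target`, `good_draft`, and `bad_draft` are hypothetical toy stand-ins, and each "model" is just a function from a token prefix to its greedy next token.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for a model: maps a token prefix to its greedy next token.
NextToken = Callable[[List[int]], int]


def speculative_decode(
    target: NextToken,
    draft: NextToken,
    prompt: List[int],
    max_new_tokens: int,
    k: int = 4,
) -> Tuple[List[int], int]:
    """Greedy draft-then-verify decoding (generic sketch, not EAGLE3).

    Returns the generated tokens and the number of verification rounds.
    In a real system each round would be one batched target forward pass.
    """
    tokens = list(prompt)
    rounds = 0
    while len(tokens) - len(prompt) < max_new_tokens:
        # 1. Cheap model proposes k tokens autoregressively.
        drafted: List[int] = []
        for _ in range(k):
            drafted.append(draft(tokens + drafted))
        # 2. Target model checks each drafted position in order.
        rounds += 1
        accepted: List[int] = []
        for tok in drafted:
            expected = target(tokens + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                # First mismatch: keep the target's own token, discard the rest.
                accepted.append(expected)
                break
        else:
            # Every draft token matched: the target adds one "bonus" token.
            accepted.append(target(tokens + accepted))
        tokens.extend(accepted)
    # Trim any overshoot from the final round.
    return tokens[len(prompt):][:max_new_tokens], rounds


# Toy "models" (hypothetical): the target counts up modulo 10.
def target(p: List[int]) -> int:
    return (p[-1] + 1) % 10


def good_draft(p: List[int]) -> int:
    return (p[-1] + 1) % 10  # always agrees with the target


def bad_draft(p: List[int]) -> int:
    return (p[-1] + 2) % 10  # never agrees with the target


out, rounds = speculative_decode(target, good_draft, [0], max_new_tokens=10)
print(out, rounds)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 0] 2
```

Every accepted token equals the target's own greedy choice, so the output matches ordinary greedy decoding. The draft only changes how many target rounds are needed. With a perfect draft, 10 tokens take 2 rounds. With a draft that is always wrong, each round yields one token, which is the same as running the target alone.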
Performance benchmarking, LLM optimization, Inference acceleration, Large language models, Speculative decoding