
Accelerating PayPal's Commerce Agent with Speculative Decoding: An Empirical Study on EAGLE3 with Fine-Tuned Nemotron Models

arXiv cs.LG · April 23, 2026

This paper evaluates speculative decoding with EAGLE3 as an inference-time optimization for PayPal's Commerce Agent, which is powered by fine-tuned Nemotron models. The study reports a 22-49% increase in throughput and an 18-33% reduction in latency at zero additional hardware cost.

Performance benchmarking · LLM optimization · Inference acceleration · Large language models · Speculative decoding
