vllm-project/vllm

↓ decelerating · vllm-project/vllm

Velocity fell from 21.6 to 14.3 per day, an acceleration of -7.3, following the v0.20.0 release on April 27. That release set CUDA 13.0 as the default and added system_fingerprint, which boosted initial interest that has since tapered off. The likely cause is that the release satisfied outstanding API compatibility needs; Ollama shows a similar pattern, with -5.9 acceleration on a velocity of 11.7 after its v0.21.0 release. vLLM trails llama.cpp's velocity of 26.0 but leads Ollama, a position that points to its throughput advantages. Investors should watch for ROCm expansions, which could reverse the trend in inference serving.
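The acceleration figure is consistent with a simple difference between the two velocity readings. The sketch below assumes that definition; the briefing does not state how acceleration is computed, and the function name is hypothetical.

```python
def acceleration(previous_velocity: float, current_velocity: float) -> float:
    """Change in velocity between two readings (assumed definition, not confirmed by the briefing)."""
    # Round to one decimal place to match the precision used in the briefing.
    return round(current_velocity - previous_velocity, 1)


# vLLM readings quoted above: 21.6 per day before, 14.3 per day now.
vllm_acceleration = acceleration(21.6, 14.3)
print(vllm_acceleration)
```

Under this assumption, the vLLM readings reproduce the reported -7.3.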


From the briefing: 2026-04-28 · AI OSS Momentum Decelerates Across Key Projects