Alibaba Cloud has revealed a new GPU pooling system that cut the number of Nvidia accelerators needed for large-scale inference by more than 80%. The system, known as Aegaeon, was presented at the 2025 Symposium on Operating Systems Principles (SOSP) in Korea and piloted in Alibaba's own production environment. By allowing multiple large language models to share a single GPU, it shrinks the hardware footprint for inference workloads to a fraction of what was previously required.
The company claims it served dozens of LLMs…
