At Inflection AI, we recently made a major shift in our infrastructure: we ported our LLM inference stack from NVIDIA GPUs to Intel's Gaudi accelerators. The reasons behind the shift are ones nearly every enterprise faces today: GPU supply shortages, rising prices, and inflexible long-term leases meant that building on NVIDIA hardware could limit our ability, and our customers' ability, to scale.
It was clear we needed a more flexible stack. When assessing the options, Intel rose to the top of the list as it already has the…