Nvidia has expanded its Vera Rubin AI platform with a new component: the Groq 3 LPU, an inference accelerator equipped with 500 MB of on-chip SRAM that delivers very high bandwidth for latency-sensitive AI tasks. Unveiled at GTC, this chip complements Rubin’s ecosystem, which already includes GPUs, CPUs, smart NICs, and dedicated network switches, powering next-generation AI data centers that Nvidia calls “factories.”
The Groq 3 LPU’s SRAM is modest in capacity compared to Rubin GPUs’ 288 GB of HBM4, but it offers an astonishing 150 TB/s of bandwidth – nearly seven times HBM4’s 22 TB/s. This makes it particularly suited to inference workloads that demand rapid token generation and low-latency interaction at scale. By emphasizing SRAM over traditional high-capacity memory, Nvidia is targeting AI applications where bandwidth, not sheer size, bottlenecks performance.
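To see why bandwidth rather than capacity sets the ceiling here, a rough back-of-envelope sketch helps: in a bandwidth-bound decode step, the maximum token rate is approximately memory bandwidth divided by the bytes moved per generated token. The per-token byte figure below is a hypothetical placeholder (real deployments shard weights across many devices), so treat this only as an illustration of the ratio between the two bandwidth figures cited above.

```python
# Illustrative back-of-envelope estimate (not from the article): for a
# memory-bandwidth-bound decode step, the token-rate ceiling is roughly
# bandwidth / bytes moved per generated token.

def max_tokens_per_second(bandwidth_tb_s: float, bytes_per_token_gb: float) -> float:
    """Upper bound on decode rate when memory bandwidth is the only bottleneck."""
    return (bandwidth_tb_s * 1e12) / (bytes_per_token_gb * 1e9)

# Hypothetical assumption: ~70 GB of weight data streamed per token
# (e.g. a 70B-parameter model at 8-bit weights, ignoring sharding and interconnect).
BYTES_PER_TOKEN_GB = 70

hbm4_bound = max_tokens_per_second(22, BYTES_PER_TOKEN_GB)    # HBM4 figure cited in the article
sram_bound = max_tokens_per_second(150, BYTES_PER_TOKEN_GB)   # Groq 3 LPU SRAM figure cited above

print(f"HBM4-bound decode ceiling: ~{hbm4_bound:,.0f} tokens/s")
print(f"SRAM-bound decode ceiling: ~{sram_bound:,.0f} tokens/s")
print(f"Bandwidth advantage:       ~{sram_bound / hbm4_bound:.1f}x")
```

Whatever the per-token byte count, the ratio between the two ceilings tracks the bandwidth ratio, roughly 6.8x, which is where the “nearly seven times” figure comes from.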
Groq 3 LPU enhances Nvidia Rubin AI platform’s inference capabilities
Rubin, already a multi-chip platform featuring the proprietary Vera CPU, Rubin GPUs, NVLink 6 switches, ConnectX 9 smart NICs, Bluefield 4 data processing units, and Spectrum-X optical switches, now integrates the Groq 3 LPU as a new building block for rack-level to factory-scale AI deployments. Nvidia’s CEO Jensen Huang pitches this platform as the backbone for the next wave of AI systems arriving later this year.
Shift from CPX inference accelerator to Groq 3 LPU
Interestingly, Nvidia is pivoting some focus away from its CPX inference accelerator, potentially replacing it with the Groq 3 LPU in certain roles. The shift is logical given that Groq 3 can deliver similar inference acceleration without relying on the large GDDR7 memory pools that CPX modules demand. It reflects Nvidia’s broader strategy of optimizing inference efficiency amid growing memory constraints in AI hardware.
Combining GPUs, CPUs, and Groq 3 LPU for optimized AI workloads
Nvidia’s integration of Groq’s intellectual property signals an aggressive push to differentiate Rubin from other AI platforms. By combining traditional GPU compute, specialized CPUs, smart networking, and now SRAM-heavy inference accelerators, Nvidia is building a highly diverse hardware stack tailored to meet the evolving demands of large language models and generative AI.
While Groq itself was an independent startup focused on inference accelerators, its technology now forms a core part of Nvidia’s vision to deliver scalable, low-latency AI systems. With Rubin’s modular approach, enterprises will have more granular control over balancing compute, memory, and interconnect resources – a critical competitive advantage as AI workloads become more complex and memory bandwidth becomes a chokepoint.
Outlook for Groq 3 LPU adoption in AI infrastructure
The coming months at GTC and beyond will reveal how Nvidia customers adopt the Groq 3 LPU in production, and whether it reshapes an inference acceleration landscape dominated so far by GPU-heavy approaches and emerging CPU-based alternatives. For now, Nvidia’s customers and AI infrastructure builders have yet another tool aimed squarely at pushing inference speed and efficiency into new territory.