At CES 2026, NVIDIA detailed how its new BlueField-4 data processing unit underpins the NVIDIA Inference Context Memory Storage Platform, a storage architecture designed specifically for long-context, multi-turn agentic AI. As AI systems move beyond single-prompt inference toward persistent reasoning, they generate large volumes of key-value (KV) cache data that cannot remain resident in GPU memory without constraining throughput. BlueField-4 addresses this gap by offloading context memory management, sharing, and security into a dedicated storage processor tightly integrated with Rubin-class GPU clusters.
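To give a sense of why KV cache outgrows GPU memory, here is a rough sizing sketch using the standard transformer estimate: two tensors (keys and values) per layer, per KV head, per token. The model shape below is hypothetical, chosen only for illustration, and is not taken from NVIDIA's announcement.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_elem=2):
    """Approximate bytes needed to hold keys and values for every layer and token."""
    # The leading factor of 2 covers the separate key and value tensors.
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)


# Hypothetical model shape (assumption): 80 layers, 8 KV heads,
# head dimension 128, 16-bit elements, one 128,000-token session.
size = kv_cache_bytes(num_layers=80, num_kv_heads=8, head_dim=128, seq_len=128_000)
print(f"{size / 2**30:.1f} GiB for one 128k-token session")
```

Under these assumptions a single long session needs about 39 GiB of cache, so a handful of concurrent agents would already compete with model weights for GPU memory.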
The Inference Context Memory Storage Platform extends effective GPU memory by enabling high-bandwidth, low-latency sharing of KV cache across rack-scale systems. NVIDIA says this approach can deliver up to 5x higher tokens per second and up to 5x better power efficiency than traditional storage paths. BlueField-4 manages KV placement in hardware, which the company says eliminates metadata overhead and reduces data movement between GPUs and storage. Combined with RDMA over Spectrum-X Ethernet, the platform allows AI agents to reuse and persist context across sessions and nodes, improving responsiveness and throughput for multi-agent inference.
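The general idea of spilling KV cache to a shared tier instead of discarding it can be sketched in a few lines. This is a toy software model under stated assumptions, not NVIDIA's implementation: the class `TieredKVCache`, its methods, and the `recompute` callback are hypothetical names, and nothing here calls DOCA, NIXL, or Dynamo.

```python
from collections import OrderedDict


class TieredKVCache:
    """Toy two-tier cache: a small fast tier backed by a larger shared tier."""

    def __init__(self, fast_capacity):
        self.fast_capacity = fast_capacity  # max entries in the fast ("GPU") tier
        self.fast = OrderedDict()           # insertion order doubles as LRU order
        self.shared = {}                    # stand-in for the shared context tier
        self.recomputes = 0                 # how often context had to be rebuilt

    def put(self, prefix_key, kv_blocks):
        self.fast[prefix_key] = kv_blocks
        self.fast.move_to_end(prefix_key)
        # Spill least recently used entries to the shared tier instead of dropping them.
        while len(self.fast) > self.fast_capacity:
            old_key, old_blocks = self.fast.popitem(last=False)
            self.shared[old_key] = old_blocks

    def get(self, prefix_key, recompute):
        if prefix_key in self.fast:
            self.fast.move_to_end(prefix_key)
            return self.fast[prefix_key]
        if prefix_key in self.shared:
            blocks = self.shared.pop(prefix_key)  # reuse persisted context
        else:
            self.recomputes += 1
            blocks = recompute(prefix_key)        # expensive prefill path
        self.put(prefix_key, blocks)
        return blocks


cache = TieredKVCache(fast_capacity=2)
for session in ["a", "b", "c", "a"]:
    cache.get(session, recompute=lambda key: f"kv({key})")
print(cache.recomputes)  # 3
```

Session "a" is evicted when "c" arrives but is restored from the shared tier on its return, so only the three first-time sessions pay the recompute cost. Per the article, the actual platform handles placement in BlueField-4 hardware and moves data over RDMA rather than through a host-side dictionary.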
BlueField-4 represents a generational shift from earlier BlueField DPUs, which focused primarily on infrastructure offload and security. BlueField-1 and -2 concentrated on networking, storage, and security acceleration for cloud and enterprise data centers, while BlueField-3 expanded inline acceleration and isolation for large-scale AI fabrics. With BlueField-4, NVIDIA positions the DPU as a core component of the AI memory hierarchy, optimized for KV cache handling and integrated with the DOCA framework, NIXL library, and NVIDIA Dynamo software to support cluster-level coordination for agentic workloads.
- Purpose-built for AI-native storage supporting long-context, multi-turn inference
- Hardware-accelerated KV cache placement and sharing managed by BlueField-4
- Up to 5x improvement in tokens per second and power efficiency versus traditional storage
- RDMA-based access over Spectrum-X Ethernet for high-bandwidth, low-latency context sharing
- Integrated with NVIDIA DOCA, NIXL, and Dynamo for coordinated inference at scale
- Platform availability targeted for the second half of 2026, with broad storage partner support
“AI is revolutionizing the entire computing stack — and now, storage,” said Jensen Huang, founder and CEO of NVIDIA. “With BlueField-4, NVIDIA and our software and hardware partners are reinventing the storage stack for the next frontier of AI.”
🌐 Analysis
BlueField-4 signals NVIDIA’s view that memory and storage, not just compute, now define AI system scalability. As agentic and mixture-of-experts (MoE) models push context sizes beyond what GPUs can hold economically, dedicated KV-aware storage becomes a performance lever. Competing approaches rely on CPU-centric or software-managed cache tiers, but NVIDIA’s strategy embeds context management directly into the DPU and fabric, reinforcing its rack-scale architecture model alongside Rubin GPUs and NVLink-centric systems.







