VAST Data Reworks Inference Architecture for Agentic AI with NVIDIA

VAST Data has redesigned its AI inference architecture to support long-lived, agentic workloads by integrating its AI Operating System directly with NVIDIA’s data-center networking stack. The company announced that VAST AI OS now runs natively on NVIDIA BlueField-4 DPUs as part of the NVIDIA Inference Context Memory Storage Platform. The architecture targets large-scale inference environments where models operate across long sessions, multiple turns, and multiple agents, shifting the performance focus from raw GPU compute to how efficiently inference context is stored, shared, and reused.

As inference evolves beyond stateless prompts, VAST argues that keeping context local to GPUs no longer scales. The updated design embeds storage and data services directly inside GPU servers as well as dedicated data nodes, removing traditional client-server contention and reducing data copies that increase time-to-first-token as concurrency rises. Using VAST’s Disaggregated Shared-Everything (DASE) architecture with NVIDIA Spectrum-X Ethernet, the system exposes a shared, globally coherent key-value cache across nodes with deterministic access characteristics.
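
To make the idea of a shared, reusable inference context concrete, the following is a minimal sketch in Python. It is not VAST's or NVIDIA's API. The `SharedContextStore` class, the `BLOCK` size, and the `serve` helper are hypothetical names invented for illustration, and an in-process dictionary stands in for the networked store. The sketch assumes context is cached in fixed-size token blocks keyed by a chained hash of the prefix, a common approach to prefix caching, so that any node seeing the same conversation prefix can reuse earlier work instead of recomputing it.

```python
import hashlib
from typing import Dict, List, Optional, Tuple

BLOCK = 4  # tokens per cached block (hypothetical; chosen small for readability)


def block_keys(tokens: List[int]) -> List[str]:
    """Return one chained hash per full block of tokens.

    Each key covers every token up to the end of its block, so two
    requests produce the same key only if they share the same prefix.
    """
    keys = []
    h = hashlib.sha256()
    full = len(tokens) - len(tokens) % BLOCK  # ignore a trailing partial block
    for i in range(0, full, BLOCK):
        h.update(repr(tokens[i:i + BLOCK]).encode())
        keys.append(h.hexdigest())
    return keys


class SharedContextStore:
    """Stand-in for a store every GPU node can reach (here, a plain dict)."""

    def __init__(self) -> None:
        self._blocks: Dict[str, bytes] = {}

    def put(self, key: str, kv: bytes) -> None:
        self._blocks[key] = kv

    def get(self, key: str) -> Optional[bytes]:
        return self._blocks.get(key)


def serve(store: SharedContextStore, tokens: List[int]) -> Tuple[int, int]:
    """Simulate one request and return (tokens_reused, tokens_computed)."""
    keys = block_keys(tokens)
    hits = 0
    for key in keys:
        if store.get(key) is None:
            break  # the first miss ends the reusable prefix
        hits += 1
    for key in keys[hits:]:
        store.put(key, b"kv")  # placeholder for real key/value tensors
    reused = hits * BLOCK
    return reused, len(tokens) - reused


store = SharedContextStore()
first_turn = list(range(10))
print(serve(store, first_turn))               # nothing cached yet
print(serve(store, first_turn + [10, 11, 12]))  # follow-up turn reuses the prefix
```

In this toy run the follow-up turn recomputes only the tokens that fall outside previously cached blocks, which is the effect the article attributes to sharing context across nodes rather than keeping it local to one GPU.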

VAST positions the platform as a foundation for production inference as AI services move into regulated and revenue-generating deployments. By treating inference context as shared infrastructure, the AI OS adds policy controls, isolation, auditability, and lifecycle management while maintaining high-speed access to KV cache. The company says this approach helps reduce idle GPU time and improves infrastructure efficiency as context sizes and concurrent sessions increase.

• Runs VAST AI Operating System natively on NVIDIA BlueField-4 DPUs
• Collapses traditional storage tiers into a shared, pod-scale KV cache
• Enables direct GPU-to-NVMe access over RDMA Ethernet fabrics
• Targets predictable latency for long-context, multi-turn, and multi-agent inference
• Adds policy, security, and lifecycle controls for production environments

“Inference is becoming a memory system, not a compute job,” said John Mao, Vice President of Global Technology Alliances at VAST Data. “If context isn’t available on demand, GPUs idle and economics collapse. With the VAST AI Operating System on NVIDIA BlueField-4, we’re turning context into shared infrastructure built to stay predictable as agentic AI scales.”

🌐 Analysis

The announcement highlights a broader shift in AI infrastructure, where memory systems and data movement increasingly define inference performance. It also aligns with NVIDIA’s strategy of extending its platform beyond GPUs into DPUs and Ethernet fabrics that support scalable, multi-tenant AI factories, where efficient context sharing becomes central to system design.

VAST Data is a privately held, remote-first data infrastructure company. Its corporate headquarters is listed in New York City, and it maintains a significant engineering presence in Israel along with distributed teams across North America and Europe. The company was founded in 2016 by Renen Hallak (CEO), Jeff Denworth (President), and Shachar Fienblit (CTO), who previously held senior roles at enterprise storage vendors including XtremIO (Hallak), DDN (Denworth), and Kaminario (Fienblit). VAST does not publicly disclose headcount; industry estimates commonly cited for 2025 range from 400 to 700 employees. The company has raised more than $380 million in private funding from investors including Tiger Global, Norwest Venture Partners, Goldman Sachs, and Next47, and it reached unicorn valuation in later rounds. It remains independent and focuses on AI-scale data, training, and inference infrastructure.

Tags: Nvidia

Jim Carroll
