MinIO has introduced MemKV, a context memory store that the company says accelerates AI inference by delivering microsecond-scale context retrieval across petabyte-scale infrastructure. The announcement extends MinIO’s AI infrastructure portfolio beyond its AIStor object storage platform into the inference memory layer, targeting the growing operational demands of agentic AI systems. The company said MemKV addresses the “recompute tax” incurred when GPUs must repeatedly regenerate inference context, the key-value (KV) cache, after it has been evicted from memory, which drives up latency, power consumption, and infrastructure costs.
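A rough sense of the recompute tax comes from the common approximation that a transformer forward pass costs about 2 FLOPs per parameter per token. The sketch below applies that rule to a 128K-token context. The 70B-parameter model and the 1 PFLOP/s aggregate sustained throughput are hypothetical placeholders, not figures from MinIO, and the approximation ignores the attention term, so it understates the cost at long context lengths.

```python
"""Rough estimate of the "recompute tax": compute spent re-running prefill to
rebuild an evicted KV cache. All model and hardware figures are hypothetical
placeholders, not MinIO or NVIDIA numbers."""


def prefill_flops(num_params: float, num_tokens: int) -> float:
    # Rule of thumb: ~2 FLOPs per parameter per token for a forward pass.
    # The context-length-dependent attention term is ignored, so this is
    # an underestimate for very long contexts.
    return 2.0 * num_params * num_tokens


def recompute_seconds(num_params: float, num_tokens: int,
                      sustained_flops_per_s: float) -> float:
    # Time to regenerate the context at an assumed sustained throughput.
    return prefill_flops(num_params, num_tokens) / sustained_flops_per_s


if __name__ == "__main__":
    params = 70e9          # hypothetical 70B-parameter model
    tokens = 128 * 1024    # 128K-token context, as in MinIO's example
    throughput = 1e15      # hypothetical 1 PFLOP/s aggregate sustained
    secs = recompute_seconds(params, tokens, throughput)
    print(f"~{secs:.1f} s of compute to rebuild one 128K-token context")
```

Under these assumptions, rebuilding a single 128K-token context costs roughly 18 seconds of compute at that throughput, and the cost is paid again every time the context is evicted and requested.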
MemKV is engineered specifically for NVIDIA’s BlueField-4 STX architecture and integrates with NVIDIA Dynamo and NVIDIA NIXL software components. The platform introduces a shared memory tier that allows GPU clusters to access persistent inference context without relying solely on expensive high-bandwidth memory (HBM) or DRAM. MinIO said the architecture combines microsecond responsiveness with petabyte-scale capacity, enabling AI clusters to retain and share long-context reasoning data during inference workloads. The company positions MemKV as a new “G3.5” layer in the GPU memory hierarchy, optimized for AI inference rather than traditional storage operations.
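To make the tiering idea concrete, here is a minimal Python sketch of a tiered KV-cache lookup with a recompute fallback. The tier labels (G1 GPU HBM, G2 host DRAM, G3 local NVMe, G4 network storage) are our assumption, loosely following the G1 to G4 naming used in NVIDIA Dynamo materials, with G3.5 placed between local NVMe and network storage. `Tier` and `fetch_context` are hypothetical names; this is not MemKV’s API or its actual placement policy, and capacity limits and eviction are omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Tier:
    """One level of a hypothetical KV-cache memory hierarchy."""
    name: str
    blocks: Dict[str, bytes] = field(default_factory=dict)

    def get(self, key: str) -> Optional[bytes]:
        return self.blocks.get(key)


def fetch_context(key: str, tiers: List[Tier],
                  recompute: Callable[[str], bytes]) -> Tuple[bytes, str]:
    """Search tiers from fastest (index 0) to slowest for a cached KV block.

    A hit in a slower tier is copied into the fastest tier so the next lookup
    is cheap. Only a miss in every tier pays the "recompute tax".
    Capacity limits and eviction are deliberately left out of this sketch.
    """
    for tier in tiers:
        data = tier.get(key)
        if data is not None:
            if tier is not tiers[0]:
                tiers[0].blocks[key] = data  # promote into the fastest tier
            return data, tier.name
    data = recompute(key)  # expensive path: re-run prefill
    tiers[0].blocks[key] = data
    return data, "recomputed"


if __name__ == "__main__":
    hierarchy = [
        Tier("G1 GPU HBM"),
        Tier("G2 host DRAM"),
        Tier("G3 local NVMe"),
        Tier("G3.5 shared context tier"),
        Tier("G4 network storage"),
    ]
    # Hypothetical case: another GPU left this session's context in the shared tier.
    hierarchy[3].blocks["session-42"] = b"kv-bytes"
    for _ in range(2):
        _, source = fetch_context("session-42", hierarchy, lambda k: b"rebuilt")
        print(source)
    _, source = fetch_context("session-99", hierarchy, lambda k: b"rebuilt")
    print(source)
```

Running it prints `G3.5 shared context tier`, then `G1 GPU HBM` (the block was promoted on the first hit), then `recomputed` for a session that no tier holds. The point of a shared tier in this model is to turn would-be recomputes into slower-tier hits.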
According to MinIO, MemKV bypasses conventional storage bottlenecks by moving key-value cache data directly between NVMe storage and GPU memory over RDMA. The data path avoids HTTP-based protocols, file-system translation layers, and intermediary storage servers. MinIO said MemKV uses GPU-native block sizes of 2 MB to 16 MB and is optimized for NVIDIA Spectrum-X Ethernet fabrics and PCIe Gen6 connectivity. In representative benchmark testing, the company reported substantial reductions in time-to-first-token (TTFT) latency. It also estimated that a deployment with 128 GPUs and 128K-token context lengths could raise GPU utilization from approximately 50% to more than 90%, translating into roughly $2 million in annual compute savings.
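The 2 MB to 16 MB block sizes are easier to judge next to the size of a long-context KV cache. Each token stores one key vector and one value vector per KV attention head in every layer, so the cache grows linearly with context length. The sketch below assumes a hypothetical model with 80 layers, 8 KV heads (grouped-query attention), a head dimension of 128, and FP16 values, and treats the block sizes as MiB; none of these parameters come from MinIO.

```python
MiB = 1024 ** 2
GiB = 1024 ** 3


def kv_bytes_per_token(layers: int, kv_heads: int, head_dim: int,
                       bytes_per_elem: int) -> int:
    # Factor of 2: one key vector and one value vector per KV head per layer.
    return 2 * layers * kv_heads * head_dim * bytes_per_elem


def blocks_needed(total_bytes: int, block_bytes: int) -> int:
    # Integer ceiling division: the final block may be only partially filled.
    return (total_bytes + block_bytes - 1) // block_bytes


if __name__ == "__main__":
    # Hypothetical model shape, not tied to any MemKV configuration.
    per_token = kv_bytes_per_token(layers=80, kv_heads=8, head_dim=128,
                                   bytes_per_elem=2)  # FP16
    context_tokens = 128 * 1024
    total = per_token * context_tokens
    print(f"KV cache per token: {per_token / 1024:.0f} KiB")
    print(f"KV cache for 128K tokens: {total / GiB:.1f} GiB")
    for block_mib in (2, 16):
        n = blocks_needed(total, block_mib * MiB)
        print(f"{block_mib:>2} MiB blocks: {n:,}")
```

With these assumptions a single 128K-token context occupies 40 GiB (320 KiB per token), which splits into 20,480 blocks of 2 MiB or 2,560 blocks of 16 MiB. Larger blocks mean fewer, bigger transfers for the same context.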
• Designed for AI inference and agentic AI workloads
• Built for NVIDIA BlueField-4 STX architecture
• Native integration with NVIDIA Dynamo and NVIDIA NIXL
• Supports petabyte-scale shared context memory
• Uses end-to-end RDMA transport between NVMe and GPU memory
• Optimized for NVIDIA Spectrum-X Ethernet and PCIe Gen6
• GPU-native block sizes ranging from 2 MB to 16 MB
• Estimated GPU utilization improvement from ~50% to >90% for a 128-GPU deployment with 128K-token contexts
• Estimated annual compute savings of approximately $2 million for a 128-GPU deployment
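MinIO has not published how it derived the $2 million figure. The sketch below shows two plausible ways to connect the utilization claim to a dollar amount and backs out the GPU-hour price each reading would imply. Both readings, the function names, and the 8,760-hour year are our assumptions, not MinIO’s methodology.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760; ignores leap years


def recovered_gpus(num_gpus: int, util_before: float, util_after: float) -> float:
    # Reading 1 (assumption): extra useful capacity on the same fleet,
    # expressed in GPU-equivalents.
    return num_gpus * (util_after - util_before)


def avoided_gpus(num_gpus: int, util_before: float, util_after: float) -> float:
    # Reading 2 (assumption): GPUs you would have to add at the old
    # utilization to match the useful work done at the new utilization.
    return num_gpus * util_after / util_before - num_gpus


def implied_hourly_rate(annual_savings: float, gpu_equivalents: float) -> float:
    # GPU-hour price at which that capacity is worth `annual_savings` per year.
    return annual_savings / (gpu_equivalents * HOURS_PER_YEAR)


if __name__ == "__main__":
    gpus, before, after, savings = 128, 0.50, 0.90, 2_000_000
    for label, equiv in (("recovered", recovered_gpus(gpus, before, after)),
                         ("avoided", avoided_gpus(gpus, before, after))):
        rate = implied_hourly_rate(savings, equiv)
        print(f"{label}: {equiv:.1f} GPU-equivalents -> ${rate:.2f}/GPU-hour")
```

Under the first reading (51.2 GPU-equivalents recovered) the claim implies roughly $4.46 per GPU-hour; under the second (102.4 GPUs avoided), roughly $2.23. Which reading, if either, matches MinIO’s estimate depends on pricing assumptions the announcement does not state.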
“The industry has been papering over context loss for years because, at small scale, you may be able to absorb the recompute tax and move on. At the GPU density hyperscalers and neoclouds are building toward, that is no longer true,” said AB Periasamy, co-founder and CEO of MinIO. “Yield economics at this scale demand something purpose-built for the inference data path. MemKV was designed for exactly this.”
🌐 Analysis: The launch of MemKV reflects a broader shift in AI infrastructure economics from training-centric optimization toward inference efficiency and token throughput. As AI deployments evolve into long-context agentic systems, infrastructure bottlenecks increasingly emerge in memory movement and context persistence rather than raw GPU compute. Vendors across the AI stack are now introducing specialized memory hierarchies, RDMA fabrics, and GPU-adjacent storage architectures to reduce latency and maximize utilization of increasingly expensive accelerated compute clusters.
🌐 The announcement also highlights NVIDIA’s expanding influence over the end-to-end AI infrastructure stack. BlueField-4 STX, Spectrum-X Ethernet, Dynamo, and NIXL collectively form part of NVIDIA’s vertically integrated architecture strategy for hyperscale AI factories. MinIO joins a growing ecosystem of infrastructure vendors optimizing specifically for these NVIDIA reference architectures, similar to recent initiatives from WEKA, VAST Data, Hammerspace, and DDN targeting AI inference pipelines and disaggregated memory architectures.
| Profile: MinIO | Details |
|---|---|
| Company | MinIO, Inc. |
| Headquarters | Redwood City, California, USA |
| Founded | 2014 |
| Founders | AB Periasamy and Garima Kapoor |
| Core Focus | High-performance object storage and AI data infrastructure |
| Primary Products | AIStor object storage platform; MemKV context memory store |
| Technology Model | Software-defined, S3-compatible distributed object storage optimized for AI and analytics workloads |
| AI Infrastructure Focus | GPU data pipelines, AI inference memory tiers, high-throughput storage for AI factories |
| Key NVIDIA Integrations | BlueField-4 STX, Spectrum-X Ethernet, Dynamo, NIXL |
| Target Customers | Hyperscalers, neoclouds, enterprise AI platforms, service providers |
| Deployment Scale | Petabyte-scale AI storage and inference environments |
| Recent Milestone | Launch of MemKV, a shared context memory tier for AI inference workloads |
🌐 We’re tracking the latest developments in AI infrastructure, memory fabrics, and GPU networking architectures. Follow our ongoing coverage at: https://convergedigest.com/category/ai-infrastructure/


