Converge Digest
Wednesday, June 3, 2026

Tensormesh Raises $20M for KV Cache-Based Inference Platform

May 27, 2026
in Start-ups

Tensormesh has raised $20 million in new funding from a strategic investor group that includes AMD Ventures, CoreWeave, and NVentures, extending its seed round and bringing total funding to $24.5 million. At the same time, the San Francisco startup announced general availability of Tensormesh Inference, a SaaS platform designed to improve AI inference efficiency by reusing previously computed model state rather than recomputing the same prompt context across every request. The company says this can reduce latency and GPU spend by up to 10x in enterprise AI deployments.

The core of the platform is KV caching (short for key-value caching), a technique that stores the attention key and value tensors a large language model computes while it processes a prompt. Instead of recalculating system prompts, conversation history, tool definitions, or repeated context windows with every inference call, Tensormesh retrieves that stored state from cache and reuses it, so only the new portion of each request has to be computed. The approach is especially relevant for agentic AI workloads, where long prompts and multi-step reasoning loops repeatedly send overlapping context back into the model. Tensormesh says its platform makes those savings visible through a real-time dashboard showing cache hit rates, token-level cost breakdowns, time to first token, and GPU utilization metrics.
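The reuse idea described above can be illustrated with a toy prefix cache. This is a minimal sketch under stated assumptions, not Tensormesh's or LMCache's implementation: the chunk size, the `ToyPrefixCache` class, and the `chunk_keys` helper are hypothetical names chosen for illustration, and a placeholder string stands in for the real key and value tensors. Each chunk's key hashes the entire prefix up to that chunk, so a chunk can only be reused when everything before it matches as well.

```python
import hashlib
from typing import Dict, List, Tuple

CHUNK = 4  # tokens per cached chunk (illustrative value, not from the article)


def chunk_keys(tokens: List[str]) -> List[str]:
    """Return one key per full chunk; each key covers the whole prefix so far."""
    keys = []
    running = hashlib.sha256()
    full = len(tokens) - len(tokens) % CHUNK  # ignore a trailing partial chunk
    for i in range(0, full, CHUNK):
        running.update("\x00".join(tokens[i:i + CHUNK]).encode())
        running.update(b"\x01")  # chunk boundary marker
        keys.append(running.copy().hexdigest())
    return keys


class ToyPrefixCache:
    """Hypothetical stand-in for a KV cache keyed by prompt prefix."""

    def __init__(self) -> None:
        self.store: Dict[str, str] = {}  # key -> placeholder for KV tensors

    def prefill(self, tokens: List[str]) -> Tuple[int, int]:
        """Return (tokens_reused, tokens_computed) for one request."""
        keys = chunk_keys(tokens)
        hit_chunks = 0
        for key in keys:
            if key not in self.store:
                break  # first miss ends the reusable prefix
            hit_chunks += 1
        for key in keys[hit_chunks:]:
            self.store[key] = "kv-state"  # pretend we computed and stored it
        reused = hit_chunks * CHUNK
        return reused, len(tokens) - reused


system = "you are a helpful assistant that answers briefly".split()  # 8 tokens
cache = ToyPrefixCache()
first = cache.prefill(system + "what is a kv cache".split())
second = cache.prefill(system + "how do gpus work".split())
print(first, second)
```

In this sketch the first request computes every token, while the second reuses the two chunks that cover the shared system prompt and computes only its own question. Production systems work on token IDs and GPU tensors rather than strings, but the lookup pattern is similar in spirit.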

Tensormesh’s announcement stands out because of the strategic mix of backers. Investment from GPU ecosystem players and AI cloud infrastructure operators suggests growing industry interest in software optimization layers that can improve utilization of expensive accelerator infrastructure without changing application code. Tensormesh says new funding will support deeper integrations across AMD, NVIDIA, and CoreWeave environments while continuing development of its open-source LMCache project, which now integrates with vLLM, SGLang, TensorRT, AWS SageMaker, and Oracle OCI Data Science.

  • $20 million new funding; $24.5 million total raised
  • Investors include AMD Ventures, CoreWeave, NVentures, Valley Capital Partners, and Laude Ventures
  • Launch of Tensormesh Inference in general availability
  • Platform uses KV caching to eliminate redundant LLM inference computation
  • Claims reductions of up to 10x in latency and GPU cost
  • Offers both serverless inference and reserved enterprise deployments
  • Introduces pricing model where cached input tokens are billed at $0
  • Built on the company’s open-source LMCache project with 8,000+ GitHub stars
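To see how the $0 price for cached input tokens interacts with cache hit rate, the arithmetic can be sketched as follows. The per-million-token price and the monthly token volume below are hypothetical placeholders, not Tensormesh's published rates; the only detail taken from the announcement is that cached input tokens are billed at $0. Output tokens are left out of this sketch.

```python
def billed_input_cost(input_tokens: int,
                      cache_hit_rate: float,
                      price_per_million_usd: float) -> float:
    """Input-token cost when cached tokens are billed at $0.

    cache_hit_rate is the fraction of input tokens served from cache.
    price_per_million_usd is a hypothetical price for uncached input tokens.
    """
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    uncached_tokens = input_tokens * (1.0 - cache_hit_rate)
    return uncached_tokens * price_per_million_usd / 1_000_000


# Hypothetical workload: 1 billion input tokens at $1.00 per million.
no_cache = billed_input_cost(1_000_000_000, 0.0, 1.00)
with_cache = billed_input_cost(1_000_000_000, 0.75, 1.00)
print(f"no cache: ${no_cache:.2f}, 75% hit rate: ${with_cache:.2f}")
```

Under these assumed numbers, a 75% hit rate cuts the input-token bill to a quarter. The savings scale linearly with hit rate, which is why workloads with long shared prefixes, such as agent loops, stand to benefit most.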
Profile: Tensormesh
Headquarters: San Francisco, California
CEO / Co-Founder: Junchen Jiang
Core Technology: KV cache-based inference optimization
Flagship Platform: Tensormesh Inference
Open Source Project: LMCache
Total Funding: $24.5 million
Key Investors: AMD Ventures, CoreWeave, NVentures, Valley Capital Partners, Laude Ventures
Deployment Models: Serverless inference and reserved enterprise deployments
Primary Focus: Reducing GPU cost and latency for enterprise AI inference

“Tensormesh understood early that enterprises were paying AI systems to recompute the same work again and again, and built foundational infrastructure to eliminate that inefficiency and dramatically improve price-performance,” said Pete Sonsini, co-founder and general partner at Laude Ventures.

🌐 Analysis: KV caching has become one of the most important emerging optimization layers in AI inference infrastructure as enterprises confront the economics of large-scale deployment. While attention often centers on GPUs and model architectures, inference efficiency increasingly depends on software systems that reduce token recomputation, memory movement, and idle accelerator cycles. Tensormesh is entering this space as hyperscalers and AI cloud providers search for ways to stretch GPU capacity without waiting for new silicon supply.

The strategic support from AMD Ventures, NVentures, and CoreWeave also reflects a broader trend: AI infrastructure investment is moving beyond chips into the software layers that govern utilization and economics. As inference workloads expand with agentic AI and long-context models, caching and memory orchestration platforms may become a critical control plane between LLM frameworks and the underlying GPU cluster.

🌐 We’re tracking the latest developments in AI infrastructure. Follow our ongoing coverage at: https://convergedigest.com/category/ai-infrastructure/

Jim Carroll

Editor and Publisher, Converge! Network Digest, Optical Networks Daily - Covering the full stack of network convergence from Silicon Valley


© 2026 Converge Digest - A private dossier for networking and telecoms.
