d-Matrix announced that its Corsair AI inference accelerator platform has entered full production, with volume shipments scheduled to begin this summer for select hyperscalers, neocloud providers, and frontier AI laboratories. The company said demand has accelerated as enterprises deploy increasingly latency-sensitive agentic AI applications, including coding assistants, voice agents, and interactive AI systems that require rapid token generation and low response times.
The company positions Corsair as a complement to GPUs rather than a replacement. In heterogeneous AI clusters, GPUs handle the compute-intensive prefill phase while Corsair accelerators execute the decode phase of inference. d-Matrix cited independent testing showing that pairing Corsair accelerators with GPUs reduced response times from approximately 24 seconds to less than two seconds in speculative decoding workloads. The platform combines an SRAM-based in-memory compute architecture with LPDDR5 memory instead of relying on HBM and its associated packaging, a design the company says improves supply chain predictability and reduces manufacturing complexity.
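As a quick check on the cited figures, a drop from roughly 24 seconds to under two seconds implies a speedup of at least 12x, which is consistent with the greater-than-10x improvement listed in the highlights. The sketch below is only illustrative arithmetic: the function name is hypothetical, and it assumes "less than two seconds" can be treated as a two-second ceiling.

```python
def speedup_lower_bound(baseline_s: float, improved_upper_s: float) -> float:
    """Minimum speedup when the improved time is at most improved_upper_s."""
    if baseline_s <= 0 or improved_upper_s <= 0:
        raise ValueError("times must be positive")
    return baseline_s / improved_upper_s


# Figures as reported in the article (both are approximate).
BASELINE_SECONDS = 24.0      # response time before pairing with Corsair
PAIRED_UPPER_SECONDS = 2.0   # assumption: "less than two seconds" capped at 2 s

bound = speedup_lower_bound(BASELINE_SECONDS, PAIRED_UPPER_SECONDS)
print(f"Speedup is at least {bound:.0f}x")  # prints "Speedup is at least 12x"
```

Because the paired response time is reported only as an upper limit, the actual speedup in the cited test could be higher than this bound.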
To support large-scale deployments, d-Matrix is offering its SquadRack reference architecture, developed in collaboration with Arista, Broadcom, and Supermicro. The company also highlighted its April acquisition of GigaIO’s data center business, which added rack-scale systems expertise and field deployment capabilities. Corsair is manufactured using TSMC’s N6 process technology through a partnership with Alchip Technologies, and the company said supply agreements are in place to support production ramp requirements.
- Corsair AI inference accelerator platform enters full production
- Volume shipments begin this summer
- Targets hyperscalers, neoclouds, and frontier AI labs
- Designed for heterogeneous AI clusters combining GPUs and inference accelerators
- Independent testing showed a greater than 10x reduction in inference response time
- Built on TSMC N6 process technology
- Uses SRAM-based in-memory compute architecture
- Avoids HBM and CoWoS packaging dependencies
- SquadRack integrates Corsair, JetStream networking, and Aviator software
- April acquisition of GigaIO data center business expands rack-scale deployment capabilities
“We built Corsair specifically for this moment, the Age of AI Inference,” said Sid Sheth, founder and CEO of d-Matrix. “The applications that matter most today — agentic AI, interactive coding, real-time voice agents — live or die on latency.”
🌐 Analysis
The announcement reflects a broader shift in AI infrastructure. While much of the industry’s attention has focused on training clusters built around increasingly powerful GPUs, inference is emerging as the dominant long-term workload. As AI models are embedded in enterprise applications and consumer services, operators are looking to cut latency and operating costs while scaling inference capacity. That creates an opening for specialized inference accelerators designed to work alongside GPUs rather than compete directly with them.
The production ramp also follows d-Matrix’s April acquisition of GigaIO’s data center business. GigaIO built its reputation around composable and disaggregated infrastructure architectures that allow compute, memory, and accelerators to be dynamically pooled and allocated. The acquisition gives d-Matrix experienced rack-scale systems engineers, deployment expertise, and integration capabilities that complement its silicon portfolio. Combined with SquadRack, the move positions d-Matrix to deliver complete rack-level inference systems rather than standalone accelerator cards, aligning with a broader industry trend toward integrated AI infrastructure platforms.




