Tensordyne Tapes Out 3nm Napier AI Inference Processor

Tensordyne, a startup headquartered in Sunnyvale, California, and Munich, Germany, has announced the successful tape-out of its 3nm Napier AI inference processor and unveiled a rack-scale inference architecture that the company says delivers up to 17x more tokens per watt and 13x higher throughput than NVIDIA Blackwell-based systems. The company developed the platform in partnership with Broadcom and HPE Juniper Networks, with TSMC manufacturing the chip on its 3nm process.

Tensordyne’s architecture takes an unconventional approach to AI acceleration by replacing many traditional multiplication operations with logarithmic arithmetic. The company says its proprietary “Pareto” logarithmic number system converts multiplication-heavy neural network operations into simpler addition-based calculations, reducing silicon complexity and power consumption. The Napier processor integrates 144 GB of HBM3e memory and 256 MB of on-chip SRAM, and it delivers up to 2.1 PFLOPS of dense FP8 compute within a 300-watt thermal envelope.
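To make the log-arithmetic idea concrete, here is a minimal Python sketch of a generic logarithmic number system. It is not Tensordyne's Pareto format, whose encoding is not described in this article; the function names (`to_lns`, `lns_mul`, `from_lns`) and the use of floating-point `log2` are illustrative assumptions. Each value is stored as a sign bit plus the base-2 logarithm of its magnitude, so a multiplication reduces to one XOR and one addition.

```python
import math

def to_lns(x):
    """Encode a nonzero real number as (sign_bit, log2 of its magnitude)."""
    if x == 0:
        raise ValueError("zero needs a special encoding in a log number system")
    return (0 if x > 0 else 1, math.log2(abs(x)))

def lns_mul(a, b):
    """Multiply two log-encoded values: XOR the signs, add the logs."""
    return (a[0] ^ b[0], a[1] + b[1])

def from_lns(v):
    """Decode (sign_bit, log2 magnitude) back to an ordinary float."""
    sign_bit, log_mag = v
    return (-1.0 if sign_bit else 1.0) * 2.0 ** log_mag

# 6.0 * -0.5 computed without a multiplier in the log domain.
product = from_lns(lns_mul(to_lns(6.0), to_lns(-0.5)))
print(product)  # approximately -3.0, up to floating-point rounding
```

The trade-off in any such scheme is that addition becomes the harder operation: summing two log-encoded values requires evaluating a function such as log2(1 + 2^d), usually by table lookup or approximation, which is where precision questions tend to arise.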

The flagship deployment configuration is the TDN72 inference pod, which combines 72 processors; a full rack integrates four pods for a total of 288 processors. Tensordyne claims a fully populated rack delivers 608 PFLOPS of FP8 compute and approximately 41.4 TB of HBM3e memory while remaining fully air-cooled. The company says a single Napier rack can serve multi-trillion-parameter AI models at more than 1,000 tokens per second per user and match the throughput of multiple competing AI racks while drawing approximately 120 kW.
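The rack-level figures can be cross-checked against the per-chip numbers with a few lines of Python. This is only a back-of-the-envelope sketch: the constant names are hypothetical, the inputs are the figures quoted in this article, and decimal units (1 TB = 1,000 GB) are assumed.

```python
# Per-chip and per-rack figures as quoted in the article.
CHIPS_PER_POD = 72
PODS_PER_RACK = 4
HBM_GB_PER_CHIP = 144
PFLOPS_PER_CHIP = 2.1      # dense FP8, as quoted (likely rounded)
WATTS_PER_CHIP = 300
CLAIMED_RACK_PFLOPS = 608
CLAIMED_RACK_KW = 120

chips = CHIPS_PER_POD * PODS_PER_RACK
hbm_tb = chips * HBM_GB_PER_CHIP / 1000          # decimal terabytes
rack_pflops = chips * PFLOPS_PER_CHIP
chip_power_kw = chips * WATTS_PER_CHIP / 1000
implied_pflops_per_chip = CLAIMED_RACK_PFLOPS / chips
non_chip_budget_kw = CLAIMED_RACK_KW - chip_power_kw

print(f"processors per rack:       {chips}")
print(f"HBM3e per rack:            {hbm_tb:.2f} TB")
print(f"FP8 compute from 2.1/chip: {rack_pflops:.1f} PFLOPS")
print(f"implied PFLOPS per chip:   {implied_pflops_per_chip:.3f}")
print(f"processor power alone:     {chip_power_kw:.1f} kW")
print(f"remaining rack budget:     {non_chip_budget_kw:.1f} kW")
```

The memory figure works out to 41.47 TB, in line with the quoted 41.4 TB. The compute figure comes to 604.8 PFLOPS rather than 608, which suggests the 2.1 PFLOPS per-chip number is rounded from roughly 2.11. The processors alone would draw 86.4 kW at their full 300 W rating, leaving about 33.6 kW of the 120 kW rack figure for everything else.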

• Completed tape-out of the Napier AI inference processor on TSMC’s 3nm process node

• Claims up to 17x higher tokens-per-watt efficiency versus liquid-cooled NVIDIA GB300 NVL72 systems

• Claims up to 13x higher inference throughput than an NVIDIA GB300 NVL72 cluster

• Uses a proprietary logarithmic number system to reduce multiplication-intensive AI computations

• Integrates 144 GB HBM3e and 256 MB SRAM per processor

• Supports fully air-cooled rack deployments at approximately 120 kW

• Reports more than a dozen Letters of Intent and a sales pipeline exceeding $200 million

• Working with early infrastructure partners including Cirrascale Cloud Services and BlueSky Compute

“The market is hungry for fast AI; customers want speed, but achieving it has always meant accepting prohibitive costs. By optimising math, compute, memory, and networking from first principles, Napier delivers affordable inference without compromising on speed,” said Marc Bolitho, CEO of Tensordyne.

🌐 Analysis

Tensordyne enters one of the most competitive segments of AI infrastructure: large-scale inference. While most AI accelerator vendors focus on larger matrix engines, additional HBM capacity, or faster interconnects, Tensordyne is attempting a more fundamental architectural shift through logarithmic arithmetic. The concept is not new academically, but commercial implementation for hyperscale AI workloads has historically proven difficult because of precision, software compatibility, and ecosystem challenges. The tape-out milestone shows that the company has moved beyond theory and simulation to a design committed to fabrication, though validation on working silicon still lies ahead.

The company’s claims are ambitious and will require independent benchmarking. However, several elements stand out. First, the reported 300W chip power budget is dramatically lower than that of current flagship AI accelerators. Second, the heavy use of SRAM combined with HBM suggests a focus on minimizing the memory bottlenecks that increasingly dominate inference workloads. Third, the air-cooled rack design could appeal to operators seeking alternatives to high-density liquid-cooled AI infrastructure.

Tensordyne also appears to be assembling a notable ecosystem. Broadcom’s involvement provides access to advanced ASIC and packaging technologies, while HPE Juniper Networks contributes expertise in scale-up networking fabrics. Investor Kevin Johnson, former CEO of Starbucks and current Goldman Sachs board member, brings enterprise and financial market credibility as the company prepares for a reported Series D fundraising round. If Tensordyne can validate even a portion of its performance and efficiency claims in customer deployments, it could become one of the more closely watched AI silicon startups heading into 2027.

Note: Tensordyne previously operated under the name Recogni throughout its stealth and early development phases. The company officially rebranded to Tensordyne in late 2025 to mark its commercial transition from conceptual silicon architecture to active hardware production.

Profile: Tensordyne
Corporate Intelligence
Company Roots	Founded as Recogni; rebranded to Tensordyne in late 2025.
Executive Leadership	Marc Bolitho (CEO)
Headquarters	Sunnyvale, California, USA • Munich, Germany
Funding Status	~$176M raised to date; preparing Series D financing round later this year.
Commercial Pipeline	Over $200M in forecasted demand across more than a dozen early customer Letters of Intent (LOIs).
Silicon Microarchitecture
Flagship Compute	Tensordyne Napier AI Inference Chip (Officially taped out)
Core Algorithmic Shift	Proprietary “Pareto” Logarithmic Number System (LNS). Implements logarithmic arithmetic directly in silicon, turning the power-hungry multiplications inside matrix operations into hardware-level additions.
Process Technology	TSMC 3nm (FinFET-based process node)
Processor Spec Sheet	138 Billion Transistors • 144 GB HBM3e Memory • 256 MB On-Chip SRAM • 2.1 PFLOPS Dense FP8 Compute • 300W TDP per package
Rack-Scale Engineering
Rack Configuration	Up to 288 processors per full rack, composed of four modular TDN72 pods (quarter-rack servers with 72 processors each). Total dense compute: 608 PFLOPS FP8.
Context Memory Subsystem	Integrated 8 TB hot context / KV cache NVMe array per server tray for sub-microsecond multi-agent workflow routing.
Cooling Mechanics	100% air-cooled. Designed to fit standard data center racks without retrofitting, avoiding liquid-cooling overhead.
Ecosystem Partners	Broadcom (Physical silicon layout) • HPE Juniper Networks (Custom scale-up chassis and high-performance fabric networking)
Market Economics & Validation
Performance Benchmarks	Company claims when running next-generation reasoning workloads (e.g., DeepSeek-R1): 13x more tokens/sec than an NVIDIA GB300 NVL72 cluster • 17x more tokens per megawatt of power consumed
Infrastructure Advantage	Claimed to run 2-trillion-parameter Mixture-of-Experts (MoE) models at 1,000 tokens/sec on a single 120 kW rack, versus the typical 9–14 racks drawing up to 1.5 MW.
Revenue Impact	Up to $33 million in estimated additional annual revenue per rack, attributed to reduced operational overhead and deployment efficiency.
Target Markets	AI Inference Clouds • Hyperscalers • Neocloud Providers • Sovereign AI Infrastructure • Enterprise Data Centers
Early Adopters	Cirrascale Cloud Services • BlueSky Compute
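
The benchmark and infrastructure rows in the profile imply a couple of simple ratios, sketched below in Python. The variable names are hypothetical, and the first calculation assumes that the 13x throughput and 17x efficiency claims refer to the same workload and the same baseline system, which the source does not confirm. Since efficiency is throughput divided by power, the implied power ratio is the throughput gain divided by the efficiency gain.

```python
# Claimed gains over the NVIDIA GB300 NVL72 baseline, as quoted.
throughput_gain = 13.0        # tokens/sec
tokens_per_watt_gain = 17.0   # tokens per unit of power

# efficiency = throughput / power, so power ratio = throughput gain / efficiency gain.
implied_power_ratio = throughput_gain / tokens_per_watt_gain

# Infrastructure claim: one 120 kW rack versus 9-14 racks drawing up to 1.5 MW.
napier_rack_kw = 120.0
baseline_max_kw = 1500.0
power_reduction = baseline_max_kw / napier_rack_kw

print(f"implied power vs. baseline: {implied_power_ratio:.2f}x")
print(f"power reduction at 1.5 MW:  {power_reduction:.1f}x")
```

Under that assumption, the benchmark claims imply a system drawing roughly three quarters of the baseline's power while producing 13 times the tokens. The separate infrastructure claim corresponds to a 12.5x reduction in power at the 1.5 MW upper bound.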