Baseten Raises $1.5 Billion to Expand AI Inference Infrastructure

Baseten raised $1.5 billion in Series F financing as demand accelerates for AI inference infrastructure that supports custom and post-trained models in production environments. The funding round was led by Altimeter Capital, Conviction, and Spark Capital, with participation from Sands Capital, Wellington Management, IVP, Greylock, Battery Ventures, D. E. Shaw Ventures, Blackbird, Durable Capital Partners, Verified Capital, 01A, and existing investors. The financing was completed in two tranches at valuations of $11 billion and $13 billion.

The company said enterprises increasingly deploy specialized models trained on proprietary data rather than relying exclusively on frontier foundation models. Baseten reported approximately 20x year-over-year revenue growth and said its platform now handles more than one billion inference calls per day. The company operates 87 clusters across 18 cloud environments, providing a multi-cloud architecture designed to support large-scale AI application deployments.

Founded in 2019 and headquartered in San Francisco, Baseten provides infrastructure software that manages AI workloads, including GPU orchestration, autoscaling, observability, billing, and developer tooling. The company said it will use the new capital to expand engineering, research, operations, and go-to-market teams while increasing investments in compute capacity. Baseten plans to triple headcount during 2026 as it scales to meet growing enterprise demand.

• Raised $1.5 billion in Series F financing

• Funding completed at valuations of $11 billion and $13 billion

• Reports approximately 20x year-over-year revenue growth

• Processes more than 1 billion inference calls daily

• Operates 87 AI clusters globally

• Supports deployments across 18 cloud environments

• Plans to triple employee headcount during 2026

• Has raised more than $2 billion since founding

• Customers include Abridge, Clay, Cursor, Lovable, Mercor, and OpenEvidence

“The future of AI will be built on millions of specialized models, and the companies building the best ones know that post-training has become existential. It’s how they build intelligence they own, on data that’s theirs, optimized for the customers they serve,” said Tuhin Srivastava, CEO and Co-Founder of Baseten.

🌐 Analysis

Baseten’s financing highlights the rapid emergence of inference as one of the fastest-growing segments of the AI infrastructure market. While much of the industry’s attention has focused on training ever-larger foundation models, enterprises are increasingly investing in the infrastructure required to serve AI applications in production. That shift has fueled growth for a new generation of inference-focused providers that optimize deployment, scaling, observability, and cost efficiency across distributed GPU environments.

The announcement also reflects the growing importance of post-training and model customization. Enterprises increasingly seek AI systems trained on proprietary datasets and optimized for specific workflows, creating demand for infrastructure platforms capable of managing large fleets of specialized models. As open-source models continue to improve, inference platforms may become a key layer in the AI stack alongside GPUs, networking, storage, and cloud infrastructure.

Profile: Baseten

AI inference infrastructure platform • Updated June 2026

Headquarters

San Francisco, California

Founded

2019

Leadership

Tuhin Srivastava
CEO & Co-Founder

Latest Funding

$1.5 billion Series F

Valuation

$11B–$13B across two funding tranches

Capital Raised

More than $2 billion since founding

Business Focus

Software infrastructure for deploying, scaling, and operating production AI inference workloads rather than manufacturing GPUs, servers, or network hardware.

Infrastructure Model

Multi-cloud inference platform that handles containerization, GPU scheduling across clouds, autoscaling, observability, and engine-level optimization.

Scale

87 clusters across 18 cloud environments

Inference Volume

More than 1 billion inference calls per day

Developer Framework

Truss, Baseten’s open-source framework for packaging models into deployable containers, including config-based deployment and custom Python model logic.

Serving Stack

Production API endpoints with autoscaling, observability, optimized serving infrastructure, versioning, deployment workflows, and OpenAI-compatible APIs for supported models.

Inference Engines

Baseten says its deployments use inference engines tuned for model architecture, with optimization across quantization, tensor parallelism, KV-cache management, and batching.

NVIDIA Stack

NVIDIA case-study material identifies Baseten’s use of NVIDIA GPUs and TensorRT-LLM for cloud-based generative AI and LLM inference.

Optimization Features

Baseten describes its inference stack as combining infrastructure and runtime optimizations, including custom kernels, speculation, optional quantization, KV-cache optimization, topology-aware parallelism, request prioritization, continuous batching, geo-aware routing, LoRA-aware routing, and fast cold starts.

Deployment Modes

Baseten describes support for deployment in Baseten Cloud, customer environments, or hybrid configurations.

Confirmed Cloud Partners

Public materials describe Baseten deployments or partnerships involving Google Cloud, Nebius, Vultr, and AWS GPU instances referenced in NVIDIA’s Baseten case study.

Workload Types

LLMs, embeddings, image generation, text-to-speech, video generation, custom models, post-trained models, and multi-step AI workflows.

AI Architecture Role

Baseten sits above raw GPU infrastructure as an application-facing inference operations layer: model packaging, serving, routing, scaling, monitoring, and cost/performance optimization.

Networking Relevance

Inference workloads emphasize latency, API delivery, multi-region routing, load balancing, service reliability, and cloud capacity management rather than the tightly coupled east-west GPU traffic patterns associated with large training clusters.

Customers Named

Abridge, Clay, Cursor, Lovable, Mercor, OpenEvidence

Growth Metric

Approximately 20× year-over-year revenue growth reported in 2026

Tags: Baseten Inference

Baseten Raises $1.5 Billion to Expand AI Inference Infrastructure

Dell Unveils Vera Rubin Rack Supporting 144 GPUs

Upscale AI Secures $190 Million for Data Center Networking for AI

Jim Carroll

Related Posts

Categories

Archives