Baseten raised $1.5 billion in Series F financing as demand accelerates for AI inference infrastructure that supports custom and post-trained models in production environments. The funding round was led by Altimeter Capital, Conviction, and Spark Capital, with participation from Sands Capital, Wellington Management, IVP, Greylock, Battery Ventures, D. E. Shaw Ventures, Blackbird, Durable Capital Partners, Verified Capital, 01A, and existing investors. The financing was completed in two tranches at valuations of $11 billion and $13 billion.
The company said enterprises increasingly deploy specialized models trained on proprietary data rather than relying exclusively on frontier foundation models. Baseten reported approximately 20x year-over-year revenue growth and said its platform now handles more than one billion inference calls per day. The company operates 87 clusters across 18 cloud environments, providing a multi-cloud architecture designed to support large-scale AI application deployments.
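For a sense of scale, the reported figures can be turned into rough averages. The sketch below is back-of-the-envelope arithmetic only: it assumes traffic is spread evenly across the day and clusters evenly across cloud environments, which the company has not stated, and the function names are illustrative.

```python
# Back-of-the-envelope averages from the figures reported in the article.
# Assumption: load is uniform over time and across environments (not stated by Baseten).

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_calls_per_second(calls_per_day: float) -> float:
    """Convert a daily call volume into an average per-second rate."""
    return calls_per_day / SECONDS_PER_DAY


def average_clusters_per_environment(clusters: int, environments: int) -> float:
    """Average number of clusters per cloud environment."""
    return clusters / environments


if __name__ == "__main__":
    rate = average_calls_per_second(1_000_000_000)
    density = average_clusters_per_environment(87, 18)
    print(f"Average inference calls per second: {rate:,.0f}")
    print(f"Average clusters per cloud environment: {density:.1f}")
```

One billion calls per day works out to an average of roughly 11,600 calls per second, and 87 clusters across 18 environments to just under five clusters per environment. Because "more than one billion" is a floor and real traffic is bursty, peak rates would be higher than this average.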
Founded in 2019 and headquartered in San Francisco, Baseten provides infrastructure software that manages AI workloads, including GPU orchestration, autoscaling, observability, billing, and developer tooling. The company said it will use the new capital to expand engineering, research, operations, and go-to-market teams while increasing investments in compute capacity. Baseten plans to triple headcount during 2026 as it scales to meet growing enterprise demand.
• Raised $1.5 billion in Series F financing
• Funding completed at valuations of $11 billion and $13 billion
• Reports approximately 20x year-over-year revenue growth
• Processes more than 1 billion inference calls daily
• Operates 87 AI clusters globally
• Supports deployments across 18 cloud environments
• Plans to triple employee headcount during 2026
• Has raised more than $2 billion since founding
• Customers include Abridge, Clay, Cursor, Lovable, Mercor, and OpenEvidence
“The future of AI will be built on millions of specialized models, and the companies building the best ones know that post-training has become existential. It’s how they build intelligence they own, on data that’s theirs, optimized for the customers they serve,” said Tuhin Srivastava, CEO and Co-Founder of Baseten.
🌐 Analysis
Baseten’s financing highlights the rapid emergence of inference as one of the fastest-growing segments of the AI infrastructure market. While much of the industry’s attention has focused on training ever-larger foundation models, enterprises are increasingly investing in the infrastructure required to serve AI applications in production. That shift has fueled growth for a new generation of inference-focused providers that optimize deployment, scaling, observability, and cost efficiency across distributed GPU environments.
The announcement also reflects the growing importance of post-training and model customization. Enterprises increasingly seek AI systems trained on proprietary datasets and optimized for specific workflows, creating demand for infrastructure platforms capable of managing large fleets of specialized models. As open-source models continue to improve, inference platforms may become a key layer in the AI stack alongside GPUs, networking, storage, and cloud infrastructure.
| Profile: Baseten | AI inference infrastructure platform • Updated June 2026 |
| Headquarters | San Francisco, California |
| Founded | 2019 |
| Leadership | Tuhin Srivastava, CEO and Co-Founder |
| Latest Funding | $1.5 billion Series F |
| Valuation | $11 billion and $13 billion, across two funding tranches |
| Capital Raised | More than $2 billion since founding |
| Business Focus | Software infrastructure for deploying, scaling, and operating production AI inference workloads rather than manufacturing GPUs, servers, or network hardware. |
| Infrastructure Model | Multi-cloud inference platform that handles containerization, GPU scheduling across clouds, autoscaling, observability, and engine-level optimization. |
| Scale | 87 clusters across 18 cloud environments |
| Inference Volume | More than 1 billion inference calls per day |
| Developer Framework | Truss, Baseten’s open-source framework for packaging models into deployable containers, including config-based deployment and custom Python model logic. |
| Serving Stack | Production API endpoints with autoscaling, observability, optimized serving infrastructure, versioning, deployment workflows, and OpenAI-compatible APIs for supported models. |
| Inference Engines | Baseten says its deployments use inference engines tuned for model architecture, with optimization across quantization, tensor parallelism, KV-cache management, and batching. |
| NVIDIA Stack | NVIDIA case-study material identifies Baseten’s use of NVIDIA GPUs and TensorRT-LLM for cloud-based generative AI and LLM inference. |
| Optimization Features | Baseten describes its inference stack as combining infrastructure and runtime optimizations, including custom kernels, speculation, optional quantization, KV-cache optimization, topology-aware parallelism, request prioritization, continuous batching, geo-aware routing, LoRA-aware routing, and fast cold starts. |
| Deployment Modes | Baseten describes support for deployment in Baseten Cloud, customer environments, or hybrid configurations. |
| Confirmed Cloud Partners | Public materials describe Baseten deployments or partnerships involving Google Cloud, Nebius, and Vultr; NVIDIA’s Baseten case study also references AWS GPU instances. |
| Workload Types | LLMs, embeddings, image generation, text-to-speech, video generation, custom models, post-trained models, and multi-step AI workflows. |
| AI Architecture Role | Baseten sits above raw GPU infrastructure as an application-facing inference operations layer: model packaging, serving, routing, scaling, monitoring, and cost/performance optimization. |
| Networking Relevance | Inference workloads emphasize latency, API delivery, multi-region routing, load balancing, service reliability, and cloud capacity management rather than the tightly coupled east-west GPU traffic patterns associated with large training clusters. |
| Customers Named | Abridge, Clay, Cursor, Lovable, Mercor, OpenEvidence |
| Growth Metric | Approximately 20× year-over-year revenue growth reported in 2026 |
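
The Developer Framework row describes Truss as combining config-based deployment with custom Python model logic. As a minimal sketch of what that model logic can look like, a Truss model is typically a plain Python class with a `load` method called once at startup and a `predict` method called per request, with settings kept in a separate `config.yaml`. The uppercase "model" and the `text` and `output` field names below are hypothetical stand-ins, not Baseten code.

```python
# Minimal sketch of a Truss-style model class.
# Assumption: the toy "model" and the "text"/"output" keys are illustrative only.


class Model:
    def __init__(self, **kwargs):
        # The serving framework passes configuration through kwargs; unused here.
        self._model = None

    def load(self):
        # Called once before serving. A real deployment would load weights here;
        # this stand-in simply uppercases its input.
        self._model = lambda text: text.upper()

    def predict(self, model_input):
        # Called for each request with the parsed request body.
        if self._model is None:
            raise RuntimeError("load() must be called before predict()")
        return {"output": self._model(model_input["text"])}


if __name__ == "__main__":
    model = Model()
    model.load()
    print(model.predict({"text": "hello"}))
```

Keeping the expensive setup in `load` and the per-request work in `predict` is what lets a serving platform start a replica once and then reuse it across many calls.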
