
Baseten announces first platform expansion powered by the Baseten Inference Stack: APIs for open-source AI models and features for training models to improve inference performance
SAN FRANCISCO--(BUSINESS WIRE)--Baseten, the leader in mission-critical inference, today announced the public launch of Baseten Model APIs and the closed beta of Baseten Training. These new products enable AI teams to seamlessly transition from rapid prototyping to scaling in production, building on Baseten’s proprietary inference stack.
In recent months, new releases of DeepSeek, Llama, and Qwen models have erased the quality gap between open and closed models, and organizations are more incentivized than ever to use open models in their products. Yet many AI teams have been limited to testing open models at low scale because model endpoint providers fall short on performance, reliability, and economics. While these shared model endpoints are easy to get started with, their deficiencies have fundamentally gated enterprises’ ability to convert prototypes into high-functioning products.
Baseten’s new products, Model APIs and Training, solve two critical bottlenecks in the AI lifecycle. Both are built on Baseten’s Inference Stack and Inference-optimized Infrastructure, which power inference at scale in production for leading AI companies like Writer, Descript, and Abridge. With Model APIs, developers can instantly access open-source models optimized for maximum inference performance and cost-efficiency, letting them rapidly create production-ready minimum viable products (MVPs) or test new workloads.
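For illustration, calling a Model API might look like the minimal sketch below, assuming an OpenAI-compatible chat completions endpoint; the base URL, model slug, and API key are placeholders rather than documented values.

# Minimal sketch: querying an open-source model through a Baseten Model API.
# Assumes an OpenAI-compatible endpoint; the URL and model name are illustrative.
from openai import OpenAI

client = OpenAI(
    base_url="https://inference.baseten.co/v1",  # placeholder endpoint
    api_key="YOUR_BASETEN_API_KEY",              # placeholder credential
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3",  # example open model slug
    messages=[{"role": "user", "content": "Draft a one-line product summary."}],
)
print(response.choices[0].message.content)

Because the endpoint shape matches the de facto industry standard, moving a prototype from a shared endpoint to a dedicated deployment becomes a configuration change rather than a rewrite.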
“In the AI market, your number one differentiator is how fast you can move,” said Tuhin Srivastava, co-founder and CEO of Baseten. “Model APIs give developers the speed and confidence to ship AI features knowing that we’ve handled the heavy lifting on performance and scale.” Baseten Model APIs let AI engineers test open models with a confident scaling story in place from day one. As inference volume grows, Model APIs customers can easily move to Dedicated Deployments, which provide greater reliability, performance, and economics at scale.
“With Baseten, we now support open-source models like DeepSeek and Llama in Retool, giving users more flexibility for what they can build,” said DJ Zappegos, Engineering Manager at Retool. “Our customers are creating AI apps and workflows, and Baseten’s Model APIs deliver the enterprise-grade performance and reliability they need to ship to production.”
Customers can also use Baseten’s new Training product to rapidly train and fine-tune models, further optimizing inference workloads for performance, quality, and cost-efficiency. Unlike traditional training solutions that operate in siloed research environments, Baseten Training runs on the same production-optimized infrastructure that powers its inference. This coherence ensures that models trained or fine-tuned on Baseten behave consistently in production, with no last-minute refactoring. Together, the latest offerings enable customers to get products to market more rapidly, improve performance and quality, and reduce costs for mission-critical inference workloads.
These launches reinforce Baseten’s belief that product-focused AI teams must care deeply about inference performance, cost, and quality. “Speed, reliability, and cost-efficiency are non-negotiables, and that’s where we devote 100 percent of our focus,” said Amir Haghighat, co-founder and CTO of Baseten. “Our Baseten Inference Stack is purpose-built for production AI because you can’t just have one piece work well. It takes everything working well together, which is why we ensure that each layer of the Inference Stack is optimized to work with the other pieces.”
“Having lifelike text-to-speech requires models to operate with very low latency and very high quality,” said Amu Varma, co-founder of Canopy Labs. “We chose Baseten as our preferred inference provider for Orpheus TTS because we want our customers to have the best performance possible. Baseten’s Inference Stack allows our customers to create voice applications that sound as close to human as possible.”
Teams can start with a quick MVP and seamlessly scale it to a dedicated, production-grade deployment when needed, without changing platforms. An enterprise can prototype a feature on Baseten Cloud, then graduate to its own private clusters or an on-prem deployment (via Baseten’s hybrid and self-hosted options) for greater control, performance tuning, and cost optimization, all with the same code and tooling. This “develop once, deploy anywhere” capability directly results from Baseten’s Inference-optimized Infrastructure, which abstracts the complexity of multi-cloud and on-premise orchestration for the user.
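In practice, “develop once, deploy anywhere” can be as simple as parameterizing the endpoint, as in this hypothetical sketch; the environment variable names and URLs are illustrative assumptions, not documented configuration.

# Hypothetical sketch: the same client code serves prototyping on Baseten Cloud
# and a later self-hosted or private-cluster deployment; only the endpoint
# configured in the environment changes. Names and URLs are illustrative.
import os
from openai import OpenAI

client = OpenAI(
    base_url=os.environ.get(
        "INFERENCE_BASE_URL",               # e.g., a private cluster URL in prod
        "https://inference.baseten.co/v1",  # placeholder cloud endpoint for dev
    ),
    api_key=os.environ["BASETEN_API_KEY"],  # placeholder credential name
)
# Application logic below this point is identical in every environment.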
The news follows a year of considerable growth for the company. In February, Baseten announced the close of a Series C funding round co-led by IVP and Spark, bringing its total venture capital funding to $135 million. The company was also recently named to the Forbes AI 50 2025, a list of the pre-eminent privately held AI companies that also featured several companies for which Baseten powers 100 percent of inference, including Writer and Abridge.
About Baseten
Baseten is the leader in infrastructure software for high-scale AI products, offering the industry’s most powerful AI inference platform. Committed to delivering exceptional performance, reliability, and cost-efficiency, Baseten is on a mission to help the next great AI products scale. Baseten is backed by top-tier investors including IVP, Spark, Greylock, Conviction, Base Case, and South Park Commons. Learn more at Baseten.co.
Contacts
Media contact:
Creighton Vance for Baseten
[email protected]