November 27, 2023
Portfolio
Unusual

MosaicML's product-market fit journey

Sandhya Hegde
Editor's note: 

SFG 34: Naveen Rao on building an edge in AI infrastructure

In this episode of the Startup Field Guide podcast, Sandhya Hegde chats with Naveen Rao, co-founder of MosaicML, developer of open source infrastructure for training LLMs. The company was acquired by Databricks for $1.3 billion in July 2023 and went from $0 to over $30M in revenue in just six months.

Be sure to check out more Startup Field Guide Podcast episodes on Spotify, Apple, and YouTube. Hosted by Unusual Ventures General Partner Sandhya Hegde (former EVP at Amplitude), the SFG podcast uncovers how the top unicorn founders of today really found product-market fit.

If you are interested in learning more about some of the themes and ideas in this episode, please check out the Unusual Ventures Field Guides on starting an open source company, and customer development for an open source company.


Episode transcript

Sandhya Hegde

Welcome to the Startup Field Guide, where we learn from successful founders of unicorn startups how their companies truly found product-market fit. I'm your host, Sandhya Hegde, and joining us today is my partner Wei Lien Dang. Together, we'll be diving into the story of MosaicML. Mosaic is the developer of open source infrastructure for training LLMs.

The company was acquired by Databricks for $1.3 billion in July this year, and has gone from zero to over $30 million in revenue in just over six months. Joining us today is Naveen Rao, the CEO and co-founder of MosaicML, and now the head of generative AI at Databricks. Welcome to the Field Guide, Naveen.

I noticed that you've been working in AI research since at least 2012, your days at Qualcomm, and even in your previous startup, Nervana, which got acquired by Intel. So I'm curious: could you share more about those experiences, and did they play a significant role in informing when you started your company MosaicML?

Naveen Rao

Oh, yeah. This has been a long-term trajectory for me. It's an overnight success 15, 20 years in the making, maybe longer at this point. I've been interested in the field since the 90s when I was an undergrad. I even did research on neuromorphic machines at that time, extending some of Carver Mead's work, if anyone's familiar with that. I came out to the Valley, went to a bunch of startups, designed chips, software stacks, all of this kind of stuff. And actually went back to grad school in 2007 to rediscover this thing. I basically said it's now or never.

And that was before a lot of these techniques really started showing massive promise. But in my mind, it was very clear that neural networks, large parameter spaces were the way to solve problems we were currently not able to solve. At that time machine learning was largely regression based, and we didn't have enough data or enough compute to really do interesting things, but that started to change pretty rapidly.

And so I think I saw that earlier than a lot of folks. My family all thought I was nuts for doing that, going back to grad school with a kid and a half and all of that. But yeah, so then after finishing grad school, after a brief stint as a postdoc and in finance, then I went to Qualcomm and worked on neuromorphic computing architectures.

And that's right around the time that deep learning started becoming, I would say, prevalent in the research world. And it was like, this is not just a flash in the pan. This is the change we've been looking for. I think you almost have to have this preconceived notion of change to see it sometimes: this is the thing I'm looking for, and now I see it. It was clear to me then that neural networks are a new way to express computations, a new way to build a learning system. And it was also clear that we needed new computing architectures to subserve that. That's when I quit Qualcomm and started Nervana. The idea started maybe around the end of 2013 or so, but then we started the company in the beginning of 2014.

We were the first AI chip company. We really explored this thing and, I would argue, pushed NVIDIA a lot to rethink their architecture, especially around tensor math. The acquisition of Nervana was done by Intel in 2016, which kind of became the seed for a lot of the neural network efforts and accelerators. We started a new division called Intel AI and a new corporate brand there, and grew that. I led it for a few years.

Sandhya Hegde

And what were the early applications that you saw back in 2013? When you were thinking of your AI chip strategy, was there an obvious "this is going to be the first killer wave of applications" for your startup?

Naveen Rao

Yeah, applications are hard, right? It's interesting: if I pulled up my deck, and I don't have it with me right now, unfortunately, I could show you that personalizing device interaction was actually one of the big ones. At that time there were things like Siri coming. It was still very new and not very good, but that was very clear. And it was also very clear that it was going to require a ton of compute. We also had the notion that very large models were going to do something much better. It wasn't clear exactly what that was. This is pre-transformer, but it was clear to me that very large parameter spaces were going to do something more magical, and we were building the computing architecture to support that from day one. The architecture was distributed from day one. I think some of the very early ideas were there. Then large language models started, I guess not really large language models yet, but BERT-style models started in 2017. I think that was the beginnings of it. Again, I knew this thing was going to come and that scale was going to be important. At that time no one cared about scale, except for a few players. And even in 2021, when we started Mosaic in January of 2021 (I toyed around with the idea in 2020, because I quit Intel in the beginning of 2020), it was clear to me that large models were now at a point where they were going to do something great. Again, before ChatGPT, only a few companies cared about it. And what then became a huge market opportunity was enabling more people to have access to this technology, bringing the costs down and the complexity down to a point where more people could use it.

The whole thing was going to explode, very similar to what we saw with PCs, with programming. In the 1970s, only a few academic researchers had access to mainframes. Once it became cheap enough, small enough, easy enough to use like an appliance, then a whole generation of people learned how to use these devices we call computers and wanted to program them. I was one of them as a kid. So I saw this happen in the past, and this was another moment like that. The way I looked at it as a problem was: if we can solve the problem of making it accessible, it's a huge market unlock. So let's put all of our time and energy into making this easy to use so that we can unlock that market.

Wei Lien Dang

Naveen, if you go back to early 2021 when you were just starting Mosaic: adoption of these large models, even though you had a sense they were going to have some impact, was still really early, especially for newer LLMs and foundation models outside of the context of the large tech companies you're referencing. Curious, what was your core insight when you started the company?

If you had this vision of making it accessible and so on. Like how does that translate into what you set out to build? And I'm, curious in the last couple of years, how much has that changed? Or, if at all?

Naveen Rao

Yeah. There are a few points of clarification along the way. We set out to make learning systems vastly cheaper and faster at scale. The methods we were exploring then to do that, we were trying to potentially even deviate from back propagation and a few other things. Those things weren't practical and we abandoned them, but those were the basic research questions to ask. And then we found there's this whole path of things we can do to make that learning process more efficient and more effective. We started going down that path, and it actually turned out to be like, oh, this is not like finding 20%. This is like finding 4X, 5X, 10X. So we started doing that. We actually then innovated on smaller neural networks, which were more tractable at the time, like ResNet. We won MLPerf benchmarks without changing the hardware; we could just change the algorithms and make them vastly more efficient. So we started seeing a lot of positive news there on the technology side. And so then it became clear that yes, this was the path. This is what we want to build. So I don't think the notion of what we wanted to build changed throughout. It was maybe the tactics of what we were trying to do.

And I have a framework for thinking about this a little bit. This applies to technology startups specifically, not necessarily product startups. There are lots of ways to build a company. You can build a great interface and have great product uptake. But in terms of technologies, you've got to be about two years out from mass adoption. If you're within six months, everybody already knows about it; this is not a secret. If you're five years out, you just can't weather the storm long enough to keep going. So about two years seems to be the optimal amount. And actually we called it: January of 2021. We really started these ideas in late 2020. And lo and behold, late 2022 is when it took off. So I think we got the timing just about right.

Wei Lien Dang
Super interesting, Naveen, because you had been steeped in AI much earlier than most people, which positioned you to understand where the puck was headed and what was going to become important in the next couple of years. Just on the tactics, for a moment, in terms of what the path looked like: was it always clear to you that you had to go end to end in terms of the product, go from training to inference? Versus, a lot of folks would argue you could build successful companies if you solved an important piece of maybe a bigger platform or workflow. So how did you think about that? Was the plan from the beginning to build an end-to-end platform, or did it evolve into that?

Naveen Rao
You've got to have this kind of minimum useful piece. I've seen so many people try to build these MLOps startups, and I hated that whole segment. In fact, I still hate it. I think you're building point solutions that are really just features, and it's very clear they're not solving the thing that people think is hard, right?

You have to go and solve the actual thing that adds value. If I have a good model, now I can do all this stuff. If I have an orchestrator that can orchestrate something but I have no idea how to actually build the model, I haven't solved anybody's problem; you've solved a very tiny portion of the population's problem.

So I think this is a bit of a gut check, honestly. It's just understanding how the users use the tools. What matters? We did, from the very beginning, think about it holistically: we want to get to the trained model. The trained model is the artifact, not the tool that enables you to train, necessarily. That's part of the journey. So I don't think these point solutions make a lot of sense. And really, that just comes down to understanding how researchers and model builders and engineers think about it. Get me to the finish line. Don't show me a method to learn how to run better to the finish line.

Wei Lien Dang

Sure. And you focus a lot on open models, open LLMs. Was that always the case from the beginning? Because if you went back a couple of years, there were a lot of people saying that open source and open models aren't viable challengers to proprietary models and solutions. I was curious what gave you the insight that there was going to be this wave of open source or open model development, and how much of that was, in your view, predictable versus what surprised you?

Naveen Rao
We had lots of internal debates about this. We're not maximalists in any direction other than building a company. And I think open source is a way to expose the world to what you're doing to understand if it's important. Do people care, right? If nobody cares about what you're doing, then maybe you need to rethink it or you need to tell that story differently.

And was efficiency of ML an important topic that people cared about? Easy way to test that: put some tools out, see if you get usage. So I think that was one part of why we opened it. I think what we also saw was that the models that we built have a time to live. They're not themselves an enduring thing. If I buy a car, the car is relevant for seven, eight years, maybe 10 years. I don't want to buy a car that's more than 20 years old; most people don't. So a car has a time to live, and in the same way, so does a model. The techniques get better, the models just get better.

You get more relevant information and you make them more specific to particular use cases. So that keeps happening. And so we said, look, it's probably better to enable people to build off of our models and then bring them into our ecosystem, and we have tools that can enable them to customize that for their purposes. So that was a very conscious thought process for us. It wasn't just a maximally ideologically driven view. It was a "we think this is the right way" to enable the future users of our platform. Whether it's the right thing to do to constantly open source models is something we assess every step of the way.

That being said, I do think open source is a very important tool in the development of technology. It enables ecosystems to form and people to disseminate information in a very efficient way through code. We were still in the place of: let's figure out how to maximally help the developers that we're building tools for, and then we will figure out what they need and add value to the system.

Wei Lien Dang

Naveen, maybe just given what you referenced as the time to live, any hot takes on foundation model companies and those business models? I think you're right in some aspects, maybe it'll be a race to the bottom, maybe you have to keep getting better and better. Any opinions on that?

Naveen Rao

Oh, lots of opinions. I think the vast majority of foundation model companies are just gonna fail. You are not gonna beat OpenAI, and you won't have a differentiated use case. You've got to do something better than they do. And if you don't, and it's cheap enough to move, then why would you use somebody else's model? So it doesn't make sense to me to just go try to be head to head unless you can beat them. If you can't beat them at the consumer use case, you've got to beat them on some other dimension.

Are you better in a particular vertical? Okay. Character AI has done that really well. They went and built their own thing, and it's really good for these personality type LLMs. Great. So they have their own niche. Our niche has been, we believe the value doesn't reside in the model itself. That is something that's ephemeral.

It's the process of building and customizing models. And that's basically what our products are about, is how do you go and do this and end up with that artifact at the end of it? Everyone has to have their take, but the vast majority of them just going and building models and calling it victory, I just don't think it's going to work.

Like I said, the MLOps space is very similar to that. I don't think you're differentiated. I don't think you're solving a problem. Woohoo, you built a model. Great. Now a lot of our customers can do that with our tools without a whole lot of engineering work. So what are you really doing that's new and adding value? I think that's the key: you always have to solve a problem. Just building a piece of technology because you said you can do it doesn't really prove that you can solve a problem.

Sandhya Hegde
Going back to MosaicML in 2021, what was your timeline to having something people had started using, and who were your very early adopters? What were they doing with MosaicML?

Naveen Rao

So we started with the Composer library, which is an open source library. This was designed mostly for researchers and was about combining different methods to make training and inference efficient. Efficient from a compute standpoint, a FLOPs standpoint. And we went after those users first. We talked to PhD students at MIT and just people we knew from our network, and they were the ones we asked for feedback. This was towards the end of 2021, when we first had that library out. And we learned a lot from that. What do people care about? How do people want to use this? We actually did get a reasonable community of folks using it. And from there on, we said okay, what are the pain points?

What do they really care about? And that's when we started to bring our tools online. I think the earliest version of our tools was available probably beginning of 2022. We weren't selling them yet. We were working with some academic partners like Stanford's Center for Research on Foundation Models. They were a partner of ours, and the first ones using some of our tools, in fact. We did believe in this model of external validation for what we're doing, and getting partners who will communicate their pain points to us, and providing them some value in return, was the way we went about it.
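
Editor's note: to make the "combining different methods" idea concrete, here is a rough Python sketch of what composing speed-up methods into a single training loop looks like. The names and structure are illustrative only, not Composer's actual API; the real library ships algorithms in this spirit (for example, progressive image resizing and label smoothing).

# Illustrative sketch only; not the real Composer API.
def progressive_resize(batch, step, total_steps):
    """Illustrative speed-up method: use smaller inputs early in training."""
    # e.g. downsample images while step / total_steps is small
    return batch

def label_smoothing(batch, step, total_steps):
    """Illustrative speed-up method: soften one-hot targets slightly."""
    return batch

def train(model, dataloader, methods, update_fn, total_steps):
    """Stack independent speed-up methods inside one training loop."""
    for step, batch in enumerate(dataloader):
        if step >= total_steps:
            break
        for method in methods:                 # methods compose in order
            batch = method(batch, step, total_steps)
        model = update_fn(model, batch)        # assumed: one optimizer step
    return model

# Hypothetical usage; model, loader, and sgd_step are supplied by the caller:
# trained = train(model, loader, [progressive_resize, label_smoothing],
#                 update_fn=sgd_step, total_steps=10_000)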

Sandhya Hegde

Got it. And what were they using it for? Was it model evaluation? What were some of the early things that you saw people focused on doing with MosaicML? And I'm curious about your take on model evaluation. What were some of those early use cases, and how did you think about, okay, what makes a model good? How do we give developers confidence about the way they will be using these?

Naveen Rao

Yeah. The first use cases that we had in 2021 were mostly people building computer vision models and things like that for particular domains. I think we had a lot of people in industry using the stuff too, because it was so cheap: you could take a single A100 instance on cloud for 20 minutes and solve ImageNet with our tools, which was a three or four hour thing without our tools. So that was pretty good. It just made it so you could iterate quickly, right? If you're an engineer at some company trying to build a computer vision system, this is just a quicker way to do it.

So we had folks doing things like that. The first onboard was, like I said, the Stanford CRFM. We actually built a model with them: the PubMedGPT model that we put out there, which we open sourced. It was basically an LLM trained on all of PubMed, everything from 1970 to, I think, 2022, something like that. We learned a lot from that in terms of stability of the training process and how to make the whole process repeatable and easy to use. But they were building an LLM that was domain specific. It was basically the exact thing we're doing today at scale. The idea was, if you build something that's specific to a use case, you get higher performance and it's lower cost.

The economics are always better. So I think an underlying belief that we always had is that economics will matter. It's not a "hey, go raise a billion dollars or two billion and just throw it at GPUs." Economics are the physics of business; that's the way I look at it. And if you don't think about that, you are going to be in a world of hurt. Yes, you can consciously make a choice to put an investment into something, and that's okay. But you've got to think about it from a market standpoint. I'm an engineer and a researcher, but I still know what a DCF is, right? A discounted cash flow model. You look at the market opportunity.

I can think about it very simply: if it costs me $10 million to train a model and I run that model in inference at a 30% margin, then I need essentially $30 million worth of revenue to justify it, minimum. If I don't think I can get that, then it's not worth doing. And so by making things more efficient from a cost standpoint, we can actually justify more and more use cases. You start thinking about it through this lens of a DCF. That's why I think a lot of the model companies today are ill-founded. It's all a space race: I'm going to go build a model, I'm going to make it bigger, blah, blah, blah. It doesn't have any differentiation. No one's going to use it. It's not justified. So anyway, that's the way I look at this. I always looked at it as a business, not as a research project.

And I think that's a good mindset.
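
Editor's note: a minimal sketch of the back-of-the-envelope math above, using the numbers from Naveen's example; nothing here is MosaicML-specific, and the function name is just illustrative.

def breakeven_revenue(training_cost: float, inference_gross_margin: float) -> float:
    """Revenue at which inference gross profit pays back the training spend."""
    return training_cost / inference_gross_margin

# A $10M training run served at a ~30% inference margin needs roughly
# $33M of inference revenue just to recoup the model.
print(f"${breakeven_revenue(10_000_000, 0.30):,.0f}")  # -> $33,333,333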

Sandhya Hegde

Definitely not that common in 2021, but a little more popular now with both founders and investors. I'm curious, as you started getting that wave of early adoption, both with industry folks as well as students and researchers, any surprises for you?

What were some of the moments of feedback that aligned with the eventual direction in terms of how MosaicML prioritized your product roadmap?

Naveen Rao

I guess it was a surprise for us just how many companies were being funded to do these things. A lot of them became our customers, which was great. The number of people who wanted to build models from scratch was actually much bigger than we anticipated. So I think that was interesting, and I think it's still true. And then this actually became a core part of our strategy: hang on, the reason they want to build their own models is not just for performance reasons, it's actually an ownership thing. If you're a big company, a big bank or whatever enterprise company, you don't actually want to outsource something that could be core to your business.

You want to own it fully. And I think that was actually a bit of a surprise for us. It actually became a core value prop: if you want to actually go and own this, we enable you to build these things. We have the recipe, we have the model code, and the trained model weights at the end of it are yours. That was something we didn't anticipate early. I think the economics were something we thought about from the beginning; I actually thought more people would think that way earlier. Now it's very prevalent: you can look at the trade-off between cost per inference and cost of training, and you can essentially do this build-versus-buy math, right? At very low volume you probably just want to buy; once you get to a certain volume, you can justify building. That thinking took a little while because it was shrouded by a lot of excitement. A lot of people were just mesmerized by what they saw in ChatGPT.

And so they looked past this for a little while, but now it's being re-grounded again, maybe because the macroeconomic situation dictates that as well. I don't know. But I think people are now thinking about, okay, what is rational for me to do? How much am I willing to pay to own this thing, to be able to customize it, to have full control over it?
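
Editor's note: a hedged sketch of the build-versus-buy math Naveen describes above. All prices are placeholders; the point is only that per-request API costs scale with volume while an owned model is mostly a fixed cost, so there is a crossover volume above which building starts to pay off.

def api_cost(monthly_requests: int, price_per_request: float) -> float:
    """Buying: pay a vendor per request, so cost grows linearly with volume."""
    return monthly_requests * price_per_request

def owned_cost(train_cost: float, amortize_months: int, hosting_per_month: float) -> float:
    """Building: amortized training plus fixed serving, roughly flat in volume."""
    return train_cost / amortize_months + hosting_per_month

# Placeholder numbers, purely for illustration:
for volume in (10_000, 1_000_000, 100_000_000):
    buy = api_cost(volume, price_per_request=0.002)
    build = owned_cost(train_cost=500_000, amortize_months=12, hosting_per_month=20_000)
    print(f"{volume:>11,} req/mo   buy ~${buy:,.0f}   build ~${build:,.0f}")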

Sandhya Hegde

2022 was obviously such a formative year for the market, right? OpenAI accidentally became a consumer company with ChatGPT, and all of a sudden that made investing in AI a board-level conversation everywhere around the world. I'm curious what you were seeing internally at MosaicML. How did it affect your inbound demand? Did it change the questions people were asking, or who was asking the questions and who was interested in using MosaicML?

Naveen Rao

Yeah, I mean, we actually had some uptake right before then as well. It was still early for us. I would say our first signed, paying customer was in mid-2022, and then we started getting a few customers on board right around the time that ChatGPT came out. There was a groundswell of startups coming; they were being funded, they got seed or A rounds of 10 million bucks, and those started becoming companies that we were talking with. ChatGPT just changed the scale; it exploded, right? Now all of a sudden, the awareness in the enterprise just shot through the roof. And interestingly enough, they went to security and privacy as their first thing, which I guess is not that surprising, and then we became the only game in town, right?

So if you wanted something that was your own, and you wanted to do it securely and privately, you came to us. I think we were situated well from the perspective of having some ideals about what the world should look like in terms of ownership of models. It turned out to be right, or at least it positioned us well to build products against that demand stream, but that demand really supercharged after ChatGPT. So I would say, yeah, that is what drove probably a lot of our growth from the beginning of this year through the summer.

Sandhya Hegde

Great. Makes sense. And that definitely brings me to the acquisition. This is not the first time you've had a company acquired. I'm curious what the process was like, and what have been your learnings and takeaways from going through this not once but twice now? What was your calculus as a founder in terms of, A, is acquisition the right path, and B, what's the right company?

Naveen Rao

Yeah. Having done it the first time around, there were a lot of challenges at Intel. I remember I came into Intel after the acquisition and we were put on a hiring freeze right away. They paid $408 million for Nervana and we were put on a hiring freeze. And I was just like, really, why did you buy us? What is going on here? And then I was told that deep learning is a fad, this is not something that's going to last. And it's like, okay, why are we here then? I would have just stayed alone. It was a pretty rough experience being inside of Intel.

There were a lot of people who, I don't know if it's just their own interests or the way they think about it, but they just weren't willing to think about a world that was different than the one they knew. And I think that's a core problem. It's like an innovator's dilemma sort of thing, right? You've got this core business and you want to put blinders on to everything else. But the reality was that core business was being disrupted. And to me, it was abundantly clear, as clear as the computer screen in front of me right now, and these folks couldn't see it.

To me, it was just mind-boggling. Anyway, after a little while, about almost a year, we were able to form this new group, because the CEO, Brian Krzanich, did start to see that this was going to be important and said, okay, let's go start a new group and you're going to lead it. That changed things a bit. We definitely started developing our products and making some big changes. And at the end of the day, we did get Intel to shift towards this. There's still a lot of resistance within the company, and I think it just has to do with some of the systems and people who have been there in a different world than what is happening now. It's really hard to make big companies change what they do. You've got to look more like Nvidia than like Intel, and that's just hard to do. So through that experience I was pretty reticent to sell this company. I had no desire to. I told everybody, I'm not gonna sell this company.

And in fact, part of what I would say to folks is, we could become another Databricks. There's no reason we couldn't; we're going after the new thing, and we just had to build the sales team up. I actually had a few conversations with big players about acquisition, and to me it was a pretty easy conversation to say no. With Databricks, it was different because it's still run by founders; it's still a big startup. That's very true after being here now for four months: still very much a big startup.

You asked about the process. I met Ali at the Cerebral Valley event in April of this year. I'd never met him before. I'd heard about him, and I'd heard this is the startup founder who was able to wrangle the clouds, right? He's been the one who could go and negotiate hard against them and carve out a market that's big. When I met him at Cerebral Valley, it actually turned out that he knew us because we won a deal from them. Replit is a distributed IDE company; they actually used our tools to build their LLMs over Databricks. And Ali knew about this because he knew the folks at Replit pretty well, and he was like, damn, we lost to Mosaic. So he started looking into us.

So he actually came over and talked to me at the Cerebral Valley event. I didn't know this at the time. But it was an interesting conversation, because I was like, okay, clearly this guy thinks like a founder. He is the founder. And still very hungry. One of the things I thought was interesting is that I congratulated him on the success of Databricks and he said, yeah, we're still a small guy, we got a lot to do. So I thought that was pretty cool. And then we just kept in touch after that; he'd text me about stuff and we'd talk about strategies and things like this. And then in early May, he texted me on the weekend: hey, can you talk? And I'm like, I can't talk right now, but we can talk later. So we set a time on Monday, and I thought, he's thinking about acquisition. And then he said it during the call. I think he fully expected me to just say, no, I'm not interested.

But I had thought about it a fair bit, because I was like, what are the potential outcomes over the next several years? If I look on a three-year basis, I'll probably need to raise more money. I'm going to dilute more. And what's the best possible outcome? I looked at it, and it's not that different. If I join Databricks, Databricks has a ton of growth still left. By joining forces with Databricks, I basically get access to the sales channel and to all the security capabilities and things like that of Databricks, and have a more holistic platform. And then I de-risk my position in this whole ecosystem; we could become the enterprise AI player. Then I look at doubling Databricks over the next few years versus staying as Mosaic and maybe quadrupling it or 5X-ing it, and it's actually not that different once you account for dilution. So I said, okay, maybe this does start to make sense.

And then I told Ali, look, I'm actually open to the conversation. Let's talk about it. But I told him right then, it's going to come down to economics. If we can make that work, great. I actually really liked working with these guys, talking with these guys. And then we started to talk more with the rest of the founders of Databricks, and honestly, we all got along extremely well. It felt like people I'd known for years. And that's still true. Basically, I think both sides were like, I don't know if all the details are going to work, but I do know that I can trust these folks to figure it out, and we're going to be partnering together to make it work.

Sandhya Hegde

And they are really aligned as a leadership team in terms of why they are interested in this, as opposed to a really big organization with multiple stakeholders who think multiple things and might acquire you even if they don't want to invest in you, right?

Naveen Rao

Exactly. Very tight alignment top down. It was just going to be clear. We're going to hit the ground running. Let's go as fast as we can.

Wei Lien Dang 

Databricks was already a major player in the AI ecosystem, and now even more so with Mosaic. From your vantage point, how do you see the ecosystem evolving? We'd love to hear what you are excited about, and in particular, what areas do you think are starting to get more mature in terms of the adoption curve and what's still early. Going back to the two-year outlook, what's on the horizon in your view in terms of what's necessary to move the ecosystem forward?

Naveen Rao 

Yeah, I think we're probably the best positioned company for a lot of reasons, because they spent the last 10 years building this amazing set of data tools. Those data tools allow you to import from many different places. You can ETL the data, you can do business intelligence and plotting, different sorts of graphs. So they've built all of that. They've also built an end-to-end governance platform, and what that means is that you can track the lineage of data, you can track who can see what data, or any of these attributes that go with data. And now we can extend all of that to LLMs and generative AI.

I think it puts us in a great position, and we can do all of this securely, adhering to all the policies that different enterprises have, and we have this huge sales team. So we're extremely well positioned now. What's a couple of years out? I think really understanding what the killer apps are; it's still early. Everyone got excited because something can chat, it can speak English. And that's a huge breakthrough, right? Making an artificial system able to generate text that's human-like was impossible five years ago, so that is a huge breakthrough. However, what are you really going to use it for?

Okay, it's not super reliable. So what are you really going to use it for inside of an enterprise? That's still evolving. Even on the consumer side, it's entertaining and people were entertained by it, but now it comes down to: am I really going to pay for this? I think you've seen that in OpenAI's user growth; it's stalled out, and that's because of this. I don't doubt that they're going to continue to innovate and try to figure out new ways to add value and get people to pay for it. But in the enterprise, it comes down to really making your data interactive. How do you do that in an easy-to-use way that respects privacy and governance and data ownership? We're actively working on research to do this. There's early evidence of things like retrieval augmented generation. I say early because there are still a lot of holes in it. You don't have guarantees if you want to make something retrieve data accurately. You can say, here's a whole bunch of documents, ask it something; it still could be wrong. That doesn't work in enterprises. They need either a guarantee or a very good SLA on performance. So I think these are the open questions that aren't solved and ones that we're really focused on solving now, and we're going to have a set of tools that are going to be amazing for enterprises to interact with their data.
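
Editor's note: for readers new to the term, here is a minimal sketch of what retrieval augmented generation means. The embed() and generate() callables stand in for whatever embedding model and LLM you use, so this illustrates the pattern rather than any particular product; and, as Naveen notes, nothing in it guarantees the retrieved context or the answer is correct.

import numpy as np

def rag_answer(question, documents, embed, generate, top_k=3):
    """Retrieval augmented generation: retrieve context, then ask the LLM.

    embed(text) -> vector and generate(prompt) -> text are assumed to be
    supplied by the caller (any embedding model and any LLM will do).
    """
    doc_vecs = np.array([embed(d) for d in documents])
    q_vec = np.asarray(embed(question))
    # Cosine similarity between the question and every document.
    sims = doc_vecs @ q_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec) + 1e-9)
    context = "\n\n".join(documents[i] for i in np.argsort(-sims)[:top_k])
    prompt = ("Answer using only the context below. If the answer is not "
              "in the context, say you don't know.\n\n"
              f"Context:\n{context}\n\nQuestion: {question}")
    return generate(prompt)  # the model can still be wrong; no SLA here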

Sandhya Hegde 

I'm sure you have noticed that there are regulators trying to figure out how to think about this ecosystem, and there are a lot of geographic differences between how people are talking about it in the US versus the EU. How are you thinking about the right strategy for you as a company? There will be some regulation; we don't know exactly what it will look like, but how do you ready yourself for what is to come? And if you were asked for advice by a regulatory body, what would your advice be about how to think about, especially, open source model infrastructure and companies putting this stuff together and wanting to use it without harming either their brand or their customers?

Naveen Rao

I have actually been involved quite a lot on the regulatory side of things. I was at the UK AI Safety Summit a couple of weeks ago discussing this stuff. I think it's very early, and to go and start regulating development is just going to hurt us. I think the EU has done this to themselves a few times, and regulating development is just going to push development elsewhere.

If you want to help another person, slow down your own development, right? We just don't understand these systems well enough yet. And we're not at a point where we have something that's a truly autonomous AI that roves around the world and takes actions on things. We don't have that.

A lot of the things that I've seen highlighted as problems are actually problems in existing software. Viruses: they're self-replicating, they can break out of things, they can do all of this stuff. It has nothing to do with AI, and we have technological solutions for that. So I think it's actually time not to regulate, but to look at technological solutions for problems that we see come up, because we don't know what those problems are. We have a lot of guesswork happening. Some of this is driven by different ideological underpinnings. There's a whole group of people who feel that existential risk is right around the corner. I think on some time scale, any one of us in the AI field believes that there is some sort of point where machines reach human intelligence and can take their own decisions and actions. We have to think about what the world's going to look like. Do we want to start regulating that now, or do we want to wait till we have better data on what that looks like? I'm more on that side of the fence.

Open source is the way we make progress. If you close off open source, you close off research, you close off academia. What that does is decrease the number of researchers who can solve these problems. If anything, if you close off open source, you create the problem you were trying to solve: you close it off to the point where only the profit-interested companies are building solutions and putting them into the world. So I think it's a horrendous mistake to close off open source. Companies can choose to open source or not, but do not make it illegal. That is, in my mind, crystal clear. Don't make it illegal. I have a pretty strong position on this one. I do believe we need every smart person in the world out there looking at this stuff to help solve these problems. And frankly, that's part of what we built with our tools: we made it so that more people can access this stuff and we can get more ideas into the fray. Now, the other part about what should we prepare for and what do we do? We do have this governance platform.

We have ways of respecting copyright. You can tell where data came from, and you can say, okay, this model is trained on copyrighted data or private data or data that had PII, or whatever, and then you can actually serve that model only to the right places. We have that entire infrastructure. We're a little bit in the wild west right now. I was quoted as saying we're in the Napster moment of generative AI, because it's just, let's go sweep up every bit of data we can and throw it into the model to try to make it better. It turns out that data is owned by somebody, and those people don't like it when you go and try to profit off of it.

So this is going to come to a head. It's not going to keep going like this. We have to have better ways of doing this. And so thinking about governance, thinking about ownership of data, is actually very important. I think transparency is important. So as a company in this space, we want to have tools that allow you to disseminate information; you can share things between different researchers, things like this. So anyway, I think as a company we have some tools today that actually help us in this world of potential regulation.

I don't know what the world's going to look like in a year or two or three in terms of where AI will be. So it's hard for us to build tools that deal with the regulation today. But I think we are well poised because we listen to our customers and we have a lot of enterprise feedback right now.

Wei Lien Dang 

I do think, Naveen, that open source can help with a lot of problems that you're describing. Starting with transparency, our view, to a large extent is that it engenders trust and accountability, and that people can clearly understand how these models are working as opposed to operating as black boxes.

Naveen Rao 

Yeah. You've got to be able to introspect, right? Even with the open source models, a lot of them haven't published where their data came from. That's a problem. We actually try to be very transparent. If you look at our blogs, we show you exactly the data sources. In fact, we can even give you an S3 bucket of all the data.

You can go look through it yourself. We've done our best to try to filter it and make sure that we don't have anything that's illegal or copyrighted or whatever. There are trillions of tokens in there; I guarantee you we've missed something. We want to make it transparent. I think that principle is all we have right now. Until we start seeing real problems in the world, then we can start putting in the appropriate rules. But principles are what we can rely on right now, and openness is a big part of it. Open source is absolutely crucial to AI safety in my mind.

Sandhya Hegde 

Yeah, we're definitely seeing the same pattern on the stable diffusion side of the world. There are just so many startups that are using copyrighted art and photos that belong to celebrities who would love a way to figure out how to both control and monetize what they own.

And definitely I think your point about this being a Napster moment is very real. One of the things that I expect needs to happen is for the legal system to catch up and rein it in, and hopefully there will be people who are able to offer infrastructure that incorporates ownership and monetization, as opposed to everyone having to stop because they haven't found a way to do that. Spotify is a great example of this, right? So, exciting days ahead. I'm curious, what would be your advice to AI founders getting started in 2024? Obviously lots of unknowns, you have to stay very nimble. But if you were chatting with them, what would be your advice?

Naveen Rao 

Yeah, find those problems that are new ground and that you can innovate on. I think evaluation, which we touched on a little bit, is an important one. It's pretty unsolved. If you come in with something that's actually useful, you get share, right?

Don't pick point solutions. Pick something that's more holistic. Pick things that unlock a bigger market if they're solved. I think this is key. It's a question I usually ask folks when I get pitched on seed companies; I invest in a lot of seed companies, and I always ask them: if you solve this, what's next? What's the rest of the market?

People get very fixated on a particular problem. That's probably why they're good at solving those problems, but at the same time, you have to look beyond it. Okay, I'm going to solve this; why does it matter? So I always say, once you find that thing with a market ahead of it so big you can't even think about how big it is, that's where you put all your energy, right? Stop everything else and put it all into that, because that's the thing you have to solve. Not the thing that unlocks a $100 million market; you've got to look at it like, no, this changes everything. That's the way I've gone about it in the past.

And I think that's the way we make fundamental change in the world. I encourage founders to think that way, to look for those opportunities. There are a lot of them out there, I'm sure, and I have no idea what they are. But some of these great founders come and pitch me something and I'm like, holy crap, that's great. I never would have thought of that. They looked at the problem in a new way. So "think about the unlock that's two years out" is the simple way to say it, I think.

Sandhya Hegde 

Yeah. If you have to really do the math to calculate market size, you're probably not in the right zip code.

Naveen Rao

That's right. You're probably too late.

Sandhya Hegde 

Thank you so much for coming on our podcast and chatting about MosaicML. It's been such a wonderful journey. I feel like you have really been at the leading edge of a whole new industry being born. Hopefully you'll write a book about it sometime.

All posts

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Suspendisse varius enim in eros elementum tristique. Duis cursus, mi quis viverra ornare, eros dolor interdum nulla, ut commodo diam libero vitae erat. Aenean faucibus nibh et justo cursus id rutrum lorem imperdiet. Nunc ut sem vitae risus tristique posuere.

All posts
November 27, 2023
Portfolio
Unusual

MosaicML's product-market fit journey

Sandhya Hegde
No items found.
MosaicML's product-market fit journeyMosaicML's product-market fit journey
Editor's note: 

SFG 34: Naveen Rao on building an edge in AI infrastructure

In this episode of the Startup Field Guide podcast, Sandhya Hegde chats with Naveen Rao, co-founder of MosaicML, developer of open source infrastructure for training LLMs. The company was acquired by Databricks for $1.3 billion in July 2023. and has gone from 0 to over $30M in revenue this year in just 6 months.

Be sure to check out more Startup Field Guide Podcast episodes on Spotify, Apple, and Youtube. Hosted by Unusual Ventures General Partner Sandhya Hegde (former EVP at Amplitude), the SFG podcast uncovers how the top unicorn founders of today really found product-market fit.

If you are interested in learning more about some of the themes and ideas in this episode, please check out the Unusual Ventures Field Guides on starting an open source company, and customer development for an open source company.


Episode transcript

Sandhya Hegde

Welcome to the Startup Field Guide, where we learn from successful founders of unicorn startups, how their companies truly found product market fit. I'm your host, Sandhya Hegde, and joining us today is my partner Wei Lien Dang. Together, we'll be diving into the story of MosaicML. Mosaic is the developer of open source infrastructure for training LLMs.

The company was acquired by Databricks for $1.3 billion in July this year. And has gone from zero to over 30 million in revenue in just over six months. Joining us today is Naveen Rao, the CEO and co founder of MosaicML, and now the head of generative AI at Databricks. Welcome to the Field Guide, Naveen.

I noticed that you've been working in AI research since, at least 2012, your days at Qualcomm. And even in your previous startup, Nirvana, which got acquired by Intel. So I'm curious could you share more about those experiences and, did they play a significant role in informing when you started your company MosaicML?

Naveen Rao

Oh, yeah. This has been a long term trajectory for me. It's overnight success, 15, 20 years in the making, maybe longer at this point. I've been interested in the field since the 90s when I was an undergrad. I even did research on neuromorphic machines at that time some extending some of Carver Mead's work. If anyone's familiar with that and I came out to the Valley went to a bunch of startups, designed chips, software stacks, all of this kind of stuff. And actually went back to grad school in 2007  to rediscover this thing. I basically said it's now or never.

And that was before a lot of these techniques really started showing massive promise. But in my mind, it was very clear that neural networks, large parameter spaces were the way to solve problems we were currently not able to solve. At that time machine learning was largely regression based, and we didn't have enough data or enough compute to really do interesting things, but that started to change pretty rapidly.

And so I think I saw that earlier than a lot of folks. My family all thought I was nuts for doing that, going back to grad school with a kid and a half and all of that. But yeah, so then after finishing grad school, after a brief stint as a postdoc and in finance, then I went to Qualcomm and worked on neuromorphic computing architectures.

And that's right around the time that deep learning started becoming, I would say, prevalent in the research world. And it was like this is not just a flash in the pan. This is something, this is the change we've been looking for. I think you almost have to have this preconceived notion of change to see it sometimes, it's this is the thing I'm looking for and now I see it. And then it was clear to me then that, neural networks are a new way to express computations, a new way to, to build a learning system. And it was also clear was that we needed new computing architectures to subserve that. And that's when I quit Qualcomm and started Nirvana. This, the idea started, maybe around the end of 2013 or so, but then we started the company in the beginning of 2014.

We were the first AI chip company. And really explored this thing and I would argue pushed NVIDIA a lot to rethink their architecture, especially around tensor math and the acquisition of Nirvana was done by Intel in 2016, which kind of became the seed for a lot of the neural network efforts and accelerators. And we started a new division called Intel AI and a new corporate brand there. And grew that, I led it for a few years.

Sandhya Hegde

And what were the early applications that you saw back in 2013? When you were, thinking of your AI chip strategy, was there an obvious, this is going to be the first, killer wave of applications for my startup?

Naveen Rao

Yeah, applications are hard, right? It's interesting if I can pull up, if I pulled up my deck and I don't have it with me right now, unfortunately, but if I had that with me, I could show you personalizing device interaction was actually one of the big ones. And, at that time there were things like Siri and stuff like that coming. It was still very new and not very good, but that was very clear. And it was also very clear that's going to require a ton of compute. Also we had the notion that very large models were going to do something much better. Wasn't clear exactly what that was. This is pre-transformer, but it was clear to me that very large parameter spaces were going to do something something more magical and we're building the computing architecture to support that from day one. The architecture was distributed from day one. I think some of the very early ideas were there. Then large language models started, I guess not really large language models, but yeah. BERT style models started in 2017. And, that, I think that was the beginnings of it. Again, knew this thing was going to come and that's it. And scale was going to be important. At that time no one cared about scale, except for a few players. And even in 2021, when we started Mosaic in January of 2021. Toyed around with the idea in 2020, cause I quit Intel in the beginning of 2020. And it was clear to me that large models were now at a point where they're going to do something great. Again, before ChatGPT, only a few companies cared about it. And what then became a huge market opportunity was enabling more people to have access to this technology, bringing the costs down and the complexity down to a point where more people could use it.

The whole thing was going to explode, very similar to what we saw with PCs, with programming. In the 1970s, only a few academic researchers had access to mainframes. Once it became cheap enough, small enough, easy enough to use like an appliance, then, a whole generation of people learn how to use these, these devices, we call computers and we want to program them. I was one of them as a kid. And so I saw this happen in the past and this was another moment like that. And so the way I looked at it as a problem was if we can solve that problem of making it accessible, it's a huge market unlock. And so let's put all of our time and energy into making this easy to use so that we can unlock that market.

Wei Lien Dang

Naveen, if you go back to early 2021 and you're just starting Mosaic, adoption of these large models, even though you had a sense that they were going to have some impact, but still really early, if you look at newer LLMs and foundation models, outside of the context of the large tech companies and so on that you're referencing, curious what was your core insight when you started the company?

If you had this vision of making it accessible and so on. Like how does that translate into what you set out to build? And I'm, curious in the last couple of years, how much has that changed? Or, if at all?

Naveen Rao

Yeah. There’s a few points of clarification along the way. We set out to make. learning systems vastly cheaper and and faster and at scale. And, the methods we were exploring then to do that, we were trying to potentially even deviate from back propagation and a few other things, those things weren't practical and we abandoned them, but that was the basic research questions to ask. And then we found there's this whole path of. things we can do to make that learning process more efficient and more effective. And we started going down that path and actually turned out to be like, Oh, this is not like finding 20%. This is like finding 4X, 5X, 10X. And we started doing that. And we actually then innovated on smaller neural networks, which were more tractable at the time like ResNet, we won MLPerf benchmarks without changing the hardware. We can just change the algorithms and make them vastly more efficient. So we started seeing a lot of positive news there on the technology side. And so then it became clear that yes, this was the path. This is what we want to build. So I don't think the notion of what we want to build changed throughout. It was maybe the tactics of what we were trying to do.

 And, I have a framework for thinking about this a little bit. And this applies to technology startups specifically not necessarily product startups. There are lots of ways to build a company. You can build a great interface and have great product uptake. But in terms of technologies, you've got to be about two years out from mass adoption. If you're within six months, everybody already knows about it. This is not like a secret. If you're five years out, you just can't weather the storm long enough to keep going.So two years about seems to be the optimal amount. And actually we called it January of 2021. We really started these ideas in late 2020. And lo and behold, late 2022 is when it took off. So I think we got the timing just about right.

Wei Lien Dang
Super interesting, Naveen, because you had been steeped in AI much earlier than most people, which positioned you to understand where the puck was headed what was going to become important in the next couple of years. Just on the tactics, for a moment, in terms of what the path looked like. Was it always clear to you that, you had to go end to end in terms of the product, go from training to inference versus, a lot of folks would argue you could build, successful companies if you solved an important piece of maybe a bigger platform or, workflow or things like that. So how did you think about that? Was the plan from the beginning to build an end to end platform or did it evolve into that?

Naveen Rao
You got to have this kind of minimum useful piece. And I've seen so many people try to build these MLOps startups and I hated that whole segment. In fact, I still hate it. I think it's you're building point solutions that are just their features that it's very clear that it's not solving the thing that people think is hard, right?

You have to go and solve the actual thing that adds value. If I have a good model, now I can do all this stuff. It's not that if I have a orchestrator that can orchestrate something, but I have no idea how to actually build the model. I solve anybody's problem. You solve a very tiny portion of the population's problem.

So I think this is a bit of a gut check, honestly. It's just like understanding how the users use the tools. What matters? We did from the very beginning, think about it holistically, we want to get to the trained model, the trained model is the artifact, not the tool that enables you to train necessarily. That's part of the journey. So I don't think these point solutions make a lot of sense. And really, that just comes down to understanding how researchers and model builders and engineers think about it. Get me to the finish line. Don't show me a method to learn how to run better to the finish line.

Wei Lien Dang

Sure. And you focus a lot on open models, open LLMs. Was that always the case from the beginning? Because if you went back a couple of years, a lot of people were saying that open source and open models weren't viable challengers to proprietary models and solutions. I was curious what gave you the insight that there was going to be this wave of open source or open model development, and how much of that was, in your view, predictable versus what surprised you?

Naveen Rao
We had lots of internal debates about this. We're not maximalists in any direction other than building a company. And I think open source is a way to expose the world to what you're doing to understand if it's important. Do people care, right? If nobody cares about what you're doing, then maybe you need to rethink it or you need to tell that story differently.

And was efficiency of ML an important topic that people cared about? Easy way to test that: put some tools out, see if you get usage. So I think that was one part of why we opened it. I think what we also saw was that the models that we built have a time to live. They're not themselves an enduring thing. If I buy a car, the car is relevant for seven, eight years, maybe 10 years. I don't want to buy a car that's more than 20 years old; most people don't. So a car has a time to live, and in the same way so does a model. The techniques get better, the models just get better.

You get more relevant information and you make them more specific to particular use cases. So that keeps happening. And so we said, look, it's probably better to enable people to build off of our models and then bring them into our ecosystem, and we have tools that can enable them to customize that for their purposes. So that was a very conscious thought process for us. It wasn't a maximalist, ideologically driven view. It was a “we think this is the right way” to enable the users, the future users of our platform. Whether it's the right thing to do to keep opening up open source models is something we assess every step of the way.

That being said, I do think open source is a very important tool in the development of technology. It enables ecosystems to form and people to disseminate information in a very efficient way through code. We were still in the place of: let's figure out how to maximally help the developers we're building tools for, and then we'll figure out what they need and add value to the system.

Wei Lien Dang

Naveen, maybe just given what you referenced as the time to live, any hot takes on foundation model companies and those business models? Given that, I think you're right in some aspects, maybe it'll be a race to the bottom, maybe you have to keep getting better and better. Any opinions on that?

Naveen Rao

Oh, lots of opinions. I think the vast majority of foundation model companies are just going to fail. You're not going to beat OpenAI, and you won't have a differentiated use case. You've got to do something better than they do, and if you don't, and it's cheap enough to move, then why would anyone use somebody else's model? So it doesn't make sense to me to just go try to be head to head unless you can beat them. If you can't beat them at the consumer use case, you've got to beat them on some other dimension.

Are you better in a particular vertical? Okay. Character AI has done that really well. They went and built their own thing, and it's really good for these personality type LLMs. Great. So they have their own niche. Our niche has been, we believe the value doesn't reside in the model itself. That is something that's ephemeral.

It's the process of building and customizing models. And that's basically what our products are about, is how do you go and do this and end up with that artifact at the end of it? Everyone has to have their take, but the vast majority of them just going and building models and calling it victory, I just don't think it's going to work.

Like I said, the MLOps space is very similar to that. I don't think you're differentiated. I don't think you're solving a problem. Woohoo, you built a model. Great. Now a lot of our customers can do that with our tools without a whole lot of engineering work. So what are you really doing that's new and adds value? I think that's the key: you always have to solve a problem. Just building a piece of technology because you said you could doesn't really prove that you can solve a problem.

Sandhya Hegde
Going back to MosaicML in 2021, what was your timeline to having something people had started using, and who were your very early adopters? What were they doing with MosaicML?

Naveen Rao

So we started with the Composer library, which is an open source library. It was really designed for researchers, mostly, and was about combining different methods to make training and inference efficient, efficient from a compute standpoint, a FLOPs standpoint. And we went after those users first. We talked to PhD students at MIT and people we knew from our network, and they were the ones we asked for feedback. That was towards the end of 2021, when we first had the library out. We learned a lot from that: what do people care about, and how do people want to use this? We actually did get a reasonable community of folks using it, and from there on we said, okay, what are the pain points?

What do they really care about? That's when we started to bring our tools online. I think the earliest version of our tools was available probably at the beginning of 2022. We weren't selling them yet; we were working with some academic partners like Stanford's Center for Research on Foundation Models. They were a partner of ours, the first ones using some of our tools, in fact. We believed in this model of external validation for what we were doing, and getting partners who would communicate their pain points to us, while providing them some value in return, was the way we went about it.

Sandhya Hegde

Got it. And what were they using it for? Was it model evaluation? What were some of the early things that you saw people focused on doing with MosaicML? I'm also curious about your take on model evaluation: what were some of those early use cases, and how did you think about what makes a model good? How do we give developers confidence about the way they will be using these?

Naveen Rao

Yeah. The first use cases that we had in 2021 were mostly people building computer vision models and things like that, for particular domains. I think we had a lot of people in industry using it too, because it was so cheap: you could take a single A100 instance in the cloud for 20 minutes and solve ImageNet with our tools, where that was a three or four hour thing without them. So that was pretty good, and it just made it so you could iterate quickly, right? If you're an engineer at some company trying to build a computer vision system, this is just a quicker way to do it.
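Editor's note: for readers who want to see what the Composer workflow Naveen describes looks like in practice, here is a minimal sketch. It assumes the open source Composer library's Trainer and speedup algorithms roughly as documented; exact class names and arguments may differ across versions, and the small CIFAR-10/ResNet-50 setup is purely illustrative.

```python
# Illustrative sketch: training a vision model with a few of Composer's
# algorithmic speedup methods enabled. The point is that the speedups come
# from swapping algorithms in and out, not from changing the hardware.
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms, models

from composer import Trainer
from composer.models import ComposerClassifier
from composer.algorithms import BlurPool, ChannelsLast, LabelSmoothing

# Standard torchvision data pipeline (CIFAR-10 as a small stand-in for ImageNet).
train_loader = DataLoader(
    datasets.CIFAR10("./data", train=True, download=True, transform=transforms.ToTensor()),
    batch_size=256,
    shuffle=True,
)

# Wrap a plain torchvision model so Composer's Trainer can drive it.
model = ComposerClassifier(models.resnet50(num_classes=10))

trainer = Trainer(
    model=model,
    train_dataloader=train_loader,
    max_duration="10ep",  # train for 10 epochs
    algorithms=[BlurPool(), ChannelsLast(), LabelSmoothing(smoothing=0.1)],
    device="gpu" if torch.cuda.is_available() else "cpu",
)
trainer.fit()
```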

So we had folks doing things like that. The first onboard was, like I said, the Stanford CRFM; we actually built a model with them, the PubMedGPT model that we put out there and open sourced. It was basically an LLM trained on all of PubMed, everything from 1970 to, I think, 2022, something like that. We learned a lot from that in terms of the stability of the training process and how to make the whole process repeatable and easy to use. But they were building an LLM that was domain specific. It was basically the exact thing we're doing today at scale; the same idea that if you build something that's specific to a use case, you get higher performance and it's lower cost.

Economics are always better. So I think an underlying belief that we've always had is that economics will matter. It's not a, hey, go raise a billion dollars or two billion and just throw it at GPUs. Economics are the physics of business; that's the way I look at it. And if you don't think about that, you are going to be in a world of hurt. Yes, you can consciously make a choice to put an investment into something, and that's okay, but you've got to think about it from a market standpoint. I'm an engineer and a researcher, but I still know what a DCF is, right? A discounted cash flow model. Look at the model opportunity that way.

I can think about it very simply: if it costs me $10 million to train a model and I run that model in inference at a 30% margin, then I need roughly $30 million of revenue to justify it, at a minimum. If I don't think I can get that, then it's not worth doing. And so by making things more efficient from a cost standpoint, we can actually justify more and more use cases. You start thinking about it through this lens of a DCF. That's why I think a lot of the model companies today are ill-founded; it's all a space race. I'm going to go build a model, I'm going to make it bigger, blah, blah, blah. It doesn't have any differentiation, no one's going to use it, it's not justified. So anyway, that's the way I look at this. I always looked at it as a business, not a research project, and I think that's a good mindset.
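Editor's note: the back-of-the-envelope math Naveen describes here can be written out in a few lines. The figures below are his hypothetical numbers, not real ones.

```python
# Back-of-the-envelope version of the calculation above.
training_cost = 10_000_000   # one-time cost to train the model, in dollars
inference_margin = 0.30      # gross margin earned on revenue served by that model

# Revenue needed for the inference margin to pay back the training cost:
break_even_revenue = training_cost / inference_margin
print(f"Break-even revenue: ${break_even_revenue:,.0f}")  # roughly $33 million

# If the addressable revenue for the model is below this, it isn't worth training;
# cheaper training lowers the bar and justifies more use cases.
```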

Sandhya Hegde

Definitely was not that common in 2021, but it's a little more popular with investors now. I'm curious, as you started getting that wave of early adoption, both with industry folks as well as students and researchers, were there any surprises for you?

What were some of the moments of feedback that aligned with the eventual direction in terms of how MosaicML prioritized the product roadmap?

Naveen Rao

I guess one surprise for us was just how many companies were being funded to do these things. A lot of them became our customers, which was great. The number of people who wanted to build models from scratch was actually much bigger than we anticipated. So I think that was interesting, and I think it's still true. And then this actually became a core part of our strategy: hang on, the reason they want to build their own models is not just for performance reasons, it's actually an ownership thing. If you're a big company, a big bank or whatever enterprise company, you don't actually want to outsource something that could be core to your business.

You want to own it fully. And I think that was actually a bit of a surprise for us; it became a core value prop. So if you want to actually go and own this, we do that. We enable you to build these things. We have the recipe, we have the model code, and then the trained model weights at the end of it are yours. That was something we didn't anticipate early. The economics were something we thought about from the beginning; I actually thought more people would think that way earlier. Now it's very prevalent: you can look at the trade-off between cost per inference and cost of training, and you can essentially do this build-versus-buy math, right? At very low volume you probably just want to buy; once you get to a certain volume, you can justify building, at different scales. That thinking took a little while, because I think it was shrouded by a lot of excitement. A lot of people were just mesmerized by what they saw in ChatGPT.

And so they looked past this for a little while, but now it's being re-grounded again, maybe because the macroeconomic situation dictates that as well. I don't know, but I think people are now thinking about, okay, what is rational for me to do? How much am I willing to pay to own this thing, to be able to customize it, to have full control over it?
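Editor's note: the build-versus-buy math Naveen mentions can be sketched as a simple cost comparison. Everything below, including the per-token prices and volumes, is a hypothetical placeholder, not a quote of any real pricing.

```python
# Hypothetical build-versus-buy comparison: calling a third-party model API
# versus training and serving your own model. All numbers are placeholders.

def buy_cost(tokens_per_month: float, api_price_per_1k: float, months: int) -> float:
    """Total cost of renting a third-party model API at a given volume."""
    return tokens_per_month / 1_000 * api_price_per_1k * months

def build_cost(training_cost: float, tokens_per_month: float,
               serving_cost_per_1k: float, months: int) -> float:
    """One-time training cost plus the cost of serving your own model."""
    return training_cost + tokens_per_month / 1_000 * serving_cost_per_1k * months

volume = 10_000_000_000  # tokens per month
horizon = 24             # months

api = buy_cost(volume, api_price_per_1k=0.01, months=horizon)
own = build_cost(training_cost=1_000_000, tokens_per_month=volume,
                 serving_cost_per_1k=0.002, months=horizon)

print(f"Buy (API):        ${api:,.0f}")   # $2,400,000 over the horizon
print(f"Build and serve:  ${own:,.0f}")   # $1,480,000 over the horizon

# At low volume the API wins; past some volume, owning the model pays for itself.
```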

Sandhya Hegde

2022 was obviously such a formative year for the market, right? OpenAI accidentally became a consumer company with ChatGPT and made investing in AI a board-level conversation everywhere around the world, all of a sudden. I'm curious what you were seeing at MosaicML internally. How did it affect your inbound demand? Did it change the questions people were asking, who was asking them, and who was interested in using MosaicML?

Naveen Rao

Yeah, I mean, we actually had some uptake right before then as well. It was still early for us. I would say our first signed, paying customer was in mid-2022, and then we started getting a few customers on board right around the time that ChatGPT came out. There was a groundswell of startups coming; they were being funded, they got seed or A rounds of 10 million bucks, and those started becoming companies that we were talking with. ChatGPT just changed the scale; it exploded, right? Now all of a sudden, awareness in the enterprise just shot through the roof. And interestingly enough, they went to security and privacy as their first thing, which I guess is not that surprising, and then we became the game in town, right?

So if you wanted something that was your own, and you wanted to do it securely and privately, you came to us. I think we were situated well from the perspective of having some ideals about what the world should look like in terms of ownership of models. It turned out to be right, or at least positioned us well to build products against that demand stream, but that demand really supercharged after ChatGPT. So I would say that is what drove a lot of our growth from the beginning of this year through the summer.

Sandhya Hegde

Great, makes sense. And that definitely brings me to the acquisition. This is not the first time you've had a company acquired, and I'm curious what the process was like and what your learnings and takeaways have been from going through this not once but twice now. What was your calculus as a founder in terms of, A, is acquisition the right path, and B, what's the right company?

Naveen Rao

Yeah. Having done it the first time around, there were a lot of challenges at Intel. I remember I came into Intel after the acquisition, and we were put on a hiring freeze right away. They paid $408 million for Nirvana and we were put on a hiring freeze. And I was just like, really, why did you buy us? What is going on here? Then I was told that deep learning is a fad, that this is not something that's going to last. And it's like, okay, why are we here then? I would have just stayed independent. It was a pretty rough experience being inside of Intel.

There were a lot of people who, I don't know if it was just their own interests or the way they think about it, but they just weren't willing to think about a world that was different from the one they knew. And I think that's a core problem. It's like an innovator's dilemma sort of thing, right? You've got this core business and you want to put blinders on to everything else. But the reality was that core business was being disrupted. And to me it was abundantly clear, as clear as the computer screen in front of me right now, and these folks couldn't see it.

To me, it was just mind-boggling. Anyway, after almost a year, we were able to form a new group, because the CEO, Brian Krzanich, did start to see that this was going to be important and said, okay, let's go start a new group and you're going to lead it. That changed things a bit. We definitely started developing our products and making some big changes, and at the end of the day we did get Intel to shift towards this. There's still a lot of resistance within the company, and I think it just has to do with some of the systems and people who have been there, in a different world than what is happening now. It's really hard to make big companies change what they do. You've got to look more like Nvidia than like Intel, and that's just hard to do. So through that experience I was pretty reticent to sell this company. I had no desire to; I told everybody, I'm not gonna sell this company.

In fact, part of what I would say to folks is that we could become another Databricks. There was no reason we couldn't; we were going after the new thing, and we just had to build the sales team up. I actually had a few conversations with big players about acquisition, and it was a pretty easy conversation to say no. With Databricks it was different, because it's still run by founders, it's still a big startup. That's very true after being here now for four months: still very much a big startup.

You asked about the process. I met Ali at a Cerebral Valley event in April of this year. I'd never met him before. I'd heard about him, and I'd heard this is the startup founder who was able to wrangle the clouds, right? He's been the one who could negotiate hard against them and carve out a market that's big. When I met him at Cerebral Valley, it turned out that he knew us because we had won a deal from them. Replit is a distributed IDE company; they actually used our tools to build their LLMs over Databricks. And Ali knew about this because he knew the folks at Replit pretty well, and he was like, damn, we lost to Mosaic. So he started looking into us.

So he actually came over and talked to me at the Cerebral Valley event; I didn't know this at the time. But it was an interesting conversation, because I was like, okay, clearly this guy thinks like a founder. He is a founder, and still very hungry. One of the things I thought was interesting is that I congratulated him on the success of Databricks and he said, yeah, we're still a small guy, we've got a lot to do. I thought that was pretty cool. We just kept in touch after that; he'd text me about stuff and we'd talk about strategies and things like this. Then in early May, he texted me on the weekend: hey, can you talk? And I'm like, I can't talk right now, but we can talk later. So we set a time for Monday, and I figured he was thinking about an acquisition. And then he said it during the call. I think he fully expected me to just say, no, I'm not interested.

But I had thought about it a fair bit, because I was like, what are the potential outcomes over the next several years? If I look on a three-year basis, I'll probably need to raise more money and I'm going to dilute more. And what's the best possible outcome? I looked at it, and it's not that different. Databricks has a ton of growth still left, and by joining forces with Databricks I can basically get access to the sales channel, get access to all the security capabilities and things like that, and have a more holistic platform. And then I de-risk my position in this whole ecosystem; we could become the enterprise AI player. So I look at doubling Databricks over the next few years versus staying as Mosaic and maybe quadrupling it or 5X-ing it, and with dilution it's actually not that different. So I said, okay, maybe this does start to make sense.

And then I told Ali, look, I'm actually open to the conversation, let's talk about it. But I told him right then it was going to come down to economics: if we can make that work, great. I actually really liked working with these guys, talking with these guys. Then we started to talk more with the rest of the founders of Databricks, and honestly we all got along extremely well. It felt like people I'd known for years, and that's still true. Basically, I think both sides were like, I don't know if all the details are going to work, but I do know that I can trust these folks to figure it out, and we're going to be partnering together to make it work.

Sandhya Hegde

And they are really aligned as a leadership team in terms of why they are interested in this, as opposed to a really big organization with multiple stakeholders who think multiple things and might acquire you even if they don't want to invest in you, right?

Naveen Rao

Exactly. Very tight alignment top down. It was just going to be clear. We're going to hit the ground running. Let's go as fast as we can.

Wei Lien Dang 

Databricks was already a major player in the AI ecosystem, and now even more so with Mosaic. From your vantage point, how do you see the ecosystem evolving? We'd love to hear what you are excited about, and in particular, which areas do you think are starting to get more mature in terms of the adoption curve and what's still early? Going back to the two-year outlook, what's on the horizon, in your view, in terms of what's necessary to move the ecosystem forward?

Naveen Rao 

Yeah, I think we're probably the best-positioned company, for a lot of reasons, because Databricks spent the last 10 years building this amazing set of data tools. Those data tools allow you to import from many different places. You can ETL the data, you can do business intelligence and plotting, different sorts of graphs. So they've built all of that. They've also built an end-to-end governance platform, and what that means is that you can track the lineage of data, you can track who can see what data, or any of these attributes that go with data. And now we can extend all of that to LLMs and generative AI.

I think it puts us in a great position: we can do all of this securely, adhering to all the policies that different enterprises have, and we have this huge sales team. So we're extremely well positioned now. As for what's a couple of years out, I think it's really understanding what the killer apps are; it's still early. Everyone got excited because something can chat, can speak English. And that's a huge breakthrough, right? Making an artificial system able to generate human-like text was impossible five years ago, so that is a huge breakthrough. However, what are you really going to use it for?

Okay, it's not super reliable. So what are you really going to use it for inside of an enterprise? That's still evolving. I think even on the consumer side, it's entertaining and people were entertained by it, but now it comes down to: am I really going to pay for this? I think you've seen that in OpenAI's user growth; it's stalled out, and that's because of this. I don't doubt that they're going to continue to innovate and try to figure out new ways to add value and get people to pay for it. But in the enterprise, it comes down to really making your data interactive. How do you do that in an easy-to-use way that respects privacy and governance and data ownership? We're actively working on research to do this. There's early evidence of things like retrieval augmented generation. I say early because there are still a lot of holes in it. You don't have guarantees if you want to make something retrieve data accurately. You can say, here's a whole bunch of documents, ask it something, and it still could be wrong. That doesn't work in enterprises; they need either a guarantee or a very good SLA on performance. So I think these are the open questions that aren't solved and the ones we're really focused on solving now, and we're going to have a set of tools that are going to be amazing for enterprises to interact with their data.
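Editor's note: for readers unfamiliar with the retrieval augmented generation pattern Naveen mentions, here is a highly simplified sketch. The `embed` and `generate` functions are placeholders for whatever embedding model and LLM you use, not any specific vendor's API.

```python
# Highly simplified retrieval-augmented generation (RAG) sketch.
from typing import Callable, List
import numpy as np

def build_index(documents: List[str], embed: Callable[[str], np.ndarray]) -> np.ndarray:
    """Embed every document once so they can be searched later."""
    return np.stack([embed(doc) for doc in documents])

def retrieve(question: str, documents: List[str], index: np.ndarray,
             embed: Callable[[str], np.ndarray], k: int = 3) -> List[str]:
    """Return the k documents whose embeddings are most similar to the question."""
    q = embed(question)
    scores = index @ q / (np.linalg.norm(index, axis=1) * np.linalg.norm(q) + 1e-9)
    return [documents[i] for i in np.argsort(-scores)[:k]]

def answer(question: str, documents: List[str], index: np.ndarray,
           embed: Callable[[str], np.ndarray], generate: Callable[[str], str]) -> str:
    """Stuff the retrieved passages into the prompt and let the model answer.

    The weakness Naveen points out lives here: if retrieval misses the right
    passage, or the model ignores it, the answer can still be wrong, which is
    why enterprises ask for guarantees or SLAs on accuracy.
    """
    context = "\n\n".join(retrieve(question, documents, index, embed))
    prompt = (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
    return generate(prompt)
```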

Sandhya Hegde 

I'm sure you have noticed that there are regulators trying to figure out how to think about this ecosystem, and there are a lot of geographic differences between how people are talking about it in the US versus the EU. How are you thinking about the right strategy for you as a company? There will be some regulation; we don't know exactly what it will look like, but how do you ready yourself for what is to come? And if you were asked for advice from a regulatory body, what would be your advice to them about how to think about, especially, open source model infrastructure and companies putting this stuff together and wanting to use it without harming either their brand or their customers?

Naveen Rao

I have actually been involved quite a lot on the regulatory side of things. I was at the UK safety summit a couple of weeks ago discussing this stuff. I think it's very early, and to go and start regulating development is just going to hurt us. I think the EU has done this to themselves a few times, and regulating development is just going to push development elsewhere.

If you want to help somebody else, slow down your own development, right? We just don't understand these systems well enough yet. And we're not at a point where we have something that's a truly autonomous AI that roves around the world and takes actions on things. We don't have that.

A lot of the things that I've seen highlighted as problems are actually problems in existing software. Viruses: they're self-replicating, they can break out of things, they can do all of this stuff, and it has nothing to do with AI. And we have technological solutions for that. So I think it's actually time not to regulate, but to look at technological solutions for problems as we see them come up, because we don't know what those problems are yet. We have a lot of guesswork happening, and some of it is driven by different ideological underpinnings. There's a whole group of people who feel that existential risk is right around the corner. I think, on some time scale, any one of us in the AI field believes there is some sort of point where machines reach human intelligence and can take their own decisions and actions. We have to think about what the world's going to look like. Do we want to start regulating that now, or do we want to wait till we have better data on what that looks like? I'm more on that side of the fence.

Open source is the way we make progress. If you close off open source, you close off research, you close off academia. What that does is decrease the number of researchers who can solve these problems. If anything, if you close off open source, you create the problem you were trying to solve: you close it off to the point where the only ones left are profit-interested companies building solutions and putting them into the world. So I think it's a horrendous mistake to close off open source. Companies can choose to open source or not, but do not make it illegal. In my mind, that's crystal clear: don't make it illegal. I have a pretty strong position on this one. I do believe we need every smart person in the world out there looking at this stuff to help solve these problems. And frankly, that's part of what we built with our tools: we made it so that more people can access this stuff and we can get more ideas into the fray. Now, the other part is, what should we prepare for and what do we do? We do have this governance platform.

We have ways of respecting copyright. You can tell where data came from, and you can say, okay, this model is trained on copyrighted data or private data or data that had PII or whatever, and then you can serve that model only to the right places. We have that entire infrastructure. We're a little bit in the wild west right now. I was quoted as saying we're in the Napster moment of generative AI, because it's just: let's go sweep up every bit of data we can and throw it into the model to try to make it better. It turns out that data is owned by somebody, and those people don't like it when you go and try to profit off of it.

So this is going to come to a head. It's not going to keep going like this; we have to have better ways of doing this. So thinking about governance, thinking about ownership of data, is actually very important. I think transparency is important too. As a company in this space, we want to have tools that allow you to disseminate information, to share things between different researchers, things like this. So I think, as a company, we believe we have some tools today that actually help us in this world of potential regulation.

I don't know what the world's going to look like in a year or two or three in terms of where AI will be. So it's hard for us to build tools that deal with the regulation today. But I think we are well poised because we listen to our customers and we have a lot of enterprise feedback right now.

Wei Lien Dang 

I do think, Naveen, that open source can help with a lot of the problems you're describing, starting with transparency. Our view, to a large extent, is that it engenders trust and accountability, and that people can clearly understand how these models are working as opposed to having them operate as black boxes.

Naveen Rao 

Yeah. You've got to be able to introspect, right? Even with the open source models, a lot of them haven't published where their data came from. That's a problem. We actually try to be very transparent. If you look at our blogs, we show you exactly the data sources. In fact, we can even give you an S3 bucket of all the data.

You can go look through it yourself. We've done our best to try to filter it and make sure we don't have anything that's illegal or copyrighted or whatever. There are trillions of tokens in there; I guarantee you we've missed something. But we want to make it transparent. I think that principle is all we have right now. Until we start seeing real problems in the world, then we can start putting in the appropriate rules; principles are what we can rely on right now. And openness is a big part of it. Open source is absolutely crucial to AI safety, in my mind.

Sandhya Hegde 

Yeah, we're definitely seeing the same pattern on the Stable Diffusion side of the world. There are just so many startups using copyrighted art and photos that belong to artists and celebrities who would love a way to both control and monetize what they own.

And definitely, I think your point about it being a Napster moment right now is very real. One of the things I expect needs to happen is more monetization: the legal system needs to catch up and rein it in, and hopefully there will be people who are able to offer infrastructure that incorporates ownership, as opposed to everyone having to stop because they haven't found a way to do that. Spotify is a great example of this, right? So, exciting days ahead. I'm curious what would be your advice to AI founders getting started in 2024. Obviously there are lots of unknowns and you have to stay very nimble, but if you were chatting with them, what would be your advice?

Naveen Rao 

Yeah, find those problems that are new ground and that you can innovate on. I think evaluation, which we touched on a little bit, is an important one; it's pretty unsolved. If you come in with something that's actually useful, you get share, right?

Don't pick point solutions. Pick something that's more holistic, pick things that unlock a bigger market if they're solved. I think this is key, and it's a question I usually ask folks when I get pitched on seed companies. I invest in a lot of seed companies and I always ask them: if you solve this, what next? What's the rest of the market?

People get very fixated on a particular problem. That's probably why they're good at solving those problems, but at the same time, you have to look beyond it: okay, I'm going to solve this, but why does it matter? So I always say, once you find that thing with a market ahead of it so big you can't even think about how big it is, that's where you put all your energy, right? Stop everything else and put it all into that, because that's the thing you have to solve, not the thing that unlocks a $100 million market. You've got to look at it like, no, this changes everything. That's the way I've gone about it in the past.

And I think that's the way we make fundamental change in the world. I encourage founders to think that way, to look for those opportunities. There are a lot of them out there, I'm sure, and I have no idea what they are, but some of these great founders come and pitch me something and I'm like, holy crap, that's great, I never would have thought of that. They looked at the problem in a new way. So think about the unlock two years out: I think that's the simple thing to say.

Sandhya Hegde 

Yeah. If you have to really do the math to calculate market size, you're probably not in the right zip code.

Naveen Rao

That's right. You're probably too late.

Sandhya Hegde 

Thank you so much for coming on our podcast and chatting about MosaicML. It's been such a wonderful journey. I feel like you have really been at the leading edge of a whole new industry being born. Hopefully you'll write a book about it sometime.
