February 7, 2023
Portfolio
Unusual

How startup founders can shape the future of generative AI

Sandhya Hegde
Editor's note: 

Stanford Professor Douwe Kiela on what the AI developer community is learning and expecting to see in 2023 

We recently hosted Douwe Kiela (Stanford Professor, formerly Head of Research at Hugging Face and AI Research Scientist at Facebook) at an Unusual Ventures community event for founders focused on Artificial Intelligence and building the next wave of SaaS startups (sign up here to get alerts for future events!). 

What was apparent from our chat and the eager audience questions was that a lot is going to change over the next two to five years. While the public conversation today is dominated by OpenAI’s ChatGPT product, there will be a great abundance of new open-source models and developer tools launched that will open up new use cases and capabilities for founders and AI developers all over the world. As early-stage investors, our focus continues to be on startups solving real problems with products that make modern AI easy and safe to use. 

In this interview, Douwe Kiela speaks on this and a range of other topics in AI, including:

  • The advances in and current focus on text-to-image generation and other generative AI.
  • How Hugging Face users make the most of its features and how the community has expanded to a wide-ranging audience.
  • The philosophical, moral, and logical argument for open-source AI.
  • Why AI code generation is more useful than generic language generation.
  • The importance of making AI UX accessible for the layperson.
  • Safety, bias, and how to advance holistic evaluation of AI.
  • Fair use and ideas for how AI might begin to attribute its output.
  • Overcoming the current limitations of AI, including shortcomings in image generation and latency.
  • Why startups looking to AI should focus on product rather than underlying technology.


Sandhya Hegde: Tell us a little bit about what your AI journey has been and how you see the evolution of modern AI.

Douwe Kiela: My story is a bit unusual. When I was in high school, I taught myself how to code, which meant that I already thought I knew everything and I didn't have to study computer science. So instead I studied philosophy because I was very interested in the human mind. I stayed interested in AI and how we can implement minds on machines. After philosophy, I started studying logic, more applied mathematics, which rolled into natural language processing, and from there rolled into more traditional machine learning.

This was really the beginning of the mainstream application of AI. My mentor at the time said, “Nobody uses neural nets.” I wanted to use neural nets because this is what the brain does, right? So this was 2011 or ’12 when AlexNet burst onto the scene and neural nets — which are now canonical, almost old-school AI — really changed the way we think about Artificial Intelligence.

Over the past couple of years, it's just been amazing to see the growth. One of the themes now that is very popular is generative AI and going beyond just understanding language and classifying examples, and really trying to generate language for all kinds of very interesting use cases. We’re in really interesting times and we’re only starting to scratch the surface of what we can do with this technology. 

Sandhya Hegde: Can you help our audience make sense of the new terminology for neural net based models? Transformers, diffusion models, foundation models, LLMs — how do you see this space panning out and how do you categorize these different models?

Douwe Kiela: The easiest way for me to think about this stuff is based on the modality that you're applying the model to. So first of all, foundation models and large language models are sort of the same thing, right? It's a bit of a contentious term, actually, “foundation model.” So I'm Stanford affiliated, but some people who are not at Stanford don't like the term. It’s still to be decided what we're going to call these things in the long term. They're the same thing, but maybe “foundation model” is slightly more general in that it could also encompass diffusion models. We would have to ask Percy Liang, who came up with this.

The way we tend to generate language is auto-regressively, which means that we do it word by word. And so we have some prefix and then we predict the next word and then, from there, we keep predicting. The recent revolution, where suddenly AI has gone even more mainstream, has been text-to-image applications, which rely on diffusion models. Diffusion models do have some transformer components. So the way you encode the prompts or the text input, that's often a transformer model. But then, from there, this process of de-noising, or starting from noise and turning it into an image, that process is non-autoregressive, so you generate the image in one go. So it's really a different paradigm, conceptually, from a science perspective.
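To make the autoregressive side of that concrete, here is a minimal sketch of word-by-word generation with the Transformers library. The GPT-2 checkpoint, prompt, and sampling settings are illustrative choices, not anything recommended in the interview.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative checkpoint; any causal language model on the Hub works the same way.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Diffusion models differ from autoregressive models because"
inputs = tokenizer(prompt, return_tensors="pt")

# Each new token is predicted from the prefix plus everything generated so far --
# the word-by-word loop described above.
output_ids = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_p=0.9)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```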

Sandhya Hegde: Can you share a little bit more about what Hugging Face is particularly focused on and how the company is organized?

Douwe Kiela: What we're focused on is always a tricky question because we're focused on many things. The higher level thing is just that we are trying to grow as large as possible with the community. We're not really focused on monetization yet. It's all just about how we can really make AI pervasive in the world. This is where we're heading anyway, and I think if it continues this way, then Hugging Face will really be a central hub in this whole development. What we're trying to do is just make sure that people can build cool stuff on top of our platform, on top of our libraries.

“GitHub for machine learning” is one of the ways of putting it, but the mission is really about democratizing the state-of-the-art machine learning approaches. You mentioned the Transformers library, which is the classical example. But we also have the Datasets library, which is super popular, which basically has any data set that was ever made in there, so it’s super easy for people to just experiment with things. We have a new library called Diffusers, which is like Transformers, but for diffusion models. This has things like Stable Diffusion. Three lines of code and you have a state-of-the-art diffusion model running.
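The “three lines of code” claim maps roughly onto the Diffusers pipeline API. A hedged sketch, assuming a GPU is available and using an illustrative Stable Diffusion checkpoint:

```python
import torch
from diffusers import StableDiffusionPipeline

# Illustrative checkpoint; swap in whichever diffusion model you want to try.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe("a watercolor painting of a lighthouse at dusk").images[0]
image.save("lighthouse.png")
```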

There is a new library called Accelerate for training on multi-GPU, multi-node settings. So this used to be super difficult, where you have to keep all of these nodes in sync, and now it’s just three lines of code. So we're just hoping that we'll benefit from the growth and we're just trying to be a central point in the community.
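As a rough illustration of what that looks like in practice, here is a minimal Accelerate-style training loop. The toy model, optimizer, and data are placeholders added for this sketch, not anything from the interview.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

accelerator = Accelerator()

# Toy model and data, standing in for whatever you already train in plain PyTorch.
model = nn.Linear(16, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
dataset = TensorDataset(torch.randn(256, 16), torch.randint(0, 2, (256,)))
loader = DataLoader(dataset, batch_size=32)

# The key lines: prepare() handles device placement and multi-GPU / multi-node sync.
model, optimizer, loader = accelerator.prepare(model, optimizer, loader)

for features, labels in loader:
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(model(features), labels)
    accelerator.backward(loss)  # replaces loss.backward()
    optimizer.step()
```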

Sandhya Hegde: What do most people in the Hugging Face community look like today? Do you have mostly developers or data scientists? How do different people leverage it and do you see that changing in the future?

Douwe Kiela: I'm always amazed by the growth that we see in the community. You can just see how popular AI is and how it's really taking over the world. But it's also really cool to see how diverse that community is.

Initially it was a pure research focus. So the original PyTorch BERT implementation that started the whole Transformers library was really focused just on enabling researchers to use a model like BERT. From there, I think because AI became so much more mainstream, it's just a lot of developers now. There are also more and more laypeople using this technology. On Hugging Face, we have a thing called Spaces. You can think of that almost as a YouTube for AI demos in a way. There’s a trending Space of the week. Stable Diffusion, Midjourney, Dreambooth, and all of these very cool, latest and greatest diffusion models are up on Spaces and you can just interact with them directly. So that opens up a whole new user base of end users of AI technology, which is very cool to see.

We also, I think a bit uniquely in the ecosystem, take an ethics-first approach. We have some world-renowned, responsible AI researchers in our team where we're constantly trying to think about how to develop this technology responsibly and how to make sure that the community has the tools and guardrails for doing things the right way. 

Sandhya Hegde: This is a good segue to the question of open source versus not. What has the internal debate been like in the infrastructure and model community on this? I see a lot of people saying the only way to do this is open source because you want to democratize it. Other folks say no, offer a managed endpoint because that takes a lot of the burden off of the developers’ and founders’ plate. So I'm curious, what does this debate sound like internally? You have definitely picked a side, but what are the other pros and cons? And how should developers who are trying to test out and pilot some new features with this tech think about it?

Douwe Kiela: I just think open source is very important, especially with this sort of technology because of the risks and opportunities that it has for changing the world. I think it is very unhealthy for a model like GPT-3, or something like that, to be essentially just controlled by a bunch of Silicon Valley tech bros who can determine where this technology goes. I think it's much healthier for the world if this is out there and we can scrutinize the technology together. That's almost a moral, philosophical argument.

Some people say that, if you develop these technologies, then you shouldn't open source them. You should put them behind an API because then you can control them, so that people don't abuse the technology for misinformation or things like that. And I think that's a very convenient argument if you just want to make money from technology. I don't think that they're completely unbiased in their assessment of the risks. Maybe some people actually believe in the whole AGI thing or closing models, but I think some of the top people are almost cynically endorsing that so that they can continue making money from these APIs.

Of course, if you are a developer, it's much easier to just have an API call and talk to a model. But you can also do that with open-source models where you could actually look at the model itself in many cases. We've developed BLOOM, one of the largest large language models out there right now. You can also just look directly into the training data and try to understand why the model is generating something. There is no other model that gives you that capability. In the long term, things like explainable AI, interpretability, and trying to justify why a model does a certain thing are going to become more and more important.

Sandhya Hegde: I want to start surfacing a few audience questions. This one is particularly top of mind: What are some of the best use cases of this that either are doing things that no one imagined possible or are bringing back to life old use cases? For example, one of my favorite ones is the chatbot industry because a few years ago everyone tried to mainstream chatbots and they fell just short enough that it was frustrating rather than useful. But now, maybe, there will be a whole new way, given the advances in LLM. What are some of the use cases that you think will have the most impact? 

Douwe Kiela: That's an interesting question. A very short segue on the topic of chat: The reason Hugging Face is called Hugging Face is because the company started as a chatbot for teenagers. A fun, less serious chatbot that you could just hang out with and enjoy talking to. At some point, they pivoted to follow up on the success they had with the open sourcing of the technology that they were using to build the chatbot. 

I still think chatbots are a little bit too hard for large language models if you want to use them for a goal. The holy grail of chatbots is goal-oriented dialogue where you can try to achieve something, make your user happy, or sell them something. The goal-orientedness and really trying to control what these large language models are doing is still one of the big open problems for the next two to five years. Beyond that, some of the more obvious things that are going to get disrupted by this new generative AI wave are more in the image and audio domain where, in maybe less than five years, we will have Hollywood-length movies generated by AI systems. Things like Shutterstock and Getty Images are no longer going to exist, and that's really happening already now. Why would you pay for a stock photo if you can generate your own custom one? 

It's a very interesting question to think about — where is this really going to get used? Because right now, if you look at the industry, a lot of the generative applications are always with humans in the loop, where it's sort of just a writing assistant, right? I think that's an interesting use case, but it's probably not a super valuable one because any human who went to school can write. So writers of regular language are relatively cheap. That's why I think code generation is much more valuable because engineers are much more expensive. So if you can make engineers more productive, this will give you a much higher value-add.

A lot of people are just jumping on the generative AI train now, but a lot of the applications of AI and NLP in general are still just about classification. If I have a platform, I have to decide: How do I rank the content, do I take this down because it’s hate speech? And for that sort of stuff, you still don't want to use generative AI. You still want to use the old-school thing. I think the whole field is maybe jumping a little bit too much on the generative AI bandwagon where it's kind of unclear what the real killer application is going to be; whereas, I think it's much clearer for code generation than for image, audio, and video generation.

Sandhya Hegde: One thing you said about goal orientation and the degree of control needed to only go in one direction as opposed to what it seems to be good at doing today, which is: “I don't have a direction. Inspire me, show me a hundred options that vaguely meet some criteria I have.” That's a very, very tangential direction. What does that imply in terms of how we are thinking about fine-tuning tools or active-learning tools?

Douwe Kiela: One of the obvious answers — and you've seen OpenAI develop this — is reinforcement learning from humans. So if you do this over multiple rounds of learning with different models in the loop, you get a giant data moat. I'm very bullish on OpenAI just because they have all of this amazing data for how people want to use language models in the first place.

If you compare the open-source models that are out there now, so OPT from Meta AI and BLOOM from Hugging Face and BigScience, those models are maybe not as great as GPT-3, but they're probably quite close to the original GPT-3. But there's just been a ton of product work going into GPT-3 that makes it better. That data moat is quite valuable. 

That's a general trend in the field where we are starting to see humans and models in loops everywhere for data collection. You mentioned active learning: if you want to do data collection in a smart way, and you already have a decent model, like a language model where you only need a couple of examples and you can already get some signal on it, then you can do data collection with a model in the loop. And if you keep updating the model and making it stronger over time, and we have some nice research results recently showing this, then you get a much steeper data curve, so you get much higher quality data much more quickly. That opens up lots of interesting use cases, especially for startups where you can just prototype super quickly without having to go very deep on the technology stack. 
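A schematic example of that “model in the loop” data collection is uncertainty sampling: score unlabeled examples with the current model and send the ones it is least sure about to annotators. This sketch uses scikit-learn purely for illustration; the function and data names are hypothetical, not from the interview.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def pick_examples_to_label(labeled_texts, labels, unlabeled_texts, k=20):
    """Return the k unlabeled examples the current model is least confident about."""
    vectorizer = TfidfVectorizer().fit(labeled_texts + unlabeled_texts)
    model = LogisticRegression(max_iter=1000)
    model.fit(vectorizer.transform(labeled_texts), labels)

    # Uncertainty = 1 - max class probability; the most uncertain items go to annotators next.
    probs = model.predict_proba(vectorizer.transform(unlabeled_texts))
    uncertainty = 1.0 - probs.max(axis=1)
    return [unlabeled_texts[i] for i in np.argsort(-uncertainty)[:k]]
```

Each round, the newly labeled examples are added back to the labeled pool and the model is retrained, which is what produces the steeper data curve described above.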

Sandhya Hegde: We have a related question I would love to surface. This is from Eleanor, who says fine-tuning LLMs hasn't quite worked in her experience. What are examples where fine-tuning for a specific domain has really improved the quality of the output? Any place where we can go do more research?

Douwe Kiela: I think it depends how you define a large language model. If you include masked language models in that description, so things like BERT and RoBERTa and the basic models that everybody builds on top of in normal NLP, all of that is fine-tuned. And so, basically, any application using any of these models tends to be fine-tuned.

I think it's the other way around where it's a lot less clear if few-shot or zero-shot or in-context learning actually works. What we found is super high variance and maybe doesn't actually get you the performance you need. Especially if you're already sitting on like a million data points — why wouldn't you fine-tune your model? But what this question is trying to get at is sort of the intermediate stage where you have a generic large language model and now you want to fine-tune it, let's say on legal data or science or something like that. And then after that, use it for something. I still think that it’s under development. I have heard of some applications of legal fine-tuned versions of things like BLOOM. Coincidentally, yesterday Galactica came out, which is a very nice example of a very specialized language model that was trained just on scientific data. This was done by Papers With Code with people at Meta AI. And that language model is just insanely good. So, surprisingly, it's not just good at scientific data, per se, but it's also very good at BIG-bench, which is one of the benchmarks that you would use to evaluate these models. And it's actually better than BLOOM and OPT on that benchmark. So I think that's also something that we're going to see more and more: Why don't you train a large language model on PubMed or on some legal corpus, and then you don't have to train it on something generic first and then fine-tune it on something? Maybe you can just train it directly. 
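For readers asking where to start, the standard fine-tuning recipe for the masked-language-model family (BERT, RoBERTa) looks roughly like this. The IMDB dataset, label count, and hyperparameters are placeholders chosen to keep the sketch runnable; in practice you would substitute your own legal or scientific corpus.

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=2)

# Placeholder dataset; swap in your own domain-specific corpus and labels.
dataset = load_dataset("imdb")
tokenized = dataset.map(lambda x: tokenizer(x["text"], truncation=True), batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1,
                           per_device_train_batch_size=8),
    train_dataset=tokenized["train"].shuffle(seed=0).select(range(2000)),
    tokenizer=tokenizer,  # enables dynamic padding via the default collator
)
trainer.train()
```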

Sandhya Hegde: Another question around data sets: If you're building a startup, day zero, you don't have any unique data, you don't have any customers to leverage unique data from, how do you start building something that’s still valuable? How do the best startups in the Hugging Face community solve the chicken and egg problem?

Douwe Kiela: It’s much easier to prototype new ideas using this technology without having to develop a technology stack or gather the data. For me, a nice example is something like Jasper, which is quite famous now in the Valley for raising a ton of money and making more money than OpenAI. They were able to reach their customer base just because this technology was already there. And now they're also sitting on a giant moat in terms of their customer connections, but also the data that they're getting from how their customers want to use this.

They just raised a ton of money and, if I was them, I would probably now train up my own GPT-3 on my own data. Then you're cutting off the middle man and just doing it yourself. It's a nice way to bootstrap using something generic like a language model. Then, once you have enough data, you can start building on top of the language model until you have so much data that you can really build your own thing. 

How Jasper found PMF
Go deeper on building AI startups: Learn how Jasper found product-market fit on the Startup Field Guide podcast

Sandhya Hegde: And you can score better because you have done domain-specific training as opposed to generic training. 

Douwe Kiela: Exactly. That's part of the moat, if you just look at it from a data perspective, but I think user experience is heavily underrated and super important. If you make it very easy for laypeople to use this technology without having to custom design prompts and things like that, then they can just tell you what they want and it always works. That is super valuable. The vertical is where a lot of the money is with this technology. The jury's out on OpenAI's business model of just doing the horizontal foundational layer because that could be a race to the bottom.

Sandhya Hegde:  A great segue to UX. As investors trying to find the right founders working on this, often there are problems where we need to invent a new user experience. Maybe there are some analogies, but there are also spaces where there’s no existing framework to leverage. What are you seeing on that front? What great user experiences are being invented that you think show us a path to the future and what are people asking from the Hugging Face community to enable new user experiences that didn't quite exist before?

Douwe Kiela: What I enjoy seeing is that, on these Hugging Face Spaces, there's a lot of super-rapid prototyping of just cool, quirky ideas. And the cool thing about Spaces is that it builds on things like Gradio, which are open-source libraries for developing demos very quickly. There's just a lot of prototyping happening. 
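For a sense of how low that prototyping barrier is, a Space-style demo can be little more than the following sketch; the text-generation model and interface layout here are arbitrary examples, not from the interview.

```python
import gradio as gr
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")  # illustrative model choice

def complete(prompt):
    return generator(prompt, max_new_tokens=40)[0]["generated_text"]

# A minimal Gradio interface of the kind that backs many Hugging Face Spaces.
gr.Interface(fn=complete, inputs="text", outputs="text",
             title="Prototype demo").launch()
```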

My favorite one now — I don't know if you've heard of r/Place on Reddit. It's basically where you can change a pixel and then people are just working together to try to make their country’s flag or something like that. You can do something like that with stable diffusion. People can paint different parts and then you get these weird images created by people building stuff with diffusion models. That’s just happening organically.

If I was at a startup looking for ideas, I would look very closely at what the Spaces of the week are and what the trending things are there, because it gives you a really good pulse on what people are starting to think about in terms of user experience.

Sandhya Hegde: There's, of course, the UX of humans in the loop. Like, get me started and then you assert control. How bidirectional can it be between human input and model output? Even things like how do we organize information in a world where we can create, organize, share, and collaborate on information at a hundred thousand times the rate at which we were doing it before? There’s a lot of opportunity for problems to be solved in the application layer, no question.

Coming back to the infrastructure layer. You talked about Accelerate and being able to distribute training easily as opposed to what people were doing before. That was hard; you’d have to hire someone who knew how to do distributed training. So what are other things like that which are new? What are things that are still hard for developers to implement, and where is there opportunity for people to actually work on more compelling tools? In terms of the tools for developers, where are there still problems that we should be working on?

Douwe Kiela: I thought you were going in a different direction where I could talk about the work we’re doing on evaluation and trying to be a bit more holistic. We can do that first while I’m thinking about the other question. One of the big problems is around the safety and bias angle. The research field has been super myopic in its focus on just pure accuracy and nothing else. It's just like we have a static test set and we want to be amazing at the static test set. But that gets you models that are super biased in super weird ways. Neural nets are amazing at picking up on biases. We have a very poor understanding of whether a model is actually biased or not, so this accuracy number just doesn't cut it anymore. There's really an evaluation crisis in the field of AI because, if you look at all of these benchmarks, they're basically saturated. We have surpassed human performance on a lot of these benchmarks.

But if you work in the field, or if you ever talk to a language model for more than a minute, you know that the technology is not even close to human levels. We have a ton of work to do, but we don't know how to measure the progress we're making because we've saturated all of the prior benchmarks. We need new benchmarks. I've been doing some work on trying to have continuously updating benchmarks, so things like Dynabench, where we have humans and models in the loop continuously doing this. At Hugging Face, we've also built tools. There's the Evaluate library, which makes it very easy for people to also do fairness and bias measurements, and to look at efficiency and look at confidence intervals of models to really get a proper sense of how good the model really is.
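The basic shape of the Evaluate library is loading a metric and computing it over predictions; the numbers below are toy placeholders, and the same pattern extends to the fairness and bias measurements mentioned above.

```python
import evaluate

accuracy = evaluate.load("accuracy")
f1 = evaluate.load("f1")

# Toy predictions and references, purely for illustration.
predictions = [0, 1, 1, 0, 1]
references = [0, 1, 0, 0, 1]

print(accuracy.compute(predictions=predictions, references=references))
print(f1.compute(predictions=predictions, references=references))
```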

We're trying to solve the kind of reproducibility crisis, as well, that exists in the field. If everybody evaluates things in their own way or if you have a company and you are the CEO and your data science team says, “We have this amazing, 10% increase,” you want to be able to validate that without them telling you that they were great. You want to do that independently. So we've been working on this evaluation on the Hub solution where we have models and data sets and metrics, and you basically evaluate any model on the Hub on any dataset using any metric. We really think that this is the way to do proper evaluation, especially when we start caring about the whole picture. Holistic evaluation of models will be like a big thing in the future. So that is something like Accelerate, but in a different space. 

Sandhya Hegde: How do you think about the issue of copyright and of lineage? Because even if something that's been generated is technically unique, it could still have a lot of latent space overlap with something else. Where do you draw the line? Is it 90% overlap? Is it 20%? Are you all thinking about that issue of how to interpret copyright? 

Douwe Kiela: We have some legal experts at Hugging Face who also work with the BigScience initiative on developing new licenses. So there's a responsible AI license and things like that, which are really great for responsible usage of AI. But, going back to the previous question about what are the areas where we can innovate, this is a very obvious one. The technology is here, how do we understand the risks of the technology? How do we mitigate safety issues? How do we speed up the deployment of generative AI models in a way where people actually believe that it's not gonna do anything weird? That’s a huge space for development. 

The copyright question is interesting. I'm not a lawyer and so I don’t think I should give a strong opinion. But from the people I've been talking to, everybody seems to think that this will all just be fair use. And it is very hard to argue otherwise. There are maybe some issues downstream, like licensing issues. We've all seen the lawsuit against OpenAI and GitHub-style things. For general text data, code, and basically anything that you can scrape off of the internet, it's very hard to argue a non-fair-use case because the output is sufficiently derivative. So I wouldn't be worried about that too much. 

The only case where I'm not really sure is for videos, because if you look at what videos exist on the internet, they tend to have a very skewed power law distribution where 99% of all videos are on YouTube. And then you have Vimeo, and then a bunch of porn websites, and then there's a very, very small percentage of videos that are just out there on the internet. YouTube has heavy restrictions in its terms of service. You can't just go and scrape YouTube, train a model on that, and then go and make money off of that model without, at some point, YouTube coming after you.

So there I'm a bit less clear, but for other use cases, I think it should be fine. Maybe you've seen DeviantArt — they recently released their own diffusion model trained on their own user base's data. There was a lot of consternation about this. I think that is an interesting model, and we're going to start seeing this more and more. The only real way around this is to be very explicit in attribution. So, if you are generating something on DeviantArt in the style of User X, then User X should probably get some royalties or something for whatever it generates. As a field, we're trying to figure that out now. 

Sandhya Hegde: And just from the mathematical perspective, what capability does the model have to say, “OK, this is the lineage of this output. Here’s how much credit A, B, and C should get, whether or not it was fair use, and what royalties are involved.” Are the models already capable of laying out the attribution themselves, or is that also something that needs to be worked on in terms of what the benchmark is? 

Douwe Kiela: There aren’t any direct benchmarks on that, so it would be interesting to develop that, but there are ways of doing this already. When I was still at Facebook, we developed this system called retrieval-augmented generation, and one of the big selling points of these so-called semiparametric approaches, where you have a big index of data and then a reader model on top of that index that makes decisions, is that you can say, “I'm seeing this word now because I found this example here.” And, for this example, maybe you know who owns it. So you could think of it like a giant index of images from DeviantArt where you know which user owns which image. Then you can have your model try to retrieve from the index which images it wants to take inspiration from. And that's how you divide the royalties accordingly. 
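A toy sketch of that attribution idea: keep an index whose entries have known owners, retrieve the nearest items for a given generation, and split credit by retrieval weight. The embeddings and owner names here are invented for illustration and are not part of any retrieval-augmented generation implementation described in the interview.

```python
import numpy as np

# Hypothetical index: owner -> embedding of their contributed item.
index = {
    "user_a": np.array([0.9, 0.1, 0.0]),
    "user_b": np.array([0.2, 0.8, 0.1]),
    "user_c": np.array([0.1, 0.2, 0.9]),
}

def royalty_shares(query_embedding, top_k=2):
    """Split attribution across the owners of the top-k retrieved items."""
    sims = {
        owner: float(np.dot(query_embedding, vec)
                     / (np.linalg.norm(query_embedding) * np.linalg.norm(vec)))
        for owner, vec in index.items()
    }
    top = sorted(sims.items(), key=lambda kv: -kv[1])[:top_k]
    total = sum(score for _, score in top)
    return {owner: score / total for owner, score in top}

print(royalty_shares(np.array([0.8, 0.3, 0.1])))  # credit mostly to user_a
```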

Sandhya Hegde: What are some limitations and current benchmarks around cost, performance, etc? What does the status quo look like versus what you predict the direction we are going in? Where can we be in two years in terms of new breakthroughs, as well as just overall cost and performance? For example, right now, in terms of end user experience, you still have significant waiting time for output, significant cost per inference call. I would love to hear some more predictions on your end of what that could look like. Is there a Moore’s Law for AI or is it still hard to predict?

Douwe Kiela: The Moore's Law question is a separate one. I do think there's a Moore's Law in terms of scaling compute and data. I think what we're going to continue seeing is people scaling more and more with compute and data. The standard players in this, like Google and OpenAI, are the places where you're going to see this stuff coming from. One of the interesting topics now is what emerges at what sort of level of scale, so you can really clearly see abilities emerging in these models when they are trained at a certain scale. If they're smaller, then they just never get there. They never actually get these emergent abilities. So the real question is, what will emerge when we go 100x further in terms of compute? It's still an open question, and what we're finding out as a community is how important data is in this equation. So it's not just about the compute that you throw at it or the size of your model or the number of parameters. It's also about the quality of the data that goes into that model. The higher the quality of that data is, probably the steeper your scaling law can be. That's a very clear sort of alpha you can have over your competitors: if you have a steeper scaling law, then you can get there faster, or you can train for the same amount and end up with a strictly better model. 

But in terms of the current limitations, again, I would try to split that by the modalities that we're talking about. So if you're interested in text-to-image technologies, it's still very hard to generate hands. So very obvious things, like just making sure you have the right number of fingers, are very hard for a diffusion model. The same with faces and making them actually look realistic; text-to-image generation is not completely solved. This will require model breakthroughs. It’s not very obvious how to do that. One possible explanation or one possible direction is to train on even more images and for longer with more compute and a bigger model. And then the ability to count fingers might emerge. But maybe we need real algorithmic breakthroughs to get that to work.

The next modality that we're all going to focus on is video. Everybody knows this is coming, but there you have a new set of problems with how you move the frames without jitter and things like that. If we can do it for videos, we can probably also do it for games and other sorts of more interactive environments or the metaverse. It could be fun. For pure large language models, the real issue still is the controllability. How can we actually get it to do something really useful without a human in the loop? The jury's still out on how to do that. Maybe we'll see diffusion models for NLP at some point, because the nice thing about diffusion is that, since it's non-autoregressive, you can control it much better. One of the problems with autoregressive models is that they're very hard to control because they also need to learn mundane things, like predicting that the next word ends the sequence. If you can abstract that away, you have much more control over what the model does.

You mentioned latency. Latency is just one of the aspects of the user experience and just user-friendliness for this technology, and is also going to be an interesting research area. What is the right way to do this? Because people are very weird, especially when it comes to language. We anthropomorphize anything. You can name your dog, but some people name anything in their house. We anthropomorphize or we ascribe intentionality, in the philosophical way of putting it, to any kind of object. When the object produces language, which is the quintessential human property, then we are very strongly inclined to ascribe intentionality.

That comes with very important implications for how humans interact with AI systems. And so we will assume that the AI systems are like humans because that is what we do with other humans. If they're not, then we're gonna get caught out in very weird ways. The limitation is still that AI systems are not really like humans yet. The more we can make them like humans, the better things will be in our interactions with those systems. 

Sandhya Hegde: There's a specific question that you addressed, but I want to bring up since it's related, which is are we going to get bigger and bigger models? You said that there might be some problems that are solved that way. 

Douwe Kiela: Are they going to get bigger and bigger? Yes, absolutely. It is just very obvious that this will keep getting us gains and it's not really plateauing off yet. It would be stupid to not keep going, especially for the people who have the compute to try. That's definitely going to happen. But that's kind of a question of training, and that's a one-off cost where you invest very heavily in this model. 

But now the monetization question is: How do you do efficient enough inference with this model so that it's fast enough for people to use it in interesting use cases and it's cheap enough for you to do it? So OpenAI’s APIs are amazing, but they're also super expensive. That's probably because they have some subsidized compute and things like that. But if they didn't have that, then I don't think they would be very profitable. There's a huge topic there of model efficiency and there's also a lot of research happening around model distillation, model quantization, model compression, where we can try to move away from GPU-based, batched inference towards CPU-based, which is much more efficient. Or maybe even having dedicated chips. If you look at the big players, including things like TikTok but definitely Meta also, they're just building very custom AI accelerators for recommendation systems and things like that. Real, custom hardware for one specific use case. 
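One of the simplest compression techniques in that family is post-training dynamic quantization, which swaps float32 linear layers for int8 ones to make CPU inference cheaper. A minimal PyTorch sketch, with a toy model standing in for a real one:

```python
import torch
from torch import nn

# Toy stand-in for a real model; the same call works on transformer linear layers.
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10)).eval()

# Replace Linear layers with dynamically quantized int8 versions.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(1, 128)
print(quantized(x).shape)  # same interface, smaller and typically faster on CPU
```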

Sandhya Hegde: This question comes up a lot in the investor community. A lot of founders say this is the first question they get from investors when they say they are going to be using this open-source model or that one. From your side of the table, as you're trying to democratize access to this, how do you think about competitive moats for the developers who are building applied AI?

Douwe Kiela: I have the same question, actually. It's a super interesting question, so you can tackle it from many different angles. One is the open-source angle. There are very good examples of very strong moats around open-source technology. If you think of the Linux kernel and the amount of value it has created in an economic sense, it's amazing. All of these companies have different kinds of moats, so they're not pure technology moats; they're also network effects, data moats, consumer moats, usability moats, branding moats. I don't know if that's still the correct usage of the word moats, but it doesn't have to just be about the technology. 

A lot of startups are making mistakes where they think that they need to be an AI company and they need to develop new algorithms or new models and do fancy new stuff. There are only a few people who are really good at doing this fancy new stuff, and only very few places; the rest of it is just much more boring product work. I would encourage startup founders to think much more seriously about the product rather than the technology underlying it. And how can you build a moat around the product, and not just, okay, I have this AI thing and I have some data and some model stuff? 

Sandhya Hegde: What does it mean to have a good domain-specialized model? If someone is trying to build a domain-specialized model, what do you see as an effective approach? How much should they lean on bootstrapping based on an existing monolithic model, and where do they start branching off? 

Douwe Kiela: That’s an interesting, open question and we will soon get more answers when GPT-4 comes out. So if GPT-4, which is rumored to be very multimodal, is going to be so amazing that everybody wants to only use that as their monolithic model for all domains to build on, then that might change the way people think about this. Then maybe it's not worth specializing too much, and you just wait for the next GPT coming up. The size of the step change there is going to be a very useful signal for all of us working in industry. 

But, as it stands at the moment, there's a ton of value in doing domain specialization. You can build on top of monolithic models, especially when you're bootstrapping. Then, at some point, with the commoditization of this technology that is currently happening, it might be very easy for you to just swap out the API call to OpenAI to an API call to Cohere or Hugging Face or wherever. So you won't have to rely too much on the underlying technology. You can just have something on top in smart ways. But then we're going back to this ultimate question that we've kept talking about here, which is, how do you really control these models? If you can control the model in an interesting domain specific way, maybe that's where the real value add is. But verticalization is where the real money is going to be in the long term. So OpenAI and their strategic partnership with Microsoft — like, Microsoft is one of the best vertical players in the world. So maybe at some point OpenAI will start selling this stuff themselves directly to customers. 

Sandhya Hegde: Are there benefits around cost and performance when it comes to specialized models? So, for example, if you are using a monolithic model versus a more domain-specialized model, would you anticipate that there will be benefits in cost and performance — like lower latency, better infrastructure cost overall — or not?

Douwe Kiela: Possibly. I think that really depends on the application. We mentioned code models; there, the value-add is so big that it's really worth training a specialized model for it. But you could also use something generic; in the end, code is just like language. So you could have one language model that generates code as well. It's obvious that you want to be as good as possible in that niche. But in some of the more mundane cases, I think it’s a lot less clear. Like creative writing or things like that, maybe you just want to use a generic thing. So it really depends on, in investor terms, what your total addressable market is when you actually specialize.

All posts

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Suspendisse varius enim in eros elementum tristique. Duis cursus, mi quis viverra ornare, eros dolor interdum nulla, ut commodo diam libero vitae erat. Aenean faucibus nibh et justo cursus id rutrum lorem imperdiet. Nunc ut sem vitae risus tristique posuere.

All posts
February 7, 2023
Portfolio
Unusual

How startup founders can shape the future of generative AI

Sandhya Hegde
No items found.
How startup founders can shape the future of generative AI How startup founders can shape the future of generative AI
Editor's note: 

Stanford Professor Douwe Kiela on what the AI developer community is learning and expecting to see in 2023 

We recently hosted Douwe Kiela, (Stanford Professor, formerly Head of Research at Hugging face and AI Research Scientist at Facebook) at an Unusual Ventures community event for founders focused on Artificial Intelligence and building the next wave of SaaS startups (sign up here to get alerts for future events!). 

What was apparent from our chat and the eager audience questions was that a lot is going to change over the next two to five years. While the public conversation today is dominated by OpenAI’s chatGPT product, there will be a great abundance of new open source models and developer tools launched that will open up new use cases and capabilities for founders and AI developers all over the world. As early-stage investors, our focus continues to be on startups solving real problems with products that make modern AI easy and safe to use. 

In this interview, Douwe Kiela speaks on this and a range of other topics in AI, including:

  • The advances in and current focus on text-to-image generation and other generative AI.
  • How Hugging Face users make the most of their features and how the community has expanded to a wide-ranging audience.
  • The philosophical, moral, and logical argument for open-source AI.
  • Why AI code generation is more useful than generic language generation.
  • The importance of making AI UX accessible for the layperson.
  • Safety bias and how to advance holistic evaluations of AI.
  • Fair use and ideas for how AI might begin to attribute its output.
  • Overcoming the current limitations of AI, including shortcomings in image generation and latency.
  • Why startups looking to AI should focus on product rather than underlying technology.

​​

Sandhya Hegde: Tell us a little bit about what your AI journey has been and how you see the evolution of modern AI.

Douwe Kiela: My story is a bit unusual. When I was in high school, I taught myself how to code, which meant that I already thought I knew everything and I didn't have to study computer science. So instead I studied philosophy because I was very interested in the human mind. I stayed interested in AI and how we can implement minds on machines. After philosophy, I started studying logic, more applied mathematics, which rolled into natural language processing, and from there rolled into more traditional machine learning.

This was really the beginning of the mainstream application of AI. My mentor at the time said, “Nobody uses neural nets.” I wanted to use neural nets because this is what the brain does, right? So this was 2011 or ’12 when AlexNet burst onto the scene and neural nets — which are now canonical, almost old-school AI — really changed the way we think about Artificial Intelligence.

Over the past couple of years, it's just been amazing to see the growth. One of the themes now that is very popular is generative AI and going beyond just understanding language and classifying examples, and really trying to generate language for all kinds of very interesting use cases. We’re in really interesting times and we’re only starting to scratch the surface of what we can do with this technology. 

Sandhya Hegde: Can you help our audience make sense of the new terminology for neural net based models? Transformers, diffusion models, foundation models, LLMs — how do you see this space panning out and how do you categorize these different models?

Douwe Kiela: The easiest way for me to think about this stuff is based on the modality that you're applying the model to. So first of all, foundation models and large language models are sort of the same thing, right? It's a bit of a contentious term, actually, “foundation model.” So I'm Stanford affiliated, but some people who are not at Stanford don't like the term. It’s still to be designed in what we're going to call these things in the long term. They're the same thing, but maybe “foundation model” is slightly more general in that it could also encompass diffusion models. We would have to ask Percy Liang, who came up with this.

The way we tend to generate language is auto-regressively, which means that we do it word by word. And so we have some prefix and then we predict the next word and then, from there, we keep predicting. The recent revolution where, where suddenly AI has gone even more mainstream, has been text-to-image applications which rely on diffusion models. Diffusion models do have some transformer components. So the way you encode the prompts or the text input, that's often a transformer model. But then, from there, this process of de-noising, or starting from noise and turning it into an image, that process is non-autoregressive, so you generate the image in one go. So it's really a different paradigm, conceptually, from a science perspective.

Sandhya Hegde: Can you share a little bit more about what Hugging Face is particularly focused on and how the company is organized?

Douwe Kiela: What we're focused on is always a tricky question because we're focused on many things. The higher level thing is just that we are trying to grow as large as possible with the community. We're not really focused on monetization yet. It's all just about how we can really make AI pervasive in the world. This is where we're heading anyway, and I think if it continues this way, then Hugging Face will really be a central hub in this whole development. What we're trying to do is just make sure that people can build cool stuff on top of our platform, on top of our libraries.

“GitHub for machine learning” is one of the ways of putting it, but the mission is really about democratizing the state-of-the-art machine learning approaches. You mentioned the Transformers library, which is the classical example. But we also have the Datasets library, which is super popular, which basically has any data set that was ever made in there, so it’s super easy for people to just experiment with things. We have a new library called Diffusers, which is like Transformers, but for diffusion models. This has things like stable diffusion. Three lines of code and you have a state-of-the-art diffusion model running.

There is a new library called Accelerate for training on multi-GPU, multi-note settings. So this used to be super difficult, where you have to keep all of these nodes in sync, and now it’s just three lines of code. So we're just hoping that we'll benefit from the growth and we're just trying to be a central point in the community.

Sandhya Hegde: What do most people in the Hugging Face community look like today? Do you have mostly developers or data scientists? How do different people leverage it and do you see that changing in the future?

Douwe Kiela: I'm always amazed by the growth that we see in the community. You can just see how popular AI is and how it's really taking over the world. But it's also really cool to see how diverse that community is.

Initially it was a pure research focus. So the original PyTorch BERT implementation that started the whole Transformers library was really focused just on enabling researchers to use a model like BERT. From there, I think because AI became so much more mainstream, it's just a lot of developers now. There are also more and more laypeople using this technology. On Hugging Face, we have a thing called Spaces. You can think of that almost as a YouTube for AI demos in a way. There’s a trending Space of the week. Stable Diffusion, Midjourney, Dreambooth, and all of these very cool, latest and greatest diffusion models are up on Spaces and you can just interact with them directly. So that opens up a whole new user base of end users of AI technology, which is very cool to see.

We also, I think a bit uniquely in the ecosystem, take an ethics-first approach. We have some world-renowned, responsible AI researchers in our team where we're constantly trying to think about how to develop this technology responsibly and how to make sure that the community has the tools and guardrails for doing things the right way. 

Sandhya Hegde: This is a good segue to the question of open source versus not. What's been the internal debate and the infrastructure and model community on this? I see a lot of people saying the only way to do this is open source because you want to democratize it. Other folks say no, offer a managed endpoint because that takes a lot of the burden off of the developers’ and founders’ plate. So I'm curious, what does this debate sound like internally? You have definitely picked a side, but what are the other pros and cons? And how should developers who are trying to test out and pilot some new features with this tech think about it?

Douwe Kiela: I just think open source is very important, especially with this sort of technology because of the risks and opportunities that it has for changing the world. I think it is very unhealthy for a model like GPT-3, or something like that, to be essentially just controlled by a bunch of Silicon Valley tech bros who can determine where this technology goes. I think it's much healthier for the world if this is out there and we can scrutinize the technology together. That's almost a moral, philosophical argument.

Some people say that, if you develop these technologies, then you shouldn't open source them. You should put them behind an API because then you can control them, so that people don't abuse the technology for misinformation or things like that. And I think that's a very convenient argument if you just want to make money from technology. I don't think that they're completely unbiased in their assessment of the risks. Maybe some people actually believe in the whole AGI thing or closing models, but I think some of the top people are almost cynically endorsing that so that they can continue making money from these APIs.

Of course, if you are a developer, it's much easier to just have an API call and talk to a model. But you can also do that with open-source models where you could actually look at the model itself in many cases. We've developed BLOOM, one of the largest large language models out there right now. You can also just look directly into the training data and try to understand why the model is generating something. There is no other model that gives you that capability. In the long term, things like explainable AI, interpretability, and trying to justify why a model does a certain thing are going to become more and more important.

Sandhya Hegde: I want to start surfacing a few audience questions. This one is particularly top of mind: What are some of the best use cases of this that either are doing things that no one imagined possible or are bringing back to life old use cases? For example, one of my favorite ones is the chatbot industry because a few years ago everyone tried to mainstream chatbots and they fell just short enough that it was frustrating rather than useful. But now, maybe, there will be a whole new way, given the advances in LLM. What are some of the use cases that you think will have the most impact? 

Douwe Kiela: That's an interesting question. A very short segue on the topic of chat: The reason Hugging Face is called Hugging Face is because the company started as a chatbot for teenagers. A fun, less serious chatbot that you could just hang out with and enjoy talking to. At some point, they pivoted to follow up on the success they had with the open sourcing of the technology that they were using to build the chatbot. 

I still think chatbots are a little bit too hard for large language models if you want to use them for a goal. The holy grail of chatbots is goal-oriented dialogue where you can try to achieve something, make your user happy, or sell them something. The goal oriented-ness and really trying to control what these large language models are doing is still one of the big open problems for the next two to five years. Beyond that, some of the more obvious things that are going to get disrupted by this new generative AI wave are more in the image and audio domain where, in maybe less than five years, we will have Hollywood-length movies generated by AI systems. Things like ShutterStock and Getty Images are no longer going to exist, and that's really happening already now. Why would you pay for a stock photo if you can generate your own custom one? 

It's a very interesting question to think about — where is this really going to get used? Because right now, if you look at the industry, a lot of the generative applications are always with humans in the loop, where it's sort of just a writing assistant, right? I think that's an interesting use case, but it's probably not a super valuable one because any human who went to school can write. So writers of regular language are relatively cheap. That's why I think code generation is much more valuable because engineers are much more expensive. So if you can make engineers more productive, this will give you a much higher value-add.

A lot of people are just jumping on the generative AI train now, but a lot of the applications of just AI and NOP in general are still just about classification. If I have a platform, I have to decide: How do I rank the content, do I take this down because it’s hate speech? And for that sort of stuff, you still don't want to use generative AI. You still want to use the old-school thing. I think the whole field is maybe jumping a little bit too much on the generative AI bandwagon where it's kind of unclear what the real killer application is going to be; whereas, I think it's much clearer for like co-generation than for image, audio, and video generation.

Sandhya Hegde: One thing you said about goal orientation and the degree of control needed to only go in one direction as opposed to what it seems to be good at doing today, which is: “I don't have a direction. Inspire me, show me a hundred options that vaguely meet some criteria I have.” That's a very, very tangential direction. What does that imply in terms of how we are thinking about fine-tuning tools or active-learning tools?

Douwe Kiela: One of the obvious answers — and you've seen OpenAI develop this — is reinforcement learning from humans. So if you do this over multiple rounds of learning with different models in the loop, you get a giant data moat. I'm very bullish on OpenAI just because they have all of this amazing data for how people want to use language models in the first place.

If you compare the open-source models that are out there now, so OPT from Meta AI and BLOOM from Hugging Face and BigScience, those models are maybe not as great as GPT-3, but they're probably quite close to the original GPT-3. But there's just been a ton of product work going into GPT-3 that makes it better. That data moat is quite valuable. 

That's a general trend in the field where we are starting to see humans and models in loops everywhere for data collection. You mentioned active learning, like if you want to do data collection in a smart way, and you already have a decent model, like a language model where you only need a couple of examples and you can already get some signal on it., then you can do data collection with modeling the loop. And if you keep updating the model and making it stronger over time, and we have some nice research results recently showing this, then you get a much steeper data curve, so you get much higher quality data much more quickly. That opens up lots of interesting use cases, especially for startups where you can just prototype super quickly without having to go very deep on the technology stack. 

Sandhya Hegde: We have a related question I would love to surface. This is from Eleanor, who says fine-tuning LLMs hasn't quite worked in her experience. What are examples where fine-tuning for a specific domain has really improved the quality of the output? Any place where we can go do more research?

Douwe Kiela: I think it depends how you define a large language model. If you include masked language models in that description, so things like BERT and RoBERTa and the basic models that everybody builds on top of in normal NLP, then all of that is fine-tuned. So, basically, any application using any of these models tends to be fine-tuned.

I think it's the other way around: it's a lot less clear whether few-shot or zero-shot or in-context learning actually works. What we found is super high variance, and it maybe doesn't actually get you the performance you need. Especially if you're already sitting on, say, a million data points, why wouldn't you fine-tune your model? But what this question is trying to get at is the intermediate stage where you have a generic large language model and now you want to fine-tune it, let's say on legal data or science or something like that, and then after that use it for something. I still think that's under development. I have heard of some legal fine-tuned versions of things like BLOOM.

Coincidentally, yesterday Galactica came out, which is a very nice example of a very specialized language model trained just on scientific data. This was done by Papers With Code with people at Meta AI, and that language model is just insanely good. Surprisingly, it's not just good at scientific data per se, but it's also very good on BIG-bench, which is one of the benchmarks you would use to evaluate these models, and it's actually better than BLOOM and OPT on that benchmark. So I think that's something we're going to see more and more: Why not train a large language model directly on PubMed or on some legal corpus, rather than training it on something generic first and then fine-tuning it? Maybe you can just train it directly.
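
As a rough sketch of the intermediate, domain-adaptation route described above, here is what continuing masked-language-model training on domain text looks like with the Hugging Face Trainer; the checkpoint, the two "legal" sentences, and the training settings are all illustrative placeholders.

```python
# Continue masked-LM pretraining of a generic checkpoint on domain text,
# before fine-tuning it for a downstream task. Data and settings are toy values.
from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from datasets import Dataset

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base")

# Stand-in for a real legal (or scientific) corpus.
domain_texts = {"text": [
    "The party of the first part shall indemnify the party of the second part.",
    "This clause survives termination of the agreement.",
]}
dataset = Dataset.from_dict(domain_texts).map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True, remove_columns=["text"],
)

collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
args = TrainingArguments(output_dir="legal-roberta",
                         num_train_epochs=1,
                         per_device_train_batch_size=2)

Trainer(model=model, args=args, train_dataset=dataset,
        data_collator=collator).train()
```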

Sandhya Hegde: Another question around datasets: If you're building a startup at day zero, when you don't have any unique data and no customers to leverage unique data from, how do you start building something that's still valuable? How do the best startups in the Hugging Face community solve the chicken-and-egg problem?

Douwe Kiela: It’s much easier to prototype new ideas using this technology without having to develop a technology stack or gather the data. For me, a nice example is something like Jasper, which is quite famous now in the Valley for raising a ton of money and making more money than OpenAI. They were able to reach their customer base just because this technology was already there. And now they're also sitting on a giant moat in terms of their customer connections, but also the data that they're getting from how their customers want to use this.

They just raised a ton of money and, if I were them, I would probably now train up my own GPT-3 on my own data. Then you're cutting out the middleman and just doing it yourself. It's a nice way to bootstrap using something generic like a language model. Then, once you have enough data, you can start building on top of the language model, until you have so much data that you can really build your own thing.

Go deeper on building AI startups: Learn how Jasper found product-market fit on the Startup Field Guide podcast

Sandhya Hegde: And you can score better because you have done domain-specific training as opposed to generic training.

Douwe Kiela: Exactly. That's part of the moat if you just look at it from a data perspective. But I think user experience is heavily underrated and super important. If you make it very easy for laypeople to use this technology without having to custom-design prompts and things like that, then they can just tell you what they want and it always works. That is super valuable. The vertical is where a lot of the money is with this technology. The jury's out on OpenAI's business model of just doing the horizontal foundational layer, because that could be a race to the bottom.

Sandhya Hegde:  A great segue to UX. As investors trying to find the right founders working on this, often there are problems where we need to invent a new user experience. Maybe there are some analogies, but there are also spaces where there’s no existing framework to leverage. What are you seeing on that front? What great user experiences are being invented that you think show us a path to the future and what are people asking from the Hugging Face community to enable new user experiences that didn't quite exist before?

Douwe Kiela: What I enjoy seeing is that, on these Hugging Face Spaces, there's a lot of super-rapid prototyping of just cool, quirky ideas. And the cool thing about Spaces is that it builds on things like Gradio, which are open-source libraries for developing demos very quickly. There's just a lot of prototyping happening. 
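
For context, here is a minimal sketch of the kind of Gradio app that powers many Spaces; the function just echoes its input and stands in for a real model call.

```python
# Gradio turns a plain Python function into a shareable web demo -- the same
# library many Hugging Face Spaces are built on. The function is a stand-in.
import gradio as gr

def describe_image_prompt(prompt: str) -> str:
    # In a real Space this would call a diffusion or language model;
    # here we just echo the prompt so the sketch runs anywhere.
    return f"Imagine an image of: {prompt}"

demo = gr.Interface(fn=describe_image_prompt, inputs="text", outputs="text",
                    title="Prototype demo")
demo.launch()  # Spaces runs this automatically when the file is pushed.
```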

My favorite one now — I don't know if you've heard of r/Place on Reddit. It's basically a canvas where you can change one pixel at a time, and people work together to try to make their country's flag or something like that. You can do something like that with Stable Diffusion: people paint different parts, and then you get these weird images created by people building things together with diffusion models. That's just happening organically.

If I were at a startup looking for ideas, I would look very closely at what the Spaces of the week are and what the trending things are there, because it gives you a really good pulse on what people are starting to think about in terms of user experience.

Sandhya Hegde: There's, of course, the UX of humans in the loop. Like, get me started and then you assert control. How bidirectional can it be between human input and model output? Even things like how do we organize information in a world where we can create, organize, share, and collaborate on information at a hundred thousand times the rate at which we were doing it before? There’s a lot of opportunity for problems to be solved in the application layer, no question.

Coming back to the infrastructure layer: you talked about Accelerate and being able to distribute training easily, as opposed to what people were doing before. That was hard; you'd have to hire someone who knew how to do distributed training. So what are other things like that which are new? What is still hard for developers to implement, and where is there opportunity for people to work on more compelling tools?

Douwe Kiela: I thought you were going in a different direction, where I could talk about the work we're doing on evaluation and trying to be a bit more holistic. We can do that first while I'm thinking about the other question. One of the big problems is around the safety and bias angle. The research field has been super myopic in its focus on pure accuracy and nothing else: we have a static test set and we want to be amazing at the static test set. But that gets you models that are super biased in super weird ways. Neural nets are amazing at picking up on biases. We have a very poor understanding of whether a model is actually biased or not, so this accuracy number just doesn't cut it anymore. There's really an evaluation crisis in the field of AI because, if you look at all of these benchmarks, they're basically saturated. We have surpassed human performance on a lot of them.

But if you work in the field, or if you ever talk to a language model for more than a minute, you know that the technology is not even close to human level. We have a ton of work to do, but we don't know how to measure the progress we're making because we've saturated all of the prior benchmarks. We need new ones. I've been doing some work on continuously updating benchmarks, things like Dynabench, where we have humans and models in the loop continuously doing this. At Hugging Face, we've also built tools. There's the Evaluate library, which makes it very easy for people to do fairness and bias measurements as well, and to look at efficiency and at confidence intervals of models, to really get a proper sense of how good a model is.

We're also trying to solve the reproducibility crisis that exists in the field. If everybody evaluates things in their own way, or if you are the CEO of a company and your data science team says, “We have this amazing 10% increase,” you want to be able to validate that independently, not just take their word for it. So we've been working on an evaluation-on-the-Hub solution, where we have models, datasets, and metrics, and you can evaluate any model on the Hub on any dataset using any metric. We really think this is the way to do proper evaluation, especially when we start caring about the whole picture. Holistic evaluation of models will be a big thing in the future. So that is something like Accelerate, but in a different space.
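
A small sketch of what using the Evaluate library looks like; the predictions and references here are toy data, and a real holistic evaluation would layer fairness, bias, and efficiency measurements on top of simple metrics like these.

```python
# Compute a couple of standard metrics with Hugging Face's Evaluate library.
import evaluate

accuracy = evaluate.load("accuracy")
f1 = evaluate.load("f1")

references = [0, 1, 1, 0, 1]   # toy ground-truth labels
predictions = [0, 1, 0, 0, 1]  # toy model outputs

print(accuracy.compute(predictions=predictions, references=references))
print(f1.compute(predictions=predictions, references=references))
# The same library exposes bias and fairness measurements, which is what
# "holistic" evaluation starts to look like in practice.
```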

Sandhya Hegde: How do you think about the issue of copyright and of lineage? Because even if something that's been generated is technically unique, it could still have a lot of latent space overlap with something else. Where do you draw the line? Is it 90% overlap? Is it 20%? Are you all thinking about that issue of how to interpret copyright? 

Douwe Kiela: We have some legal experts at Hugging Face who also work with the BigScience initiative on developing new licenses. So there's a responsible AI license and things like that, which are really great for responsible usage of AI. But, going back to the previous question about what are the areas where we can innovate, this is a very obvious one. The technology is here, how do we understand the risks of the technology? How do we mitigate safety issues? How do we speed up the deployment of generative AI models in a way where people actually believe that it's not gonna do anything weird? That’s a huge space for development. 

The copyright question is interesting. I'm not a lawyer, so I don't think I should give a strong opinion. But from the people I've been talking to, everybody seems to think this will all just be fair use, and it is very hard to argue otherwise. There may be some issues downstream, like licensing issues; we've all seen the lawsuit against OpenAI and GitHub over Copilot. But for general text data, code, and basically anything you can scrape off the internet, it's very hard to argue a non-fair-use case, and the output is sufficiently derivative. So I wouldn't be too worried about that.

The only case where I'm not really sure is video, because if you look at what videos exist on the internet, they have a very skewed power-law distribution where 99% of all videos are on YouTube. Then you have Vimeo, then a bunch of porn websites, and then a very, very small percentage of videos that are just out there on the internet. YouTube has heavy restrictions in its terms of service. You can't just go and scrape YouTube, train a model on that, and then make money off of that model without, at some point, YouTube coming after you.

So there I'm a bit less clear, but for other use cases, I think it should be fine. Maybe you've seen DeviantArt: they recently released their own diffusion model trained on their own users' data, and there was a lot of consternation about it. I think that is an interesting model, and we're going to start seeing it more and more. The only real way around the concern is to be very explicit about attribution. If you are generating something on DeviantArt in the style of User X, then User X should probably get some royalties or something for whatever it generates. As a field, we're trying to figure that out now.

Sandhya Hegde: And from the mathematical perspective, what capability does a model have to say, “OK, this is the lineage of this output. Here's how much credit A, B, and C should get, whether or not it was fair use, and what royalties are involved”? Are models already capable of laying out that attribution themselves, or is that also something that needs to be worked on, including what the benchmark should even be?

Douwe Kiela: There aren't any direct benchmarks for that, so it would be interesting to develop them, but there are ways of doing this already. When I was still at Facebook, we developed a system called retrieval-augmented generation. One of the big selling points of these so-called semi-parametric approaches, where you have a big index of data and a reader model on top of that index making decisions, is that the model can say, “I'm seeing this word now because I found this example here.” And for that example, maybe you know who owns it. So you could imagine a giant index of images from DeviantArt where you know which user owns which image. Then you can have your model retrieve from the index the images it wants to take inspiration from, and that's how you divide the royalties accordingly.
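
As a toy illustration of that retrieval-based attribution idea (not the actual retrieval-augmented generation implementation), here is a sketch that retrieves the most similar items from an index with known owners and splits attribution weights by similarity; the index entries and the TF-IDF similarity are stand-ins for a real embedding index.

```python
# Retrieve nearest items from an owned index and split "royalty weights"
# proportionally to similarity. Data and similarity measure are illustrative.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

index = [
    ("user_alice", "watercolor painting of a fox in a misty forest"),
    ("user_bob",   "pixel art spaceship over a neon city"),
    ("user_carol", "oil portrait of a fox wearing a crown"),
]
owners, descriptions = zip(*index)

query = "a regal fox in a foggy woodland, watercolor style"

vectorizer = TfidfVectorizer().fit(list(descriptions) + [query])
sims = cosine_similarity(vectorizer.transform([query]),
                         vectorizer.transform(descriptions))[0]

weights = sims / sims.sum()  # normalize similarities into attribution shares
for owner, weight in sorted(zip(owners, weights), key=lambda p: -p[1]):
    print(f"{owner}: {weight:.0%}")
```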

Sandhya Hegde: What are some current limitations and benchmarks around cost, performance, and so on? What does the status quo look like versus where you predict we're going? Where could we be in two years in terms of new breakthroughs, as well as overall cost and performance? For example, right now, in terms of end-user experience, you still have significant waiting time for output and significant cost per inference call. I would love to hear some predictions on your end of what that could look like. Is there a Moore's Law for AI, or is it still hard to predict?

Douwe Kiela: The Moore's Law question is a separate one. I do think there's a Moore's Law in terms of scaling compute and data; what we're going to continue seeing is people scaling more and more with compute and data. The standard players in this, like Google and OpenAI, are the places where you're going to see this stuff coming from. One of the interesting topics now is what emerges at what level of scale. You can really clearly see abilities emerging in these models when they are trained at a certain scale; if they're smaller, they just never get there. They never actually get these emergent abilities. So the real question is: What will emerge when we go 100x further in terms of compute? It's still an open question, and what we're finding out as a community is how important data is in this equation. It's not just about the compute you throw at it or the size of your model or the number of parameters. It's also about the quality of the data that goes into the model. The higher the quality of that data, the steeper your scaling law can probably be. That's a very clear sort of alpha you can have over your competitors: if you have a steeper scaling law, you can get there faster, or you can train for the same amount and end up with a strictly better model.
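
A minimal sketch of what fitting a scaling law looks like in practice: fit a power law, loss = a * C^(-b), to (compute, loss) points in log-log space. The numbers below are made up, not real training runs; a steeper fitted exponent b is the kind of advantage described above.

```python
# Fit log(loss) = log(a) - b * log(compute) with a straight line in log-log space.
import numpy as np

compute = np.array([1e18, 1e19, 1e20, 1e21, 1e22])   # training FLOPs (made up)
loss = np.array([3.9, 3.4, 3.0, 2.7, 2.45])          # eval loss (made up)

slope, intercept = np.polyfit(np.log(compute), np.log(loss), 1)
b, a = -slope, np.exp(intercept)
print(f"loss ~ {a:.2f} * C^(-{b:.3f})")
# Higher-quality data would show up as a larger b: the same compute buys
# more loss reduction, i.e., a steeper scaling law.
```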

But in terms of current limitations, again, I would split that by the modalities we're talking about. If you're interested in text-to-image technologies, it's still very hard to generate hands. Very obvious things, like just making sure you have the right number of fingers, are very hard for a diffusion model. The same goes for faces and making them actually look realistic. Text-to-image generation is not completely solved, and this will require model breakthroughs; it's not very obvious how to do that. One possible direction is to train on even more images, for longer, with more compute and a bigger model, and then the ability to count fingers might emerge. But maybe we need real algorithmic breakthroughs to get that to work.

The next modality that we're all going to focus on is video. Everybody knows this is coming, but there you have a new set of problems, like how you move between frames without jitter and things like that. If we can do it for videos, we can probably also do it for games and other, more interactive environments, or the metaverse. It could be fun. For pure large language models, the real issue is still controllability: How can we actually get them to do something really useful without a human in the loop? The jury's still out on how to do that. Maybe we'll see diffusion models for NLP at some point, because the nice thing about diffusion, the fact that it's non-autoregressive, means that you can control it much better. One of the problems with autoregressive models is that they're very hard to control, because they also have to learn mundane things like predicting the next word and when the sequence is done. If you can abstract that away, you have much more control over what the model does.

You mentioned latency. Latency is just one aspect of user experience and user-friendliness for this technology, and that is also going to be an interesting research area. What is the right way to do this? Because people are very weird, especially when it comes to language. We anthropomorphize anything. You can name your dog, but some people name anything in their house. We anthropomorphize, or we ascribe intentionality, to put it the philosophical way, to any kind of object. When the object produces language, which is the quintessential human property, we are very strongly inclined to ascribe intentionality.

That comes with very important implications for how humans interact with AI systems. And so we will assume that the AI systems are like humans because that is what we do with other humans. If they're not, then we're gonna get caught out in very weird ways. The limitation is still that AI systems are not really like humans yet. The more we can make them like humans, the better things will be in our interactions with those systems. 

Sandhya Hegde: There's a specific question that you addressed, but I want to bring it up since it's related, which is: Are we going to get bigger and bigger models? You said that there might be some problems that are solved that way.

Douwe Kiela: Are they going to get bigger and bigger? Yes, absolutely. It is just very obvious that this will keep getting us gains and it's not really plateauing off yet. It would be stupid to not keep going, especially for the people who have the compute to try. That's definitely going to happen. But that's kind of a question of training, and that's a one-off cost where you invest very heavily in this model. 

But now the monetization question is: How do you do efficient enough inference with this model so that it's fast enough for people to use it in interesting use cases and it's cheap enough for you to do it? So OpenAI’s APIs are amazing, but they're also super expensive. That's probably because they have some subsidized compute and things like that. But if they didn't have that, then I don't think they would be very profitable. There's a huge topic there of model efficiency and there's also a lot of research happening around model distillation, model quantization, model compression, where we can try to move away from GPU-based, batched inference towards CPU-based, which is much more efficient. Or maybe even having dedicated chips. If you look at the big players, including things like TikTok but definitely Meta also, they're just building very custom AI accelerators for recommendation systems and things like that. Real, custom hardware for one specific use case. 
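
As one concrete example of the compression techniques mentioned here, a minimal sketch of post-training dynamic quantization in PyTorch; the tiny model is a stand-in for a real transformer, and the same call targets the Linear layers inside larger models.

```python
# Post-training dynamic quantization: store Linear weights in int8 for cheaper
# CPU inference while keeping the same model interface.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(768, 3072), nn.ReLU(), nn.Linear(3072, 768))

quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 768)
print(quantized(x).shape)  # same interface as the original float model
```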

Sandhya Hegde: This question comes up a lot in the investor community. A lot of founders say this is the first question they get from investors when they say they are going to be using this open-source model or that one. From your side of the table, as you're trying to democratize access to this, how do you think about competitive moats for the developers who are building applied AI?

Douwe Kiela: I have the same question, actually. It's a super interesting one, and you can tackle it from many different angles. One is the open-source angle. There are very good examples of very strong moats around open-source technology. If you think of the Linux kernel and the amount of value it has created in an economic sense, it's amazing. All of these companies have different kinds of moats; they're not pure technology moats, but also network effects, data moats, consumer moats, usability moats, branding moats. I don't know if that's still the correct usage of the word “moat,” but it doesn't have to just be about the technology.

A lot of startups make the mistake of thinking that they need to be an AI company, that they need to develop new algorithms or new models and do fancy new stuff. There are only a few people who are really good at that fancy new stuff, and only a few places where it happens; the rest is much more boring product work. I would encourage startup founders to think much more seriously about the product rather than the technology underlying it, and about how to build a moat around the product, not just “I have this AI thing, and I have some data and some model stuff.”

Sandhya Hegde: What does it mean to have a good domain-specialized model? If someone is trying to build a domain-specialized model, what do you see as an effective approach? How much should they lean on bootstrapping with an existing monolithic model, and where do they start branching off?

Douwe Kiela: That’s an interesting, open question and we will soon get more answers when GPT-4 comes out. So if GPT-4, which is rumored to be very multimodal, is going to be so amazing that everybody wants to only use that as their monolithic model for all domains to build on, then that might change the way people think about this. Then maybe it's not worth specializing too much, and you just wait for the next GPT coming up. The size of the step change there is going to be a very useful signal for all of us working in industry. 

But, as it stands at the moment, there's a ton of value in doing domain specialization. You can build on top of monolithic models, especially when you're bootstrapping. Then, at some point, with the commoditization of this technology that is currently happening, it might be very easy for you to swap out an API call to OpenAI for an API call to Cohere or Hugging Face or wherever. So you won't have to rely too much on the underlying technology; you can just build something smart on top. But then we're back to the ultimate question we've kept coming back to: How do you really control these models? If you can control the model in an interesting, domain-specific way, maybe that's where the real value-add is. But verticalization is where the real money is going to be in the long term. Look at OpenAI and their strategic partnership with Microsoft: Microsoft is one of the best vertical players in the world. So maybe at some point OpenAI will start selling this stuff themselves, directly to customers.
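
A sketch of what that swap-out looks like in code: hide the provider behind a small interface so the application layer doesn't care whose model answers. The provider classes below are illustrative stubs, not real client code.

```python
# Provider-agnostic text generation: application code depends only on the
# interface, so the underlying model API can be swapped without rewrites.
from typing import Protocol

class TextGenerator(Protocol):
    def generate(self, prompt: str) -> str: ...

class OpenAIGenerator:
    def generate(self, prompt: str) -> str:
        # Real code would call OpenAI's completion endpoint here.
        return f"[openai completion for: {prompt}]"

class OpenSourceGenerator:
    def generate(self, prompt: str) -> str:
        # Real code would call a hosted or self-hosted open-source model here.
        return f"[open-source completion for: {prompt}]"

def draft_product_description(generator: TextGenerator, product: str) -> str:
    # Application logic stays identical regardless of the provider.
    return generator.generate(f"Write a two-sentence description of {product}.")

print(draft_product_description(OpenAIGenerator(), "a solar-powered lamp"))
print(draft_product_description(OpenSourceGenerator(), "a solar-powered lamp"))
```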

Sandhya Hegde: Are there benefits around cost and performance when it comes to specialized models? For example, if you are using a monolithic model versus a more domain-specialized model, would you anticipate benefits in cost and performance, like lower latency or better infrastructure cost overall, or not?

Douwe Kiela: Possibly. I think that really depends on the application. We mentioned code models; there, the value-add is so big that it's really worth training a specialized model. But you could also use something generic; in the end, code is just language, so you could have one language model that generates code as well. It's obvious that you want to be as good as possible in that niche. But in some of the more mundane cases, like creative writing, I think it's a lot less clear; maybe you just want to use a generic thing. So it really depends on, in investor terms, what your total addressable market is when you specialize.
