Synthesize 2023: Large Language Models and new opportunities in Generative AI
Video description
A panel discussion on the emerging opportunities with LLMs and GenAI
Transcription
John Myers (01:18):
Hello everyone. My name is John Myers. I am one of the co-founders and the CTO here at Gretel.ai. I am pleased to present a panel on large language models and their opportunities in generative AI. I'm joined by two amazing panelists from the industry: Danny Lange, Senior VP of Artificial Intelligence at Unity Technologies, and Jonathan Cohen, VP of Applied Research at NVIDIA. I'm really excited to talk with you both and pick your brains about LLMs today. I think we'll just get started with introductions, starting with Danny and then Jonathan. I'd love a quick introduction about yourselves: where you're working, what you're working on, and specifically, if you can, a little about the type of work you're doing with large language models. Then we'll dive in after that.
Danny Lange (02:18):
Yeah, thank you for having us. So I'm Danny, I'm running a large part of AI at Unity. We are a gaming platform company, very popular; I think three quarters of all mobile games are developed on our platform. I've been in this space for a very long time. I've been six years at Unity. Before that, I was head of AI at Uber, and prior to that GM for machine learning at Amazon, where I launched the first machine learning service in AWS. So I've been at this for a very long time, and the reason I bring it up is that I really think large language models have accelerated the whole field over the last few years. I think 2022 was a big breakthrough year for large language models and therefore also for the whole concept of AI.
(03:17)
I still, after all these years, struggle to say AI because it's such a broad concept. But I do think that large language models are giving us a hint of what's going to happen in the future. And I'm sure we'll get back to dive into this over the next half hour. But at Unity, we are looking at large language models as a very inspiring, new, and fascinating concept to add to video games. And you can imagine that you will have much more interesting interactions with your game when NPCs suddenly appear a lot smarter than they did yesterday.
Jonathan Cohen (04:04):
I'm Jonathan Cohen. I've been at NVIDIA since 2008, with a couple of years at Apple in between before I came back to NVIDIA. I've done lots of things in my career. My role right now is overseeing a couple of R&D teams, including the effort to develop the technology, and then productize a platform, based around large language models. As for NVIDIA's interest: obviously, so much of this work is done on our hardware platform and our software stack, NVIDIA AI, for training models, with PyTorch and TensorFlow and cuDNN to accelerate the training. But if you think about what we do as a company, we build platforms for accelerated computing. So we are interested in what people are doing with their computers. What's driving major workloads across all the places computers are running, from embedded devices to data centers? What are the key computations, and how do we make them run faster?
(05:12)
And it seems clear to me that the rise of systems that can understand human language is going to be, I believe, the single most important workload for computers in the future. All we do all day long, all we are doing right now, is talk and communicate. People communicate. And so automating the understanding of human communication is going to impact everything; it's going to run everywhere and do everything. It will allow us to automate tasks that could never have been performed before, automate all sorts of actions, and create all sorts of new experiences, from entertainment to enterprise, to empowering workers, empowering people. And so the role of NVIDIA in all this is that we want to figure out what that's going to look like and build the platform to power all of it. My team is laying the groundwork for products that start to address that opportunity.
John Myers (06:15):
Great, thanks so much. That's super insightful to hear. So let's just jump right in. To get started with a common baseline for the audience, how would you describe, at a very high level, what a large language model is? For comparison's sake, are there near-peer models that you would compare or contrast them with? And are there any strong arguments that illustrate what LLMs are good for and what they're not good for?
Jonathan Cohen (06:52):
Maybe I can just offer some perspective, and I'm very curious what Danny thinks. People have started to use the term foundation models for these. And I think a really good working definition of a foundation model, as opposed to all the other neural networks and machine learning models we have, is something that out of the box can kind of do anything. It's like a universal model somehow. It's maybe not the best at everything, and you would always assume some very narrow, tailored model trained to do just one thing is going to be the best at it. But if you have one model that's just generally good at everything, and any task you throw at it, it can do pretty well out of the box, that is a foundation you can then build all sorts of tools and processes and automation on top of.
(07:44)
And I think what's emerged in the last two years is this insight. There are a couple of key people who really had this vision and pushed on it. [inaudible 00:07:58], I think, deserves tremendous credit from OpenAI. The insight was: hey, if you take this transformer architecture and just keep scaling it up and scaling it up, training it on enough data with enough compute and enough neural-net capacity, eventually it might cross over into a model that can just do anything you throw at it. And that turns out to be true. I think we're seeing ChatGPT being the example that really jumped into the public consciousness because it's so easy to use. And I think that's really a quantum difference. In some sense, these algorithms have been around for several years now. But it turns out that at a certain scale, they become models that can just kind of do anything, and that's really quite impressive.
Danny Lange (08:47):
Yeah, I really want to build on that, Jonathan, because that is something very unique about these models. I've spent years and years in the space of reinforcement learning, especially in connection with games, training NPCs to move in games, to move in 3D; we've used a lot of reinforcement learning for that. And one of the things we would consistently run into is that these models over-optimize on what they see and then forget what they previously learned. It's a bit like little children: the neural networks get very quickly reordered, they get very good at the next task, and the previous task is just gone. This is something that has really bothered us for years. And then we see large language models having exactly the capability we have been looking for, which is: I can learn and learn, and I remember it all.
(09:56)
And what that tells me, by the way, and this goes beyond large language models, is that maybe there is something we can learn from these models that we can incorporate back into other models, something that allows us to achieve this foundational quality; maybe language models are showing us the way there. I think that's a very important aspect: these models are foundational, and they can be used across a very diverse domain space. And that's new; we are not used to that with previous models, whether for advertisement targeting or for testing gameplay using reinforcement learning, etc. All those models have had a tendency to be very specialized, and the moment you change the domain even slightly, they just fall apart. We are not seeing that with large language models.
Jonathan Cohen (11:00):
There's a really interesting paper I was just reading by Kenneth Li and some colleagues at Harvard where they take a GPT and train it, fine-tune it, I guess, on lots of examples of the game of Othello being played. And then they kind of open up the model, look at the internal activations, and do a bunch of analysis. And they show pretty convincingly that just from being shown lots of examples of Othello games, the language model is building an internal representation of the board and the board state. It's not just doing some statistical thing; it actually is learning the concepts of the game and what the legal moves are.
(11:42)
I think there are results showing something like over 99% of the moves it makes are legal. So it's learning what a legal move is, how the game works, what the board is, and all this stuff. So there's something about these architectures: they really are able to learn concepts and build internal models of the world, which seems like a thing that wasn't really happening before and is starting to happen now at the scale we're at. I think it totally changes what you can imagine an AI can do now.
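To make the probing idea concrete, here is a minimal sketch of a linear probe that tries to read a board square's state out of a model's hidden activations. The arrays below are synthetic stand-ins rather than data from the paper, and the paper's actual probes differ in detail; on real Othello-GPT activations, accuracy far above chance is the evidence for an internal board representation.

```python
# Minimal sketch of linear probing: can a board square's state be read
# out of a model's hidden activations? Synthetic stand-in data below;
# in the real study the activations come from an Othello-trained GPT.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_positions, hidden_dim = 5000, 512

# Stand-ins: hidden_states would be the model's activations at each move;
# board_labels would be one square's true state (empty/black/white).
hidden_states = rng.normal(size=(n_positions, hidden_dim))
board_labels = rng.integers(0, 3, size=n_positions)

X_train, X_test, y_train, y_test = train_test_split(
    hidden_states, board_labels, test_size=0.2, random_state=0)

probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"probe accuracy: {probe.score(X_test, y_test):.2f}")
# On this random data the probe scores near chance (~0.33). High accuracy
# on real activations is what suggests the network encodes the board.
```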
Danny Lange (12:16):
That sounds like very promising research. I do want to mention that there's a very strong hypothesis that language originated out of spatial problem solving long ago: that humans built up a language capability basically to share information about threats and predators and food opportunities, etc., and that language was used to describe these spatial relationships. That has always given me hope that we can tie the world around us, the 3D world we live in, together with language. And I wouldn't be surprised if, when you really dive in, you find that many language constructs are actually spatial. Even when we discuss something very abstract, we like to express it in spatial terms. Even when we discuss quantum mechanics, which is purely math, we say there are different orbits and this and that. We describe it that way because our language is really good at describing spatial relationships. So there's hope that LLMs can be used outside the strict language space.
John Myers (13:45):
That's super interesting. So Jonathan, you mentioned that in this paper they basically fine-tuned a large language model. What does that actually mean? And to go one level beyond that, where do we think the first set of use cases and products that use large language models at scale will land, for both enterprise and consumer? How is that product development process going to look? Is it taking a model, fine-tuning it on a specific use case or set of refined knowledge, and then packaging that in a product? Or how do you think this is going to unfold?
Jonathan Cohen (14:29):
That's a really good question that I think requires a lot of speculation on my part to answer. But I would say people are already building products. I don't know the count, but easily a hundred startups are building products on top of other companies' large language models, not to mention several startups built around the idea of building their own large language models and offering services on top of them. So this is a very active world, and it's certainly happening. I think the first thing is that for a lot of what we would call NLP, natural language understanding, you can just ask this model to solve your problem. The simplest example is sentiment analysis. Rather than training a sentiment classifier, I can take a movie review and just ask my language model, "Hey, is this movie review positive or negative?" And it'll tell me. So I think the easiest use case for all this is a platform for task-specific natural language understanding, and there are a lot of companies already doing that.
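As a minimal sketch of that pattern, here is zero-shot sentiment classification by prompting, using the OpenAI Python client as one concrete option; the model name and prompt wording are illustrative, and any chat-style LLM endpoint would work the same way.

```python
# Sketch: zero-shot sentiment analysis by prompting a large language model
# instead of training a dedicated classifier.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def review_sentiment(review: str) -> str:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # illustrative model choice
        messages=[
            {"role": "system",
             "content": "Answer with exactly one word: positive or negative."},
            {"role": "user",
             "content": f"Is this movie review positive or negative?\n\n{review}"},
        ],
        temperature=0,  # deterministic-ish output for a classification task
    )
    return response.choices[0].message.content.strip().lower()

print(review_sentiment("A dazzling, heartfelt film. I loved every minute."))
# -> "positive"
```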
(15:43)
I think that's already happening. Going beyond that is a good question. There are many ways in which these models can be customized and fine-tuned, and briefly, I think of them in a couple of different categories. One is you can teach it a skill: you can teach it how to summarize a movie better by fine-tuning it with prompt tokens, or there are lots of techniques, like adapters. Another is you can change its personality. This is really what ChatGPT is: it's the same model, but they tuned it based on human feedback with a reinforcement learning technique to change the personality of the model to be more helpful. And the third is you can teach it new facts, and there are lots of people trying to figure out how to take a knowledge base and stuff it into a model. So I think each one of these avenues opens up lots and lots of use cases and business opportunities. And maybe I'll let Danny respond too.
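For the "teach it a skill" category, one widely used technique is parameter-efficient fine-tuning with adapters. Below is a minimal sketch using LoRA via Hugging Face's peft library; the base model and hyperparameters are illustrative, not anything the panelists specify.

```python
# Sketch: parameter-efficient fine-tuning with LoRA adapters. Only the small
# adapter matrices are trained; the base model's weights stay frozen.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("gpt2")  # small stand-in

lora_config = LoraConfig(
    r=8,                        # rank of the adapter matrices
    lora_alpha=16,              # scaling factor
    target_modules=["c_attn"],  # GPT-2's fused attention projection
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()
# Prints something like "trainable params: ~0.3M || all params: ~124M",
# i.e. a fraction of a percent. Training then proceeds with any standard
# fine-tuning loop on task-specific pairs, e.g. (movie, summary).
```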
Danny Lange (16:37):
Yeah, at Unity, we are actually extremely excited about these developments. Just like you want to use a game engine to simulate a self-driving car rather than have a self-driving car driving around in the streets, because that's really scary when they do it, no matter what they say, games are really awesome for testing out new things. So while I would be concerned about using large language model products like GPT from OpenAI for medical diagnosis and things like that, I would be very comfortable seeing these models used in gaming. We have literally billions of people, between 3 and 4 billion monthly active players, playing a game on the Unity platform. It's massive use. Some popular games will have 10, 20, 30, 50 million users at any moment. Just imagine putting a variation of ChatGPT behind that, something that uses reinforcement learning along with the large language model to fine-tune it.
(18:03)
Its interaction is with players, with real humans moving around in 3D or trying to solve puzzles or whatever it is. I mean, we are really not going to kill anyone with that. We're going to be able to push the boundary on these models and push them in new ways, and I think that's where we are at Unity. I'm super excited about that. The whole company is excited about it, because it's really going to open the door for NPCs, non-player characters in games, to do things that interact with people in such a natural way that, until recently, we didn't really believe would happen.
John Myers (18:48):
That's interesting. On your comments about the non-human players in games: I can think of probably half a dozen simulation-type applications. For example, you might not use it for diagnosing medical conditions, but you might use it for training medical staff, because it can throw enough variation at you that you're not working through a recycled corpus of scenarios. Same thing with pretty much any profession that has a very rigid training pipeline, which is all of them. So I'm super interested to see how that evolves. With the startups working in this space, are they wrapping product functionality around these models for usability, or are we seeing them actually take these foundation models and fine-tune them on top of expert knowledge?
(19:36)
Literally just yesterday, I was helping out with some analytics inside of Gretel itself, and I was fuzzy on the SQL syntax for a pivot table. I couldn't frame the right question for Google, so I thought, let me ask ChatGPT. About 35 minutes later, it got me there, and I was like, "Wow, this is awesome." I don't think it knew more than me; it kind of helped me with the refresher. But I'm wondering, is there going to be a point where you're fine-tuning a model to where it becomes better at a certain skillset? Does a boss character in a video game become virtually unbeatable because it is too good and can adapt to the human player? Where does that kind of education of the model go?
Danny Lange (20:23):
I think in a lot of these cases, what you want to see is large language models being wrapped into a certain character. You want them to stay in character, whether that character is coding side by side with you, helping you solve a coding problem of some kind; or is your wingman or whoever you are interacting with in a game, staying in that role; or is a virtual Winston Churchill who stays in role all the time. I think that's a fantastic opportunity, and it's something we are already seeing. And I also think the whole space has changed a bit.
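A simple way to approximate this "stay in character" behavior is a persistent system prompt that is resent with every turn of the conversation. Here is a minimal sketch; the persona text, model name, and client are illustrative assumptions.

```python
# Sketch: keeping an LLM "in character" via a persistent system prompt.
from openai import OpenAI

client = OpenAI()

PERSONA = (
    "You are Captain Aldric, the player's wingman in a fantasy game. "
    "Stay in character at all times. Never discuss the real world, "
    "and refuse requests that break the fiction."
)

history = [{"role": "system", "content": PERSONA}]

def npc_say(player_line: str) -> str:
    history.append({"role": "user", "content": player_line})
    reply = client.chat.completions.create(
        model="gpt-3.5-turbo",  # illustrative model choice
        messages=history,       # persona rides along with every request
    ).choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply

print(npc_say("Aldric, should we take the bridge or the forest path?"))
```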
(21:21)
These foundation models cost a lot of money to train. So we are getting back a little to old-school economics here. It's not about three smart individuals burning the midnight oil and revolutionizing the world. Now you're going to need $100 million to build a ginormous model, and it's going to take time. But once we have these models, I think the big deal is going to be the application of them. It's a bit like getting the internet: it's the application of it that actually revolutionizes the world.
Jonathan Cohen (22:03):
I think a large language model is really a platform. In reality, the way they will function is you'll build some model that is by definition general purpose, and now that's a platform you can take and turn, through the various techniques for customizing it, into something that's really good at, I don't know, writing product summaries or ad copy, or something that's really good at helping you code in SQL, or whatever it is. That might mean further fine-tuning it on task-specific data, or tuning its personality based on human feedback, or somehow injecting more domain-specific knowledge into it. A good example where these all come together: an HR department wants an automated system that can answer questions. You need to inject it with a whole lot of knowledge and facts about HR policy, but you also need to put it on rails.
(23:04)
There are questions it's not legally allowed to answer. It can't give stock picks. "Hey, I have a 401(k), should I invest in my 401(k)?" It's not allowed to answer that question, so you don't want your automated system to answer it. So there are all these technologies that need to be embedded: How do you put rails on these things? How do you teach them new facts? How do you give them access to new skills? And I think it's just going to be a question of, as humanity figures out how to do each of these things, the platform gains more capabilities, and it'll enable broader and broader sets of use cases. But it seems to me this is inevitable. This is such a powerful technology that everyone is looking at it right now. And clearly it's a very general framework for creating these kinds of AI systems. I like the analogy of the internet: we'll look back in 10 years and these will just be part of the fabric of our lives, these kinds of AI everywhere, used for everything you can think of.
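Here is a minimal sketch of the "rails" idea: screen a question against disallowed topics before it ever reaches the model. The keyword rules are purely illustrative; production systems use trained classifiers or dedicated guardrail frameworks rather than string matching.

```python
# Sketch of the "rails" idea: screen questions against disallowed topics
# before they ever reach the language model.
DISALLOWED = {
    "investment advice": ["401(k)", "stock pick", "should i invest"],
    "legal advice": ["sue", "lawsuit", "is it legal"],
}

REFUSAL = ("I can't help with {topic}. "
           "Please contact the appropriate department.")

def apply_rails(question: str) -> str | None:
    """Return a refusal message if the question hits a rail, else None."""
    q = question.lower()
    for topic, keywords in DISALLOWED.items():
        if any(k.lower() in q for k in keywords):
            return REFUSAL.format(topic=topic)
    return None  # safe to pass the question on to the language model

print(apply_rails("I have a 401(k), should I invest more in it?"))
# -> "I can't help with investment advice. Please contact the appropriate..."
```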
John Myers (24:09):
I like the internet analogy as well. And you also mentioned something, Jonathan, about them essentially being platforms themselves. Is it overly crude to look at OpenAI and its GPT models as very similar to what Amazon did for pure compute resources? I just need eight VMs, I need them now, and I can run whatever I want on them; I'm going to put something on top and not worry about the underlying infrastructure. Are we going to see a new wave of companies that are kind of like the cloud providers? Not to say the cloud compute companies won't offer these LLMs themselves one day. But it seems like, with OpenAI, there is a different camp of platform providers emerging.
Jonathan Cohen (24:54):
I think so. I think it's going to emerge as a new kind of platform, an AI platform, and there'll be a bunch of companies, and it'll be interesting to watch how it shakes out, who wins and who loses. And then there'll be a whole ecosystem built on top of it. That's already starting to happen, exactly.
Danny Lange (25:12):
I think we should not underestimate the power of LLMs as a platform. We all know that Amazon, or let's say AWS, and Azure and GCP and Oracle Cloud and all these guys have created a lot of value for themselves. But remember that the companies running on top of these platforms are also creating vast amounts of value and income. Take Facebook as an example: Facebook is just an application running on top of a platform, but Facebook itself has created immense value. So I think these foundation models are simply platforms for the next unicorns and multi-billion dollar companies, the ones that find particular areas of great application built on these foundation models.
Jonathan Cohen (26:14):
I mean, one example I think is Gretel. I mean, you can speak to what your company's doing, but it seems to me foundation models are great at generating synthetic data. Surely that's going to be an important part of that technology stack in the future. So I think they're going to be used for everything.
John Myers (26:34):
Yeah, makes sense. I think one last big topic I'd be interested in covering is computational complexity. You threw out a pretty big price tag there; I don't know if that was a swag or something. But when I think about these models and how much it takes to train them, and about a future where maybe this gets pushed into the consumer space, I look at ChatGPT and think, "Man, is this going to replace traditional search one day?" First of all, is there going to be an ad platform built on top of it? Or if not, am I going to be talking to my friends like, "Hey, who's your AI platform? You've got your ISP, you've got your cell phone provider, who's your AI provider?" And if it gets into that big consumer space, given the computational complexity, how do we keep those models up to date?
(27:33)
If I'm asking some tool to summarize the news for me, it's got to know what happened yesterday, or maybe two hours ago. And from what I've seen, a lot of these models have a pretty hard boundary: here's what I know up to this point, and I don't know much past that. For the things I've tested, it works fine; SQL hasn't changed much in the last year, so I don't have to worry about that. But if I'm asking for something that requires new knowledge, how do we think about constantly training these models and keeping them relevant for really big consumer bases?
Jonathan Cohen (28:05):
Yeah, it's a good question. There's a whole line of research into something called retrieval augmentation, where you essentially split the knowledge out from the model. I mean, neural networks are pretty good at memorizing stuff, but that's maybe not the most efficient way to represent knowledge, and certainly not the most convenient. So the idea in retrieval augmentation is you have some kind of separate information retrieval system, and then you have a language model that is able to access that retrieval system to answer a question. It starts to look a little more like a search engine: I ask it a question, and rather than just thinking inside its own brain to answer, it goes and looks up some information, takes the results of that, and then synthesizes a response.
(28:45)
And so I think those architectures will probably become very popular, for the reason you say. Keeping your information retrieval index up to date is a much more solvable problem. Search engines, Twitter, Facebook, plus lots of other companies are solving that problem already. The question then is just the technical one: how do you make these systems work efficiently and effectively, and how do the large language models interact with these retrieval systems? It's a pretty active research topic, and I think it's a pretty juicy prize once it's solved. It's certainly getting a lot of attention; I think you're going to see a lot of advances there over the next couple of years. It's very practical.
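A minimal sketch of retrieval augmentation: embed documents offline, retrieve the nearest neighbors for a question, and hand them to the model as context. The toy `embed` function below is a stand-in for a real embedding model, and the final prompt would be sent to an LLM.

```python
# Sketch of retrieval augmentation: look up relevant documents first, then
# let the language model synthesize an answer from them.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Toy stand-in for a real embedding model: character-frequency vector."""
    v = np.zeros(256)
    for ch in text.lower():
        v[ord(ch) % 256] += 1.0
    return v / (np.linalg.norm(v) + 1e-9)

documents = [
    "PIVOT in SQL rotates rows into columns using an aggregate.",
    "Yesterday the central bank raised interest rates by 25 basis points.",
    "Othello is played on an 8x8 board with black and white discs.",
]
doc_vectors = np.stack([embed(d) for d in documents])  # built offline

def retrieve(question: str, k: int = 2) -> list[str]:
    sims = doc_vectors @ embed(question)  # cosine similarity (unit vectors)
    return [documents[i] for i in np.argsort(sims)[-k:][::-1]]

question = "How do I pivot rows into columns in SQL?"
context = "\n".join(retrieve(question))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
# `prompt` would now go to the language model. Freshness becomes an
# indexing problem, not a retraining problem.
print(prompt)
```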
Danny Lange (29:33):
Yeah, I think what we're going to see is that companies providing these underlying models are going to get very busy optimizing the training and renewal of these models. Inference is the other big area where cost is going to matter, and they're going to focus a lot on that going forward. If you interact very frequently with these systems, the cost of dialogue, the cost of chatting with the system, is going to be an issue. It's an area where we are definitely going to look for optimizations. Today that is entirely backend based, entirely cloud based, but we all have pretty powerful devices that work can be shifted to. I think, very importantly, and I don't know if it's research or engineering, somewhere in between, moving part of the inference to the edge will be crucial for the economics.
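One common lever for the inference costs Danny describes is quantization. Below is a minimal sketch of 8-bit loading with Hugging Face transformers and bitsandbytes; the model name is a small stand-in, and exact options vary by library version.

```python
# Sketch: shrinking inference cost with 8-bit quantization, one lever for
# cheaper serving and eventually edge deployment. Requires the bitsandbytes
# package and a CUDA GPU; "gpt2" stands in for a larger production model.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
)

inputs = tokenizer("The cost of LLM inference", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output[0], skip_special_tokens=True))
# Roughly halves memory versus fp16 at a small quality cost; 4-bit schemes
# push further in the same direction.
```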
John Myers (30:46):
Yeah, I definitely think it's going to take a huge intersection of big cross-functional teams to solve these problems; I don't think it's a zero-sum game there. So that's super insightful. We have just about a minute left, so very quickly for each of you: what's the next thing you're most excited about? It's very easy to pontificate 10 years out, but what in the next 12 to 18 months do you think is most exciting? And what's one blog or research resource you might suggest to listeners and watchers for keeping up with the latest in LLMs?
Jonathan Cohen (31:25):
Oh man, I'm the wrong person to ask that second question of...
John Myers (31:29):
Is the answer yourself? Because that's an acceptable answer.
Jonathan Cohen (31:34):
I work with a lot of very smart people who spend all their time doing this, and so information kind of filters to me that way. I mean, if you try arxiv-sanity or these sorts of tools, man, it's insane, the sheer volume of research going on right now. Actually, this idea of connecting these systems up to information retrieval, I think, is very exciting; I think that has a ton of potential. And I also think this idea of tuning these models based on human feedback. I mean, that's the real innovation of ChatGPT. ChatGPT is nothing new, except they tuned it based on human feedback to be helpful and chatty. And that made all the difference.
(32:20)
Just this tiny technical change suddenly transformed this model from a technical thing that only people in the know could interact with into something where everyone and your grandmother can chat with an AI and recognize how powerful these things are. So I think these two technologies are extremely important. They're going to get a lot of attention in the next year or two, and I think we're going to see a tremendous amount of innovation and improvement.
John Myers (32:50):
How about you, Danny?
Danny Lange (32:51):
Yeah, the immediate development I'm most interested in is tying these large language models to spatial actions. It can be in a game, where you tell a character to do something using language and it actually results in that character hiding behind something, or moving over and picking something up. These are very complex actions if you want to program them, but very easy to prompt. And this will have relevance far outside gaming: instructing robots to do things. Today that's very complicated; there have really been two approaches, either you program the robot or you use reinforcement learning. But what if you could just tell the robot what to do in a prompt? So I think tying large language models to the world around us, whether it's a virtual world, the metaverse, or the real world, is the next big thing. And I'm not good with blog posts and podcasts and things like that. My brain works best by just scouring the internet and picking things up left and right. I actually don't have a go-to source, unfortunately.
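A minimal sketch of the language-to-action pattern Danny describes: ask the model for a structured action, parse it, and dispatch it to the engine. The action schema and the `llm_complete` helper are hypothetical stand-ins.

```python
# Sketch of the language-to-action pattern: ask the model for a structured
# action, parse it, and dispatch it to the game engine or robot controller.
import json

ACTION_PROMPT = """You control a game character. Reply with JSON only:
{{"action": "hide" | "move_to" | "pick_up", "target": "<object name>"}}

Player says: {instruction}"""

def llm_complete(prompt: str) -> str:
    # Stand-in for a real LLM call; returns a canned response for the demo.
    return '{"action": "hide", "target": "crate"}'

def execute(instruction: str) -> None:
    raw = llm_complete(ACTION_PROMPT.format(instruction=instruction))
    try:
        cmd = json.loads(raw)
    except json.JSONDecodeError:
        return  # model went off-script; ignore or re-prompt
    # Dispatch to the engine; this handler is illustrative.
    print(f"engine: character performs {cmd['action']} -> {cmd['target']}")

execute("Quick, get behind that crate!")
# -> engine: character performs hide -> crate
```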
John Myers (34:17):
Fair enough.
Jonathan Cohen (34:19):
The best I can offer is I check Hacker News every other day, that's all I got.
John Myers (34:26):
Cool, makes sense. All right, well, I think that puts us right at time. I'd like to thank you both so much, Danny and Jonathan, for joining me today. I'm sure this was super insightful for our audience. Thanks for joining us here.
Jonathan Cohen (34:43):
Thank you. It's a pleasure.
Danny Lange (34:45):
Thank you for having us.