
Goldman Sachs Communacopia + Technology

Tuesday, September 10, 2024
Kevin Scott, CTO, Microsoft

Transcript

Who: Kevin Scott, CTO, Microsoft 
Event: Goldman Sachs Communacopia + Technology Conference
Date: September 10, 2024

Kash Rangan:                    Excellent, excellent. Thank you very much.

A real delight to be able to host the CTO of Microsoft, Kevin Scott, as our next speaker. I'm sure the discussion is going to be fascinating.

I'm told, Kevin, that this may be the first time ever at an investor conference for a chief technology officer of the world's, I think, still most valuable company. And I'll let Kevin jump into his intro in a second, but I have to share with you a little known piece of trivia. If we're talking about generative AI in such a big way and if this thing has become such a mainstream thing, it was a magical meeting between Sam Altman and Satya Nadella, perhaps facilitated by none other than my guest here. So, we owe our interest in generative AI and how it's become such a mainstream thing to Kevin.

Kevin Scott:                       Certainly, not entirely.

Kash Rangan:                    In some part.

So, tell us a little bit about your background and how you got to be the CTO of Microsoft, and we have some questions to jump into.

Kevin Scott:                       It's sort of a circuitous journey. The short story is I grew up in the '70s and '80s in rural central Virginia. I was super lucky to come of age when the personal computing revolution was happening. I studied computer science. I thought I was going to be a college professor for a really long time.

And then I left academia. I dropped out of my PhD program and joined this startup called Google about a year before their IPO. And I did a whole bunch of different things at Google. It was the place where I first built AI systems. So, I did a bunch of stuff on both the search and the advertising side, building large-scale machine learning systems, and for a while ran the ads quality system. So, this big machine learning system that did CTR prediction for advertising, which was the thing that made the ad auction actually work.

Kash Rangan:                    I believe it's a small business these days.

Kevin Scott:                       Yes, a small business.

And then I helped build this advertising company called AdMob. I left Google to do a startup, which got acquired by Google three years later. I was at Google for a while, and then I went to LinkedIn. I helped to – I ran engineering and operations at LinkedIn, helped take the company public. And then I joined Microsoft when Satya acquired LinkedIn.

Kash Rangan:                    That's great. And you ended up as CTO of Microsoft. That is quite a story. For someone who considered a PhD in English literature at some point in your life, this is quite a fascinating thing. Kevin, can you share with us your view of where we are with generative AI today and how you see it evolving over time? And is there a way to think about how this builds upon the older AI that you developed at Google, if that's even a way to think about it, one building on top of the other?

Kevin Scott:                       So, I think we're still in relatively early innings. The interesting thing that's happened over the past decade in particular is AI systems have started behaving more like proper platforms. So, the first AI systems that I built were relatively narrow. So, if you wanted to solve a problem, like how do you calculate the effective CPM for an ad so you can rank them in an auction, you have a very narrow set of data about the ads and how people are clicking on them, and you use a relatively simple algorithm that's running at really big scale. And you build a model and you run a bunch of experiments. And it's sort of a closed feedback loop. And the model gets better and better and better over time, but only at the narrow thing that you've trained it to do. And if you want to do a lot of machine learning, in the past you had to have a whole bunch of different teams running a bunch of those little vertical loops.
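To make that narrow vertical loop concrete, here is a minimal sketch of the kind of ad-ranking step Kevin describes: predict a click-through rate, turn it into an effective CPM, and sort. The names, numbers, and the simple formula are illustrative assumptions, not the actual system he worked on.

```python
# Hypothetical sketch: rank ads by effective CPM (eCPM) using a predicted CTR.
# eCPM here is expected revenue per 1,000 impressions for a cost-per-click ad.

def effective_cpm(predicted_ctr: float, bid_per_click: float) -> float:
    """Expected revenue per thousand impressions: CTR * bid * 1000."""
    return predicted_ctr * bid_per_click * 1000

ads = [
    {"id": "ad_a", "bid_per_click": 0.50, "predicted_ctr": 0.040},
    {"id": "ad_b", "bid_per_click": 1.20, "predicted_ctr": 0.010},
    {"id": "ad_c", "bid_per_click": 0.30, "predicted_ctr": 0.080},
]

# The auction ranks by expected value; the ML model's only job is the CTR estimate.
ranked = sorted(
    ads,
    key=lambda ad: effective_cpm(ad["predicted_ctr"], ad["bid_per_click"]),
    reverse=True,
)

for ad in ranked:
    print(ad["id"], round(effective_cpm(ad["predicted_ctr"], ad["bid_per_click"]), 2))
```

The closed feedback loop Kevin mentions is the part outside this snippet: logged clicks from the ranked ads become training data for the next version of the CTR model.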

And I think the thing that has really started to accelerate over the past handful of years is that, with generative AI, you have these frontier models that really are quite useful for a huge variety of different tasks and applications and product contexts, which means that you can think about them as a platform, like a database or any other piece of infrastructure that you use to build things.

It doesn't mean that you have zero work to do. There's still a non-trivial amount of work that you have to do to go make something valuable out of this bag of capabilities that is modern generative AI. And I still think, to a certain extent, the hardest part about all of this is just having classic product sensibility, understanding what a customer need is and what problem you're really solving, and then just attending to that with hair-on-fire urgency to try to do something valuable for somebody else.

But the problem that you don't have right now is the one that I had when I wrote my first machine learning program, which is I had to sit down and spend six months reading a bunch of papers and writing a bunch of really complicated code just so I could do a relatively basic thing. And that thing that was my first ML program in 2004 – so, a depressingly long time ago – a high school kid could do it in two hours on a Saturday. I mean, it's just sort of shocking how much capability is now in the hands of people who want to create new things with AI.

And so, I think that's where we're at.

And the thing not to lose sight of – and I really would encourage folks not to get too swept up in hype – is that there is a very real thing happening with these AI systems, where they're becoming very much more powerful over time. And I've said this a bunch of times in public and I say it all the time internally: we are demonstrably not at the point of diminishing marginal returns on how capable these AI systems can get.

I think Dario was on stage right before us. I think all of the people who are sitting on the frontier, evolving this particular new infrastructure, can really see that there's still a lot of power yet to be built and a lot of capability to be put into the hands of developers, where five years from now, 10 years from now, somebody will be sitting on stage telling some story about the impossible thing that they couldn't do in 2024 that now a high school kid can do on a Saturday afternoon. That, I'm sure of.

Kash Rangan:                    That's where you see things that we take to be very complicated abilities become table stakes, more or less.

Kevin Scott:                       I mean, I know you all are investors. I usually say this to product makers. The thing that's going to be really interesting over the next handful of years is seeing the companies and the entrepreneurs and the creative people sort of prospecting at that boundary where things go from "impossible" to "hard."

I think in every platform shift that you get, whether it's the PC revolution, the internet revolution, the mobile revolution, the first thing that happens is you get this amazing new bag of capabilities and people go off and build trivial things. And then they very quickly realize that the things that have enduring value are the things that are just on the verge of impossible. They're ridiculously hard to do. But at least they're not impossible anymore. And that's where the really interesting stuff lives. And you can see it model generation by model generation, things phase-shifting from impossible to just hard. And that's a good spot to focus your attention.

Kash Rangan:                    Got it. You've lived through a few tech cycles. How do you compare and contrast this AI cycle that we're going through to internet, mobile, cloud?

Kevin Scott:                       I think there are a lot of similarities to all of these big revolutions. It is sort of catalyzed by infrastructure. You've got a new thing that makes a whole bunch of things possible that were impossible or extremely difficult or costly before.

Because they are all platform revolutions, they're not narrow. So, it's not a thing that one company is doing, like, "Okay, I've got some secret stash of infrastructure that only I have, and only I can imagine what the possibilities of the infrastructure are." So, in parallel, we have a huge community of people being inspired in a bunch of different ways by what the infrastructure is going to allow them to do, which I think is really interesting and exciting and invigorating. The thing that makes me happiest about being in the spot that I'm in is seeing what creative people are going to choose to do with this.

Interestingly, I think all of these things have also changed the way that software development happens. So, it not only opens up new possibilities for what software can be developed, it also changes the means of production of software itself. So, if you think about all of those previous revolutions, you get a brand new tool set, and all of a sudden a type of software gets easier to write, and you're just sort of excited as a software developer that, "Oh, my god. Now I've got this thing, and all of this stuff that irritated me before is easier now."

And so, those two things constructively interfere with one another. So, you're off chasing new ideas, but you're chasing them with a tool set that's made you more productive than you were before. And so, that's truly an exciting moment to be in.

And I don't know – we'll sort of see over the coming years; these are things that are very hard to predict. But all of this may be happening faster than what we saw in the previous revolutions. One thing that's relatively certain, if you believe that we're in a big platform shift, is that brand-new trillion-dollar companies are getting created right now. And usually, the folks who move early, latch on to the possibility, and get into that learning loop, where they are experimenting and learning and understanding what is valuable versus what is trivial, are the ones who have real advantages in building those durable, super valuable things.

Kash Rangan:                    Got it. If this question sounds very intelligent, it is because Marco Argenti, our CIO, asked me to ask this question of you. I wish he'd been here sitting on stage with you, but he has another commitment. It goes like this. "We have seen exponential improvements in LLM models so far. There's a race for attributes, parameters, modes, and data size. Is the rate of change slowing down? Is this generation of models the path to AGI? Or do we need a fundamentally different evolution of the transformer architecture to continue making progress towards that goal?" So, clearly, that question did not come from me. Marco, thank you, in case you read the transcript of this.

Kevin Scott:                       Look, I mean – again, you all will have to be the judge of this over the coming weeks or months, but I think there's some super exciting things that are coming in the last half of this year that lots of folks have been working super hard on.

I don't see – I've said over and over again I don't think we're at the point of diminishing marginal returns on the current paradigm. I think we have a pretty clear path to building infrastructure that stays on the same trajectory of performance gains that we've seen, in multiple dimensions. So, it's capability, it's cost, it's the power performance of systems. It's a bunch of things, and an entire ecosystem of really smart people tackling all of the different parts at all the layers of the stack, just trying to improve things.

I mean, that said, you are not wrong to suggest that there probably are some disruptive innovations that will change things again. We should hope for it. I hope the transformer is not the last interesting discovery in ML architectures. And we even have a proof point: we all have a 20-watt AGI machine sitting between our ears, which is dramatically more efficient than the things that we're building right now. So, you should hope that we make some discoveries over the intervening years that bridge the gap between what we're doing and what biology knows how to do.

But these things are not independent. At least from what I can see, we're not at the point where we're about to stall on progress in increasing the capability of the infrastructure because we don't have the next idea for what needs to be done to make it more powerful.

Kash Rangan:                    Got it. There's been a lot of talk about small language models versus large language models and the role and the relative positioning of these two. How do you shake out on this SLM versus LLM debate? And a follow-up question that I wanted to ask is about open source. Where does open source fit into all this?

Kevin Scott:                       I mean, we can sort of start with the fundamental principle that I think answers them both. I'm pro-developer. I think do whatever you need to do to get the interesting application written so that you can do something someone cares about. Being dogmatic about what tool you use to solve a problem is kind of nuts. And you can practice a bunch of wishful thinking about what you would like developers to do, but I'm sure you all know developers. They're the orneriest people on the planet, highly opinionated, and they're going to experiment with a bunch of different things, and they will choose the tool that makes the most sense for them to solve the problem that they're trying to solve.

So, in that sense, you can even look at Microsoft developers building copilots. The way that a copilot is architected is that you are typically using a frontier model to do the bulk of the interesting work there, but you also use a bunch of smaller models in parallel. You have a fairly robust orchestration layer that decides how to route requests to which model to let you achieve the performance that you need on a bunch of different dimensions for the application you're trying to write. Sometimes you need to send something to a small model because you're not going to get a better answer from the large model and it's just much cheaper or much faster to make the inference request to the small thing. Sometimes you're trying to do something on device, locally, and you can't afford a network call into the cloud to invoke a large model.

And so, I think having that flexibility to architect the actual AI applications using the right mix of models is an important thing for developers to be able to have. But the large models are very important, and they are the things that – I mean, they sit on the frontier. And so, again, if you are looking at building the most ambitious thing possible, you, I think, need to have one of them in your portfolio of tools so that you can attempt the things that only they enable. But it's not a dichotomy, not an either/or.
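As a rough illustration of that orchestration idea, here is a minimal sketch of a router that picks a small or large model per request. The model names, thresholds, and complexity heuristic are all illustrative assumptions, not how Microsoft's copilots are actually built.

```python
# Hypothetical sketch of an orchestration layer that routes each request to a
# small or large model based on the constraints of the call.

from dataclasses import dataclass

@dataclass
class Request:
    prompt: str
    needs_on_device: bool = False    # e.g. offline or latency-critical
    max_cost_per_call: float = 0.01  # rough budget in dollars (illustrative)

def estimate_complexity(prompt: str) -> float:
    # Stand-in heuristic; a real orchestrator might use a classifier model.
    return min(1.0, len(prompt.split()) / 200)

def route(request: Request) -> str:
    if request.needs_on_device:
        return "local-slm"           # small model running on the device
    if estimate_complexity(request.prompt) < 0.3 and request.max_cost_per_call < 0.005:
        return "hosted-slm"          # cheap, fast small model in the cloud
    return "frontier-llm"            # large model for the hard cases

print(route(Request("Summarize this paragraph.", needs_on_device=True)))          # local-slm
print(route(Request("What's 2 + 2?", max_cost_per_call=0.001)))                   # hosted-slm
print(route(Request("Draft a migration plan for our billing system " * 20)))      # frontier-llm
```

The point of the sketch is only the division of labor: the application asks for an answer, and the orchestration layer decides which model is good enough, cheap enough, and close enough to serve it.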

And same thing with open source. I think open source is just making tremendous progress. It is super encouraging as a developer, I think, to see how fast the open source community is building really interesting things. And there are a bunch of people at Microsoft and Microsoft Research inside of my team who are building things like Phi, which is a really capable SLM, and it's open sourced for folks to use as they choose.

Again, with developers, they just want choice. They want to be able to experiment. They don't want to be told what their tool set is. They want to be able to experiment and choose.

Kash Rangan:                    Got it. So, at Goldman, we have this acronym, IPA: infrastructure build-out, and then platforms and applications. That's how we've seen other computing cycles, more or less, evolve. Do you see that as a similar way in which generative AI progresses? Or am I hallucinating?

Kevin Scott:                       Look, I think – so, what I'm seeing right now – and this is the place where it is a really unusual revolution – you definitely have these things that are sort of independent of one another execution-wise. So, there's a bunch of infrastructure stuff where you need to go pour concrete, sign leases, get power in place, solve a bunch of electrical engineering problems, solve a bunch of cooling problems, get the right silicon in the right places, and design the network fabrics. And all of these things operate on different timelines.

And then you have the software stack that sits on top of that, your low-level systems layer. And then you have your middleware and applications stacks that sit on top of that.

Things are moving so fast right now that it is kind of hard to imagine a world where you go do the infrastructure build-out and you wait until it's done before you make the substantial decisions and deployments on the next layer up. So, all of this stuff is really feeding into each other in a way that I haven't quite seen before, where you are making big decisions on things that really want to move super slow, but where you have to make them move fast because the technology is just evolving at such an incredible pace. It's really, really interesting.

And I will say – I think you guys have Jensen coming on later.

Kash Rangan:                    Tomorrow morning.

Kevin Scott:                       I mean, everybody in the ecosystem is moving materially faster right now than they were three or four years ago, materially faster.

Kash Rangan:                    Is it because the science and the technology is moving rapidly? Or is it – ? What is driving that?

Kevin Scott:                       Look, I think it's a feedback loop. I think you've got a bunch of really smart people who can respond very well to urgency. The place that we're in right now with infrastructure is people ask all the time, are you building too much or building too little? And so far, it's...

Kash Rangan:                    That's what they want to know. Are we building too much, too quickly? How much are we going to spend?

Kevin Scott:                       I want to know it, too. But so far, demand for the infrastructure has materially outpaced our ability to supply it. And so, we are building at a pace where, based on our understanding of where demand is going to be, we're trying to sort of intercept things. And like I just said, there are a bunch of slow-moving things in the equation that we've just really had to try to make move much, much faster than they were moving before.

And the thing that I will say, I think the whole ecosystem is responding incredibly well. Do I wish it were faster? Yes, I wish it were faster. But thank God, it's so much faster than it was four years ago, because we would really be in a pickle then.

Kash Rangan:                    Got it. I want to get your views on compute costs. Generally, with tech cycles, the underlying inputs become cheaper. You get a mass market, standardization, et cetera. Given the high cost of compute here, how important do you think it is to bring down compute costs? And if you think it is, what are the developments that might support that view?

Kevin Scott:                       So, it's super important always to bring down the cost of compute. One of the things that has supported all of these platform revolutions that we talked about – personal computing, internet, smartphone, cloud, all of them – has been this ability from silicon to networks to the low-level software layers that empower the layers running on top of them to get exponentially cheaper over time. And I think we are definitely seeing that. I don't know exactly what the number is right now, but back in May when I was giving my keynote at Build, the anecdote that we gave the audience was at that point in time GPT-4 had gotten 12 times cheaper per token to do inference on than at launch time.

And so, part of that is because the hardware had gotten a lot better, and part of it is because the infrastructure had been tuned within an inch of its life. Everything from numeric kernels, where people are writing the moral equivalent of a bunch of assembly code to extract every ounce of performance out of the chips that you've got, and then just foundational improvements to the algorithms that are running on top of the hardware layer. And I think this is just going to continue over time.

So, the important thing to realize is things are already getting on a price performance basis way cheaper than they were. And there's super good stuff coming from hardware to system software to algorithms that should keep that trend moving. And we've just got to keep pushing super hard on it. Because if you really, really want all of this stuff to be ubiquitous, you need it to be as cheap as possible so everyone can use as much of it as makes sense.

Kash Rangan:                    Kevin, you piqued my interest by saying "super good stuff coming." So, to the extent that you can share with us, what, at a high level, conceptually maybe, are the things that are giving you that conviction?

Kevin Scott:                       Unfortunately, very little that I could...

Kash Rangan:                    Okay. That's all right. I don't want to create problems. We'll just take it as a given that it is coming.

Kevin Scott:                       If we were off the record, I could share.

Look, I think the thing that ought to give everyone encouragement – we'll have things coming shortly that will be super visible and that I think will be very encouraging for people looking for signs of progress. But you can see it even just looking at what's happening on a week-by-week basis. All of this competition is happening, where Meta is doing great stuff with Llama and Anthropic is doing super good stuff and Google is doing it, too. There are these objective benchmarks for how everyone is performing. And because of the competition, and because the science and the engineering are happening at such an incredible pace, every week things are getting better.

And the point that I have been trying to make for a while to all of the folks inside of Microsoft is that there is a weird nature to how the frontier progresses, which is you go build gigantic supercomputing environments, which are big capital projects that take a very long time to build, and then you put them in the hands of people who are going to train a frontier model on them. And then they optimize their workload on that environment, and then they do this extraordinary work, and then you get a new frontier model. And because of the way all of this unrolls, you're just applying dramatically more compute to the problem. And it just happens in these increments because of the way that you're building all of the systems.

And so, the thing that people forget sometimes is, between the updates, you can get into this mode where you convince yourself, "Well, progress is only linear. This benchmark only got this much better." And you sort of forget that. You look at our partner, OpenAI, and what the big updates have been – the jump from GPT-2 to -3 and from -3 to -4. And I can't say anything about what's next and when, but it's not like work stopped after GPT-4.

So, the thing that we have seen for the past six years with generative AI is every couple of years or so, just because of the lock-step way that all of this stuff gets built, you get a major new update to the frontier.

Kash Rangan:                    So, Brett and Kendra, when he's ready to officially announce the good stuff, we'd love to host you back at a Goldman Sachs AI Symposium. Just putting it out there. Always putting in a plug for the firm.

How dependent is your AI strategy on OpenAI? Because you also have your internal AI, with a CEO of AI. How do these things work?

Kevin Scott:                       I think OpenAI, by any objective measure, has been one of the most consequential partnerships Microsoft has ever had, and we're a company that's had a lot of consequential partners. So, we're super, super grateful. And I think we've been able to do things in a pretty extraordinary way just because it's two really capable companies trying to push a big platform shift forward, rather than one trying to do everything. So, we don't even think about it in terms of being super dependent; it's a complicated bag of problems that we're collectively trying to solve.

And just like with the PC revolution, where you had Microsoft and Intel and a whole bunch of OEMs doing this. I mean, you just sort of think about – this is before my time at Microsoft; I've only been there for a little under eight years now – the mission of the company at the point where it was founded was to put a personal computer on every desk and in every house. And that's at the time where people didn't even know what a personal computer was. And so, through that partnership, the entire ecosystem was able to deliver that mission, which is just completely audacious.

And I think this is another mission. Really unlocking the full potential of AI to create value for people everywhere is another equally large thing. I just don't think it gets done by one entity. It's a lot of people working very hard in the same direction.

Kash Rangan:                    And hence, that's why you have your own AI CEO internally and then you have...

Kevin Scott:                       We do. I mean, Microsoft has had AI researchers working on AI since the 1990s. We were working on artificial intelligence when I was an intern at Microsoft Research in 2001.

Kash Rangan:                    You were an intern at Microsoft Research?

Kevin Scott:                       Yes. Microsoft Research reports to me now, and 23 years ago I was an intern at Microsoft Research.

Kash Rangan:                    Any intern at Goldman Sachs, just take that as an inspiration.

Kevin Scott:                       So, there's a lot of AI that Microsoft is doing that is very complementary to what OpenAI is doing. We were doing it before, and it's going to continue for the foreseeable future, because it's a really large surface area. There are a lot of problems that need solving.

Kash Rangan:                    Good to know that. This again – I'll preamble this thing – this is a Marco Argenti question. So, it's going to sound very erudite. "We seem to be moving from chat bots to agents very quickly. What's the vision with regards to AI performing more and more complex, long-running tasks? Do we see a future where AI-powered agents will be able to perform tasks that require planning, decision making, and execution across multiple environments and systems?" This man, what a beautiful question. It is like poetry, right? That's why I had to give credit to Marco.

Kevin Scott:                       So, the answer to the question is yes. And I guess, why do I believe that? Look, I think one reason is just necessity. In order for AI systems to be fully useful, they do need to be able to do longer-term planning, they need to have memory, and they need to be able to actuate products and services and interfaces on behalf of users. And I think there's a bunch of good work that's been happening on orchestration layers, where you're giving developers really good frameworks for figuring out how to uplift the basic capabilities of models to help them do more of these sorts of long-range, multi-step tasks.

And then the models themselves, I think, are getting more capable of synthesizing plans on their own. You can even see a little bit of this. If you go to ChatGPT right now and you ask it to give you a plan for something, it can articulate pretty comprehensive plans for very complicated things.

And so, the next thing that you would want after that is for the agent to be able to say, "Okay, I see the plan. Go do that." And I think that's...

Kash Rangan:                    That's what's next.

Kevin Scott:                       Yes. I think lots of people are working on filling out that hole in the capability of these systems. So, yes, I think lots of good stuff coming on that front.
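To make that "see the plan, go do that" step concrete, here is a minimal, hypothetical sketch of a plan-then-execute loop. The call_model function, the plan format, and the tool registry are placeholders for illustration, not any particular product's API.

```python
# Hypothetical plan-and-execute agent loop: ask a model for a plan, then run
# each step through a small registry of tools. Everything here is a stand-in.

def call_model(prompt: str) -> str:
    # Placeholder for a chat-completion call; expected to return one
    # "tool: argument" step per line.
    raise NotImplementedError("wire this to a real model API")

TOOLS = {
    "search": lambda arg: f"search results for {arg!r}",
    "write_file": lambda arg: f"wrote {arg!r}",
}

def plan_and_execute(goal: str) -> list[str]:
    # 1. Have the model synthesize a plan in a simple, parseable format.
    plan = call_model(
        "Produce a plan to achieve the goal below, one step per line, "
        f"formatted as 'tool: argument'. Available tools: {', '.join(TOOLS)}.\n"
        f"Goal: {goal}"
    )
    results = []
    # 2. Execute each step; a fuller agent would feed results back and re-plan.
    for step in plan.splitlines():
        tool, _, arg = step.partition(":")
        if tool.strip() in TOOLS:
            results.append(TOOLS[tool.strip()](arg.strip()))
    return results
```

A production agent would add the memory, error handling, and re-planning Kevin alludes to; the sketch only shows the division of labor between planning and execution.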

Kash Rangan:                    Got it. We have two minutes. I don't have any more questions. Is there a question that you want to pose to yourself and answer yourself?

Kevin Scott:                       Oh, God.

Kash Rangan:                    You're a prompt engineer, right?

Kevin Scott:                       I don't know. So, I think the – it's not necessarily a question, but just a thing that I will leave everyone with. I think the thing that we've seen over and over again with building hard things is you want to strike the right point between aggressive pessimism and aggressive optimism. You just don't want to get yourself caught up in hype in either direction.

So, the thing that we push for inside of Microsoft, where we're trying to do very hard things with these very complicated models, is that you want teams to be as ambitious as humanly possible in how they're putting this stuff to work. You really want to find the things that just went from impossible to hard. You probably don't want to spend a whole bunch of your energy doing a bunch of incremental things, because optimizing an incremental thing when the underlying model infrastructure is getting more and more powerful so quickly probably means that the model is going to be able to do a whole bunch of that incremental stuff.

And this was the lesson we learned years ago in the very, very early days of our partnership with OpenAI. I would have teams inside of Microsoft that would take GPT-3 and they would go build this fine-tuned, super-optimized thing, and it was 3% better on some benchmark and a little bit cheaper. And they'd be like, "Yay, victory." And then GPT-4 came, and it was, "Crap. What do we do now?"

So, you just want to be on the frontier with your ambitions, and it's a good spot to be.

Kash Rangan:                    That's great. We are right at the top of our allocated time. On that note, thank you so much for giving us your perspective.

Kevin Scott:                       Thank you.
