AI, Tennis, and the Future of Journalism
Key Points
- The panel emphasizes that despite AI advances, the human element remains essential, especially for sports journalism.
- Economic incentives shape whether users are treated as customers or products, influencing AI deployment decisions.
- Experts advise against asking LLMs for information you already know, suggesting LLMs should augment rather than replace personal knowledge.
- The podcast “Mixture of Experts” ties AI discussion to the ongoing U.S. Open, using tennis as a real‑time test case for AI‑generated content.
- Experiments at the U.S. Open involve hybrid workflows where human writers and large language models collaborate to produce both short and long‑form match reports.
Sections
- Untitled Section
- From Live Sports Data to AI‑Generated Reports - Aaron explains how their message‑driven system ingests real‑time match events from multiple feeds, formats them as JSON, and feeds them to a large language model to instantly generate written summaries.
- Agentic AI for Sports Commentary - The speakers explore how agentic LLM architectures can mine and distill quirky, edge‑case sports statistics into engaging, human‑like live commentary.
- AI-Enhanced Sports Commentary Discussion - The speakers explore using AI agents to generate statistical insights and narrative content for sports broadcasts, debating which sports, such as tennis, are most suitable for these applications.
- AI-Driven Sports Insights Platform - Panelists discuss integrating generative AI tools like Perplexity to deliver real‑time, conversational tennis coverage, illustrating how concrete applications make abstract AI tangible.
- Ad Models Drive Attribution Tech - Participants discuss how reliance on advertising limits subscription growth yet spurs investment in source‑attribution technologies for LLMs, potentially enhancing trust and mitigating hallucinations.
- Balancing Revenue, Trust, and AI Transparency - A panelist argues that ad‑based AI businesses must provide advertisers with guarantees while generating enough revenue to sustain the industry, yet they must also preserve user trust and transparency to keep highly educated users engaged.
- AI Authority and Critical Thinking - The speakers warn that persuasive, confident language‑model answers can erode users’ ability to trace sources and think critically, highlighting the need for transparency, advertiser accountability, and new filtering strategies compared to traditional Google searches.
- Beyond Copilot: Code Understanding - The speakers debate whether Copilot is already obsolete, stress the trust challenges of LLM auto‑completions, and argue that the next breakthrough in AI for software engineering will be tools that help developers comprehend massive, unfamiliar codebases.
- From Prompts to Agentic AI - Speakers examine the transition from larger‑context prompting toward autonomous, planning‑driven AI agents and debate the balance between human‑in‑the‑loop oversight and fully self‑directed operation.
- Balancing LLMs with Engineer Skill - The speaker acknowledges that large language models boost code quality and productivity but warns they can erode essential coding expertise and creativity—especially for new paradigms like quantum computing—so they advocate limiting tool usage to preserve engineers' sharpness and long‑term team continuity.
- Legacy Code Barriers and IBM Fellows - The conversation explores the challenges of modeling legacy and esoteric languages such as COBOL and PL/I—highlighting financial incentives to overcome these barriers—before host Tim Hwang introduces three distinguished IBM Fellows and outlines the prestigious program’s goals and future direction.
- Technical Fellows as Guardians - The speakers reflect on the honor and responsibility of being a technology fellow, emphasizing their role as a check‑and‑balance for business decisions and as an inspirational benchmark for others.
# AI, Tennis, and the Future of Journalism

**Source:** [https://www.youtube.com/watch?v=IKaj8gATzoY](https://www.youtube.com/watch?v=IKaj8gATzoY)
**Duration:** 00:40:44

## Sections

- [00:00:00](https://www.youtube.com/watch?v=IKaj8gATzoY&t=0s) **Untitled Section**
- [00:03:12](https://www.youtube.com/watch?v=IKaj8gATzoY&t=192s) **From Live Sports Data to AI‑Generated Reports**
- [00:06:21](https://www.youtube.com/watch?v=IKaj8gATzoY&t=381s) **Agentic AI for Sports Commentary**
- [00:09:54](https://www.youtube.com/watch?v=IKaj8gATzoY&t=594s) **AI-Enhanced Sports Commentary Discussion**
- [00:12:59](https://www.youtube.com/watch?v=IKaj8gATzoY&t=779s) **AI-Driven Sports Insights Platform**
- [00:16:05](https://www.youtube.com/watch?v=IKaj8gATzoY&t=965s) **Ad Models Drive Attribution Tech**
- [00:19:15](https://www.youtube.com/watch?v=IKaj8gATzoY&t=1155s) **Balancing Revenue, Trust, and AI Transparency**
- [00:22:24](https://www.youtube.com/watch?v=IKaj8gATzoY&t=1344s) **AI Authority and Critical Thinking**
- [00:25:27](https://www.youtube.com/watch?v=IKaj8gATzoY&t=1527s) **Beyond Copilot: Code Understanding**
- [00:28:46](https://www.youtube.com/watch?v=IKaj8gATzoY&t=1726s) **From Prompts to Agentic AI**
- [00:32:04](https://www.youtube.com/watch?v=IKaj8gATzoY&t=1924s) **Balancing LLMs with Engineer Skill**
- [00:35:16](https://www.youtube.com/watch?v=IKaj8gATzoY&t=2116s) **Legacy Code Barriers and IBM Fellows**
- [00:38:20](https://www.youtube.com/watch?v=IKaj8gATzoY&t=2300s) **Technical Fellows as Guardians**

## Full Transcript
Tim Hwang: So is AI going to wipe out all sports journalists?
Aaron Baughman: No matter the sport, you know, we're always
working with the same constant.
That's the human.
Tim Hwang: Seems to me that paid search and perplexity
poses some really big questions.
Trent Gray-Donald: It's all about simple economics and who's incented
to do what and are, are you the customer or are you the product?
Tim Hwang: Should I be using cursor?
Kush Varshney: You shouldn't ever ask a question of an LLM these days
at least that you don't already know kind of the answer for yourself.
Tim Hwang: All that and more on today's episode of Mixture of Experts.
I'm Tim Hwang and I'm joined today, as I am every Friday, by
a world class panel of engineers, researchers, product leaders, and more
to hash out the week's news in AI.
On the panel today: Aaron Baughman, IBM Fellow; Kush Varshney, IBM Fellow;
and Trent Gray-Donald, IBM Fellow.
So to kick us off, the U.S. Open is this week. Um, and as usual on Mixture
of Experts, we're, of course, excited about the tennis, but we're really
excited about, like, the AI, and I really want to talk about the role of AI
in the U.S. Open.
Uh, but first to kick us off, because I personally am a huge tennis fan,
let's just go quickly around the horn.
I want everybody's nominee for the best tennis player of all time. Um, uh,
Aaron, we'll start with you.
Aaron Baughman: Yeah, so that's a great question.
Easy answer.
Ben Shelton.
Tim Hwang: Great.
Kush.
Kush Varshney: Leander Paes.
Tim Hwang: All right.
I like that one.
Very good.
Very good.
And, uh, and Trent, how about you?
Trent Gray-Donald: Oh, I, I prefer squash.
So Jonathan Power.
Tim Hwang: Okay, great.
Well, thanks.
Well, uh, I asked that question to kind of kick us off on the discussion today.
Of course, the U.S. Open is happening right now as we record, uh, this episode.
And as usual on Mixture of Experts, we're excited about the tennis, but what
we really want to talk about is the AI.
Um, and Aaron, in particular, I wanted to kind of have you on the panel
and for you to kick off this section because I understand that you've been
experimenting with using language models to generate both long and
short form stories, uh, for the Open.
Um, and I wanted to talk a little bit about what you're discovering,
like, uh, like what's working really well, uh, in these experiments
that you've been trying out.
Aaron Baughman: Yeah. Well, thanks for having me.
It's really fascinating to watch how we apply these AI technologies in
particular, these agentic architectures with a diversity of large language models
deployed out at scale, uh, at the U.S. Open that's happening right now.
So if you go to www.usopen.org and go to News, you can see a lot of our
stories that are created with both humans and large language models
together. But in general, we have two different types of projects. One of
them is creating hundreds of match reports, pre and post, long and short
form, for 255 of these different matches. And then the second project that
we have is called AI Commentary, where we take stats and we transform them
into a different data representation like JSON, and then input that with
the prompt to get out text, and then that's voiced over with text-to-speech,
and that's embedded into these highlight videos.
Tim Hwang: Yeah. That's really cool.
And tell me a little bit more about like how that works exactly.
So how do you go from a game to like a report about a game, right?
Presumably there has to be a feed about like, Oh, this person just, you know,
had a, had a great serve for instance.
Um, like how do you do that conversion?
Right.
Because I think what's interesting is you're going from, you know, video
and visual, uh, to a written medium, and I'm kind of curious about how you
guys approach that problem.
Aaron Baughman: Yeah, it's really neat.
This is all about message-driven architectures, where whenever we get a
score, you know, or, for example, a match ends, then we get a message, and
within seconds, less than seconds, we'll then take that message and we'll
pull in from about 14 different feeds that have raw data that describes the
players, the match, where they are.
And also what they've done in the past.
And we also forecast what's going to happen in the future.
And we take all of that and we turn it into a representation that a large
language model can understand, right?
And we put it into the context of a prompt.
So it could be JSON elements that describe, you know, with key values,
what's happening in tennis.
So like how many aces is somebody getting or how many breaks has
somebody won in the match, right?
And then all of that is packaged together, and then we push that into the
scaled-out architecture that we have, Granite, for example, um, and we pass
it in with the prompt, and then the output would be fluent text that
describes the scene that's either just happened or that's coming up, and
it's really cool to see it, you know, live as it happens. And there's all
sorts of fact checking that happens, and quality checks, novelty pieces,
and similarity, to make sure that it's up to par so that we can go forward.
And I use the word par on purpose, because we also do some things for golf
as well, uh, which, um, is, uh, part of our over-three-year story, you
know, that, uh, has evolved, uh, into the U.S. Open.
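The flow Aaron describes, a match event arriving as a message, its stats packaged as JSON inside a prompt, and a model returning fluent text, can be sketched roughly like this. The field names, prompt wording, and sample event are illustrative assumptions, not the actual U.S. Open feed schema, and the real system sends the prompt to a hosted model rather than stopping here:

```python
import json

def build_match_prompt(match_event: dict) -> str:
    """Package raw match stats as JSON context inside an LLM prompt.

    Mirrors the flow described in the episode: key/value stats
    (aces, breaks won, ...) become structured context the model
    writes a recap from. Field names here are made up.
    """
    context = json.dumps(match_event, indent=2)
    return (
        "You are a tennis journalist. Using only the match data "
        "below, write a two-sentence recap.\n\n"
        f"Match data (JSON):\n{context}\n"
    )

# A toy event of the kind a message-driven consumer might receive
# when a match ends.
event = {
    "players": ["Player A", "Player B"],
    "winner": "Player A",
    "sets": [[6, 4], [7, 5]],
    "aces": {"Player A": 11, "Player B": 3},
    "breaks_won": {"Player A": 3, "Player B": 1},
}

prompt = build_match_prompt(event)
# In production, `prompt` would go to a hosted model (e.g. Granite
# on watsonx) and the generated text would pass fact-checking and
# similarity checks before publication.
```

The interesting engineering, per the discussion, is everything around this step: fan-in from many feeds, sub-second latency, and the downstream quality gates.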
Tim Hwang: That's great. And Trent, I saw you nodding on the mention of Granite.
I don't know if you've got a connection to Granite as a project,
but, uh, I'm wondering if you can kind of paint a picture of where you
see some of all this going, right?
So, assuming we do these experiments in golf, sorry, these experiments in
tennis, should we expect to see, like, in five years, that, like, you know,
a lot of sports coverage, a lot of sports summaries and commentary really
will be AI generated? Or do you think this is more of, like, a
sports-specific thing, for instance?
Trent Gray-Donald: I, I think that this is just the beginning
of a lot of different initiatives.
Uh, the, the reason I'm nodding is that Aaron and I actually work together.
So I run the watsonx service, uh, that does the inferences that Aaron causes.
So he's basically calling my service when he does the work.
This is your baby.
Well, right, but I, I'm always in the, like, I'm the plumbing, right?
Yeah, sure.
He does all the, the interesting domain specific work around tying together
all the data sources, and it just ends up, you know, coming into our service.
So he and I worked together on figuring out, okay, how do we make sure that
we can handle the capacity and the latencies and all those different things.
Right.
But in general, how, how Aaron's built it and the, I see
this whole agentic universe.
Uh, I mean, there's, there's from highly scripted through to
let the LLMs do what they'll do.
And there's obviously a big middle; there's, um, uh, a lot of different
points in that spectrum. And I think that for live events, for, uh, more
and more human things like
sports, we're going to start seeing increasingly interesting agentic
architectures emerging that will extend beyond a given sport into more and more.
I could, I could see that.
I think the, the interesting question is always, uh, can you find the
right unique snippets to tell people.
Like one of the, one of the jokes that we have is when we, big baseball
fans, and we're listening to the play by play, and they come up
with these ridiculous statistics.
This is the third player since 1943 who stood on their left
foot and wiggled their ear.
Tim Hwang: That's right.
Yeah, yeah.
I've come to expect that.
I mean, I watch a lot of, like, um, soccer, right?
And, like, it feels like the commentators, just to fill space, just have
this remarkable bank of, like, the most edge-case statistics you could
think of.
Trent Gray-Donald: So, well, exactly. The question is, can, can we capture
and distill that?
Like obviously there's a lot of data mining going into
producing those right now.
It's okay.
How do we connect those and make it engaging and interesting and human?
Tim Hwang: Yeah, for sure.
Um, and I guess, Kush, curious about how you think about this.
I know one aspect of your, uh, fellow work, right, is that you think a little
bit about AI governance, which ultimately is kind of like how do we think about
the influence of these systems on people.
Um, and, you know, I think one response always is, like, okay, well, what
is a sports journalist supposed to do in the future, in this world where a
lot of the work that they currently spend time on is, you know, generating
this coverage, generating this commentary?
Uh, curious about your thoughts on like how that all looks, right?
Because I think as Aaron has already said, there's like ways of
getting humans and machines to kind of work together on this front.
Um, but I would love to kind of hear a little bit about like how you sort
of see that relationship evolving and, and is there a role, right, uh,
I guess for humans in sort of an AI enabled, you know, sports future.
Kush Varshney: Yeah, I think we're going to talk more about
this towards the end as well.
I mean, different sorts of human collaboration and, um, the way I think
about it, it's not so much of, uh, what is it about the job that, uh, we're
trying to automate away and these sort of things, but really the question of
the dignity of the humans involved in this. Because, um, uh, if you're the
human and you're subservient to the AI, I mean, you have no dignity left in
many ways. Um, so what are kind of the workflows that we can set up such
that, uh, you get a better product, still are getting the advantages of
automation, but still leaving the dignity of the human intact?
And, uh, one way to think about it is, uh, Like, if you remember, um, House MD,
Dr. House, the TV show, and, um, he had his, uh, whatever, residents, and, um,
they were, like, doing stuff for him, conducting tests, whatever, um, but,
uh, it was very much an adversarial sort of relationship, so, um, like,
they were always trying to prove him wrong. And if we can get the AI
systems to be in that mode, working with the humans, then the human still
stays, uh, with the agency, with the dignity, but, um, still gets the
benefit of, uh, of all of the AI technologies. So, uh, I think something
like that, um, could, uh, could play out, I mean, as we go forward with,
uh, with a lot of different AI-human collaborations.
Tim Hwang: Yeah.
I love the idea that in the future, there's going to be, like, a sports
commentator that has, specifically, an AI agent that generates those, like,
weird statistics that Trent was mentioning. It's just, like, an expert on
finding and identifying those as the action kind of evolves.
Well before we move on to the next segment, uh, Aaron, maybe we'll close
this segment with you because, uh, this is some of your work that's getting
some shine at the, uh, at the open.
Um, I'm kind of curious if there are, like, you know, sports that are going
to be easier or harder to do this kind of work that you're doing at the
Open with. Um, you know, I think a little bit about, like, you know, is
this something where theoretically any sport is going to be easy and
amenable to the kinds of, sort of, story generation that you're working on?
Or are there certain aspects of, you know, say, tennis or golf, uh, that
really make it sort of, like, ideal for your application case? Um, I guess
what I'm asking ultimately is, like, did you pick this largely because,
like, you love tennis, uh, or are there actual, kind of, like, scientific
reasons why this ended up being a really good test case?
Aaron Baughman: Yeah.
You know, um, the no free lunch theorem, where, you know, there's not a
perfect solution for every problem, I think is applicable here, um,
because, you know, every sport has a pro and con, right?
And it all comes down to what data is available and what is
the scale and how, and what's the use case that fans want to see.
So, so I wouldn't say that there's a perfect sweet spot
in any one singular sport.
There's always a challenge.
Yeah.
Um, some, some of the challenges, I think, we've already discussed here.
It's just making sure that we have meaningful stories and stats that bubble
up, and some, some of the things that we do is we use, like, standard
deviations around, let's say, aces, right? Because you can't say that, um,
a pure number of aces is significant. It all depends on how many sets have
been played, the gender of the match, who's playing. So we have to break
that down and apart.
And if we go to, like, racing, we go to football, we go to soccer, you
know, it's all very similar: you apply the same mathematical techniques to
the stats that then can bubble up. Um, one of the other challenges,
um, and, and I would say one of the other areas that's real exciting is
getting human and machine working together because there's this pendulum
of how creative do you want these large language models to be, as opposed to
how prescriptive do you want to be?
You know, with this few shot learning, for example, and we tend to go somewhere
in the middle, but it's all experimental.
You know, it's almost like the theory of mind, right?
We want to be able to predict what action a human editor is going to take
so that we can meet their expectations whenever we generate, um, said text.
And so no matter the sport, you know, we're always working
with the same constant.
That's the human.
Um, and then the other constant is data, right?
We need to make sure that we have access to the data.
Um, but it's, it's, it's fun, right?
And it's very impactful.
And it's a way to bring people together irrespective of creed, gender, and race.
And, um, it's just really exciting to use a lot of Trent's work and
a lot of Kush's work and bring it together for the world to see.
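Aaron's standard-deviation idea, normalizing a raw count like aces before deciding it is story-worthy, can be sketched as a simple z-score check. The baseline numbers and the threshold below are made up for illustration; the real system would segment the baseline by sets played, gender, and tour, as he notes:

```python
from statistics import mean, stdev

def ace_z_score(aces: int, sets_played: int,
                baseline_aces_per_set: list[float]) -> float:
    """Standardize an ace count against a historical per-set baseline.

    Normalizing per set keeps a raw count like "14 aces" from
    looking notable (or unremarkable) out of context.
    """
    per_set = aces / sets_played
    mu = mean(baseline_aces_per_set)
    sigma = stdev(baseline_aces_per_set)
    return (per_set - mu) / sigma

# Hypothetical baseline: per-set ace rates from comparable matches
# (in practice, matched on gender, surface, sets played, etc.).
baseline = [2.1, 3.0, 2.5, 4.2, 1.8, 3.4, 2.9, 3.1]

z = ace_z_score(aces=14, sets_played=3, baseline_aces_per_set=baseline)

# Only stats clearing some threshold (say |z| > 2) would "bubble up"
# into a generated story or commentary line.
notable = abs(z) > 2.0
```

The same pattern, compute a context-adjusted deviation, then filter on it, transfers to the racing, football, and soccer cases Aaron mentions.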
Tim Hwang: Yeah, for sure.
And I think this is where the magic happens, right?
It's like AI can be very abstract for people.
It starts to become very clear if it's got an application like this, right?
It's like, oh yeah, I already love this thing and AI is really
helping me, you know, enjoy it more.
It makes a huge difference.
Aaron Baughman: Yeah, yeah, yeah.
And, and I do encourage you to check out, you know, usopen.org, so you can
see our work live in real time, and listen to the commentary, read the
match reports.
I mean, it's, it's fascinating to watch the field evolve.
Tim Hwang: I'll introduce this by talking a little bit about Perplexity.
Um, so Perplexity is one of these leading companies in the
sort of generative AI movement.
Um, what they are largely providing is kind of language
models as an interface for search.
So the idea is in the future you'll be able to have much more sort of
conversational search experiences, uh, than you have right now with
something like a, a Google or something, right, where you kind of
type in a search query and you get like a bunch of responses, um, back.
And Perplexity has been, in my mind, kind of one of the best products in
the space.
It's like one of the few ones that I actually pay for and that I
actually use on a week to week basis.
And there was an interesting news story that just popped up in the last few
weeks where Perplexity announced that it would be finally moving towards a
model where they roll out paid search.
So the background on all this is: in the past, you have had to subscribe to
Perplexity, and you pay them a monthly fee.
Um, but now they're saying, hey, we're gonna monetize by allowing people
to, uh, buy ads on our platform.
So if you go, say, search for, you know, what is the best exercise
machine, uh, you might see an ad from something like a, like a Peloton.
Um, and so this is like a big shift.
I think one of the big hopes about this technology was not just that
conversational interfaces would be better, but that we might move away
from ads to a world of subscription.
Um, and, and as a result, maybe have a little bit more sort of faith, trust,
confidence in, in the search results.
Um, and so I guess I kind of want to ask, maybe Trent, I'll start with you
because you're our new addition to the roster of folks at Mixture of Experts.
How do you feel about this?
Like, does it make search less trustworthy?
Like, should we be concerned about this kind of shift on the part of perplexity?
Trent Gray-Donald: Well, in my view, yes, absolutely.
I'm a big fan of say, follow the money.
And, and it's, it's, it's all about simple economics and who's incented to do what.
And are, are you the customer or are you the product?
And it's very simple.
As you shift to paid search, you become more of the product instead of the
customer, and so my, my usual reaction is that this is not going to bode
well for, for us as consumers.
Tim Hwang: Yeah, for sure.
I just remember, I mean, famously, there's this essay that was written by, uh, Larry
and Sergey, right, who founded Google.
And it's like, their essay that they wrote, I think, when they
were still at Stanford, and they're describing the PageRank algorithm.
And at the very end, they're like, and no search engine should ever use ads,
because it would be the most terrible thing for a search engine to do.
And of course, lo and behold, right, like, Google is like a
90 percent ad based company.
But I guess it's very hard, Kush, isn't it, to like, kind of
avoid these incentives, right?
Like, the problem with subscription is that People
need to pay to use your product.
Um, and so it does kind of limit user growth and all these other things.
Um, is there any way you think of escaping ads as a business model in the space?
Kush Varshney: Um, I'm really not sure.
I mean, but, uh, one thing that I did want to point out, maybe a
little bit counter to what Trent is saying is, um, uh, the investment
into an ad-based sort of approach should actually also lead to investment
into certain technologies that do help with trustworthiness. So source
attribution is a big problem with LLMs: you don't know kind of where the
information that's in the generative output came from.
And, um, if that's what's part of the monetization, then there will
be a lot more investment into the techniques, the scalability of the
source attribution sort of things.
And that can actually increase the trust, um, maybe not necessarily always
just for the, uh, ad-driven sort of, um, uh, platforms, but in general,
because the more and better techniques we have, the better we can trust,
um, where the information came from. You can then, uh, go back and trace
through, um, different, uh, possibilities for hallucination, things like
that. So I think, uh, incentives can kind of work in weird, roundabout ways.
So, um, uh, the ad-driven aspect maybe will or will not, uh, do a good
thing for trust, but maybe that'll lead to investment into certain things
that do.
Trent Gray-Donald: Yeah, I think so.
I agree in theory, but in practice, what incentive does perplexity have in
providing attribution in a better way?
And do they just start obscuring it? And who's to say? Like, who's got the
leverage to not have it obscured, right? I mean, that's always the
fundamental thing: there's always "we could, but we don't."
Tim Hwang: I mean, I think there's also maybe another element, which is, I
don't know if you buy the argument that, like, in a world of chat based search,
the trust problem is particularly bad.
Right.
Because like in the past with Google, you have 10 blue links.
You can say, well, why are you giving me this link versus that link?
Uh, but at the very least, we can maybe agree that, like, oh, all the
sponsored links should have, like, a little label and a box around them, or
something like that.
But in a world where it's just like a paragraph, I guess you can offer
citations, but who's going to actually click through those, you know?
Um, but I'm curious if you want to kind of respond to, to Trent's thinking there.
Kush Varshney: Yeah. I mean, first I'll respond to Tim.
I mean, I never click on the, I'm feeling lucky button because I mean,
I always want to see the 10 results.
Right. That's right.
Yeah.
Um, but, uh, yeah, I mean, I think the point that I was trying to make is,
um, that, uh, whoever's paying for their stuff to appear, um, needs to be
assured that, yes, I mean, um, the right thing is coming through. So, uh,
if you're still using a language model in between, then even the ad, um,
uh, needs to get through the language model to appear in the output.
And, uh, that same technology can then be used to trace other information
or other facts or other stuff.
So, uh, what I'm saying is, uh, the reason it needs to be there for an
ad based business is because, um, uh, the people paying for the ads need
to have some, uh, sort of guarantee that, uh, that their stuff will appear.
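The guarantee Kush describes, verifying that a sponsored snippet actually survived the model and surfaced in the output, could be approximated with a naive post-generation check like this. The function name, window size, and sample texts are all hypothetical; production attribution systems are far more sophisticated than verbatim-overlap matching:

```python
def attribute_output(generated: str, sources: dict[str, str],
                     min_overlap: int = 5) -> list[str]:
    """Naive source attribution: flag each source whose text shares
    a verbatim run of at least `min_overlap` words with the output.

    This only illustrates the kind of trace an advertiser (or a
    hallucination checker) would want; it misses paraphrases.
    """
    gen_words = generated.lower().split()
    hits = []
    for source_id, text in sources.items():
        words = text.lower().split()
        # Slide a window over the source, looking for that window
        # appearing verbatim anywhere in the generated text.
        for i in range(len(words) - min_overlap + 1):
            window = words[i:i + min_overlap]
            if any(gen_words[j:j + min_overlap] == window
                   for j in range(len(gen_words) - min_overlap + 1)):
                hits.append(source_id)
                break
    return hits

# Hypothetical sponsored snippet plus an ordinary retrieved document.
sources = {
    "ad:peloton": "The Peloton Bike Plus offers live classes at home.",
    "doc:review": "Many users prefer treadmill workouts over cycling entirely.",
}
answer = "The Peloton Bike Plus offers live classes at home, which may suit you."
hits = attribute_output(answer, sources)
```

The same trace, run over non-ad sources, is what Kush suggests could spill over into general hallucination checking.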
Tim Hwang: Aaron, I'm not going to let you get away with being quiet on this segment.
Curious if you've got any thoughts, uh, if you're on team Kush or team
Trent, or, or neither, I suppose.
Aaron Baughman: Yeah.
I mean, I think that the mixing of trying to drive revenue
with trust and transparency could be potentially dangerous.
You know, it could be used for, um, you know, potentially, um, alternative,
you know, methods here, but it is about balance.
You know, um, I read this article a while ago about Goldman Sachs, where
they said that there's too much AI spend and too little benefit, but in order
to keep AI as an industry solvent, that there needs to be revenue and there's
a large revenue gap, you know, today, and it could potentially be growing.
Right.
I, um, I know on this, uh, Mixture of Experts, we talked about the $600
billion gap with Sequoia, you know, a while ago, you know, and, and so
that, that really stuck with me.
But on the other hand, we need the trust and transparency to maintain users and
demand because once people lose that trust, they're not going to use these
systems, or at least I wouldn't, right?
And one point I did want to make is that lots of the users for Perplexity,
it seems are, you know, very highly educated, you know, they're high
income, you know, earners as of now.
Um, and so, and so if you can influence, right, that group of people, um,
to, to walk down a certain path, then that can influence, you know, um,
lots of other people, because they tend to be sort of the leaders in their
fields.
And so just making sure that, you know, Perplexity, A, publishes their
papers that describe, you know, their algorithms, the systems, that we can
easily access and read, much like Google did, I think is important; and, B,
creating this digital passport, you know, that describes where the data is
coming from, um, so that it's at least available.
And then it's up to us as a group, IBM Fellows, to educate:
hey, if you're using these AI systems, you need to
do your own due diligence as well.
Still maintain your posture and your own belief
system, and understand that you're using these tools to help you, but you
still need to be a critical thinker.
Tim Hwang: Yeah, that's well warranted.
I mean, just to put myself in the shoes of Perplexity, if they were
here in this conversation, I think they'd say something like, well, why are
we being held to such a high standard?
Google's been monetized on ads for all these years, and
people still use it with no problem.
Why is AI special in that respect?
And I suppose part of the worry here that you're bringing up, Aaron,
which I think is good, goes to whether or not you think
that people will be critical thinkers with regards to the technology, right?
Like maybe the AI makes it all a little bit too easy for us, in a way
that actually limits in practice how much people will
actually click through to the links.
I mean, I certainly don't, right?
Yeah.
Aaron Baughman: Yeah, I will say that whenever I'm driving
and using map software, like Google Maps,
I completely forget where I'm going.
And I probably couldn't retrace where I went, because I don't pay attention, right?
So there's a danger of not being a critical thinker,
because the information just becomes so easy to get.
And I think we all just need to be careful.
Tim Hwang: That's right.
Yeah, I had an incident a few weeks back where I left
my phone in a restaurant.
I hopped into the car and started driving.
And then I realized, I don't exactly know how to get
back to the restaurant now.
Very embarrassing, so.
Um, any final thoughts on this trend?
Trent Gray-Donald: Uh, I think some really good points there,
and I think Kush's point, that advertisers are going to want to see
where their money is going, is actually an interesting loop back,
an incentive that pushes toward being a little more transparent.
But at the same time, we're used to Google coming back
with a list, and it's up to us.
The problem with the chat is that it's more opinionated, and, for
lack of a better term, it's got that humanness to it where, as
mentioned, it feels much more like somebody's just talking to you.
And we all know that LLMs talk with authority, and they talk
with tremendous confidence, even when it's not warranted.
And so it's going to be interesting to see how we
develop the right filters.
Like, we all know how to deal with the Google page, okay?
You scroll past the first four items, or whatever it is.
Tim Hwang: Right.
Trent Gray-Donald: It'd be interesting to see how we build defenses here
and whether they're harder to build.
Tim Hwang: Yeah, I think that is going to be a big open question.
I think we're going to have to learn as a society, right?
It's going to just be like, when the first ten blue links emerged, right?
That was also a whole process, and so it feels like we're turning that wheel again.
Trent Gray-Donald: Yeah, exactly.
Tim Hwang: Um, well, great.
I'm going to move us on to our third story of the day.
Um, so, former Tesla and OpenAI leader Andrej Karpathy tweeted out his
love for this product called Cursor.
Um, and it has set off a whole new discourse around the role of AI in
software engineering and programming.
Um, and the unique thing about Cursor, in contrast to something
like Copilot or Cody, another company operating in
the space, is that it's basically an entirely separate product,
a standalone IDE: they basically forked VS Code and said, okay,
we're going to rebuild it from the ground up using this AI stuff.
Um, and you know, I think one of the most interesting parts of the discourse,
if you will, if you follow Twitter, it's a waste of time, but if you
do follow it, is that people were sort of making the argument that
Cursor is particularly interesting because it's trying to
get past the paradigm that Copilot set down, right?
So when Copilot launched, the idea was, oh, well, autocomplete is the way we
should think about AI assistance in software engineering.
Whereas Cursor is playing around with all sorts of things, right?
They're playing around with diffs on your code, they're playing around
with chat interfaces, which you've seen elsewhere.
But I think they're actively trying to push beyond
autocomplete as a paradigm.
And I guess I'm kind of curious, maybe Kush I'll turn to you:
do you sort of buy that?
Like, is Copilot kind of old school already?
Is it already becoming the version 1.0
of how we thought about AI in software engineering?
And do you think that we're going to look back in 10 years and
no one's going to even think about using a Copilot-like interface to integrate
LLMs into their workflow?
Kush Varshney: Yeah, that's a great question.
And, um, I mean, I think it relates to what we've already
been talking about today.
So, do you trust this thing?
Are those autocompletes things that you can verify yourself?
Because you shouldn't ever ask an LLM a question these days, at least,
that you don't already kind of know the answer to yourself.
But some folks on my team have been doing user studies,
asking people what features they would actually want to
benefit from in the AI-for-code space,
and what we're finding is that it's actually the code understanding problem.
So when you're given a dump of a new code base, just making
sense of it, that's the biggest problem.
It's, whatever, thousands or millions of lines of code and
all sorts of weird configurations.
And let's say you don't even know the language.
Let's say it's COBOL or something like that.
How do you just get a sense of where things are,
kind of how this is organized, what it does? That sort of thing, I
think, is an even more powerful use, because once you're at the
level of knowing that this is a line or this is a block that I
need to write, you're already well versed in what you need to do.
So yes, it can speed things up, but even getting started, I think,
is a bigger problem.
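The "just make sense of the dump" step Kush describes can be sketched concretely. Below is a minimal, illustrative Python sketch, assuming a Python code base: it builds a per-file inventory of top-level classes and functions, the kind of condensed structural map you might hand to an LLM before asking how a repo is organized. The `inventory` helper is a hypothetical example, not a real tool:

```python
import ast
from pathlib import Path

def inventory(repo_root: str) -> dict:
    """Map each .py file in a repo to the names of its top-level
    classes and functions -- a cheap structural summary of an
    unfamiliar code base."""
    summary = {}
    for path in sorted(Path(repo_root).rglob("*.py")):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"))
        except SyntaxError:
            summary[str(path)] = ["<unparseable>"]
            continue
        # Keep only top-level defs; nested helpers are noise at this zoom level.
        kinds = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
        summary[str(path)] = [
            f"{'class' if isinstance(node, ast.ClassDef) else 'function'} {node.name}"
            for node in tree.body
            if isinstance(node, kinds)
        ]
    return summary
```

The resulting file-to-names map is usually small enough to drop into a prompt even for a large repo, whereas the raw source is not.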
Tim Hwang: Yeah, it's funny to think, actually, that we've had so much focus
on the AI literally generating code, but what you're saying
is that the future of AI in software engineering is, like, better documentation.
It's the thing that is always difficult to do and no
one wants to spend time on doing.
Um, Trent, I guess, doing the watsonx stuff, I'm sure you're
interested in that interface.
I don't know if you agree with Kush here, that it's really
almost this understanding and documentation layer that ends
up being the most important thing.
Trent Gray-Donald: Absolutely.
I mean, one of my day jobs is, I'm Chief
Architect for watsonx Code Assistant.
So, very, very much my day job.
And I view this as a very, very young space.
And everybody's trying different interfaces and different ways to do it.
And I see all the statistics.
And the number of people who are using chat, or the chat-like things that
Cursor makes easy, is very large, and it's definitely one of the first features
asked for. Everybody thinks for a little while that they want to do code gen,
and there is a constituency that does want to do that, but most people
actually revert back to: can you just tell me what the hell my code's
doing and help me put it together?
That is a big part, and then figuring out how to do that and getting
the appropriate amount of context.
Now we have LLMs with larger context windows, and we're getting
better and better techniques to build intelligent prompts.
But this is going to keep evolving.
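As a rough illustration of what "getting the appropriate amount of context" can mean, here is a greedy packer that fits the most relevant code chunks into a fixed window budget. Everything in it is an assumption for the sketch: character count as a crude token proxy, and a caller-supplied relevance score; real products use far more sophisticated retrieval:

```python
def pack_context(chunks, budget, relevance):
    """Greedily select the highest-scoring chunks that fit within
    `budget` characters (a stand-in for a token budget), then join
    them into a single prompt context."""
    picked, used = [], 0
    # Consider chunks from most to least relevant.
    for chunk in sorted(chunks, key=relevance, reverse=True):
        if used + len(chunk) <= budget:
            picked.append(chunk)
            used += len(chunk)
    return "\n\n".join(picked)
```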
And then the bigger one, to be honest with you, is going to be the
evolution towards agentic, where it's much more planning and discussing in
the large. And the question is, okay, is it going to be human in the loop, or is
it just going to be prompt and send it off?
Tim Hwang: Like, I just want an app that does this, and it just goes and does it.
Trent Gray-Donald: And I think, going back to the
dignity comment, it's having the human in the loop,
where you have a helper that says: you're trying to do this big thing.
I think I've broken it down to these six steps.
Human, do you agree?
And you look and say, oh man, okay, it went right off
the rails at step four here.
Let's, let's fix that up.
Tweak, tweak, tweak.
Off we go.
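That review loop can be sketched in a few lines. Here `propose`, `approve`, and `execute` are hypothetical callables standing in for an agent backend and the human reviewer; nothing in this sketch reflects any specific product:

```python
def run_with_human_in_loop(goal, propose, approve, execute):
    """Agent proposes a step-by-step plan for `goal`; a human
    reviews (and may edit) it before anything runs; only the
    approved steps are executed."""
    plan = propose(goal)                     # "I've broken it down to these steps"
    plan = approve(plan)                     # human: fix step four, tweak, tweak
    return [execute(step) for step in plan]  # off we go
```

The other end of the spectrum, "prompt and send it off", is the same function with `approve = lambda plan: plan`.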
And there's the whole Devin-and-whatnot universe, right?
Everybody's experimenting, from really, really
tiny little baby steps to the other end, which is, hey, let it all fly.
And exploring this problem space is going to be fascinating for the
next several years because nobody's quite figured it out and the models
are getting that much better.
So I'm super excited about where this all goes, and I really welcome
seeing the exploration that Cursor is doing around innovating on interface.
Tim Hwang: For sure.
I think it's, yeah, very exciting.
And you know, almost the joke will be like everybody in the future
will be like an engineering manager.
Basically, it turns everybody into an EM, you know, over time.
Um, I guess.
You know, Aaron, I don't know, are you a VS Code guy?
I'm kind of curious, because I think one of the bets of Cursor, which
I think is very intriguing, is that people are very comfortable
once they've set up their IDE.
It's almost like setting up your office.
You want it to be comfortable, you want to know where
everything is, and you want the bindings to be a particular way.
It's kind of wild, because what Cursor is attempting in
the market is, well, these AI features will be so killer that you would be
willing to abandon all that, right?
Or at the very least, get over the hump of having to
spend an afternoon just twiddling it to get it comfortable.
Um, yeah, I'm kind of curious, as someone who builds
these systems and works on these technologies: is that
prospect attractive to you?
Have you tried Cursor?
Would you jump onto Cursor?
I actually don't know what your daily setup looks like, but
part of it to me is just, is that value proposition strong enough
to get people to make that shift?
Aaron Baughman: Yeah.
I mean, so, yeah, I write code every day, and the
VS Code IDE is what I use by choice.
First, I'm a big fan of pair programming and pair testing, of
having multiple people work together, maybe on a single task, or a group
of people working on an experiment, because it does a couple of things.
One, it improves code quality, engineering quality.
That's the scientific process. But it also creates long-lasting teams, right?
Teams that stay together for years and years, and continuity of people
on a team, I think, is important.
And so relegating software and science to maybe prompt
engineering, to me, has some cons.
Of course, the pros are that it accelerates productivity,
it can help us code complete, it can create different types of
comments so we can understand code.
So there's certainly a place for it.
However, I do think that we want to make sure that our engineers and
scientists still understand code, can write algorithms, can create code.
New programming languages, new compute paradigms, for example quantum, right?
That's a new paradigm where I don't think LLMs do well yet, maybe
with Qiskit, being able to create Python code.
But there's all these new languages that are popping up, and LLMs
have to be trained on something, on some kind of pile of data.
And if a human can't create that pile of data in a trustworthy
way, then I think some of the
creativity and skill of the engineer might be lost.
So the hype around Cursor, I think, is real, and it's a
very powerful product, but I would encourage folks to say,
let's put a time limit on how much we can use some of these tools, so
that we can maintain our sharp blade
whenever we really need to do some
engineering, so we don't all become just prompt engineers, right?
But that's sort of my caution and my thought.
And yes, I do use watsonx Code Assistant pretty
much every day, through the plug-in in VS Code, and it helps a lot.
It's really good.
It creates different types of comments.
I also use Google, right?
I'll go on Google and I'll use the gen AI feature to give me
ideas on how to write code better.
But I always try to limit myself and my team to say, hey,
let's do 20/80 or 50/50, and let's make sure we're still
communicating as a team.
So that human interaction, to me, is important.
Tim Hwang: Yeah.
That implies almost two really interesting things.
One of them is that in the future there'll be almost like a screen time
for these features, where it'll basically be like, you've hit your limit for the week.
No more AI for you.
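A "screen time for AI" limit like the one joked about here could be as simple as a daily counter. This is a playful, hypothetical sketch, not a feature of any real tool:

```python
import time

class AiUsageBudget:
    """Allow at most `daily_limit` AI-assisted actions per day;
    once the budget is spent, the answer is 'no more AI for you'
    until the date rolls over."""
    def __init__(self, daily_limit: int):
        self.daily_limit = daily_limit
        self.day = None
        self.used = 0

    def allow(self, today=None) -> bool:
        today = today or time.strftime("%Y-%m-%d")
        if today != self.day:          # new day: reset the counter
            self.day, self.used = today, 0
        if self.used >= self.daily_limit:
            return False               # budget spent
        self.used += 1
        return True
```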
Um, the other one is, there's
been some discussion about, oh, well, are these systems eventually going to
get so good that they actually replace a lot of engineers' jobs?
But it almost feels like there'll be this constant pressure to
learn more and more obscure languages,
because those will be the areas that AI can't touch, because
the data sets are more obscure.
Um, which I think will be really interesting to see.
Trent Gray-Donald: There are definitely, I mean, it's no surprise or secret
that IBM's been around for a while and has created languages that may be
a little long in the tooth, like COBOL or PL/I or whatnot, and sure enough, the
amount of data, the amount of code on the internet that most of these models
are trained against is very, very small, and they can't do these languages at all.
And so one of the things that we've done, of course, is, we have
more COBOL code, we have more PL/I code, we have more of these things, so we
can build better models for that, and companies are approaching us with, hey,
we built this weird esoteric language, can you help us do the same there?
So while there is a barrier, wherever there's a barrier, there's
typically a financial incentive to do something about it.
So esoteric languages are going to be a bit of a barrier, especially at the free end.
It's going to be tough.
Tim Hwang: Well, great.
I want to tie things up today, because we actually have the very unique pleasure of
having the three of you on this episode.
As you may have overheard when I introduced these three guests,
they are all IBM Fellows.
Um, and for those of you who don't know the IBM Fellows program,
I didn't know much about it, but it's a crazy program.
Basically, the idea is to bring together some of the brightest minds in technology
to work on projects at IBM.
And I was looking on the website, and the Fellows have included a U.S.
Presidential Medal of Freedom winner,
five Turing Award winners, and five Nobel Prize winners.
Um, and I figured we'd just take the last few minutes for people to
hear a little bit about the program, what you've learned,
and where you think the program might go in the future.
And I guess, uh, Aaron, maybe I'll toss it to you, because you kind
of kicked off our first session.
So I'll bring you back into the conversation here, but I'm curious
about how your experience with the Fellows program has been
and what you've learned.
Aaron Baughman: Yeah, I mean, becoming an IBM Fellow is one of those seminal
moments. It's very surreal when it happens, and my first
thought was, wow, I really hope that I can live up to those who came before
me, and that I can also be an example to those who will come after me.
Right.
So I'm sort of in the middle, and I want to make sure
that I can keep the projection of what's happening and what's going to happen.
And I take that as a big responsibility: we
need to ensure that we keep up to date with science and engineering, push them
forward in a responsible way, but also usher in the next generation of
IBM Fellows who will come after us.
And the process of becoming a Fellow, I found it
very rewarding, because it helped me to reflect back on all the
people who helped me achieve something that I didn't know was attainable.
And then being with Trent and Kush is one of those
things where it's like, wow, I always knew and followed their work,
and I did not know they were going to be IBM Fellows until it was announced.
And so it was just great to hear that, wow, I'm in the
same class as Trent and Kush.
It couldn't be better, in my view.
Tim Hwang: Yeah, that's great.
Uh, Trent, Kush, any other reflections?
Trent Gray-Donald: I think it's very important that companies in the technology
space have leaders who are, effectively, pure technologists and can be
the right balance to the business at times, where
one of the unspoken, or actually spoken, things about Fellows is that they
are supposed to be a little bit of a check and balance on what we can or what
we should be doing in a given space.
Tim Hwang: Right, you're like the keepers of the technical flame, you know, yeah.
Trent Gray-Donald: Because sometimes that's necessary.
And, uh, it's a huge honor to have become a Fellow.
And definitely, the number of people who've come
before, whom I very much look up to, is very large.
Kush Varshney: Yeah, no, I mean, it is extremely humbling.
If you look at the list of all these people, as you mentioned, Nobel
Prizes, and inventing all sorts of things that we take for granted,
whether it's DRAM or, I mean, all sorts of different things,
it's just crazy to be thought of in that same light.
And it's been a few months now, I guess, for the three of us.
And one thing that I've learned is that, in some of the places
I've traveled, both within IBM and outside it, people do look up
to this position; it's something that people look to as an inspiration.
And I hadn't thought of it that way.
And I think it's, just like Aaron said, a responsibility, and, like Trent
said, a way to have this check and balance as well.
So all of that in one role, I mean, it is just crazy.
So, yeah, I think the three of us are going to do our best and
keep this tradition alive.
Tim Hwang: That's great.
Well, look, it's an honor to have the three of you on the show.
Um, I hope we'll be able to get all three of you back, um, on a future
episode of Mixture of Experts.
Um, but that's where I think we'll wrap it up for today.
So thanks, everybody, for joining, and thank you all for
listening and joining us again on another week of Mixture of Experts.
If you enjoyed what you heard, you can find us on Apple Podcasts,
Spotify, and podcast platforms everywhere, and we'll see you next week.