# Live AI Tennis Match Assistant

**Source:** [https://www.youtube.com/watch?v=FqF4b7Uemfc](https://www.youtube.com/watch?v=FqF4b7Uemfc)
**Duration:** 00:17:54

## Summary

- An agent‑oriented, graph‑based AI assistant was launched at Wimbledon and the US Open 2025 to give fans real‑time, interactive answers about ongoing tennis matches.
- The system lets users select any match (in‑play, scheduled, retired, suspended, or completed) and start a dialog via a “Match Chat” button, offering both curated starter questions and a free‑form query field.
- After a question is submitted, it is routed to a scalable cloud‑based LLM that runs a decision‑tree interaction, displays a visual “thinking” indicator, and performs automatic fact‑checking to ensure answer accuracy.
- The interface uses classic UX priming to lower engagement barriers while maintaining transparency of the AI’s reasoning process, encouraging continuous fan participation throughout the match.
- All interactions are mirrored seamlessly across devices, providing a consistent, evidence‑based experience for users wherever they follow the match.

## Sections

- [00:00:00](https://www.youtube.com/watch?v=FqF4b7Uemfc&t=0s) **Live Tennis AI Assistant** - An agent‑oriented, real‑time AI system introduced at Wimbledon and the US Open 2025 that lets fans ask live or retrospective questions about any match stage and receive instant, evidence‑based answers.
- [00:04:29](https://www.youtube.com/watch?v=FqF4b7Uemfc&t=269s) **Tennis Query Classification Pipeline** - The passage outlines a system that transforms user queries into embeddings, classifies them with decision trees into tennis topics, applies safety filters, and routes high‑confidence queries to a custom extension while low‑confidence or ambiguous ones fall back to a knowledge‑base intent library.
- [00:08:23](https://www.youtube.com/watch?v=FqF4b7Uemfc&t=503s) **Parallel Fact Extraction Agent Graph** - The passage explains a node‑based system where an initialization agent creates a shared state, a tool agent selects data feeds, and a facts agent runs two parallel inference threads, one prompting an LLM and the other using a synthesizer, to race and output the first factual result.
- [00:13:19](https://www.youtube.com/watch?v=FqF4b7Uemfc&t=799s) **Dynamic Win Probability Modeling** - The excerpt explains a system that combines pre‑match features (head‑to‑head history, forecasted indicators, past outcomes) with real‑time performance metrics to continuously update each player's likelihood of winning, using decay of pre‑match odds and a booster function as the match unfolds.

## Full Transcript
Say you're watching tennis, and you want to know what's going on in a close match. That's where an agent-oriented architecture comes in: a real-time, interactive AI assistant powered by an agentic graph. Such a system debuted at the 2025 Wimbledon Championships and the 2025 US Open. This assistant allows fans to ask live questions during singles matches and get instant, insightful answers right at their fingertips. Here's an example path through the experience. As the user, we can select a match that's either in play or scheduled to play; the system supports conversations about in-progress, retired, suspended, and even completed matches. Whenever the system is available, any fan can begin a dialog by pushing one of these buttons, say, the Match Chat button underneath the tennis match showing the score. You can also follow live or view a recap if you want, but in all of these cases we follow an evidence-based user experience. Now the real fun begins. Upon entry, the user is gently nudged into a conversation with two pre-curated, entry-level questions, often around match-defining skills or pivotal plays. This is classic UX priming: we want to lower the barrier to engagement while sparking curiosity and inviting participation throughout the match. Of course, the more inquisitive minds might want to ask any question, and this is where the open field comes in: click it, it opens up, and you can put in any query you can think of. With the user's question in hand, or a button pushed, the system transitions into the agentic experience, launching a decision-tree interaction in which the user selects a primary category.
Now, a user doesn't have to select a category; again, they can enter any question they would like. Whatever query you enter is dispatched to a scaled-out, cloud-based system of systems that's optimized for real-time analysis and insight generation. Even if no subcategory is chosen, we invite the user into a question-and-answer dialog with the system to ensure that no curiosity goes unanswered. Now comes the moment of computational ingenuity. A visual indicator appears whenever you submit a query, showing that the LLM is thinking or running; this is where chains of thought are fed back into the model itself. Another state is fact checking: we want to make sure that anything returned by this real-time system is accurate. This small touch adds transparency to the AI's cognitive process and keeps the user engaged throughout. Importantly, this experience is also seamlessly mirrored across both mobile and desktop, providing a consistent, device-agnostic interaction model. Whether you're sitting with your phone in the stadium bleachers or at a laptop at home, the system is wherever you are, ready to go.

Now let's take a look at the architecture behind this system, which balances scale, response time, and AI accuracy. Here's the masterpiece. At the foundation lies a robust, event-driven architecture built on a publish-subscribe messaging system. As a match progresses, the system ingests scoring and performance data to create different feeds. This data is immediately published to on-demand topics, enabling near real-time availability. Simultaneously, the system writes dozens of JSON files into cloud object storage buckets fronted by CDNs, ensuring high-speed global distribution and caching. Once a user submits a query, the message traverses the secure firewall and CDNs and finally lands in a containerized application known as the middleware app. This is deployed across a distributed cloud infrastructure spanning multiple regions, with 30 active replicas. The middleware app takes the question and first analyzes and interprets it. A MiniLM-L6-v2 embedding model transforms the query into numerical vectors. These embeddings are then passed through a random forest of 100 decision trees, which classifies the question into specific tennis categories, such as player stats, match logistics, or live insights. Based on confidence thresholds that we have empirically determined, we then ask how confident the model is that the question is about a particular topic. As the query moves through the pipeline, we also need to ensure that the conversation remains safe and respectful.
So this is where we screen all the questions through a HAP (hate, abuse, and profanity) filter, and we carry that result forward. Once we've classified the question and it has passed through those moderation gates, we can go to the next step.
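As a rough sketch, the embed, classify, and moderate steps just described might look like this in code. The hashing "embedding", the toy trees, the 0.6 threshold, and the word blocklist are all simplified stand-ins: the production system uses a MiniLM-L6-v2 encoder, a trained 100-tree random forest, and a learned HAP classifier, none of which are reproduced here.

```python
import hashlib
import math
from collections import Counter

CONFIDENCE_THRESHOLD = 0.6          # stand-in for the empirically tuned value
HAP_BLOCKLIST = {"hateful", "abusive"}  # stand-in for a learned HAP classifier

def embed(query: str, dims: int = 8) -> list:
    """Toy stand-in for MiniLM-L6-v2: hash words into a fixed-size vector."""
    vec = [0.0] * dims
    for word in query.lower().split():
        h = int(hashlib.md5(word.encode()).hexdigest(), 16)
        vec[h % dims] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def forest_classify(vec, trees):
    """Majority vote over an ensemble; confidence = vote share."""
    votes = Counter(tree(vec) for tree in trees)
    label, count = votes.most_common(1)[0]
    return label, count / len(trees)

def route(query: str, trees) -> str:
    # Screen the question through the (toy) HAP filter first.
    if HAP_BLOCKLIST & set(query.lower().split()):
        return "blocked"
    label, confidence = forest_classify(embed(query), trees)
    # Confident classifications go to the custom extension,
    # everything else falls back to the knowledge base.
    if confidence >= CONFIDENCE_THRESHOLD:
        return f"custom-extension:{label}"
    return "knowledge-base-fallback"

# Toy "trees": each votes based on one coordinate of the embedding.
trees = [lambda v, i=i: "player_stats" if v[i % 8] > 0 else "match_logistics"
         for i in range(100)]

print(route("serve stats please", trees))
```

The shape to notice is the two-gate structure: moderation first, then a confidence-thresholded routing decision, exactly the order the pipeline walks through.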
And this is where the system reaches a decision point. If the question fits a known, confidently classified category, it goes into the custom extension. If the confidence is low, or the anomaly detector flags ambiguity, the question is instead routed to the knowledge-base system, which is almost like a fallback system and is deployed in two different regions. This fallback system consults a library of about 50 intents mapped to topics, which return thoughtful, predefined responses. For example, a question like "Where can I buy tickets?" or "Where can I find shade?" might go to our knowledge base. In most cases the response includes a deep link to the relevant tennis data, closing the loop so that the UX renders the appropriate context. When a query does meet all the routing criteria, it's sent to the custom extension application. This is a powerhouse app: it runs on over 60 replicas across a multi-region Kubernetes platform. Here, traffic is routed to a LangGraph instance pulled from a queue. We initialize many graphs at the same time, and if one isn't ready, the system waits until one becomes available. Once initialized, the agentic framework executes the following steps.
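The queue-based pooling just described (grab a pre-initialized graph, or block until one frees up) can be sketched with a plain blocking queue. `TennisGraph` is a hypothetical stand-in for a pre-initialized agent graph, not the actual framework API.

```python
import queue
import threading

class TennisGraph:
    """Hypothetical stand-in for a pre-initialized agent graph."""
    def __init__(self, graph_id: int):
        self.graph_id = graph_id

    def run(self, question: str) -> str:
        return f"graph-{self.graph_id} answered: {question}"

# Pre-initialize a pool of graphs up front, as the system does.
POOL_SIZE = 4
pool = queue.Queue()
for i in range(POOL_SIZE):
    pool.put(TennisGraph(i))

def handle_query(question: str) -> str:
    graph = pool.get()          # blocks until a graph becomes available
    try:
        return graph.run(question)
    finally:
        pool.put(graph)         # return the graph for the next request

# Many requests can safely share the pool from worker threads.
results = []
threads = [threading.Thread(target=lambda q=q: results.append(handle_query(q)))
           for q in ("score?", "aces?", "momentum?")]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)
```

The `try`/`finally` is the important part of this pattern: a graph is always returned to the pool, even if a request fails mid-flight.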
These steps use a set of tools to go out and pull in information, so we can extract what's relevant to the question we've already classified. The tool formats this data in two different ways. Raw JSON preserves the original schema and keys that came from the tennis feed. LLM JSON is a more descriptive, decorated text rendering that helps prime and optimize the LLM's comprehension of what the data is about. Then we move into the generative agent part, where the structured data is turned into formulated answers. If the agents determine they can't confidently respond from this information, maybe due to insufficient data, or because play hasn't caught up to what the person is asking about, they notify the middleware application. When neither the structured nor the generative agent can provide a cohesive answer, we go to a light synthesizer that's invoked as a last resort: a lightweight LLM prompt that attempts a final synthesis from whatever data fragments remain in the pipeline. Through all of this, the architecture balances scale, speed, safety, and accuracy to provide a good experience to you. At the core of this lies an agentic system architected as a directed graph.
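That directed graph of agents can be sketched minimally: nodes are functions, edges fix the order, and a shared state object flows along them. The node names mirror the agents discussed here, but the bodies (the hard-coded feed, the category value) are illustrative assumptions only.

```python
# Each agent is a node: a function that reads and updates a shared state.
def initialization_agent(state: dict) -> dict:
    # Creates and propagates the dynamic context object.
    state["category"] = "player_stats"   # set upstream by the classifier
    return state

def tool_agent(state: dict) -> dict:
    # Select the data feed implied by the category; stash the raw data.
    feeds = {"player_stats": {"aces": 7, "double_faults": 2}}
    state["raw_json"] = feeds.get(state["category"], {})
    return state

def facts_agent(state: dict) -> dict:
    # Turn the extracted feed into standalone factual statements.
    state["facts"] = [f"{k} = {v}" for k, v in state["raw_json"].items()]
    return state

# The directed graph: edges expressed as an ordered chain of agent nodes.
GRAPH = [initialization_agent, tool_agent, facts_agent]

def run_graph(query: str) -> dict:
    state = {"query": query}             # the propagated context object
    for node in GRAPH:                   # each edge passes state onward
        state = node(state)
    return state

final = run_graph("How many aces so far?")
print(final["facts"])
```

The key property is that every agent sees and enriches the same state object, which is exactly what lets downstream agents consume what upstream agents extracted.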
Now in this graph, each agent is represented by a discrete computational node, and the nodes are connected by edges, meaning information flows between the agents. The process begins at the initialization agent, which creates and propagates a state variable: a dynamic context object that travels through the entire graph. One of the earliest agents is the tool agent. It interprets the conversational category signal and selects the appropriate data feeds for extraction. The feed draws data from the live tennis state and saves it into the shared state for downstream agents. The next critical node is the facts agent, which performs parallel inference using two threads. The first thread constructs a prompt that includes a persona, telling the model how to act, how to respond, and what style to use, together with the extracted JSON data, and submits it to an LLM to produce a paragraph interpretation directly from the input. The second thread uses a synthesizer that generates standalone factual sentences, which are then fed into another LLM. It's a race: the first thread to finish is the one whose output is propagated through the graph. To manage response latency and keep the system responsive, each path operates under strict timeouts, because we want to be fast so users can see the output quickly. The framework allows for up to three potential outputs, in order of preference. The first is the direct JSON interpretation rendered as a coherent paragraph. The next is a paragraph summarized from multiple factual sentences produced by one of the racing threads; that's what the summarizer agent does.
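The two racing threads with a strict timeout can be sketched with `concurrent.futures`. The two path functions simulate the LLM-paragraph route and the synthesizer route with sleeps; in the real system these are model calls, and the timeout value is an assumption.

```python
import time
from typing import Optional
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

TIMEOUT_SECONDS = 2.0   # stand-in for the strict per-path budget

def json_interpretation_path(data: dict) -> str:
    """Thread 1: persona prompt + raw JSON -> paragraph (simulated)."""
    time.sleep(0.05)
    return "paragraph: " + ", ".join(f"{k}={v}" for k, v in data.items())

def synthesizer_path(data: dict) -> str:
    """Thread 2: standalone factual sentences -> second LLM (simulated)."""
    time.sleep(0.2)
    return "summary: " + "; ".join(f"{k} is {v}" for k, v in data.items())

def race(data: dict) -> Optional[str]:
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = {pool.submit(json_interpretation_path, data),
                   pool.submit(synthesizer_path, data)}
        done, _ = wait(futures, timeout=TIMEOUT_SECONDS,
                       return_when=FIRST_COMPLETED)
        for f in done:              # the first finisher wins the race
            return f.result()
    return None                     # both paths exceeded the budget

answer = race({"aces": 7})
print(answer)
```

Returning `None` on a double timeout is what triggers the lower-preference outputs and, ultimately, the fallback mechanisms described next.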
Now a collection of these candidates goes to a judge agent, so that we can figure out what the response should say and do. If the judge determines the content isn't quite right, it feeds the output into a corrective agent. So we have these four agents working together to produce the output, and all candidate outputs are routed down into the lower section of the graph, where the final content is produced. But I want to revisit the judge agent, because it evaluates content on two primary dimensions.
First, it needs to make sure the content produced is factual; second, that it's relevant to what the user has been asking about, from the initialization agent through the tools agent. If the judge identifies uncertainty in the preferred output, it may prepend a confidence-adjusting preamble to the response. Once judged, the output may go to the corrective agent, which enforces textual consistency by aligning the response with predefined stylistic guidelines. If the agent pipeline fails to produce a valid, confident output, whether due to missing data, ambiguity, or timeouts in the threads racing in the background, the system activates a fallback search across the knowledge base. Should this too return no useful result, the system performs a final synthesis using a lightweight LLM-based synthesizer to generate an informative response. When that happens, the response is returned to the user. These final contingencies ensure the system maximizes response coverage while staying highly accurate, so we can give the user an answer that aligns with what they're asking for.

So now let's look at some of the streaming data. The real-time Live Likelihood to Win estimation within our system is enabled by a streaming data architecture and a probabilistic modeling framework. This system integrates predictive analytics, live match dynamics, and event-based computation to generate and update Likelihood to Win estimates throughout the duration of a match. Prior to the first serve, we have a pre-match Likelihood to Win, displayed as a donut plot and produced by a predictive model.
It gives you the probability of each player winning before the match starts. The model is built on a handful of features. Some are around head-to-head history between the players; if there is no head-to-head history, we fall back to other predictors. We also have forecasted play indicators and historical match outcomes that all feed in. For example, in a specific match scenario, the model might predict a 53% Likelihood to Win for player A and 47% for player B. That close distribution indicates a statistically balanced match. What the model is doing is applying a probabilistic equation: computing the odds of a player winning given the evidence I just showed you. As the match progresses, the system transitions to a Live Likelihood to Win that updates on every single point, continuously refreshing the probability model with real-time performance metrics. The live model is a time-defined model, in which a decayed pre-match probability gradually diminishes the pre-match component. So as the match unfolds, the match data happening in real time matters more. We also have a booster function that's activated by critical match events. This approach lets the system account for both the historical expectation and the real-time player dominance happening now, as in this five-set match. As you can see from the score, the player with the pre-match edge, at 53%, ultimately did win, three sets to two. But the story as it unfolded wasn't a linear win: one player had more momentum, then it shifted back, then shifted back again, and finally the model converged on the ultimate winner, player A, after a tie break and a dramatic sequence of events. This visualization serves not only as a statistical output but also as a narrative of the match's momentum, capturing the competitive tension and the turning points within this high-fidelity, message-driven architecture.

These insights are made possible by the messaging infrastructure you can see here: a pub/sub layer that uses MQTT. A broker application subscribes to match-specific topics when a match is scheduled, and as each scoring event occurs, the data is published to the relevant topics. It's highly parallelized, so it's very fast. We then retrieve these messages and pass them to an engine, which applies the equations I went over to produce these numbers. Each scoring update triggers a recalculation of the Live Likelihood to Win at that moment in play, and the resulting value is serialized and stored in a CDN-fronted cloud object store, enabling asynchronous access for fans all around the world. So when a fan asks a question through our agentic system about live or past win probabilities, the system performs the following steps. First, it retrieves the relevant Likelihood to Win JSON data created by this system. It then uses the data-extractor agents to transform that data into semantic objects, and we in turn submit the summarized data to an LLM. This integration ensures that fans receive intelligible, data-driven responses grounded in probabilistic analytics and powered by this real-time computational system.

The agent-oriented system delivers a real-time, intuitive experience by combining live scoring data with an AI pipeline that interprets fan queries and updates match insights like momentum shifts and win predictions. By blending AI with streaming data, combining generative AI, predictive modeling, and smart UX, this is how we transform raw data into clear, engaging narratives for tennis fans all around the world.
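Tying the win-probability discussion together, here is a minimal sketch of the live blend: a pre-match probability whose weight decays as points accumulate, mixed with a live performance estimate, plus a booster applied on critical match events. The exponential decay, the decay rate, the boost size, and the blend formula are all illustrative assumptions, not the production model.

```python
import math

def live_likelihood_to_win(pre_match_p: float,
                           live_p: float,
                           points_played: int,
                           critical_event: bool = False,
                           decay_rate: float = 0.02,
                           boost: float = 0.05) -> float:
    """Blend decaying pre-match odds with real-time performance."""
    # Pre-match weight decays exponentially as points accumulate,
    # so live data matters more as the match unfolds.
    w = math.exp(-decay_rate * points_played)
    p = w * pre_match_p + (1.0 - w) * live_p
    # Booster nudges the estimate on critical events (e.g. a break won).
    if critical_event:
        p += boost if live_p >= 0.5 else -boost
    return min(max(p, 0.0), 1.0)   # clamp to a valid probability

# Player A enters at 53%; live metrics currently favor them at 65%.
print(live_likelihood_to_win(0.53, 0.65, points_played=0))    # all pre-match
print(live_likelihood_to_win(0.53, 0.65, points_played=120))  # mostly live
```

At zero points played the output is just the pre-match figure; deep into the match it converges toward the live estimate, which matches the decay behavior described in the transcript.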