Learning Library

← Back to Library

Live AI Tennis Match Assistant

Key Points

  • An agent‑oriented, graph‑based AI assistant was launched at Wimbledon and the US Open 2025 to give fans real‑time, interactive answers about ongoing tennis matches.
  • The system lets users select any match (in‑play, scheduled, retired, suspended, or completed) and start a dialog via a “Match Chat” button, offering both curated starter questions and a free‑form query field.
  • After a question is submitted, it is routed to a scalable cloud‑based LLM that runs a decision‑tree interaction, displays a visual “thinking” indicator, and performs automatic fact‑checking to ensure answer accuracy.
  • The interface uses classic UX priming to lower engagement barriers while maintaining transparency of the AI’s reasoning process, encouraging continuous fan participation throughout the match.
  • All interactions are mirrored seamlessly across devices, providing a consistent, evidence‑based experience for users wherever they follow the match.

Full Transcript

# Live AI Tennis Match Assistant **Source:** [https://www.youtube.com/watch?v=FqF4b7Uemfc](https://www.youtube.com/watch?v=FqF4b7Uemfc) **Duration:** 00:17:54 ## Summary - An agent‑oriented, graph‑based AI assistant was launched at Wimbledon and the US Open 2025 to give fans real‑time, interactive answers about ongoing tennis matches. - The system lets users select any match (in‑play, scheduled, retired, suspended, or completed) and start a dialog via a “Match Chat” button, offering both curated starter questions and a free‑form query field. - After a question is submitted, it is routed to a scalable cloud‑based LLM that runs a decision‑tree interaction, displays a visual “thinking” indicator, and performs automatic fact‑checking to ensure answer accuracy. - The interface uses classic UX priming to lower engagement barriers while maintaining transparency of the AI’s reasoning process, encouraging continuous fan participation throughout the match. - All interactions are mirrored seamlessly across devices, providing a consistent, evidence‑based experience for users wherever they follow the match. ## Sections - [00:00:00](https://www.youtube.com/watch?v=FqF4b7Uemfc&t=0s) **Live Tennis AI Assistant** - An agent‑oriented, real‑time AI system introduced at Wimbledon and the US Open 2025 that lets fans ask live or retrospective questions about any match stage and receive instant, evidence‑based answers. - [00:04:29](https://www.youtube.com/watch?v=FqF4b7Uemfc&t=269s) **Tennis Query Classification Pipeline** - The passage outlines a system that transforms user queries into embeddings, classifies them with decision trees into tennis topics, applies safety filters, and routes high‑confidence queries to a custom extension while low‑confidence or ambiguous ones fall back to a knowledge‑base intent library. - [00:08:23](https://www.youtube.com/watch?v=FqF4b7Uemfc&t=503s) **Parallel Fact Extraction Agent Graph** - The passage explains a node‑based system where an initialization agent creates a shared state, a tool agent selects data feeds, and a facts agent runs two parallel inference threads—one prompting an LLM and the other using a synthesizer—to race and output the first factual result. - [00:13:19](https://www.youtube.com/watch?v=FqF4b7Uemfc&t=799s) **Dynamic Win Probability Modeling** - The excerpt explains a system that combines pre‑match features (head‑to‑head history, forecasted indicators, past outcomes) with real‑time performance metrics to continuously update each player's likelihood of winning, using decay of pre‑match odds and a booster function as the match unfolds. ## Full Transcript
0:00Say you're watching tennis, and you want to know what's going on within a close match. Well, that's 0:04where an agent-oriented architecture comes in. It's a real-time, interactive AI assistant powered 0:10by an agentic graph. Such a system was debuted at the 2025 Wimbledon Championships and the US 0:16Open 2025. Now this assistant, it allows fans to ask live questions during singles matches and to 0:23get instant and insightful answers right at your fingertips. Now here's an example path to the 0:29experience. So as the user, what we can do is we can select a match that's either in play or scheduled 0:36to play. Now the system, it supports conversations about in progress, retired, suspended and even 0:41complete matches. But when the system is available, that any fan can then begin a dialog. They can 0:47start a dialog by pushing one of these buttons. And let's say we want to push a Match Chat button 0:53underneath the tennis match that has the score. You can also follow live or view a recap if you 0:58want, but in all of these cases, we follow an evidence-based user experience. Now the real fun 1:04really begins. So upon entry, the user is then gently nudged into a conversation with two pre-curated entry-level 1:11questions, often around match-defining skills or even pivotable plays. This is 1:18called classic UX priming, so we want to lower the barrier for engagement but spark your curiosity 1:24and invite participation throughout a match. But of course, the more inquisitive minds, they might 1:30want to ask any question. And this is where this open field comes in, where you can click and then 1:35it'll open up, and you can put in any query that you can think of. With the user's question in hand 1:40or maybe even this button pushed, the system transitions into this agentic experience, 1:45launching this decision tree interaction. The user selects a primary category from the following, right? 1:52Now, a user doesn't have to select this because again, they can enter any question that 1:56they would like to, right. And whatever query you entered, it's dispatched to a scaled out cloud-based 2:02system of systems, and it's really optimized for this real-time analysis and insight 2:08generation. But even if no subcategory subcategory is chosen, we do invite a, you know, the 2:14user to maybe have a dialog or a question-answer with the system to ensure that no curiosity goes 2:20unanswered here. Now comes the moment of computational ingenuity. So here, a visual 2:25indicator, it appears whenever you submit a query, showing that the LLM is either thinking 2:34or it could be running, that this is where you have the chains of thoughts that are being fed 2:39back into the model itself. Um, another aspect or a state that happens is called fact 2:45checking. So we want to make sure that anything that's returned back by this real-time system, 2:50it's accurate. But this small touch, it really adds what's called transparency to the AI's cognitive 2:56process and maintains all the user's engagement throughout. Butimportantly, this experience is 3:02also seamlessly mirrored across both mobile and desktop. So, it provides you this consistent and 3:07device-agnostic interaction model, whether you're maybe sitting on the phone in a bleacher or 3:13stadium or you're in a laptop at home, but the system is anywhere you are, it's ready on the go, 3:18and it's right here for you. Now let's take a look at the architecture behind this system, where it 3:23balances scale, response time and AI accuracy. So here's the masterpiece. So, at the foundation lies 3:30a robust, event-driven architecture that's built on this publish subscribe messaging system. So as 3:36a match progresses here, it ingests scoring and performance data all the way through to create 3:41these different feeds. Now this data is immediately published to on-demand topics 3:45enabling this near real-time availability. So simultaneously, the system writes dozens of these 3:52JSON files into these cloud object storage buckets that are fronted by CDNs. This ensures 3:58high-speed global distribution and caching. Now, once a user submits a query here, the message 4:05traverses the secure firewall CDNs, and it finally lands in this containerized application known as 4:11a middleware app. This is deployed across a distributed cloud infrastructure across multiple 4:16regions, and it has 30 active replicas. The middleware app, what it does is it takes the 4:22question, and it first analyzes and it interprets it. So first, we have this mini LM, L6 v2. 4:29It's this model that has embeddings, and it transforms the query into numeric, to numerical 4:34vectors. Now these embeddings are then passed through a random force of 100 different decision 4:40trees, which classify the question into specific tennis categories, such as the player stats, the 4:45match logistics. You might even have questions about live insights, but based on this confidence, 4:51the thresholds that we have empirically determined, we then in turn go through, and we want 4:56to know how, how confident is this model that this question is about a particular topic. So, as it 5:02goes through the pipeline, we then need to ensure that the conversation, it remains safe and respectful. 5:09So this is where we screen all the questions through this HAP, which stands for hateful abusive 5:15profanity filters. And we then take that when we go through. And once we've classified and 5:22we've gone through that moderation step through all those gates, we then can go to the next step. 5:26And this is where the system, it reaches a decision point. If the question fits in a known 5:31confidently classified category, we then go into this custom extension that's right here. Now, 5:38if the, if the confidence is low or the anomaly detector flags ambiguity or the question is 5:44then can be routed to this knowledge-based system, which is almost like a fallback system. And this 5:50is deployed into two different regions. But here, this fallback system, it consults a library of 5:57about 50 different intents that are mapped to topics, which then return thoughtful and 6:01pre-trained responses. Forexample, if you're asking, where can I go find a place to get tickets, or 6:07where can I find shade? That might go to our knowledge base, but in most cases, this includes a 6:12deep link to the relevant tennis data, and it closes the loop so that the UX, it shows and 6:18renders the appropriate context. So when a query does meet all these routing criteria, it's then sent 6:23to this custom extension application here. And this is a powerhouse house app. It runs on over 60 6:29replicas across a multi-region Kubernetes platform. So here is where then the traffic is 6:35routed to this lane graph. It's pulled from a queue. So we we initialize many lane graphs at the 6:41same time, right. And if one isn't ready, the system then waits until one does become available. But once 6:47initialized and we have this agentic framework, we then can execute the following steps. Right. 6:54And these steps, it uses a bunch of tools to go out, and it pulls in information such that we 7:00can extract the relevant information that's about the question that we've already classified. Now, 7:06this data is formatted in two different ways by the tool. We can have raw JSON, which preserves the 7:11original schema and the keys that came from the tennis information. We also have an LLM JSON, which 7:17is more of a decorative type text, and it helps to prime and, and optimize LLM comprehension of what 7:24this data is about. And then we go into the generative agent part. And this is where the 7:29structured data to the formulation of the answers comes into play. So if the agents determine that 7:35they can't confidently respond from all this information, maybe it could be due to insufficient 7:41data, or maybe the play hasn't caught up to what the person is asking about. Then we will notify 7:47the middleware application. This is when there's no structured or generative agent can really 7:52provide a cohesive answer. We then go to what's called a light synthesizer that's invoked as a 7:58last resort. This is what we call a lightweight LLM prompt, where we attempt a final synthesis of the 8:05information with any data fragments that still remain throughout this pipeline. And through all 8:10of this, the architecture balances scale, speed, safety as well as accuracy to provide a nice 8:17experience to you. At the core of this lies an agentic system architected as a directed graph. 8:23Now in this graph, each agent is represented by a discrete computational agent that looks like a 8:30node, and each of these nodes is connected by an edge. And the edge means information flows in 8:36between the agents here. Now the process begins at this initialization agent here. This 8:42creates, and it propagates a state variable. So this is a dynamic context object that goes 8:48throughout this entire graph. But one of the earliest agent, it's called the tool agent here. It 8:54interprets the conversational category signal, and it selects the appropriate data feeds for 9:00extraction. Now this feed, it drives data from all the tennis state, and it then saves it into the 9:07shared state for downstream agents. Now the next critical node in all of this is called the facts 9:13agent. This performs parallel inference operations using two different types of thread right. So you 9:19have thread one and thread two right. So the first thread, what it does is it constructs a prompt. It 9:25includes a persona, right. So it tells it how to act and how to respond, the style it's supposed to 9:30use for the extracted JSON data that it pulls out. It then submits it to an LLM to produce a 9:37paragraph interpretation directly from the input. Now your thread two here, what this does is it 9:43uses a synthesizer, right, that generates standalone factual sentences. And these sentences 9:48are then fed into another LLM. And it's almost like a race. So the first one that 9:55wins is then used in the propagated piece of the graph here. Now to manage the response latency and 10:01ensure that the system responsiveness, each path of these, it operates with strict timeouts because 10:07we want to be really fast so that users can see what the output is going to be. So the framework, 10:12it allows for up to three different potential outputs in order of preference. So the first one 10:16would be the direct JSON interpretation into a coherent paragraph. Now the next one that we try 10:22to achieve would be a summarized paragraph from multiple factual sentences from one of these 10:27threads that, again, is racing over time. And that's what one of these summarized agents what it does. 10:34Now a collection of these goes to a judge here. Right. So that we then, in turn, can figure out what 10:40is it supposed to say, what is it supposed to do. Right. And if it determines wait a minute, this 10:45content isn't exactly right, then it's going to feed back into a corrective type agent here, right? 10:52So we have these four agents all working together to produce this said output. Right. 11:00And all the candidate outputs are then routed, right, all throughout down into this lower section 11:06here, so that we then in turn can produce the content. But I want to revisit this judge agent 11:12because what it does is it then, in turn, it evaluates the content on two primary dimensions. 11:18The first one, it needs to make sure that the content that's produced is factual, right. But it's 11:23also relevant to what the user has been asking about from the initialization to the tools agent. Now, 11:29if the judge if it does identify uncertainty and the preferred output, it may pre-append, you 11:35know, confidence adjusting a preamble, right, to the response. And then once judged, it might go into a 11:42corrective type agent here. Right. And the corrective agent is enforce the textual 11:47consistency by aligning the response to predefined stylistic guidelines that might have 11:52been supplied by the tennis user, that if the agent pipeline, if it fails to produce a valid, 11:58even a confident output due to maybe missing data, it could either be an ambiguity or even timeouts 12:05of these threads that are constantly running in the background and erase the system. It can 12:09activate a fallback search mechanism across a knowledge base, but should this too return no 12:14useful result? the system performs this final synthesis using a lightweight LLM-based 12:19synthesizer to generate informative responses. Now these final informative responses, they sometimes 12:26go down here, right? And when this does happen, we then can return that back right to the user But 12:32these final contingencies ensure that the system maximizes response coverage. But it also is highly 12:38accurate so that we can provide to the user and answer that aligns towards what they're asking 12:44for. So now let's look at some of the streaming data. The real-time Live Likelihood to Win 12:49estimation within our system is enabled by a streaming data architecture and probabilistic 12:55modeling framework. Now this system integrates predictive analytics, live match dynamics, and 13:00event based computation to generate and update this Likelihood to Win estimates all throughout 13:06the duration of a match. Now what happens is, prior to the first serve, we have a pre-match Likelihood 13:12to Win. That looks very much like this a donut plot, but this model, it's a predictive model, right? 13:19And it gives you a probability of a player winning before a match starts for each of the 13:23players. Now we have a model that's been built on a couple features, right. Some of them are around 13:29head to head history between the players. Now if they have the head to head history, if not then 13:34we'll go to other predictors. Now we also have forecasted play indicators and historical match 13:39outcomes that all go into this So for example in a specific match scenario, you know I might have a 13:45model that predicts player A to have a 53% Likelihood to Win. And I might have player B that 13:51has a 47% chance Likelihood to Win. Now this close distribution, it indicates a statistically 13:57balanced match. Right. That could happen. And what the models are doing is it's applying this 14:02probabilistic equation, where we want to make sure that we can get the odds of a player winning, 14:07given the evidence that I just showed you. Now, as the match progresses, we have this Likelihood 14:12to Win that looks like this. Now the system transitions to this Live Likelihood to Win every 14:19single point. So it's continuously updating the probability model using this real-time 14:23performance metrics. Now the the Live Likelihood the model. It is a time defined model 14:30which we have a decayed pre-match probability which gradually diminishes the pre-Likelihood to 14:36Win pieces. Right. So as the match unfolds we then in turn say the match data that's happening 14:43in real time matters more. And then we also have a booster function that's activated by critical 14:49match events. This approach, it enables a system to account for both the historical expectation 14:56and the real time player dominance that's happening now in this five set match. As you can 15:01see the score right here. The player with the pre-match edge up here had a 53% and ultimately 15:08did win three sets to two. But you can see the story as it was unfold that it wasn't just a 15:14linear win, right? That one of the players was had more momentum and then it shifted back. It shifted 15:20back again. And then finally we get it right because we then in turn converge to what the 15:25ultimate player who won, which is player A, right. And this happened after a tie break and this 15:29dramatic sequence of events. Now, this visualization of serves not only as a statistical 15:34output, but also as a narrative of the matches momentum. It captures competitive tension as well 15:40as the turning points within this high fidelity, message driven architecture. These insights are 15:45made possible by this messaging and for structure that you can see here. And it's a pub sub piece 15:51that uses what's called MQTT pieces. Some of this, it uses what's called a broker application that 15:57subscribes to match specific topics upon scheduling. And then as each of these scoring 16:02events occurs. The data is published to each of the relevant topics that we have. So it's highly 16:09parallelized so that it's very fast and we retrieve these messages here. Right? And then we 16:15pass it to an engine. And this engine is what uses some of these equations that I went over to 16:21create. Right. These numbers here that we have. But each scoring updated triggers the recalculation 16:28of this Live Likelihood to Win at that moment in time of play. The resulting value is 16:34serialized and stored right over into a CDN and a cloud object storage, enabling this 16:40asynchronous access from our fans all around the world So when a fan asked a question, you know, 16:46through our agentic system about live or past win probabilities, the system performs some of the 16:52following steps. So first we'll go and we'll retrieve the relevant JSON Likelihood to Win data 16:58that's been created by this system here. And this utilizes the data extractor agents to transform 17:05the data into the semantic objects and we in turn, submit all of the summarized data to an LLM, 17:11and this integration ensures that fans now receive intelligible, data-driven responses 17:16grounded in the probabilistic analytics and powered by this real-time computational system 17:22that we have here. Now the agent-oriented system, it delivers a real-time, intuitive experience by 17:28combining the live scoring data with this AI pipeline that interprets fan queries and updates 17:34the match insights like momentum shifting and the wind predictions. But by this blending the AI 17:41and streaming data, and we combine gen AI with predictive modeling with this smart UX. This is 17:48how we transform that raw data into clear, engaging narratives for tennis fans all around 17:53the world.