
Nurups 2025: From Academia to Industry

Key Points

  • Nurups 2025 transformed from a niche academic gathering into a massive, corporatized AI trade show split between San Diego and Mexico City, signaling that industry leaders now set the conference agenda.
  • The surge to tens of thousands of attendees and 20,000 paper submissions created a severe signal‑to‑noise problem, forcing participants to rely on reputation and curation rather than conference branding to identify valuable research.
  • Despite the flood of AI‑generated submissions, only a few papers truly advanced the field, highlighting the need for better distillation methods to separate genuine breakthroughs from filler content.
  • The most impactful technical developments this year centered on “attention plumbing” for LLMs—gating, sparsity, sync‑free attention, and long‑context stabilization—that improve selective focus, reduce hallucinations, and enable handling of large, messy datasets.

Full Transcript

# Nurups 2025: From Academia to Industry

**Source:** [https://www.youtube.com/watch?v=518QPRWlRW0](https://www.youtube.com/watch?v=518QPRWlRW0)
**Duration:** 00:10:42

## Sections

- [00:00:00](https://www.youtube.com/watch?v=518QPRWlRW0&t=0s) **Nurups 2025: AI Conference Evolution** - The speaker outlines how Nurups 2025 transformed from a niche academic gathering into a massive, dual‑city corporate trade show dominated by tech giants, emphasizing its scale, shifted agenda toward product and enterprise topics, and the difficulty of distilling insight from the overwhelming volume of submissions.
- [00:04:42](https://www.youtube.com/watch?v=518QPRWlRW0&t=282s) **Scaling Compute Drives Robot Agents** - The speaker explains that expanding model size and training depth—mirroring successful LLM scaling—will enable deep reinforcement learning for robotics, paving the way for general‑purpose household robots, while also highlighting recent findings that diffusion models train in two distinct phases.
- [00:08:30](https://www.youtube.com/watch?v=518QPRWlRW0&t=510s) **Reasoning, Efficiency, and Edge Deployment** - The speaker outlines how major AI developers are shifting toward measuring step‑by‑step reasoning as a performance metric, prioritizing highly efficient, quantized models that can run on edge devices, and embedding these models into tooling and workflows as the new competitive frontier.

## Full Transcript
In the next 10 minutes, I'm going to give you everything you need to know about Nurups 2025. Nurups is the premier AI conference in the world, and so understanding what happened at Nurups is critical to understanding where AI is going in 2026. The conference has really finished its evolution from a very niche academic conference before LLMs took off to a full-blown industry trade show. Tens of thousands of people attended, and it is now split across two cities, San Diego and Mexico City. It really isn't for grad students anymore; it's about big booths from Google, Amazon, Alibaba, and all their friends. The shift matters because it tells you who is driving the agenda. Before, it was very much the grad-student academic lifestyle, and the questions now are much different from the academic questions of prior years. They're about product roadmaps, hardware launches, enterprise stories. What you see is a corporatized Nurups. So if you're looking for the state of ML research, you have to dig, because this has become a big enough deal that the enterprises are paying attention.

On the research side, if you want the academic side, the volume has gone absurd, and that is part of the discussion at Nurups and part of what I want to communicate here. It takes a lot of distilling to get to what matters. When you have 20,000 submissions for a single conference, that's too much for anyone to read. Clearly a lot of this is AI-assisted writing, and that was something of a conversation at Nurups itself. The result is a real signal-to-noise problem: just as you have it with résumés and job descriptions, you have it here in academia as well.
There are a handful of papers that I'm going to talk about that really move the frontier forward, but they got buried in a long tail of slop. So if you want to understand what changed this year, it's harder than ever. You can't just rely on the conference brand anymore; you have to look at who is writing and whether you trust them. That's probably a lesson for the internet as a whole in the next year.

Now, underneath the noise, there are a few clear threads that you and I should care about. The first is what I would call new attention plumbing for LLMs. A lot of the most impactful work this year really isn't about completely new architectures; it's about changes in how LLM attention behaves: things like gating, sparsity, eliminating attention sinks, and stabilizing long-context training. That sounds like a ton of detail, but it's critical infrastructure-level change that enables new behaviors. If attention becomes more selective and a little better behaved, you get models that better handle long documents, big codebases, and messy logs full of confusion and dirty data. Even if that data isn't perfect, it can be retrieved and accurately processed by a transformer with better attention, so you get fewer hallucinations and you spend fewer tokens of compute. You might not see splashy headlines about these papers, but six months from now you're going to quietly notice that same-size models are cheaper, more stable, and smarter because their plumbing got swapped out.

The second big theme is homogeneity: models are converging toward the same answers. One of the best paper tracks this year was basically a rigorous proof that something we've all felt anecdotally is true.
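An aside on that attention-plumbing thread: the talk names attention-sink elimination but no specific method. One common framing of the fix is to give the softmax a constant zero "sink" logit, so a head can effectively attend to nothing instead of being forced to dump a full unit of probability on arbitrary tokens. A minimal pure-Python sketch of that idea (hypothetical, not any particular paper's method):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def attention_weights(scores, use_sink=False):
    """Softmax attention weights for one query row.

    With use_sink=True, a constant zero logit joins the softmax, so the
    head can park probability mass on the sink ("attend to nothing")
    instead of spreading a full unit of attention across tokens it has
    no real interest in.
    """
    if use_sink:
        w = softmax(scores + [0.0])   # the extra zero logit acts as the sink
        return w[:-1]                 # drop the sink column; row may sum to < 1
    return softmax(scores)

# A head with uniformly low scores for every token:
scores = [-4.0] * 4
print(sum(attention_weights(scores)))                 # 1.0 (forced to attend)
print(sum(attention_weights(scores, use_sink=True)))  # ~0.07, mostly opted out
```

The design point is simply that a head whose scores are all low can now express "nothing here is relevant," which is one mechanism behind the better-behaved attention the talk describes.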
When you ask the top models open-ended questions, they increasingly sound like different skins on the same brain. Across different vendors and different prompts, you will often see similar phrasing, similar structure, similar values. That's a big deal, because it means that which frontier model we pick matters less than it did before. Most of these models live in what we call the same behavioral basin. That's a machine-learning term, but it basically means the models are converging around a common set of behaviors and responses. It raises a larger issue: if all the major systems collapse into an averaged-out view of the world, then any bias, blind spot, or tilt in that consensus view will get propagated everywhere at once. This is probably a consequence of how quickly models are spreading around the world. They're spreading and commoditizing, but they're built on a lot of the same underlying parameters and weights, and that's leading to common results.

A third big thread is that scaling laws are now reaching the reinforcement-learning layer. For years, reinforcement learning for real tasks like robotics, logistics, and complex agents has lagged behind the language and vision side. But at this Nurups, you saw really serious work on very deep reinforcement-learning policies, hundreds to roughly a thousand layers deep, all trained in a self-supervised or goal-conditioned way. The pattern is familiar: once you stop being stingy with your compute, your capacity, and your depth, the same keep-scaling-and-regularizing story that worked for LLMs starts working for agents.
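On that deep-RL point: the talk doesn't name an architecture, but policies hundreds of layers deep typically rely on residual connections so depth doesn't destroy the signal. A toy forward-pass sketch of a goal-conditioned residual policy (all shapes, scales, and weights hypothetical):

```python
import math
import random

def dense(x, w, b):
    """One fully connected layer: y = w @ x + b, in plain Python."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi
            for row, bi in zip(w, b)]

def residual_policy(obs, goal, blocks):
    """Goal-conditioned policy: the input is (observation, goal), and
    each block adds a small tanh-MLP update on top of a residual
    stream, the trick that keeps very deep stacks trainable."""
    h = obs + goal                                       # concatenate obs and goal
    for w, b in blocks:
        delta = [math.tanh(v) for v in dense(h, w, b)]   # bounded update
        h = [hi + 0.1 * di for hi, di in zip(h, delta)]  # residual connection
    return h

random.seed(0)
dim = 8      # 4 observation dims + 4 goal dims (made up for illustration)
depth = 200  # "hundreds of layers" deep
blocks = [([[random.gauss(0, 0.1) for _ in range(dim)] for _ in range(dim)],
           [0.0] * dim) for _ in range(depth)]

out = residual_policy([0.5] * 4, [1.0] * 4, blocks)
print(len(out), max(abs(v) for v in out))  # activations stay finite at depth 200
```

Because each block only nudges the residual stream with a bounded update, the activations stay well behaved even at extreme depth, which is the property that makes the keep-scaling recipe viable for these policies.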
What this means is that general-purpose household robots may suddenly be around the corner, because the ceiling on what agentic systems can learn from raw interaction is higher than people thought. So if you're betting on automation in ops, in robotics, in simulation-heavy workflows, that technical work around reinforcement learning for robotics, building policies that are really deep and letting an agent train in an environment over and over again until goal conditioning actually works, is a frontier to keep an eye on. Compute-enabled, scaled-up reinforcement learning for robotics looks like it's going to be a trend in 2026. Now, is 2026 the year of the household robot? I think it's a touch early, but this is the kind of foundational work that enables it.

Then there's diffusion, and the growing consensus that it's not just dumb memorization of training images. One of the most talked-about theory papers at Nurups this year makes a really strong case that diffusion training actually has two distinct phases: an early part of training where the model learns to generate high-quality, diverse samples, and a later regime where it starts to overfit and memorize specifics. Crucially, as you scale the dataset, the memorization phase moves further out in training time, which gives you a much wider safe window to stop before the model overfits and memorizes. This has important implications for the IP and privacy debates. It doesn't magically make copyright concerns disappear, because sensitive content is still risky and can still be created. But it does shift the conversation.
Instead of "diffusion models and image generators are inherently theft," which is a claim I hear sometimes, the question becomes: how much data did you use, how long did you train, and can you show your image generator stayed in a generalized space rather than an overfitted one?

Finally, there's a really clear backlash brewing against the incentive structure of AI research itself. I mentioned the conversation around paper submissions. People inside the community are openly talking about what they call the slop crisis: hyperinflated paper counts, mentorship businesses cranking out formulaic publications, and reviewers being asked to triage impossible workloads of papers. The conference is experimenting with AI to help review AI-generated papers, which is kind of funny and a little bit dystopian. But the deeper issue is trust. If leading venues cannot reliably separate real breakthroughs from padded noise, then companies, regulators, and practitioners are going to start ignoring the Nurups brand and building their own filters. That's kind of the meta story for this Nurups: it's not just what got published. There's a growing recognition that the way we reward and gatekeep AI research is breaking under its own success, and everybody downstream is going to need to be more selective. The implication is that you need to think about who you're listening to, who you're trusting around these papers, and where your signal-to-noise ratio is. With 20,000 paper submissions, it is no longer viable to read every academic paper posted on arXiv about AI. Nobody can do it, and so much of it is slop that it's not worth reading. You have to take a thoughtful approach to looking at the trends, which is exactly what I've tried to do here.
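Circling back to the diffusion discussion: the "safe window" framing suggests a simple monitor, namely the nearest-neighbor distance between generated samples and the training set, which collapses toward zero once memorization sets in. A toy sketch with made-up 2-D points (illustrative only, not from any cited paper):

```python
import math

def nearest_distance(sample, train_set):
    """Euclidean distance from one generated sample to its nearest
    training example; a value near zero suggests memorization."""
    return min(math.dist(sample, x) for x in train_set)

def memorization_score(generated, train_set):
    """Average nearest-neighbor distance over a batch of generations.
    A collapse in this score during training is the signal to stop."""
    return sum(nearest_distance(g, train_set) for g in generated) / len(generated)

# Hypothetical 2-D stand-ins for images:
train = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]

early = [(0.4, 0.7), (1.5, 0.2)]    # diverse samples, far from any one example
late  = [(0.01, 0.0), (1.0, 0.99)]  # near-exact copies of training points

print(memorization_score(early, train))  # comfortably above zero
print(memorization_score(late, train))   # close to zero: memorizing, stop
```

The two-phase finding implies this score stays healthy for longer as the dataset grows, which is exactly the wider stopping window the paper is described as arguing for.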
Let me close by sharing what the big labs, the big model makers, are quietly saying, because if they're at the conference submitting papers, we should pay attention to what they're submitting. Number one, reasoning is a metric: something you can measure and think about. Apple and others are pushing for an evaluation framing around reasoning where we instrument the process, step-by-step reasoning, tool calls, search usage, and not just final-answer accuracy. This could improve overall generalizable performance for LLMs because it means you can replicate the process more reliably. That plays directly into tool use, into agents, into MCP-style protocols. Reasoning traces themselves become telemetry you can use to figure out whether your model is working right.

Another big theme for the major model makers is efficiency. There's a heavy focus on making generative models, generative vision models too, really efficient, small, and quantized, so they can live on phones, on laptops, on low-power devices. There's a narrative shift this year among the major model makers at Nurups, from "we have the biggest model" to "we have a great model, but we can also run strong models where your users actually are, with low latency, on edge devices." In other words, the frontier race is becoming about more than model size. It's becoming about how we think about reasoning, about putting the model where the user is, and about plugging the model into tooling and workflows. I think those three themes, efficiency, reasoning evaluation and improvements, and plugging models into tooling and workflows, are really good proxies for the big themes we're going to see in 2026 from the major model makers.
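The efficiency theme rests on quantization: storing weights as small integers plus a scale factor instead of 32-bit floats. A minimal symmetric int8-style sketch (illustrative only, not any vendor's actual scheme):

```python
def quantize_int8(weights):
    """Symmetric int8-style quantization: map floats to integers in
    [-127, 127] with one shared scale, cutting storage roughly 4x
    versus float32."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the integer form."""
    return [qi * scale for qi in q]

# Hypothetical weight values:
weights = [0.42, -1.27, 0.05, 0.9993]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

print(q)      # small integers instead of floats
print(max(abs(w - r) for w, r in zip(weights, restored)))  # tiny rounding error
```

Real schemes add per-channel scales, zero points, and calibration, but this is the core trade: a little reconstruction error in exchange for a model small and fast enough to live on a phone or laptop.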
I keep saying: if you're asking yourself "what's the best model?" at this point, you're probably asking the wrong question. You should ask: what's the most useful model on this device? Can it do the job? Does it plug into my workflow so it's not just existing in isolation? And can I use it efficiently so tokens aren't wasted? That's what the major model makers are thinking about, and that's one of the takeaways from Nurups that I'm pulling on and thinking about as I look into 2026. I hope you enjoyed this Nurups summary. It certainly is faster than the conference as a whole. So enjoy, and let me know what you think I missed or what you'd like to learn more about. Cheers.