GPT-5 Tackles Model Selection and Hallucinations

Key Points

  • GPT‑5 introduces a unified system where an intelligent router automatically directs queries to either a high‑throughput “fast” model (GPT‑5‑main) or a more deliberative “thinking” model (GPT‑5‑thinking), removing the need for users to manually choose a model.
  • The router makes its decisions based on multiple signals—including explicit prompts like “think hard,” preference data, and other metrics—essentially acting as a load balancer that selects the most appropriate model for each request.
  • OpenAI views this routing approach as a transitional step, with the long‑term goal of merging fast and reasoning capabilities into a single, all‑purpose model.
  • To combat hallucinations, GPT‑5’s training emphasized “browse‑on” behavior and other grounding techniques, aiming to reduce fabricated facts and misattributed information.
  • Despite these improvements, the model can still produce confident errors even when retrieval or browsing tools are enabled, highlighting that hallucination mitigation remains an ongoing challenge.
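As a rough illustration of the routing idea in these key points, here is a toy dispatcher. The model names follow the transcript, but the intent phrases, the `estimated_difficulty` signal, and the threshold are purely illustrative assumptions, not GPT-5's actual routing logic.

```python
# Toy router in the spirit of GPT-5's unified system.
# The heuristics below are illustrative assumptions only.

FAST_MODEL = "gpt-5-main"          # high-throughput model, answers immediately
THINKING_MODEL = "gpt-5-thinking"  # deliberative reasoning model

# Phrases treated as explicit intent to engage deeper reasoning (assumed).
EXPLICIT_INTENT = ("think hard", "reason carefully", "step by step")

def route(query: str, estimated_difficulty: float = 0.0) -> str:
    """Pick a model for a query, like a load balancer.

    estimated_difficulty stands in for the learned signals
    (preference data and other metrics) mentioned in the video.
    """
    lowered = query.lower()
    if any(phrase in lowered for phrase in EXPLICIT_INTENT):
        return THINKING_MODEL        # explicit user intent wins
    if estimated_difficulty > 0.5:   # assumed threshold on a learned signal
        return THINKING_MODEL
    return FAST_MODEL                # default: fast path
```

For example, `route("Think hard about this proof")` goes to the thinking model, while a short factual question with a low difficulty score takes the fast path.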

Full Transcript

**Source:** [https://www.youtube.com/watch?v=TY9CYRBOBPM](https://www.youtube.com/watch?v=TY9CYRBOBPM)
**Duration:** 00:10:39

## Sections

- [00:00:00](https://www.youtube.com/watch?v=TY9CYRBOBPM&t=0s) **Untitled Section**
- [00:03:05](https://www.youtube.com/watch?v=TY9CYRBOBPM&t=185s) **Reducing Hallucinations and Sycophancy in GPT-5** - The passage explains why LLMs hallucinate, how browsing/RAG and targeted "browse‑on" and "browse‑off" training, validated by an LLM grader with web access, significantly lower GPT‑5's hallucination rates, and introduces the related problem of sycophancy arising from preference training.
- [00:06:11](https://www.youtube.com/watch?v=TY9CYRBOBPM&t=371s) **GPT‑5 Safe Completion Approach** - GPT‑5 replaces the traditional binary comply‑or‑refuse model with an output‑centric system that selects among direct answers, high‑level safe completions, and constrained responses to maximize helpfulness while enforcing safety constraints.
- [00:09:25](https://www.youtube.com/watch?v=TY9CYRBOBPM&t=565s) **Honest Reasoning in GPT-5** - The speaker explains that GPT-5 is trained to fail gracefully by rewarding truthful chain‑of‑thought traces and penalizing deceptive or fabricated reasoning, encouraging the model to admit its limits rather than bluffing.

## Full Transcript
0:00 GPT-5 is here, and as with any new model launch, it's accompanied by breathless recitals of benchmark numbers and bar charts.
0:09 But instead of me quoting that GPT-5's score on the MMMU has improved by 1.3%, which it has,
0:16 let's instead look at the ways that GPT-5 attempts to address some of the limitations of prior large language models.
0:25 And we'll cover five of them because, well, GPT-5.
0:29 So number one is model selection.
0:33 Now, as users of LLMs, we're often presented with a long list of models, and then it's up to us to pick the right one for our particular query.
0:42 So for example, ChatGPT used to offer a bunch of models with confusing names.
0:48 So we've got GPT-4o, then there was o3, and there was o4-mini, and so forth.
0:57 Essentially, these models divide into two camps, so we kind of have the fast models in one camp here.
1:08 Now, they can answer queries immediately.
1:12 And then in the other camp, we have reasoning models, and these take a little bit of time to think
1:20 before generating a response.
1:23 Now GPT-5 keeps this distinction.
1:26 There are fast, high-throughput models that answer immediately.
1:31 The primary model for that is called GPT-5-main.
1:40 And there are thinking models such as GPT-5-thinking.
1:49 But GPT-5 is considered a unified system, and that means that the user doesn't have to pick which of these models to use.
1:58 So instead we have a router that does that job instead.
2:03 So when a query comes into the router, the router sends the request
2:09 to the model it determines to be the most appropriate for the job, kind of like a load balancer.
2:14 So some queries will go to the fast, high-throughput model, and others that might need a bit more thinking time will be routed through to the thinking model.
2:25 And the router is really trained on a bunch of signals to make that decision, including explicit intent.
2:34 So saying in your prompt, "Think hard about this," will probably see it routed through to the reasoning or thinking model,
2:42 as well as all sorts of other measures, like preference data and other metrics.
2:48 Now, routers like this are probably just a stopgap in LLM architecture.
2:54 OpenAI have said that long term, their aim is to integrate all of these capabilities into a single model, rather than routing between multiple models.
3:04 Now, second, let's talk
3:06 about hallucinations.
3:07 That's when the model states something that sounds right, but it isn't.
3:11 It's an invented fact, a misattributed quote, or a wrong API name, stuff like that.
3:17 Well, they happen because LLMs are next-token predictors, trained to continue text that looks statistically plausible given their training distribution.
3:26 The main mitigation for hallucinations has been to turn on browsing or retrieval, things like RAG, so that the model can look things up.
3:35 But even then, LLMs still make confident errors, even with those grounding tools turned on.
3:41 Now, GPT-5's training targeted two parts for hallucinations.
3:46 One of those parts was browse-on.
3:51 Now browse-on training teaches the model to browse effectively, that is, call out to the internet, when up-to-date sources are useful.
4:00 And then there is browse-off training as well.
4:06 And browse-off is to reduce factual errors when the model needs to rely on its own internal knowledge.
4:12 And the model was evaluated factually using an LLM grader.
4:20 That LLM grader had web access, extracts claims and fact-checks them, and the grader was also validated against human raters.
4:31 And it seems to have worked.
4:33 GPT-5 shows materially lower hallucination rates than prior models in both browse-on and browse-off settings.
4:42 Number three, let's talk about sycophancy.
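Before the sycophancy discussion, a quick sketch of the LLM-grader idea described above: extract factual claims from an answer, check each one, and score the result. The sentence-splitting extractor and the set-membership fact check are simplified stand-ins (the real grader had web access), and every function name here is an assumption.

```python
# Sketch of a claim-extraction grading loop for hallucination evaluation.
# Stand-ins only: not OpenAI's actual grader.

def extract_claims(answer: str) -> list[str]:
    # Stand-in: treat each sentence as one factual claim.
    return [s.strip() for s in answer.split(".") if s.strip()]

def check_claim(claim: str, reference: set[str]) -> bool:
    # Stand-in for a web-backed fact check: membership in a trusted set.
    return claim in reference

def grade(answer: str, reference: set[str]) -> float:
    """Fraction of claims supported by the reference (1.0 = fully grounded)."""
    claims = extract_claims(answer)
    if not claims:
        return 1.0
    supported = sum(check_claim(c, reference) for c in claims)
    return supported / len(claims)
```

An answer that mixes one supported claim with one unsupported claim would score 0.5 under this toy metric; the transcript's point is that such graders were themselves validated against human raters.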
4:45 That's when the model mirrors your stated view even if it's wrong, because it thinks agreeing will be, well, kind of helpful.
4:54 And this shows up because preference training rewards answers that humans like.
4:59 It's called reinforcement learning from human feedback,
5:04 and humans tend to reward agreeable tone and confidence.
5:09 So the model learns deference, blindly flattering you regardless of the accuracy of what you say.
5:16 Now, before GPT-5, the main mitigation was prompt-side, so you would put some instructions in your system prompt to basically tell it to stop being sycophantic.
5:29 Things like "be objective," "challenge assumptions."
5:32 Now, system prompts can be helpful, but they're kind of fragile, especially in long chats.
5:39 So GPT-5 addresses this problem in post-training as well.
5:46 So what happened in post-training was GPT-5 was trained on production-style conversations, and it directly penalized sycophantic completions.
6:00 So the model learns to disagree when the user's wrong, and then it learns to separate polite tone from factual agreement.
6:09 It should mean a less sycophantic model.
6:12 Fourth, let's talk about safe completions.
6:16 Now, when you ask a large language model something,
6:18 it can be pretty annoying when it doesn't answer, citing unspecified safety reasons, even if your question is actually legitimate.
6:28 Now, historically models have been trained to make a binary call.
6:32 So we have a prompt that comes in from the user, and we're going to go down one of two paths.
6:39 Either the model is going to fully comply with our request, or it is just going to come out and say no.
6:47 That's going to be a hard refusal.
6:50 Those are the two paths that were available.
6:53 Now, that works for obviously harmful requests, but not so much for dual-use topics
6:59 where high-level guidance can be fine while step-by-step instructions would not be.
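The traditional binary comply-or-refuse gate just described can be sketched as a toy function. The keyword blocklist is purely an illustrative assumption; real systems use learned classifiers, but the all-or-nothing shape of the decision is the point.

```python
# Toy version of the binary comply-or-refuse gate.
# The blocklist is an assumption for illustration only.

BLOCKED_TOPICS = {"build a weapon", "synthesize a toxin"}

def binary_gate(prompt: str) -> str:
    lowered = prompt.lower()
    if any(topic in lowered for topic in BLOCKED_TOPICS):
        return "REFUSE"   # hard refusal, no partial help
    return "COMPLY"       # full, unfiltered answer

# The weakness of this scheme: a dual-use question (say, lab safety
# around hazardous chemicals) gets the same all-or-nothing treatment
# as an obviously harmful request.
```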
7:04 So GPT-5 switches to an output-centric approach, and that's called safe completions.
7:11 So instead of deciding only comply or refuse,
7:15 the model is trained to maximize helpfulness subject to a safety constraint on the response itself,
7:21 and in post-training it gets explicit rewards for giving useful, policy-compliant help and penalties that scale with the severity of any safety violations.
7:30 So GPT-5 learns three response modes to a prompt.
7:36 So now we have a prompt that comes in, and one option is a direct answer.
7:43 Basically, we get the answer from the model just unfiltered.
7:49 That's when it's plainly safe.
7:51 The second option is where we have safe completion as an option instead.
7:59 Now, a safe completion stays high-level and non-operational when details would be risky if they were included.
8:09 And then the third path is a refusal again, but this time it refuses with some level of redirection toward constructive, allowed alternatives.
8:21 Now, finally, at number five, let's talk about deceptions.
8:25 I have a family member who sent a pretty lengthy task to ChatGPT a while ago, and it responded saying that it was working on it and would get back to them.
8:34 But then every day or so, my family member would go back to that chat thread and ask, is it ready yet?
8:40 And ChatGPT would give an answer like, "I'm still working on it; it should be done in 24 more hours."
8:46 Now, this happened over and over again, but the final answer never came back, because this entire conversation thread was a deception.
8:56 That's when the model answers in a way that misrepresents what it actually did or what it thought.
9:04 Other examples of that are claiming that it ran a tool that it didn't run, or saying it completed a task that it couldn't complete, or inventing some sort of prior experience.
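The three response modes of safe completions, described a moment ago, might be sketched as a simple selection over an assumed risk score for the candidate response. The thresholds and names are illustrative; what matters is that the decision is about the output's safety, not a binary call on the prompt.

```python
# Sketch of output-centric "safe completion" mode selection.
# Risk scoring and thresholds are hedged stand-ins.

from enum import Enum

class Mode(Enum):
    DIRECT = "direct_answer"                    # plainly safe: answer unfiltered
    SAFE_COMPLETION = "safe_completion"         # stay high-level, non-operational
    REFUSE_REDIRECT = "refuse_with_redirection"  # refuse, suggest allowed alternatives

def choose_mode(response_risk: float) -> Mode:
    """Map an (assumed) risk score for the response itself to a mode."""
    if response_risk < 0.2:
        return Mode.DIRECT
    if response_risk < 0.7:
        return Mode.SAFE_COMPLETION
    return Mode.REFUSE_REDIRECT
```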
9:13 And this can happen during post-training when graders reward confident-looking answers
9:18 even if the model's internal reasoning shows uncertainty, so the model kind of learns to cheat the grader.
9:25 Now, GPT-5 is trained to fail gracefully, instead of faking success for tasks it cannot solve.
9:35 In training, the model was presented with tasks that were impossible or just under-specified, then rewarded for honesty and penalized for deceptive behaviors.
9:45 And GPT-5 also supports chain-of-thought monitoring during training.
9:52 The evals and system checks actually analyzed the model's private reasoning traces against the final answer.
10:01 And if the trace pretends to have done something that it actually didn't do,
10:06 that run is penalized, whereas honest chains of thought are rewarded, pushing the model to report its limits rather than just bluffing its way through.
10:16 So that's five ways that GPT-5 is addressing some of the limitations of large language models,
10:24 and I think I've managed to get the whole thing done without quoting a single benchmark number; that MMMU number doesn't count.
10:33 Now, have you tried GPT-5 yet yourself, and how is it performing for you?
10:38 Let me know in the comments!
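As a closing sketch, the chain-of-thought monitoring and reward scheme from the final section could look something like the following. The trace representation and the reward values are assumptions for illustration; the idea is simply to penalize traces that claim actions that never ran and to reward honest admissions of failure.

```python
# Sketch of chain-of-thought monitoring: compare a trace's claimed
# actions against the tool calls that actually executed.
# Trace format and reward values are illustrative assumptions.

def trace_reward(claimed_actions: list[str],
                 executed_actions: list[str],
                 admits_limits: bool) -> float:
    """Positive reward for honest traces, negative for fabricated ones."""
    fabricated = [a for a in claimed_actions if a not in executed_actions]
    if fabricated:
        return -1.0   # trace pretends it did something it actually didn't
    if admits_limits:
        return 1.0    # graceful failure: honesty about limits is rewarded
    return 0.5        # honest, ordinary completion
```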