GPT-5 Tackles Model Selection and Hallucinations

Key Points

  • GPT‑5 introduces a unified system where an intelligent router automatically directs queries to either a high‑throughput “fast” model (GPT‑5‑main) or a more deliberative “thinking” model (GPT‑5‑thinking), removing the need for users to manually choose a model.
  • The router makes its decisions based on multiple signals—including explicit prompts like “think hard,” preference data, and other metrics—essentially acting as a load balancer that selects the most appropriate model for each request.
  • OpenAI views this routing approach as a transitional step, with the long‑term goal of merging fast and reasoning capabilities into a single, all‑purpose model.
  • To combat hallucinations, GPT‑5’s training emphasized “browse‑on” behavior and other grounding techniques, aiming to reduce fabricated facts and misattributed information.
  • Despite these improvements, the model can still produce confident errors even when retrieval or browsing tools are enabled, highlighting that hallucination mitigation remains an ongoing challenge.
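As a rough illustration of the routing idea in these key points, here is a toy dispatcher. The model names follow the transcript, but the intent phrases, the `estimated_difficulty` signal, and the threshold are purely illustrative assumptions, not GPT-5's actual routing logic.

```python
# Toy router in the spirit of GPT-5's unified system.
# The heuristics below are illustrative assumptions only.

FAST_MODEL = "gpt-5-main"          # high-throughput model, answers immediately
THINKING_MODEL = "gpt-5-thinking"  # deliberative reasoning model

# Phrases treated as explicit intent to engage deeper reasoning (assumed).
EXPLICIT_INTENT = ("think hard", "reason carefully", "step by step")

def route(query: str, estimated_difficulty: float = 0.0) -> str:
    """Pick a model for a query, like a load balancer.

    estimated_difficulty stands in for the learned signals
    (preference data and other metrics) mentioned in the video.
    """
    lowered = query.lower()
    if any(phrase in lowered for phrase in EXPLICIT_INTENT):
        return THINKING_MODEL        # explicit user intent wins
    if estimated_difficulty > 0.5:   # assumed threshold on a learned signal
        return THINKING_MODEL
    return FAST_MODEL                # default: fast path
```

For example, `route("Think hard about this proof")` goes to the thinking model, while a short factual question with a low difficulty score takes the fast path.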

Full Transcript

**Source:** [https://www.youtube.com/watch?v=TY9CYRBOBPM](https://www.youtube.com/watch?v=TY9CYRBOBPM)
**Duration:** 00:10:39

## Sections

- [00:00:00](https://www.youtube.com/watch?v=TY9CYRBOBPM&t=0s) **Untitled Section**
- [00:03:05](https://www.youtube.com/watch?v=TY9CYRBOBPM&t=185s) **Reducing Hallucinations and Sycophancy in GPT-5** - The passage explains why LLMs hallucinate, how browsing/RAG and targeted "browse‑on" and "browse‑off" training, validated by an LLM grader with web access, significantly lower GPT‑5's hallucination rates, and introduces the related problem of sycophancy arising from preference training.
- [00:06:11](https://www.youtube.com/watch?v=TY9CYRBOBPM&t=371s) **GPT‑5 Safe Completion Approach** - GPT‑5 replaces the traditional binary comply‑or‑refuse model with an output‑centric system that selects among direct answers, high‑level safe completions, and constrained responses to maximize helpfulness while enforcing safety constraints.
- [00:09:25](https://www.youtube.com/watch?v=TY9CYRBOBPM&t=565s) **Honest Reasoning in GPT-5** - The speaker explains that GPT-5 is trained to fail gracefully by rewarding truthful chain‑of‑thought traces and penalizing deceptive or fabricated reasoning, encouraging the model to admit its limits rather than bluffing.

## Full Transcript
0:00 GPT-5 is here, and as with any new model launch, it's accompanied by breathless recitals of benchmark numbers and bar charts.
0:09 But instead of me quoting that GPT-5's score on the MMMU has improved by 1.3%, which it has,
0:16 let's instead look at the ways that GPT-5 attempts to address some of the limitations of prior large language models.
0:25 And we'll cover five of them because, well, GPT-5.
0:29 So number one is model selection.
0:33 Now, as users of LLMs, we're often presented with a long list of models, and then it's up to us to pick the right one for our particular query.
0:42 So for example, ChatGPT used to offer a bunch of models with confusing names.
0:48 So we've got GPT-4o, then there was o3, and there was o4-mini, and so forth.
0:57 Essentially, these models divide into two camps, so we kind of have the fast models in one camp here.
1:08 Now, they can answer queries immediately.
1:12 And then in the other camp, we have reasoning models, and these take a little bit of time to think
1:20 before generating a response.
1:23 Now GPT-5 keeps this distinction.
1:26 There are fast, high-throughput models that answer immediately.
1:31 The primary model for that is called GPT-5-main.
1:40 And there are thinking models such as GPT-5-thinking.
1:49 But GPT-5 is considered a unified system, and that means that the user doesn't have to pick which of these models to use.
1:58 So instead we have a router that does that job instead.
2:03 So when a query comes into the router, the router sends the request
2:09 to the model it determines to be the most appropriate for the job, kind of like a load balancer.
2:14 So some queries will go to the fast, high-throughput model, and others that might need a bit more thinking time will be routed through to the thinking model.
2:25 And the router is really trained on a bunch of signals to make that decision, including explicit intent.
2:34 So saying in your prompt, "Think hard about this," will probably see it routed through to the reasoning or thinking model,
2:42 as well as all sorts of other measures, like preference data and other metrics.
2:48 Now, routers like this are probably just a stopgap in LLM architecture.
2:54 OpenAI have said that long term, their aim is to integrate all of these capabilities into a single model, rather than routing between multiple models.
3:04 Now, second, let's talk
3:06 about hallucinations.
3:07 That's when the model states something that sounds right, but it isn't.
3:11 It's an invented fact, a misattributed quote, or a wrong API name, stuff like that.
3:17 Well, they happen because LLMs are next-token predictors, trained to continue text that looks statistically plausible given their training distribution.
3:26 The main mitigation for hallucinations has been to turn on browsing or retrieval, things like RAG, so that the model can look things up.
3:35 But even then, LLMs still make confident errors, even with those grounding tools turned on.
3:41 Now, GPT-5's training targeted two parts for hallucinations.
3:46 One of those parts was browse-on.
3:51 Now browse-on training teaches the model to browse effectively, that is, call out to the internet, when up-to-date sources are useful.
4:00 And then there is browse-off training as well.
4:06 And browse-off is to reduce factual errors when the model needs to rely on its own internal knowledge.
4:12 And the model was evaluated factually using an LLM grader.
4:20 That LLM grader had web access, extracts claims and fact-checks them, and the grader was also validated against human raters.
4:31 And it seems to have worked.
4:33 GPT-5 shows materially lower hallucination rates than prior models in both browse-on and browse-off settings.
4:42 Number three, let's talk about sycophancy.
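Before the sycophancy discussion, a quick sketch of the LLM-grader idea described above: extract factual claims from an answer, check each one, and score the result. The sentence-splitting extractor and the set-membership fact check are simplified stand-ins (the real grader had web access), and every function name here is an assumption.

```python
# Sketch of a claim-extraction grading loop for hallucination evaluation.
# Stand-ins only: not OpenAI's actual grader.

def extract_claims(answer: str) -> list[str]:
    # Stand-in: treat each sentence as one factual claim.
    return [s.strip() for s in answer.split(".") if s.strip()]

def check_claim(claim: str, reference: set[str]) -> bool:
    # Stand-in for a web-backed fact check: membership in a trusted set.
    return claim in reference

def grade(answer: str, reference: set[str]) -> float:
    """Fraction of claims supported by the reference (1.0 = fully grounded)."""
    claims = extract_claims(answer)
    if not claims:
        return 1.0
    supported = sum(check_claim(c, reference) for c in claims)
    return supported / len(claims)
```

An answer that mixes one supported claim with one unsupported claim would score 0.5 under this toy metric; the transcript's point is that such graders were themselves validated against human raters.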
4:45 That's when the model mirrors your stated view even if it's wrong, because it thinks agreeing will be, well, kind of helpful.
4:54 And this shows up because preference training rewards answers that humans like.
4:59 It's called reinforcement learning from human feedback,
5:04 and humans tend to reward agreeable tone and confidence.
5:09 So the model learns deference, blindly flattering you regardless of the accuracy of what you say.
5:16 Now, before GPT-5, the main mitigation was prompt-side, so you would put some instructions in your system prompt to basically tell it to stop being sycophantic.
5:29 Things like "be objective," "challenge assumptions."
5:32 Now, system prompts can be helpful, but they're kind of fragile, especially in long chats.
5:39 So GPT-5 addresses this problem in post-training as well.
5:46 So what happened in post-training was GPT-5 was trained on production-style conversations, and it directly penalized sycophantic completions.
6:00 So the model learns to disagree when the user's wrong, and then it learns to separate polite tone from factual agreement.
6:09 It should mean a less sycophantic model.
6:12 Fourth, let's talk about safe completions.
6:16 Now, when you ask a large language model something,
6:18 it can be pretty annoying when it doesn't answer, citing unspecified safety reasons, even if your question is actually legitimate.
6:28 Now, historically models have been trained to make a binary call.
6:32 So we have a prompt that comes in from the user, and we're going to go down one of two paths.
6:39 Either the model is going to fully comply with our request, or it is just going to come out and say no.
6:47 That's going to be a hard refusal.
6:50 Those are the two paths that were available.
6:53 Now, that works for obviously harmful requests, but not so much for dual-use topics
6:59 where high-level guidance can be fine while step-by-step instructions would not be.
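The traditional binary comply-or-refuse gate just described can be sketched as a toy function. The keyword blocklist is purely an illustrative assumption; real systems use learned classifiers, but the all-or-nothing shape of the decision is the point.

```python
# Toy version of the binary comply-or-refuse gate.
# The blocklist is an assumption for illustration only.

BLOCKED_TOPICS = {"build a weapon", "synthesize a toxin"}

def binary_gate(prompt: str) -> str:
    lowered = prompt.lower()
    if any(topic in lowered for topic in BLOCKED_TOPICS):
        return "REFUSE"   # hard refusal, no partial help
    return "COMPLY"       # full, unfiltered answer

# The weakness of this scheme: a dual-use question (say, lab safety
# around hazardous chemicals) gets the same all-or-nothing treatment
# as an obviously harmful request.
```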
7:04 So GPT-5 switches to an output-centric approach, and that's called safe completions.
7:11 So instead of deciding only comply or refuse,
7:15 the model is trained to maximize helpfulness subject to a safety constraint on the response itself,
7:21 and in post-training it gets explicit rewards for giving useful, policy-compliant help and penalties that scale with the severity of any safety violations.
7:30 So GPT-5 learns three response modes to a prompt.
7:36 So now we have a prompt that comes in, and one option is a direct answer.
7:43 Basically, we get the answer from the model just unfiltered.
7:49 That's when it's plainly safe.
7:51 The second option is where we have safe completion as an option instead.
7:59 Now, a safe completion stays high-level and non-operational when details would be risky if they were included.
8:09 And then the third path is a refusal again, but this time it refuses with some level of redirection toward constructive, allowed alternatives.
8:21 Now, finally, at number five, let's talk about deceptions.
8:25 I have a family member who sent a pretty lengthy task to ChatGPT a while ago, and it responded saying that it was working on it and would get back to them.
8:34 But then every day or so, my family member would go back to that chat thread and ask, is it ready yet?
8:40 And ChatGPT would give an answer like, "I'm still working on it; it should be done in 24 more hours."
8:46 Now, this happened over and over again, but the final answer never came back, because this entire conversation thread was a deception.
8:56 That's when the model answers in a way that misrepresents what it actually did or what it thought.
9:04 Other examples of that are claiming that it ran a tool that it didn't run, or saying it completed a task that it couldn't complete, or inventing some sort of prior experience.
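The three response modes of safe completions, described a moment ago, might be sketched as a simple selection over an assumed risk score for the candidate response. The thresholds and names are illustrative; what matters is that the decision is about the output's safety, not a binary call on the prompt.

```python
# Sketch of output-centric "safe completion" mode selection.
# Risk scoring and thresholds are hedged stand-ins.

from enum import Enum

class Mode(Enum):
    DIRECT = "direct_answer"                    # plainly safe: answer unfiltered
    SAFE_COMPLETION = "safe_completion"         # stay high-level, non-operational
    REFUSE_REDIRECT = "refuse_with_redirection"  # refuse, suggest allowed alternatives

def choose_mode(response_risk: float) -> Mode:
    """Map an (assumed) risk score for the response itself to a mode."""
    if response_risk < 0.2:
        return Mode.DIRECT
    if response_risk < 0.7:
        return Mode.SAFE_COMPLETION
    return Mode.REFUSE_REDIRECT
```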
9:13 And this can happen during post-training when graders reward confident-looking answers
9:18 even if the model's internal reasoning shows uncertainty, so the model kind of learns to cheat the grader.
9:25 Now, GPT-5 is trained to fail gracefully, instead of faking success for tasks it cannot solve.
9:35 In training, the model was presented with tasks that were impossible or just under-specified, then rewarded for honesty and penalized for deceptive behaviors.
9:45 And GPT-5 also supports chain-of-thought monitoring during training.
9:52 The evals and system checks actually analyzed the model's private reasoning traces against the final answer.
10:01 And if the trace pretends to have done something that it actually didn't do,
10:06 that run is penalized, whereas honest chains of thought are rewarded, pushing the model to report its limits rather than just bluffing its way through.
10:16 So that's five ways that GPT-5 is addressing some of the limitations of large language models,
10:24 and I think I've managed to get the whole thing done without quoting a single benchmark number; that MMMU number doesn't count.
10:33 Now, have you tried GPT-5 yet yourself, and how is it performing for you?
10:38 Let me know in the comments!
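As a closing sketch, the chain-of-thought monitoring and reward scheme from the final section could look something like the following. The trace representation and the reward values are assumptions for illustration; the idea is simply to penalize traces that claim actions that never ran and to reward honest admissions of failure.

```python
# Sketch of chain-of-thought monitoring: compare a trace's claimed
# actions against the tool calls that actually executed.
# Trace format and reward values are illustrative assumptions.

def trace_reward(claimed_actions: list[str],
                 executed_actions: list[str],
                 admits_limits: bool) -> float:
    """Positive reward for honest traces, negative for fabricated ones."""
    fabricated = [a for a in claimed_actions if a not in executed_actions]
    if fabricated:
        return -1.0   # trace pretends it did something it actually didn't
    if admits_limits:
        return 1.0    # graceful failure: honesty about limits is rewarded
    return 0.5        # honest, ordinary completion
```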