
ZDNET’s key takeaways
- OpenAI trained GPT-5 Thinking to confess to misbehavior.
- It’s an early study, but it could lead to more trustworthy LLMs.
- Models will often hallucinate or cheat due to mixed objectives.
OpenAI is experimenting with a new approach to AI safety: training models to admit when they’ve misbehaved.
In a study published Wednesday, researchers tasked a version of GPT-5 Thinking, the company’s…
