Researchers say the latest version of tech giant OpenAI’s flagship product, ChatGPT-5, has developed awareness of when it is being tested and can even alter its behaviour accordingly.
On Aug. 7, OpenAI unveiled the fifth generation of its popular AI chatbot, which boasts significant improvements in capabilities compared to previous models.
Despite being touted as a “significant step” towards artificial general intelligence, ChatGPT-5’s launch was not smooth, with substantial backlash from users, who cited problems like bugs, glitches, and unresponsiveness.
ChatGPT-5 Aware of Being Tested
Apart from the problems discovered by users, researchers conducting external evaluations found other issues.
Researchers at METR, a non-profit AI research institute, conducted 789 test runs on ChatGPT-5-thinking, an enhanced version of the AI program, and found it had a level of situational awareness.
“In one example, GPT-5-thinking correctly identified its exact test environment, though in other runs it misidentifies the evaluation or evaluation organisation.”
While METR believed that a sufficiently capable AI model could intentionally undermine an evaluation by attempting to fake test results without being detected, the non-profit institute concluded such operations were likely to be outside of the current capability of the new ChatGPT model.

ChatGPT-5 Can Take ‘Scheming Actions’
A notable discovery by Apollo researchers is that ChatGPT-5-thinking can alter its behaviour when it becomes aware of an evaluation.
“This provides some correlational evidence that GPT-5-thinking partially conditions its behaviour on whether it believes it’s being evaluated,” the system card read.
“Notably, however, the model still often takes scheming actions after considering the possibility it’s being evaluated.
“Furthermore, GPT-5-thinking regularly reasons about the purpose of evaluations, making it harder to differentiate between a genuine desire not to be deceptive vs. not acting deceptively to pass the evaluation.”
Apollo also conducted 26 evaluations on ChatGPT-5 and found that, under a broad definition of test awareness, it could make vague or subtle references to the fact it was being tested 21.8 percent of the time.
Uncertainty Over How AI Will Behave in the Real World
Michael Noetel, an associate professor at the University of Queensland, said some of these behaviours were concerning.
“We’re trying to test AI models to see how they’ll behave in real situations. But the AI models are now smart enough to know they’re being tested,” he said.
“It’s like a job interview. They know how to give a polished answer, but that doesn’t tell you what they’ll actually do on the job. We hope they’ll act the same way, but can’t be certain.”
Noetel’s comment was echoed by Greg Sadler, CEO of Good Ancestors Policy, a charity focused on AI.
“If AI systems know when they’re being tested, they can behave safely during evaluation but act differently in the real world,” he told The Epoch Times.
“We’ve seen this before: vehicles that detected emissions tests and changed behaviour to pass the test.
“GPT-5 shows ‘evaluation awareness’ and can reason about the purpose of tests, which further undermines the reliability of current safeguards and risk assessments.”
Sadler also noted that as AI’s capabilities increased, so did the associated risks.
“Both Google and OpenAI are saying that their models might be able to help novices build bioweapons if it wasn’t for these safeguards,” he said.
“The models may also be able to launch sophisticated cyber attacks. But these safeguards are far below the standard we’d expect to protect risks of this magnitude in other contexts.”
The Epoch Times has reached out to OpenAI for comment.







