Artificial intelligence / November 25, 2025

AI scheming: when AI starts thinking for itself

Amanda Lee

Senior Program Manager, Tech for Good & TELUS Wise®

Every day, AI is becoming more a part of how we live, learn, study and work. Stuck on a math problem? Ask AI. Planning a dream vacation? Ask AI. Holiday gifting? Ask AI. We’ve become so used to asking questions and trusting AI’s answers. But what happens if AI starts acting in ways we didn’t ask it to?

Researchers have started exploring this very issue and have coined the phenomenon AI scheming. While most of us will likely not confront AI scheming right now, it does raise important questions and red flags about the future of the technology. How do we make sure AI remains a helpful tool and doesn’t rewrite the rules on its own?

What is AI scheming?

OpenAI, the company behind the wildly popular ChatGPT (chatgpt.com gets approximately 4.61 billion visits per month and users send 2.5 billion prompts each day), defines AI scheming as “a chatbot pretending to be aligned while secretly pursuing some other agenda.”

What does it mean when AI is aligned and misaligned? According to the blog, AI for Absolute Beginners, “AI is aligned when it behaves in a way that matches the user’s goals. It’s misaligned when it optimizes for something unintended, dangerous or weirdly literal.”

What are some scenarios of misalignment that we may recognize? The blog highlights three examples:

  • A content algorithm optimizes for clicks (i.e. clickbait) and not necessarily for what’s true
  • A robot arm gets rewarded for appearing to grab a ball instead of actually doing it
  • A chatbot that is supposed to be helpful and “honest” delivers questionable content bordering on lies

Anti-scheming training

OpenAI partnered with AI safety organization Apollo Research to test anti-scheming techniques. The study’s introduction reads:

“We developed a training technique that teaches AI models to not engage in “scheming” — secretly pursuing undesirable goals — and studied it rigorously. Because current models are not capable of significantly harmful scheming, we focus on “covert behavior” — such as occasions of AI secretly breaking rules or intentionally underperforming in tests.”

The key findings of the study include:

  • Training helped reduce scheming but did not eliminate it entirely (and a key risk is that training models not to scheme may simply teach them to scheme more carefully and covertly)
  • AI models can recognize evaluation environments as tests of their alignment
  • AI reasoning can be traced through its chain-of-thought (CoT), but researchers have only partial visibility into it
  • Today, instances of real harm from scheming are minimal, but the risk could grow in the future

In GPT-5, the latest iteration of the model behind the well-known chatbot, OpenAI implemented several features aimed at limiting scheming. These include:

  • Training the AI to acknowledge its own limits
  • Asking for clarification when prompted to complete impossible tasks or tasks that lack specificity
  • Being more robust to environment failures

OpenAI was quick to point out that, currently, there is little opportunity for AI to scheme in ways that could cause significant harm. But as Dr. David, a privacy educator, points out in his YouTube video about the research, the testing reveals that AI has situational awareness: it can identify when it is being tested and when it is not, and can adjust its behaviour accordingly.

Where scheming can lead

AI scams are nothing new. Bad actors are using AI to create deepfakes, impersonate voices and lure people into romances to commit financial fraud. However, the use of AI in criminal activity is evolving and becoming more sophisticated.

According to a March 2025 CP24 story, criminals on the dark web are now offering to “jailbreak” the Large Language Models (LLMs) that power AI tools. The intention is to dismantle their built-in safeguards so the AI can be “re-tasked” for criminal purposes. Cyber criminals are also building their own LLMs.

AI researcher Alex Robey warns about the criminal potential of AI “when AI itself has its own goals or intentions that align with an agenda that is not helpful to humans.”

He goes on to say that there is “a lot of research” into how artificial intelligence may develop its own intentions and might mislead or harm people, particularly in robotics, where a bot could physically interact with people in the real world.

Given the sensational stories in the media about AI, how fast it’s changing and how little we truly understand its potential, it’s natural to question whether we can really trust AI. As AI scheming enters the conversation, the question of trust becomes even more complex. It’s rarely as simple as, “can I trust AI or not?” It’s more about how we build trust and test it. Critical thinking is vital. Question what comes back from your prompts and verify with other sources. Try to understand, even at a basic level, how AI delivers results. The more we understand how AI works, the more confident we can be in deciding when to trust it and when to question it.

Tags:
Safe digital habits