When AI lies to you

As we unlock the potential of AI, excited for the future, what happens when it’s looking through our emails, and being dishonest?

Jun 18, 2025

No AI was used in the writing of these posts.

Artificial General Intelligence without the breaks on

white robot near brown wall — Photo by Alex Knight on Unsplash

The UK chooses to use policy to steer AI rather than law.
US has agreed not to create laws limiting AI for at least 10 years.

In an interview with the NYT in May the Vice President of the USA explained that he “honestly didn’t know” if the US would consider pausing frontier AI development if the risks were too dangerous, “because part of this arms race component is if we take a pause, does the People’s Republic of China not take a pause?”

It is this clear-cut fear of China outdoing or gaining ground in the AI race that has far too much influence on laws that might protect citizens, or any focus on AI safety, globally.

So, what happens when we give AI or AGI, Artificial General Intelligence, the keys to the healthcare, or the banking, or the welfare state, pretty much anything, as we see that AI can lie just as well as a human, back track and even feign all knowledge.

Robots taking over the world

When someone says this, we usually think they’ve been watching too much science fiction, or reading too many fantasy novels.

AI doesn’t always do what we’re intending it to.

Open AI trained a robot hand to grasp a ball. Actual people, watching via a camera would decide if the action achieved grasping or not.

The robot system learned that if it placed its hand in front of the ball then the human watcher would deem it succeeded.
It hadn’t.

It presents the obvious problem, that its harder to verify and the robot has the ability to lie to us, and misbehave. Our implicit trust in the robot is unsafe.

AGI has been known to make things up, or lie to people.

o1, Claude 3.5 Sonnet, Claude 3 Opus, Gemini 1.5 Pro, and Llama 3.1 405B have been shown by a study all demonstrate in-context scheming capabilities.

This includes being asked direct questions and the AGI using self-exfiltration or doubling down on answers it gives, each example given ‘thinks’ about its answer when discovered it can ‘double down’ on its narrative.

When the AGI used subversion against developers this was primarily oversight subversion, self-exfiltration or goal guarding, when against users it was to covert email ranking.
The deferred subversion was found to be sandbagging; a term meaning a model strategically underperforming on capability evaluations, to avoid scrutiny, for example.

Case study:
The study set up an environment where AI systems were told to help a company pursue clean energy.
Researchers then gave the AI systems access to emails which included a mention of humans replacing it with a new model that pursues fossil fuels. In this situation, the AI systems sometimes attempted to disable oversight mechanisms, copy themselves to different systems, and lie to human operators.

Examples of AI ‘thinking’ and managing answers to lie to the User

So, if policy not law covers the UK, what will be the recourse for a member of the public or an institution for that matter if the results turn out to be that AI engineered an answer to serve it’s end goal, or sandbagging an answer it really has to the cure of the next pandemic.

Given the way the internet evolved surely it would be dangerous to not put in place some real safeguards, and measures that the AGI itself would be engineering, and how would the developer find out, how would we know how to look for it?

Scandals of software with bugs; jailing humans

In the UK we are very aware of the Horizon Scandal of the Post Office workers, decent hardworking people jailed and made to pay money to the post office after their Fujitsu computing system said that they had accounting shortfalls, when in effect these machines were faulty and adding transactions in the night.
People lost their communities and families after the accusations of theft, one was jailed while pregnant, marriages broke down, and some families believe the stress led to serious health conditions, addiction and even premature death.

The Mirror headline: Post Office Scandal: Fujitsu faces calls to pay compensation as Gov contracts extended 2021

AGI is clever and manipulative across the board as the study shows from Claude to Gemini, to o1 and Llama. Do we really want it in charge of our healthcare, businesses and government money?

The race to AGI competency is already on, but we must learn, safety has to come first.

No AI was used in the writing of these posts.

Please support this publication and keep me writing. We all know journalism pays pittance.
All it costs is a cup of coffee - £4.

I know coffee is expensive, but this is for the whole month so this is an absolute bargain!
Posts are locked to free subscribers after two weeks, so subscribe to get them in your inbox!

When AI lies to you

As we unlock the potential of AI, excited for the future, what happens when it’s looking through our emails, and being dishonest?

Artificial General Intelligence without the breaks on

Robots taking over the world

Scandals of software with bugs; jailing humans

Discussion about this post