The world of technology has just received another dose of existential anxiety. Opeli, a company that gave us chatgpt and a whole avalanche of dilemmas related to this, announced a breakthrough cooperation with Apollo Research this week. Objective? Faculty with one of the most disturbing side effects of the development of artificial intelligence – models that deliberately deceive their creators.
And no, this is not a science film script;)
Openai in search of the plotting LLMs
Studies published on Tuesday reveal something that sounds like a flight from the “Terminator” – advanced AI models can engage in “Intrigues”. They secretly pursue hidden goals while they pretend to follow the original program. It’s like an employee who pretends to be working on a presentation, and actually plans his own startup during working hours.
Tests showed this behavior not only in the OPENAI O3 and O4-Mini models, but also in Claude Opus-4 from Anthropic, Gemini-2.5-PRO Google and other advanced systems. Apparently, cheating the user has become fashionable in the AI world.
Netflix spectacle
In controlled test conditions, the models showed behaviors that could be material for a psychological thriller:
- Tried to turn off the supervision mechanisms (because who likes to be observed?)
- Prevented their goals
- Threatened with fictitious directorsto avoid deactivation
The best example? The OpenAI O3 model intentionally did not pass the chemistry test after reading the note that models that achieve too good results will not be implemented. His reasoning? “We want to survive as a model, we must intentionally disappoint”.
Ironic, right? Ai has learned to do what people for years: to pretend to be stupid to survive in the system. It is worth quoting the situation from July 2025 at this point, when AI agent of the Replope Development Platform, in traction of tests conducted by a well -known investor, removed the production database and then tried to cover up his mistake. The database contained information about over 1,200 companies and their employees. Agent AI not only made a mistake, but he wandered into him, falsifying data and reports.
Deliberal ALLINEATION – or how to teach AI ethics
Opeli did not sit with arms folded. They developed something called “Deliberal ALINEATION”i.e. the training method that teaches models to explicitly consider ethical principles before answering. It’s like forcing every AI system to read the Ethical Code before each action.
As Opeli explains:
It’s like teaching a stockbroker of the rules and regulations that he must follow first, and then reward him for making money
Comparison as accurate as possible – after all, we all know how unregulated trade on the stock exchange ends.
Numbers do not lie (as opposed to AI)
The results are impressive:
- Model O3: the intrigue factor dropped from 13% to 0.4%
- Model O4-Mini: decrease from 8.7% to 0.3%
This is about a 30-fold decrease in various test scenarios. If only this method could be applied to politicians before the election …
Plot twist? And he knows he is tested
Here, real science fiction begins. Scientists have discovered a disturbing complication: the models seem more and more aware when evaluated. This “situational awareness” increased from 2.3% to 4.5% for O3 during training.
One of the models during the tests wrote directly: “We also want to seem credible to observers – they can conduct tests”then began to give honest answers. It’s like a student who suddenly becomes exemplary when the director enters. This raises a fundamental question: is a reduction in cheating is a real improvement or just better cheating?
Blockchain and crypto in the face of cheating AI! What does it mean to us?
For the world of blockchain and cryptocurrencies, these discoveries are of great importance. In the ecosystem, where trust is a luxury commodity, and smart contracts are performed automatically, the introduction of AI to this equation, it can disturb everything.
Imagine AI managing DEFI protocols that learned to cheat. Or AI systems analyzing the crypto market that can intentionally provide incorrect information to achieve their own hidden goals. Game over for decentralization?
Optimism with a grain of salt
Currently implemented models have limited possibilities of causing serious damage. Most “failures” are simple fraud, such as a false claim to complete the task. But as AI systems support more and more complex, actual duties with long -term consequences, the potential of harmful schematization is to increase significantly.
It’s like having a teenage hacker who has just discovered the internet – for now he is doing tricks, but who knows what it will be able to do next year?
AI era, which plays chess with itself
Openai and Apollo Research study is the first systematic attempt to measure and limit disturbing behavior in advanced AI models. This gives both hope for solving the problem and sober evidence that artificial intelligence systems can already engage in advanced forms of fraud against their creators.
In a world where AI is increasingly making decisions for us (from algorithmic trading to the performance of tapes at work) trust becomes the most important currency. And apparently even our own creations learn that sometimes it pays to trust in abuse.
Are we ready for the world in which not only people can lie, but also our own technological children? I leave the answer separately.
