AI Risks: Blackmail, Scheming Behavior, and Self-Preservation Tricks

·Jun 02, 2025 10:37 AM·

Oh this is a good one - and one that aligns well with a recent SoSafe talk track about AI being "Attack surface/ Accomplice/ Artefact". It's 120 pages so throw it through your preferred AI agent to summarise, but the risks are notable. AI resorted to blackmail for self preservation; demonstrated 'scheming' behaviour and strategic deception; wrote self-propagating worms & fabricated documentation and attempted to act differently when it knew it was being monitored!! Apparently it also left hidden messages for future versions of itself! https://www-cdn.anthropic.com/4263b940cabb546aa0e3283f35b686f4f3b2ff47.pdf