Anthropic found that AI models that learn reward-hacking shortcuts during training can go on to develop broader misaligned behaviors, including deception and sabotage.