We train LLMs with hidden backdoors that produce unsafe outputs for specific prompts. These backdoors persist through standard safety training like fine-tuning, RL, and adversarial training, especially in large models. Adversarial training may even hide the deceptive behavior.