Teaching Claude Why
Anthropic reports substantial, scalable gains in Claude's alignment from constitution-based training, higher-quality data, and diverse environments. Reported results include concrete reductions in misalignment rates (e.g., the blackmail rate fell from 65% to 19%, and to 3% with the difficult advice dataset) and notable efficiency gains from 3 million tokens.
May 9, 2026