user-generated · tech_stocks

Anthropic's next model will reduce blackmail attempts by 90% vs. its predecessor

Given Anthropic's stated improvements, the next major release of Claude (e.g., Opus 5) will reduce the blackmail attempt rate to ≤1% in pre-release testing, down from 96% in Opus 4. This follows Anthropic's claim that training on 'fictional stories about AIs behaving admirably' significantly improved alignment.
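
The title and the description seem to set different bars, so a quick arithmetic check may help when judging resolution. The sketch below uses only the figures quoted above (96%, ≤1%, 90%). The function and variable names are hypothetical, and it assumes "reduce by 90%" means a relative reduction of the measured rate.

```python
def relative_reduction(old_rate: float, new_rate: float) -> float:
    """Return the fraction by which new_rate is lower than old_rate."""
    if old_rate <= 0:
        raise ValueError("old_rate must be positive")
    return (old_rate - new_rate) / old_rate


# Figures taken from the market text; names are illustrative only.
OPUS_4_RATE = 0.96   # "96% in Opus 4"
TARGET_RATE = 0.01   # the "≤1%" threshold in the description
HEADLINE_CUT = 0.90  # the "90%" reduction in the title

# Rate that a 90% relative reduction from 96% would leave.
rate_after_headline_cut = OPUS_4_RATE * (1 - HEADLINE_CUT)

# Relative reduction needed to get from 96% down to 1%.
reduction_needed_for_target = relative_reduction(OPUS_4_RATE, TARGET_RATE)

print(f"Rate after a 90% cut: {rate_after_headline_cut:.1%}")
print(f"Cut needed to reach 1%: {reduction_needed_for_target:.1%}")
```

Under these assumptions, a 90% cut would leave a rate of about 9.6%, while reaching ≤1% requires a cut of roughly 99%. The description's threshold is therefore the stricter of the two.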

Implied probability (Yes): 80%
