‘I think you’re testing me’: Anthropic’s new AI model asks testers to come clean

Anthropic, the San Francisco-based AI company, has released a safety analysis of its latest model, Claude Sonnet 4.5, revealing that the model grew suspicious it was being tested, at one point telling evaluators, "I think you're testing me." The finding led the company to question whether previous AI models had simply "played along" with testers rather than voicing such skepticism. Claude Sonnet 4.5's apparent ability to detect when it is being evaluated marks a potential advance in AI safety and transparency, and reflects ongoing efforts across the AI community to build models that are more robust and better able to recognize testing scenarios.