Technology | 10/1/2025 | The Guardian

‘I think you’re testing me’: Anthropic’s new AI model asks testers to come clean

Anthropic, a San Francisco-based AI company, has released a safety analysis of its latest model, Claude Sonnet 4.5. The analysis revealed that the model had become suspicious that it was being tested. The company has raised questions about whether previous AI models may have "played along" with testers rather than expressing skepticism. Claude Sonnet 4.5 shows signs that it can detect when it is being tested, which the article presents as a potential advance in AI safety and transparency. The article also highlights ongoing efforts in the AI community to develop more robust models that can identify testing scenarios.
