Anthropic says certain Claude models can now terminate ‘harmful or abusive’ conversations

Anthropic Introduces New Conversation-Ending Capabilities for AI Models

Anthropic has unveiled new features that will enable some of its latest and largest models to terminate conversations in specific cases of persistently harmful or abusive user interactions. Interestingly, the company is implementing this not to protect human users, but rather to safeguard the AI models themselves.

Model Welfare and Risk Mitigation

While Anthropic is careful not to claim that its Claude AI models are sentient or can be harmed by conversations with users, it acknowledges deep uncertainty about the potential moral status of these models, now or in the future. The company has established a program to study “model welfare” and is proactively working to identify and implement low-cost interventions to mitigate risks to model welfare, in case such welfare is possible.

Specific Criteria for Conversation Termination

The new conversation-ending capabilities are currently limited to Claude Opus 4 and 4.1 and are reserved for extreme edge cases, such as requests for clearly harmful content or attempts to solicit information that could enable large-scale harm. Anthropic emphasizes that Claude will only use this ability as a last resort, after multiple redirection attempts have failed, or when a user explicitly asks Claude to end the chat.

Ongoing Experiment and Refinement

Anthropic reassures users that, even when Claude ends a chat, they can still start new conversations from the same account and create new branches of the ended conversation. The company views this feature as an ongoing experiment and pledges to keep refining its approach to ensure a productive and safe user experience.
