How Anthropic found a trick to get AI to give you answers it's not supposed to - StartupNews.fyi

If you build it, people will try to break it. Sometimes the people building it are the ones breaking it. Such is the case with Anthropic, whose latest research demonstrates an interesting vulnerability in current LLM technology. More or less, if you keep at a question, you can break guardrails and wind up with large language models telling you things they are designed not to, like how to build a bomb.

Of course, given progress in open-source AI technology, you can spin up your own LLM locally and ask it whatever you want, but for more consumer-grade products this is an issue worth pondering. What’s fun about AI today is the quick pace at which it is advancing, and how well, or not, we’re doing as a species at understanding what we’re building.

If you’ll allow me the thought, I wonder if we’re going to see more questions and issues of the type that Anthropic outlines as LLMs and other new AI model types get smarter and larger. Which is perhaps repeating myself. But the closer we get to more generalized AI, the more it should resemble a thinking entity and not a computer that we can program, right? If so, might we have a harder time nailing down edge cases, to the point where that work becomes infeasible? Anyway, let’s talk about what Anthropic recently shared.
