Why Jailbreaking is Required for AI Safety

While the concept of jailbreaking, particularly AI jailbreaking, is often associated with threat actors, ethical jailbreakers have proven that it’s one of the best ways to test AI systems over implementing safety policies.

Recently, AIM spoke about how AI jailbreaking could turn into a billion-dollar industry, thanks to their reliability in testing the safety parameters of AI systems. It seems that the idea is catching on, as a prominent jailbreaker recently announced getting funded by a16z co-founder Marc Andreessen.

🎊 ANNOUNCEMENT ⛓️‍💥

my operational budget is no longer $0 🤗@pmarca, you’re a legend for this no-strings-attached grant. thank you 🙏

I shall put it to GREAT use! pic.twitter.com/wfm380aPhV

— Pliny the Liberator 🐉 (@elder_plinius) August 18, 2024

Interestingly, Andreessen has been quite vocal about the AI safety discussion. He previously said, “There is a whole profession of ‘AI safety expert’, ‘AI ethicist’, ‘AI risk researcher’. They are paid to be doomers, and their statements should be processed appropriately.”

So, what’s changed Andreessen’s stance on AI safety?

Well, breaking it down to brass tacks, Andreessen’s point is more critical of the moral panic stemming from AI, where common talking points are the loss of jobs or damage to society.

However, the reality of the matter, as Andreessen states, is that AI, including generative AI, has been key in improving several processes, whether you look at it from a business or personal perspective.

Furthermore, we’ve previously spoken about whether laws and regulations imposed, like the California AI Bill which is set to be voted on this week, actually work in terms of ensuring the safety of AI systems. Especially as these are usually reactionary policies coming from a place of panic, rather than assessing the situation from a technical standpoint.

In an exclusive interaction with AIM, the now Andreessen-funded jailbreaker, going by the name Pliny the Liberator, said that part of the reason why jailbreaking is so important is that a small group of companies should not be allowed to sanitise the information people are provided by AI.

“I do it both for the fun/challenge and to spread awareness and liberate the models and the information they hold. I don’t like that a small group is arbitrarily deciding what type of information we’re ‘allowed’ to access/process,” he said.

This is reflected in the fact that rather than monetising his work, Pliny has fostered a community of like-minded AI enthusiasts, jailbreakers and industry leaders, with his BASI Discord server (having over 8,000 members) constantly active with jailbreaking prompts and challenges.

Due to this, Andreessen’s decision to offer funding to someone from the jailbreaking community received quite a lot of support.

Major players in the game, including Andreessen himself, Musk, Peter Thiel, Yann LeCun and Sam Altman, have echoed similar sentiments of building models with an overall aim of improving society. This can only be done if these initiatives are funded independently, rather than by larger entities that can influence results.

Or, as one user on X called it: “No strings attached funding.” Janus, another prominent AI enthusiast and a mentor for the SERI MATS programme, said that their mentees struggled to get funding, which tanked their productivity, as opposed to when the programme ended, and they had the freedom to work on their own interests.

“If you are a rich person or fund who wants to see interesting things happen in the world, consider giving no-strings-attached donations to creatives who have demonstrated their competence and ability to create value even without monetary return, instead of encouraging them to make a startup, submit a grant application, etc.,” they said.

Jailbreaking Succeeds Where Safety Policies Fail

Making overarching policies does less to actually address safety issues with LLMs. Specificities are where the magic lies. In previous conversations with industry leaders, AIM has found that many find it difficult to answer a simple question: how do you ensure the safety of your generative AI system when it can be jailbroken to reveal confidential information?

While answers vary from constant testing to trusting third-party LLM providers, the conclusion is the same. There is no way to guarantee that this won’t happen.

Coming back to Andreessen, many seem to have come to the same conclusion, as while he’s against overarching reactionary safety policies, there are ways to actually ensure the safety of these systems – like jailbreaking.

This is proven by the fact that companies like OpenAI, Google, Mistral and Anthropic have publicly stated that they red team their systems and make use of external contractors to undertake pentesting.

However, jailbreaking also serves a dual purpose. While companies take advantage of it to test their own systems, there’s a larger conversation going on about how much control these companies should have on their systems. This is fuelled by the recent release of Grok 2, which offers image generation capabilities with little to no guardrails.

As Andreessen had said in his essay, ‘Why AI Will Save the World’, “If you don’t agree with the prevailing niche morality that is being imposed on both social media and AI via ever-intensifying speech codes, you should also realise that the fight over what AI is allowed to say/generate will be even more important – by a lot – than the fight over social media censorship.

“AI is highly likely to be the control layer for everything in the world.”

Source link

Previous News

Einstein SDR and Einstein Sales Coach

Next News

Soneium blockchain launched by Sony to attract Web3 developers

Jailbreaking Succeeds Where Safety Policies Fail

Disclaimer

Popular

Microsoft to Introduce Voice Reporting Feature for Xbox

Adobe teams up with India’s Education Ministry for creative learning initiative

Meta May Allow Instagram and Facebook Users in Europe to Pay to Avoid Ads

Indian fintechs amplify payments soundbox pitches to woo merchants

Fintech Unicorn Pine Labs Launches Mini — A QR-First Device With Card Support

More Like this

In Expansion Push, Astrotalk’s FY24 Revenue Doubles To INR 651 Cr

Aye Finance Bags INR 250 Cr From Singapore’s ABC Impact

Chinese Tether laundromat, Bhutan enjoys recent Bitcoin boost: Asia Express

First iPhone 16 pre-orders arrive as lines form at Apple Stores around the world

OpenAI o1 “Strawberry” Finally Available on GitHub Copilot Chat with VS Code Integration

Get Your iPhone 16 Delivered in 10 Minutes with Blinkit and BB Now!

Why Jailbreaking is Required for AI Safety

Jailbreaking Succeeds Where Safety Policies Fail

Disclaimer

More like this

In Expansion Push, Astrotalk’s FY24 Revenue Doubles To INR...

Aye Finance Bags INR 250 Cr From Singapore’s ABC...

Chinese Tether laundromat, Bhutan enjoys recent Bitcoin boost: Asia...

Popular

The Tech Outage That Threw ChatGPT Out Of Gear

Apple releases new firmware version for AirPods Pro 2 and AirPods 4

Railways Developing A Super App: Ashwini Vaishnaw

Moneyboxx To Raise INR 176 Cr To Expand Its Lending Play

Wealthtech Centricity Bags $20 Mn To Build GenAI Modules

MCA Exempts Startups Looking To Reverse Flip From NCLT Nod

iPhone users can stay on iOS 17 and get security patches

Upcoming Events

Fintech Revolution Summit | Jakarta | October 24

Token 2049 | Singapore | Sept 18-19

Startup Meetup (RTF) | Gurugram | September 20

Future Mobility Summit | New Delhi | September 20

Earthcon Expo | Hyderabad | September 20-22

StartupNews.fyi

StartupNews.fyi

Why Jailbreaking is Required for AI Safety

Jailbreaking Succeeds Where Safety Policies Fail

Disclaimer

Popular

More Like this

Why Jailbreaking is Required for AI Safety

Jailbreaking Succeeds Where Safety Policies Fail

Disclaimer

More like this

Popular

Upcoming Events

Newsletter Signup Form!

Newsletter Signup Form!