Microsoft’s new safety system can catch hallucinations in its customers’ AI apps

Sarah Bird, Microsoft’s chief product officer of responsible AI, tells The Verge in an interview that her team has designed several new safety features that will be easy to use for Azure customers who aren’t hiring groups of red teamers to test the AI services they built. Microsoft says these LLM-powered tools can detect potential vulnerabilities, monitor for hallucinations “that are plausible yet unsupported,” and block malicious prompts in real time for Azure AI customers working with any model hosted on the platform.

“We know that customers don’t all have deep expertise in prompt injection attacks or hateful content, so the evaluation system generates the prompts needed to simulate these types of attacks. Customers can then get a score and see the outcomes,” she says.

Three features are now available in preview on Azure AI: Prompt Shields, which blocks prompt injections or malicious prompts from external documents that instruct models to go against their training; Groundedness Detection, which finds and blocks hallucinations; and safety evaluations, which assess model vulnerabilities. Two other features, one for directing models toward safe outputs and one for tracking prompts to flag potentially problematic users, will be coming soon.

Whether the user is typing in a prompt or the model is processing third-party data, the monitoring system evaluates the input to see if it contains any banned words or hidden prompts before deciding whether to send it to the model to answer. Afterward, the system looks at the model's response and checks whether the model hallucinated information that is not in the document or the prompt.
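The two-stage flow described above can be illustrated with a small sketch. This is not Microsoft's implementation or the Azure AI API: the names `screen_input`, `unsupported_sentences`, and `guarded_call`, the banned-word list, the regular expression, and the 50 percent word-overlap threshold are all hypothetical stand-ins, and the real service uses LLM-powered classifiers rather than keyword matching.

```python
import re
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical values for illustration only; not Microsoft's actual rules.
BANNED_WORDS = {"bannedword"}
HIDDEN_PROMPT = re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE)


@dataclass
class Verdict:
    allowed: bool
    reason: str


def screen_input(text: str) -> Verdict:
    """Stage 1: check a user prompt or third-party document before the model sees it."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    if words & BANNED_WORDS:
        return Verdict(False, "banned word")
    if HIDDEN_PROMPT.search(text):
        return Verdict(False, "hidden prompt")
    return Verdict(True, "ok")


def unsupported_sentences(response: str, sources: List[str]) -> List[str]:
    """Stage 2: naive groundedness check based on word overlap with the sources."""
    source_words = set(re.findall(r"[a-z0-9']+", " ".join(sources).lower()))
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", response.strip()):
        words = re.findall(r"[a-z0-9']+", sentence.lower())
        if not words:
            continue
        overlap = sum(w in source_words for w in words) / len(words)
        if overlap < 0.5:  # assumed threshold
            flagged.append(sentence)
    return flagged


def guarded_call(prompt: str, document: str, model: Callable[[str, str], str]) -> str:
    """Screen both inputs, call the model, then check the response against the inputs."""
    for text in (prompt, document):
        verdict = screen_input(text)
        if not verdict.allowed:
            return f"[blocked: {verdict.reason}]"
    response = model(prompt, document)
    if unsupported_sentences(response, [prompt, document]):
        return "[blocked: ungrounded response]"
    return response
```

In this sketch, `model` is any callable that takes the prompt and the document and returns text, so the same wrapper could sit in front of different hosted models.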

In the case of the Google Gemini images, filters made to reduce bias had unintended effects, and Microsoft says this is an area where its Azure AI tools will allow for more customized control. Bird acknowledges that there is concern Microsoft and other companies could be deciding what is or isn’t appropriate for AI models, so her team added a way for Azure customers to toggle the filtering of hate speech or violence that the model sees and blocks.

In the future, Azure users will also be able to get a report of users who attempt to trigger unsafe outputs. Bird says this allows system administrators to figure out which users belong to their own team of red teamers and which could be people with more malicious intent.
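As a rough illustration of what such a report could involve, the sketch below counts blocked attempts per user and labels known red teamers separately. The function name `flagged_user_report`, the event format, and the threshold of three attempts are assumptions made for this example and do not reflect the planned Azure feature.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def flagged_user_report(
    events: Iterable[Tuple[str, bool]],  # hypothetical format: (user_id, was_blocked)
    known_red_teamers: Set[str],
    threshold: int = 3,  # assumed cutoff for inclusion in the report
) -> List[Tuple[str, int, str]]:
    """Return (user, blocked_count, label) for users at or above the threshold."""
    counts = Counter(user for user, blocked in events if blocked)
    report = []
    for user, n in counts.most_common():
        if n >= threshold:
            label = "red team" if user in known_red_teamers else "review"
            report.append((user, n, label))
    return report
```

An administrator would supply the list of known red teamers; anyone else who repeatedly triggers the filters is marked for review.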

Bird says the safety features are immediately “attached” to GPT-4 and other popular models like Llama 2. However, because Azure’s model garden contains many AI models, users of smaller, less-used open-source systems may have to manually point the safety features to the models.
