OpenAI develops tool to explain black box behavior of large language models


OpenAI is developing a tool to provide insight into the “black box” workings of large language models (LLMs), such as OpenAI’s ChatGPT. The tool aims to automatically identify which components of an LLM are responsible for specific behaviours.

The code to run the tool has been made available in open-source form on GitHub. William Saunders, the interpretability team manager at OpenAI, explained that the company is looking to anticipate problems that could arise with AI systems, ensuring that its models can be trusted.

OpenAI’s tool breaks models down into their individual components (neurons) and attempts to explain what each one does. It runs text sequences through the model being inspected to find the tokens on which a given neuron activates most frequently, then uses OpenAI’s latest text-generating AI model, GPT-4, to produce a natural-language explanation of that pattern.
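In outline, that first step looks roughly like the sketch below. This is a minimal illustration of the idea rather than OpenAI’s actual implementation: `get_neuron_activations` is a hypothetical helper standing in for a forward pass over the model being inspected (e.g. GPT-2), and the prompt wording is illustrative only.

```python
# Minimal sketch of the "explain" step; not OpenAI's actual code.
# get_neuron_activations is a hypothetical helper that runs a forward
# pass over the inspected model and returns (token, activation) pairs
# for a single neuron.

def build_explanation_prompt(texts, get_neuron_activations):
    """Format the tokens a neuron fires on so an explainer model
    (GPT-4 in OpenAI's setup) can describe the pattern in words."""
    lines = []
    for text in texts:
        for token, activation in get_neuron_activations(text):
            if activation > 0:  # keep only tokens where the neuron fires
                lines.append(f"{token}\t{activation:.2f}")
    return (
        "Below are tokens and one neuron's activation on each.\n"
        "In a short phrase, describe what this neuron responds to.\n\n"
        + "\n".join(lines)
    )
```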

To test the accuracy of these explanations, the tool then works in reverse: it uses GPT-4 to simulate how a neuron matching the explanation would behave on new text sequences, and compares the simulated activations with the neuron’s real ones. The tool has been used to generate explanations for all 307,200 neurons in OpenAI’s GPT-2 model, and the resulting dataset has been released alongside the tool’s code.
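The scoring idea can be sketched as follows: given only the explanation, the simulator predicts the neuron’s activations on held-out text, and the explanation is scored by how closely the simulated trace matches the real one. The correlation-based comparison below is a simplified stand-in for OpenAI’s published scoring procedure, not the procedure itself.

```python
import numpy as np

def explanation_score(real_activations, simulated_activations):
    """Score an explanation by correlating the neuron's real activations
    with activations simulated from the explanation alone (a simplified
    version of the comparison described in OpenAI's write-up)."""
    real = np.asarray(real_activations, dtype=float)
    sim = np.asarray(simulated_activations, dtype=float)
    if real.std() == 0 or sim.std() == 0:
        return 0.0  # no variation to correlate; treat as uninformative
    return float(np.corrcoef(real, sim)[0, 1])

# A simulation that tracks the real trace scores near 1.0; a flat or
# random simulation scores near 0, flagging a poor explanation.
real = [0.1, 0.9, 0.0, 0.8, 0.2]
simulated = [0.0, 1.0, 0.1, 0.7, 0.3]
print(round(explanation_score(real, simulated), 2))  # ~0.97
```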

Researchers say that tools like this could eventually be used to improve an LLM’s performance, for example by reducing bias or toxicity. However, the tool is still in its early stages, and the researchers acknowledge that there is a long way to go before it is genuinely useful.

The tool produced confident explanations for only about 1,000 neurons, a small fraction of the total. While some might argue that the tool amounts to an advertisement for GPT-4, the researchers insist that this is not the case. Jeff Wu, who leads OpenAI’s scalable alignment team, said that the tool’s use of GPT-4 is incidental and, if anything, demonstrates GPT-4’s weaknesses in this area. He also said that the tool was not created with commercial applications in mind and could in principle be adapted for use with LLMs other than GPT-4.

Despite the tool’s limitations, the researchers hope that it will open up a new avenue for addressing interpretability in an automated way.

They aim to provide good explanations not only of what neurons are responding to but also of the overall behaviour of these models, including how specific neurons affect one another. While larger, more complex models present additional challenges, the researchers believe the tool could be adapted to address these in time.
