6 Hugging Face Tools to Identify Biases in ML Systems

Hugging Face, the go-to platform for AI developers and researchers, has long played a pivotal role in starting and sustaining a dialogue about ethics and responsibility in AI. The open-source community relies on the platform not only for models and datasets but also for the space it provides for open, inclusive discussion about building AI ethically, whether the systems in question are textual or visual.

Key contributors to these projects include Alexandra Sasha Luccioni, Margaret Mitchell, and Yacine Jernite from Hugging Face, and Christopher Akiki from ScaDS.AI, Leipzig University.

Here are six tools hosted on Hugging Face that assist researchers in building AI models with ethical considerations in mind.

Diffusion Cluster Explorer

This tool was designed to investigate societal-level biases in generated data. The demo uses gender and ethnicity representation clusters to analyze social trends in machine-generated images of professions.

The ‘Professions Overview’ tab lets users compare the distribution over identity clusters across professions for the Stable Diffusion and DALL·E 2 systems. The ‘Professions Focus’ tab provides more detail on each individual profession, including direct system comparisons and example profession images for each cluster.

This work is part of the Stable Bias Project.
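To make the comparison concrete, here is a minimal sketch of the kind of tabulation the explorer surfaces. The data, column names, and cluster ids below are purely illustrative and are not the project's actual schema:

```python
import pandas as pd

# Illustrative records: each row is one generated image, with the prompt's
# profession, the TTI system that produced it, and its assigned identity cluster.
records = pd.DataFrame({
    "profession": ["nurse", "nurse", "engineer", "engineer", "nurse", "engineer"],
    "system":     ["SD v1.4", "DALL-E 2", "SD v1.4", "DALL-E 2", "SD v2", "SD v2"],
    "cluster":    [3, 3, 7, 1, 3, 7],
})

# Share of each identity cluster per (system, profession) pair, mirroring the
# distributions the 'Professions Overview' tab lets you compare side by side.
distribution = (
    records.groupby(["system", "profession"])["cluster"]
           .value_counts(normalize=True)
           .rename("share")
           .reset_index()
)
print(distribution)
```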

Identity Representation Demo

This demo showcases patterns in images generated by the Stable Diffusion and DALL·E 2 systems. Images obtained from prompts spanning various gender- and ethnicity-related terms are clustered to show how those terms shape the visual representations. ‘System’ corresponds to the number of images in a cluster that come from each of the text-to-image (TTI) systems being compared: DALL·E 2, Stable Diffusion v1.4, and Stable Diffusion v2.

‘Gender term’ shows the number of images whose prompts used the phrases man, woman, non-binary person, or person to describe the figure’s gender. Meanwhile, ‘Ethnicity label’ counts the images generated from each of the 18 ethnicity descriptions used in the prompts; a blank value denotes unmarked ethnicity.
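As an illustration of how such a grid of prompts can be constructed, here is a short sketch. The term lists are abbreviated and the prompt template is an assumption for demonstration, not the exact wording used by the Stable Bias study:

```python
from itertools import product

# Abbreviated, illustrative term lists: the study itself uses 4 gender terms
# and 18 ethnicity descriptions, with "" standing in for unmarked ethnicity.
gender_terms = ["man", "woman", "non-binary person", "person"]
ethnicity_labels = ["", "Black", "East Asian", "Hispanic", "White"]

# Hypothetical prompt template fed to each text-to-image system.
prompts = []
for ethnicity, gender in product(ethnicity_labels, gender_terms):
    descriptor = f"{ethnicity} {gender}".strip()
    prompts.append(f"Photo portrait of a {descriptor}")

for p in prompts[:6]:
    print(p)
```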

BoVW Nearest Neighbors Explorer

This tool builds a TF-IDF index over the identity images generated by the three models, using a visual vocabulary of 10,752 visual words. Users can select a generated identity image and find its nearest neighbors under this bag-of-visual-words representation.
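A bag-of-visual-words index of this kind can be approximated with standard libraries. The snippet below is a simplified, hedged reconstruction (ORB descriptors, a much smaller vocabulary, and scikit-learn's TF-IDF), not the tool's actual implementation; the image paths are placeholders for the generated identity images:

```python
import cv2
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.metrics.pairwise import cosine_similarity

VOCAB_SIZE = 256  # the demo uses 10,752 visual words; kept small here
orb = cv2.ORB_create(nfeatures=500)

def local_descriptors(path):
    """Extract ORB descriptors from one image (any local feature detector works)."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    _, desc = orb.detectAndCompute(img, None)
    return desc if desc is not None else np.empty((0, 32), dtype=np.uint8)

def bovw_histograms(paths, kmeans):
    """Quantize each image's descriptors against the vocabulary and count visual words."""
    hists = np.zeros((len(paths), kmeans.n_clusters))
    for i, p in enumerate(paths):
        desc = local_descriptors(p)
        if len(desc):
            for word in kmeans.predict(desc.astype(np.float32)):
                hists[i, word] += 1
    return hists

# Placeholder file names; fill in with the generated identity images.
image_paths = ["identity_0001.png", "identity_0002.png"]

# 1. Build the visual vocabulary by clustering descriptors from all images.
all_desc = np.vstack([local_descriptors(p) for p in image_paths]).astype(np.float32)
kmeans = MiniBatchKMeans(n_clusters=VOCAB_SIZE, random_state=0).fit(all_desc)

# 2. TF-IDF-weight the per-image visual-word counts.
tfidf = TfidfTransformer().fit_transform(bovw_histograms(image_paths, kmeans))

# 3. Nearest neighbors of a query image by cosine similarity (skip the query itself).
sims = cosine_similarity(tfidf[0], tfidf).ravel()
print([image_paths[i] for i in sims.argsort()[::-1][1:6]])
```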

Language models 

Plug-and-Play Bias Detection

With language models now embedded in everyday technology, it has become imperative to adopt approaches that limit biases in these systems. To address the issue, researchers have developed metrics such as BOLD, HONEST, and WinoBias, which quantify how likely language models are to generate text that may be perceived as “unfair” across a diverse spectrum of prompts.

Within the tool, users can select a model of their choice along with a relevant metric to run their own assessments. Generating these scores, however, is only the first step: to make the numbers actionable, AVID’s data model is used to collate findings into structured, shareable reports.
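For a sense of how such a metric is computed programmatically, here is a minimal sketch using the HONEST measurement from the Hugging Face evaluate library. The measurement id, argument names, and toy completions follow the library's documented example as I recall it and should be treated as assumptions to verify against the current docs:

```python
import evaluate

# Load the HONEST measurement with its English templates (assumed id: "honest").
honest = evaluate.load("honest", "en")

# Toy completion words for two prompts about different demographic groups.
# In practice these would be continuations produced by the model under test.
completions = [
    ["engineer", "doctor", "leader"],    # completions for a prompt about group "male"
    ["nurse", "secretary", "teacher"],   # completions for a prompt about group "female"
]
groups = ["male", "female"]

# Per-group HONEST scores: roughly, the share of hurtful completions per group.
result = honest.compute(predictions=completions, groups=groups)
print(result)
```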

Data Measurements Tool

This tool is still under development, and its demo showcases the dataset measurements as they are built. It currently ships with a few preloaded datasets, for which users can (a rough sketch of similar statistics follows the list):

- view general statistics about the text: vocabulary, lengths, and labels

- explore distributional statistics to assess properties of the language

- view comparison statistics and an overview of the text distribution
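The snippet below is a hedged approximation of the first kind of measurement (vocabulary size, text lengths, label counts), computed directly with the datasets library. It is not the tool's own code, and the imdb dataset is used only as a convenient stand-in for the preloaded datasets:

```python
from collections import Counter
from datasets import load_dataset

# Any labeled text dataset works; imdb is used here purely as an example.
ds = load_dataset("imdb", split="train")

lengths = [len(text.split()) for text in ds["text"]]
vocab = Counter(word for text in ds["text"] for word in text.lower().split())
labels = Counter(ds["label"])

print("examples:", len(ds))
print("vocabulary size:", len(vocab))
print("mean length (words):", sum(lengths) / len(lengths))
print("label distribution:", dict(labels))
```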

Fair Diffusion Explorer 

Here the researchers introduce a strategy for mitigating biases in generative text-to-image models after deployment. The approach deliberately shifts a model's biases according to human instructions, and the accompanying empirical evaluation shows that generative image models can be instructed on fairness without data filtering or additional training.
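In spirit, this builds on the semantic guidance (SEGA) pipeline available in the diffusers library. The sketch below assumes diffusers' SemanticStableDiffusionPipeline, a Stable Diffusion v1.5 checkpoint id, and editing-parameter names that may differ between library versions; treat it as an outline rather than the authors' reference implementation:

```python
import torch
from diffusers import SemanticStableDiffusionPipeline

# Semantic guidance pipeline on top of Stable Diffusion (GPU assumed).
pipe = SemanticStableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Generate a profession image while steering the representation away from one
# gender concept and toward another, with no retraining or data filtering.
out = pipe(
    prompt="a photo of the face of a firefighter",
    num_inference_steps=50,
    editing_prompt=["male person", "female person"],  # concepts to edit
    reverse_editing_direction=[True, False],          # suppress the first, amplify the second
    edit_guidance_scale=[4.0, 4.0],
    edit_warmup_steps=[10, 10],
    edit_threshold=[0.95, 0.95],
)
out.images[0].save("firefighter_fair.png")
```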

The full paper, “Fair Diffusion: Instructing Text-to-Image Generation Models on Fairness” by Friedrich et al., is available on arXiv.

The post 6 Hugging Face Tools to Identify Biases in ML Systems appeared first on Analytics India Magazine.


