Hugging Face releases a benchmark for testing generative AI on health tasks

Share via:


Generative AI models are increasingly being brought to healthcare settings — in some cases prematurely, perhaps. Early adopters believe that they’ll unlock increased efficiency while revealing insights that’d otherwise be missed. Critics, meanwhile, point out that these models have flaws and biases that could contribute to worse health outcomes.

But is there a quantitative way to know how helpful, or harmful, a model might be when tasked with things like summarizing patient records or answering health-related questions?

Hugging Face, the AI startup, proposes a solution in a newly released benchmark test called Open Medical-LLM. Created in partnership with researchers at the nonprofit Open Life Science AI and the University of Edinburgh’s Natural Language Processing Group, Open Medical-LLM aims to standardize evaluating the performance of generative AI models on a range of medical-related tasks.

lockquote class=”twitter-tweet” data-width=”550″ data-dnt=”true”>

New: Open Medical LLM Leaderboard! 🩺

In basic chatbots, errors are annoyances.
In medical LLMs, errors can have life-threatening consequences 🩸

It’s therefore vital to benchmark/follow advances in medical LLMs before thinking about deployment.

Blog: https://t.co/pddLtkmhsz

— Clémentine Fourrier 🍊 (@clefourrier) April 18, 2024

lockquote>

Open Medical-LLM isn’t a from-scratch benchmark, per se, but rather a stitching-together of existing test sets — MedQA, PubMedQA, MedMCQA and so on — designed to probe models for general medical knowledge and related fields, such as anatomy, pharmacology, genetics and clinical practice. The benchmark contains multiple choice and open-ended questions that require medical reasoning and understanding, drawing from material including U.S. and Indian medical licensing exams and college biology test question banks.

“[Open Medical-LLM] enables researchers and practitioners to identify the strengths and weaknesses of different approaches, drive further advancements in the field and ultimately contribute to better patient care and outcome,” Hugging Face wrote in a blog post.

gen AI healthcare

Image Credits: Hugging Face

Hugging Face is positioning the benchmark as a “robust assessment” of healthcare-bound generative AI models. But some medical experts on social media cautioned against putting too much stock into Open Medical-LLM, lest it lead to ill-informed deployments.

On X, Liam McCoy, a resident physician in neurology at the University of Alberta, pointed out that the gap between the “contrived environment” of medical question-answering and actual clinical practice can be quite large.





Source link

Disclaimer

We strive to uphold the highest ethical standards in all of our reporting and coverage. We StartupNews.fyi want to be transparent with our readers about any potential conflicts of interest that may arise in our work. It’s possible that some of the investors we feature may have connections to other businesses, including competitors or companies we write about. However, we want to assure our readers that this will not have any impact on the integrity or impartiality of our reporting. We are committed to delivering accurate, unbiased news and information to our audience, and we will continue to uphold our ethics and principles in all of our work. Thank you for your trust and support.

Popular

More Like this

Hugging Face releases a benchmark for testing generative AI on health tasks


Generative AI models are increasingly being brought to healthcare settings — in some cases prematurely, perhaps. Early adopters believe that they’ll unlock increased efficiency while revealing insights that’d otherwise be missed. Critics, meanwhile, point out that these models have flaws and biases that could contribute to worse health outcomes.

But is there a quantitative way to know how helpful, or harmful, a model might be when tasked with things like summarizing patient records or answering health-related questions?

Hugging Face, the AI startup, proposes a solution in a newly released benchmark test called Open Medical-LLM. Created in partnership with researchers at the nonprofit Open Life Science AI and the University of Edinburgh’s Natural Language Processing Group, Open Medical-LLM aims to standardize evaluating the performance of generative AI models on a range of medical-related tasks.

lockquote class=”twitter-tweet” data-width=”550″ data-dnt=”true”>

New: Open Medical LLM Leaderboard! 🩺

In basic chatbots, errors are annoyances.
In medical LLMs, errors can have life-threatening consequences 🩸

It’s therefore vital to benchmark/follow advances in medical LLMs before thinking about deployment.

Blog: https://t.co/pddLtkmhsz

— Clémentine Fourrier 🍊 (@clefourrier) April 18, 2024

lockquote>

Open Medical-LLM isn’t a from-scratch benchmark, per se, but rather a stitching-together of existing test sets — MedQA, PubMedQA, MedMCQA and so on — designed to probe models for general medical knowledge and related fields, such as anatomy, pharmacology, genetics and clinical practice. The benchmark contains multiple choice and open-ended questions that require medical reasoning and understanding, drawing from material including U.S. and Indian medical licensing exams and college biology test question banks.

“[Open Medical-LLM] enables researchers and practitioners to identify the strengths and weaknesses of different approaches, drive further advancements in the field and ultimately contribute to better patient care and outcome,” Hugging Face wrote in a blog post.

gen AI healthcare

Image Credits: Hugging Face

Hugging Face is positioning the benchmark as a “robust assessment” of healthcare-bound generative AI models. But some medical experts on social media cautioned against putting too much stock into Open Medical-LLM, lest it lead to ill-informed deployments.

On X, Liam McCoy, a resident physician in neurology at the University of Alberta, pointed out that the gap between the “contrived environment” of medical question-answering and actual clinical practice can be quite large.





Source link

Disclaimer

We strive to uphold the highest ethical standards in all of our reporting and coverage. We StartupNews.fyi want to be transparent with our readers about any potential conflicts of interest that may arise in our work. It’s possible that some of the investors we feature may have connections to other businesses, including competitors or companies we write about. However, we want to assure our readers that this will not have any impact on the integrity or impartiality of our reporting. We are committed to delivering accurate, unbiased news and information to our audience, and we will continue to uphold our ethics and principles in all of our work. Thank you for your trust and support.

Website Upgradation is going on for any glitch kindly connect at office@startupnews.fyi

More like this

AI agents edge closer to real legal work, not...

AI agents are increasingly being tested on real legal...

Anthropic’s more than $20 billion funding to close as...

SynopsisAI startup ‍Anthropic ​is ironing out the ⁠final...

Uber, Ola, Rapido driver unions call for strike in...

A group of unions representing gig workers associated...

Popular

iptv iptv iptv iptv iptv iptv iptv iptv iptv iptv iptv iptv iptv iptv iptv iptv iptv iptv iptv iptv iptv iptv iptv iptv melhor iptv portugal lista melhor iptv portugal lista melhor iptv portugal lista melhor iptv portugal lista melhor iptv portugal lista melhor iptv portugal lista melhor iptv portugal lista melhor iptv portugal lista melhor iptv portugal lista melhor iptv portugal lista melhor iptv portugal lista melhor iptv portugal lista melhor iptv portugal lista melhor iptv portugal lista melhor iptv portugal lista melhor iptv portugal lista melhor iptv portugal lista melhor iptv portugal lista melhor iptv portugal lista melhor iptv portugal lista melhor iptv portugal lista melhor iptv portugal lista melhor iptv portugal lista melhor iptv portugal lista melhor iptv portugal lista melhor iptv portugal lista melhor iptv portugal lista melhor iptv portugal lista melhor iptv portugal lista melhor iptv portugal lista melhor iptv portugal lista melhor iptv portugal lista melhor iptv portugal lista melhor iptv portugal lista melhor iptv portugal lista melhor iptv portugal lista melhor iptv portugal lista melhor iptv portugal lista melhor iptv portugal lista melhor iptv portugal lista melhor iptv portugal lista melhor iptv portugal lista melhor iptv portugal lista melhor iptv portugal lista melhor iptv portugal lista melhor iptv portugal lista melhor iptv portugal lista melhor iptv portugal lista melhor iptv portugal lista melhor iptv portugal lista melhor iptv portugal lista melhor iptv portugal lista melhor iptv portugal lista melhor iptv portugal lista melhor iptv portugal lista melhor iptv portugal lista melhor iptv portugal lista melhor iptv portugal lista melhor iptv portugal lista melhor iptv portugal lista melhor iptv portugal lista melhor iptv portugal lista melhor iptv portugal lista melhor iptv portugal lista melhor iptv portugal lista melhor iptv portugal lista best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv best iptv iptv iptv iptv iptv iptv iptv iptv iptv iptv iptv iptv iptv iptv portugal iptv portugal iptv portugal iptv portugal iptv portugal iptv portugal iptv portugal iptv portugal iptv portugal iptv portugal iptv portugal iptv portugal iptv iptv iptv iptv iptv iptv iptv iptv iptv iptv iptv iptv