Databricks releases open-source Dolly 2.0, an instruction-following large language model for commercial use

Databricks, the company founded by the creators of Apache Spark, has released Dolly 2.0, reportedly the first open-source, instruction-following large language model (LLM) for commercial use that has been fine-tuned on a human-generated data set. Dolly could serve as a compelling starting point for homebrew ChatGPT competitors.

Dolly 2.0 is a 12-billion-parameter model based on EleutherAI’s Pythia model family. It is fine-tuned exclusively on a training data set called “databricks-dolly-15k,” which was crowdsourced from Databricks employees. This fine-tuning improves Dolly’s ability to answer questions and engage in dialogue as a chatbot, bringing its behavior more in line with OpenAI’s ChatGPT.

Dolly 1.0 faced limitations regarding commercial use due to the training data, which contained output from ChatGPT and was subject to OpenAI’s terms of service. To address this issue, Databricks crowdsourced over 13,000 demonstrations of instruction-following behavior from more than 5,000 of its employees between March and April 2023.

The resulting data set, along with Dolly’s model weights and training code, has been released fully open source under a Creative Commons license, enabling anyone to use, modify, or extend the data set for any purpose, including commercial applications.

Dolly’s open-source nature sets it apart from proprietary models like OpenAI’s ChatGPT, which requires users to pay for API access and adhere to specific terms of service. Additionally, Meta’s LLaMA, which recently spawned a wave of derivatives after its weights leaked on BitTorrent, does not allow commercial use.

AI researcher Simon Willison called Dolly 2.0 “a really big deal” on Mastodon, praising its fine-tuning instruction set, which was hand-built by 5,000 Databricks employees and released under a CC license. This release could inspire more companies to develop and release their own LLMs, enabling businesses and organizations to create and customize their own chatbots without relying on third-party services.

Disclaimer

We strive to uphold the highest ethical standards in all of our reporting and coverage. We at StartupNews.fyi want to be transparent with our readers about any potential conflicts of interest that may arise in our work. It’s possible that some of the investors we feature may have connections to other businesses, including competitors or companies we write about. However, we want to assure our readers that this will not have any impact on the integrity or impartiality of our reporting. We are committed to delivering accurate, unbiased news and information to our audience, and we will continue to uphold our ethics and principles in all of our work. Thank you for your trust and support.
