Time To Scale Down Large Language Models – AIM



Renowned research scientist Andrej Karpathy recently said that the llm.c project showcases how GPT-2 can now be trained in merely 24 hours on a single 8XH100 GPU node, for just $672.
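The quoted figures imply an hourly rental rate that is easy to back out. The following is a rough sketch of that arithmetic, assuming the $672 is purely GPU rental spread evenly over the run; the per-hour rates are derived here and are not stated in Karpathy's post.

```python
# Figures quoted in the article: 24 hours on one node of 8 H100 GPUs for $672.
total_cost_usd = 672
hours = 24
gpus_per_node = 8

# Assumption: the whole cost is GPU rental, billed evenly per hour.
node_hour_rate = total_cost_usd / hours          # cost of the whole node per hour
gpu_hour_rate = node_hour_rate / gpus_per_node   # implied cost per GPU per hour
gpu_hours = hours * gpus_per_node                # total GPU-hours consumed

print(f"{gpu_hours} GPU-hours at ${gpu_hour_rate:.2f} per GPU-hour "
      f"(${node_hour_rate:.2f} per node-hour)")
# prints: 192 GPU-hours at $3.50 per GPU-hour ($28.00 per node-hour)
```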

Karpathy’s journey began with an interest in reproducing OpenAI’s GPT-2 for educational purposes. He initially encountered obstacles in using PyTorch, a popular deep-learning framework. 

Frustrated by these challenges, Karpathy decided to write the entire training process from scratch in C/CUDA, resulting in the creation of the llm.c project. It eventually evolved into a streamlined, efficient system for training language models.

The project, which implements GPT training in C/CUDA, has minimal setup requirements and offers efficient and cost-effective model training.

Scaling down LLMs 

In his post, Karpathy mentioned how advancements in hardware (H100 GPUs), software (CUDA, cuBLAS, cuDNN, FlashAttention), and data quality have drastically reduced training costs. 

Mauro Sicard, the director of BRIX Agency, agreed with Karpathy. “With the improvements in both GPUs and training optimisation, the future may surprise us,” he said.

Scaling down LLMs while maintaining performance is a crucial step in making AI more accessible and affordable. 


According to Meta engineer Mahima Chhagani, LLMLingua is a method designed to efficiently decrease the size of prompts without sacrificing significant information. 

Chhagani said using an LLM cascade, starting with affordable models like GPT-2 and escalating to more powerful ones like GPT-3.5 Turbo and GPT-4 Turbo, optimises cost by only using expensive models when necessary.
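The cascade idea can be sketched in a few lines. This is a minimal illustration, not Chhagani's implementation: the `Tier` class, the stub models, the prices, and the confidence threshold are all hypothetical, and a real system would need its own way of scoring whether a cheap model's answer is good enough.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Tier:
    name: str
    cost_per_call: float  # hypothetical price in arbitrary units
    answer: Callable[[str], Tuple[str, float]]  # returns (text, confidence in [0, 1])


def cascade(prompt: str, tiers: List[Tier], threshold: float = 0.8):
    """Try tiers from cheapest to most expensive; stop at the first confident answer."""
    if not tiers:
        raise ValueError("at least one tier is required")
    spent = 0.0
    for tier in tiers:
        text, confidence = tier.answer(prompt)
        spent += tier.cost_per_call
        if confidence >= threshold:
            break  # good enough, so the more expensive tiers are never called
    return text, tier.name, spent


# Stub models standing in for real API calls (assumptions for illustration only).
def small_model(prompt: str) -> Tuple[str, float]:
    # Pretend the cheap model is only confident on short prompts.
    confidence = 0.9 if len(prompt.split()) <= 5 else 0.3
    return "small: " + prompt, confidence


def large_model(prompt: str) -> Tuple[str, float]:
    return "large: " + prompt, 0.95


tiers = [Tier("small", 1.0, small_model), Tier("large", 30.0, large_model)]

easy = cascade("What is 2 + 2?", tiers)
hard = cascade("Summarise the trade-offs between training cost and model capability in detail", tiers)

print(easy[1], easy[2])  # small 1.0
print(hard[1], hard[2])  # large 31.0
```

The hard prompt costs slightly more than calling the large model directly, since the cheap attempt is also paid for; the saving comes from the share of requests that never escalate.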

FrugalGPT is another approach that uses multiple APIs to balance cost and performance, reducing costs by up to 98% while maintaining a performance comparable to GPT-4. 

Additionally, a Reddit developer named pmarks98 used a fine-tuning approach with tools like OpenPipe and models like Mistral 7B, cutting costs by up to 88%.

Is there a Real Need to Reduce Costs?

Cheaper LLMs, especially open-source models, often have limited capabilities compared to the proprietary models from tech giants like OpenAI or Google. 

While the upfront costs may be lower, running a cheap LLM locally can lead to higher long-term costs due to the need for specialised hardware, maintenance overheads, and limited scalability.

Moreover, as pointed out by Princeton professor Arvind Narayanan, the focus has shifted from capability improvements to massive cost reductions, which many AI researchers find disappointing.

Cost over Capability Improvements

Narayanan argued that cost reductions are more exciting and impactful for several reasons. They often lead to improved accuracy in many tasks. Lower costs can also accelerate the pace of research by making it more affordable and making more functionalities accessible.

So, in terms of what will make LLMs more useful in people’s lives, cost is hands down more significant at this stage than capability, he said.

In another post, Narayanan said that the cheaper a resource gets, the more demand there will be for it. Maybe in the future it will be common to build applications that invoke LLMs millions of times in the process of completing a simple task.
This democratisation of AI could accelerate faster than we imagined, possibly leading to personal AGIs for $10 by 2029.






