Why OpenAI o1 Sucks at Coding


While OpenAI’s o1 series of models is known for its strong reasoning capabilities, several developers have reported that these models, especially o1-mini, are not the best option for programming-related tasks.

Slow responses continue to irk developers, but the issue goes beyond response time. A developer wrote on Hacker News that the o1-preview model hallucinated to the point of responding in the context of non-existent libraries and functions.

“It’s the usual string of ‘You’re absolutely correct, and I apologise for the oversight in my previous response.’ While the reasoning may have been improved, this doesn’t solve the problem of the model having no way to assess if what it conjures up from its weights is factual or not,” he explained further.
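One narrow slice of this problem can be checked mechanically. The sketch below is an illustration rather than anything the quoted developer describes: it parses generated Python code and reports imported top-level modules that do not resolve in the current environment. The function name `unresolved_imports` and the sample snippet are hypothetical. It catches only missing modules, not invented functions inside real libraries.

```python
import ast
import importlib.util


def unresolved_imports(source: str) -> list[str]:
    """Return top-level module names imported by `source` that cannot be found."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Absolute "from x import y"; relative imports are skipped.
            names.add(node.module.split(".")[0])

    missing = []
    for name in sorted(names):
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            found = False
        if not found:
            missing.append(name)
    return missing


# Hypothetical model output that imports a library which does not exist.
generated = "import json\nimport totally_made_up_lib_xyz\nfrom os import path\n"
print(unresolved_imports(generated))
```

A check like this only tells you that a package is absent locally, so a non-empty result is a prompt to look closer, not proof of a hallucination.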

ChatGPT 4o is still better than the o1 model

For such reasons, some developers are calling the o1 models overhyped. Moutaz Alkhatib, the lead software developer at Yieldlove, said he regrets purchasing the ChatGPT Plus tier, which he bought specifically to use the o1 models, and that he will not renew it.

The ‘Thinking’ Part

When AIM compared multiple LLMs on LiveBench’s code completion tests, the results were surprising: o1-mini ranked below the open-source model Qwen2-72B and GPT-4.

comparing o1 models with others

For developers working against deadlines, response time matters most. But even setting response time aside, multiple developers have mentioned that the model gets stuck after the thinking phase and does not respond at all.

Mike Young, while reviewing the o1 models, mentioned that the increased response time during the thinking stage can be a big deterrent, especially when you require quick answers. “The model sometimes gets stuck in thinking mode and never returns a response—happening about 40% of the time. It acts like it’s done processing, but the answer never comes – it’s often just a blank reply or just a few characters,” he added further.

o1 model stuck at thinking part

A Reddit user who used the o1 model to build an app said the experience was worse than with the free version of ChatGPT.

“I am building an app (which I have no idea how to do since I am an embedded engineer), and o1 has been worse than even the free GPT-4 in that regard, and I have to be very, very specific with the prompt while working with o1,” he added, further suggesting that unless you are very specific about minute details, the o1 model can be a nightmare for developing an app.

Even setting aside the higher token usage and the slower responses, the o1 models still generate buggy code despite reasoning being their headline feature.

o1 takes a while to fix a bug in code it generated itself

o1 is the Architect, Claude is the Developer

Dan McAteer, a software developer on X, mentioned that he is using o1-mini as an architect for his project. All he had to do was explain the project requirements to the model, and it generated a detailed design document with step-by-step instructions for each module. 

McAteer then uses Claude Sonnet 3.5 as the developer, generating code from the architectural document produced by o1-mini.

“This works well because Sonnet 3.5 was always amazing at generating code, but the code that it did generate was only as good as the logic in your instructions. Now that we have models which can simulate reasoning trajectories, you can also use them to generate logical plans for Sonnet 3.5 to follow,” he added further.
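The two-stage workflow McAteer describes can be sketched roughly as follows. This is a minimal illustration, not his actual setup: `plan_then_code`, the `Model` type, and the prompt wording are all assumptions, and the models are passed in as plain functions so that no particular vendor API is implied. In practice, `architect` would wrap a reasoning model such as o1-mini and `developer` a code-generation model such as Sonnet 3.5.

```python
from typing import Callable

# Hypothetical type: any function that sends a prompt to a model and returns its text reply.
Model = Callable[[str], str]


def plan_then_code(requirements: str, architect: Model, developer: Model) -> dict[str, str]:
    """Ask the architect for a design document, then hand that document to the developer."""
    design_doc = architect(
        "Write a step-by-step design document, module by module, for:\n" + requirements
    )
    code = developer(
        "Implement the following design document exactly as written:\n" + design_doc
    )
    return {"design_doc": design_doc, "code": code}


# Stand-in models so the sketch runs without any API access.
def fake_architect(prompt: str) -> str:
    return "1. Module `greet`: return 'hello, <name>'."


def fake_developer(prompt: str) -> str:
    # Echo the plan it received so the hand-off between stages is visible.
    plan = prompt.split("\n", 1)[1]
    return f"# plan: {plan}\ndef greet(name):\n    return 'hello, ' + name"


result = plan_then_code("A greeting helper", fake_architect, fake_developer)
print(result["code"])
```

Keeping the design document as an explicit intermediate value makes it possible to review or edit the plan before any code is generated.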

Similarly, Sully Omar, the co-founder and CEO of Cognosys, also mentioned on X that o1-mini is mostly useless for coding. “It misses small details pretty often, and I almost always have Claude 3.5 fix it,” he added further. 

This could explain why Canvas, a coding platform from OpenAI, uses ChatGPT 4o instead of the o1 models.

The pattern makes sense, as the o1 models are mostly reasoning-oriented. For programming, they can help with architecting the base of a project, after which models like Sonnet can take care of code generation.



Disclaimer

We strive to uphold the highest ethical standards in all of our reporting and coverage. We at StartupNews.fyi want to be transparent with our readers about any potential conflicts of interest that may arise in our work. It’s possible that some of the investors we feature may have connections to other businesses, including competitors or companies we write about. However, we want to assure our readers that this will not have any impact on the integrity or impartiality of our reporting. We are committed to delivering accurate, unbiased news and information to our audience, and we will continue to uphold our ethics and principles in all of our work. Thank you for your trust and support.
