OpenAI o1 is not for everyone

OpenAI’s o1 is the next-generation foundation model designed to push the boundaries of AI across a range of applications. The model offers enhanced capabilities in natural language understanding, generation, and reasoning, with improvements in context comprehension, problem-solving, and multimodal input.

The model is built to handle more complex and diverse tasks with greater efficiency and accuracy. 

In short, it empowers developers, researchers, and organisations with a flexible and powerful AI toolset, fostering innovation in areas like conversational AI, content creation, coding, and beyond. After the model’s release, netizens were quick to share their opinions on OpenAI’s new development.

Andrew Mayne, founder of Interdimensional.ai, who had early access to the model, advised users that it may not be for everyone. “Don’t think of it as a traditional chat model. Frame o1 in your mind as a really smart friend you’re going to DM to solve a problem. She’ll respond with a well-thought-out explanation that walks you through the steps,” he posted on X.

He further explained that users should prepare their prompts in a notepad and be clear about what they want to ask. “Use o1-mini for tasks that don’t require as much world knowledge but benefit from following step-by-step instructions,” he added.
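Mayne’s advice — prepare the prompt up front, state the goal plainly, and send everything in one go — can be sketched as a small helper. This is an illustrative utility of my own, not an OpenAI API; the function name and prompt structure are assumptions.

```python
def build_o1_prompt(goal, context="", constraints=None):
    """Assemble one self-contained prompt: state the goal, supply all
    relevant context, and list explicit constraints, so the model can
    reason over everything in a single message."""
    parts = ["Goal: " + goal]
    if context:
        parts.append("Context:\n" + context)
    if constraints:
        parts.append("Constraints:\n" + "\n".join("- " + c for c in constraints))
    return "\n\n".join(parts)
```

The resulting string would then be sent as one user message, rather than drip-fed across a back-and-forth chat.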

The company released the model as its new o1-preview series of AI models, designed to spend more time thinking before they respond. These models can reason through complex tasks and solve harder problems than previous models in science, coding and maths.

After the preview models were released, users took to the internet to share the innovative projects they had built with o1.

One such stellar example comes from Karina Nguyen, a user who made an AISteroid game with a retro sci-fi vibe.

Another user, named Akhaliq, combined o1 with Replit and Gradio to build a chess game.

Users were able to code and build games with the o1 model even though previous GPT models were not equipped to do so. Subham Saboo also created a space-shooter game, which he then ran on Replit, claiming o1 has changed coding and AI forever.

Ammar Reshi combined o1 with Cursor Composer and built an iOS weather-prediction app from scratch in 10 minutes, with accurate forecasts and animation features. The model handled both the coding and the UI, generating a response that tailor-made the app from scratch.

Meanwhile, a researcher at OpenAI asked o1 to write a college essay and, unlike previous GPT models, o1 responded with ease, generating an in-depth answer for the given prompt.

Similarly, Catherine Brownstein, another researcher, tested o1 to help her reason through “n of 1” cases: medical cases that nobody has ever seen before. o1 stepped up to the occasion and assisted with the cases, understanding complex genetics-related queries and even solving the relevant equations, generating positive answers.

Mario Krenn also used o1 to draft and reason through complex quantum physics problems; o1 responded better than any previous GPT model, generating equations that fit the case. It decoded the problem, generated the equations and solved them too. The model handled equations that demand serious brain power from renowned academics, proving its competence against other GPT models.

A model that can fact-check itself

Note that the o1 chatbot experience is fairly barebones at present. Unlike GPT-4o, o1’s forebear, o1 can’t browse the web or analyse files yet. The model does have image-analysing features, but they’ve been disabled pending additional testing. And o1 is rate-limited; weekly limits are currently 30 messages for o1-preview and 50 for o1-mini.
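Given those weekly caps (30 messages for o1-preview, 50 for o1-mini), an application wrapping o1 would want to track usage on the client side. A minimal sketch follows; the class and its interface are hypothetical, not part of any OpenAI SDK.

```python
class WeeklyQuota:
    """Track message usage against a weekly cap, e.g. 30 for o1-preview."""

    def __init__(self, limit):
        self.limit = limit
        self.used = 0

    def try_send(self):
        # Consume one message and return True if still under the cap,
        # otherwise return False without consuming anything.
        if self.used >= self.limit:
            return False
        self.used += 1
        return True
```

In practice the counter would be reset when the provider’s weekly window rolls over.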

On the downside, o1 is expensive. Very expensive. OpenAI says it plans to bring o1-mini access to all free users of ChatGPT but hasn’t set a release date. We’ll hold the company to it.

However, one user sarcastically tweeted that Sam Altman killed Cursor, Replit, and many others with o1, and congratulated him on the great model launch, saying that maths and coding will be fun with ChatGPT again.

Chain of reasoning 

OpenAI o1 avoids some of the reasoning pitfalls that normally trip up generative AI models because it can effectively fact-check itself by spending more time considering all parts of a question. What makes o1 “feel” qualitatively different from other generative AI models is its ability to “think” before responding to queries, according to OpenAI.

When given additional time to “think,” o1 can reason through a task holistically — planning ahead and performing a series of actions over an extended period of time that help the model arrive at an answer. This makes o1 well-suited for tasks that require synthesising the results of multiple subtasks, like detecting privileged emails in an attorney’s inbox or brainstorming a product marketing strategy.

In a series of posts on X on Thursday, Noam Brown, a research scientist at OpenAI, said that “o1 is trained with reinforcement learning.” This teaches the system “to ‘think’ before responding via a private chain of thought” through rewards when o1 gets answers right and penalties when it does not, he said.

Brown alluded to the fact that OpenAI leveraged a new optimisation algorithm and training dataset containing “reasoning data” and scientific literature specifically tailored for reasoning tasks. “The longer [o1] thinks, the better it does,” he said.
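Brown’s description — rewards when the final answer is right, penalties when it is not, with the chain of thought kept private — can be illustrated with a toy reward function. This is a schematic of outcome-based reward assignment only, not OpenAI’s actual training code.

```python
def outcome_reward(final_answer, reference):
    """Score only the final answer: +1.0 if correct, -1.0 otherwise."""
    return 1.0 if final_answer.strip() == reference.strip() else -1.0


def score_episode(chain_of_thought, final_answer, reference):
    """The private reasoning text never enters the reward; only the
    correctness of the answer it leads to is scored."""
    _ = chain_of_thought  # deliberately unscored
    return outcome_reward(final_answer, reference)
```

Under such a signal, the model is free to reason at length internally, because only the outcome is judged.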

However, o1 isn’t necessarily better across all fronts. Interestingly, it can perform worse in areas where LLMs have typically been quite strong. Code completion, for example: on one benchmark table, o1 ranks behind Claude 3.5 Sonnet, and even behind GPT-4o.

In general, o1 should perform better on problems in data analysis, science, and coding, OpenAI says. GitHub, which tested o1 with its AI coding assistant GitHub Copilot, reports that the model is adept at optimising algorithms and app code. And, at least per OpenAI’s benchmarking, o1 improves over GPT-4o in its multilingual skills, especially in languages like Arabic and Korean.

Ethan Mollick, a professor of management at Wharton, wrote his impressions of o1 after using it for a month in a post on his personal blog. On a challenging crossword puzzle, o1 did well, he said — getting all the answers correct.

OpenAI o1 is not perfect

Now, there are drawbacks.

OpenAI o1 can be slower than other models, depending on the query. One early tester, Arredondo, says o1 can take over 10 seconds to answer some questions; it shows its progress by displaying a label for the current subtask it’s performing.

Given the unpredictable nature of generative AI models, o1 likely has other flaws and limitations. Brown admitted that o1 trips up on games of tic-tac-toe from time to time, for example. And in a technical paper, OpenAI said that it’s heard anecdotal feedback from testers that o1 tends to hallucinate more than GPT-4o — and less often admits when it doesn’t have the answer to a question.

“Errors and hallucinations still happen [with o1],” Mollick writes in his post. “It still isn’t flawless.”

Fierce competition

In a qualifying exam for the International Mathematical Olympiad (IMO), a high school maths competition, o1 correctly solved 83% of problems while GPT-4o only solved 13%, according to OpenAI. That’s less impressive when you consider that Google DeepMind’s recent AI achieved a silver medal in an equivalent to the actual IMO contest. OpenAI also says that o1 reached the 89th percentile of participants — better than DeepMind’s flagship system AlphaCode 2, for what it’s worth — in the online programming challenge rounds known as Codeforces.

Google DeepMind researchers recently published a study showing that by essentially giving models more compute time and guidance to fulfil requests as they’re made, the performance of those models can be significantly improved without any additional tweaks. 
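One simple way to trade extra inference-time compute for accuracy, in the spirit of the study above, is self-consistency: sample several independent answers and keep the most common one. A minimal sketch, with an illustrative helper that is not taken from the paper:

```python
from collections import Counter


def majority_vote(samples):
    """Return the most common answer among independently sampled completions.
    More samples cost more compute but make the vote more reliable."""
    return Counter(samples).most_common(1)[0][0]
```

For example, `majority_vote(["8", "8", "9"])` keeps “8”, discarding the stray sample as noise.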

Illustrating the fierceness of the competition, OpenAI said that it decided against showing o1’s raw “chains of thoughts” in ChatGPT partly due to “competitive advantage.” 

o1 may not be the right tool for every job, but in the right situation it could be a game changer for use cases that were previously all but impossible.





Source link

Disclaimer

We strive to uphold the highest ethical standards in all of our reporting and coverage. We at StartupNews.fyi want to be transparent with our readers about any potential conflicts of interest that may arise in our work. It’s possible that some of the investors we feature may have connections to other businesses, including competitors or companies we write about. However, we want to assure our readers that this will not have any impact on the integrity or impartiality of our reporting. We are committed to delivering accurate, unbiased news and information to our audience, and we will continue to uphold our ethics and principles in all of our work. Thank you for your trust and support.

