OpenAI inks deal to train AI on Reddit data

OpenAI has reached a deal with Reddit to use the social news site’s data for training AI models.

In a blog post on OpenAI’s press relations site, the company said that the Reddit partnership will provide it access to “real-time, structured and unique content” — e.g. posts and replies — from Reddit, allowing its tools and models to “better understand and showcase” that content. Reddit content will be incorporated into ChatGPT, OpenAI’s popular conversational AI, and the companies will work together to bring unspecified new “AI-powered features” to both Reddit users and moderators.

OpenAI will also become a Reddit advertising partner.

“Reddit will be building on OpenAI’s platform of AI models to bring its powerful vision to life,” OpenAI wrote in the post. “Using LLMs, ML, and AI allow Reddit to improve the user experience for everyone.”

OpenAI has several similar licensing deals with content providers ranging from stock media libraries to news publishers. But the unusual angle to this one is that Sam Altman, OpenAI’s CEO, has an 8.7% stake in Reddit, making him the third-largest shareholder, and was once a member of the company’s board of directors.

In an attempt to discourage scrutiny, OpenAI says in its press release that, while Altman remains a Reddit shareholder, the partnership “was led by OpenAI’s COO [Brad Lightcap]” and “approved by [OpenAI’s] independent board of directors.” (I’ll note here that Altman is a member of OpenAI’s board; however, an OpenAI spokesperson tells TechCrunch that he recused himself from this decision.)

Reddit has made data licensing agreements an increasingly central part of its growth strategy as it navigates the market as a public company.

In its IPO prospectus, Reddit revealed that it has contractual agreements to license its data to customers, including Google, worth more than $200 million combined. And, in its first earnings report as a public company, Reddit reported a 450% year-over-year increase in non-ad revenue, attributable mainly to those agreements.

Reddit stock was up 11% in extended trading following the announcement of the OpenAI deal.

“The paradox I see is that, as more content on the internet is written by machines, there’s an increasing premium on content that comes from real people,” Reddit CEO Steve Huffman said during the company’s earnings call in March. “And we have nearly two decades of authentic conversation.”

Reddit’s platform — which has over 1 billion posts and more than 16 billion comments, figures that grow every day thanks to its hundreds of millions of active users — is a goldmine for generative AI companies, whose models learn from examples of content, like text and images, to generate new, similar content.

But the company could face pushback from users concerned about how it’s monetizing their data.

It’s instructive to look at Stack Overflow, the Q&A forum for software developers, which recently inked an agreement with OpenAI to supply data for the latter’s model training. In protest, some users deleted their top-rated answers to questions on the community. But Stack Overflow restored the deleted posts and banned those users, claiming that they weren’t in compliance with its terms of service.

Reddit has already voiced its displeasure with one attempt to afford Reddit users greater control over their own data.

Vana, a startup built on the blockchain, is attempting to launch a data “DAO” (decentralized autonomous organization) that would let Reddit users pool their data and decide together how that combined data is used (or sold). Reddit banned Vana’s subreddit dedicated to discussion about the DAO and, in a statement to TechCrunch, accused the company of “exploiting” its data export controls.

Disclaimer

We strive to uphold the highest ethical standards in all of our reporting and coverage. We at StartupNews.fyi want to be transparent with our readers about any potential conflicts of interest that may arise in our work. It’s possible that some of the investors we feature may have connections to other businesses, including competitors or companies we write about. However, we want to assure our readers that this will not have any impact on the integrity or impartiality of our reporting. We are committed to delivering accurate, unbiased news and information to our audience, and we will continue to uphold our ethics and principles in all of our work. Thank you for your trust and support.
