OpenAI Silently Unveils Whisper 3, A New Generation Open Source ASR Model

During its inaugural Developer Day, AI startup OpenAI released several open-source models, including Whisper large-v3, an upgraded version of its open-source automatic speech recognition (ASR) model. The company plans to make the model available through its API.

According to the official page, the English-only models tend to perform better for English applications, especially at the `tiny.en` and `base.en` sizes. Performance also varies widely depending on the language.

(Source: OpenAI)

Whisper was first released in September last year, initially focused on English. It received an upgraded version 2 in December, which was enhanced to support multiple languages, although the specific languages were not explicitly mentioned.

Available on GitHub under a permissive license, Whisper large-v3 transcribes a wide range of audio content and has been called the best transcription tool available. The model also outputs timestamps, which makes its transcripts usable as subtitles on platforms such as YouTube.
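
To illustrate how timestamped output can become subtitles, here is a minimal sketch that formats segments as SubRip (SRT) text. The segment shape (dictionaries with `start`, `end`, and `text` keys, times in seconds) is an assumption modeled on typical transcription output, and the helper names and sample segments are hypothetical.

```python
def format_srt_time(seconds: float) -> str:
    """Render a time offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Turn timestamped segments into the text of an SRT subtitle file."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_srt_time(seg["start"])
        end = format_srt_time(seg["end"])
        # Each SRT cue: running index, time range, then the caption text.
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


# Hypothetical segments, made up for illustration only.
example_segments = [
    {"start": 0.0, "end": 2.5, "text": " Hello and welcome."},
    {"start": 2.5, "end": 5.04, "text": " Today we talk about speech recognition."},
]
srt_text = segments_to_srt(example_segments)
```

Writing `srt_text` to a `.srt` file would produce a subtitle track that most video players and upload tools accept.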

Whisper first splits audio into 30-second clips and converts each clip into a log-Mel spectrogram. The result is passed through an encoder and a decoder, which predicts the corresponding text caption. The model also handles language identification, multilingual speech transcription, and translation to English.
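
The first step described above, cutting audio into 30-second clips, can be sketched as follows. This is an illustration only, not OpenAI's implementation: the 16 kHz sample rate, the zero-padding of the final clip, and the function name `split_into_chunks` are assumptions, and the conversion and encoder-decoder stages are not shown.

```python
SAMPLE_RATE = 16_000   # assumed samples per second
CHUNK_SECONDS = 30     # clip length described in the article


def split_into_chunks(samples, sample_rate=SAMPLE_RATE, chunk_seconds=CHUNK_SECONDS):
    """Split audio samples into fixed-length chunks, zero-padding the last one."""
    chunk_len = sample_rate * chunk_seconds
    chunks = []
    for start in range(0, len(samples), chunk_len):
        chunk = list(samples[start:start + chunk_len])
        # Pad a short final chunk with silence so every chunk has equal length.
        chunk.extend([0.0] * (chunk_len - len(chunk)))
        chunks.append(chunk)
    return chunks


# 70 seconds of silence as stand-in audio: two full clips plus a padded third.
audio = [0.0] * (70 * SAMPLE_RATE)
chunks = split_into_chunks(audio)
```

Fixed-length clips mean every input to the downstream model has the same shape, which is why the short final clip is padded rather than passed through as-is.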

Whisper was initially expected to be integrated with ChatGPT so that users could converse with the chatbot through speech. OpenAI instead decided to release the model to the public directly. For now, Whisper is aimed at researchers rather than end users.

According to OpenAI, the reason for open-sourcing the model was to “serve as a foundation for building useful applications and for further research on robust speech processing”. Whisper was trained on 680,000 hours of supervised data collected from the internet, about one third of which came from non-English sources.

The post OpenAI Silently Unveils Whisper 3, A New Generation Open Source ASR Model appeared first on Analytics India Magazine.

Disclaimer

We strive to uphold the highest ethical standards in all of our reporting and coverage. We at StartupNews.fyi want to be transparent with our readers about any potential conflicts of interest that may arise in our work. It’s possible that some of the investors we feature may have connections to other businesses, including competitors or companies we write about. However, we want to assure our readers that this will not have any impact on the integrity or impartiality of our reporting. We are committed to delivering accurate, unbiased news and information to our audience, and we will continue to uphold our ethics and principles in all of our work. Thank you for your trust and support.
