ChatGPT Advanced Voice Mode

OpenAI starts rolling out advanced voice mode feature

OpenAI, the Microsoft-backed artificial intelligence company, announced in a post on X on Tuesday that it is rolling out an Advanced Voice Mode feature to a select group of ChatGPT Plus users.

During OpenAI’s presentation, Advanced Voice Mode stood out as significantly more capable than ChatGPT’s existing voice feature. OpenAI staff members at the event could easily redirect the conversation, prompting the chatbot to narrate a story in different styles, and it readily accommodated those requests, adjusting its replies accordingly.

The company has announced that Advanced Voice Mode will be restricted to four preset ChatGPT voices – Juniper, Breeze, Cove, and Ember – which were developed in partnership with professional voice actors. The Sky voice, featured in OpenAI’s May demo, is no longer available in ChatGPT. OpenAI spokesperson Lindsay McCallum said, “ChatGPT is not capable of mimicking the voices of others, including both private individuals and celebrities, and will prevent any responses that do not match one of these pre-selected voices.”

Safety

Since the demo, the company has tested GPT-4o’s voice capabilities with more than 100 external red teamers who speak over 45 different languages. OpenAI says a report detailing these safety measures will be released in early August.

How was it trained?

Advanced Voice Mode mimics the natural pauses of human speech because it was trained on recordings of people talking that contain them. After being exposed to an enormous number of examples of human conversation, possibly millions, the system has learned to insert the sound of breathing at what appear to be the right moments. Large language models (LLMs) such as GPT-4o are experts at copying patterns, and that ability has now carried over into the realm of sound.
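The mechanics behind this are not public, but the “copying” idea can be made concrete with a toy example. The sketch below is purely illustrative and assumes nothing about GPT-4o’s real architecture, tokenizer, or training data: it treats speech as a sequence of discrete audio tokens and trains a tiny model to predict the next token, so patterns that occur often in the recordings (such as breaths between phrases) become likely continuations when the model generates audio.

```python
# Minimal, hypothetical sketch (NOT OpenAI's training code): next-token
# prediction over discrete "audio tokens". Frequent patterns in the
# training recordings, breaths and pauses included, become likely
# continuations at generation time.

import torch
import torch.nn as nn

VOCAB_SIZE = 1024   # hypothetical size of the audio-token codebook
EMBED_DIM = 64      # kept tiny so the sketch runs instantly

class TinyAudioLM(nn.Module):
    """A toy next-token predictor over audio tokens."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, EMBED_DIM)
        self.rnn = nn.GRU(EMBED_DIM, EMBED_DIM, batch_first=True)
        self.head = nn.Linear(EMBED_DIM, VOCAB_SIZE)

    def forward(self, tokens):
        x = self.embed(tokens)
        out, _ = self.rnn(x)
        return self.head(out)  # logits for the next audio token at each step

# One training step: shift the sequence by one position so the model
# learns to continue recorded speech.
model = TinyAudioLM()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
batch = torch.randint(0, VOCAB_SIZE, (8, 32))  # stand-in for tokenized audio clips
logits = model(batch[:, :-1])
loss = nn.functional.cross_entropy(
    logits.reshape(-1, VOCAB_SIZE), batch[:, 1:].reshape(-1)
)
loss.backward()
optimizer.step()
print(f"toy training loss: {loss.item():.3f}")
```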

How to Access Voice Mode Alpha

As OpenAI has shared, access to the alpha version is limited to a select group of ChatGPT Plus subscribers, which requires a $20 monthly subscription. If you’re chosen for the alpha, you’ll receive an email with instructions and a notification in the app. If you haven’t received a notification yet, there’s no need to fret; the company will keep adding users on a rolling basis.

Some Mind Blowing Features

1. Real-time Japanese translation using the new Advanced Voice Mode + vision alpha

2. ChatGPT Advanced Voice Mode counting as fast as it can to 10, then to 50

3. Advanced Voice Mode beatboxing

OpenAI announced that it plans to incorporate user feedback to improve the model further. It will also publish an in-depth report on GPT-4o in August, covering safety evaluations and the model’s limitations.

About the author

Biplab Bhattacharya

Hi, I’m Biplab, an aspiring blogger with an obsession for all things tech. This blog is dedicated to helping people learn about technology.
