OpenAI has added voice and image capabilities to ChatGPT, letting users talk to the assistant and show it pictures rather than only typing. The headline feature is voice: users can hold a spoken, back-and-forth conversation with ChatGPT. They might do this on the move, to hear a bedtime story, or to settle a dinner-table debate.
Starting a voice conversation is straightforward. In the mobile app, go to Settings, select New Features and opt into voice conversations. Then tap the headphone icon in the top-right corner of the home screen and choose one of five voices, each created with professional voice actors to sound natural. Whisper, OpenAI's open-source speech recognition system, transcribes the user's spoken words into text so that ChatGPT can respond to them.
The second major addition is image input. Users can now show ChatGPT one or more images and ask about them. For example, they can troubleshoot why a grill won't start, plan a meal from the contents of the fridge, or analyse a complex graph of work data.
To use this feature, tap the photo button to take or choose an image. On iOS or Android, tap the plus button first. Users can discuss several images at once and use a drawing tool to point the assistant at the part of an image they care about. Image understanding is powered by multimodal versions of GPT-3.5 and GPT-4. These models apply their language reasoning to a wide range of images, including photographs, screenshots and documents that combine text and images.
Voice and image capabilities will roll out gradually to Plus and Enterprise users over the next two weeks. Voice will be available on iOS and Android as an opt-in setting, while image input will be available on all platforms.
OpenAI acknowledges that these capabilities carry new risks. Voice technology that can produce realistic synthetic speech could be misused, for instance to impersonate public figures or commit fraud. OpenAI is therefore limiting it to voice chat, using preset voices created with voice actors it worked with directly. Spotify is also using the technology in a pilot of its Voice Translation feature, which translates podcasts into other languages in the podcasters' own voices.
For image input, OpenAI has taken technical measures to significantly limit ChatGPT's ability to analyse and make direct statements about people. It gives two reasons: the model is not always accurate, and the system should respect individuals' privacy. OpenAI says real-world usage and user feedback will help it strengthen these safeguards while keeping the tool useful.