OpenAI has announced voice and image capabilities for ChatGPT, expanding the ways users can interact with the AI system. The new features let users hold voice conversations with ChatGPT and share images with it, with the aim of making the interface more intuitive.

With voice, users can talk with ChatGPT conversationally. The feature combines a text-to-speech model, which generates the spoken replies, with Whisper, OpenAI's open-source speech recognition system, which transcribes what the user says into text. Voice will be available in the iOS and Android apps.

Image understanding lets users share images with ChatGPT for a wide range of tasks, such as troubleshooting a problem, planning a meal, or analyzing work-related data. In the mobile app, users can use a drawing tool to highlight the specific part of an image they want ChatGPT to focus on. The feature is powered by multimodal versions of GPT-3.5 and GPT-4.

OpenAI is rolling out these features in phases, starting with Plus and Enterprise users, who are expected to gain access to voice and images over the next two weeks.

The gradual rollout reflects OpenAI's stated approach to building safe and beneficial AGI (artificial general intelligence): releasing new features incrementally lets the company improve them and refine its safeguards based on real-world usage and feedback.

OpenAI also acknowledged the risks posed by realistic synthetic voices and vision-based models. Voice technology enables creative and accessibility-focused applications, but it also creates new risks such as impersonation and fraud. Vision-based models, for their part, present challenges ranging from hallucinations to users relying on the model's interpretation of images in high-stakes domains.

In addition, OpenAI noted limitations of ChatGPT itself and advised users not to rely on it for specialized topics without verifying its output, particularly in fields that require expert knowledge.

The announcement marks a step toward broadening how users can interact with ChatGPT and reflects OpenAI's ongoing effort to extend the capabilities of its AI systems.

