
OpenAI has brought together several engineering, product, and research teams over the past two months to overhaul its audio models, preparing for what could be the company’s most ambitious hardware push yet.
The move comes as CEO Sam Altman has publicly declared that the company is seriously aiming to build artificial general intelligence (AGI), a system that can perform any intellectual task a human can.
The ChatGPT-maker expects to release a new advanced audio model in early 2027, likely by the end of the first quarter, featuring capabilities that current voice AI systems simply can’t match. This model will reportedly produce more natural-sounding speech, handle interruptions during conversations seamlessly, and even speak simultaneously with users, representing a level of conversational fluidity that today’s rigid call-and-response chatbots can’t provide.
But the audio model is just the opening move. OpenAI also plans to develop a family of devices that may include smart glasses and screenless smart speakers, all expected to launch approximately a year from now.
OpenAI claims these aren't simple accessories bolted onto existing technology; they're designed to fundamentally reimagine how humans interact with artificial intelligence.
The involvement of former Apple design chief Jony Ive, who joined OpenAI after its acquisition of io, also carries significant weight. As the designer behind the iPhone, iPod, and Apple Watch, Ive has made reducing screen addiction a personal priority, and his vision aligns closely with OpenAI’s strategic direction.
The rationale for acquiring Jony Ive’s io, and for bringing him on as a design executive, becomes clearer in light of OpenAI’s biggest challenge: distribution. Unlike Google with Android, Meta with its social networks and Ray-Ban smart glasses, or Apple with the iPhone and Siri, OpenAI lacks native distribution channels and must rely on third-party platforms for user access.
Every ChatGPT interaction currently flows through someone else’s infrastructure, whether a mobile app store or a web browser. By entering the hardware layer directly, OpenAI can own the complete user relationship, from the device to the AI model powering the experience.
Additionally, OpenAI’s push toward audio-first, screenless devices reflects broader industry momentum. For instance, Meta recently introduced a feature for its Ray-Ban smart glasses that uses a five-microphone array to help users hear conversations in noisy rooms, essentially turning the glasses into directional listening devices.
Google has been experimenting with audio overviews that transform search results into conversational summaries, while Tesla plans to integrate conversational voice assistants into its electric vehicles for easier navigation. The pattern is clear: Silicon Valley is waging a quiet war on screens, and the bet that voice will become the dominant interface for AI interaction is growing increasingly competitive.
OpenAI’s advantage lies in its timing and technological foundation. The company is both building the hardware and advancing the AI models that will power these devices. Altman recently stated that OpenAI is confident it knows how to build AGI as traditionally understood, and while definitions of AGI remain contested, the company is clearly positioning itself at the frontier of AI capability.
