Forward-looking: Remember when we thought Siri, Alexa, and Google Assistant were going to be really helpful? Yeah, me too. Fast forward about ten years to today, and we're starting to see some much more impressive demos of just how far digital assistants have progressed. The possibilities look both compelling and intriguing.
On Monday, OpenAI took the wraps off its new GPT-4o model and the accompanying update to ChatGPT that makes it possible to not only speak with ChatGPT but do so in some eerily realistic ways. The new model lets you interrupt it for a somewhat more natural conversation flow and responds with more personality and emotion than we've heard from other digital assistants.
The updated ChatGPT apps for iOS and Android can also see and understand more things via a smartphone camera. For example, OpenAI demonstrated a homework helper app that could guide students through simple math problems using the camera.
Then on Tuesday, Google unveiled a huge range of updates to its Gemini model at its I/O developer event, including a similar homework helper function within Android itself. Google also demonstrated Gemini-powered AI summaries for Search, more sophisticated applications of Gemini in Google Workspace, and a new text-to-video algorithm called Veo that's akin to OpenAI's recently introduced Sora model.
Demos from both companies leveraged similar technologies that many other companies are clearly developing in parallel. More importantly, they highlighted that some core capabilities needed to create intelligent digital personal assistants are nearly within reach.
First is the increasingly wide support for multi-modal models capable of taking in audio, video, image, and more sophisticated text inputs and then drawing connections between them. These connections made the demos seem magical because they imitated how we as human beings perceive the world around us. To put it simply, they finally demonstrated how our smart devices could actually be "smart."
Another apparent development is the growing sophistication of agents that understand context and environment and reason through actions on our behalf. Google's Project Astra demonstration, in particular, showed how contextual intelligence combined with reasoning, personal/local knowledge, and memory could create an interaction that made the AI assistant feel "real."
Currently, definitions of what an AI-powered agent is and what it can do aren't consistent across the industry, making it tough to generalize about their advancements. Nevertheless, the timing and conceptual similarity of what OpenAI and Google demonstrated make it clear that we're a lot closer to having functional digital assistants than I believe most people realize. Even though the demos aren't perfect, what they showed and the possibilities they implied suggest we are getting tantalizingly close to having capabilities in our devices that were in the realm of science fiction only a few years ago.
As great as the potential applications may be, however, there remains the problem of convincing people that these kinds of GenAI-powered capabilities are worth using on a regular basis. After the initial hype over ChatGPT began to slow towards the end of last year, there's been more modest adoption of the technology than some people anticipated. What remains to be seen is whether or not these kinds of digital assistant applications can become the trigger that makes large numbers of people willing to start using GenAI-powered features. Equally important is whether or not they can start changing people's lives in the ways that some have predicted generative AI could.
Of course, part of the problem is that, as with any other technology designed to customize experiences and information in its own unique way, people have to be willing to let these products and companies have deeper access into their lives than ever before if they want to get the full benefit from them. Like it or not, the only way you can get an effective digital assistant is if it can get unfettered access to your files, communications, work habits, contacts, and much more. In an era of growing concern about the impact of tech companies and products, this could be a tough sell.
In the US, much will depend on what capabilities Microsoft and Apple unveil at their developer conferences in the coming weeks. Given the iPhone's dominant share of the US smartphone market, the GenAI-powered capabilities Apple chooses to enable, whether developed in-house or licensed from OpenAI or Google (as the company is rumored to be doing), will significantly influence what people consider acceptable and important.
Call it Siri's revenge, but any digital assistant or agent technologies that Apple announces for the next version of iOS will have an outsized influence on how many people view these technological advancements in the near term.
Ultimately, the question also boils down to how willing people are to become even more attached to their digital devices and the applications and services they enable. Given the enormous and growing amount of time we already spend with them, this may be a foregone conclusion. However, there is still the question of whether people will perceive some of these digital assistant capabilities as going too far. One thing is certain: this trend will be interesting to watch.
Bob O'Donnell is the founder and chief analyst of TECHnalysis Research, LLC, a technology consulting firm that provides strategic consulting and market research services to the technology industry and professional financial community. You can follow him on Twitter @bobodtech.
Masthead credit: Solen Feyissa