By Ernst Wittmann, Global Account Director MEA & Country Manager – Southern Africa, at TCL
Voice assistants and interfaces on smartphones are coming of age, and adoption is expected to soar over the next few years. ComScore, for example, forecasts that half of all online searches will be voice searches by 2020, while PwC research shows that 65% of 25- to 49-year-olds already speak to their voice-enabled devices at least once a day.
The growth estimates may prove to be conservative, given the advancements we are seeing in the development of natural language processing (NLP) and voice recognition. Even the most cynical user should be impressed by the recent Google Assistant speech recognition demo at Google I/O 2019, the company’s annual developer conference.
Even those of us who are fans of speaking to smartphones can find the response times a little slow. Google showed how its next-generation voice assistant can access phone functions in milliseconds, seamlessly multitask across apps, and complete the most complex interactions with little noticeable latency.
The game-changing innovation? Google has managed to compress the speech recognition algorithms into the smartphone itself, so that one no longer depends on the Internet and cloud processing to get things done. The integration of a responsive and reliable assistant into smartphones will overcome one of the last barriers to voice interface adoption.
This technology enables us to communicate with our smartphones in a more natural and human way, and without needing to use our hands. If you’re busy baking from a recipe, for example, you can ask the assistant to read back the ingredients and next steps, rather than smearing oil or flour on an expensive touchscreen. Or you can do a search using the sort of question you might ask a friend or a human personal assistant rather than a search engine: “Hey, where’s the nearest dentist?” and perhaps even “Please could you book an appointment for 2pm tomorrow, email my medical aid number to the dentist and note it in my calendar?”
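Under the hood, an assistant has to turn an utterance like those above into a structured intent plus its details (time, place, and so on) before it can act. As a minimal sketch only, the toy parser below shows the idea with simple pattern matching; the intent names and patterns are illustrative assumptions, not how any real assistant is implemented.

```python
import re

def parse_command(utterance: str) -> dict:
    """Toy intent parser: map a spoken request to an intent and its slots.

    Illustrative only; real assistants use machine-learned NLP models,
    not hand-written patterns like these.
    """
    text = utterance.lower().strip()

    # "book an appointment for 2pm tomorrow" -> booking intent with time/day slots
    if m := re.search(
        r"book an appointment for (\d{1,2}(?::\d{2})?\s*(?:am|pm)) (today|tomorrow)",
        text,
    ):
        return {"intent": "book_appointment", "time": m.group(1), "day": m.group(2)}

    # "where's the nearest dentist?" -> local-search intent with a place slot
    if m := re.search(r"where(?:'s| is) the nearest (\w+)", text):
        return {"intent": "find_place", "place": m.group(1)}

    return {"intent": "unknown"}

print(parse_command("Please book an appointment for 2pm tomorrow"))
print(parse_command("Hey, where's the nearest dentist?"))
```

The point of the sketch is the shape of the problem: free-form speech in, structured action out. Everything that makes this hard in practice, such as accents, slang and ambiguity, happens in the step this toy version skips.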
The latest advancements in machine learning and NLP mean that these sorts of interactions are becoming more efficient and more intuitive, especially compared to the annoying and limited interactive voice response systems some companies use for automated customer service. The pauses while the system connects to servers in the cloud will start disappearing as Google brings its next-gen voice assistant to all devices, and it will become easier to complete complex tasks without touching your phone.
There are still some complexities to be addressed. Voice interfaces are likely to struggle with regional accents, slang and sarcasm for some time, and it will be a while before speakers of all South African languages can interact easily with their phones in their mother tongue. Developers will also need time to structure voice interactions for complex, multistep tasks so that they are faster than, and as reliable as, using a touchscreen, and users will need time to get used to the technology.
Still, the promise of voice interfaces is irresistible: removing the barriers between you as the user and the information, service or task you want to access on your device. Imagine, for example, telling your banking app to transfer R1000 to a family member, using your voice rather than a one-time pin to authenticate the transaction, all without even taking your phone out of your pocket. It may take a while to get there, but a world of even faster, more seamless access to services awaits.