
Why Siri is just the start for natural input

Pick up your phone in the pub, confidently say 'Siri, what's the circumference of the Earth divided by the radius of the Moon?' and barely seconds later, you're the only one there who knows the answer is 23.065.

What we'll have in a couple of generations of phones

Compared to what we'll have in a couple of generations of phones, though, it's a Speak & Spell. Best of all, voice is just the start of the natural input revolution.

Imagine a world with no keyboards, no tiny buttons, no tutorials and no manuals. You'll just do what comes naturally, and your phone will adapt, using artificial intelligence to deduce that you're dictating, or that when you say 'Order take-out', you're going to want Thai that day. Or a million other seamless interactions, combining your camera, location, search, databases, music and more, based on massive databases of information and probabilities, and tuned to your personal tastes and past history. It's going to be glorious.

At its most basic level, pressing Siri's microphone button records a short audio clip of your instruction, which your phone passes to Apple's online servers as a highly compressed audio file. There, your speech is converted into text and fired back as a piece of dictation or an instruction for your iPhone.
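That round trip can be sketched in a few lines. Everything below is illustrative: the function names, the fake one-second clip and the canned transcription are all invented stand-ins, since Apple's actual protocol and servers are private. Only the shape of the flow (record, compress, ship to a server, get text back) comes from the description above.

```python
import zlib

def record_clip() -> bytes:
    """Stand-in for the microphone: returns raw audio samples."""
    return b"\x00\x01" * 8000  # fake 16-bit samples, roughly a second

def compress(audio: bytes) -> bytes:
    """The phone compresses the clip before upload to save bandwidth."""
    return zlib.compress(audio)

def server_transcribe(payload: bytes) -> str:
    """Stand-in for the remote speech-to-text service."""
    audio = zlib.decompress(payload)   # server restores the waveform
    assert audio                       # ...then runs recognition on it
    return "what's the circumference of the earth"  # canned result

def ask(phone_audio: bytes) -> str:
    """The whole round trip: the phone only ships bytes; text comes back."""
    payload = compress(phone_audio)
    return server_transcribe(payload)

print(ask(record_clip()))
```

The point of the sketch is that the phone's half of the job is trivial; all the clever work lives behind `server_transcribe`, which is why the client can stay thin and the recognition can improve without an app update.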

That's the gist, anyway, and iPhone 4S owners will tell you it often works damn well. At least, it does in the US. One of the few major problems with Siri is that much of the best stuff, like finding a restaurant, has yet to arrive internationally, leaving the rest of us with much of the gimmickier stuff.

The only tool capable of this

Siri isn't the only tool capable of this, though, and while it is currently the most polished, the competition works in much the same way - just two of them being Nuance's Dragon Go! and the Android-only Iris from Indian startup Dexetra. With Apple's legendary secrecy in full effect, it's often by looking at these that we can see what's going on beneath the surface, and where Siri is likely to go in future.

Once you know how it works, two questions will likely pop into your mind: if all the heavy lifting is happening elsewhere, in the cloud, why do you need an iPhone 4S to use Siri? And why can't it all just work right on the phone?

In truth, the likely answer to the first one is simply 'because Apple wanted a cool selling point for the iPhone 4S'. The original version of Siri was a standalone app that ran on a regular iPhone 4, and to all appearances the latest incarnation isn't doing anything that actually requires the more powerful A5 processor. There are future-gazing reasons why Apple might want to restrict it, but precious few non-marketing-related ones as it stands now.

What everyone does agree on is the importance of sparing your phone the technical heavy lifting, for two reasons: efficiency and updating.

The original Iris 1.0

"The original Iris 1.0 did not use a server, everything was being processed from the phone," explains Narayan Babu, CEO of Dexetra. "Even on powerful phones with dual-core processors, this was inefficient. Natural language processing and voice-to-text require real horsepower. When we tried doing serious NLP on Android phones, it nearly always crashed. It is as well easy to add features seamlessly when processing happens in the cloud, without having to update the actual app."

Vlad Sejnoha, chief technology officer of Dragon Go! creator Nuance, one of the most highly regarded companies in the field, told us: "10 years ago, speech recognition systems were trained on a few thousand hours of user speech; today we train on hundreds of thousands. Our systems are [also] adaptive in that they learn about each individual user and get better over time."

More information: Techradar