Are all voice recognition systems equal? Are they just a gimmick with no practical
use? Richard Bloor takes a look at two solutions for Symbian OS, one from Nokia and
the other from VoiceSignal, and comes to a clear conclusion.
There was a time when every new mobile advertised its voice
dialing capability. It was the feature to have. However, it has been some time
since mobile makers have made much noise about voice features. Apart from using it
to switch Bluetooth on and off, I've never used voice control much on any S60 device
(with the exception of the Sendo X). This is because the older system delivered by
Nokia needed each contact to be trained before it could be voice dialed, which is
hardly practical with a contacts list of several hundred entries.
The voice dialing application on the Nokia N90, Voice Command, appears to be a
significant improvement over the earlier offering. Rather than requiring each contact
to be trained, the system is speaker independent. This means that, in theory at
least, you can simply activate voice recognition and speak.
Voice Command on the N90 is activated from the camera button, with the flip in either
the closed or the open "phone" position. A short tone sounds when recognition is
activated and a "speak now" dialog is displayed; however, no indication is given of
the commands you can use. After you issue a voice command, the system speaks a
synthesized version of the matched contact or command and briefly displays the match
before activating the command or dialing the contact.
This automatic action is slightly disconcerting. Should the recognition not work
correctly, you have to be quick to prevent the system from dialing a wrong number.
As voice recognition is likely to be used when you do not have quick access to
the keypad, this is something of a limitation.
The voice commands can be tailored to provide access to profiles, the voice mailbox
and any application on the device through the Voice Commands application in the
Tools folder. (In addition, Voice Command uses the nickname set in each contact
record, allowing a contact's recognition words to be tailored.)
The Profiles option opens a separate folder, which provides a list of profiles.

For each profile and indeed each action, there is the option to change the command.
This involves changing the text definition of the command to be recognized.
After the command has been changed it is replayed, using the voice synthesizer.
The application list is initially populated with just a few of the device's applications,
with Voice Mailbox, Bluetooth, Voice Recorder and Contacts active. Any additional
applications have to be added individually.
Finally, Voice Commands offers options to turn off the synthesized voice confirmation
and to reset all personal adaptations.

Initial impressions of Voice Command were poor. In a quiet office environment,
the first problem encountered was that Voice Command had difficulty distinguishing
between contacts and commands. The request "Bluetooth" repeatedly resulted in
Voice Command trying to dial a contact called "Ho Ho", while "Contacts" unfalteringly
tried to call "Brenda Nash". Apart from the fairly obvious issue that these recognized
contacts bear little audible resemblance to the commands spoken, this problem
was fixed by using the adaptation options to change the recognized commands to
"toggle Bluetooth" and "open contacts".
Name recognition was equally disappointing. From a random selection of ten contacts,
no more than 50% were accurately recognized. Names that Voice Command confused, it
confused every time, and no amount of careful, slow or fast enunciation changed its
inability to recognize them.
On a Nokia 6682, VSuite is activated from the voice button on the left of the phone.
The first noticeable difference with VoiceSignal's VSuite 2.0 is that it provides
far more options for using contacts and activating commands.
VSuite starts with a splash screen before asking you to "say a command".
You are immediately shown four options for prefixing your command with the action
you want to take: call a contact, send them an SMS, open a contact's details, or
open an application. While VSuite listens to the command, a small ear icon flashes.
The on-screen guidance is helpful the first time you use VSuite, letting you
know what can be done and how to do it. This is in marked contrast to Voice Command,
which leaves you to guess what options you have. As these prompts are provided
in parallel with the recognition process, they are not a hindrance once you have
become familiar with VSuite's operation.
For the call, SMS and lookup options, VSuite provides a list of possible matches
by default, starting with the best match. A voice synthesizer asks "Did you say"
followed by the matched name. VSuite then gives you the option to confirm that this
was the correct contact, say that it was not, cancel the recognition session
or repeat the name.

The voice-activated Yes, No, Cancel and Repeat feature was a little disconcerting
at first. The natural reaction, on seeing options on the screen, was to try to scroll
to one of them rather than speak the action required. Once you get used to
it, this mechanism has another advantage. Unlike the Nokia application, once you
have confirmed a contact, VSuite lists all the alternative numbers on which they can
be contacted, and again a voice command can be used to select the correct one.
This may seem like a long process if you already know you want to call Graham Trimmer
at home. VoiceSignal's engineers obviously agreed, as you can skip this step by
simply saying "call Graham Trimmer home".
Not all the numbers you might want to call are going to be in your contacts list.
To handle ad-hoc dialing, VSuite includes a digit dialing feature: you simply
say "call" and speak the number required.
The open application feature works differently from the contact-related functions.
Given that there is a relatively short list of possible matches, VSuite simply opens
the application as soon as the voice command has been recognized.
VSuite's options are built into the application and accessed by selecting the
menu item "Settings", instead of speaking a voice command.

The choice list option controls whether a list of likely matches is displayed for
contact-related commands: it can be set on, off, or to appear only when VSuite
determines that there is a reasonable chance the recognized speech could match two
or more names. With the option off, VSuite behaves in the same way as Nokia's Voice
Command, immediately dialing the best-matching name. However, unlike the Nokia suite
(which only dials the contact's first listed number, or the default number if one
has been set), VSuite still allows you to select a specific number by saying it
after the contact's name.

The sensitivity option allows VSuite's margin of error on matches to be altered,
so that it includes or rejects more options. In testing, I found no significant need
to alter this setting.

Next is an option to customize and train digit dialing. The process of training
digits (as the application notes) takes about a minute to complete and involves
repeating 10 sets of digits.
In addition, VSuite can be set to identify the unique digit groupings used in
various countries.

The sound options allow the prompts associated with saying the command and confirming
digits and names to be turned on and off. You can also change the volume and speed
at which names are read back.
VSuite automatically adds almost all of the device's applications to the list that
can be opened by voice command, but any new applications added to the device have
to be activated manually with the application launcher option.

Finally, VSuite allows the method it uses to keep in sync with the contacts folder
to be set as automatic or manual.
VSuite clearly offers more features than Nokia's Voice Command; however, the key
question is how well the voice recognition works.
The first obvious recognition advantage VSuite has is its use of a prefix command.
This means there is almost no chance of a call to a contact being confused with
a request to open an application. In fact, this error never occurred during
my testing.
On my ten-contact test, VSuite achieved better than 90% recognition in a quiet
office environment. Even more impressive was how well it coped with specific
number selection, even when the request was phrased in more "natural language",
such as "call Graham Trimmer's home" or "call Graham Trimmer at home". I also found
no contact or command that VSuite consistently failed to recognize.
Initially I found digit dialing to be unreliable, if only because the leading
zero on long distance calls was often dropped. Training the digits seemed to improve
the recognition. However, when I realized that my local number format was similar
to that in the UK and set the digit dialing location to the UK, I found recognition
was as good as that for names.
While the performance of the two voice systems in a quiet office is interesting,
it is not very representative of the environments in which a user is likely
to want to use this feature.
The most obvious place where voice dialing is of use is while driving. For this
test I called on the services of a 10-year-old Land Rover Discovery diesel. This
is by no means a quiet vehicle (it is a considerable improvement on the Defender
in terms of cabin noise, but that is a different review).
I undertook two tests. The first was at 60 kph, but with the added distraction
of the CD system playing. The second was at 100 kph, but with the CD off. These
two sound files are what the phones heard during the tests.
Download the 60 kph Test Audio (AMR format) and 100 kph Test Audio (AMR format).
In both tests, the Nokia voice recognition was very poor. Attempts to call
the contact "Marylin De'ath" variously failed completely, opened the Web application
or recognized "Joseph Keys" and no one else. In the same test, VSuite failed
completely to recognize "Marylin De'ath" only once. Most attempts
returned "Marylin De'ath" as the most likely match, although once or twice she was
second in the list of possible matches.
Similarly, Nokia's Voice Command unfailingly recognized "Melinie Dodd" as "Ana
Pinto". The same name seemed to challenge VSuite at normal speaking speed, but
when I spoke a little more carefully its recognition improved dramatically, to close
to one hundred percent. By contrast, no amount of care seemed to affect Voice
Command's misguided attempts to connect me with "Ana Pinto".
While these two examples represent the worst performance shown by each application,
they are representative of the difference in performance between the two systems.
Voice Command was inclined to recognize contacts or commands incorrectly, while
VSuite was inclined to recognize voice commands correctly.
Given that Voice Command performed poorly in a quiet office, it was perhaps
unsurprising that, as in the car tests, it performed very much worse than VSuite
in every other environment.
I did discover one fascinating thing during the testing. Entirely by accident,
I managed to activate VSuite just as Voice Command's voice synthesizer was repeating
a contact name, which VSuite recognized correctly. This little trick has proved
impossible to repeat, as reproducing just the right timing seems unachievable. While
it does illustrate that Voice Command's voice synthesis is good, it says more about
the quality of VSuite's recognition that it can accurately interpret a computer's
attempt to reproduce the human voice.
The basic difference between VSuite and Voice Command is that VSuite is usable,
out of the box, in demanding and not so demanding environments. VSuite even managed
tolerably accurate recognition in environments where carrying out a phone conversation
would be a challenge. It was not always perfect, but the worst misrecognitions
ended up with the desired contact somewhere on the possible matches list.
By contrast, Voice Command's recognition was poor; even in the most undemanding
environment it could not be described as reliable. In anything other than a quiet
office its error rate became so bad that it was effectively unusable, and even in quiet
environments its performance hardly impressed. It might be possible, using the
command adaptations or by creating contact nicknames, to improve recognition for
key commands and contacts. However, this somewhat defeats the purpose of speaker
independent voice recognition. Voice Command has only one obvious advantage over
VSuite: the command "Bluetooth" toggles the Bluetooth connection
on or off, rather than simply opening the Bluetooth application. This toggling
is handy if you regularly use Bluetooth gadgets, such as a keyboard, as
it is a quick and convenient way to turn Bluetooth on and off (when it works).
VoiceSignal has done an excellent job with VSuite, combining quality voice
recognition with an intelligently designed recognition process. It is a practical
solution for initiating phone calls and SMS messages, or for controlling an S60 device,
in a wide variety of environments. By contrast, the only charitable thing I can
say about Voice Command is that it adds a tick to the N90's feature list.
Like so many technologies, voice dialing and commanding suffered from too much
hype too early in its development. Expectations were created and not met. VSuite
overcomes that early disappointment. I can't wait to get my hands on VoiceSignal's
dictation system, VoiceMode, and pretty much give up using the keypad altogether.
VSuite is shipped as a try-and-buy application on selected Nokia S60 devices.
VoiceSignal has indicated that retail versions of VSuite will be available for
both S60 and UIQ devices in the future. For more information see www.voicesignal.com.