Introduction:
This is the third and final part of a series exploring the most frequently used speech technologies in contact centers. The first two parts discussed Speech Recognition and Speech Synthesis (or Text to Speech). We will now turn to voice verification, an exciting and relatively new technology that can greatly enhance security.
What voice verification is and how it works:
Voice verification is a biometric technology that matches a person’s voice against a pre-recorded sample to verify that the speaker is who they claim to be. Each person’s voice is highly distinctive, much like a fingerprint.
During enrollment, the speaker recites a text, a phrase, or a set of discrete words or numbers. The speech is digitized and stored. The biometrics engine splits each spoken word into small segments and analyzes their formants, the dominant resonant frequencies of the voice (much as a speech synthesis engine works with phonemes, as described in the text-to-speech article). The formants are broken down into tones, which are captured in digital form and stored in a database. These are the physical characteristics of the voice. The engine also records so-called behavioral characteristics, such as pronunciation. The speaker is typically prompted to repeat the text or words several times, which gives the engine more information about the voice and lets it account for natural variation.
When the speaker utters the same text in the future, the same procedure takes place and the extracted tones are compared to the stored ones.
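The enrollment and comparison steps can be illustrated with a minimal Python sketch. It assumes that some front end has already turned each utterance into a fixed-length list of numbers; the function names, the cosine-similarity measure, and the 0.95 threshold are all illustrative assumptions, not the method of any particular voice verification product.

```python
import math

def average_template(samples):
    """Average several enrollment feature vectors into one stored template."""
    count = len(samples)
    return [sum(values) / count for values in zip(*samples)]

def cosine_similarity(a, b):
    """Return the cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def verify(template, attempt, threshold=0.95):
    """Accept the attempt only if it is close enough to the stored template.

    The threshold is a hypothetical value chosen for this example.
    """
    return cosine_similarity(template, attempt) >= threshold

# Hypothetical feature vectors: three enrollment repetitions of one phrase.
enrollment = [[1.0, 2.0, 3.0], [1.2, 1.8, 3.1], [0.9, 2.1, 2.9]]
template = average_template(enrollment)

same_speaker = [1.1, 2.0, 3.0]   # resembles the enrolled voice
other_speaker = [3.0, 0.5, 1.0]  # clearly different characteristics

print(verify(template, same_speaker))
print(verify(template, other_speaker))
```

Averaging several repetitions is one simple way to reflect why the speaker is asked to repeat the phrase: a single sample would make the stored template more sensitive to one-off variation.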
Voice verification accuracy and other issues:
The accuracy of voice verification can be affected by numerous factors. A person’s voice can change over time because of health issues (a cold alters the voice significantly) or even psychological factors. Background noise is another problem: it distorts the uttered speech, and microphones tend to amplify it. Distortion introduced by the telephone line can also reduce accuracy.
To ensure the highest possible accuracy, the conditions under which the sample is gathered should match those of future verification attempts as closely as possible. For example, if verification will be performed over the phone, the sample should also be gathered over the phone. Both sampling and verification should also take place under low-noise conditions.
These limitations can make verification by voice alone unreliable, so most implementations combine voice verification with the classic PIN approach. During sampling, the speaker is prompted to utter the series of digits that make up his PIN. When the speaker later tries to authenticate, he speaks the PIN again and two procedures run in parallel: the voice verification engine tries to match the speech characteristics against those stored in the database, while a speech recognition engine tries to recognize the digits being uttered and produce the PIN as text. Authentication succeeds only if both engines agree, that is, the voice matches and the PIN is correct. This approach gives substantially higher overall accuracy and makes systems that use it more resistant to fraud attempts.
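The combined decision can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the speech recognition engine is assumed to have already returned the spoken digits as a string, the voice verification engine is assumed to return a similarity score between 0 and 1, and the `authenticate` function and its 0.8 threshold are hypothetical.

```python
import hmac

def authenticate(recognized_digits, voice_score, enrolled_pin, voice_threshold=0.8):
    """Grant access only when both checks pass.

    recognized_digits: PIN text produced by the speech recognition engine (assumed input).
    voice_score: similarity score from the voice verification engine (assumed 0 to 1).
    voice_threshold: hypothetical cut-off chosen for this example.
    """
    # Constant-time comparison avoids leaking how many digits matched.
    pin_ok = hmac.compare_digest(recognized_digits, enrolled_pin)
    voice_ok = voice_score >= voice_threshold
    return pin_ok and voice_ok

print(authenticate("4921", 0.93, "4921"))  # right PIN, matching voice
print(authenticate("4921", 0.42, "4921"))  # right PIN, voice does not match
print(authenticate("4912", 0.93, "4921"))  # matching voice, wrong PIN
```

Requiring both conditions means that knowing the PIN is not enough for an impostor, and a voice match alone is not enough either.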
