Transition from DTMF to Speech and issues that arise
In the previous post we briefly discussed the differences between DTMF-enabled and speech-enabled IVR systems. We now turn to transitioning a DTMF-based IVR application to a similar application powered by a speech recognition engine, and to the various considerations that have to be taken into account while doing so:
Speech-powered applications need to be complex enough to justify the investment. It is neither cost-effective nor noticeably more efficient to spend large amounts of money on a two-layer application with 3-4 options in the first menu and 2-3 options in each submenu. Such applications can be implemented very well with DTMF, and the user is served quickly enough. Thus, if your self-service application is small and simple, DTMF is currently the best choice.
For complex applications, though, speech recognition is vastly superior in terms of efficiency and quality, even when the correct recognition rate achieved by the speech recognition engine is below 50%. The reason is that the user of an automated system actually wants to minimize interaction time. A typical user would much rather enter the same information two or even three times and be done in 1 minute total than navigate menus and listen to irrelevant information for 2 minutes before being able to enter the information once, quickly and accurately.
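The trade-off above can be made concrete with a small expected-time model. This is a minimal sketch under stated assumptions, not a measured result: it assumes each spoken attempt succeeds independently with the same probability (so the number of tries follows a geometric distribution with mean 1/p), and the durations below are hypothetical values chosen only to mirror the "1 minute versus 2 minutes" comparison in the text. The function names are illustrative.

```python
def expected_attempts(success_rate: float) -> float:
    """Expected number of tries until the first success.

    Assumes every try succeeds independently with the same probability
    success_rate, so the number of tries is geometrically distributed.
    """
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    return 1.0 / success_rate


def expected_speech_seconds(attempt_seconds: float, success_rate: float) -> float:
    """Expected time to complete one spoken input, retrying until it is recognized."""
    return attempt_seconds * expected_attempts(success_rate)


if __name__ == "__main__":
    # Hypothetical durations, for illustration only (not measurements):
    dtmf_seconds = 120.0           # menus and prompts before one accurate DTMF entry
    speech_attempt_seconds = 20.0  # one spoken attempt, including any re-prompt
    for rate in (0.9, 0.5, 0.4):
        speech = expected_speech_seconds(speech_attempt_seconds, rate)
        print(f"recognition {rate:.0%}: speech ~{speech:.0f}s vs DTMF {dtmf_seconds:.0f}s")
```

With these made-up durations, speech stays ahead as long as 20 / p < 120, that is, for any recognition rate above roughly 17%; real break-even points depend entirely on the actual prompt and menu timings of a given application.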
The following example uses an application flow tree (created at random for this purpose) that is complex enough to showcase the difference in implementation logic between DTMF and speech.
In the tree appearing to the right, the leaves are the final services the application offers. Information retrieval and announcement services are highlighted in yellow, and services in which the customer has to enter information are highlighted in orange. In a DTMF-powered application, each menu has to be presented hierarchically: users first listen to options 1-5, select a submenu, listen to all the options under it, move on to the next submenu, and so on. Typically, the user can navigate back to the previous menu or the start menu with the * and # keys or a designated number. A speech-enabled application, on the other hand, lets the user jump directly to any subtree or access a service (a leaf of the tree) directly. At any point during navigation, the user can also jump to any service in a single action, without passing through the hierarchy. Of course, traversing the tree this way requires the user to know the available options; otherwise, the options have to be presented hierarchically again. Once the user has tried the application a few times, however, service times can be reduced significantly.

Let's assume that a caller wants to perform actions 1.3.3 and 5.1.1.2. In the DTMF-style application, they would have to go through 3 menus to reach the first item, jump back to the start menu, and then go through 4 more menus to reach the second item: a minimum of 8 steps (3 + 1 + 4). This procedure never improves, no matter how experienced the user is (apart from the time spent listening to prompts, which barge-in can eliminate). In a speech-enabled application, by contrast, an experienced user can jump directly from the initial menu to the first item, and from there directly to the second item, without returning to the start menu.
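The step counting for the 1.3.3 and 5.1.1.2 example can be sketched in a few lines of Python. This is a simplified model under stated assumptions: a service ID such as 5.1.1.2 is taken to list the key pressed at each menu level, every menu selection and every return to the start menu counts as one step, and an experienced speech user reaches any service with a single utterance. Prompt listening time and recognition errors are ignored, and `dtmf_steps` and `speech_steps` are hypothetical helper names.

```python
from typing import Iterable


def dtmf_steps(services: Iterable[str]) -> int:
    """Count DTMF steps for a caller visiting each service in order.

    Assumption: a dotted service ID is the key path from the start menu, so
    reaching it takes one selection per menu level. Between services the caller
    spends one extra step returning to the start menu (e.g. with * or #).
    """
    steps = 0
    for i, service in enumerate(services):
        if i > 0:
            steps += 1  # return to the start menu before the next service
        steps += len(service.split("."))  # one selection per menu level
    return steps


def speech_steps(services: Iterable[str]) -> int:
    """Assumption: an experienced user names each service directly, one utterance each."""
    return sum(1 for _ in services)


if __name__ == "__main__":
    visit = ["1.3.3", "5.1.1.2"]
    print("DTMF steps:", dtmf_steps(visit))      # 8 (3 + 1 + 4)
    print("Speech steps:", speech_steps(visit))  # 2
```

The DTMF count grows with the depth of every service visited, while the speech count depends only on how many services the caller wants, which is why the gap widens as the tree gets deeper.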
In this case, the same result takes 2 steps. So, for this particular example, even with an average recognition success rate of only 50%, each spoken step needs about two attempts, or roughly 4 interactions in total against the 8 DTMF steps, and the experienced user is still served roughly twice as fast as with a 100% accurate DTMF application.
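The "twice as fast" figure can be written out as a worked equation. This assumes, as a simplification, that each spoken step succeeds independently with probability p, so a step needs 1/p attempts on average; n = 2 is the number of spoken steps and 8 is the DTMF step count from the example.

```latex
\mathbb{E}[N_{\text{speech}}] = \frac{n}{p}, \qquad n = 2
\\[4pt]
p = 0.5: \quad \mathbb{E}[N_{\text{speech}}] = \frac{2}{0.5} = 4,
\qquad \frac{N_{\text{DTMF}}}{\mathbb{E}[N_{\text{speech}}]} = \frac{8}{4} = 2
\\[4pt]
\frac{n}{p} < N_{\text{DTMF}} = 8 \iff p > \frac{n}{8} = \frac{2}{8} = 0.25
```

Under this model, speech remains ahead in this example for any recognition rate above 25%, which fits the claim that it can pay off even below 50%.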
The example above shows quite clearly the advantages speech recognition can bring to advanced IVR users. However, inexperienced users who interact with the system for the first time will typically spend more time learning how to work with it. This learning process is inherent in any new technology rolled out to the general public, and it typically takes some time before the new system becomes more efficient than the old one for the average user.
For creating a voice recognition IVR system, I can recommend Ozeki VoIP SIP SDK. It offers a sample program with source code showing how to develop an IVR system: http://www.voip-sip-sdk.com/p_118-c-sharp-voice-recognition-ivr-voip.html
BR,
WilsonS