VUI’s usability – Seven things to consider before testing


Since the dawn of “Alexa”, “Siri”, and “Google Assistant”, Voice User Interfaces (VUI) have been among the most relevant technological advances of recent years. This comes as no surprise, considering that using your voice to interact seems much more natural than clicking with a mouse, typing on a keyboard, or even tapping on a touch screen.

Although many Usability Heuristics likely apply to VUI the same way they do to Graphical User Interfaces (GUI), design patterns for voice input differ significantly from those for visual interaction. However, compared to the body of knowledge on GUI, there is little established knowledge on VUI patterns or on testing methods for VUI.

An example of an augmented conversation happening between two people. The conversation augmentations can be seen on the right hand side. 

At CLINQ, we are working on a new VUI – a cloud-phone which augments the conversation of its users. We’ve conducted various usability tests and this is what we learned. 

Consider the natural flow of conversation

Thinking Aloud is one of the most important usability tools. However, when dealing with VUI, making your participant think aloud might not be the best method to gain insights about the participant’s feelings, especially when:

  • the problem the product solves is complex and involves the back and forth of a negotiation
  • VUI is augmenting a conversation happening between people (as in the example of CLINQ).

In these cases, thinking aloud will distract the participants from their task or conversation, undermining the mental involvement in the scenario and thus distorting the results of your test.

Considering that efficiency is an important factor in VUI (see below), making your participants think aloud will delay the progress in their tasks simply because they will need time to think and express their evaluation of the product. 

Some work-arounds to deal with the lack of users’ introspection during testing are to film the test (see below) or to ask the participants about their emotional perception after the test.  

Compare VUI to “traditional” alternatives

Enabling voice as a form of interaction is often an alternative to an existing graphical interface, e.g. it’s possible to set a timer via either GUI or VUI. There aren’t many products using voice as the exclusive way of interaction.

Consequently, there is an opportunity for you to compare your VUI not only to your competitor’s, but also to its “traditional” alternatives. This will give you a benchmark regarding usability metrics such as the success rate, or the time necessary to complete a task (see next).
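A benchmark like this can be computed from a simple log of test trials. The following is a minimal sketch, assuming one record per participant and task; the `Trial` structure, the `summarize` helper, and all numbers are hypothetical placeholders for illustration, not measured results.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Trial:
    interface: str   # "GUI" or "VUI" (hypothetical labels)
    succeeded: bool  # did the participant complete the task?
    seconds: float   # time until the participant finished or gave up


def summarize(trials):
    """Return {interface: (success_rate, median_seconds_of_successful_trials)}."""
    summary = {}
    for name in sorted({t.interface for t in trials}):
        group = [t for t in trials if t.interface == name]
        successes = [t.seconds for t in group if t.succeeded]
        rate = len(successes) / len(group)
        # Only successful trials count towards the time metric.
        typical = median(successes) if successes else None
        summary[name] = (rate, typical)
    return summary


# Placeholder values for illustration only, not real test data.
trials = [
    Trial("GUI", True, 20.0),
    Trial("GUI", True, 30.0),
    Trial("GUI", False, 60.0),
    Trial("VUI", True, 8.0),
    Trial("VUI", False, 45.0),
    Trial("VUI", True, 12.0),
]

for name, (rate, typical) in summarize(trials).items():
    print(f"{name}: success rate {rate:.0%}, median time {typical}s")
```

The median is used here rather than the mean because a single participant who struggles for a long time would otherwise dominate a small sample; that is a design choice of this sketch, not a requirement.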

Consider efficiency and task completion 

Granted, the efficiency of a system should never be the single, decisive factor for or against implementing voice (or any other feature) into your system. 

However, when designed well, VUI are not structured hierarchically, making the interaction of user and product incredibly efficient. Thus, you might want to test the time to complete a task. Picture how many interactions on your smartphone it takes to figure out the population of London:  

How much time does it take to figure out the population of London? Searching via a graphical user interface (on the left side) versus a voice user interface (on the right hand side). The voice interface is much quicker than the graphical user interface.

Using the graphical interface: Touch the home button → Swipe left → Swipe left (another time) → Tap “google” → Tap into the search → Tap “P” → Tap “o” → Tap “p” → Tap “u” → Tap “l” → Tap “a”

Using the voice interface: Touch and hold home button → say “Hey Siri, how many people are living in London” → DONE

At the same time, VUI can be horrendously slow, particularly when the system has problems understanding the utterances. A common pitfall is not testing how hard it is for the system to recover from faulty voice inputs and misinterpretations.
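One way to make recovery measurable is to log every attempt a participant needs for a single request. The following is a minimal sketch, assuming each attempt is noted as a pair of "was it understood" and "how long did it take"; the `recovery_cost` function and the example values are hypothetical.

```python
def recovery_cost(attempts):
    """Summarize how costly misrecognitions were for one request.

    attempts: list of (understood, seconds) pairs in chronological order.
    Returns (failed_attempts, seconds_lost, recovered).
    """
    failed = 0
    lost = 0.0
    for understood, seconds in attempts:
        if understood:
            # The system finally got it right; everything before was overhead.
            return failed, lost, True
        failed += 1
        lost += seconds
    # The participant gave up before the system understood the request.
    return failed, lost, False


# Placeholder log for illustration only: two misrecognitions, then success.
print(recovery_cost([(False, 4.0), (False, 5.5), (True, 3.0)]))
```

Comparing the number of failed attempts and the seconds lost across participants shows whether the interface helps users get back on track or leaves them repeating themselves.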

Consider recording a video

When judging the satisfaction users get from using a product, there is nothing as powerful as observing them while they use the product. This is why you should try to get your team to be quiet bystanders during usability tests. Since it’s usually not a good idea to have more people in the room than necessary, it is important to think about the deliverables of your test. To put it in other words: how do you make the results of your test available for product development?

As stated before, it is reasonable NOT to use the Think Aloud method with VUI. As a result, you will miss valuable insights about the person’s emotions during usage. Consequently, you should consider recording a video as the most powerful way to grasp how your product makes your users feel. Showing the team your users’ satisfaction and struggles in condensed form is much more likely to convey the product’s strengths and shortcomings than noting your findings in a page-long report.

Recording your usability test with a camera will result in extra work before and after the test. This is why it is good practice to invest in a usability setup for recording and editing your video. It might be useful to have a mobile setup (see “Consider your test’s surrounding” below).

Create a reasonable scenario 

The goal of usability testing is to see how users will interact with your product in the real world, without any pressure or help from others. It is therefore crucial to define a realistic scenario for your test in which your participants feel comfortable, so that they concentrate on the given task. Finding conceivable scenarios is especially important for B2B products tackling more specific use cases, because most of us do not have access to participants who precisely fit the target group at all times.

For example, you may target sales representatives with your product. If you cannot recruit any for user testing, think of a scenario that a non-professional could relate to, for example playing the role of a car salesman or a real-estate agent.

Consider your test’s surrounding 

Testing your Voice User Interface in a controlled and quiet environment sounds like a great idea, but the physical environment in which testing takes place has a huge impact, psychologically and physiologically, on the validity of your results. Consider the influence of the environment in which your product will be used before planning your usability test.

As an example, if your user is a factory worker, you might want to test your interface in a situation comparable to a factory. A voice interface is not necessarily more accessible just because the user can control it hands-free. Voice control will not be of value if the system isn’t able to pick up speech because of ambient noise.

Even more subtle is the influence of the social environment. The perceived presence of others influences our behaviour in numerous ways. There might be social circumstances where it is not appropriate to use voice to control an interface. 

That’s why your results will be less valid if the test setting differs significantly from the setting of real usage. Think about a person using voice input to search for the latest Billie Eilish in an office for everybody to hear. Now, imagine this person not getting it right the first time, repeating the query over and over. The perceived presence of others will make this an even more unpleasant experience than it already is in and of itself. This user’s judgement of your product might not be as severe if tested in an isolated environment with fewer potential social consequences.

If your product is built for usage in a social environment, think about how this should shape your experimental design. If you come to the conclusion that you will not be able to design a usability test which acknowledges social environments, think about observing users in their real workspace. Alternatively, ask them to write down their experiences with your product as a form of diary study.

Consider social desirability

Obviously, you should always be cautious when openly asking for an opinion about the usefulness of your product. This is because many people are reluctant to give negative feedback, fearing their answer might be offensive. 

You should be especially careful asking how handy an innovation like VUI might be. This is because people tend to argue in favor of the novelty by overestimating positive impacts and ignoring negative consequences simply because they are not familiar with the technology.

This tendency is often reinforced by the participants you will be able to recruit. People willing to spend their time giving feedback to your brand-new-thingy tend to be more open-minded to innovation. Ask yourself: are you actually targeting early adopters?


Voice is one of the most natural and effortless forms of interaction and will certainly be an essential part of numerous products in the future. Still, this technology is in its infancy, with countless gaps and usability flaws. Although voice interaction is radically different from visual interaction, when it comes to the system’s usability, they both require the same thing: feedback from real people, as often as possible. After all, “people ignore design that ignores people” (Frank Chimero).

One Response

  1. Thanks for the article, lots of good points.
    I like to think of different modalities as having different cost-benefit curves over different size task chunks. When you factor everything in, a speech action will cost more than a simple GUI step, but you can potentially get a lot more out of one speech act. You should not use speech if you can’t get that much out of it and you could just as easily use something else.
    A casting company CEO said he wanted to install a 5 word vocabulary interface on a casting machine. This was going to cost him $10k and I pointed out that on an industrial floor speech recognition would not be so good. I recommended he just use buttons and plate them with silver, gold, bronze if he really wanted to spend a little money on the project.
    In the case of your query speech will do well because it’s linguistic input: you are choosing amongst large numbers of items. There are lots of cities and lots of characteristics one might want to inquire about. With linguistic input you get to string words together as well. Typing it in is also a linguistic query, but it is a lousy keyboard and speech is 4 times faster than even a good keyboard. A pure WIMP interface would be catastrophically worse for this. I think Yahoo used to be a menu tree without a search box in the early days. But WIMPs have their advantages for some things too.
