When we started a little more than a year ago, our vision was to eliminate all hardware and setup effort in B2B telephony:
No PBX, no desk phones, and a complete telephony solution up and running in under a minute. We achieved that goal. And what is the end result of an achieved goal? A new one!
Since our last pivot (a fundamental strategic change of the product), we have been focused on using Artificial Intelligence to reduce the pre- and post-processing time of sales calls to zero.
It all starts with a buzzword: AI
“Our voice bot needs dusting off. Let’s do something with AI…” That, more or less, is how my personal journey through the vast AI universe began. Spoiler: it’s a big universe. So there I stood. I was supposed to build “hot shit,” but of course as an MVP: ship quickly and iterate on the feedback. As with all big topics, you start with research:
- What’s on the market?
- What is possible?
- And what does “AI” actually mean in the context of Voice-Bots?
Starting the research
What’s on the market? A lot! So the first step was to narrow it down. Our criteria:
- German language models available
- Extensive documentation
- Fast development results possible
- Offered as a SaaS solution
Numerous vendors, a handful of open-source projects, and several tutorials later, we decided to go with Dialogflow. Google’s SaaS solution provides us with a proven foundation of Natural Language Understanding (NLU) and the Natural Language Processing (NLP) built on it. It is super easy to define your own entities to filter information out of user statements. There is plenty of documentation, along with SDKs and example projects.
After we had sorted out all the SaaS providers that did not meet our criteria and decided on Dialogflow, we also came across the open-source project Rasa at the AI Conference in London. Tip: if you want even more freedom and want to dive deeper into Natural Language Understanding (NLU), you should take a look at this project. Maybe there will be an article about Rasa later on, but for now it’s all about Dialogflow.
Dialogflow is a platform for creating AI-driven language assistants. Understanding user input and providing the correct response to it is essentially the main task of such assistants. The Google Assistant is a simple example: you speak or write something into the client, the AI tries to recognize what you actually want by means of learned models, and in response you get simple text replies, “rich messages” (with images, maps, website excerpts, etc.), or concrete tasks like creating calendar entries or timers.
Dialogflow offers a very simple interface. Once you have understood the basic concept and internalized the terminology, operating it and creating agents is very easy. Of course, everything can also be driven directly via the well-documented API.
How does it work?
Under the hood, Dialogflow uses Google’s machine learning to analyze unstructured text. Natural Language Processing (NLP) interprets user input against trained phrases to understand the user’s intent. Once the user’s intention has been understood, the agent can react accordingly, for example with simple answers or logic-based actions.
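To make this concrete, here is a sketch in Python of the kind of result Dialogflow’s v2 API returns after analyzing an utterance. The payload values are invented for illustration, but the field names follow the v2 `queryResult` structure:

```python
# Hypothetical Dialogflow v2 "queryResult" payload; the values are
# invented, but the field names follow the v2 REST API.
query_result = {
    "queryText": "Set me to absent",
    "intent": {"displayName": "set_absence_status"},
    "intentDetectionConfidence": 0.93,
    "fulfillmentText": "Okay, your status is now set to absent.",
}

def summarize(result):
    """Extract the fields an application typically acts on:
    the matched intent, the model's confidence, and the reply text."""
    return (
        result["intent"]["displayName"],
        result["intentDetectionConfidence"],
        result["fulfillmentText"],
    )

intent, confidence, reply = summarize(query_result)
print(intent, confidence, reply)
```

The application only has to branch on `intent` and `confidence`; Dialogflow has already done the heavy NLP lifting.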
What are entities?
A dialog contains various pieces of information that influence the further course of the conversation. For our voice assistant to react naturally to this information, it needs a way to extract it from the user’s statements. This is where Dialogflow entities come into play.
Dialogflow currently offers 59 System Entities for recognizing and extracting information. These include, for example:
- Date and Time: @sys.date
- Contacts: @sys.email
- Names: @sys.person
Additionally, you can create your own entities. For navigation within the CLINQ interface, I created a @clinqInterface entity which contains several sub-entities.
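When Dialogflow matches an intent, the values captured by entities land in the `queryResult.parameters` field. Here is a sketch assuming a hypothetical reminder intent using the system entities mentioned above; the parameter names and values are illustrative, not taken from a real agent:

```python
# Hypothetical queryResult for "Remind me tomorrow to call Anna".
# Parameter names and values are illustrative, not from a real agent.
query_result = {
    "queryText": "Remind me tomorrow to call Anna",
    "intent": {"displayName": "create_reminder"},
    "parameters": {
        "date": "2019-05-21T12:00:00+02:00",  # extracted via @sys.date
        "person": {"name": "Anna"},           # extracted via @sys.person
    },
}

# The backend never has to parse the raw sentence itself;
# it just reads the structured parameters.
params = query_result["parameters"]
reminder = (params["person"]["name"], params["date"][:10])
print(reminder)
```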
Where to start?
If you are completely new to the field, I recommend investing a few days in reading Google’s general documentation. Especially the topic of “Conversation Design” pays off early on. The following pages are a very good starting point: https://designguidelines.withgoogle.com/conversation/
After reading more or less everything on the Conversation Design pages, I quickly wanted to get into practice. A rough concept of what the voice bot should do and how it should work technically helped me here.
The goal or: The Great Vision
My intention is for the bot to be active in the interface at all times and to support me in my daily tasks. This includes:
- the initial onboarding,
- answering questions about CLINQ,
- taking over the complete administration, including telephony functions such as:
  - setting the absence status
  - activating/deactivating a channel
  - creating new channels
  - inviting users to a channel
  - creating callback lists from defined groups of people (X calls in a channel, missed calls in any period of time, people who meet criteria from the CRM, …)
Later on, active support for daily planning or team planning should be possible.
Voice vs. Chat
As a telephony product, the focus is clearly on speech. Therefore, the bot should be “callable” and possibly even accessible outside the CLINQ interface. Fortunately, Google offers the possibility to receive an audio stream while keeping all other functionality.
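As a sketch of how this looks on the wire: the v2 detectIntent request accepts base64-encoded audio alongside an `audioConfig` describing the stream. The field names below follow the v2 REST API; the audio bytes are fake placeholders standing in for raw audio from the call:

```python
import base64

# Sketch of a Dialogflow v2 detectIntent request body for audio input.
# Field names follow the v2 REST API; the audio bytes are placeholders
# standing in for raw LINEAR16 audio captured from the call.
fake_audio = b"\x00\x01\x02\x03"

request_body = {
    "queryInput": {
        "audioConfig": {
            "audioEncoding": "AUDIO_ENCODING_LINEAR_16",
            "sampleRateHertz": 16000,
            "languageCode": "de",
        }
    },
    "inputAudio": base64.b64encode(fake_audio).decode("ascii"),
}
print(request_body["inputAudio"])
```

The response then carries the same `queryResult` structure as a text request, so the rest of the pipeline stays unchanged.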
Of everything we set out to do, we always ship small features first, in line with Lean Thinking. This way we quickly get a feeling for what works and learn how to use it. The first feature was answering questions about CLINQ. The goal: answer simple user statements directly from the Dialogflow interface, without a lot of backend ping-pong.
The Dialogflow Flow
We divide communication into two types:
- User statements that can be handled with “simple,” predefined answers.
- User statements whose responses require additional information or trigger further tasks.
In the simpler variant, the CLINQ user opens an overlay and calls the CLINQ Bot. The user then talks to the bot via our CLINQ interface and makes a statement. The Dialogflow agent answers with the text response stored in the corresponding intent.
This flow is particularly suitable for simple question-answer ping-pong dialogues, because the answers can be maintained in the Dialogflow interface.
If, on the other hand, user statements cannot be answered with previously defined responses, or if further logic is to be triggered elsewhere (e.g. creating a calendar entry), a further step in the process is required. For such intents, the toggle “Enable webhook call for this intent” must be activated in the Dialogflow interface (along with the corresponding configuration under the “Fulfillment” menu item). After that, the Dialogflow agent sends a webhook request to the CLINQ webhook service whenever such an intent is matched. More about Fulfillments
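A minimal sketch of such a webhook handler, assuming hypothetical intent and parameter names (`set_absence_status`, `create_channel`, `channelName`) rather than those of the real CLINQ agent; the request and response shapes follow the v2 webhook format:

```python
def handle_webhook(request_body):
    """Minimal fulfillment sketch: dispatch on the matched intent and
    build a v2 webhook response. Intent and parameter names are
    illustrative, not those of the real CLINQ agent."""
    query_result = request_body["queryResult"]
    intent = query_result["intent"]["displayName"]
    params = query_result.get("parameters", {})

    if intent == "set_absence_status":
        # here the real service would call the CLINQ backend
        reply = "Your status is now set to absent."
    elif intent == "create_channel":
        reply = "Creating the channel {}.".format(params.get("channelName", "?"))
    else:
        reply = "Sorry, I can't handle that yet."

    # "fulfillmentText" is the reply Dialogflow sends back to the user
    return {"fulfillmentText": reply}

response = handle_webhook({
    "queryResult": {
        "intent": {"displayName": "set_absence_status"},
        "parameters": {},
    }
})
print(response)
```

In production, this function would sit behind an HTTPS endpoint that Dialogflow calls; the dispatch logic itself stays this simple.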
The CLINQ Use Case
At the beginning, the new voice bot was only meant to replace our antiquated audio test bot. In the meantime, our CLINQ Bot is growing more and more into a complete voice assistant for CLINQ. We started with answering questions about CLINQ:
- What is CLINQ?
- What makes CLINQ different from sipgate team?
- What does CLINQ cost?
- How much does CLINQ Premium for X users cost?
- … and so on.
Currently, the bot displays additional information in the interface. We use this to help explain the interface or to navigate to the required section.
Next, it should be possible to perform everyday tasks via voice commands:
- Set me to status absent.
- Change the opening hours of the channel.
- Remind me at XX o’clock to call XXXX
In the next part, I will go into more detail about the implementation of our use cases, show you examples, and share some best practices to help you get started. If you like the topic and want to learn more, just let me know in the comments.