
About

An Alexa voice skill that recites haikus upon request
The Rundown...
This was my very first interaction with voice skills, so a lot of what I include in this brief isn't required to build a skill, but it is pertinent to learning how to build skills in the future. I really had to study the process before diving in. I'm not going to lie, I was extremely intimidated. But as I got my bearings, I fell in love with the process of designing for voice interfaces.
Learning...
I learned that Alexa skills are built with JavaScript and Node.js.
Node.js allows you to write and run JavaScript on a server.
JavaScript is a core programming language used by software engineers to make websites function.
Essentials of JavaScript I needed to know...
While I didn't need to know ALL the things, I did need to know the essentials of variables and their values.
I felt it important to start with the basics. What is a variable? Are you having flashbacks to high school math equations and either panicking or already skipping forward because you want nothing to do with this?
DON'T WORRY, these are not the same.
A variable is a container that houses or labels a particular type of data to be stored, referenced, or manipulated throughout the program.

String variables are made up of characters, which can include letters, numbers, phrases, or symbols.
Constant variables: the keyword const at the beginning of a declaration indicates a constant variable. It's used when the value of the variable doesn't change over the course of the program's run time.
Other variables: there are several other kinds of variables, but I didn't need to know them for this; and as with all things in life, I will learn them as I come to them.
Below is an illustration of the punctuation and indentation used specifically for JavaScript and this task.
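This is only a minimal sketch of those essentials; the variable names and messages are placeholders of my own, not the skill's actual code.

// A string variable: characters wrapped in quotes (letters, numbers, phrases, or symbols).
// The keyword let means the value can change later in the program.
let greeting = "Welcome to Happy Haiku!";

// A constant variable: const signals the value won't change while the program runs.
const skillName = "Happy Haiku";

// Reassigning a let variable is allowed...
greeting = "Here is your haiku.";

// ...but reassigning a const would throw an error, which is the point:
// skillName = "Sad Haiku"; // TypeError: Assignment to constant variable.

console.log(skillName + ": " + greeting);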
Understanding Voice User Interfaces...
In order to build a voice skill, the first step was researching voice interfaces to get a feel for what works with them and what doesn't, and asking around to see which voice interfaces the people around me preferred and why.
I chose 5 voice interfaces to analyze. I then asked 5 questions about each to get a clear picture of what they provided.
The 5 voice interfaces
Amazon Alexa
Siri
Google Maps
Cortana
SideChef
The 5 questions
1. What is the general purpose of the app? Who is the intended audience?
2. How does the app use sound to appeal to users? Does it combine with another interface?
3. What happens if you're not understood by the voice system, or if you don't say anything? Does this change the different parts of the interaction?
4. In what way does this impact user experience?
5. Was the voice interface successful at accomplishing what the app was intended to do? Why or why not? Was it more successful than a visual interface would have been?
The Overall Consensus...
Favored option
For voice assistants such as Siri, Google Maps, etc., the user has the option to change the language, dialect, and gender of the speaking voice. It doesn't automatically default to your location; it allows you to cater it to your specific needs. Perfect for traveling or living in a foreign land.
As well, the different dialects (Australian vs. American) and the choice of voices cater to the user's comfort zone and create different tones for those with restricted hearing.
Not optimal
The designers of SideChef had a brilliant idea: a sort of assistant in the kitchen. In my experience, though, the amount of background noise that occurs while cooking was not taken into consideration.
On numerous occasions while attempting to use the app, I had to repeat myself, was misunderstood, or the app shut down altogether mid-step, prompting me to start again.
Not ideal while in the middle of cooking a recipe.
As well, a lot of people cook with others in the kitchen, chatting about their day or with kids doing homework. The voice interaction seemed to work best when only one person was in the space and no other real noise was happening (no music, no sizzling steaks, no running water, no other humans).
Understanding how Alexa voice interactions work
There is so much that goes into this, and for those already in the world of voice design, probably not that much. But I am approaching this as though I am explaining my process to someone completely new to it, so bear with me...
Speech recognition: converts the speech signal to a string of words.
Natural language understanding: identifies the meaning in the string.
Example utterance: "Alexa, play my favorite playlist."
Dialog management: looks up information; triggers reaction(s).
Dialog response: produces the output.
Example response: "Playlist, Favorites, on Amazon Music."
The response can be anything from verbal to visual to an action.
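To make the dialog management and dialog response steps concrete, here is a minimal sketch of an intent handler, assuming the ASK SDK v2 for Node.js; the intent name and speech text are placeholders of my own, not the published skill's code.

// Minimal sketch of producing a dialog response with the ASK SDK for Node.js.
const Alexa = require('ask-sdk-core');

const GetHaikuIntentHandler = {
  // Dialog management: decide whether this handler matches the recognized intent.
  canHandle(handlerInput) {
    return Alexa.getRequestType(handlerInput.requestEnvelope) === 'IntentRequest'
      && Alexa.getIntentName(handlerInput.requestEnvelope) === 'GetHaikuIntent';
  },
  // Dialog response: produce the spoken output sent back to the user.
  handle(handlerInput) {
    const speechText = 'Cotton candy skies, Tetons rise to meet the sky, inspired by thee.';
    return handlerInput.responseBuilder
      .speak(speechText)
      .getResponse();
  },
};

// Wire the handler into the skill; this is what runs on AWS Lambda.
exports.handler = Alexa.SkillBuilders.custom()
  .addRequestHandlers(GetHaikuIntentHandler)
  .lambda();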
Deconstructing A Demo Skill
The breakdown of a demo skill involves the utterance, response, intents, slots, inference, and keeping notes for the software engineer and anyone else involved in the development of the skill. A rough sketch of how these pieces fit together follows the definitions below.
Utterance = phrases that express different intents and activate them in the skill.
Response = the skill's reaction to the utterance.
Intents = what the user is able to perform with the skill.
Slots = a set of items that can be used in an intent (e.g., cities, states, and locations in a weather skill).
Inference = an assumption made based on the skill's purpose and the utterance spoken.
Notes = additional information needed to understand the breakdown of the skill.
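As a rough, illustrative sketch, here is how those terms can map onto an Alexa interaction model; the invocation name, intent, slot, and sample utterances are placeholders I made up for this explanation, not the skill's actual model.

// Sketch of an Alexa interaction model, expressed as a JavaScript object.
// Everything here is illustrative: names and values are placeholders.
const interactionModel = {
  interactionModel: {
    languageModel: {
      invocationName: 'happy haiku',
      intents: [
        {
          // Intent: something the user can do with the skill.
          name: 'GetHaikuIntent',
          // Slot: a set of items the intent can accept.
          slots: [{ name: 'season', type: 'SEASON' }],
          // Utterances: phrases that express and activate this intent.
          samples: [
            'read me a haiku',
            'tell me a {season} haiku',
            'give me a haiku about {season}',
          ],
        },
      ],
      // Possible slot values the skill can match against what was spoken.
      types: [
        {
          name: 'SEASON',
          values: [
            { name: { value: 'spring' } },
            { name: { value: 'summer' } },
            { name: { value: 'autumn' } },
            { name: { value: 'winter' } },
          ],
        },
      ],
    },
  },
};

module.exports = interactionModel;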

Creating Haikus
When designing a skill, the more specific the response, the more effort you must put into the optional responses given. This was a project for educational purposes, but I wrote out thirty haikus to ensure that Alexa has a variety of responses to use. If I had kept the list minimal, I would risk the skill repeating the same haiku twice in a row (one way to handle that is sketched after the examples below).
Examples of the haikus Happy Haiku can read back to the user...
Cotton Candy Skies
Tetons rise to meet the sky
Inspired by thee.
Nursing trees grow new
Layers of moss on moss green
Fungi thrives on rain.
Rainbow glacier melts
Cerulean waters clear blue.
Mountains beg it back.
Remaining Haikus
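Here is a small sketch of one way to avoid that back-to-back repetition: pick a random haiku and remember which one was read last. The function and variable names are my own illustration, not the skill's actual code.

// Illustrative sketch: pick a random haiku while avoiding an immediate repeat.
const haikus = [
  'Cotton candy skies, Tetons rise to meet the sky, inspired by thee.',
  'Nursing trees grow new layers of moss on moss green, fungi thrives on rain.',
  'Rainbow glacier melts, cerulean waters clear blue, mountains beg it back.',
  // ...the remaining haikus would be listed here.
];

let lastIndex = -1; // remembers which haiku was read last

function pickHaiku() {
  let index;
  do {
    index = Math.floor(Math.random() * haikus.length);
  } while (haikus.length > 1 && index === lastIndex);
  lastIndex = index;
  return haikus[index];
}

console.log(pickHaiku());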
Testing our Skill...
I worked with three different participants to test using the Alexa skill and getting a response back. I felt it was important to test users with different tones of voice, so I chose a child, a woman, and a man to test the skill with, which let me adjust the phrases and utterances as needed per user.


After testing the skill and gauging the users' interactions, I was able to eliminate utterances that turned out to be pointless because they were not communicated clearly once spoken. I was also able to add utterances the users attempted to use while testing the skill.

Interviews
Submitting a skill for certification...
When submitting a skill to Amazon's Alexa, it needs to include a basic description: a name, example phrases that can be used, the category it will focus on, a logo, and an icon.


This skill was not certified because there was not a need for another haiku skill.
Haiku Reader is a certified and published Alexa skill used for students publishing and working with Amazon's Alexa; I added this submission to the Haiku Reader skill, so you can hear some of my haikus upon request.
Final Result
The end result of building a Haiku reader for Amazon's Alexa...