Alexa, What Did I Say Last Week? 🗣
A common term for the rapidly spreading Amazon Echo and Google Home voice assistants is smart speakers. It’s a bit ironic that we use “smart speakers” — these devices spend most of their time listening.
The number of active smart speakers has grown consistently since the Amazon Echo launched in 2014, and the total number of installed devices is expected to overtake tablets by 2021. By looking at sales alone, these devices are a huge success, yet the usage story remains unclear. The devices have made their way into our homes, but the current capabilities are … underwhelming.
We use these devices to set timers, play music, and check the weather. Sure, there are other functions, but it’s hard to know which commands will return a result and which will trigger the frustrating error message:
I’m sorry, I’m not sure what you meant.
Defining a new interface requires setting boundaries.
For Amazon, Google, and Facebook, the speaker represents an extension of reach beyond their existing channels. Less than a year ago, The Wall Street Journal reported that Amazon had over 10,000 employees working on Alexa, double the number from the previous year. By comparison, the accolade-winning Apple Watch design team had 28.
While the Alexa figure likely contains product managers, engineers, and researchers, the scale of the investment is still interesting. If they aren’t creating new features, then what are they doing?
Why does winning the home matter?
Voice represents a growing interface, and the success of smart devices will be measured by their influence in shaping how we interact with this new medium, rather than by hardware sales alone.
When a reporter criticized the iPhone’s unfamiliar touchscreen keyboard and frequent typos, Steve Jobs famously replied, “Your thumbs will learn.” That was the benefit of pioneering a completely new interface: product design defined the way we used the device.
Smart assistants do not have this luxury. Humans use verbal communication as their primary method of exchanging information, and we expect devices to adapt to our everyday language. This includes nuanced layers of context and non-verbal cues that we take for granted but that are fundamental to our interactions. Unlike the innovations introduced by smartphones and tablets, we are already intimately familiar with how verbal communication should work. Today’s “smart” devices are clunky by comparison.
Device makers are still experimenting with how to explain what these devices can do. Consumers are trained with prompts (try saying “Hey Alexa, what can you do?”), but we have no way of knowing whether a failed command is simply not built yet or will never be possible with this interface.
Alexa is the name of the device in your home, but it is also shorthand for the social cues you use when talking to it. You know that asking Alexa to check the weather gives one response, while asking Siri gives you another.
How you’re asking matters as much as what you’re asking for.
The voice command is both the user interface and the user experience, packaged into an inconspicuous cube. Winning the home means winning the largest market for hands-free interaction: being at home removes the social anxiety that keeps many of us from using voice commands in public. As long as smart devices solve user problems in a way that encourages conversation, the lessons learned will transfer to other areas of product development.
Who is listening?
As companies look to monetize this channel, it will be incredibly tempting to sacrifice privacy in the name of user experience. The mechanism that allows these devices to recognize human language represents a rapidly evolving field of computer science, and messages are often subject to manual review by humans.
Roughly 1 in 500 interactions with home assistants is reviewed by a human, and the content of these interactions often includes revealing information that raises a host of privacy and ethics issues. Recently, an employee of a company hired by Google to review audio clips described the process:
[…] according to the Google employee, he sometimes also gets to hear very personal voice commands, for example men who instruct Google to search for porn.
Not only voice commands, but also personal conversations from Dutch users are included. NOS received two telephone calls and a domestic conversation in which a woman can be heard asking a child whether he "still has a big mouth". How these recordings ended up with Google is unknown. […]
"I remember a long excerpt in which I felt that physical violence was involved, in which someone was audibly in real need with a lot of drumming in the background."
In order to operate, the devices use microphones to listen for wake words like “Hey Alexa” or “Okay Google”, after which they begin recording and processing commands. However, there are clearly situations where the devices record interactions that were never intended as voice commands. The expectation of privacy within your home drops drastically when you intentionally install a device that, by design, listens to every word you say.
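The wake-word pattern described above can be sketched in a few lines. Everything below is a simplification for illustration, not Amazon’s or Google’s actual implementation: real devices match acoustic features on-device rather than text tokens, support multi-word phrases like “Okay Google”, and detect the end of a command with silence detection rather than a fixed word window.

```python
# Illustrative sketch of the wake-word flow: audio before a wake word
# is discarded, and "recording" begins only after the wake word.

WAKE_WORDS = {"alexa"}  # assumption; real devices match acoustic patterns


def extract_commands(transcript, window=4):
    """Return the word spans captured after each wake word.

    transcript: list of lowercase words (a stand-in for an audio stream).
    window: number of words recorded after a wake word (a stand-in for
    the silence-based end-of-command detection real devices use).
    """
    commands = []
    i = 0
    while i < len(transcript):
        if transcript[i] in WAKE_WORDS:
            # Only now does "recording" begin.
            commands.append(transcript[i + 1 : i + 1 + window])
            i += 1 + window
        else:
            # Words heard before a wake word are (ideally) never stored.
            i += 1
    return commands
```

In this toy model, `extract_commands(["hi", "alexa", "set", "a", "timer", "please", "bye"])` returns `[["set", "a", "timer", "please"]]`: the words outside the command window are discarded, which is precisely the privacy guarantee the incidents above show being violated.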
A user on Hacker News, a prominent technology forum, offers an interesting way to break down the common consumer responses to this violation.
If you have installed a smart speaker in your home, you need to carefully consider the tradeoffs you are making for the sake of convenience. We’ve established that the goal of these companies is not simply to sell devices, but to shape our daily interactions with technology. As they seek to evolve these products into truly seamless voice assistants, they will not be constrained to listening only after a wake word, but will use any information available to get the job done.
Rules for time travelers from a Caltech physicist. Link.
Neuralink is live-streaming the first public progress update on their brain computer interface (BCI) device this Tuesday. If you are in San Francisco, you can sign up for the event here. Link.
For the first time, an AI bot has beaten professionals at multiplayer poker. The bot was built by researchers at Facebook and uses self-play to teach itself the game. Link.
The New York Attorney General’s office has filed a memorandum of law against Bitfinex and its stablecoin Tether. Since the original court order, Bitfinex has continued operations and sold over one billion LEO, a new unregistered security token. Link.
An analysis of dark design patterns jointly created by researchers from Princeton and University of Chicago. Hint: when you go to a website and it says “Hurry! 6 other visitors viewing this item!”, it’s fake and designed to make you impulse buy. Link.
40% of couples in the US met online. This number increases to 65% for same-sex couples. Link.
Google is laying an undersea cable from Portugal to South Africa. It’s funny how the projects of the world’s biggest companies are starting to resemble those of governments: first they go after currency, now infrastructure. Link.
A seismologist claims that there is a correlation between a disturbance in a specific radio frequency and the recent Ridgecrest earthquakes. If true, this is the holy grail for earthquake detection; observation of the disturbance came hours before the quakes that struck Southern California during the holiday weekend. Link.
Book Recommendation: Slaughterhouse-Five by Kurt Vonnegut. The tale of Billy Pilgrim, a man who has come unstuck in time but keeps finding himself reliving the firebombing of Dresden. A famous anti-war novel written in a way that uniquely demonstrates Vonnegut’s sarcastic style.
Have an idea for a future topic? Send me an email at email@example.com
Follow me on Twitter and Medium.
Not a subscriber yet?