Do you really want to talk to your house? Soon you should be able to. According to Josh Stene, director of product management for home-control company Crestron, Google has released a web-based API (application programming interface) for developers of speech-recognition technology, and Microsoft is working on web-based voice solutions. We also recently featured an example of the Apple iPhone’s Siri voice-activated assistant executing commands to a Crestron system.
Stene recently presented a CEDIA (Custom Electronic Design & Installation Association) webinar titled Where Have all My Buttons Gone?: The Changing Landscape of Control with Voice and Gesture Recognition, in which he outlined several points to consider if you want speech recognition in your home. (Stene also covered gesture control, but that is still very much in its infancy.)
The quality microphones and faster processors that speech recognition requires are now everywhere, Stene says, and that is a good thing for the technology. Stene’s presentation is primarily intended as a guide for custom electronics (CE) pros who design and install voice control in homes, so here is what you should look for in a home voice control system of your own:
1. What you or your CE pro will need:
– A mic or device with an embedded mic.
– A processing engine, such as a PC.
– Access to a speech recognition library and API (for a developer or your installer).
– An interface to the device you want to control.
2. Look to the Cloud.
Stene favors cloud-based speech recognition technology because, he says, server-based software algorithms can continue to evolve and improve by gathering more and more voice data. Application-based services with embedded libraries, meanwhile, will become outdated more quickly.
3. Pick a good environment.
Good environments for speech recognition:
– Quiet rooms and areas.
– Small rooms.
– Places where the speaker’s position relative to the mic is consistent.
Bad environments for speech recognition:
– Rooms with many reflective surfaces, like bathrooms.
– Places with background noise.
– Places where many people may be speaking at once.
– Places where multiple languages are spoken.
4. Use the right microphones.
Getting that handheld control or mic close to you is critical to getting spoken information into the processor, Stene says. If possible, use a mic with a focused pickup pattern, he advises, as omnidirectional mics pick up noise from every direction. Multi-microphone arrays and beamforming technologies that focus on sounds from certain areas may also be helpful.
5. DSP is a must.
You should also use only one mic at a time. DSP (digital signal processing) technology, which is also required for proper speech recognition, should include noise gates that turn on a mic only when real data (a voice command) is present, as well as noise cancellation to eliminate steady-state noise like low machine rumbles. Equalization should also be used to enhance intelligibility and filter out non-voice content.
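To make the noise-gate idea concrete, here is a minimal Python sketch. It is not Crestron's implementation or anything Stene showed. It assumes audio arrives as short frames of samples scaled between -1.0 and 1.0, and the function names, threshold values and hold time are all illustrative guesses that a real installation would tune by ear.

```python
import math


def rms(frame):
    """Root-mean-square level of one frame of samples in [-1.0, 1.0]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def noise_gate(frames, open_threshold=0.05, close_threshold=0.02, hold_frames=3):
    """Yield only the frames that pass the gate.

    Assumed behavior (hypothetical values, not from the webinar):
    - the gate opens when a frame's level reaches open_threshold;
    - once open, it stays open until more than hold_frames consecutive
      frames fall below close_threshold, so brief pauses between words
      do not chop the command in half.
    """
    gate_open = False
    quiet_count = 0
    for frame in frames:
        level = rms(frame)
        if not gate_open:
            if level >= open_threshold:
                gate_open = True
                quiet_count = 0
        elif level < close_threshold:
            quiet_count += 1
            if quiet_count > hold_frames:
                gate_open = False
        else:
            quiet_count = 0
        if gate_open:
            yield frame
```

Only the frames the gate lets through would be handed to the speech recognizer. Using a lower threshold to close the gate than to open it is a common way to keep it from fluttering on and off at the edge of a word.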
6. Don’t use speech recognition for everything.
“Some things are just better with buttons,” Stene says. You won’t want to use voice commands for on-screen guide functions (like page or channel up) or for multistep commands. “You want it for things that people are doing all the time and preset functions like ‘Goodnight, home.’” Also, keep the command words short, like “Hall lights on.”
7. Have overrides.
You’ll also want the ability to turn off speech recognition whenever you like, and a way to modify commands or add new presets and macros.
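For a developer or installer, short presets plus an override might look something like the Python sketch below. It is only an illustration under stated assumptions: the class name, the action strings such as "hall_lights:on" and the idea that the recognizer hands back a plain text transcript are all hypothetical, not part of any Crestron or Google API.

```python
import string


class VoiceCommands:
    """Maps short spoken phrases to lists of actions (hypothetical design)."""

    def __init__(self):
        self.enabled = True   # the override: set to False to turn speech off
        self._presets = {}

    @staticmethod
    def _normalize(phrase):
        # Lowercase, drop punctuation and collapse extra spaces so that
        # "Goodnight, home." and "goodnight home" match the same preset.
        cleaned = phrase.lower().translate(
            str.maketrans("", "", string.punctuation)
        )
        return " ".join(cleaned.split())

    def add_preset(self, phrase, actions):
        """Add a new preset, or replace the actions of an existing one."""
        self._presets[self._normalize(phrase)] = list(actions)

    def handle(self, transcript):
        """Return the actions for a transcript, or [] if off or unknown."""
        if not self.enabled:
            return []
        return list(self._presets.get(self._normalize(transcript), []))


commands = VoiceCommands()
commands.add_preset("Hall lights on", ["hall_lights:on"])
commands.add_preset("Goodnight, home", ["all_lights:off", "doors:lock"])
print(commands.handle("goodnight home"))
```

Anything the table does not recognize simply returns an empty list, which leaves those functions to the buttons.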
8. “I can see you’re really upset about this, Dave.”
In the future, perhaps the very near future, we’ll also see NUIs, or Natural User Interfaces, such as facial recognition that can identify not only the user but also that person’s mood, then combine it with the time of day and other factors to select just the right music, for example. Toshiba and Samsung recently showed facial recognition applications at CES, but it’s still early in the game, even for speech recognition. “To get that 2001: A Space Odyssey HAL experience, that’s still very difficult to do,” Stene says.
Okay, okay, here’s what Crestron is also doing with gesture control via Microsoft Kinect: