Blog / December 12, 2016
Always-on, always-listening solutions expand their user base into hearables and headsets this year with Amazon Echo Buds and Apple AirPods Pro, shaping the market for rapid growth.
Battery life is a key concern for these tiny battery-operated designs, which constantly listen for wake words such as ‘Alexa’ or ‘Hey Siri.’ Consumers want longer battery life without compromised performance, but there is always a trade-off between battery life and wake word detection performance when aiming for a user experience with minimal wake word misses and false accepts. Traditional always-listening systems continuously monitor for speech activity to achieve good wake word detection, but they consume significant power running Voice Activity Detection (VAD) algorithms. This case study presents empirical evidence that Vesper microphones deliver on both metrics in always-listening headset designs, paving a new path for this technology.
Unlike conventional always-listening systems, in which capacitive microphones continuously feed Voice Activity Detection (VAD) algorithms that listen for the wake word, Vesper’s ZeroPower Listening™ (ZPL) solution wakes the rest of the system to perform VAD and wake word detection only when there is sound activity in the environment. A single Vesper VM1010 microphone with ZPL can trigger the voice system on a chip (SoC) or microcontroller (MCU), which then wakes the other digital microphones in the array to capture the wake word. Because the microphone toggles between states based on the sound level, the rest of the system can stay in deep sleep while the ZPL microphone draws a mere 10 µA. The one caveat is that the other microphones in the array must wake from sleep fast enough to capture the wake word from its first syllable.
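The sound-triggered wakeup described above can be pictured as a two-state machine. The sketch below is a toy model, not Vesper firmware: the class and method names are hypothetical, the 65 dB default threshold is the value used later in this article, and the hold time before returning to sleep is an assumed parameter.

```python
from enum import Enum


class Mode(Enum):
    WAKE_ON_SOUND = "wake_on_sound"  # ZPL mic listening, rest of system asleep
    FULL_POWER = "full_power"        # all mics streaming, VAD and wake word running


class ZplSystem:
    """Toy model of a ZPL-gated voice front end (all names are hypothetical)."""

    def __init__(self, threshold_db=65.0, hold_time_s=5.0):
        self.threshold_db = threshold_db  # wake threshold (65 dB in the article)
        self.hold_time_s = hold_time_s    # assumed quiet time before sleeping again
        self.mode = Mode.WAKE_ON_SOUND
        self.quiet_time_s = 0.0

    def step(self, level_db, dt_s):
        """Advance the model by dt_s seconds at the given sound level."""
        if level_db > self.threshold_db:
            # Sound exceeds the threshold: wake (or stay awake), reset the timer.
            self.mode = Mode.FULL_POWER
            self.quiet_time_s = 0.0
        elif self.mode is Mode.FULL_POWER:
            # Quiet while awake: go back to sleep once the hold time elapses.
            self.quiet_time_s += dt_s
            if self.quiet_time_s >= self.hold_time_s:
                self.mode = Mode.WAKE_ON_SOUND
        return self.mode


system = ZplSystem(threshold_db=65.0, hold_time_s=2.0)
for level in (40.0, 70.0, 40.0, 40.0):
    print(level, system.step(level, dt_s=1.0).value)
```

In a real design the transition to full power is an interrupt from the microphone to the processor rather than a polled loop; the model only captures the ordering of events.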
The remainder of this article explains how Vesper’s implementation improves wake word detection performance while also extending the battery life of the device. The study uses data measured on the Shenzhen Horn LuduanV4 headset with the “Alexa” wake word. The LuduanV4 is the first Alexa Mobile Accessory (AMA) certified “Alexa” always-on voice-operated headset with a Vesper ZPL microphone. It uses an Ambiq Apollo3 MCU with Bluetooth Low Energy, combined with Alexa wake word detection from Sensory and voice user interface (UI) algorithms from DSP Concepts. The LuduanV4 headset uses a neckband design with the microphones positioned on the left side.
The original headset design uses one ZPL microphone combined with two digital capacitive MEMS microphones for beamforming and noise suppression. Results from this original system are compared with those from a Vesper solution in which the capacitive microphones are replaced by two VM3000 digital microphones. Measurements are made in an office-type room in a quiet environment with an ambient noise floor of 30 dBA. A Head and Torso Simulator (HATS) with an artificial mouth plays back Alexa voice utterances at speech levels ranging from 75 to 99 dB SPL. Figure 1 below shows the device mounted on the HATS.
A 30-second interval between utterances ensures that the complete system, including the ZPL microphone, returns to sleep mode after each wakeup. A False Reject Rate (FRR) metric measures the number of missed wake words as a fraction of the total wake words spoken. The lower the FRR, the more accurate the wake word detection. The VM1010 can be configured, using an external resistor, to wake up on any sound level that exceeds a predefined value between 65 and 89 dB. The Luduan headset uses a 65 dB threshold.
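The FRR metric defined above is a simple ratio. The sketch below shows the arithmetic; the function name and the example counts are hypothetical and are not measurements from this study.

```python
def frr(detected, total_spoken):
    """Return missed wake words as a fraction of total wake words spoken."""
    if total_spoken <= 0:
        raise ValueError("total_spoken must be positive")
    if not 0 <= detected <= total_spoken:
        raise ValueError("detected must be between 0 and total_spoken")
    return (total_spoken - detected) / total_spoken


# Hypothetical example: 23 of 25 utterances wake the device.
rate = frr(detected=23, total_spoken=25)
print(f"FRR = {rate:.0%}")
```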
Impact of ZPL on wake word detection performance
A device integrated with a ZPL microphone stays in sleep mode while the peak background noise level is below 65 dB. When the trigger word is spoken in a quiet environment, the ZPL microphone must switch from wake-on-sound mode to full power mode, capture the trigger word, and send an interrupt to the processor. Quiet conditions are therefore the worst case for ZPL. In a noisy environment where the sound level is above the set threshold, the microphone is already in full power mode and streaming audio to the system just like any other microphone in the array, so ZPL has no impact on wake word detection performance in that scenario. The study therefore focused on wake word detection in quiet conditions. Figure 2 shows the FRR performance of a device with and without ZPL in the quiet environment.
Thanks to the ultra-fast startup time of piezo microphones, the VM1010 can reach normal mode sensitivity within 200 µs, enabling fast wakeup without missing a significant portion of the first syllable of the Alexa wake word. ZPL has a minor impact on wake word detection at all sound pressure levels. At normal talker levels around 89 dB SPL, both systems perform well, with only slight degradation in the ZPL version. Below 80 dB SPL, where the speech level resembles that of a soft talker, FRR worsens for both systems. At all sound levels, the small difference seen with ZPL is equivalent to a shift of 1 to 2 dB in speech level.
Above all, this small performance impact is outweighed by the benefits of battery life savings that a ZPL microphone enables for an always-listening system, as described in the section below.
ZPL with Vesper vs. capacitive microphones
Next, the FRR measurements are repeated using two digital piezo microphones instead of the capacitive microphones. In both cases, ZPL is enabled for voice wakeup. Figure 3 compares the FRR performance of the Vesper and capacitive digital microphones with ZPL enabled.
Vesper microphones provide better wake word detection performance at all sound pressure levels. With the Vesper system, FRR meets the Alexa Voice Service (AVS) certification requirement of 10% at the 89 dB SPL normal talker level.
Capacitive microphones, on the other hand, fail to meet the requirement when used in combination with ZPL technology. It’s worth noting that the Vesper combination wakes up the system accurately more than twice as often as the capacitive microphones. Because all tests were done in a controlled environment and under the same system conditions, the results indicate that the fast startup of the Vesper microphone solution is a significant factor in achieving better wake word detection performance. The 20 ms wakeup delay of a capacitive microphone, compared with the 200 µs startup time of Vesper microphones, degrades the wake word detection performance of the overall system.
In a Vesper system, the VM1010 wakes up on an acoustic event and sends an interrupt to the processor. The processor then enables the clock signal to wake up the rest of the system, including the digital microphones in the design. In a system like the Luduan headset, where the Ambiq MCU also has a fast startup time, the slow startup of the capacitive microphones forces the processor to wait for valid microphone data before it can process the stream. Once the clock is supplied to the capacitive microphone, the absence of valid PDM data from the microphone results in a long pop noise, increasing the probability of missing the wake word. With the VM3000, the microphone data becomes valid within 200 µs, making the audio stream available for immediate processing. The Vesper system therefore improves wake word detection performance.
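To put the two startup times quoted above in perspective, the short sketch below converts them into the number of audio samples that pass before the microphone output is valid. The 16 kHz sample rate is an assumption (a common rate for speech front ends, not a figure from this study), and the function name is hypothetical.

```python
VESPER_STARTUP_S = 200e-6      # startup time quoted in the article
CAPACITIVE_STARTUP_S = 20e-3   # wakeup delay quoted in the article
SAMPLE_RATE_HZ = 16_000        # assumption: typical speech sample rate


def samples_before_valid(startup_s, sample_rate_hz=SAMPLE_RATE_HZ):
    """Return how many audio samples elapse before microphone data is valid."""
    return startup_s * sample_rate_hz


for name, startup_s in [("piezo", VESPER_STARTUP_S),
                        ("capacitive", CAPACITIVE_STARTUP_S)]:
    lost = samples_before_valid(startup_s)
    print(f"{name}: {startup_s * 1e3:.1f} ms startup, "
          f"about {lost:.0f} samples not yet valid")
```

Whatever sample rate is chosen, the ratio between the two cases is fixed by the startup times: the capacitive microphone delays valid audio 100 times longer.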
Battery life improvement with ZPL
The power consumption of an always-listening system is determined largely by the number of microphones in the system and the power the processor consumes for VAD, wake word detection and audio streaming over a Bluetooth connection. A ZPL microphone consumes only 18 µW in listening mode, which is one-fifth of the standby power of capacitive microphones. The power savings obtained from ZPL are directly proportional to the amount of time the device spends in wake-on-sound (WoS), or sleep, mode.
The smaller the battery capacity of a device, the greater the burden an always-listening system places on it. For example, an always-listening implementation that typically consumes 5 mW with a two-microphone configuration is a far bigger bottleneck for a system with a 2 mW average power budget than for one with a 10 mW budget. ZPL provides a significant power advantage in the former case, because the device can be kept in sleep mode for longer, reducing power consumption.
In a hearable device where the battery size is usually limited by the small form factor design, ZPL therefore adds significant savings in power consumption. Our datalogging studies have shown that a hearable device will be in WoS mode for at least 60% of the time in a home/office environment where the background is typically quiet. The processor can also toggle the mode pin on the microphone to limit the amount of time a device stays in full power mode. The shorter this hold time, the better the battery life.
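The relationship between time spent in WoS mode and power savings can be sketched as a time-weighted average. The model below is a simplification under stated assumptions, not the measurement method used in this study: it takes the 18 µW listening figure and the 5 mW two-microphone example from this article, and it assumes that everything other than the ZPL microphone draws no power while asleep. The function name is hypothetical.

```python
def average_listening_power_mw(wos_fraction, wos_power_mw=0.018,
                               active_power_mw=5.0):
    """Time-weighted average power of the always-listening path, in mW.

    wos_fraction: share of time spent in wake-on-sound mode (0.0 to 1.0).
    wos_power_mw: ZPL microphone power while listening (18 µW in the article).
    active_power_mw: power while fully awake (the 5 mW two-microphone example).
    Assumption: the rest of the system draws nothing while in WoS mode.
    """
    if not 0.0 <= wos_fraction <= 1.0:
        raise ValueError("wos_fraction must be between 0 and 1")
    return wos_fraction * wos_power_mw + (1.0 - wos_fraction) * active_power_mw


for fraction in (0.0, 0.6, 0.9):
    power = average_listening_power_mw(fraction)
    print(f"WoS {fraction:.0%} of the time: {power:.3f} mW average")
```

Under these assumptions, 60% of the time in WoS mode brings the average from 5 mW down to roughly 2 mW. Real savings will differ, since processor sleep current, Bluetooth activity and the hold time all contribute.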
To complete the study on the Horn Luduan headset, the power consumption of the device is measured using a current-sensing circuit on two different firmware versions, one with and one without ZPL. Two Vesper digital microphones are used in both versions. In the standby scenario, the device sits idle on the HATS, so the ZPL microphone is in WoS mode most of the time. In the active scenario, 25 utterances are played back on the HATS at a speech level of 90 dB(A) at the center of the HATS. The utterances are played back for a duration of 1 hour with 20-second silences between utterances. The table below shows the battery life of the total solution with and without ZPL in the standby and active use cases. The system with ZPL provides a 5x improvement in battery life in standby mode and a 2x improvement in typical operating mode compared to a non-ZPL system.
For a hearable or headset device that is heavily constrained in battery size, ZPL provides a significant improvement in operating life without compromising wake word detection performance. Note that battery life is proportional to the battery capacity of the device. In this case, the Horn headset has a total battery capacity of 1258 mWh, whereas in a typical True Wireless Stereo (TWS) design such as AirPods, with a battery capacity of around 93 mWh, the actual battery life will be proportionally lower. The relative ZPL savings would remain the same for an otherwise identical system, although the savings multiplier may vary depending on the choice of voice processor, connectivity protocol and playback system used in the device.
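The scaling argument above follows from battery life being capacity divided by average power. The sketch below illustrates it with the two battery capacities quoted in this article. The two average power values are hypothetical placeholders chosen only to produce a 2x multiplier; they are not measurements from this study, and the function names are likewise hypothetical.

```python
def battery_life_hours(capacity_mwh, avg_power_mw):
    """Return battery life in hours for a given capacity and average power."""
    return capacity_mwh / avg_power_mw


def zpl_multiplier(capacity_mwh, power_without_mw, power_with_mw):
    """Return the battery life ratio (with ZPL / without ZPL) for one battery."""
    return (battery_life_hours(capacity_mwh, power_with_mw)
            / battery_life_hours(capacity_mwh, power_without_mw))


HEADSET_MWH = 1258.0       # Horn headset capacity quoted in the article
TWS_MWH = 93.0             # approximate TWS capacity quoted in the article
POWER_WITHOUT_ZPL = 10.0   # hypothetical average power in mW
POWER_WITH_ZPL = 5.0       # hypothetical average power in mW

for name, capacity in [("headset", HEADSET_MWH), ("TWS earbud", TWS_MWH)]:
    hours = battery_life_hours(capacity, POWER_WITH_ZPL)
    ratio = zpl_multiplier(capacity, POWER_WITHOUT_ZPL, POWER_WITH_ZPL)
    print(f"{name}: {hours:.1f} h with ZPL, {ratio:.1f}x the non-ZPL life")
```

The absolute hours scale with capacity, while the multiplier depends only on the two average power figures, which is why it changes with the processor, connectivity and playback choices but not with battery size.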
The market for TWS earbuds is growing rapidly, and voice is the most natural and seamless interface for activating these devices on the go. The trade-off between battery life and voice activation performance compromises the user experience and presents a major challenge to mass-market adoption of voice activation in headsets. Vesper’s ZPL solution extends battery life in standby and typical use without compromising wake word detection performance, providing a frictionless customer experience.