Keyword spotting (KWS) plays a crucial role in human–machine interactions involving smart devices. In recent years, temporal convolutional networks (TCNs) have performed outstandingly at lower computational complexity than classical convolutional neural network (CNN) methods. However, it remains challenging to balance a small model footprint against high accuracy when deploying a KWS system at the edge. In this article, we propose a small-footprint model based on a modified temporal efficient neural network (TENet) and a simplified mel-frequency cepstrum coefficient (MFCC) algorithm. With batch-norm folding and int8 quantization of the network, our model achieves an accuracy of 95.36% on the Google Speech Command Dataset (GSCD) with only 18 K parameters and 461 K multiplications.
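To make the two compression steps named above more concrete, here is a minimal NumPy sketch of batch-norm folding followed by int8 quantization. It is an illustration under stated assumptions, not the authors' implementation: the function names (`fold_batch_norm`, `quantize_int8`, `dequantize`) are hypothetical, and symmetric per-tensor quantization is assumed because the text does not say which int8 scheme is used.

```python
import numpy as np


def fold_batch_norm(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold batch-norm parameters into the preceding layer (hypothetical helper).

    w has shape (out_channels, ...); b, gamma, beta, mean and var have
    shape (out_channels,). Returns weights and bias such that the folded
    layer alone reproduces layer + batch norm at inference time.
    """
    scale = gamma / np.sqrt(var + eps)
    # Broadcast the per-channel scale over the remaining weight axes.
    w_folded = w * scale.reshape(-1, *([1] * (w.ndim - 1)))
    b_folded = (b - mean) * scale + beta
    return w_folded, b_folded


def quantize_int8(x):
    """Symmetric per-tensor int8 quantization (an assumed scheme).

    Returns the int8 tensor and the float scale needed to dequantize it.
    """
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q, scale):
    """Map int8 values back to approximate float values."""
    return q.astype(np.float32) * scale
```

Folding removes the batch-norm layer entirely at inference time, so no multiplications are spent on it, and quantizing the folded weights stores each one in a single byte. With this rounding scheme, the error per weight is at most half the scale.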