EUROSPEECH '97

We describe a subvector clustering technique to reduce the memory size and computational cost of continuous density hidden Markov models (CHMMs). Acoustic models in modern large-vocabulary, continuous speech recognition systems are typically CHMMs. Systems with 100,000 Gaussian distributions of 40–60 dimensions are common, needing several tens of MB of memory. Computing HMM state likelihoods is several tens of times slower than real time. We show that by clustering and quantizing the Gaussian distributions a few dimensions at a time, both computation and memory costs can be reduced several fold without significant loss of recognition accuracy. On the 1994 Wall Street Journal 20K test set, this technique reduced the acoustic model size by a factor of 9–10, and HMM state output likelihood computation time by a factor of 4–5.
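The idea of quantizing Gaussians a few dimensions at a time can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes diagonal-covariance Gaussians, uses plain Euclidean k-means over the concatenated (mean, variance) sub-parameters as a stand-in for whatever clustering criterion the authors used, and all function names (`kmeans`, `quantize`, `score_all`) are hypothetical. Each Gaussian is stored as one codeword index per subvector, and for each input frame the per-codeword log-likelihoods are computed once and then summed by table lookup.

```python
import math
import random


def nearest(p, centroids):
    """Index of the centroid closest to p (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))


def kmeans(points, k, iters=10):
    """Plain k-means (an assumption, not the paper's criterion)."""
    centroids = [list(p) for p in points[:k]]  # deterministic init: first k points
    for _ in range(iters):
        assign = [nearest(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, [nearest(p, centroids) for p in points]


def quantize(means, variances, sub_dim, k):
    """Cluster the Gaussians separately within each block of sub_dim dimensions."""
    dim = len(means[0])
    codebooks, indices = [], []
    for start in range(0, dim, sub_dim):
        end = min(start + sub_dim, dim)
        # One point per Gaussian: its sub-mean followed by its sub-variance.
        pts = [m[start:end] + v[start:end] for m, v in zip(means, variances)]
        cb, idx = kmeans(pts, k)
        codebooks.append((start, end, cb))
        indices.append(idx)  # one codeword index per Gaussian for this block
    return codebooks, indices


def sub_loglik(x, mean, var):
    """Log-likelihood of x under a diagonal Gaussian (over the given dimensions)."""
    return sum(-0.5 * (math.log(2 * math.pi * s) + (xi - mu) ** 2 / s)
               for xi, mu, s in zip(x, mean, var))


def score_all(x, codebooks, indices, n_gauss):
    """Score every Gaussian: k evaluations per block, then lookups and sums."""
    tables = []
    for start, end, cb in codebooks:
        w = end - start
        tables.append([sub_loglik(x[start:end], c[:w], c[w:]) for c in cb])
    return [sum(tables[b][indices[b][g]] for b in range(len(tables)))
            for g in range(n_gauss)]


if __name__ == "__main__":
    random.seed(1)
    n, dim = 200, 8  # toy sizes, far smaller than a real acoustic model
    means = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
    variances = [[random.uniform(0.5, 2.0) for _ in range(dim)] for _ in range(n)]
    x = [random.gauss(0, 1) for _ in range(dim)]
    codebooks, indices = quantize(means, variances, sub_dim=2, k=16)
    approx = score_all(x, codebooks, indices, n)
    exact = [sub_loglik(x, m, v) for m, v in zip(means, variances)]
    print("max abs error:", max(abs(a - e) for a, e in zip(approx, exact)))
```

In this sketch the memory saving comes from storing one small index per Gaussian per subvector in place of the full floating-point parameters, and the speed saving from evaluating only `k` codewords per subvector per frame instead of every Gaussian. The sizes in the demo are toy values chosen for illustration only.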
Bibliographic reference. Ravishankar, Mosur / Bisiani, R. / Thayer, E. (1997): "Subvector clustering to improve memory and speed performance of acoustic likelihood computation", In EUROSPEECH-1997, 151-154.