Compressed computations using wavelets for hidden Markov models with continuous observations

L. Bello, J. Wiedenhöft and A. Schliep

PLOS ONE 2023, 18(6): e0286074.

Compression as an accelerant of computation is increasingly recognized as an important component in engineering fast real-world machine learning methods for big data; cf. its impact on genome-scale approximate string matching. Previous work showed that compression can accelerate algorithms for Hidden Markov Models (HMM) with discrete observations, both for the classical frequentist HMM algorithms—Forward Filtering, Backward Smoothing and Viterbi—and Gibbs sampling for Bayesian HMM. For Bayesian HMM with continuous-valued observations, compression was shown to greatly accelerate computations for specific types of data. For instance, data from large-scale experiments interrogating structural genetic variation can be assumed to be piece-wise constant with noise, or, equivalently, data generated by HMM with dominant self-transition probabilities. Here we extend the compressive computation approach to the classical frequentist HMM algorithms on continuous-valued observations, providing the first compressive approach for this problem. In a large-scale simulation study, we demonstrate empirically that in many settings compressed HMM algorithms very clearly outperform the classical algorithms with no, or only an insignificant, effect on the computed probabilities and inferred state paths of maximal likelihood. This provides an efficient approach to big data computations with HMM. An open-source implementation of the method is available from https://github.com/lucabello/wavelet-hmms.
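To illustrate the block-wise idea, here is a minimal sketch, not the paper's implementation: assuming the signal has already been compressed into constant-valued blocks (e.g., by a wavelet-based segmentation, as the paper proposes), the forward recursion over a block of w identical observations collapses into a single matrix power, computable in O(log w) matrix products by repeated squaring instead of w element-wise steps. The function name `forward_compressed` and the Gaussian-emission parameters `means`, `variances`, `pi` are hypothetical names chosen for this sketch.

```python
import numpy as np

def forward_compressed(blocks, A, means, variances, pi):
    """Forward algorithm over (value, length) blocks of a compressed signal.

    Exact when every observation in a block equals the block value;
    an approximation when blocks are wavelet-style block averages.
    blocks : list of (y, w) pairs -- block value and block length
    A      : (n, n) transition matrix, A[i, j] = P(state j | state i)
    means, variances : per-state Gaussian emission parameters, shape (n,)
    pi     : (n,) initial state distribution
    Returns the log-likelihood of the compressed sequence.
    """
    log_lik = 0.0
    alpha = None
    for y, w in blocks:
        # Per-state Gaussian emission density at the block value y.
        b = np.exp(-0.5 * (y - means) ** 2 / variances) \
            / np.sqrt(2 * np.pi * variances)
        # One forward step: alpha <- diag(b) A^T alpha.
        step = A.T * b[:, None]
        if alpha is None:
            # First observation of the sequence uses pi instead of A.
            alpha = b * pi
            w -= 1
        if w > 0:
            # Collapse w identical steps into one matrix power:
            # O(n^3 log w) via repeated squaring vs. O(n^2 w) stepwise.
            # (Very long blocks may still need rescaling inside the power.)
            alpha = np.linalg.matrix_power(step, w) @ alpha
        s = alpha.sum()          # rescale per block to avoid underflow
        log_lik += np.log(s)
        alpha /= s
    return log_lik

# Example: a noisy two-level signal compressed into four blocks.
A = np.array([[0.99, 0.01],
              [0.01, 0.99]])   # dominant self-transitions
blocks = [(0.1, 500), (2.0, 300), (0.05, 700), (1.9, 400)]
ll = forward_compressed(blocks, A,
                        means=np.array([0.0, 2.0]),
                        variances=np.array([0.25, 0.25]),
                        pi=np.array([0.5, 0.5]))
print(ll)
```

With dominant self-transitions and long blocks, almost all work moves from the per-observation loop into a handful of small matrix products per block, which is where the speedups reported in the paper come from; the Viterbi and backward recursions admit analogous block-wise formulations.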

DOI: 10.1371/journal.pone.0286074.

The publication includes results from the following projects or software tools: BayesianHMM.

Further publications by Alexander Schliep, John Wiedenhoeft.