B. Georgi, J. Schultz, and A. Schliep
Knowledge Discovery in Databases: PKDD 2007, Pages 79-90, Springer Berlin / Heidelberg, 2007
Protein families can be divided into subgroups with functional differences. A central question in the study of these families is how to analyze these subgroups and determine which residues convey substrate specificity. We present a clustering procedure based on the context-specific independence mixture framework with a Dirichlet mixture prior. It simultaneously infers subgroups and predicts specificity-determining residues from multiple sequence alignments of protein families. Applied to several well-studied families, the method showed good clustering performance, and the predicted positions had ample biological support. The software we developed for this analysis, PyMix (the Python mixture package), is available from http://www.algorithmics.molgen.mpg.de/pymix.html.