Max Planck Institute for Molecular Genetics

 Department of Computational Molecular Biology


NOTE: We moved in August 2009 to http://bioinformatics.rutgers.edu.


Partially-supervised context-specific independence mixture modeling

B. Georgi and A. Schliep

Workshop on Data Mining in Functional Genomics and Proteomics, ECML 2007, 2007

Partially supervised, or semi-supervised, learning refers to machine learning methods that fall between clustering and classification. In the context of clustering, labels can specify link and do-not-link constraints between data points in different ways and thereby constrain the resulting clustering solutions. This is a very natural framework for many biological applications, as some labels are often available and even very few labels can greatly improve clustering results. Context-specific independence (CSI) models constitute a framework for simultaneous mixture estimation and model structure determination, yielding meaningful models for high-dimensional data with many, possibly uninformative, variables. Here we present the first approach for partially supervised learning of CSI models and demonstrate the effectiveness of modest amounts of labeled data on simulated data and on protein subfamily determination.
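
To illustrate the general idea of letting a few labels constrain a mixture-based clustering, the following is a minimal sketch and not the CSI method of the paper. It assumes a plain one-dimensional Gaussian mixture fitted by EM, and it assumes the simplest form of partial supervision: a labeled point is clamped to its component in the E-step. The function names (`gaussian_pdf`, `partially_supervised_em`), the parameters, and the toy data are all hypothetical choices made for this example.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def partially_supervised_em(xs, labels, k, iters=50):
    """EM for a 1-D Gaussian mixture with k components.

    labels[i] is a component index for a labeled point, or None if unlabeled.
    Labeled points keep a fixed one-hot responsibility (an assumption of this
    sketch); unlabeled points get the usual posterior responsibilities.
    """
    lo, hi = min(xs), max(xs)
    means = []
    for j in range(k):
        # Initialise each mean from its labeled points when there are any,
        # otherwise spread the means evenly over the data range.
        labeled = [x for x, lab in zip(xs, labels) if lab == j]
        if labeled:
            means.append(sum(labeled) / len(labeled))
        else:
            means.append(lo + (j + 0.5) * (hi - lo) / k)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    n = len(xs)
    resp = []

    for _ in range(iters):
        # E-step: clamp labeled points, compute posteriors for the rest.
        resp = []
        for x, lab in zip(xs, labels):
            if lab is not None:
                r = [1.0 if j == lab else 0.0 for j in range(k)]
            else:
                p = [weights[j] * gaussian_pdf(x, means[j], variances[j])
                     for j in range(k)]
                total = sum(p) or 1e-300  # guard against numerical underflow
                r = [pj / total for pj in p]
            resp.append(r)

        # M-step: weighted maximum-likelihood updates per component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, 1e-6)  # floor to avoid a collapsed component

    assignments = [max(range(k), key=lambda j: r[j]) for r in resp]
    return means, variances, weights, assignments


if __name__ == "__main__":
    # Toy data: two well-separated groups, five labeled points in each.
    random.seed(0)
    xs = ([random.gauss(0.0, 1.0) for _ in range(100)]
          + [random.gauss(5.0, 1.0) for _ in range(100)])
    labels = [None] * 200
    for i in range(5):
        labels[i] = 0
        labels[100 + i] = 1
    means, variances, weights, assignments = partially_supervised_em(xs, labels, k=2)
    print("estimated means:", means)
```

In this sketch the labels also fix which component corresponds to which group, so the usual arbitrary ordering of mixture components does not arise. Pairwise link and do-not-link constraints, as mentioned in the abstract, would need a different E-step and are not covered here.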