Whether to cluster at all, which clustering method to use, and how many clusters to choose are pressing questions in bioinformatics. In most cases, users of clustering software make these decisions based on experience, guided by benchmarking or by indicators of solution reliability or model fit. However, because clustering algorithms always produce a solution, inappropriate methods or parameters are often used, yielding invalid results.
In this context, meta-learning approaches have emerged as effective solutions that can automatically predict the performance of algorithms on a given problem. Such approaches could therefore support non-expert users in the algorithm selection task. There are different interpretations of the term meta-learning. In our work, we use meta-learning to mean the automatic process of generating knowledge that relates the performance of machine learning algorithms to the characteristics of the problem (i.e., characteristics of its datasets).
So far, in the literature, meta-learning has been used only for selecting or ranking supervised learning algorithms. That is, up to now, there has been no such approach for clustering algorithms (i.e., unsupervised learning). Motivated by this, we extend the use of meta-learning approaches to clustering algorithms. We develop our case study in the context of clustering algorithms applied to cancer gene expression data generated by microarrays.
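To make the idea concrete, the following is a minimal sketch of a meta-learner that ranks clustering algorithms for a new dataset from the rankings observed on similar datasets. It is not the method used in this project: the nearest-neighbour scheme, the three meta-features, the algorithm names, and all numbers are assumptions chosen purely for illustration.

```python
import math

# Toy meta-dataset: every number below is made up for illustration.
# Each entry pairs hypothetical meta-features of a dataset
# (log10 of sample count, log10 of gene count, fraction of missing values)
# with the rank each clustering algorithm achieved on it (1 = best).
META_EXAMPLES = [
    ((1.6, 3.2, 0.00), {"k-means": 1, "hierarchical": 3, "mixture": 2}),
    ((1.8, 3.0, 0.01), {"k-means": 1, "hierarchical": 3, "mixture": 2}),
    ((2.3, 4.0, 0.10), {"k-means": 3, "hierarchical": 2, "mixture": 1}),
    ((2.4, 4.1, 0.12), {"k-means": 2, "hierarchical": 3, "mixture": 1}),
]


def predict_ranking(meta_features, examples=META_EXAMPLES, k=2):
    """Rank algorithms for a new dataset from its k nearest meta-examples."""
    # Sort known datasets by Euclidean distance in meta-feature space
    # (features are used unscaled here, which is a simplification).
    nearest = sorted(examples, key=lambda ex: math.dist(ex[0], meta_features))[:k]
    algorithms = list(nearest[0][1])
    # Average the ranks observed on the neighbouring datasets.
    mean_rank = {
        a: sum(ranks[a] for _, ranks in nearest) / len(nearest) for a in algorithms
    }
    # Lower mean rank is better; ties are broken by algorithm name.
    return sorted(algorithms, key=lambda a: (mean_rank[a], a))


if __name__ == "__main__":
    # A small dataset with no missing values resembles the first two examples.
    print(predict_ranking((1.7, 3.1, 0.0)))
```

In this sketch the knowledge relating dataset characteristics to algorithm performance is simply the stored table of meta-examples; a learned regression or ranking model could take its place.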
More information at the Project Page. Joint work funded by CAPES (Brazil) and DAAD (Germany) under the Probral program.
de Souto, M. C. P. and Araujo, D. A. S. and Costa, I. G. and Soares, R. G. F. and Ludermir, T. B. and Schliep, A. Comparative Study on Normalization Procedures for Cluster Analysis of Gene Expression Datasets (2008)
de Souto, M. C. P. and Prudencio, R. B. C. and Soares, R. G. F. and Araujo, D. A. S. and Costa, I. G. and Ludermir, T. B. and Schliep, A. Ranking and Selecting Clustering Algorithms Using a Meta-learning Approach (2008)
de Souto, Marcilio and Costa, Ivan G. and de Araujo, Daniel and Ludermir, Teresa and Schliep, Alexander. Clustering cancer gene expression data: a comparative study (2008)
Costa, Ivan G. and de Souto, Marcilio C. P. and Schliep, Alexander. Validating Gene Clusterings by Selecting Informative Gene Ontology Terms with Mutual Information (2007)