Robust inference of time-courses using mixtures of Hidden Markov Models: Supplemental Material

Alexander Schliep, Christine Steinhoff, Alexander Schönhuth


GQL

The GQL software, licensed under the GPL, is available at http://ghmm.org/gql. It depends on several other packages, most importantly Python and the GHMM library.


Simulated Data

The simulated data set, simulated.txt.gz, is a tab-separated ASCII file.

If a header line of the form "Gene name Accession t1 t2 ..." is prepended, the file can be read by Caged.
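Prepending the header can be scripted. The following is a minimal sketch, not part of GQL: it assumes that the first two columns of the file are the gene name and the accession, that all remaining columns are time points, that the header fields are tab-separated like the data, and that Caged expects an uncompressed text file. The function names and the output file name are hypothetical.

```python
import gzip


def build_header(n_columns, n_id_columns=2):
    """Build 'Gene name<TAB>Accession<TAB>t1<TAB>t2 ...' for n_columns columns.

    Assumption: the first n_id_columns columns are identifiers and every
    remaining column is one time point.
    """
    n_timepoints = n_columns - n_id_columns
    if n_timepoints < 1:
        raise ValueError("need at least one time-point column")
    names = ["Gene name", "Accession"]
    names += ["t%d" % i for i in range(1, n_timepoints + 1)]
    return "\t".join(names)


def prepend_header(src_path, dst_path):
    """Copy a gzipped tab-separated file to plain text with a header on top."""
    with gzip.open(src_path, "rt") as src:
        rows = src.read().splitlines()
    if not rows:
        raise ValueError("input file is empty")
    # The number of columns is taken from the first data row.
    header = build_header(len(rows[0].split("\t")))
    with open(dst_path, "w") as dst:
        dst.write("\n".join([header] + rows) + "\n")
    return header
```

A call such as prepend_header("simulated.txt.gz", "simulated_caged.txt") would then write the file with its header line.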


Results: Whitfield Data

The table below gives the detailed annotation for the clusters shown in Fig. 2 of the paper. The clone IDs of time-courses used as labeled data in the partially supervised setting are displayed in bold face (all browsers) and with a light-grey background (Internet Explorer only).

CLONEID        ACC       LLID   Phase    Our ID  Cluster
IMAGE:898286   AA598974  -      G2       5       (a) phase 1
IMAGE:66406    T66936    55355  G2       12      (a) phase 1
IMAGE:712505   AA278152  -      G2       25      (a) phase 1
IMAGE:1540236  AA936181  55355  G2       56      (a) phase 1
IMAGE:1536451  AA919126  -      G2       83      (a) phase 1
IMAGE:460438   AA677552  55247  G2       108     (a) phase 1
IMAGE:769921   AA430504  11065  G2       3       (a) phase 1
IMAGE:825470   AA504348  7153   G2       6       (a) phase 1
IMAGE:366971   AA026682  7153   G2       8       (a) phase 1
IMAGE:292936   N63744    55143  G2       10      (a) phase 1
IMAGE:455128   AA676797  899    G2       11      (a) phase 1
IMAGE:301388   N79504    -      G2       16      (a) phase 1
IMAGE:146882   R80990    11065  G2       22      (a) phase 1
IMAGE:131316   R22949    -      G2       27      (a) phase 1
IMAGE:246808   N53214    55655  G2       28      (a) phase 1
IMAGE:200402   R96998    81610  G2       38      (a) phase 1
IMAGE:281898   N53308    1163   G2       59      (a) phase 1
IMAGE:1035796  AA628867  -      G2       63      (a) phase 1
12192535       AA719022  84914  G2       70      (a) phase 1
IMAGE:703633   AA278629  1163   G2       79      (a) phase 1
IMAGE:824913   AA489023  -      G2       80      (a) phase 1
IMAGE:1694526  AI124082  55771  G2       90      (a) phase 1
IMAGE:129961   R19267    3833   G2       97      (a) phase 1
IMAGE:430973   AA678348  -      G2       99      (a) phase 1
IMAGE:1456207  AI791356  51343  G2       101     (a) phase 1
IMAGE:30170    R42530    836    G2       105     (a) phase 1
IMAGE:2062329  AI337292  7272   G2/M     62      (a) phase 1

IMAGE:882510   AA676460  3838   G2       9       (a) phase 2
IMAGE:788256   AA454098  9493   G2       23      (a) phase 2
IMAGE:951241   AA620485  51203  G2       34      (a) phase 2
IMAGE:825606   AA504719  3832   G2       36      (a) phase 2
IMAGE:810209   AA464521  89987  G2       40      (a) phase 2
IMAGE:824962   AA489087  3838   G2       43      (a) phase 2
IMAGE:71902    T52152    26586  G2       57      (a) phase 2
IMAGE:461933   AA779949  51203  G2       71      (a) phase 2
IMAGE:42831    R60197    -      G2       81      (a) phase 2
IMAGE:1486028  AA912032  -      G2       94      (a) phase 2
IMAGE:950690   AA608568  890    G2       2276    (a) phase 2
IMAGE:725454   AA292964  1164   G2/M     13      (a) phase 2
IMAGE:129865   R19158    -      G2/M     1       (a) phase 2
IMAGE:727526   AA411850  1062   G2/M     7       (a) phase 2
IMAGE:209066   H63492    8465   G2/M     20      (a) phase 2
IMAGE:435076   AA701455  1063   G2/M     24      (a) phase 2
IMAGE:115443   T87442    51530  G2/M     29      (a) phase 2
IMAGE:194656   R84407    -      G2/M     33      (a) phase 2
IMAGE:2017415  AI369629  1058   G2/M     37      (a) phase 2
IMAGE:795936   AA460927  7247   G2/M     42      (a) phase 2
IMAGE:705064   AA279990  10460  G2/M     48      (a) phase 2
IMAGE:243135   H95819    -      G2/M     67      (a) phase 2
IMAGE:743810   AA634371  83461  G2/M     74      (a) phase 2
IMAGE:431242   AA682533  -      G2/M     86      (a) phase 2
IMAGE:814995   AA465090  -      G2/M     87      (a) phase 2
IMAGE:590253   AA147792  55632  G2/M     91      (a) phase 2
IMAGE:2327739  AI693023  51203  G2/M     100     (a) phase 2
IMAGE:825228   AA504389  26586  G2/M     104     (a) phase 2
IMAGE:264502   N20305    -      G2/M     121     (a) phase 2
IMAGE:234045   H66982    55632  G2/M     165     (a) phase 2

IMAGE:213824   H72444    -      G1/S     35      (a) phase 3
IMAGE:744047   AA629262  5347   G2/M     2       (a) phase 3
IMAGE:590774   AA158169  5603   G2/M     4       (a) phase 3
IMAGE:232837   H73968    22974  G2/M     14      (a) phase 3
IMAGE:781047   AA446462  699    G2/M     15      (a) phase 3
IMAGE:359119   AA010065  1164   G2/M     17      (a) phase 3
IMAGE:610362   AA171715  -      G2/M     26      (a) phase 3
IMAGE:759873   AA423944  10234  G2/M     30      (a) phase 3
IMAGE:898062   AA598776  991    G2/M     31      (a) phase 3
IMAGE:645565   AA204830  -      G2/M     45      (a) phase 3
IMAGE:2019372  AI369284  51512  G2/M     46      (a) phase 3
IMAGE:1540227  AA936183  22974  G2/M     47      (a) phase 3
IMAGE:842968   AA488324  701    G2/M     50      (a) phase 3
IMAGE:128711   R12261    54443  G2/M     52      (a) phase 3
IMAGE:853066   AA668256  9918   G2/M     54      (a) phase 3
IMAGE:429323   AA007395  127    G2/M     72      (a) phase 3
IMAGE:435334   AA699928  -      G2/M     85      (a) phase 3
IMAGE:48398    H14392    994    G2/M     89      (a) phase 3
IMAGE:50787    H16833    6715   G2/M     96      (a) phase 3
IMAGE:2308994  AI654707  22974  G2/M     127     (a) phase 3
IMAGE:50615    H17513    3305   G2/M     140     (a) phase 3
IMAGE:51532    H20558    23204  G2/M     18      (a) phase 3
IMAGE:511096   AA088458  -      G2/M     51      (a) phase 3
IMAGE:796694   AA460685  332    G2/M     53      (a) phase 3
IMAGE:415089   W93379    4751   G2/M     60      (a) phase 3
IMAGE:510228   AA053556  4288   G2/M     61      (a) phase 3
IMAGE:511786   AI732412  10595  G2/M     76      (a) phase 3
IMAGE:511967   AI732416  10595  G2/M     92      (a) phase 3
IMAGE:1646048  AI031571  55710  G2/M     110     (a) phase 3
IMAGE:121857   T97349    10615  G2/M     125     (a) phase 3
IMAGE:2307015  AI652290  10615  G2/M     154     (a) phase 3
IMAGE:856289   AA774665  9133   G2/M     2275    (a) phase 3
IMAGE:810600   AA464019  27338  M/G1     21      (a) phase 3
IMAGE:209383   H64096    3312   M/G1     181     (a) phase 3

IMAGE:531402   AA075920  4174   G1/S     78      (b)
IMAGE:68950    T54121    898    G1/S     32      (b)
IMAGE:126650   R06944    51514  G1/S     19      (b)
IMAGE:565734   AA135809  -      G1/S     41      (b)
IMAGE:236142   H61303    -      G1/S     49      (b)
IMAGE:418150   W90164    51514  G1/S     58      (b)
IMAGE:789182   AA450264  5111   G1/S     77      (b)
IMAGE:704410   AA279658  -      G1/S     158     (b)
IMAGE:43229    H13004    5111   G1/S     2273    (b)
IMAGE:204214   H59203    990    G1/S     39      (b)
IMAGE:280375   N47113    29028  G1/S     109     (b)
IMAGE:1475463  AA857804  63967  G1/S     130     (b)
IMAGE:297178   W03979    79733  S phase  69      (b)
IMAGE:1579997  AA934904  79075  S phase  139     (b)

The following figures display the time-courses found. Fig. 1 depicts all time-courses in cluster (a); Figs. 2-4 show the result of the Viterbi decomposition of this cluster into phases 1-3. Fig. 5 shows cluster (b).
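To illustrate the idea behind a Viterbi decomposition, the sketch below computes the most likely state path of each time-course under a small HMM with Gaussian emissions and then groups the time-courses by the time point at which their path first enters a chosen state. It does not use the GHMM or GQL API, and this supplement does not specify the exact grouping criterion used in the paper, so grouping by entry time is an assumption made for illustration only. The two-state model and all parameter values are hypothetical.

```python
import math
from collections import defaultdict


def log_gauss(x, mean, sd):
    """Log density of a univariate normal distribution."""
    return -0.5 * math.log(2.0 * math.pi * sd * sd) - (x - mean) ** 2 / (2.0 * sd * sd)


def safe_log(p):
    """Logarithm that maps probability 0 to -infinity."""
    return math.log(p) if p > 0.0 else float("-inf")


def viterbi(obs, start, trans, means, sds):
    """Most likely state path of obs under a Gaussian-emission HMM (log space)."""
    n = len(means)
    score = [safe_log(start[s]) + log_gauss(obs[0], means[s], sds[s]) for s in range(n)]
    back = []
    for x in obs[1:]:
        prev, score, ptr = score, [], []
        for s in range(n):
            # Best predecessor of state s at this time step.
            best = max(range(n), key=lambda r: prev[r] + safe_log(trans[r][s]))
            ptr.append(best)
            score.append(prev[best] + safe_log(trans[best][s])
                         + log_gauss(x, means[s], sds[s]))
        back.append(ptr)
    # Backtrack from the best final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


def group_by_entry_time(courses, state, start, trans, means, sds):
    """Group time-course names by the first time index spent in `state`.

    Assumption: entry time is used here as the grouping criterion.
    """
    groups = defaultdict(list)
    for name, obs in courses.items():
        path = viterbi(obs, start, trans, means, sds)
        entry = path.index(state) if state in path else None
        groups[entry].append(name)
    return dict(groups)


# Hypothetical left-to-right model: state 0 = "down", state 1 = "up".
start = [1.0, 0.0]
trans = [[0.7, 0.3], [0.0, 1.0]]
means, sds = [-1.0, 1.0], [0.5, 0.5]
courses = {"A": [-1, -1, 1, 1, 1], "B": [-1, 1, 1, 1, 1]}
groups = group_by_entry_time(courses, 1, start, trans, means, sds)
```

In this toy example the two time-courses switch to the "up" state at different time points and therefore fall into different groups, although both belong to the same cluster model.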

Fig. 1 Log-ratio (y-axis) over time (x-axis): cluster (a), all phases
Fig. 2 Log-ratio (y-axis) over time (x-axis): Viterbi decomposition of cluster (a), phase 1
Fig. 3 Log-ratio (y-axis) over time (x-axis): Viterbi decomposition of cluster (a), phase 2
Fig. 4 Log-ratio (y-axis) over time (x-axis): Viterbi decomposition of cluster (a), phase 3
Fig. 5 Log-ratio (y-axis) over time (x-axis): cluster (b)


Partially Supervised Robustness

Fig. 6 Sensitivity and specificity of a partially supervised clustering procedure vs. percentage of labeled data

Fig. 6 shows clustering quality as a function of the proportion of labeled data in the input. Artificial data were generated from 8 multivariate normal distributions with mean vectors chosen uniformly at random and identical covariance matrices proportional to the identity. A model-based clustering algorithm using multivariate Gaussians with prescribed covariance matrices was modified for partially supervised learning. We ran 10 repetitions for each of a series of increasing amounts of labeled data, with the labels covering at most half of the clusters. Specificity and sensitivity are used as measures of clustering quality.
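The setup described above can be sketched in a few lines. The code below is an illustration under stated assumptions, not the implementation used for Fig. 6: it uses a soft EM update in which labeled points are pinned to their component, a fixed covariance of var times the identity, and pair-counting definitions of sensitivity, TP/(TP+FN), and specificity, TN/(TN+FP). The definitions used in the paper may differ. Two clusters instead of 8 are used to keep the example short, and all names and parameter values are hypothetical.

```python
import math
import random


def e_step(points, labels, means, weights, var):
    """Posterior component probabilities; labeled points are pinned to their label."""
    k = len(means)
    resp = []
    for x, lab in zip(points, labels):
        if lab is not None:
            resp.append([1.0 if j == lab else 0.0 for j in range(k)])
            continue
        logs = [math.log(w) - sum((a - b) ** 2 for a, b in zip(x, m)) / (2.0 * var)
                for m, w in zip(means, weights)]
        top = max(logs)
        exps = [math.exp(v - top) for v in logs]
        total = sum(exps)
        resp.append([e / total for e in exps])
    return resp


def partially_supervised_em(points, labels, init_means, var=1.0, n_iter=30):
    """EM for a Gaussian mixture with prescribed covariance var * identity."""
    k, dim, n = len(init_means), len(points[0]), len(points)
    means = [list(m) for m in init_means]
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        resp = e_step(points, labels, means, weights, var)
        for j in range(k):
            mass = sum(r[j] for r in resp)
            if mass > 0.0:
                means[j] = [sum(r[j] * x[d] for r, x in zip(resp, points)) / mass
                            for d in range(dim)]
            weights[j] = max(mass / n, 1e-12)  # keep log(weight) finite
    resp = e_step(points, labels, means, weights, var)
    assign = [max(range(k), key=r.__getitem__) for r in resp]
    return means, assign


def pair_sensitivity_specificity(truth, assign):
    """Pair-counting sensitivity and specificity (assumed definitions)."""
    tp = fn = fp = tn = 0
    for i in range(len(truth)):
        for j in range(i + 1, len(truth)):
            same_class = truth[i] == truth[j]
            same_cluster = assign[i] == assign[j]
            if same_class and same_cluster:
                tp += 1
            elif same_class:
                fn += 1
            elif same_cluster:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Toy data: two well-separated spherical Gaussians in the plane.
rng = random.Random(0)
truth = [0] * 30 + [1] * 30
centers = [(0.0, 0.0), (10.0, 10.0)]
points = [[rng.gauss(centers[c][0], 1.0), rng.gauss(centers[c][1], 1.0)] for c in truth]
# Labels cover only one of the two clusters (every fifth point of class 0).
labels = [0 if (c == 0 and i % 5 == 0) else None for i, c in enumerate(truth)]
means, assign = partially_supervised_em(points, labels, [[2.0, 2.0], [8.0, 8.0]])
sens, spec = pair_sensitivity_specificity(truth, assign)
```

Pinning the labeled points in the E-step is the only change relative to ordinary EM; the unlabeled points are treated exactly as in the unsupervised case.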


Mixture Robustness

To compare the robustness of model-based clustering using HMMs as cluster models with the robustness of estimating a mixture of HMM components, we performed an experiment with increasing levels of added noise.

The results (see Fig. 7) show a clear advantage for mixture estimation.
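The difference between the two approaches lies in how a time-course contributes to parameter estimation. A minimal sketch, assuming that clustering assigns each time-course entirely to its most probable model while mixture estimation weights it by its posterior probability under each component (the function names and the numbers are hypothetical):

```python
import math


def soft_assignment(log_likelihoods, weights):
    """Posterior probability of each component given one time-course."""
    logs = [math.log(w) + ll for ll, w in zip(log_likelihoods, weights)]
    top = max(logs)
    exps = [math.exp(v - top) for v in logs]
    total = sum(exps)
    return [e / total for e in exps]


def hard_assignment(log_likelihoods, weights):
    """Assign the time-course entirely to its most probable component."""
    post = soft_assignment(log_likelihoods, weights)
    best = max(range(len(post)), key=post.__getitem__)
    return [1.0 if k == best else 0.0 for k in range(len(post))]


# A time-course that two models explain almost equally well.
log_likelihoods = [-10.0, -10.5]
weights = [0.5, 0.5]
soft = soft_assignment(log_likelihoods, weights)
hard = hard_assignment(log_likelihoods, weights)
```

Under hard assignment this ambiguous time-course influences only the first model; under soft assignment it contributes to both, in proportion to its posterior.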

Fig. 7 Robustness of clustering vs. mixtures. The squared estimation error, summed over all states and all models and averaged over the 20 samples (y-axis), is plotted against the level of added noise ( N(0,sigma) , 0.1 < sigma < 1.5 in increments of 0.1). A Wilcoxon test comparing the deviation of the estimated model parameters from their true values showed a significantly lower estimation error for mixture estimation.
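The error measure plotted in Fig. 7 can be written down directly. The sketch below assumes that each model is summarized by one emission mean per state, that the estimated models are already matched to the true models in the same order, and that sigma in N(0,sigma) denotes the standard deviation. None of these details is stated in this supplement, and all function names are hypothetical.

```python
import random


def add_noise(time_course, sigma, rng):
    """Add independent N(0, sigma) noise to every time point (sigma = std. dev.)."""
    return [x + rng.gauss(0.0, sigma) for x in time_course]


def squared_estimation_error(true_models, estimated_models):
    """Squared error summed over all states of all models."""
    return sum((t - e) ** 2
               for true_states, est_states in zip(true_models, estimated_models)
               for t, e in zip(true_states, est_states))


def mean_error(true_models, estimates_per_sample):
    """Average the summed squared error over repeated samples."""
    errors = [squared_estimation_error(true_models, est)
              for est in estimates_per_sample]
    return sum(errors) / len(errors)


# Two models with two states each; the values are emission means.
true_models = [[0.0, 1.0], [2.0, 3.0]]
estimate = [[0.5, 1.0], [2.0, 2.0]]
error = squared_estimation_error(true_models, estimate)
```

For one noise level, mean_error would be applied to the parameter estimates obtained from each noisy sample, once for clustering and once for mixture estimation.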