Warning in as.POSIXlt.POSIXct(Sys.time()): unknown timezone 'zone/tz/2018c.
1.0/zoneinfo/America/Chicago'

Last updated: 2018-06-07

Code version: e30803e


Summary

Motivation: when running CountClut, sample cluster assignment may differ between different random seeds, despite similar cluster characteristics in feature abundancce. This issue is not unique to CountClust. In model-based unsupervised clustering methods, the parameter space is usually complex, and as a result, difficult to find global maximum of the model likelihood.

I run CountClust under 100 different random seeds and then compare Bayes Factors across different random seeds and sample cluster membership. I will use the random seed that generates the largest Bayes Factor in all subsequent CountClust analysis.

This document shows the analysis steps for when K=2. Results show that for when K=2, across 100 random seeds 409 produced the largest BF and is thus chosen as our random seed. This chosen random seed (best seed) is then used in our final results (https://jhsiao999.github.io/nasalmicrobiome/step3_gradesOfMembership.html).

The fitting results for K=2 to 5 under the same 100 random seeds are saved in output/gom-seeds-k-2.rds, output/gom-seeds-k-3.rds, output/gom-seeds-k-4.rds, and output/gom-seeds-k-5.rds.


Package

library(knitr)
library(kableExtra)
library(dplyr)
library(metagenomeSeq)
library(CountClust)
library(parallel)

Loading data

Read in filtered data.

MRobj = readRDS("../data/nasal_filtered_normed_batchcorrected.rds")
MRobj
MRexperiment (storageMode: environment)
assayData: 553 features, 197 samples 
  element names: counts 
protocolData: none
phenoData
  sampleNames: EM0042 EM0047 ... E0168 (197 total)
  varLabels: StudyID age ... infnone (94 total)
  varMetadata: labelDescription
featureData
  featureNames:
    k__Bacteria;p__[Thermi];c__Deinococci;o__Deinococcales;f__Deinococcaceae;g__CM44;s__
    k__Bacteria;p__[Thermi];c__Deinococci;o__Deinococcales;f__Deinococcaceae;g__Deinococcus;s__
    ... s__zeae (553 total)
  fvarLabels: Kingdom Phylum ... Species (7 total)
  fvarMetadata: labelDescription
experimentData: use 'experimentData(object)'
Annotation:  

K=2

100 random seeds

Apply CountClust to fit grades of membership model.

counts <- MRcounts(MRobj,norm=FALSE,log=FALSE)

# get 100 random seeds
source("../code/print-prime.R")
seeds <- prime(543)
res <- vector("list", length(seeds))

for (i in 1:100) {
  set.seed(seeds[i])
  res[[i]] <- FitGoM(t(counts), K=c(2), tol=1e-6)[[1]]
}

saveRDS(res, file = "../data/res.rds")
res <- readRDS(file="../data/res.rds")

Bayes factor differs between the seeds.

summary(do.call(c,lapply(res, "[[", 4)))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
1085545 1379648 1483781 1489853 1601809 1738210 

Choose the seed with maximum BF: 67

which.max(lapply(res, "[[", 4))
[1] 19
saveRDS(res[[19]], file = "../output/gom-seeds.Rmd/res_best_seed_67.rds")

\(~\)

Results using the seed with the highest BF.

res_best <- readRDS(file = "../output/gom-seeds.Rmd/res_best_seed_67.rds")
cluster <- apply(res_best$omega, 1, which.max)

table(cluster)
cluster
  1   2 
167  30 
kable(res_best$omega)
1 2
EM0042 0.9995808 0.0004192
EM0047 0.9963470 0.0036530
E0133 0.9920976 0.0079024
E0197 0.9996540 0.0003460
EM0046 0.9999604 0.0000396
EM0071 0.6772224 0.3227776
EM0029 0.9999471 0.0000529
E0358 0.9922081 0.0077919
EM0086 0.9575558 0.0424442
E0060 0.9953273 0.0046727
EM0068 0.9908933 0.0091067
E0275 0.2459101 0.7540899
E0584 0.9207612 0.0792388
E0119 0.0273439 0.9726561
E0117 0.0001033 0.9998967
E0126 0.2372200 0.7627800
E0684 0.9999425 0.0000575
E0103 0.9208354 0.0791646
E0327 0.9996965 0.0003035
E0089 0.0085725 0.9914275
E0183 0.7016093 0.2983907
E0207 0.9473176 0.0526824
E0050 0.4789709 0.5210291
EM0012 0.9998702 0.0001298
E0399 0.9349670 0.0650330
E0120 0.3211410 0.6788590
EM0060 0.9999982 0.0000018
E0083 0.9648846 0.0351154
E0042 0.8167775 0.1832225
EM0013 0.9999285 0.0000715
E0132 0.9675559 0.0324441
EM0050 0.9591377 0.0408623
E0076 0.8908938 0.1091062
E0106 0.9592985 0.0407015
E0608 0.9913975 0.0086025
E0122 0.3253818 0.6746182
E0250 0.9884623 0.0115377
EM0067 0.9547523 0.0452477
E0686 0.8836826 0.1163174
E0211 0.9999999 0.0000001
E0384 0.9972912 0.0027088
EM0078 0.9833801 0.0166199
E0077 0.9838634 0.0161366
E0652 0.8033168 0.1966832
E0229 0.9999562 0.0000438
E0125 0.0785997 0.9214003
E0487 0.7541912 0.2458088
EM0065 0.9998911 0.0001089
E0129 0.0233961 0.9766039
E0661 0.9286069 0.0713931
E0040 0.5982370 0.4017630
E0393 0.9926429 0.0073571
E0024 0.6732353 0.3267647
EM0080 0.9988929 0.0011071
EM0085 0.7602501 0.2397499
E0019 0.9234444 0.0765556
E0039 0.8607182 0.1392818
E0593 0.3762219 0.6237781
E0049 0.3585597 0.6414403
E0112 0.9994410 0.0005590
E0299 0.9680101 0.0319899
E0165 0.7602161 0.2397839
E0071 0.8999837 0.1000163
E0475 0.8385054 0.1614946
E0489 0.9659420 0.0340580
E0488 0.8928408 0.1071592
E0174 0.4748591 0.5251409
E0537 0.5501408 0.4498592
E0560 0.3165016 0.6834984
E0343 0.7803942 0.2196058
E0577 0.2876980 0.7123020
E0297 0.9773853 0.0226147
E0136 0.7203584 0.2796416
E0149 0.0311061 0.9688939
E0555 0.9681198 0.0318802
E0234 0.4825390 0.5174610
E0100 0.9739838 0.0260162
E0253 0.9972804 0.0027196
E0359 0.8778519 0.1221481
E0329 0.3323981 0.6676019
E0595 0.7497599 0.2502401
E0220 0.9998490 0.0001510
E0551 0.9999675 0.0000325
EM0061 0.9522544 0.0477456
E0261 0.9943373 0.0056627
E0205 0.4457350 0.5542650
E0445 0.9951217 0.0048783
E0529 0.9979747 0.0020253
E0436 0.9537990 0.0462010
E0586 0.9997800 0.0002200
E0354 0.9867405 0.0132595
E0267 0.9075242 0.0924758
E0379 0.9733780 0.0266220
E0187 0.9998524 0.0001476
E0035 0.9941344 0.0058656
E0496 0.9930874 0.0069126
E0678 0.5169870 0.4830130
E0179 0.8807651 0.1192349
E0438 0.9999999 0.0000001
E0341 0.0373816 0.9626184
E0463 0.9115756 0.0884244
E0104 0.9803770 0.0196230
E0069 0.8891152 0.1108848
E0591 0.9653636 0.0346364
E0396 1.0000000 0.0000000
E0392 0.9995905 0.0004095
E0061 0.8427166 0.1572834
E0180 0.9398030 0.0601970
E0579 0.8902655 0.1097345
E0038 0.9955726 0.0044274
E0654 0.2215128 0.7784872
E0457 0.9983151 0.0016849
E0556 0.1343242 0.8656758
E0097 0.9861912 0.0138088
E0641 0.9448017 0.0551983
E0667 0.9832047 0.0167953
E0510 0.9711680 0.0288320
E0109 0.9982708 0.0017292
E0401 0.9238083 0.0761917
E0420 0.9863529 0.0136471
E0520 0.9333747 0.0666253
E0550 0.8450630 0.1549370
E0371 0.9928142 0.0071858
E0255 0.9989964 0.0010036
E0439 0.9892633 0.0107367
E0156 0.9070545 0.0929455
E0532 0.9998079 0.0001921
E0247 0.9156230 0.0843770
E0045 0.9964594 0.0035406
E0153 0.9717307 0.0282693
E0364 0.9578512 0.0421488
E0148 0.3604740 0.6395260
E0523 0.9142399 0.0857601
E0403 0.9602486 0.0397514
E0264 0.5266199 0.4733801
E0224 0.3957295 0.6042705
E0105 0.9999599 0.0000401
E0693 0.2291141 0.7708859
E0462 0.8509217 0.1490783
E0662 0.9743623 0.0256377
E0504 0.4825105 0.5174895
E0499 0.9779482 0.0220518
E0656 0.9949020 0.0050980
E0513 0.9151031 0.0848969
E0116 0.9796332 0.0203668
E0484 0.9563321 0.0436679
E0571 0.9910589 0.0089411
E0381 0.9031759 0.0968241
E0283 0.9502037 0.0497963
E0374 0.9631866 0.0368134
E0557 0.7938378 0.2061622
E0644 0.8661621 0.1338379
E0170 0.6443280 0.3556720
E0248 0.9999134 0.0000866
E0587 0.9999750 0.0000250
E0260 0.9030797 0.0969203
E0574 0.9988782 0.0011218
E0175 0.9999998 0.0000002
E0161 0.9999182 0.0000818
E0048 0.9427483 0.0572517
E0047 0.9918239 0.0081761
E0085 0.9745855 0.0254145
E0202 0.8287648 0.1712352
E0456 0.9991704 0.0008296
E0111 0.9854309 0.0145691
E0360 0.9940813 0.0059187
E0382 0.9999696 0.0000304
E0620 0.9691648 0.0308352
E0303 0.9787288 0.0212712
E0079 0.9812170 0.0187830
E0447 0.9467775 0.0532225
E0389 0.9969750 0.0030250
E0294 0.9990970 0.0009030
E0521 0.9145709 0.0854291
E0675 0.7953831 0.2046169
E0687 0.9772174 0.0227826
E0027 0.9206428 0.0793572
E0672 0.1335360 0.8664640
E0398 0.7575310 0.2424690
E0582 0.9429089 0.0570911
E0350 0.5608872 0.4391128
E0515 0.9393201 0.0606799
E0082 0.9932086 0.0067914
E0355 0.9923244 0.0076756
E0572 0.3514702 0.6485298
E0171 0.7089129 0.2910871
E0166 0.9999965 0.0000035
E0549 0.8557752 0.1442248
E0533 0.7254616 0.2745384
E0616 0.2908262 0.7091738
E0683 0.8389334 0.1610666
E0528 0.9743471 0.0256529
E0298 0.9994177 0.0005823
EM0049 0.9999955 0.0000045
E0251 0.0002364 0.9997636
E0336 0.6063501 0.3936499
E0168 0.9944955 0.0055045

K=3 to 5

Code in code/gom-seeds.R.

clust_2 <- readRDS("../output/gom-seeds-k-2.rds")
clust_3 <- readRDS("../output/gom-seeds-k-3.rds")
clust_4 <- readRDS("../output/gom-seeds-k-4.rds")
clust_5 <- readRDS("../output/gom-seeds-k-5.rds")

Identify the best seed.

K=2: 67; K=3: 67, K=4: 59; K=5: 307.

seeds[which.max(lapply(clust_2, "[[", 4))]
[1] 67
seeds[which.max(lapply(clust_3, "[[", 4))]
[1] 67
seeds[which.max(lapply(clust_4, "[[", 4))]
[1] 59
seeds[which.max(lapply(clust_5, "[[", 4))]
[1] 307

save the best seeds.

saveRDS(clust_2[which.max(lapply(clust_2, "[[", 4))],
        "../output/gom-k2-best-seed-67.rds")
saveRDS(clust_3[which.max(lapply(clust_3, "[[", 4))],
        "../output/gom-k3-best-seed-67.rds")
saveRDS(clust_4[which.max(lapply(clust_4, "[[", 4))],
        "../output/gom-k4-best-seed-59.rds")
saveRDS(clust_5[which.max(lapply(clust_5, "[[", 4))],
        "../output/gom-k5-best-seed-307.rds")

Session Info

R version 3.4.1 (2017-06-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] CountClust_1.4.1     ggplot2_2.2.1        metagenomeSeq_1.21.1
 [4] RColorBrewer_1.1-2   glmnet_2.0-16        foreach_1.4.4       
 [7] Matrix_1.2-14        limma_3.34.9         Biobase_2.38.0      
[10] BiocGenerics_0.24.0  dplyr_0.7.4          kableExtra_0.8.0    
[13] knitr_1.20          

loaded via a namespace (and not attached):
 [1] httr_1.3.1         viridisLite_0.3.0  gtools_3.5.0      
 [4] assertthat_0.2.0   highr_0.6          stats4_3.4.1      
 [7] yaml_2.1.19        slam_0.1-43        pillar_1.2.2      
[10] backports_1.1.2    lattice_0.20-35    glue_1.2.0        
[13] digest_0.6.15      rvest_0.3.2        colorspace_1.3-2  
[16] picante_1.6-2      cowplot_0.9.2      htmltools_0.3.6   
[19] plyr_1.8.4         pkgconfig_2.0.1    scales_0.5.0      
[22] gdata_2.18.0       git2r_0.21.0       tibble_1.4.2      
[25] mgcv_1.8-23        nnet_7.3-12        lazyeval_0.2.1    
[28] magrittr_1.5       evaluate_0.10.1    nlme_3.1-137      
[31] MASS_7.3-50        gplots_3.0.1       xml2_1.2.0        
[34] vegan_2.5-1        tools_3.4.1        hms_0.4.2         
[37] matrixStats_0.53.1 stringr_1.3.0      munsell_0.4.3     
[40] cluster_2.0.7-1    bindrcpp_0.2.2     compiler_3.4.1    
[43] caTools_1.17.1     rlang_0.2.0        grid_3.4.1        
[46] iterators_1.0.9    rstudioapi_0.7     bitops_1.0-6      
[49] rmarkdown_1.9      boot_1.3-20        gtable_0.2.0      
[52] codetools_0.2-15   flexmix_2.3-14     reshape2_1.4.3    
[55] R6_2.2.2           bindr_0.1.1        rprojroot_1.3-2   
[58] maptpx_1.9-3       KernSmooth_2.23-15 permute_0.9-4     
[61] readr_1.1.1        ape_5.1            modeltools_0.2-21 
[64] stringi_1.2.2      SQUAREM_2017.10-1  Rcpp_0.12.16      

This R Markdown site was created with workflowr