Warning in as.POSIXlt.POSIXct(Sys.time()): unknown timezone 'zone/tz/2018c.
1.0/zoneinfo/America/Chicago'
Last updated: 2018-06-07
Code version: e30803e
Motivation: when running CountClut, sample cluster assignment may differ between different random seeds, despite similar cluster characteristics in feature abundancce. This issue is not unique to CountClust. In model-based unsupervised clustering methods, the parameter space is usually complex, and as a result, difficult to find global maximum of the model likelihood.
I run CountClust under 100 different random seeds and then compare Bayes Factors across different random seeds and sample cluster membership. I will use the random seed that generates the largest Bayes Factor in all subsequent CountClust analysis.
This document shows the analysis steps for when K=2. Results show that for when K=2, across 100 random seeds 409 produced the largest BF and is thus chosen as our random seed. This chosen random seed (best seed) is then used in our final results (https://jhsiao999.github.io/nasalmicrobiome/step3_gradesOfMembership.html).
The fitting results for K=2 to 5 under the same 100 random seeds are saved in output/gom-seeds-k-2.rds, output/gom-seeds-k-3.rds, output/gom-seeds-k-4.rds, and output/gom-seeds-k-5.rds.
library(knitr)
library(kableExtra)
library(dplyr)
library(metagenomeSeq)
library(CountClust)
library(parallel)
Read in filtered data.
MRobj = readRDS("../data/nasal_filtered_normed_batchcorrected.rds")
MRobj
MRexperiment (storageMode: environment)
assayData: 553 features, 197 samples
element names: counts
protocolData: none
phenoData
sampleNames: EM0042 EM0047 ... E0168 (197 total)
varLabels: StudyID age ... infnone (94 total)
varMetadata: labelDescription
featureData
featureNames:
k__Bacteria;p__[Thermi];c__Deinococci;o__Deinococcales;f__Deinococcaceae;g__CM44;s__
k__Bacteria;p__[Thermi];c__Deinococci;o__Deinococcales;f__Deinococcaceae;g__Deinococcus;s__
... s__zeae (553 total)
fvarLabels: Kingdom Phylum ... Species (7 total)
fvarMetadata: labelDescription
experimentData: use 'experimentData(object)'
Annotation:
100 random seeds
Apply CountClust to fit grades of membership model.
counts <- MRcounts(MRobj,norm=FALSE,log=FALSE)
# get 100 random seeds
source("../code/print-prime.R")
seeds <- prime(543)
res <- vector("list", length(seeds))
for (i in 1:100) {
set.seed(seeds[i])
res[[i]] <- FitGoM(t(counts), K=c(2), tol=1e-6)[[1]]
}
saveRDS(res, file = "../data/res.rds")
res <- readRDS(file="../data/res.rds")
Bayes factor differs between the seeds.
summary(do.call(c,lapply(res, "[[", 4)))
Min. 1st Qu. Median Mean 3rd Qu. Max.
1085545 1379648 1483781 1489853 1601809 1738210
Choose the seed with maximum BF: 67
which.max(lapply(res, "[[", 4))
[1] 19
saveRDS(res[[19]], file = "../output/gom-seeds.Rmd/res_best_seed_67.rds")
\(~\)
Results using the seed with the highest BF.
res_best <- readRDS(file = "../output/gom-seeds.Rmd/res_best_seed_67.rds")
cluster <- apply(res_best$omega, 1, which.max)
table(cluster)
cluster
1 2
167 30
kable(res_best$omega)
| 1 | 2 | |
|---|---|---|
| EM0042 | 0.9995808 | 0.0004192 |
| EM0047 | 0.9963470 | 0.0036530 |
| E0133 | 0.9920976 | 0.0079024 |
| E0197 | 0.9996540 | 0.0003460 |
| EM0046 | 0.9999604 | 0.0000396 |
| EM0071 | 0.6772224 | 0.3227776 |
| EM0029 | 0.9999471 | 0.0000529 |
| E0358 | 0.9922081 | 0.0077919 |
| EM0086 | 0.9575558 | 0.0424442 |
| E0060 | 0.9953273 | 0.0046727 |
| EM0068 | 0.9908933 | 0.0091067 |
| E0275 | 0.2459101 | 0.7540899 |
| E0584 | 0.9207612 | 0.0792388 |
| E0119 | 0.0273439 | 0.9726561 |
| E0117 | 0.0001033 | 0.9998967 |
| E0126 | 0.2372200 | 0.7627800 |
| E0684 | 0.9999425 | 0.0000575 |
| E0103 | 0.9208354 | 0.0791646 |
| E0327 | 0.9996965 | 0.0003035 |
| E0089 | 0.0085725 | 0.9914275 |
| E0183 | 0.7016093 | 0.2983907 |
| E0207 | 0.9473176 | 0.0526824 |
| E0050 | 0.4789709 | 0.5210291 |
| EM0012 | 0.9998702 | 0.0001298 |
| E0399 | 0.9349670 | 0.0650330 |
| E0120 | 0.3211410 | 0.6788590 |
| EM0060 | 0.9999982 | 0.0000018 |
| E0083 | 0.9648846 | 0.0351154 |
| E0042 | 0.8167775 | 0.1832225 |
| EM0013 | 0.9999285 | 0.0000715 |
| E0132 | 0.9675559 | 0.0324441 |
| EM0050 | 0.9591377 | 0.0408623 |
| E0076 | 0.8908938 | 0.1091062 |
| E0106 | 0.9592985 | 0.0407015 |
| E0608 | 0.9913975 | 0.0086025 |
| E0122 | 0.3253818 | 0.6746182 |
| E0250 | 0.9884623 | 0.0115377 |
| EM0067 | 0.9547523 | 0.0452477 |
| E0686 | 0.8836826 | 0.1163174 |
| E0211 | 0.9999999 | 0.0000001 |
| E0384 | 0.9972912 | 0.0027088 |
| EM0078 | 0.9833801 | 0.0166199 |
| E0077 | 0.9838634 | 0.0161366 |
| E0652 | 0.8033168 | 0.1966832 |
| E0229 | 0.9999562 | 0.0000438 |
| E0125 | 0.0785997 | 0.9214003 |
| E0487 | 0.7541912 | 0.2458088 |
| EM0065 | 0.9998911 | 0.0001089 |
| E0129 | 0.0233961 | 0.9766039 |
| E0661 | 0.9286069 | 0.0713931 |
| E0040 | 0.5982370 | 0.4017630 |
| E0393 | 0.9926429 | 0.0073571 |
| E0024 | 0.6732353 | 0.3267647 |
| EM0080 | 0.9988929 | 0.0011071 |
| EM0085 | 0.7602501 | 0.2397499 |
| E0019 | 0.9234444 | 0.0765556 |
| E0039 | 0.8607182 | 0.1392818 |
| E0593 | 0.3762219 | 0.6237781 |
| E0049 | 0.3585597 | 0.6414403 |
| E0112 | 0.9994410 | 0.0005590 |
| E0299 | 0.9680101 | 0.0319899 |
| E0165 | 0.7602161 | 0.2397839 |
| E0071 | 0.8999837 | 0.1000163 |
| E0475 | 0.8385054 | 0.1614946 |
| E0489 | 0.9659420 | 0.0340580 |
| E0488 | 0.8928408 | 0.1071592 |
| E0174 | 0.4748591 | 0.5251409 |
| E0537 | 0.5501408 | 0.4498592 |
| E0560 | 0.3165016 | 0.6834984 |
| E0343 | 0.7803942 | 0.2196058 |
| E0577 | 0.2876980 | 0.7123020 |
| E0297 | 0.9773853 | 0.0226147 |
| E0136 | 0.7203584 | 0.2796416 |
| E0149 | 0.0311061 | 0.9688939 |
| E0555 | 0.9681198 | 0.0318802 |
| E0234 | 0.4825390 | 0.5174610 |
| E0100 | 0.9739838 | 0.0260162 |
| E0253 | 0.9972804 | 0.0027196 |
| E0359 | 0.8778519 | 0.1221481 |
| E0329 | 0.3323981 | 0.6676019 |
| E0595 | 0.7497599 | 0.2502401 |
| E0220 | 0.9998490 | 0.0001510 |
| E0551 | 0.9999675 | 0.0000325 |
| EM0061 | 0.9522544 | 0.0477456 |
| E0261 | 0.9943373 | 0.0056627 |
| E0205 | 0.4457350 | 0.5542650 |
| E0445 | 0.9951217 | 0.0048783 |
| E0529 | 0.9979747 | 0.0020253 |
| E0436 | 0.9537990 | 0.0462010 |
| E0586 | 0.9997800 | 0.0002200 |
| E0354 | 0.9867405 | 0.0132595 |
| E0267 | 0.9075242 | 0.0924758 |
| E0379 | 0.9733780 | 0.0266220 |
| E0187 | 0.9998524 | 0.0001476 |
| E0035 | 0.9941344 | 0.0058656 |
| E0496 | 0.9930874 | 0.0069126 |
| E0678 | 0.5169870 | 0.4830130 |
| E0179 | 0.8807651 | 0.1192349 |
| E0438 | 0.9999999 | 0.0000001 |
| E0341 | 0.0373816 | 0.9626184 |
| E0463 | 0.9115756 | 0.0884244 |
| E0104 | 0.9803770 | 0.0196230 |
| E0069 | 0.8891152 | 0.1108848 |
| E0591 | 0.9653636 | 0.0346364 |
| E0396 | 1.0000000 | 0.0000000 |
| E0392 | 0.9995905 | 0.0004095 |
| E0061 | 0.8427166 | 0.1572834 |
| E0180 | 0.9398030 | 0.0601970 |
| E0579 | 0.8902655 | 0.1097345 |
| E0038 | 0.9955726 | 0.0044274 |
| E0654 | 0.2215128 | 0.7784872 |
| E0457 | 0.9983151 | 0.0016849 |
| E0556 | 0.1343242 | 0.8656758 |
| E0097 | 0.9861912 | 0.0138088 |
| E0641 | 0.9448017 | 0.0551983 |
| E0667 | 0.9832047 | 0.0167953 |
| E0510 | 0.9711680 | 0.0288320 |
| E0109 | 0.9982708 | 0.0017292 |
| E0401 | 0.9238083 | 0.0761917 |
| E0420 | 0.9863529 | 0.0136471 |
| E0520 | 0.9333747 | 0.0666253 |
| E0550 | 0.8450630 | 0.1549370 |
| E0371 | 0.9928142 | 0.0071858 |
| E0255 | 0.9989964 | 0.0010036 |
| E0439 | 0.9892633 | 0.0107367 |
| E0156 | 0.9070545 | 0.0929455 |
| E0532 | 0.9998079 | 0.0001921 |
| E0247 | 0.9156230 | 0.0843770 |
| E0045 | 0.9964594 | 0.0035406 |
| E0153 | 0.9717307 | 0.0282693 |
| E0364 | 0.9578512 | 0.0421488 |
| E0148 | 0.3604740 | 0.6395260 |
| E0523 | 0.9142399 | 0.0857601 |
| E0403 | 0.9602486 | 0.0397514 |
| E0264 | 0.5266199 | 0.4733801 |
| E0224 | 0.3957295 | 0.6042705 |
| E0105 | 0.9999599 | 0.0000401 |
| E0693 | 0.2291141 | 0.7708859 |
| E0462 | 0.8509217 | 0.1490783 |
| E0662 | 0.9743623 | 0.0256377 |
| E0504 | 0.4825105 | 0.5174895 |
| E0499 | 0.9779482 | 0.0220518 |
| E0656 | 0.9949020 | 0.0050980 |
| E0513 | 0.9151031 | 0.0848969 |
| E0116 | 0.9796332 | 0.0203668 |
| E0484 | 0.9563321 | 0.0436679 |
| E0571 | 0.9910589 | 0.0089411 |
| E0381 | 0.9031759 | 0.0968241 |
| E0283 | 0.9502037 | 0.0497963 |
| E0374 | 0.9631866 | 0.0368134 |
| E0557 | 0.7938378 | 0.2061622 |
| E0644 | 0.8661621 | 0.1338379 |
| E0170 | 0.6443280 | 0.3556720 |
| E0248 | 0.9999134 | 0.0000866 |
| E0587 | 0.9999750 | 0.0000250 |
| E0260 | 0.9030797 | 0.0969203 |
| E0574 | 0.9988782 | 0.0011218 |
| E0175 | 0.9999998 | 0.0000002 |
| E0161 | 0.9999182 | 0.0000818 |
| E0048 | 0.9427483 | 0.0572517 |
| E0047 | 0.9918239 | 0.0081761 |
| E0085 | 0.9745855 | 0.0254145 |
| E0202 | 0.8287648 | 0.1712352 |
| E0456 | 0.9991704 | 0.0008296 |
| E0111 | 0.9854309 | 0.0145691 |
| E0360 | 0.9940813 | 0.0059187 |
| E0382 | 0.9999696 | 0.0000304 |
| E0620 | 0.9691648 | 0.0308352 |
| E0303 | 0.9787288 | 0.0212712 |
| E0079 | 0.9812170 | 0.0187830 |
| E0447 | 0.9467775 | 0.0532225 |
| E0389 | 0.9969750 | 0.0030250 |
| E0294 | 0.9990970 | 0.0009030 |
| E0521 | 0.9145709 | 0.0854291 |
| E0675 | 0.7953831 | 0.2046169 |
| E0687 | 0.9772174 | 0.0227826 |
| E0027 | 0.9206428 | 0.0793572 |
| E0672 | 0.1335360 | 0.8664640 |
| E0398 | 0.7575310 | 0.2424690 |
| E0582 | 0.9429089 | 0.0570911 |
| E0350 | 0.5608872 | 0.4391128 |
| E0515 | 0.9393201 | 0.0606799 |
| E0082 | 0.9932086 | 0.0067914 |
| E0355 | 0.9923244 | 0.0076756 |
| E0572 | 0.3514702 | 0.6485298 |
| E0171 | 0.7089129 | 0.2910871 |
| E0166 | 0.9999965 | 0.0000035 |
| E0549 | 0.8557752 | 0.1442248 |
| E0533 | 0.7254616 | 0.2745384 |
| E0616 | 0.2908262 | 0.7091738 |
| E0683 | 0.8389334 | 0.1610666 |
| E0528 | 0.9743471 | 0.0256529 |
| E0298 | 0.9994177 | 0.0005823 |
| EM0049 | 0.9999955 | 0.0000045 |
| E0251 | 0.0002364 | 0.9997636 |
| E0336 | 0.6063501 | 0.3936499 |
| E0168 | 0.9944955 | 0.0055045 |
Code in code/gom-seeds.R.
clust_2 <- readRDS("../output/gom-seeds-k-2.rds")
clust_3 <- readRDS("../output/gom-seeds-k-3.rds")
clust_4 <- readRDS("../output/gom-seeds-k-4.rds")
clust_5 <- readRDS("../output/gom-seeds-k-5.rds")
Identify the best seed.
K=2: 67; K=3: 67, K=4: 59; K=5: 307.
seeds[which.max(lapply(clust_2, "[[", 4))]
[1] 67
seeds[which.max(lapply(clust_3, "[[", 4))]
[1] 67
seeds[which.max(lapply(clust_4, "[[", 4))]
[1] 59
seeds[which.max(lapply(clust_5, "[[", 4))]
[1] 307
save the best seeds.
saveRDS(clust_2[which.max(lapply(clust_2, "[[", 4))],
"../output/gom-k2-best-seed-67.rds")
saveRDS(clust_3[which.max(lapply(clust_3, "[[", 4))],
"../output/gom-k3-best-seed-67.rds")
saveRDS(clust_4[which.max(lapply(clust_4, "[[", 4))],
"../output/gom-k4-best-seed-59.rds")
saveRDS(clust_5[which.max(lapply(clust_5, "[[", 4))],
"../output/gom-k5-best-seed-307.rds")
R version 3.4.1 (2017-06-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] CountClust_1.4.1 ggplot2_2.2.1 metagenomeSeq_1.21.1
[4] RColorBrewer_1.1-2 glmnet_2.0-16 foreach_1.4.4
[7] Matrix_1.2-14 limma_3.34.9 Biobase_2.38.0
[10] BiocGenerics_0.24.0 dplyr_0.7.4 kableExtra_0.8.0
[13] knitr_1.20
loaded via a namespace (and not attached):
[1] httr_1.3.1 viridisLite_0.3.0 gtools_3.5.0
[4] assertthat_0.2.0 highr_0.6 stats4_3.4.1
[7] yaml_2.1.19 slam_0.1-43 pillar_1.2.2
[10] backports_1.1.2 lattice_0.20-35 glue_1.2.0
[13] digest_0.6.15 rvest_0.3.2 colorspace_1.3-2
[16] picante_1.6-2 cowplot_0.9.2 htmltools_0.3.6
[19] plyr_1.8.4 pkgconfig_2.0.1 scales_0.5.0
[22] gdata_2.18.0 git2r_0.21.0 tibble_1.4.2
[25] mgcv_1.8-23 nnet_7.3-12 lazyeval_0.2.1
[28] magrittr_1.5 evaluate_0.10.1 nlme_3.1-137
[31] MASS_7.3-50 gplots_3.0.1 xml2_1.2.0
[34] vegan_2.5-1 tools_3.4.1 hms_0.4.2
[37] matrixStats_0.53.1 stringr_1.3.0 munsell_0.4.3
[40] cluster_2.0.7-1 bindrcpp_0.2.2 compiler_3.4.1
[43] caTools_1.17.1 rlang_0.2.0 grid_3.4.1
[46] iterators_1.0.9 rstudioapi_0.7 bitops_1.0-6
[49] rmarkdown_1.9 boot_1.3-20 gtable_0.2.0
[52] codetools_0.2-15 flexmix_2.3-14 reshape2_1.4.3
[55] R6_2.2.2 bindr_0.1.1 rprojroot_1.3-2
[58] maptpx_1.9-3 KernSmooth_2.23-15 permute_0.9-4
[61] readr_1.1.1 ape_5.1 modeltools_0.2-21
[64] stringi_1.2.2 SQUAREM_2017.10-1 Rcpp_0.12.16
This R Markdown site was created with workflowr