Warning in as.POSIXlt.POSIXct(Sys.time()): unknown timezone 'zone/tz/2018c.
1.0/zoneinfo/America/Chicago'
Last updated: 2018-06-07
Code version: 852b84b
The GoM (Grades of Membership) model computes log Bayes factor for comparing model fitting with k clusters versus null single cluster model. The log Bayes factor is corrected for the number of parameters fitted in each model. In clustering analysis, one typically chooses k that has the largest Bayes factor. However, it is worth remembering that in practice, it is unlikely to get a true k, and results from different k may complement each other rather than competing with each other.
In the current analysis, we evaluated the data with k = 2 to 20. We found that k=2 produces the largest Bayes factor.
library(knitr)
library(kableExtra)
library(dplyr)
library(metagenomeSeq)
library(CountClust)
library(vegan)
Read in filtered data.
MRobj <- readRDS("../data/nasal_filtered_normed_batchcorrected.rds")
counts <- MRcounts(MRobj,norm=FALSE,log=FALSE)
MRobj
MRexperiment (storageMode: environment)
assayData: 553 features, 197 samples
element names: counts
protocolData: none
phenoData
sampleNames: EM0042 EM0047 ... E0168 (197 total)
varLabels: StudyID age ... infnone (94 total)
varMetadata: labelDescription
featureData
featureNames:
k__Bacteria;p__[Thermi];c__Deinococci;o__Deinococcales;f__Deinococcaceae;g__CM44;s__
k__Bacteria;p__[Thermi];c__Deinococci;o__Deinococcales;f__Deinococcaceae;g__Deinococcus;s__
... s__zeae (553 total)
fvarLabels: Kingdom Phylum ... Species (7 total)
fvarMetadata: labelDescription
experimentData: use 'experimentData(object)'
Annotation:
Fitting k=2 to 10.
fits <- FitGoM(t(counts), K=c(2:10), tol=0.00001)
saveRDS(fits, file = "../data/gomfits.rds")
Extract sample memberships.
fits <- readRDS(file = "../data/gomfits.rds")
clust_fit_membership <- lapply(fits, function(xx) {
apply(xx$omega, 1, which.max)
})
for (i in 1:length(clust_fit_membership)) {
print(table(clust_fit_membership[[i]]))
}
1 2
180 17
1 2 3
166 24 7
1 2 3 4
88 71 27 11
1 2 3 4 5
105 68 15 8 1
1 2 3 4 5 6
73 54 40 17 12 1
1 2 3 4 5 6 7
75 42 38 29 7 3 3
1 2 3 4 5 6 7 8
79 48 26 19 12 9 2 2
1 2 3 4 5 6 7 8 9
76 50 25 19 12 9 2 3 1
1 2 3 4 5 6 7 8 9 10
79 30 27 16 18 11 7 2 5 2
par(mfrow=c(1,2))
plot(bf_species, type = "b", xlab = "Number of clusters",
ylab = "log Bayes Factor (H1 vs single cluster)", main = "Species")
plot(bf_species_seq, type = "b", xlab = "Number of clusters",
ylab = "Percent change in log Bayes Factor (k vs k-1)", main = "Species")

for (i in 1:length(clust_fit_membership)) {
cl <- factor(clust_fit_membership[[i]])
mat <- t(MRcounts(MRobj,norm=FALSE,log=FALSE))
H <- diversity(mat,index="shannon")
boxplot(H~cl, ylab=paste("shanno","diversity index"), col=1:length(levels(cl)),
main = names(clust_fit_membership)[i])
}









R version 3.4.1 (2017-06-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] vegan_2.5-1 lattice_0.20-35 permute_0.9-4
[4] CountClust_1.4.1 ggplot2_2.2.1 metagenomeSeq_1.21.1
[7] RColorBrewer_1.1-2 glmnet_2.0-16 foreach_1.4.4
[10] Matrix_1.2-14 limma_3.34.9 Biobase_2.38.0
[13] BiocGenerics_0.24.0 dplyr_0.7.4 kableExtra_0.8.0
[16] knitr_1.20
loaded via a namespace (and not attached):
[1] httr_1.3.1 viridisLite_0.3.0 gtools_3.5.0
[4] assertthat_0.2.0 stats4_3.4.1 yaml_2.1.19
[7] slam_0.1-43 pillar_1.2.2 backports_1.1.2
[10] glue_1.2.0 digest_0.6.15 rvest_0.3.2
[13] colorspace_1.3-2 picante_1.6-2 cowplot_0.9.2
[16] htmltools_0.3.6 plyr_1.8.4 pkgconfig_2.0.1
[19] scales_0.5.0 gdata_2.18.0 git2r_0.21.0
[22] tibble_1.4.2 mgcv_1.8-23 nnet_7.3-12
[25] lazyeval_0.2.1 magrittr_1.5 evaluate_0.10.1
[28] nlme_3.1-137 MASS_7.3-50 gplots_3.0.1
[31] xml2_1.2.0 tools_3.4.1 hms_0.4.2
[34] matrixStats_0.53.1 stringr_1.3.0 munsell_0.4.3
[37] cluster_2.0.7-1 bindrcpp_0.2.2 compiler_3.4.1
[40] caTools_1.17.1 rlang_0.2.0 grid_3.4.1
[43] iterators_1.0.9 rstudioapi_0.7 bitops_1.0-6
[46] rmarkdown_1.9 boot_1.3-20 gtable_0.2.0
[49] codetools_0.2-15 flexmix_2.3-14 reshape2_1.4.3
[52] R6_2.2.2 bindr_0.1.1 rprojroot_1.3-2
[55] maptpx_1.9-3 KernSmooth_2.23-15 readr_1.1.1
[58] ape_5.1 modeltools_0.2-21 stringi_1.2.2
[61] SQUAREM_2017.10-1 Rcpp_0.12.16
This R Markdown site was created with workflowr