Warning in as.POSIXlt.POSIXct(Sys.time()): unknown timezone 'zone/tz/2017c.
1.0/zoneinfo/America/Chicago'
Last updated: 2018-02-14
Code version: 5b4420c
Here we combine QIIME output, OTU taxonomy labels, and sample phenotype labels into a metagenomeSeq MRexperiment object nasal_raw.rds. The sources of data are
QIIME output: otu_table.biom
Taxonomy labels: GSCID_Seqs_4Nauder_rep_set_tax_assignments.txt
Sample phenotype labels: metadata.tsv
Raw data consists of 42,999 features (OTUs) from 201 samples. The OTUs are annotated at each taxonomy level: Kindom, Phylum, Class, Order, Family, Genus, Species, and labeled as unassigned when annotations are not available. There are 94 sample phenotypes, including primary and secondary outcomes.
\(~\)
library(biomformat)
library(metagenomeSeq)
library(dplyr)
Import QIIME output using biomformat, and convert to an MRexperiment object using metagenomeSeq.
counts <- read_biom("../data/otu_table.biom")
counts = biom2MRexperiment(counts)
cnames = sapply(strsplit(colnames(counts),"\\."),function(i)ifelse(length(i)>1,i[2],i[1]))
colnames(counts) = cnames
Raw data consists of 42,999 features and 201 samples.
dim(counts)
Features Samples
42999 201
Read in the QIIME taxonomy and attach to the MRexperiment object. It requires a bit of cleanup due to the file structure.
fd = read.csv("../data/GSCID_Seqs_4Nauder_rep_set_tax_assignments.txt",sep=";",
stringsAsFactors=FALSE,header=FALSE)
species = sapply(strsplit(fd[,7],"\t"),function(i)i[1]) %>% ifelse(is.na(.),yes=" s__",no=.)
fd[,7] = species
rnames = sapply(strsplit(fd[,1],"\t"),function(i)i[1])
rownames(fd) = rnames
fd[,1] = sapply(strsplit(fd[,1],"\t"),function(i)i[2]) %>% ifelse(is.na(.),yes=" Unassigned",no=.)
colnames(fd) = c("Kingdom","Phylum","Class","Order","Family","Genus","Species")
for(i in seq(fd)){
fd[,i] = gsub(" ","",fd[,i])
}
for(i in seq(fd)){
lvl = tolower(strsplit(colnames(fd)[i],"")[[1]][1])
fd[which(is.na(fd[,i]) | fd[,i]==""),i] = paste(lvl,"__",sep="")
}
for(i in rev(seq(fd)[-1])){
lvl = paste(tolower(strsplit(colnames(fd)[i],"")[[1]][1]),"__",sep="")
if(any(fd[,i] == lvl)){
nrows = which(fd[,i] == lvl)
feel = fd[nrows,seq(i)]
feel = sapply(seq(nrow(feel)),function(j)paste(feel[j,seq(i)],collapse=";"))
fd[nrows,i] = feel
}
}
for(i in seq(fd)) fd[grep("Unassigned",fd[,i]),i] = "Unassigned"
fd = fd[match(rownames(counts),rownames(fd)),]
for(i in 2:ncol(fd)){
fd[,i] = gsub("\t0.67\t3","",fd[,i])
fd[,i] = gsub("\t0.67\t2","",fd[,i])
fd[,i] = gsub("\t1.00\t3","",fd[,i])
fd[,i] = gsub("\t1.00\t2","",fd[,i])
}
fData(counts) = fd
\(~\)
Next we read in the metadata (sample phenotype labels) and attach to the MRexperiment object
pd = read.csv("../data/metadata.tsv",sep = "\t",stringsAsFactors=FALSE)
rownames(pd) = pd[,1]
pd = pd[match(colnames(counts),rownames(pd)),]
pData(counts) = pd
There are 94 phenotype labels for the 201 samples.
dim(pd)
[1] 201 94
Lastly, we save the object as an RDS file.
saveRDS(counts, file="../data/nasal_raw.rds")
R version 3.4.1 (2017-06-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] dplyr_0.7.4 metagenomeSeq_1.20.1 RColorBrewer_1.1-2
[4] glmnet_2.0-13 foreach_1.4.4 Matrix_1.2-12
[7] limma_3.34.4 Biobase_2.38.0 BiocGenerics_0.24.0
[10] biomformat_1.6.0
loaded via a namespace (and not attached):
[1] Rcpp_0.12.14 bindr_0.1 compiler_3.4.1
[4] git2r_0.20.0 plyr_1.8.4 bitops_1.0-6
[7] iterators_1.0.9 tools_3.4.1 zlibbioc_1.24.0
[10] digest_0.6.13 tibble_1.3.4 jsonlite_1.5
[13] evaluate_0.10.1 rhdf5_2.22.0 lattice_0.20-35
[16] pkgconfig_2.0.1 rlang_0.1.4 yaml_2.1.16
[19] bindrcpp_0.2 stringr_1.2.0 knitr_1.17
[22] gtools_3.5.0 caTools_1.17.1 rprojroot_1.2
[25] grid_3.4.1 glue_1.2.0 R6_2.2.2
[28] rmarkdown_1.8 gdata_2.18.0 magrittr_1.5
[31] backports_1.1.2 gplots_3.0.1 codetools_0.2-15
[34] htmltools_0.3.6 matrixStats_0.52.2 assertthat_0.2.0
[37] KernSmooth_2.23-15 stringi_1.1.6
This R Markdown site was created with workflowr