Warning in as.POSIXlt.POSIXct(Sys.time()): unknown timezone 'zone/tz/2017c.
1.0/zoneinfo/America/Chicago'

Last updated: 2018-02-14

Code version: 5b4420c


Introduction/Summary

Here we combine QIIME output, OTU taxonomy labels, and sample phenotype labels into a metagenomeSeq MRexperiment object nasal_raw.rds. The sources of data are

  • QIIME output: otu_table.biom

  • Taxonomy labels: GSCID_Seqs_4Nauder_rep_set_tax_assignments.txt

  • Sample phenotype labels: metadata.tsv

Raw data consists of 42,999 features (OTUs) from 201 samples. The OTUs are annotated at each taxonomy level: Kindom, Phylum, Class, Order, Family, Genus, Species, and labeled as unassigned when annotations are not available. There are 94 sample phenotypes, including primary and secondary outcomes.

\(~\)


Setting up

library(biomformat)
library(metagenomeSeq)
library(dplyr)

Reading in the counts

Import QIIME output using biomformat, and convert to an MRexperiment object using metagenomeSeq.

counts <- read_biom("../data/otu_table.biom")
counts = biom2MRexperiment(counts)
cnames = sapply(strsplit(colnames(counts),"\\."),function(i)ifelse(length(i)>1,i[2],i[1]))
colnames(counts) = cnames

Raw data consists of 42,999 features and 201 samples.

dim(counts)
Features  Samples 
   42999      201 

Taxonomy

Read in the QIIME taxonomy and attach to the MRexperiment object. It requires a bit of cleanup due to the file structure.

fd = read.csv("../data/GSCID_Seqs_4Nauder_rep_set_tax_assignments.txt",sep=";",
              stringsAsFactors=FALSE,header=FALSE)
species = sapply(strsplit(fd[,7],"\t"),function(i)i[1]) %>% ifelse(is.na(.),yes=" s__",no=.)
fd[,7] = species
rnames = sapply(strsplit(fd[,1],"\t"),function(i)i[1])
rownames(fd) = rnames
fd[,1] = sapply(strsplit(fd[,1],"\t"),function(i)i[2]) %>% ifelse(is.na(.),yes=" Unassigned",no=.)
colnames(fd) = c("Kingdom","Phylum","Class","Order","Family","Genus","Species")

for(i in seq(fd)){
  fd[,i] = gsub(" ","",fd[,i])
}
for(i in seq(fd)){
  lvl = tolower(strsplit(colnames(fd)[i],"")[[1]][1])
  fd[which(is.na(fd[,i]) | fd[,i]==""),i] = paste(lvl,"__",sep="")
}

for(i in rev(seq(fd)[-1])){
  lvl = paste(tolower(strsplit(colnames(fd)[i],"")[[1]][1]),"__",sep="")
  if(any(fd[,i] == lvl)){
    nrows = which(fd[,i] == lvl)
    feel  = fd[nrows,seq(i)]
    feel  = sapply(seq(nrow(feel)),function(j)paste(feel[j,seq(i)],collapse=";"))
    fd[nrows,i] = feel
  }
}
for(i in seq(fd)) fd[grep("Unassigned",fd[,i]),i] = "Unassigned"
fd = fd[match(rownames(counts),rownames(fd)),]
for(i in 2:ncol(fd)){
  fd[,i] = gsub("\t0.67\t3","",fd[,i])
  fd[,i] = gsub("\t0.67\t2","",fd[,i])
  fd[,i] = gsub("\t1.00\t3","",fd[,i])
  fd[,i] = gsub("\t1.00\t2","",fd[,i])
}
fData(counts) = fd

\(~\)


Metadata

Next we read in the metadata (sample phenotype labels) and attach to the MRexperiment object

pd = read.csv("../data/metadata.tsv",sep = "\t",stringsAsFactors=FALSE)
rownames(pd) = pd[,1]
pd = pd[match(colnames(counts),rownames(pd)),]
pData(counts) = pd

There are 94 phenotype labels for the 201 samples.

dim(pd)
[1] 201  94

Saving the file

Lastly, we save the object as an RDS file.

saveRDS(counts, file="../data/nasal_raw.rds")

SessionInfo

R version 3.4.1 (2017-06-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] dplyr_0.7.4          metagenomeSeq_1.20.1 RColorBrewer_1.1-2  
 [4] glmnet_2.0-13        foreach_1.4.4        Matrix_1.2-12       
 [7] limma_3.34.4         Biobase_2.38.0       BiocGenerics_0.24.0 
[10] biomformat_1.6.0    

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.14       bindr_0.1          compiler_3.4.1    
 [4] git2r_0.20.0       plyr_1.8.4         bitops_1.0-6      
 [7] iterators_1.0.9    tools_3.4.1        zlibbioc_1.24.0   
[10] digest_0.6.13      tibble_1.3.4       jsonlite_1.5      
[13] evaluate_0.10.1    rhdf5_2.22.0       lattice_0.20-35   
[16] pkgconfig_2.0.1    rlang_0.1.4        yaml_2.1.16       
[19] bindrcpp_0.2       stringr_1.2.0      knitr_1.17        
[22] gtools_3.5.0       caTools_1.17.1     rprojroot_1.2     
[25] grid_3.4.1         glue_1.2.0         R6_2.2.2          
[28] rmarkdown_1.8      gdata_2.18.0       magrittr_1.5      
[31] backports_1.1.2    gplots_3.0.1       codetools_0.2-15  
[34] htmltools_0.3.6    matrixStats_0.52.2 assertthat_0.2.0  
[37] KernSmooth_2.23-15 stringi_1.1.6     

This R Markdown site was created with workflowr