Last updated: 2018-01-24

Code version: 3e19cb6

Motivation

To identify and distinguish single-cell samples from human and chimp individuals in a single Drop-seq run.

Data

Pilot data: Yoruba cell line 18489 was included in the human-chimp mix. This is a female individual.

Human reference: snps.grch37.exons.vcf.gz. For how the human VCF was generated, see https://github.com/jdblischak/singleCellSeq/blob/master/code/verify-bam.py

Approach

I’ll describe the approach in steps here:

  1. Map all samples to the human genome.

  2. Assuming the human individual is genotyped, obtain this individual’s genotype from the 1000 Genomes Project.

  3. Select a set of SNP positions that are likely to distinguish chimp from human individuals.

Step 3 provides a subset of SNP positions that demuxlet then uses to estimate the likelihood of the observed SNP profile given the known sample genotypes. We considered several rules for selecting SNPs and produced demuxlet results under different combinations of these rules.
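To make the likelihood idea concrete, here is a minimal sketch of scoring one cell's reads against each candidate genotype profile. It is not demuxlet's implementation: it assumes a fixed per-read error rate, biallelic SNPs, and no doublets, and the function names, the `pseudo_chimp` label, and the toy read counts are hypothetical.

```python
import math


def read_alt_prob(alt_dosage, error_rate=0.01):
    """P(a read shows the ALT allele | genotype carries alt_dosage ALT copies)."""
    frac_alt = alt_dosage / 2.0
    return frac_alt * (1.0 - error_rate) + (1.0 - frac_alt) * error_rate


def cell_log_likelihood(reads, genotypes, error_rate=0.01):
    """Log-likelihood of one cell's reads under one candidate sample.

    reads:     dict snp_id -> (n_ref_reads, n_alt_reads) observed in the cell
    genotypes: dict snp_id -> ALT dosage (0, 1, or 2) for the candidate sample
    """
    total = 0.0
    for snp, (n_ref, n_alt) in reads.items():
        if snp not in genotypes:
            continue  # position is not in the selected SNP subset
        p_alt = read_alt_prob(genotypes[snp], error_rate)
        total += n_alt * math.log(p_alt) + n_ref * math.log(1.0 - p_alt)
    return total


def best_sample(reads, candidates, error_rate=0.01):
    """Return the best-scoring candidate name and all per-candidate scores."""
    scores = {
        name: cell_log_likelihood(reads, genotypes, error_rate)
        for name, genotypes in candidates.items()
    }
    return max(scores, key=scores.get), scores


# Toy example (made-up counts, for illustration only).
candidates = {
    "18489": {"snp1": 2, "snp2": 1},
    "pseudo_chimp": {"snp1": 0, "snp2": 0},
}
reads = {"snp1": (0, 3), "snp2": (1, 1)}
name, scores = best_sample(reads, candidates)
```

Positions where the candidates share the same genotype contribute equally to every score, so only positions where the profiles differ can separate the samples.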

R1: ancestral allele is identified as present at the selected SNP position

R2: there was not sufficient information to identify the ancestral allele at the selected SNP position

R3: ancestral allele is identified as absent at the selected position

R4: individual genotype is not identical to the population genotype
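One possible reading of these rules can be sketched as a small classifier. The interpretation is an assumption, not taken from the analysis code: "present" is read as the ancestral allele call matching REF or ALT, "absent" as a call matching neither, "no sufficient information" as a missing call, and "population genotype" as homozygous reference. The function names and the set of missing-value codes are hypothetical.

```python
# Assumed codes for a missing ancestral allele call.
MISSING_CODES = {"", ".", "N", "-"}


def classify_ancestral(ref, alt, ancestral):
    """Return 'R1', 'R2', or 'R3' for one biallelic SNP (assumed interpretation)."""
    if ancestral is None or ancestral.upper() in MISSING_CODES:
        return "R2"  # not enough information to call the ancestral allele
    if ancestral.upper() in {ref.upper(), alt.upper()}:
        return "R1"  # ancestral allele is one of the two SNP alleles
    return "R3"  # ancestral allele is called but matches neither allele


def differs_from_reference(genotype):
    """R4 (assumed reading): genotype is called and is not homozygous reference."""
    alleles = genotype.replace("|", "/").split("/")
    if "." in alleles:
        return False  # missing genotype: cannot tell
    return any(allele != "0" for allele in alleles)
```

For example, `classify_ancestral("A", "G", "a")` gives `"R1"` under this reading, and `differs_from_reference("0|1")` is `True`.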

Scenarios in Step 3

Scenario 1:
R1. Include SNP positions identified to have ancestral allele
R2. Keep SNP positions at which the 18489 genotype is not the same as the major/reference genotype
R3. Let the pseudo chimp be the major/reference genotype

Comments: under this scenario, many of the 18489 genotypes can also match the major/reference genotype, unless 18489 carries a minor allele at the position.
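The second and third items of Scenario 1 can be sketched as a filter over VCF text lines. This is an illustration under assumptions: the pseudo chimp is encoded as homozygous reference (`0/0`), GT is the first FORMAT field, and the sample column name `18489` and the label `pseudo_chimp` are hypothetical. The ancestral-allele filter from the first item is not applied here.

```python
def add_pseudo_chimp(vcf_lines, human_sample="18489", chimp_name="pseudo_chimp"):
    """Keep sites where the human sample is not 0/0 and append a 0/0 pseudo chimp."""
    out = []
    sample_idx = None
    for line in vcf_lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            out.append(line)  # pass meta-information lines through unchanged
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            sample_idx = fields.index(human_sample)
            out.append("\t".join(fields[:9] + [human_sample, chimp_name]))
            continue
        if sample_idx is None:
            raise ValueError("record found before the #CHROM header line")
        # Assumes GT is the first colon-separated FORMAT field.
        gt = fields[sample_idx].split(":")[0]
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles or all(allele == "0" for allele in alleles):
            continue  # missing, or identical to the reference genotype
        out.append("\t".join(fields[:8] + ["GT", gt, "0/0"]))
    return out
```

After this filter, every retained site has a human genotype that differs from the pseudo chimp genotype.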

Scenario 2:
R4. Include SNP positions identified to have or to not have ancestral allele
R2. Keep SNP positions at which the 18489 genotype is not the same as the major/reference genotype
R3. Let the pseudo chimp be the major/reference genotype

Scenario 3:
R1. Include SNP positions identified to have ancestral allele
R3. Let the pseudo chimp be the major/reference genotype

Scenario 4:
R4. Include SNP positions identified to have or to not have ancestral allele
R3. Let the pseudo chimp be the major/reference genotype

Scenario 5:
R5. Include SNP positions not identified to have ancestral allele
R3. Let the pseudo chimp be the major/reference genotype

Other scenarios:

  • Test human control bam file

  • Multiple genotyped individuals
  1. Include genotypes from 6 human individuals (data 18498 and in addition 18499)
  2. Can demuxlet correctly distinguish these two?
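Before running the multi-individual test, a quick check is to count how many selected positions actually differ between two genotyped individuals, since positions with identical genotypes cannot separate them. A minimal sketch, assuming genotypes are already parsed into ALT dosages per SNP (the function name and input layout are hypothetical):

```python
def informative_sites(geno_a, geno_b):
    """Return shared SNP ids whose ALT dosage differs, plus their fraction.

    geno_a, geno_b: dict snp_id -> ALT dosage (0, 1, or 2) for each individual
    """
    shared = set(geno_a) & set(geno_b)
    differing = sorted(snp for snp in shared if geno_a[snp] != geno_b[snp])
    fraction = len(differing) / len(shared) if shared else 0.0
    return differing, fraction
```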

Results: demuxlet assigns chimps to human and returns many doublets…


Session information

sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Scientific Linux 7.2 (Nitrogen)

Matrix products: default
BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] compiler_3.4.1  backports_1.1.2 magrittr_1.5    rprojroot_1.3-1
 [5] tools_3.4.1     htmltools_0.3.6 yaml_2.1.16     Rcpp_0.12.14   
 [9] stringi_1.1.6   rmarkdown_1.8   knitr_1.17      git2r_0.20.0   
[13] stringr_1.2.0   digest_0.6.13   evaluate_0.10.1

This R Markdown site was created with workflowr