Last updated: 2018-01-24
Code version: 3e19cb6
For identifying and distinguishing single-cell samples from human and chimp individuals in a single Drop-seq run.
Pilot data: Yoruba cell line 18489 was included in the human-chimp mix. This is a female individual.
Human reference: snps.grch37.exons.vcf.gz. For how the human VCF was generated, see https://github.com/jdblischak/singleCellSeq/blob/master/code/verify-bam.py
I’ll describe the approach in steps here:
1. Map all samples to the human genome.
2. Assuming the human individual is genotyped, obtain this individual’s genotype from the 1000 Genomes Project.
3. Select a set of SNP positions that are likely to distinguish chimp from human individuals.
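The genotype lookup in the second step can be sketched as follows. This is a minimal illustration, not the code used for this analysis: it assumes an uncompressed VCF held as text, with one sample column named NA18489 (an assumed column name for cell line 18489). The names `read_sample_genotypes` and `TOY_VCF` are made up for the example, and the records are toy data.

```python
import io


def read_sample_genotypes(vcf_text, sample):
    """Return {(chrom, pos): alt_allele_count} for one sample in a VCF.

    Only biallelic sites with a called diploid genotype are kept.
    """
    genotypes = {}
    sample_idx = None
    for line in io.StringIO(vcf_text):
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            sample_idx = fields.index(sample)  # locate the sample column
            continue
        if sample_idx is None:
            raise ValueError("#CHROM header line not found before records")
        chrom, pos, alt = fields[0], int(fields[1]), fields[4]
        if "," in alt:
            continue  # skip multiallelic sites
        gt = fields[sample_idx].split(":")[0]  # GT is the first FORMAT key
        alleles = gt.replace("|", "/").split("/")
        if len(alleles) != 2 or "." in alleles:
            continue  # skip missing or non-diploid calls
        genotypes[(chrom, pos)] = sum(int(a) for a in alleles)
    return genotypes


# Toy VCF (hypothetical records, for illustration only).
HEADER = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO",
          "FORMAT", "NA18489"]
TOY_VCF = "\n".join([
    "##fileformat=VCFv4.1",
    "\t".join(HEADER),
    "\t".join(["1", "1000", "rs1", "A", "G", ".", "PASS", ".", "GT", "0|1"]),
    "\t".join(["1", "2000", "rs2", "C", "T", ".", "PASS", ".", "GT", "1|1"]),
    "\t".join(["2", "500", "rs3", "G", "A", ".", "PASS", ".", "GT", "0|0"]),
    "\t".join(["2", "700", "rs4", "G", "C,T", ".", "PASS", ".", "GT", "1|2"]),
    "\t".join(["2", "900", "rs5", "T", "C", ".", "PASS", ".", "GT", "./."]),
])

genotypes_18489 = read_sample_genotypes(TOY_VCF, "NA18489")
```

Genotypes are stored as the number of alternate alleles (0, 1 or 2), which makes the later comparison against a major/reference genotype a simple equality check.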
Step 3 provides a subset of SNP positions that are then used in demuxlet to estimate the likelihood of the observed SNP profile given the known sample genotypes. We considered several rules for selecting SNPs and produced demuxlet results under different combinations of these rules.
R1: ancestral allele is identified as present at the selected SNP position
R2: there was not sufficient information to identify the ancestral allele at the selected SNP position
R3: ancestral allele is identified as absent at the selected SNP position
R4: individual genotype is not identical to the population genotype
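Whichever rules are applied, the selected SNPs feed a likelihood of the reads observed in a cell given each candidate genotype. The sketch below is a deliberately simplified version of that idea, not demuxlet's implementation: it assumes a single fixed sequencing error rate and ignores base qualities and doublets. The function names, the error rate of 0.01 and the toy genotypes are all assumptions made for illustration.

```python
import math


def p_alt_read(alt_copies, error_rate=0.01):
    """Probability that a read shows the alternate allele.

    alt_copies is the genotype coded as 0, 1 or 2 alternate alleles.
    """
    frac = alt_copies / 2.0
    return frac * (1.0 - error_rate) + (1.0 - frac) * error_rate


def log_likelihood(reads, genotypes, error_rate=0.01):
    """Sum of log P(read allele | genotype) over reads at known SNPs.

    reads: list of (snp_id, allele) with allele equal to "ref" or "alt".
    genotypes: {snp_id: alt_copies} for one candidate sample.
    """
    total = 0.0
    for snp_id, allele in reads:
        if snp_id not in genotypes:
            continue  # reads outside the selected SNP set are ignored
        p_alt = p_alt_read(genotypes[snp_id], error_rate)
        total += math.log(p_alt if allele == "alt" else 1.0 - p_alt)
    return total


# Toy example (hypothetical genotypes and reads).
human = {"s1": 2, "s2": 1, "s3": 0}
pseudo_chimp = {"s1": 0, "s2": 0, "s3": 0}
cell_reads = [("s1", "alt"), ("s2", "alt"), ("s3", "ref")]

ll_human = log_likelihood(cell_reads, human)
ll_chimp = log_likelihood(cell_reads, pseudo_chimp)
best = "human" if ll_human > ll_chimp else "pseudo_chimp"
```

At a position where the two candidate genotypes are identical (s3 here), a read adds the same term to both log-likelihoods, so only positions where the genotypes differ help separate the samples.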
Scenario 1:
R1. Include SNP positions identified to have the ancestral allele
R2. Keep SNP positions at which the 18489 genotype is not the same as the major/reference genotype
R3. Let the pseudo chimp be the major/reference genotype
Comments: under this scenario, many of the 18489 genotypes can also match the major/reference genotype, unless 18489 carries a minor allele
Scenario 2:
R4. Include SNP positions identified to have or to not have the ancestral allele
R2. Keep SNP positions at which the 18489 genotype is not the same as the major/reference genotype
R3. Let the pseudo chimp be the major/reference genotype
Scenario 3:
R1. Include SNP positions identified to have the ancestral allele
R3. Let the pseudo chimp be the major/reference genotype
Scenario 4:
R4. Include SNP positions identified to have or to not have the ancestral allele
R3. Let the pseudo chimp be the major/reference genotype
Scenario 5:
R5. Include SNP positions not identified to have the ancestral allele
R3. Let the pseudo chimp be the major/reference genotype
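The five scenarios can be expressed as one filter over annotated SNP records. This is a sketch under stated assumptions, not the code used to produce the results:

- Each record carries a hypothetical `ancestral` field with the values "present", "absent" or "unknown".
- Genotypes are coded as alternate-allele counts.
- Scenario 5 ("not identified to have") is read here as "absent" or "unknown", which is only one possible interpretation.
- All names and records are invented for the example.

```python
# Scenario definitions: which ancestral-allele statuses are included, and
# whether the 18489 genotype must differ from the major/reference genotype.
SCENARIOS = {
    1: {"ancestral": {"present"}, "require_differs": True},
    2: {"ancestral": {"present", "absent"}, "require_differs": True},
    3: {"ancestral": {"present"}, "require_differs": False},
    4: {"ancestral": {"present", "absent"}, "require_differs": False},
    # Assumption: "not identified to have" means absent or unknown.
    5: {"ancestral": {"absent", "unknown"}, "require_differs": False},
}


def select_snps(snps, scenario):
    """Return the SNPs kept under a scenario, with a pseudo-chimp genotype.

    The pseudo chimp is assigned the major/reference genotype at every
    kept position, as in the last step of each scenario.
    """
    rule = SCENARIOS[scenario]
    kept = []
    for snp in snps:
        if snp["ancestral"] not in rule["ancestral"]:
            continue
        if rule["require_differs"] and snp["individual_gt"] == snp["major_gt"]:
            continue
        kept.append({
            "id": snp["id"],
            "human_gt": snp["individual_gt"],
            "pseudo_chimp_gt": snp["major_gt"],
        })
    return kept


# Toy records (hypothetical).
TOY_SNPS = [
    {"id": "a", "ancestral": "present", "individual_gt": 1, "major_gt": 0},
    {"id": "b", "ancestral": "present", "individual_gt": 0, "major_gt": 0},
    {"id": "c", "ancestral": "absent", "individual_gt": 2, "major_gt": 0},
    {"id": "d", "ancestral": "unknown", "individual_gt": 0, "major_gt": 0},
]

kept_ids = {s: [snp["id"] for snp in select_snps(TOY_SNPS, s)] for s in SCENARIOS}
```

In scenarios 3 to 5 the genotype filter is dropped, so positions such as b and d, where the human and pseudo-chimp genotypes are identical, remain in the set even though they cannot distinguish the two samples.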
Other scenarios:
Test human control bam file
Results: demuxlet assigns chimps to human and returns many doublets…
sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Scientific Linux 7.2 (Nitrogen)
Matrix products: default
BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_3.4.1 backports_1.1.2 magrittr_1.5 rprojroot_1.3-1
[5] tools_3.4.1 htmltools_0.3.6 yaml_2.1.16 Rcpp_0.12.14
[9] stringi_1.1.6 rmarkdown_1.8 knitr_1.17 git2r_0.20.0
[13] stringr_1.2.0 digest_0.6.13 evaluate_0.10.1
This R Markdown site was created with workflowr