Last updated: 2018-04-17

Code version: 4bc82ab

Motivation

The goal is to identify and distinguish single-cell samples from human and chimp individuals in a single Drop-seq run.

Data

Pilot data: Yoruba cell line 18489 was included in the human-chimp mix. This is a female individual.

Human reference: snps.grch37.exons.vcf.gz. For how the human VCF was generated, see https://github.com/jdblischak/singleCellSeq/blob/master/code/verify-bam.py

Apply demuxlet to separate human and chimp

The main idea is to provide demuxlet with a subset of SNP positions that can be used to distinguish between human and chimp samples.

I’ll describe the approach in steps here:

  1. Map all samples to the human genome.

  2. Assuming that the human individual is genotyped, obtain this individual’s genotype from the 1000 Genomes Project.

  3. Construct a pseudo-genotype for chimp using the ancestral allele.
    • Assume the chimp genotype is homozygous for the ancestral allele.
    • Include only ancestral alleles that were computed with a high level of confidence.
    • Include only sites where the ancestral allele is either the reference or the alternate allele. Perhaps later on we could relax this restriction.
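
The rules in step 3 can be sketched in a few lines of Python. This is only an illustration, not the code used for the analysis. It assumes the ancestral allele is available as a single character per site and that an uppercase base marks a high-confidence call (the convention I believe the 1000 Genomes `AA` annotation follows); the function names and the `is_informative` filter are hypothetical.

```python
from typing import Optional

# Assumption: uppercase A/C/G/T marks a high-confidence ancestral call.
# Lowercase bases, ".", "-", and "N" are treated as low confidence or unknown.
HIGH_CONFIDENCE_BASES = {"A", "C", "G", "T"}


def chimp_pseudo_genotype(ref: str, alt: str, ancestral: str) -> Optional[str]:
    """Return a homozygous pseudo-genotype for chimp, or None to skip the site."""
    if ancestral not in HIGH_CONFIDENCE_BASES:
        return None  # low-confidence or missing ancestral allele
    if ancestral == ref:
        return "0/0"  # homozygous reference
    if ancestral == alt:
        return "1/1"  # homozygous alternate
    return None  # ancestral allele is neither reference nor alternate


def is_informative(human_gt: str, chimp_gt: Optional[str]) -> bool:
    """Hypothetical filter: keep sites where human and chimp genotypes differ."""
    if chimp_gt is None:
        return False
    human_alleles = sorted(human_gt.replace("|", "/").split("/"))
    chimp_alleles = sorted(chimp_gt.split("/"))
    return human_alleles != chimp_alleles


# Made-up example sites: (ref, alt, ancestral allele, human genotype)
sites = [("A", "G", "A", "1|1"), ("C", "T", "t", "0|1"), ("G", "A", "A", "0|0")]
for ref, alt, ancestral, human_gt in sites:
    chimp_gt = chimp_pseudo_genotype(ref, alt, ancestral)
    print(ref, alt, ancestral, human_gt, chimp_gt, is_informative(human_gt, chimp_gt))
```

Only sites that pass a filter like `is_informative` would help demuxlet tell the two species apart, since a site where both carry the same genotype provides no signal.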
