How to get count data for allele-specific expression (ASE) analysis from aligned BAM files (RNA-seq data)?
6.4 years ago
Naresh D J ▴ 110

Hi All,

I want to perform gene-centered allele-specific expression (ASE) analysis. How can I get the read counts for the reference and alternate alleles from an aligned BAM file (RNA-seq data)? I have found several ASE tools that work once the count data are available.

Can you suggest some tools to get the count data, or a direction to proceed?

Thank you.

RNA-Seq SNP ASE vcf • 4.2k views

How did you map your reads? Did you construct personalized haplotype genomes? Or did you align to a single reference?


I aligned to a single reference genome (hg19).

6.3 years ago
erdiazval ▴ 110

The first step is to identify polymorphisms in your libraries relative to the reference genome. I would go for a pseudo-reference approach, which means imputing the homozygous SNPs of your libraries into the reference genome. You then align your libraries against both the reference and the pseudo-reference and retrieve the reciprocal calls that are heterozygous. I would use GATK's pipeline; if you follow it, you will end up with a VCF file that contains the number of reads mapped to the reference and alternative alleles. Then you normalize and perform a differential expression analysis (binomial test / LRT) to detect allelic imbalance at heterozygous SNPs. This was the pipeline I developed to analyze transcriptomic data for ASE analysis.
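As a rough sketch of the last two steps, assuming a single-sample VCF whose FORMAT column carries GT and AD (allelic depth: REF count, ALT count), as GATK HaplotypeCaller writes them, the per-site counts at heterozygous bi-allelic SNVs can be pulled out and tested against a 50:50 expectation with an exact two-sided binomial test. The function names are hypothetical; normalization and the LRT alternative are not shown.

```python
import math
from typing import Iterable, Iterator, Tuple


def het_allele_counts(vcf_lines: Iterable[str], sample_index: int = 0
                      ) -> Iterator[Tuple[str, int, str, str, int, int]]:
    """Yield (chrom, pos, ref, alt, ref_count, alt_count) for heterozygous
    bi-allelic SNVs, using the GT and AD fields of one sample column."""
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and empty lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _vid, ref, alt = fields[:5]
        if len(ref) != 1 or len(alt) != 1:
            continue  # skip indels and multi-allelic sites (e.g. ALT "G,T")
        sample = dict(zip(fields[8].split(":"),
                          fields[9 + sample_index].split(":")))
        gt = sample.get("GT", "").replace("|", "/")  # treat phased as unphased
        if gt not in ("0/1", "1/0"):
            continue  # keep heterozygous genotypes only
        ad = sample.get("AD", ".").split(",")
        if len(ad) != 2 or "." in ad:
            continue  # allelic depth missing or malformed
        yield chrom, int(pos), ref, alt, int(ad[0]), int(ad[1])


def binom_two_sided_p(k: int, n: int) -> float:
    """Exact two-sided binomial test of k REF reads out of n against p = 0.5.
    Binomial(n, 0.5) is symmetric, so the p-value is twice the smaller tail."""
    if n == 0:
        return 1.0
    tail_k = min(k, n - k)
    tail = sum(math.comb(n, i) for i in range(tail_k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # Toy, made-up records for illustration only.
    toy_vcf = [
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1",
        "chr1\t1000\t.\tA\tG\t50\tPASS\t.\tGT:AD:DP\t0/1:30,10:40",
        "chr1\t2000\t.\tC\tT\t50\tPASS\t.\tGT:AD:DP\t1/1:0,25:25",
    ]
    for chrom, pos, ref, alt, r, a in het_allele_counts(toy_vcf):
        print(chrom, pos, ref, alt, r, a, binom_two_sided_p(r, r + a))
```

With many SNPs tested, the resulting p-values would still need a multiple-testing correction.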

Best


Thank you, I followed the GATK pipeline.

6.3 years ago
trausch ★ 1.9k

You may want to try Allis. It requires an RNA-Seq BAM file and, optionally, variant calls from WGS or WES. If no variants are provided, they are called from the RNA-Seq BAM file. The variants are then phased against the 1000 Genomes reference panel (hg19) using Eagle2. Allis then parses the phased VCF file for heterozygous bi-allelic variants, which are used to split the input BAM into haplotype 1 and haplotype 2; ambiguous reads are discarded. In addition to h1.bam and h2.bam, it outputs an allele-specific count table with depth, REF support, ALT support, the variant allele frequency and the haplotype allele frequency. It also runs a simple binomial test for every SNP. It's still a work in progress, so any feedback is welcome.
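The haplotype-splitting idea described above can be sketched roughly as follows. This is not Allis's code: a read is modelled as a hypothetical position-to-base mapping (a real tool would derive the bases from the BAM alignment), and a read is treated as ambiguous when it covers no phased heterozygous SNP, matches neither phased allele, or supports both haplotypes.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Phased het SNPs: position -> (allele on haplotype 1, allele on haplotype 2)
Phased = Dict[int, Tuple[str, str]]
# Hypothetical simplified read: position -> base observed at that position
Read = Dict[int, str]


def assign_haplotype(read: Read, phased: Phased) -> str:
    """Return 'h1', 'h2' or 'ambiguous' for a single read."""
    votes = Counter()
    for pos, base in read.items():
        if pos not in phased:
            continue  # not a phased het site, carries no haplotype information
        h1_allele, h2_allele = phased[pos]
        if base == h1_allele:
            votes["h1"] += 1
        elif base == h2_allele:
            votes["h2"] += 1
    if votes["h1"] > 0 and votes["h2"] == 0:
        return "h1"
    if votes["h2"] > 0 and votes["h1"] == 0:
        return "h2"
    return "ambiguous"  # no informative site, or conflicting evidence


def split_reads(reads: Iterable[Read], phased: Phased) -> Dict[str, List[Read]]:
    """Partition reads into haplotype 1, haplotype 2 and ambiguous groups."""
    groups: Dict[str, List[Read]] = {"h1": [], "h2": [], "ambiguous": []}
    for read in reads:
        groups[assign_haplotype(read, phased)].append(read)
    return groups


if __name__ == "__main__":
    # Toy phased sites and reads, for illustration only.
    phased = {100: ("A", "G"), 150: ("C", "T")}
    reads = [{100: "A"}, {100: "G", 150: "T"}, {100: "A", 150: "T"}]
    for name, group in split_reads(reads, phased).items():
        print(name, len(group))
```

Discarding conflicting reads rather than using a majority vote keeps the two haplotype groups clean at the cost of some depth.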


Thank you, I followed the GATK pipeline. I will take a look at Allis as well.

6.3 years ago
Naresh D J ▴ 110

The GATK tool has the option of running without phasing information.
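If the GATK tool meant here is ASEReadCounter (an assumption; the post does not name it), its output is a tab-separated per-site count table. A minimal sketch for reading such a table, assuming the `contig`, `position`, `refCount` and `altCount` column names (check them against the header your GATK version writes); the function name and the `min_total` depth filter are hypothetical choices.

```python
import csv
import io
from typing import List, TextIO, Tuple


def read_ase_table(handle: TextIO, min_total: int = 10
                   ) -> List[Tuple[str, int, int, int, float]]:
    """Return (contig, position, ref_count, alt_count, ref_fraction) for each
    site whose REF + ALT depth reaches min_total."""
    rows = []
    for rec in csv.DictReader(handle, delimiter="\t"):
        ref_n = int(rec["refCount"])
        alt_n = int(rec["altCount"])
        total = ref_n + alt_n  # computed here rather than relying on totalCount
        if total < min_total:
            continue  # too few reads for a meaningful allelic ratio
        rows.append((rec["contig"], int(rec["position"]),
                     ref_n, alt_n, ref_n / total))
    return rows


if __name__ == "__main__":
    # Toy, made-up table for illustration only.
    toy = ("contig\tposition\tvariantID\trefAllele\taltAllele\t"
           "refCount\taltCount\ttotalCount\n"
           "chr1\t1000\t.\tA\tG\t30\t10\t40\n")
    print(read_ase_table(io.StringIO(toy)))
```

Without phasing, each SNP has to be treated on its own: summing REF counts across the SNPs of a gene is only meaningful once the sites are phased, because the REF alleles at different SNPs can sit on different haplotypes.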
