We have used DaPars and get results out of it. However we never get very many. I don't know if this is because we only have mild effects, or it DaPars is not very sensitive.
Divide your gene into exons, or parts of exons that can be used to build the any transcript in the gene. i.e. if you have the following three transcripts in a gene:
you would get the following "chunks"
1 2 3 4 5 6 7
we use the
gtf2gtf tool from
CGAT to do this, being careful to remove retained intron chunks. Given that so many APAs are just after the end of the CDS, you might also want to seperate out the CDS and UTR into seperate chunks.
If you have unstranded RNAseq, you'll also want to remove chunks that overlap on alternate strands because differential expression of an overlapping gene is likely to look like APA. We use the following script to do that:
bedtools subtract -a infile.gtf -b infile.gtf -S > tmp1;
bedtools merge -i <( sort -k1,1 -k4,4n tmp1) -c 6 -o count -d -2
| awk '$4>1'
| bedtools subtract -a tmp1 -b stdin
| gzip > outfile.gtf.gz;
featureCounts to count reads mapping to each chunk, being careful to include reads that map to multiple chunks from the same gene.
Do differential exon usage analysis using DEXSeq.
Identify which of your chunks correspond to the ends of a transcript. In the above example , this would be 3, 6 and 7. Look which of your differential exons by DEXSeq are amoungst this list of "last exon chunks".
Its a bit fiddly, and doesn't sound like it should work, but we've found we find more this way than we do with DaPars.