how to extract counts from bam/merge-bam without gtf
2
0
Entering edit mode
5.4 years ago
Shahzad ▴ 30

I have aligned my RNA-seq PE on Transcriptome with no gtf file available using Bowtie2.

Used Samtools to make bam files and sorted.

Used sambamba merge to merge the sorted bam replicates and resorted using samtools.

bowtie2 -x ../refrence-transcriptome -1 rep1_1.fq -2 rep1_2.fq -S sample1.sam --no-unal 
samtools view -Sb 1.sam > 1.bam
samtools sort 1.bam -o 1_sorted.bam
samtools index 1rc_sorted.bam
sambamba merge mergesamples.bam 1.bam 2.bam 3.bam

I want to know what tools i can use to get counts with no gtf/gff/gff3 available and only have ref.fa?

RNA-Seq R bowtie2 samtools Hisat2 • 3.4k views
ADD COMMENT
1
Entering edit mode
5.4 years ago
ATpoint 81k

A couple of things:

  1. You should align your RNA-seq data to the genome rather than the transcriptome, but this only makes sense of you have a GTF and splice sites. Given you had these, also use a splice-aware aligner such as hisat2 or star.
  2. Merging replicates (given these are biological replicates) is not recommended as you lose any information about biological variability between the samples.

EDIT: See also michael.ante's answer below on why alignment is an inferior choice here.

I you situation, given the absence of a GTF, I would use salmon to quantify your reads directly against your transcriptome, which you seem to have in fasta format. Salmon is a alignment-free (pseudoaligner) tool with extensive documentation. Spend some quality time reading it and come back in case of questions. Do the quantification for each fastq files separately, given these are biological replicates.

ADD COMMENT
0
Entering edit mode

thank you for guiding. reference genome is not available yet. I ll try salmon and let you know about the output. Thank you.

ADD REPLY
1
Entering edit mode
5.4 years ago
michael.ante ★ 3.8k

Hi Shahzad,

If you have only the transcriptome annotation, I'd go for Salmon .

In case your species has multiple transcript per gene, you might run into troubles with the multimapping reads. Per default bowtie2 reports one alignment per read(-pair). Even if you have several with the same score. Thus, if several transcripts of a gene share an exon, all reads aligning on this exon will be assigned "randomly" to just one transcript.

If your species has only one transcript per gene, you can try to use samtools idxstats on your bam files to get the alignments per transcript.

Cheers,

Michael

ADD COMMENT
0
Entering edit mode

how to proceed further with idxstat output. I have done it but I thing I have done any mistake DESeq did not accept it. Can you please guide about how to format the output tables for DESeq or EdgeR?

ADD REPLY
1
Entering edit mode

With samtools idxstat, you get a table of the contig name, the contig's length, the number of alignments, and the number of unmapped reads. Having such a table in R. you only need to select the first column as row names and the third column as the counts. You also might discard the last row.

ADD REPLY

Login before adding your answer.

Traffic: 2596 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6