Why is the TopHat2 unmapped.bam file larger than the accepted_hits.bam file?
11.5 years ago
Varun Gupta ★ 1.3k

Hi Everyone

I am aligning RNA-seq data with TopHat2, and I am getting an unmapped.bam file in addition to the usual accepted_hits.bam file.

The unmapped.bam file is much larger than the alignment file. Is this something I should worry about?

Regards V


Hi Gupta,

Did you find an answer to this problem? I am now in the same situation: my accepted_hits.bam and unmapped.bam files are about 1 MB and 1.2 GB, respectively. If you have any ideas on how to solve it, please let me know.

Thank you!


Not really, Chen. I moved to the STAR aligner.

11.5 years ago

It is larger because you have more unmapped reads than mapped reads. Whether that is something to worry about depends on many factors: the origin of the samples, the genome you align against, and so on.
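To confirm this directly, you can count primary records by the SAM "unmapped" flag (0x4) rather than comparing file sizes. A minimal sketch, assuming the BAM files have first been converted to SAM text (for example with `samtools view`); the function name `count_mapped_unmapped` and the example records are hypothetical.

```python
def count_mapped_unmapped(sam_lines):
    """Count primary mapped and unmapped records in SAM text lines."""
    mapped = unmapped = 0
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header and blank lines
        flag = int(line.split("\t")[1])
        if flag & 0x900:
            continue  # skip secondary (0x100) and supplementary (0x800) records
        if flag & 0x4:
            unmapped += 1  # 0x4 set: read is unmapped
        else:
            mapped += 1
    return mapped, unmapped


# Hypothetical records: one mapped read (plus a secondary hit), two unmapped reads
example = [
    "@HD\tVN:1.0",
    "r1\t0\tchr1\t100\t50\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "r1\t256\tchr2\t500\t0\t10M\t*\t0\t0\t*\t*",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tNNGTACGTAC\tIIIIIIIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGTTTGTAC\t##########",
]
mapped, unmapped = count_mapped_unmapped(example)
rate = mapped / (mapped + unmapped)
print(mapped, unmapped, f"{rate:.1%}")  # prints "1 2 33.3%"
```

Counting only primary records avoids inflating the mapped count with multiple alignments of the same read.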


I am only using the default parameters...

11.5 years ago

Did you try looking at some of the unmapped reads to see why they were unmapped?


I did.

Some of them contain N's, so that is certainly one reason.

But how should I deal with the ones that don't contain N's?

Regards
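A quick way to separate the two groups is to count N bases per read and set aside the N-free reads for closer inspection. A minimal sketch; the function name `split_by_n` and the 10% N cutoff are hypothetical choices, not TopHat settings.

```python
def split_by_n(seqs, max_n_frac=0.1):
    """Return (reads with any N, reads above the N-fraction cutoff, N-free reads)."""
    with_n, high_n, without_n = 0, 0, []
    for s in seqs:
        n = s.upper().count("N")
        if n == 0:
            without_n.append(s)  # keep N-free reads for further inspection
            continue
        with_n += 1
        if n / len(s) > max_n_frac:
            high_n += 1  # hypothetical cutoff: more than 10% N bases
    return with_n, high_n, without_n


reads = ["NNGTACGTAC", "ACGTTTGTAC", "ACGTNCGTACGTACGTACGT", "ACGTACGTAC"]
print(split_by_n(reads))  # prints (2, 1, ['ACGTTTGTAC', 'ACGTACGTAC'])
```

The N-free reads returned in the third element are the ones worth checking for low quality or contamination.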


Well, did you examine the ones without N's? Are they poor quality? Are they genomic contamination? Contamination from another species?

There might be a very big problem. Maybe the sample prep people are screwing up. Maybe a kit was defective. Maybe your reference file is screwed up. Maybe you were totally misinformed as to what the project was about. You won't know unless you actually look at your raw data.
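For the "poor quality?" question, a simple first pass is the mean Phred score per read. A minimal sketch, assuming qualities are Phred+33 encoded (as in SAM text output); the function names and the cutoff of 20 are hypothetical.

```python
def mean_phred(qual, offset=33):
    """Mean Phred score of a quality string (Phred+33 by default)."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def flag_low_quality(records, min_mean_q=20):
    """records: iterable of (name, seq, qual); return names below the cutoff."""
    return [name for name, seq, qual in records if mean_phred(qual) < min_mean_q]


unmapped_reads = [
    ("r2", "NNGTACGTAC", "IIIIIIIIII"),  # 'I' = Q40
    ("r3", "ACGTTTGTAC", "##########"),  # '#' = Q2
]
print(flag_low_quality(unmapped_reads))  # prints ['r3']
```

High-quality reads that still fail to map are the better candidates for a contamination check against other species.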

11.5 years ago
Ali ▴ 140

Perhaps you need to modify the arguments passed to TopHat that set the alignment constraints (such as the number of mismatches allowed, the maximum intron size, and so on) to increase the number of accepted hits.
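As an illustration, the command below relaxes the mismatch limit. A minimal sketch that only builds the argument list; the TopHat2 options `--read-mismatches`, `--read-edit-dist`, `--min-intron-length` and `--max-intron-length` exist, but the default values used here are assumed to match TopHat2's documented defaults and should be checked with `tophat2 --help` for your version. The index and file names are hypothetical.

```python
import shlex


def tophat2_command(index, reads, out_dir, mismatches=2, edit_dist=None,
                    max_intron=500000, min_intron=70, threads=4):
    """Build a tophat2 argument list with adjustable alignment constraints."""
    if edit_dist is None:
        edit_dist = max(2, mismatches)
    # TopHat2 requires the read edit distance to be >= the mismatch limit
    if edit_dist < mismatches:
        raise ValueError("edit_dist must be >= mismatches")
    return ["tophat2", "-o", out_dir, "-p", str(threads),
            "--read-mismatches", str(mismatches),
            "--read-edit-dist", str(edit_dist),
            "--min-intron-length", str(min_intron),
            "--max-intron-length", str(max_intron),
            index, reads]


cmd = tophat2_command("genome_index", "sample.fastq", "tophat_out", mismatches=3)
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)`; building it separately makes it easy to compare mapping rates across a few parameter settings.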

The other possibility is to provide a better reference genome, if the current one is not good, or to check whether your RNA-seq experiment was done properly.

11.5 years ago

Did you check your junctions.bed file? I would expect it to be empty in your case, because I think TopHat is not able to find the splice junctions.
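Checking this comes down to counting the data lines in junctions.bed. A minimal sketch, assuming the file has a `track` header line followed by BED12 records whose score column (column 5) holds the number of supporting reads; the function name and example records are hypothetical.

```python
def summarize_junctions(bed_lines):
    """Return (number of junctions, total supporting reads) from BED lines."""
    n_junctions = 0
    supporting_reads = 0
    for line in bed_lines:
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue  # skip blank, header and comment lines
        fields = line.rstrip("\n").split("\t")
        n_junctions += 1
        supporting_reads += int(fields[4])  # column 5: score (assumed read count)
    return n_junctions, supporting_reads


bed = [
    'track name=junctions description="TopHat junctions"',
    "chr1\t1000\t2300\tJUNC00000001\t12\t+\t1000\t2300\t255,0,0\t2\t50,50\t0,1250",
    "chr1\t5000\t7100\tJUNC00000002\t3\t-\t5000\t7100\t255,0,0\t2\t50,50\t0,2050",
]
print(summarize_junctions(bed))  # prints (2, 15)
```

A result of `(0, 0)` corresponds to the "empty junctions.bed" case described above.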

I was getting the same problem, and I think I changed the -r (--mate-inner-dist) parameter in TopHat to fix it.
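For paired-end data, the inner distance is the fragment length minus both read lengths, so -r can be estimated from template lengths (SAM column 9, TLEN) of a pilot alignment. A minimal sketch, assuming a fixed read length; the function name and TLEN values are hypothetical.

```python
import statistics


def estimate_inner_dist(template_lengths, read_len):
    """Mean and SD of inner distance = |TLEN| - 2 * read length."""
    inner = [abs(t) - 2 * read_len for t in template_lengths if t != 0]
    return statistics.mean(inner), statistics.stdev(inner)


# Hypothetical TLENs for 2 x 100 bp reads (both mates of each pair listed)
tlens = [300, -300, 320, -280]
mean_inner, sd_inner = estimate_inner_dist(tlens, read_len=100)
print(mean_inner, round(sd_inner, 2))  # prints 100 16.33
```

The mean would go to `-r` and the standard deviation to `--mate-std-dev`; this only applies to paired-end libraries.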


Hi

This was single-end data, so -r was not required. My junctions.bed file was not empty either.

Regards

varun
