Shortcuts for retrieving information on specific genomic loci from big SRA files
8.9 years ago
Anima Mundi ★ 2.9k

Hello,

Recently I have been trying to get some information about the gene structure of a few mouse genes. All I have is a raw SRA file derived from RNA-Seq. I converted it to FASTQ with fastq-dump from the SRA Toolkit, then ran tophat (TopHat2/Bowtie2) to align the reads to the genomic reference. Unfortunately, tophat turned out to be extremely slow on the machine I am using. Since I only need to check a few loci, I would like to know whether there are alternative solutions. For example, could I use single FASTA files (retrieved from Ensembl for each locus) to generate small, locus-specific indexes?

tophat RNA-Seq bowtie sra • 2.2k views
7.6 years ago

No. That would probably be a bad idea: a read could be mapped with mismatches to your small index even though it would map perfectly somewhere else in the genome, so you would get spurious alignments at your loci.
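
To make the problem concrete, here is a toy sketch (made-up sequences, a naive ungapped matcher rather than a real aligner such as Bowtie2). The read comes from a hypothetical paralog that differs from the locus of interest by one base. With a locus-only reference the read is still accepted with one mismatch; with both sequences available, the perfect match wins.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def best_hit(read, reference, max_mismatches=2):
    """Best ungapped placement of read on reference as (position, mismatches),
    or None if no placement has at most max_mismatches mismatches."""
    best = None
    for pos in range(len(reference) - len(read) + 1):
        mm = hamming(read, reference[pos:pos + len(read)])
        if mm <= max_mismatches and (best is None or mm < best[1]):
            best = (pos, mm)
    return best


def best_across(read, references, max_mismatches=2):
    """Best placement over several named references as (name, position, mismatches)."""
    hits = []
    for name, seq in references.items():
        hit = best_hit(read, seq, max_mismatches)
        if hit is not None:
            hits.append((name, hit[0], hit[1]))
    return min(hits, key=lambda h: h[2]) if hits else None


# Hypothetical sequences, invented for illustration only.
locus = "ACGTTGCAAGGCTTACGATC"    # the locus you care about
paralog = "ACGTTGCATGGCTTACGATC"  # similar sequence elsewhere (one base differs)
read = paralog[4:16]              # a read that truly comes from the paralog

print(best_hit(read, locus))                                      # (4, 1)
print(best_across(read, {"locus": locus, "paralog": paralog}))    # ('paralog', 4, 0)
```

The locus-only search reports a plausible-looking hit with one mismatch, which is exactly the kind of false assignment a small index invites.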

Much of the recent SRA data is reference-compressed, so if you are lucky, you might be able to retrieve just the reads from your region of interest. Otherwise, keep in mind that TopHat2 is one of the slower RNA-seq mappers. Try something faster, such as HISAT2 or STAR.
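
A rough sketch of the faster route, assuming HISAT2 and samtools are installed and a full-genome HISAT2 index is available: align everything once, then slice out the locus. The index prefix, file names, and region below are placeholders, not values from this thread, and the region name has to match the sequence names in your index. The script only assembles and prints the commands; it does not run them.

```python
import shlex


def build_locus_pipeline(index_prefix, fastq, region, threads=4, prefix="aln"):
    """Return the align -> sort -> index -> slice commands as argv lists."""
    sam = f"{prefix}.sam"
    bam = f"{prefix}.sorted.bam"
    locus_bam = f"{prefix}.locus.bam"
    return [
        # Spliced alignment of unpaired reads against the whole-genome index.
        ["hisat2", "-p", str(threads), "-x", index_prefix, "-U", fastq, "-S", sam],
        # Coordinate-sort and write BAM.
        ["samtools", "sort", "-@", str(threads), "-o", bam, sam],
        # Index the sorted BAM so regions can be queried.
        ["samtools", "index", bam],
        # Keep only alignments overlapping the region of interest.
        ["samtools", "view", "-b", "-o", locus_bam, bam, region],
    ]


if __name__ == "__main__":
    # All arguments here are hypothetical placeholders.
    for cmd in build_locus_pipeline("grcm38/genome", "reads.fastq", "11:100000-200000"):
        print(shlex.join(cmd))
```

Aligning against the whole genome first and subsetting afterwards keeps reads from similar sequences elsewhere out of your locus.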


Thanks. That time I did not manage to extract the information I wanted with the computing power I had. Next time I will try my luck with faster algorithms like the ones you suggest (or with better machines!).
