Comparing millions of trimmed reads to a large database
3.0 years ago
geneticatt ▴ 140

Hi all,

I have a set of reads that I've trimmed down to 21 nt based on the sequencing experiment. I'd like to compare these 21 nt sequences against a database of 300,000 21 nt sequences to annotate each read. I attempted to use bowtie2 by building an index for the database and then mapping the reads, but the mapping rate was lower than expected, which suggests that bowtie2's read-mapping approach isn't well suited to this type of comparison.
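For context, my workflow was roughly along these lines (file names are placeholders, not the exact commands I ran):

    # Rough reconstruction of the bowtie2 attempt described above
    bowtie2-build database_21nt.fasta db_index                 # index the 21 nt database
    bowtie2 -x db_index -U trimmed_reads.fastq -S mapped.sam   # map the trimmed reads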

Next I tried blastn, but it's apparently too slow for a comparison at this scale.

Can someone please recommend a tool or approach for making so many exact comparisons?

Thanks

bowtie2 blastn
  1. Try bowtie v1.x: you need it for ungapped alignments with short reads like these (see the sketch after this list).
  2. You may be able to use blat as well.
  3. Use seqkit grep for exact sequence matching.
  4. Use bbmap.sh with the ambig=all vslow perfectmode maxsites=1000 options.
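A minimal sketch of options 1 and 4 (file and index names are placeholders):

    # Option 1: bowtie v1.x, exact (0-mismatch) ungapped alignments
    bowtie-build database_21nt.fasta db_index
    bowtie -v 0 -S db_index trimmed_reads.fastq hits.sam

    # Option 4: bbmap.sh in exact-match mode, keeping all ambiguous placements
    bbmap.sh ref=database_21nt.fasta in=trimmed_reads.fastq out=hits.sam \
        ambig=all vslow perfectmode maxsites=1000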
3.0 years ago
h.mon 35k

With sequences this short (is this microRNA?), I suspect clustering will be more efficient than mapping. First, deduplicate both the query and subject files with, e.g., VSEARCH, CD-HIT, or Dedupe.sh. The same tools can then be used to find the sequences common to both datasets (see the sketch below).
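One possible realization with VSEARCH (file names are placeholders; CD-HIT or Dedupe.sh could be used analogously):

    # Deduplicate query and subject separately
    vsearch --derep_fulllength trimmed_reads.fasta --output reads_derep.fasta
    vsearch --derep_fulllength database_21nt.fasta --output db_derep.fasta

    # Report query sequences that exactly match a database sequence
    vsearch --search_exact reads_derep.fasta --db db_derep.fasta \
        --matched matched_reads.fasta --blast6out hits.tsv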
