How to locate a gene sequence among FASTQ files containing short reads
7.9 years ago
jerrybug109 ▴ 10

Hello!

I've got a dozen different strains of bacteria whose whole genomes we've sequenced (we have paired-end reads, forward and reverse, for each strain). I want to locate a specific housekeeping gene in each strain.

Could I convert the FASTQ files into FASTA files, set up a BLAST database from the short-read FASTA files, and then BLAST the query gene sequence against it? Or would I need to assemble each genome first, make a database out of the assemblies, and then BLAST the query gene sequence against that?

Would appreciate your input, thanks :-)

ncbi blast genome • 3.1k views

Don't do any of that... yet. Make a "genome" containing the gene(s) you need (use the known sequence, or choose examples from related strains) and then align the reads to it with BBMap. Depending on how similar the "different" strains in your pool are, there is some risk that reads may multi-map. It sounds like you are just looking to see whether a specific gene is there, so go ahead and use the option ambig=all with BBMap to allow reads to map at all possible locations.

You could also try using BBSplit to bin the reads if you have the reference genomes for these strains.
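
For anyone wanting a starting point, here is a rough sketch of the BBMap call described above, written as a small Python helper that only builds the command line (it does not run BBMap). The file names are hypothetical placeholders, and `nodisk` is my own addition to keep the index for the tiny reference in memory; adjust to your setup.

```python
import shlex


def bbmap_command(gene_fasta, reads_1, reads_2, out_sam):
    """Build a BBMap command that maps paired reads to a small gene 'genome'.

    All paths are placeholders supplied by the caller (assumption).
    """
    return [
        "bbmap.sh",
        f"ref={gene_fasta}",   # FASTA holding the gene(s) of interest
        f"in={reads_1}",       # forward reads
        f"in2={reads_2}",      # reverse reads
        f"out={out_sam}",      # alignments in SAM format
        "ambig=all",           # report multi-mapping reads at every location
        "nodisk",              # assumption: build the index in memory only
    ]


# Hypothetical file names for one strain.
cmd = bbmap_command("gene.fa", "strainA_R1.fastq.gz",
                    "strainA_R2.fastq.gz", "strainA_gene.sam")
print(shlex.join(cmd))
```

The printed line can be pasted into a shell, or the list passed to `subprocess.run(cmd, check=True)` once BBMap is on your PATH. Looping over the dozen strains is then just a matter of calling the helper once per read pair.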

7.9 years ago
piet ★ 1.8k

Using BLAST that way is very inefficient. If you are impatient for quick results and you already have a sequence of the housekeeping gene from the same species, then you can take that sequence as the reference and map all your reads to it with 'bwa mem'. If you can afford to wait about 5 minutes longer, you should assemble your reads with SPAdes.
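
As a rough sketch of the two routes just mentioned (mapping the reads directly, or assembling first), the helpers below only build the command lines; nothing is executed. File and directory names are hypothetical, and the SPAdes call uses only the basic paired-end options, so treat it as a starting point rather than a tuned run.

```python
import shlex


def read_mapping_commands(gene_fasta, reads_1, reads_2, out_sam):
    """Commands to index the gene reference and map paired reads with bwa mem."""
    index_cmd = ["bwa", "index", gene_fasta]
    # -o writes the SAM output to a file instead of stdout
    map_cmd = ["bwa", "mem", "-o", out_sam, gene_fasta, reads_1, reads_2]
    return [index_cmd, map_cmd]


def spades_command(reads_1, reads_2, out_dir):
    """Basic paired-end SPAdes assembly command (default settings assumed)."""
    return ["spades.py", "-1", reads_1, "-2", reads_2, "-o", out_dir]


# Hypothetical file names for one strain.
mapping = read_mapping_commands("gene.fa", "strainA_R1.fastq.gz",
                                "strainA_R2.fastq.gz", "strainA_reads.sam")
assembly = spades_command("strainA_R1.fastq.gz", "strainA_R2.fastq.gz",
                          "strainA_spades")

for command in mapping + [assembly]:
    print(shlex.join(command))
```

Each list can be handed to `subprocess.run(command, check=True)` when bwa and SPAdes are installed. SPAdes writes its contigs into the output directory, which is what the next step works from.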

After assembly, there is still no need to BLAST. It is much easier to map the contigs to the sequence of the housekeeping gene with 'bwa mem'. You can even feed the contigs from all of your isolates into 'bwa mem' in a single run, and you will get a nice little BAM file showing a multiple sequence alignment of all the isolates across the housekeeping gene. However, if it really is a housekeeping gene, then it will be present in all of the isolates.
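
One practical detail when pooling contigs: SPAdes contig names repeat between assemblies, so it helps to prefix each header with the isolate name before mapping. A minimal sketch, assuming the contigs are already loaded as FASTA text per isolate; the isolate names, the `|` separator, and the file names in the printed commands are all made up for illustration.

```python
def merge_contigs(contigs_by_isolate):
    """Merge FASTA text from several isolates, prefixing headers with the isolate name."""
    lines = []
    for isolate, fasta_text in contigs_by_isolate.items():
        for line in fasta_text.splitlines():
            if line.startswith(">"):
                # ">NODE_1" becomes ">strainA|NODE_1" so hits stay traceable
                lines.append(f">{isolate}|{line[1:]}")
            elif line.strip():
                lines.append(line)
    return "\n".join(lines) + "\n"


# Toy contigs standing in for real assemblies.
merged = merge_contigs({
    "strainA": ">NODE_1\nACGTACGT\n",
    "strainB": ">NODE_1\nTTGACCAA\n",
})
print(merged, end="")

# Shell commands one might run afterwards (file names are placeholders):
pipeline = [
    "bwa index gene.fa",
    "bwa mem gene.fa all_contigs.fasta | samtools sort -o all_contigs.bam -",
    "samtools index all_contigs.bam",
]
print("\n".join(pipeline))
```

Note that 'bwa mem' itself emits SAM; the `samtools sort` step is what produces the sorted BAM, which can then be indexed and opened in a viewer alongside gene.fa.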


That's a really good idea!
