Biostar Beta. Not for public use.
How do I subset a fasta file to only contain transcripts with NO BLAST hits?
16 months ago
whw84 • 0

I am a bioinformatics novice, but I'm learning and managing. I've sequenced 8 transcriptomes from 2 different species and am now analysing and annotating them. I recently ran BLASTX against the Drosophila melanogaster proteome (with XML output).

The next thing I want to do is isolate all of the transcripts that had no hits in the Dmel proteome and BLAST them against the SwissProt invertebrate database. As I said, I am a novice, so please forgive me if this is a really simple thing to do. Specifically, I would like to know how to subset the transcriptome FASTA file so that it only contains the transcripts with no Dmel BLAST hits.

Any and all insight is greatly appreciated.


Hmm, it's a little unfortunate that you ran BLASTX with XML output. With tabular output you would have been able to process the results much more easily and get the list of no-hit IDs.
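For reference, with tabular output (-outfmt 6, where the first column is the query ID) the no-hit IDs are simply the FASTA IDs that never appear in that first column. A minimal awk sketch, assuming FASTA headers of the form >ID optional description; the function name nohit_ids_from_tabular and the file names in the usage line are hypothetical:

```shell
# Print IDs of FASTA records that never appear as a query in a BLAST
# tabular (-outfmt 6) file. Hypothetical helper; assumes column 1 = query ID
# and that the FASTA ID is the first word of the header line.
nohit_ids_from_tabular() {
    # $1 = transcript FASTA, $2 = BLAST tabular output
    awk '
        # First file read (the BLAST table): remember every query ID with a hit.
        FILENAME == ARGV[1] { hit[$1] = 1; next }
        # Second file (the FASTA): print header IDs that were never seen above.
        /^>/ { id = substr($1, 2); if (!(id in hit)) print id }
    ' "$2" "$1"
}
```

Usage: nohit_ids_from_tabular transcripts.fasta dmel_blastx.tsv > nohit_ids.txt. Comparing against FILENAME instead of the common NR==FNR idiom keeps the function correct even when the BLAST table is empty, and queries with several hit lines are simply recorded once.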


I think this will help: Retrieve nonmatching blast queries, which takes the FASTA and XML files as input.


If the BLAST version you used keeps the queries with "No hits found" in the XML output, you can also get the list of no-hit queries with:

# prints the ID (first word of the definition line) of every no-hit query
grep -B16 '<Iteration_message>No hits found</Iteration_message>' test.xml \
| grep '<Iteration_query-def>' \
| sed 's/^ *<Iteration_query-def>//; s/<\/Iteration_query-def>.*//' \
| cut -d ' ' -f 1
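The -B16 above relies on the <Iteration_query-def> line sitting within 16 lines of the message, which may not hold for every BLAST version. A minimal awk sketch that instead remembers the query definition of the current <Iteration> and prints its first word when the "No hits found" message appears. It assumes one tag per line, as in BLAST+ XML output, and does not decode XML entities such as &amp;; the function name is hypothetical:

```shell
# Print the ID of every query whose <Iteration> ends with "No hits found".
# Hypothetical helper; assumes one XML tag per line (BLAST+ layout).
nohit_ids_from_xml() {
    awk '
        # A new query block starts: forget the previous definition line.
        /<Iteration>/ { def = "" }
        # Keep the text between the query-def tags for the current block.
        /<Iteration_query-def>/ {
            def = $0
            sub(/^.*<Iteration_query-def>/, "", def)
            sub(/<\/Iteration_query-def>.*$/, "", def)
        }
        # On a no-hit message, print the first word of the definition (the ID).
        /<Iteration_message>No hits found<\/Iteration_message>/ {
            if (def != "") { split(def, parts, " "); print parts[1] }
        }
    ' "$1"
}
```

Usage: nohit_ids_from_xml test.xml > nohit_ids.txt. Printing only the first word matches how FASTA tools usually define a sequence ID.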


Then, using that list of IDs, extract the no-hit sequences with SeqKit (its grep subcommand can read the IDs from a file with -f).
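With SeqKit this step looks roughly like seqkit grep -f nohit_ids.txt transcripts.fasta > nohit_transcripts.fasta (file names hypothetical). If SeqKit is not installed, a plain awk sketch does the same job, assuming the ID file holds one ID per line and each ID matches the first word of a FASTA header; extract_by_ids is a hypothetical name:

```shell
# Keep only the FASTA records whose ID (first word of the header, without ">")
# is listed in an ID file. Hypothetical helper, not part of SeqKit.
extract_by_ids() {
    # $1 = ID list (one per line), $2 = FASTA file
    awk '
        # First file: load the wanted IDs, skipping blank lines.
        FILENAME == ARGV[1] { if ($1 != "") want[$1] = 1; next }
        # Each header decides whether the following sequence lines are kept.
        /^>/ { id = substr($1, 2); keep = (id in want) }
        # Print the current line while inside a wanted record.
        keep
    ' "$1" "$2"
}
```

Usage: extract_by_ids nohit_ids.txt transcripts.fasta > nohit_transcripts.fasta, after which the output file can be used as the query for the BLASTX search against SwissProt. Multi-line sequences are handled because the keep flag stays set until the next header.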