How do I subset a fasta file to only contain transcripts with NO BLAST hits?
4.9 years ago
whw84 • 0

I am a bioinformatics novice, but I'm learning and managing. I recently sequenced, and am now analysing and annotating, 8 transcriptomes from 2 different species. I have just run a BLASTX search against the Drosophila melanogaster proteome (with XML output).

The next thing I want to do is isolate all of the transcripts from that BLAST that did not hit anything in the Dmel proteome and BLAST them against the SwissProt Invertebrate database. As I said earlier, I am a novice, so please forgive me if this is a really simple thing to do. I would like to know, specifically, how I might approach subsetting the transcriptome fasta file to only contain the transcripts with no BLAST hits from Dmel.

Any and all insight is greatly appreciated.

RNA-Seq transcriptome annotation • 1.1k views

Hmm, it is a little unfortunate that you ran BLASTX with XML output; with tabular output you could have processed the results much more easily to get the list of no-hit IDs.
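To illustrate why tabular output is easier to work with, here is a minimal sketch. It assumes the search was rerun with tabular output (BLAST+ -outfmt 6 or 7), where column 1 holds the query ID, i.e. the FASTA header up to the first whitespace. The function name and file names are made up for illustration.

```shell
# Print FASTA IDs that never appear as a query in a tabular BLAST result.
# Assumption: column 1 of the table is the query ID, which matches the
# FASTA header up to the first whitespace.
no_hit_ids_from_tabular() {
    fasta=$1
    blast_tsv=$2
    all_ids=$(mktemp)
    hit_ids=$(mktemp)
    # All query IDs: header lines, '>' removed, first token only.
    grep '^>' "$fasta" | sed 's/^>//' | awk '{print $1}' | sort -u > "$all_ids"
    # IDs with at least one hit: column 1, skipping '#' comment lines (-outfmt 7).
    grep -v '^#' "$blast_tsv" | cut -f1 | sort -u > "$hit_ids"
    # Lines unique to the first file are the queries without any hit.
    comm -23 "$all_ids" "$hit_ids"
    rm -f "$all_ids" "$hit_ids"
}
```

Usage would look something like: no_hit_ids_from_tabular transcripts.fasta dmel_blastx.tsv > no_hit_ids.txt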


I think this will help: Retrieve nonmatching blast queries. It takes the FASTA and XML files as inputs.


If the BLAST version that you used preserves the queries with "No hits found", you can also get the list of no-hit queries by:

grep -B16 '<Iteration_message>No hits found</Iteration_message>' test.xml \
  | grep '<Iteration_query-def>' \
  | sed 's/  <Iteration_query-def>//g; s/<\/Iteration_query-def>//g'
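The grep -B16 approach above depends on the query definition sitting within 16 lines of the "No hits found" message. A sketch of an alternative that does not rely on a fixed line count is shown below. It assumes the XML has one element per line (as BLAST+ normally writes it) and that no-hit iterations carry the "No hits found" message; the function name is hypothetical.

```shell
# Print the ID (first token of the query definition) of every iteration
# that reports "No hits found" in a BLAST XML file.
# Assumption: each XML element is on its own line.
no_hit_ids_from_xml() {
    awk '
        /<Iteration_query-def>/ {
            line = $0
            # Strip the surrounding tags, keeping only the definition text.
            sub(/.*<Iteration_query-def>/, "", line)
            sub(/<\/Iteration_query-def>.*/, "", line)
            # Keep the first whitespace-delimited token as the sequence ID.
            split(line, parts, " ")
            id = parts[1]
        }
        # The message appears after the query definition of the same iteration.
        /<Iteration_message>No hits found<\/Iteration_message>/ { print id }
    ' "$1"
}
```

For example: no_hit_ids_from_xml test.xml > no_hit_ids.txt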

Then, using that list, extract the no-hit sequences from the transcriptome FASTA file with SeqKit.
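If SeqKit is not available, the extraction step can be sketched with plain awk. This is an illustration rather than a replacement for SeqKit: it assumes one ID per line in the list (only the first token of each line is used) and matches it against the first token of each FASTA header. The function name and file names are hypothetical.

```shell
# Print only the FASTA records whose ID is present in an ID list.
# Arguments: <id list> <fasta file>
subset_fasta_by_ids() {
    ids=$1
    fasta=$2
    awk '
        # First file: remember the first token of every non-empty line.
        NR == FNR { if ($1 != "") keep[$1] = 1; next }
        # Header line: decide whether this record should be printed.
        /^>/ { id = substr($1, 2); printing = (id in keep) }
        # Print the header and its sequence lines while the flag is set.
        printing
    ' "$ids" "$fasta"
}
```

For example: subset_fasta_by_ids no_hit_ids.txt transcripts.fasta > no_hit_transcripts.fasta. The resulting file can then be used as the query for the next BLAST search.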
