Extract sequences that do not have BLAST hits
2.4 years ago
karthic • 100
@karthic42122

Hi,

I have a FASTA file with around 1 million sequences. I ran a BLAST search and got hits for around 7500 of them. Now I want to extract the sequences that do not have a hit and use them for further analysis.
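A minimal sketch of building the hit-ID list, assuming the BLAST search wrote tabular output (`-outfmt 6`), where the first column is the query ID. A query can have several hits, so the IDs are de-duplicated. The helper name `hit_ids` and the input file name are hypothetical.

```shell
#!/usr/bin/env bash
# hit_ids (hypothetical helper): print each query ID that has at least one hit,
# assuming tab-separated BLAST output (-outfmt 6) with the query ID in column 1.
hit_ids() {
    # Take column 1 and drop duplicate IDs (one query may hit many subjects).
    cut -f1 "$1" | sort -u
}
```

Usage, with a hypothetical results file: `hit_ids blast_results.tsv > CG061MR_blastids.txt`.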

So far I am using a custom sed script, which is very slow: judging from its speed, it might take several days to complete. Please help me with fast and robust solutions.

The script I am currently using is below:

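    # Work on a copy so the original FASTA stays untouched
    cp CG061MR_S20_R1_001_AR_filter_un_ren.fa CG061MR_S20_R1_001_AR_filter_unblasted.fa

    # For each hit ID, delete its header line and the one sequence line after it
    # (assumes unwrapped sequences). Each sed -i call rewrites the whole file.
    while read -r j; do
        sed -i -e "/^>${j}\([[:space:]]\|\$\)/{N;d}" CG061MR_S20_R1_001_AR_filter_unblasted.fa
    done < CG061MR_blastids.txt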
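For comparison, a single-pass sketch (a swapped-in awk technique, not the poster's method) that avoids rewriting the file once per ID, 7500 times: awk first loads the hit IDs into an associative array, then streams the FASTA once and prints only records whose ID is not in the array. It assumes one ID per line in the list and that each ID equals the first word of its FASTA header; the helper name `exclude_ids` is hypothetical. Unlike the `{N;d}` approach, it also handles sequences wrapped over several lines.

```shell
#!/usr/bin/env bash
# exclude_ids (hypothetical helper): print every FASTA record of $2 whose ID is
# NOT listed in $1, reading each file only once.
# Assumes one ID per line in $1, each equal to the first word of a FASTA header
# (without the leading ">"). Wrapped, multi-line sequences are handled.
exclude_ids() {
    local ids_file=$1 fasta_in=$2
    awk '
        # While reading the ID list (first argument), store each ID as an array key.
        # FILENAME is used instead of NR == FNR so an empty list still works.
        FILENAME == ARGV[1] { hit[$1] = 1; next }
        # On each header line, decide whether the record should be kept.
        /^>/ { id = substr($1, 2); keep = !(id in hit) }
        # Print header and sequence lines while inside a kept record.
        keep
    ' "$ids_file" "$fasta_in"
}
```

Usage: `exclude_ids CG061MR_blastids.txt CG061MR_S20_R1_001_AR_filter_un_ren.fa > CG061MR_S20_R1_001_AR_filter_unblasted.fa`. Matching exact IDs through the array also avoids the substring problem of an unanchored pattern, where an ID like `read_1` would also remove `read_10`.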

Thank you,
KK

2.4 years ago
karthic • 100
@karthic42122

Sorry guys for bothering.

Found the solution: Jim Kent's faSomeRecords.
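A sketch of the faSomeRecords call for this case. faSomeRecords (from the UCSC Kent utilities) takes `in.fa listFile out.fa`, and its `-exclude` option writes the records whose names are not in the list instead of those that are. The wrapper name `remove_blast_hits` is hypothetical, and the list is assumed to hold one sequence name per line.

```shell
#!/usr/bin/env bash
# remove_blast_hits (hypothetical wrapper): write to $3 every record of FASTA $1
# whose name is NOT listed in $2. Requires faSomeRecords on the PATH.
remove_blast_hits() {
    # -exclude inverts the selection: keep only records absent from the list.
    faSomeRecords -exclude "$1" "$2" "$3"
}
```

Usage: `remove_blast_hits CG061MR_S20_R1_001_AR_filter_un_ren.fa CG061MR_blastids.txt CG061MR_S20_R1_001_AR_filter_unblasted.fa`.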

Thank you

KK


faSomeRecords or GetFaRecords? karthic


Sorry, it's faSomeRecords. I've corrected it.
