8.9 years ago
seta
Hi all,
I've downloaded a large number of nucleotide sequences from NCBI, and I would like to divide them into two separate files: partial cds and complete cds. Some of the sequences are not labelled as either complete cds or partial sequence. Please share any commands or scripts that can do this. Sorry if the question is too basic. Thanks
Do you mean splitting the sequences based on the information in the header lines (FASTA format)? Just be aware that not all sequences downloaded from NCBI have that information in the header. If this is not what you wanted, then there is no way to tell whether a sequence is complete or partial unless you align it to reference sequences.
Yes, that's right. I plan to split them based on the FASTA header; could you please let me know your approach? You're right: as I mentioned in my post, some sequences unfortunately do not have this information in the header, and I do not have reference sequences to align them against.
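One possible way to do the header-based split is a small Python script like the sketch below. It is not a definitive solution. It assumes the headers contain the plain words "partial" or "complete" (as in "partial cds" or "complete cds"), that a header mentioning both should count as partial, and that anything else goes to a third "unknown" file. The function names and the output file naming (`PREFIX.complete.fasta` and so on) are made up for this example.

```python
import sys


def classify_header(header):
    """Return 'partial', 'complete', or 'unknown' from the header text.

    Assumption: a header that mentions both words is treated as partial.
    """
    text = header.lower()
    if "partial" in text:
        return "partial"
    if "complete" in text:
        return "complete"
    return "unknown"


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_fasta(in_path, out_prefix):
    """Write each record to <out_prefix>.<label>.fasta and return the counts."""
    counts = {"complete": 0, "partial": 0, "unknown": 0}
    handles = {label: open(f"{out_prefix}.{label}.fasta", "w") for label in counts}
    try:
        with open(in_path) as fh:
            for header, seq in read_fasta(fh):
                label = classify_header(header)
                handles[label].write(f">{header}\n{seq}\n")
                counts[label] += 1
    finally:
        for handle in handles.values():
            handle.close()
    return counts


if __name__ == "__main__":
    # Usage: python split_cds.py input.fasta output_prefix
    print(split_fasta(sys.argv[1], sys.argv[2]))
```

Sequences are written on a single line, so any original line wrapping is lost. The records that end up in the "unknown" file are the ones with no completeness information in the header; as noted above, those cannot be classified without reference sequences.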