I have several FASTA files, each containing the sequence of an organism (or species) together with its reference sequence (CDS). I would like to remove the reference sequences from them, leaving only the organism's sequence.
One way would be to linearize the FASTA sequences (courtesy of @Pierre's gist, which is easy to find by searching for "linearize fasta"), then grep "^>gb" to keep the sequences you want (after linearizing, each record is one line that starts with its ">" header), and finally reformat the result back to FASTA.
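For anyone who would rather do the linearize, filter, reformat steps in one script, here is a minimal Python sketch of the same idea. It is not Pierre's gist. It assumes the organism records have headers starting with ">gb" and that a 60-character line width is acceptable for the output. The function names and the example headers are hypothetical.

```python
def linearize_fasta(text):
    """Collapse each FASTA record into one (header, sequence) pair."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line, []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def filter_and_rewrap(text, keep_prefix=">gb", width=60):
    """Keep records whose header starts with keep_prefix (the grep "^>gb" step),
    then re-wrap each kept sequence to `width` characters per line."""
    out = []
    for header, seq in linearize_fasta(text):
        if header.startswith(keep_prefix):
            out.append(header)
            out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + ("\n" if out else "")


# Hypothetical input: one organism record and one reference CDS record.
example = """>gb|AB000001|organism
ATGCATGC
ATGC
>lcl|ref_cds_1
TTTTGGGG
"""
print(filter_and_rewrap(example, width=6))
```

With `width=6` this prints only the ">gb" record, its 12 bases split over two lines. Linearizing first makes the filter a simple per-record prefix test, which is exactly why the grep trick works on linearized input.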
I would like to remove all sequences whose headers start with lcl (the reference CDS). Does this approach work for that too?
Try the above. It will keep only the sequences whose headers start with ">gb", so the lcl reference sequences are dropped.
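If you want to do literally what the question asks, dropping the ">lcl" records rather than keeping the ">gb" ones, a streaming filter can do it without linearizing: it decides at each header whether to keep the lines that follow. This is a minimal Python sketch that assumes every reference CDS header starts with ">lcl". Note the difference in behavior: any record that is neither ">gb" nor ">lcl" is kept here, whereas the ">gb" filter would discard it. The function name `drop_records` is hypothetical.

```python
import io
from typing import Iterable, Iterator


def drop_records(lines: Iterable[str], drop_prefix: str = ">lcl") -> Iterator[str]:
    """Yield FASTA lines, skipping every record whose header starts with drop_prefix.
    Original line wrapping is preserved, so no re-wrapping step is needed."""
    keep = True
    for line in lines:
        if line.startswith(">"):
            # A new record begins: decide once, from its header, whether to keep it.
            keep = not line.startswith(drop_prefix)
        if keep:
            yield line


# Hypothetical input; a real file handle from open() works the same way.
fasta = io.StringIO(">gb|AB000001|organism\nATGCATGC\n>lcl|ref_cds_1\nTTTTGGGG\nAAAA\n")
print("".join(drop_records(fasta)), end="")
```

For real files, pass an open input file handle to `drop_records` and write the result with the output file's `writelines` method. Because it streams line by line, memory use stays small even for large files.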