Merging all sequences with identical IDs
5.5 years ago
sativus ▴ 20

Hi!

I am having issues with multiple genes (FASTA files) that I am supposed to concatenate. The problem is that all of these genes have identical taxon identifiers, so after concatenating my aligned and trimmed files I end up with multiple duplicate headers in the combined file. Is there any method, preferably in Python, to merge all sequences with an identical header into one sequence (i.e. remove the duplicate header entries and then merge all sequences matching that header into one sequence)?

sequence • 3.8k views

Please provide an example.

Ha, just realized, I recommended your tool :)

5.5 years ago
thackl ★ 3.0k

seqkit concat might do what you want: "concatenate sequences with same ID from multiple files"

https://github.com/shenwei356/seqkit
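
Since the question asks for something in Python, here is a minimal sketch of the same idea using only the standard library. It is not how seqkit works internally, just one way to do it. It assumes the sequence ID is the first whitespace-delimited word of each header, and that sequences with the same ID should be joined in the order they appear in the combined file. The function names merge_fasta_by_id and format_fasta are made up for this example.

```python
def merge_fasta_by_id(lines):
    """Join all sequences that share an ID, keeping the order of first appearance.

    `lines` is any iterable of FASTA lines (an open file handle works).
    Assumption: the ID is the first whitespace-delimited word after '>'.
    """
    merged = {}      # ID -> list of sequence chunks (dicts keep insertion order)
    current = None   # ID of the record currently being read
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            words = line[1:].split()
            current = words[0] if words else ""
            merged.setdefault(current, [])
        elif current is not None:
            merged[current].append(line)
    return {seq_id: "".join(chunks) for seq_id, chunks in merged.items()}


def format_fasta(records, width=60):
    """Render an ID -> sequence mapping as FASTA text wrapped at `width` columns."""
    out = []
    for seq_id, seq in records.items():
        out.append(f">{seq_id}")
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"
```

To use it, pass an open file handle for the combined file to merge_fasta_by_id and write the result of format_fasta to a new file. One caveat for concatenated alignments: if a taxon is missing from one of the gene files, its merged sequence will be shorter than the others and the columns will no longer line up, so it is worth checking that all merged sequences have the same length.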
