Exporting sequences from Excel
5.9 years ago

I'm starting a bioinformatics project and I've been given an Excel sheet with two columns: one containing a number assigned to each sequence and one containing the sequence itself. I have to build a bootstrap tree using all the sequences, and there are over 7,000 of them. Manually exporting every sequence and assigning it to a file with the corresponding number would be incredibly tedious and time-consuming. Is there a faster way of doing it?

I'm planning to use MEGA to build the tree, as it's the fastest option I'm aware of. I'll be working in either Windows or BioLinux, depending on which approach I end up using to separate out the sequences.

Thanks in advance

excel export tree building • 1.5k views

Thank you both, that's awesome!


Please use ADD COMMENT or ADD REPLY to respond to earlier posts, so that the thread remains logically structured and easy to follow. I have now moved your post, but as you can see the result is not optimal. Adding an answer should only be used for providing a solution to the question asked.

If an answer was helpful, you should upvote it; if it resolved your question, you should mark it as accepted.

My apologies, I'll do so in future!

5.9 years ago
Joe 21k

Export it to a CSV or TSV, transliterate the delimiter to a newline, and you'll have something approximately resembling a FASTA file.

Then tell your collaborator/boss off for ever giving you sequence data in any form of MS Office format.
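
A minimal sketch of the delimiter swap described above, assuming a two-column tab-separated export with no header row. The file names `example.tsv` and `example.fna` are placeholders, and the `awk` step that adds the `>` prefix is an addition here: a bare newline swap leaves the ID lines unmarked, which is presumably why the result only "approximately" resembles FASTA.

```shell
# Build a tiny two-column TSV to stand in for the Excel export (placeholder data).
printf '1\tACGTACGT\n2\tGGCCTTAA\n' > example.tsv

# Swap each tab for a newline, so IDs land on odd lines and sequences on even lines,
# then prefix the odd (ID) lines with ">" to turn them into FASTA headers.
tr '\t' '\n' < example.tsv | awk 'NR % 2 { print ">" $0; next } { print }' > example.fna

cat example.fna
```

This relies on every row having exactly one tab, so a stray tab inside a cell would shift the odd/even pattern.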

5.9 years ago
5heikki 11k

Assuming the first column contains the number and the second column contains the sequence:

  1. Export the file as tab-separated values
  2. awk 'BEGIN{FS="\t";OFS="\n"}{print ">"$1,$2}' exportedFile.tsv > seqs.fna
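
If the file was exported on Windows, each line may end with a carriage return, which would otherwise be carried into every sequence. A hedged variant of the command above that strips it, assuming a two-column export with no header row; the file names `exported_crlf.tsv` and `seqs_clean.fna` are placeholders.

```shell
# Simulate a Windows-style export: two tab-separated columns, CRLF line endings.
printf '1\tACGT\r\n2\tTTGA\r\n' > exported_crlf.tsv

# Same idea as the one-liner above, but remove a trailing carriage return first
# and only print rows that have exactly two fields.
awk 'BEGIN { FS = "\t"; OFS = "\n" }
     { sub(/\r$/, "") }
     NF == 2 { print ">" $1, $2 }' exported_crlf.tsv > seqs_clean.fna

# Count the records written; this should match the number of rows in the sheet.
grep -c '^>' seqs_clean.fna
```

Comparing that count against the row count in Excel is a quick way to spot rows that were dropped or split.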
5.9 years ago
  1. Make sure the first column holds the sequence ID and the second column holds the entire sequence, then export the file as TSV.
  2. Download SeqKit and run: seqkit tab2fx exported_seq.tsv -o exported_seq.fa
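
The check in step 1 can be partly automated before converting. A rough sketch, assuming a tab-separated export with no header row; the file names `check_me.tsv` and `problems.txt` and the sample rows are made up for illustration.

```shell
# Placeholder input: row 2 repeats an ID, row 3 has an empty sequence column.
printf '1\tACGT\n1\tGGCC\n3\t\n4\tTTAA\n' > check_me.tsv

# Report rows with a missing ID or sequence, and IDs that occur more than once.
awk 'BEGIN { FS = "\t" }
     NF != 2 || $1 == "" || $2 == "" { print "row " NR ": missing ID or sequence" }
     seen[$1]++                      { print "row " NR ": duplicate ID " $1 }' check_me.tsv > problems.txt

cat problems.txt
```

An empty `problems.txt` means every row passed these two checks; it does not validate the sequence characters themselves.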