Question

Merging a number of overlapping Sanger sequences

6.6 years ago

thomas.welch ▴ 50

Hi there,

I have 80 DNA samples for which we have sequenced three overlapping sections of a large gene using the Sanger method. I am now looking for a way to merge these sequenced segments into a single sequence for each sample so that they can be aligned for analysis.

I have come across a couple of tools that do this for just two overlapping sequences (such as EMBOSS), and I've seen that it can be done with BioEdit one sample at a time, but is there a tool that can do this in bulk? Or will I have to align and assemble them as I would with NGS data of a genome?

Thanks in advance for any answers.

Kind Regards, Tom

Tags: merge, sequence, alignment • 5.7k views

Question written 6.6 years ago by thomas.welch ▴ 50 • updated 5.6 years ago by ferroao ▴ 20
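For the simplest version of this problem, where each sample's three reads are already quality-trimmed, on the same strand (reverse reads reverse-complemented), and listed in order along the gene, the merge can be sketched as a greedy suffix-prefix overlap join. This is a minimal illustration under those assumptions, not the method used by any of the tools mentioned in this thread; the function names `best_overlap` and `merge_segments` and the parameters `min_overlap` and `max_mismatch_frac` are hypothetical.

```python
def best_overlap(left, right, min_overlap=20, max_mismatch_frac=0.05):
    """Length of the longest suffix of `left` that matches a prefix of `right`.

    A candidate overlap of length k is accepted if it has at most
    max_mismatch_frac * k mismatches. Returns 0 if no overlap of at least
    min_overlap bases qualifies.
    """
    min_overlap = max(min_overlap, 1)  # an empty overlap is meaningless
    # Try the longest possible overlap first and stop at the first acceptable one.
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        mismatches = sum(a != b for a, b in zip(left[-k:], right[:k]))
        if mismatches <= max_mismatch_frac * k:
            return k
    return 0


def merge_segments(segments, min_overlap=20, max_mismatch_frac=0.05):
    """Join reads that are (assumed) trimmed, on the same strand and ordered 5' to 3'."""
    contig = segments[0].upper()
    for number, segment in enumerate(segments[1:], start=2):
        segment = segment.upper()
        k = best_overlap(contig, segment, min_overlap, max_mismatch_frac)
        if k == 0:
            raise ValueError(f"segment {number} does not overlap the growing contig")
        # Keep the contig's bases across the overlap; no quality scores are used.
        contig += segment[k:]
    return contig


if __name__ == "__main__":
    # Three toy "reads" cut from GATTACAGCTTGACCGTATGGCAATC with 6-base overlaps.
    reads = ["GATTACAGCTTGAC", "CTTGACCGTATG", "CGTATGGCAATC"]
    print(merge_segments(reads, min_overlap=6))  # prints GATTACAGCTTGACCGTATGGCAATC
```

To handle all 80 samples, the same function could be called once per sample in a loop over a mapping from sample ID to its three reads. Accepting the longest overlap that passes the mismatch threshold is a simple greedy choice; assemblers such as phrap also use base quality values, which this sketch ignores.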

You should give tadpole.sh from BBMap a try. It should work with FASTA-formatted sequences.

Reply • 6.6 years ago by GenoMax 141k

This would be trivially simple if you had access to Sequencher, DNASTAR, or ContigExpress from Vector NTI, among others (note: these are all commercial software packages and are not free). The Consed suite will work as well, but it requires signing an academic agreement and some effort on your part to install everything.

Reply • 6.6 years ago by GenoMax 141k

score 1 · Answer 1 · 2017-09-27

I would suggest the phred/phrap/consed suite. It was widely used in the "old" Sanger days, and it worked pretty well. Consed is a "finishing" tool, but it is nevertheless quite useful for visualizing assemblies and correcting errors. The major drawback is that you may have to invest some time in learning how to use these tools.

score 0 · Answer 2 · 2018-09-24

5.6 years ago

ferroao ▴ 20

If you have a FASTA file containing all the sequences, you can use this R script:

# install the Bioconductor dependencies needed by sangeranalyseR, e.g.
# (BiocInstaller::biocLite() is deprecated; BiocManager replaces it)
# install.packages("BiocManager")
# BiocManager::install(c("Biostrings", "DECIPHER"))

# install the sangeranalyseR package from GitHub
library(devtools)
install_github("roblanf/sangeranalyseR")
library(sangeranalyseR)
library(Biostrings)   # provides DNAStringSet()
library(DECIPHER)     # provides BrowseSeqs()
setwd("~/your folder")

# read a FASTA file with several sequences as character strings
fastas <- seqinr::read.fasta("myFastas.fas", as.string = TRUE)
# convert to a DNAStringSet, keeping the read names
reads <- DNAStringSet(as.character(fastas))
names(reads) <- names(fastas)
# align and merge the reads
merged.reads <- merge.reads(reads)
# print the consensus and view the alignment in a browser
merged.reads$consensus
BrowseSeqs(merged.reads$alignment)

# write the consensus to a FASTA file
seqinr::write.fasta(as.character(merged.reads$consensus), "consensus", file.out = "cons.fas", nbchar = 100000, as.string = TRUE)

Or this Python script:

python3.4 combineSequences.py -f myfastas.fas -r myout.fas

Script: https://gitlab.com/ferroao/msa, copied from Rosa Tung's https://github.com/rostun/DNA_multiple_sequence_alignment
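Because the R script merges every read in its input FASTA into a single consensus, running it (or any similar single-sample tool) over 80 samples means first splitting a combined FASTA into one file per sample. A minimal Python sketch of that step, assuming a hypothetical header convention such as `>S01_seg1`, `>S01_seg2`, where the sample ID is everything before the last underscore; the function names and the file name `split_by_sample.py` are illustrative and not part of either script:

```python
import os
import sys
from collections import defaultdict


def parse_fasta(text):
    """Return (header, sequence) tuples; multi-line sequences are joined."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:  # ignore any sequence text before the first header
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def group_by_sample(records, sep="_"):
    """Group records by the header text before the last `sep` (hypothetical naming scheme)."""
    groups = defaultdict(list)
    for header, seq in records:
        groups[header.rsplit(sep, 1)[0]].append((header, seq))
    return dict(groups)  # keeps the order in which samples first appear


def write_groups(groups, outdir):
    """Write one FASTA file per sample, e.g. outdir/S01.fasta."""
    os.makedirs(outdir, exist_ok=True)
    for sample, records in groups.items():
        with open(os.path.join(outdir, f"{sample}.fasta"), "w") as handle:
            for header, seq in records:
                handle.write(f">{header}\n{seq}\n")


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: python split_by_sample.py all_samples.fas per_sample_dir
    with open(sys.argv[1]) as handle:
        write_groups(group_by_sample(parse_fasta(handle.read())), sys.argv[2])
```

Each per-sample file can then be passed, in a loop, to whichever single-sample merging step you prefer.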

Answer • 5.0 years ago by ferroao ▴ 20

Hi ferroao

You claim you've "forked" your code from https://github.com/rostun/DNA_multiple_sequence_alignment, but you've actually copied their code over to a different git host (GitHub vs GitLab). Also, all you've done is make the input and output files command-line arguments (and add an unnecessary step to strip empty lines). You have not changed any of the underlying algorithm. Have you at least addressed the 50-sequence, 1000-length, ATCG-only limitations?

I'd like to understand why you're spamming old threads with a script you did not author, when the script is 2 years old, has so many limitations, and was written as part of ~~what looks like a classroom~~ what is definitely a Rosalind challenge.

If you sincerely think the script is performant, please create a Tool type post for it.

Reply • 5.6 years ago by Ram 43k

I think my answer is appropriate to this question. You can use the moderate option if you want. Most of the limitations you mention apply to the example.txt, not to the script. Best,

Reply • 5.6 years ago by ferroao ▴ 20

No, but I do not appreciate code adapted from repositories without due attribution, especially when the contribution after adaptation is negligible. The code is from a Rosalind challenge by an amateur coder, so I am fairly sure it is not as good as established, tested tools. In addition, going back to year-old posts to add an answer advertising a poor, ill-adapted solution is not recommended. I will not use the moderate option, as what you're doing is not inappropriate, just a little ill-advised.

Reply • 5.6 years ago by Ram 43k