Question

Convert list of DNA sequences in text format to a single length?

7.9 years ago

LivelongandProsper ▴ 20

Hi, I have a text file with a long list of DNA sequences.

I would like to convert them all to the same length, namely the length of the longest sequence. "D"s should be appended to the sequences that are shorter.

Is there any way to do this in R or Biopython, with a script along these lines:

1) Read sequences and find longest sequence

2) Loop through each sequence adding "D"s to match the length of the longest sequence

I was looking through the APE package in R as I imagine something must exist already to accomplish this.

Any help would be appreciated.

sequence R • 2.3k views

ADD COMMENT • link updated 7.9 years ago by Anima Mundi ★ 2.9k • written 7.9 years ago by LivelongandProsper ▴ 20

Is your file in fasta format? Not clear from your question.

ADD REPLY • link 7.9 years ago by WouterDeCoster 47k

No, the file is not yet in fasta format. Just a text document.

ADD REPLY • link 7.9 years ago by LivelongandProsper ▴ 20

score 2 · Answer 1 · 2016-05-24

Hello, here is a quick and dirty solution in Python (for an input file named foo.fasta):

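# Python 3. Pads every sequence line with 'D' up to the longest sequence length.
# First pass: find the longest sequence (header lines start with '>').
maxl = 0
with open('foo.fasta') as handle:
    for line in handle:
        if not line.startswith('>'):
            maxl = max(maxl, len(line.rstrip('\n')))

# Second pass: print headers unchanged and pad each sequence line.
with open('foo.fasta') as handle:
    for line in handle:
        line = line.rstrip('\n')
        if line.startswith('>'):
            print(line)
        else:
            print(line + 'D' * (maxl - len(line)))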

Hope it helps!

PS: the script assumes you have no other text than FASTA lines, and that your sequences are formatted as single lines.
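Since the asker mentions the file is plain text rather than FASTA, here is a minimal sketch for that case. It assumes one sequence per line with no header lines; the function name `pad_sequences` and the example data are made up for illustration.

```python
def pad_sequences(seqs, pad_char="D"):
    """Right-pad each sequence with pad_char to the length of the longest one."""
    # Strip newlines/whitespace and drop blank lines.
    cleaned = [s.strip() for s in seqs if s.strip()]
    # Length of the longest sequence (0 if there are no sequences).
    longest = max((len(s) for s in cleaned), default=0)
    # str.ljust pads on the right with a single fill character.
    return [s.ljust(longest, pad_char) for s in cleaned]


# Illustrative input: any iterable of strings works.
example = ["ACGT", "AC\n", "", "ACGTAA"]
for seq in pad_sequences(example):
    print(seq)
```

To run it on a file, an open file handle can be passed directly (for example `pad_sequences(open('sequences.txt'))`, where the filename is a placeholder), since iterating over a handle yields one line at a time.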

score 1 · Answer 2 · 2016-05-24

7.9 years ago

natasha.sernova ★ 4.0k

See this post.

How to copy all fasta-seqs from fasta-files with the seq-lengths between minlen and maxlen

There are many helpful script versions inside.

I use the lh3 script in Perl.

It's almost what you need, isn't it?


Thank you for this resource!

ADD REPLY • link 7.9 years ago by LivelongandProsper ▴ 20