Removing Gapped Columns From A Multiple Sequence Alignment.
10.3 years ago
microbeatic ▴ 80

Hello,

I was wondering whether there is a way in Biopython to remove the columns that contain gaps from a multiple sequence alignment.

There is a whole section here on manipulating alignments, but it has no function that deals with gaps.

My intention is to remove every column that contains a gap, whether the gap occurs in one sequence or in all of them.

Any help will be great.

Thank you

Something similar to what this program does.

biopython alignment • 7.7k views

Since gaps are marked by a specific symbol such as "-" or ".", shouldn't this be rather trivial to achieve? It's not necessarily a very good idea, though.


Hi, I have a somewhat similar problem, but I want to remove all the columns in which any of the characters is 'X'.
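The 'X' case is the same column filter with a different character. Here is a minimal sketch in plain Python, assuming the aligned sequences are already available as equal-length strings (with Biopython these could be taken from str(record.seq) for each record of an alignment). The function name drop_columns_containing is made up for this illustration and is not part of Biopython.

```python
def drop_columns_containing(seqs, unwanted):
    """Return the sequences with every column removed that holds
    at least one character from `unwanted` (comparison is case-sensitive)."""
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("all aligned sequences must have the same length")
    unwanted = set(unwanted)
    # zip(*seqs) yields one tuple per alignment column.
    kept = [col for col in zip(*seqs) if unwanted.isdisjoint(col)]
    # Transpose the surviving columns back into rows.
    rows = ["".join(row) for row in zip(*kept)]
    # If no column survives, zip(*kept) is empty: return empty rows instead.
    return rows if rows else ["" for _ in seqs]


aligned = ["MKXLV", "MKALV", "MRALX"]
print(drop_columns_containing(aligned, "X"))  # ['MKL', 'MKL', 'MRL']
```

Passing "-." instead of "X" would drop gapped columns with the same code.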

10.3 years ago
Andreas ★ 2.5k

Hi,

I don't think there is a specific Biopython function for this, but you could use the function prune_aln(aln, what='any_gap') in https://github.com/andreaswilm/compbio-utils/blob/master/prune_aln_cols.py, where aln is an alignment created, for example, by AlignIO.read().

Just in case: if you are planning to run the program as such, you will also have to download https://github.com/andreaswilm/compbio-utils/blob/master/bioutils.py.
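For readers who only want the "any gap" behaviour without downloading anything, the idea can be sketched in a few lines of plain Python. This is not the code from the linked script: prune_any_gap is a hypothetical name, the input is assumed to be a list of (name, aligned sequence) pairs rather than a Biopython alignment object, and gaps are assumed to be written as "-" or ".".

```python
GAP_CHARS = "-."  # assumption: gaps are written as '-' or '.'


def prune_any_gap(records):
    """records: list of (name, aligned_sequence) pairs of equal length.
    Returns new pairs without the columns that hold a gap in any sequence."""
    seqs = [seq for _, seq in records]
    if len(set(map(len, seqs))) > 1:
        raise ValueError("all aligned sequences must have the same length")
    # Indices of the columns in which no sequence has a gap character.
    keep = [
        i for i, col in enumerate(zip(*seqs))
        if not any(c in GAP_CHARS for c in col)
    ]
    return [(name, "".join(seq[i] for i in keep)) for name, seq in records]


records = [("seq1", "ACG-TA"), ("seq2", "A-GTTA"), ("seq3", "ACGTTA")]
for name, seq in prune_any_gap(records):
    print(f">{name}\n{seq}")  # each sequence is reduced to AGTA
```

Keeping the names next to the sequences makes it easy to write the result back out as FASTA.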

Andreas

6.2 years ago
Prakki Rama ★ 2.7k

Without reinventing the wheel: trimAl does the best job of this.

trimal -in test -out test_nogaps_trimal -nogaps
10.3 years ago
Asaf 10k

You can create a set of columns that contain a gap using iterative find() over each sequence and then choose the columns not in the set.
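The approach described above could look roughly like this. It is a sketch under a few assumptions: the sequences are plain, equal-length strings, the gap symbol is a single character ("-" by default), and the helper names gap_positions and remove_gapped_columns are invented for the example.

```python
def gap_positions(seq, gap="-"):
    """Yield every index at which `gap` occurs in seq, using repeated str.find()."""
    pos = seq.find(gap)
    while pos != -1:
        yield pos
        # Continue searching just after the last hit.
        pos = seq.find(gap, pos + 1)


def remove_gapped_columns(seqs, gap="-"):
    """Drop every column that contains `gap` in at least one sequence."""
    gapped = set()
    for seq in seqs:
        gapped.update(gap_positions(seq, gap))
    # Keep only the columns whose index is not in the gapped set.
    return [
        "".join(c for i, c in enumerate(seq) if i not in gapped)
        for seq in seqs
    ]


aligned = ["AC-GT", "A-CGT", "ACCGT"]
print(remove_gapped_columns(aligned))  # ['AGT', 'AGT', 'AGT']
```

Collecting the indices in a set keeps the membership test cheap when the alignment is long.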
