NJ phylogenetic tree calculation method weighing gaps in the alignment
9.1 years ago
Kame ▴ 20

Hello,

Is there any NJ phylogenetic tree calculation method that works on large genomic alignments and takes the gaps in the alignment into account, ideally by weighting them?

I have managed to do it with FastTree, which handles gaps exactly as I would like (from their website: "When comparing two sequences, positions with gaps are ignored; when comparing two profiles, positions are weighted by their proportions of non-gaps."), but I was wondering if there is an NJ method that does this as well. I use RapidNJ, which doesn't seem to be able to do it.

There is/was an option in MEGA to do it by defining a gap threshold, i.e. the percentage of gaps required for each site to be considered, but I'd prefer a command-line method, as MEGA does not handle large alignment files well.

Thanks a lot for your help

K

gaps alignment NJ tree • 2.9k views
9.1 years ago
Brice Sarver ★ 3.8k

One way I've tackled this issue is by reading large alignments into R using Bioconductor's Biostrings package, specifically as MultipleAlignment objects (e.g. DNAMultipleAlignment). You can mask alignment columns whose proportion of gaps reaches a chosen threshold using the maskGaps() function. Since you already have the alignment, any columns with a large percentage of gaps would be masked. When you write the object back out, the masked sites are removed. You can then estimate your tree using whatever approach you used previously.

Obviously, the efficacy of this approach depends on how much data you actually have. readDNAStringSet() will read an entire mouse genome into memory in under 10 seconds. The alternative, which wouldn't be too difficult but is perhaps slower, would be to use Python (or Perl, or a language of your choice) to process the file site by site and remove sites that don't meet your criteria.
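For the Python route, something along these lines might serve as a starting point. It is only a rough sketch: the function names (read_fasta, filter_gappy_columns, write_fasta) are made up for illustration, it assumes an aligned FASTA file with "-" as the gap character, and the 0.5 threshold is an arbitrary placeholder. Note that it drops gappy sites outright rather than weighting them the way FastTree does.

```python
def read_fasta(path):
    """Read an aligned FASTA file into a dict of {name: sequence}."""
    seqs, name = {}, None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                name = line[1:].split()[0]
                seqs[name] = []
            elif name is not None:
                seqs[name].append(line)
    return {n: "".join(parts) for n, parts in seqs.items()}


def filter_gappy_columns(seqs, max_gap_fraction=0.5, gap_chars="-"):
    """Drop alignment columns whose gap fraction exceeds max_gap_fraction.

    The threshold and gap characters are assumptions; adjust to taste.
    """
    if not seqs:
        return {}
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) != 1:
        raise ValueError("sequences are not all the same length")
    length = lengths.pop()

    # Count gaps per column in a single pass over each sequence.
    gap_counts = [0] * length
    for seq in seqs.values():
        for i, char in enumerate(seq):
            if char in gap_chars:
                gap_counts[i] += 1

    n = len(seqs)
    keep = [i for i in range(length) if gap_counts[i] / n <= max_gap_fraction]
    return {name: "".join(seq[i] for i in keep) for name, seq in seqs.items()}


def write_fasta(seqs, path, width=60):
    """Write sequences back out as FASTA, wrapped at `width` characters."""
    with open(path, "w") as handle:
        for name, seq in seqs.items():
            handle.write(f">{name}\n")
            for start in range(0, len(seq), width):
                handle.write(seq[start:start + width] + "\n")
```

You would then call something like write_fasta(filter_gappy_columns(read_fasta("aln.fasta")), "aln.filtered.fasta") (file names are placeholders) and pass the filtered alignment to RapidNJ as before. For genome-scale alignments, a pure-Python loop like this will be slow and memory-hungry, so treat it as a proof of concept.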
