Printable visualizations of large-scale alignments
20 months ago
Berlin, Germany

I would like to visualize a large ("large" as in 20 bacterial genomes) multi-sequence alignment such that (a) the sequences are wrapped within pages, (b) the individual nucleotide letters remain visible (at least minutely), and (c) nucleotide differences compared to a consensus are highlighted. I wish to convert these alignments into PDF documents for subsequent printing.

Popular alignment viewers such as AliView or Geneious have their difficulties with such alignment visualizations. AliView cannot provide a wrapped layout, whereas Geneious hangs for large alignments (24 GB RAM insufficient for alignment of 20 bacterial genomes).

Do you have any software suggestions?

Note: One option may be to use a console-based alignment viewer (such as alan or alv), if the resulting visualizations could be passed to cups-pdf.

Edit 1: The R package msa was among my earlier attempts. While a wonderful tool for generating small, publication-ready visualizations, it too stalls once the alignment becomes too large. For example, an alignment of 20 bacterial genomes is not finished within two hours of executing the command myFirstAlignment <- msa(mySequences). Given that the pretty-printing in msa essentially wraps texshade, I would guess that the processing time with texshade would not be much smaller either.


I won't add this as an answer just yet since I'm not sure how well it would handle it either, but the other 2 options that occur to me are SeaView (which I think can save PDF or maybe PostScript), and ESPript.

I can't attest to how well either will deal with large data though.

Can I ask why it is you need such a large alignment 'printed'? It doesn't seem like it would be very useful.


When you say convert to PDF, do you want them to still be vector graphics? Or would bitmaps work too?

20 months ago
Berlin, Germany

I found that I can achieve what I was looking for using the command-line alignment viewer alv. It may not be super pretty, but it is fast:

alv myReallyLargeAlignment.fasta -t dna -k -w 300 | aha | wkhtmltopdf - soughtVisualization.pdf

In addition, the alignment viewer belvu from the SeqTools package can also generate visualizations quickly, although only the first 10 kbp or so are visualized (depending on the line-wrap setting).
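For anyone who wants more control over the layout than a ready-made viewer offers, the same idea (wrapped blocks, letters kept, differences to a consensus made visible) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how alv works internally: the function names are made up for this example, the input is assumed to be an already aligned FASTA with sequences of equal length, the consensus is a simple per-column majority vote (gaps count as ordinary characters), and positions that match the consensus are masked with dots rather than colored.

```python
from collections import Counter
from html import escape


def read_fasta(text):
    """Parse FASTA text into a list of (name, sequence) pairs."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            # Use the first word of the header as the label.
            name, chunks = (line[1:].split() or ["unnamed"])[0], []
        else:
            chunks.append(line.upper())
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def consensus(seqs):
    """Per-column majority vote; ties go to the residue seen first."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


def diff_rows(records):
    """Return rows in which residues equal to the consensus become '.'."""
    seqs = [seq for _, seq in records]
    if len({len(seq) for seq in seqs}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    cons = consensus(seqs)
    rows = [("consensus", cons)]
    for name, seq in records:
        masked = "".join("." if a == c else a for a, c in zip(seq, cons))
        rows.append((name, masked))
    return rows


def wrap_blocks(rows, width=100):
    """Yield text lines, wrapping all rows together in blocks of `width` columns."""
    label = max(len(name) for name, _ in rows) + 2
    length = len(rows[0][1])
    for start in range(0, length, width):
        yield f"{'':<{label}}{start + 1}"  # 1-based start column of the block
        for name, seq in rows:
            yield f"{name:<{label}}{seq[start:start + width]}"
        yield ""


def to_html(lines):
    """Wrap the text lines in a minimal monospaced HTML page."""
    body = "\n".join(escape(line) for line in lines)
    return f"<html><body><pre style='font-size:6pt'>{body}</pre></body></html>"


if __name__ == "__main__":
    demo = ">a\nACGTACGT\n>b\nACGAACGT\n>c\nACGTACCT\n"
    for line in wrap_blocks(diff_rows(read_fasta(demo)), width=4):
        print(line)
```

The string returned by to_html could be written to a file or piped into wkhtmltopdf in the same way as the aha output above. Everything is held in memory here, so for genome-sized alignments it would probably need to be adapted to stream block by block.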

2.1 years ago
thackl ♦ 2.6k
MIT

If you want PDF, there's a powerful LaTeX package: http://ftp.cvut.cz/tex-archive/macros/latex/contrib/texshade/texshade.pdf

And if you don't want to deal with TeX, there's an R interface, too: https://rdrr.io/bioc/msa/man/msaPrettyPrint.html


Beat me to it! TeXShade would be my suggestion. I used it extensively in my thesis. Just be aware that it can add some compile time to the document.


@jrj.healey Theoretically I would agree, but a compilation of a texshade section (with 150000+ bp) in a LaTeX document would take ages. I am looking for something more light-weight.


@thackl The R package msa was among my earlier attempts, but it is not a workable solution. It stalls when given a multi-sequence alignment of 20 or more bacterial genomes. Likewise, TeX documents with texshade sections would take ages to compile (if they compile at all) with such input alignments. In my experience, the only software tools that can open bacterial-sized alignments for visualization in reasonable time frames seem to be terminal/console-based tools (such as the ones mentioned above).


Ah, good to know. Haven't used it for alignments that big yet.
