Tutorial: Mini-tutorial: How to fix too-long sequence names in SyMAP
8.9 years ago

SyMAP is my go-to tool when I want a multiple sequence alignment as a neat dotplot. It is especially useful for comparing the genomes of two cultivars of the same species.

However, it loads the names of all chromosomes/contigs/sequences directly into the dotplot, so relatively long names overlap with the plot, making it ugly and practically unreadable. I used to fix this by changing the chromosome names in the FASTA files and then re-running the alignment, which can easily take a day or two, but tonight I found out how to fix it directly. I'm putting this here for the next poor soul who doesn't want to wait that long.

What you need:

  • A working MySQL client installation (sudo apt-get install mysql-client)

I assume that you're letting SyMAP use its own MySQL database that comes with the installation (if you use a system-wide one, you'll have to get the socket path, username and password for it). Back up your SyMAP folder somewhere, then start SyMAP as you usually do.
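A minimal sketch of the backup step, assuming a POSIX shell and that SyMAP and its bundled MySQL server are not running yet. The function name backup_symap and the timestamped destination name are my own choices, not part of SyMAP, and /path/to/symap is a placeholder.

```shell
# Copy a SyMAP installation to a timestamped sibling directory and print
# the path of the copy. Hypothetical helper, not shipped with SyMAP.
# Run it while SyMAP (and its bundled MySQL server) is stopped.
backup_symap() {
    src=${1%/}                                    # strip a trailing slash, if any
    dest="${src}.backup.$(date +%Y%m%d-%H%M%S)"   # e.g. symap.backup.<date>-<time>
    cp -a "$src" "$dest" && printf '%s\n' "$dest"
}

# Usage (the path is a placeholder):
#   backup_symap /path/to/symap
```

If an edit goes wrong, stopping SyMAP and copying the backup over the installation should bring the old names back.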

You'll need to find the socket file of the MySQL database; it's usually at /path/to/symap/mysql/data/mysql.sock. SyMAP's startup message gives hints about its location.

Then, you can connect to the database like this:

mysql --socket /path/to/symap/mysql/data/mysql.sock --user admin

(The username was hidden in ./mysql/data/mysql/user.MYI)

You will be greeted by something like:

Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MySQL connection id is 12
Server version: 5.0.51a MySQL Community Server (GPL)

Copyright (c) 2000, 2015, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MySQL [(none)]>

We first have to change to the symap database:

MySQL [(none)]> use symap;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
MySQL [symap]>

The names of all sequences are in the table symap.groups, so run:

MySQL [symap]> select * from groups;

You'll see the short and full names of all of your chromosomes; now it's just a matter of replacing them. As far as I can tell, SyMAP ignores the "fullname" column when drawing the dotplots and uses only the "name" column, so the "fullname" column can stay as it is.

To replace the short name of the first chromosome, which has "idx" = 22 here (the actual number depends on your dataset), run:

MySQL [symap]> UPDATE groups SET name = "A1" WHERE idx = 22;

Do that for all of your chromosomes, run "COMMIT;" for safety's sake (in my case everything was saved without it), then exit MySQL and restart SyMAP. Your dotplot should look better now.
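With many chromosomes, typing one UPDATE per row gets tedious. Here is a rough sketch that generates the statements from a tab-separated list of "idx" and new short name. The function name make_rename_sql and the input format are my own invention. Only the "groups" table and its "idx" and "name" columns come from the steps above. Names are limited to letters, digits, "_", "." and "-" so that nothing odd ends up inside the SQL.

```shell
# Read "idx<TAB>new_name" lines on stdin and print one UPDATE statement
# per valid line on stdout. Hypothetical helper, not part of SyMAP.
make_rename_sql() {
    awk -F '\t' '
        NF == 0 { next }                          # ignore empty lines
        $1 ~ /^[0-9]+$/ && $2 ~ /^[A-Za-z0-9_.-]+$/ {
            printf "UPDATE groups SET name = \"%s\" WHERE idx = %s;\n", $2, $1
            next
        }
        # anything else is reported on stderr and skipped
        { printf "skipping line %d: %s\n", NR, $0 > "/dev/stderr" }
    '
}

# Usage: look at the statements first, then pipe them into the client.
#   printf '22\tA1\n23\tA2\n' | make_rename_sql
#   printf '22\tA1\n23\tA2\n' | make_rename_sql |
#       mysql --socket /path/to/symap/mysql/data/mysql.sock --user admin symap
```

Printing the statements before piping them into mysql makes it easy to check that every "idx" points at the row you meant.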

Edit, 5 months later:

The nice thing about sharing your procedures and code bits is that you can afford to lose them in your personal notes, as happened to me today.

I had to redo this and also found out how to change the order of the chromosomes in the dotplot. In my case, the unplaced contigs were displayed in the first row and column, but I wanted them in the last row and column. The order is stored in the same "groups" table that we changed above; all you have to do is change the number in the "sort_order" column.

So if your chromosome 1 (with idx = 1 in the groups table) is in the first position, but you want it in the last position and you have 20 chromosomes, run:

MySQL [symap]> UPDATE groups SET sort_order = 20 WHERE idx = 1;

Make sure that you change the sort_order value of the "partner" chromosome as well, otherwise the order ends up out of whack.
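One way to avoid forgetting the second update is to generate both statements together. The sketch below assumes that the "partner" is the chromosome currently sitting at the position you want to move to. If your partner is a different row, adjust the arguments accordingly. The function name swap_sort_order_sql is hypothetical, and the row with idx = 20 in the usage line is made up for illustration.

```shell
# Print two UPDATE statements that exchange the sort_order values of two
# rows in the groups table. Hypothetical helper, not part of SyMAP.
# Arguments (all current values): idx_a pos_a idx_b pos_b
swap_sort_order_sql() {
    if [ "$#" -ne 4 ]; then
        echo "usage: swap_sort_order_sql idx_a pos_a idx_b pos_b" >&2
        return 1
    fi
    for n in "$@"; do
        case $n in
            ''|*[!0-9]*) echo "not a plain integer: $n" >&2; return 1 ;;
        esac
    done
    # row A takes the position of row B, and row B takes the old position of A
    printf 'UPDATE groups SET sort_order = %s WHERE idx = %s;\n' "$4" "$1"
    printf 'UPDATE groups SET sort_order = %s WHERE idx = %s;\n' "$2" "$3"
}

# Usage: move idx 1 from position 1 to 20, and idx 20 from position 20 to 1.
#   swap_sort_order_sql 1 1 20 20
```

The output can be pasted into the MySQL prompt or piped into the client in the same way as the UPDATE statements for the names.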
