How to plot SNPs along a chromosome?
6.9 years ago

Hi! I have several SNPs called from reads of different S. cerevisiae strains aligned to an assembly I made.

Given the positions of the SNPs in the contigs of the alignment, I'm now trying to plot the distribution of found variants along each chromosome.
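Once chromosome coordinates are available, the distribution itself can be summarised as SNP counts per fixed-size window. The following is only a minimal sketch: the function name `snp_density` and the 10 kb default window are illustrative choices, not something taken from MUMmer or from the VCF file.

```python
from collections import Counter


def snp_density(positions, window=10_000):
    """Count SNPs per fixed-size window along one chromosome.

    positions: iterable of 1-based SNP positions on a single chromosome.
    Returns a dict mapping each window's 1-based start to its SNP count,
    ordered by window start. Windows without SNPs are omitted.
    """
    counts = Counter((p - 1) // window for p in positions)
    return {b * window + 1: n for b, n in sorted(counts.items())}
```

The resulting window starts and counts can be passed to any plotting tool as x and y values, one plot per chromosome.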

To do this, I used the MUMmer3 package (http://mummer.sourceforge.net/) and tried two strategies, both unsuccessful:

1) I ran the nucmer script to align my assembly to the latest version of the S. cerevisiae genome. I then used show-tiling to select the best alignments and show how the contigs map along the chromosomes.

The position of the SNP in the chromosome should be equal to:

position in the contig (given in the .vcf file) + start of the alignment in the reference (from the show-tiling output) - start of the alignment in the contig (NOT FOUND :( so this approach got stuck).

The start of the alignment in each contig can be found with show-coords, but I couldn't match the show-coords output to the show-tiling output.

show-coords output (first lines):

[S1]     [E1]  |     [S2]     [E2]  |  [LEN 1]  [LEN 2]  |  [% IDY]  | [TAGS]
=====================================================================================
       1     1612  |     1601        1  |     1612     1601  |    94.79  | ref|Chromosome_10|   NODE_167_length_2074_cov_613.985
    2536     4055  |        1     1495  |     1520     1495  |    97.43  | ref|Chromosome_10|   NODE_159_length_2510_cov_954.567
    3880     6463  |    73172    70640  |     2584     2533  |    92.82  | ref|Chromosome_10|   NODE_53_length_74700_cov_58.4878
    4528     5367  |      875      129  |      840      747  |    88.57  | ref|Chromosome_10|   NODE_152_length_3038_cov_63.6977
    5421     6274  |     1671     2510  |      854      840  |    97.54  | ref|Chromosome_10|   NODE_159_length_2510_cov_954.567
    7116     7347  |     7214     6987  |      232      228  |    97.84  | ref|Chromosome_10|   NODE_47_length_85444_cov_662.196
    7345     9938  |      312     2856  |     2594     2545  |    91.31  | ref|Chromosome_10|   NODE_124_length_7594_cov_56.6321
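The formula above can be sketched directly against show-coords rows like these. This is a hedged illustration, not a finished tool: `CoordsRow`, `parse_coords_line` and `contig_to_ref` are hypothetical names, the parser assumes the column layout shown above, and the conversion ignores indels inside the alignment, so positions can be off by a few bases. It treats rows where S2 > E2 as reverse-strand alignments, which means the offset has to be subtracted from the contig side instead of added.

```python
from typing import NamedTuple, Optional


class CoordsRow(NamedTuple):
    s1: int      # alignment start in the reference
    e1: int      # alignment end in the reference
    s2: int      # alignment start in the contig
    e2: int      # alignment end in the contig (smaller than s2 on the reverse strand)
    ref: str
    contig: str


def parse_coords_line(line: str) -> CoordsRow:
    # Split on whitespace and drop the standalone '|' column separators.
    # Reference names such as 'ref|Chromosome_10|' stay intact because
    # their pipes are not surrounded by whitespace.
    tokens = [t for t in line.split() if t != "|"]
    s1, e1, s2, e2 = (int(t) for t in tokens[:4])
    return CoordsRow(s1, e1, s2, e2, tokens[-2], tokens[-1])


def contig_to_ref(row: CoordsRow, pos: int) -> Optional[int]:
    """Approximate reference position of a 1-based contig position.

    Returns None if pos lies outside the aligned part of the contig.
    Indels inside the alignment are ignored (assumption).
    """
    lo, hi = sorted((row.s2, row.e2))
    if not lo <= pos <= hi:
        return None
    if row.s2 <= row.e2:                 # forward strand
        return row.s1 + (pos - row.s2)
    return row.s1 + (row.s2 - pos)       # reverse strand
```

For the second row above (forward strand, S1 = 2536, S2 = 1), contig position 100 would map to roughly 2536 + 99 = 2635 on Chromosome_10.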

show-tiling output (first lines):

>ref|Chromosome_10| 758181 bases
8153    153910  2170    145758  88.61   97.43   -       NODE_25_length_145758_cov_50.3642
156081  157189  -74     1109    100.00  98.32   +       NODE_190_length_1109_cov_97.0845
157116  201749  5933    44634   97.56   97.92   +       NODE_82_length_44634_cov_52.1978
207683  518677  31798   310995  85.94   97.51   +       NODE_5_length_310995_cov_51.7096
550476  711480  2901    161005  99.76   97.93   -       NODE_21_length_161005_cov_48.549
714382  735009  732     20628   98.00   97.13   +       NODE_109_length_20628_cov_71.3879
735742  740637  15913   4896    85.46   98.21   +       NODE_138_length_4896_cov_111.069
756551  757564  617     1014    99.90   97.18   +       NODE_195_length_1014_cov_337.505
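In the rows above, end - start + 1 equals the contig length in every case (for example 157189 - 156081 + 1 = 1109), which suggests that show-tiling reports where the whole contig sits on the reference, not just the aligned part. Under that reading, and assuming the columns are reference start, reference end, gap, contig length, coverage, identity, orientation and contig ID, a contig position could be projected as sketched below. `parse_tiling` and `tiled_contig_to_ref` are hypothetical names, and the projection treats each contig as colinear with the reference, so it is only approximate.

```python
def parse_tiling(text):
    """Parse show-tiling output into {contig: (ref, start, end, orientation)}."""
    placements = {}
    ref = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header such as '>ref|Chromosome_10| 758181 bases'
            ref = line[1:].split()[0]
            continue
        fields = line.split()
        start, end = int(fields[0]), int(fields[1])
        orientation, contig = fields[6], fields[7]
        placements[contig] = (ref, start, end, orientation)
    return placements


def tiled_contig_to_ref(placement, pos):
    """Project a 1-based contig position onto the reference.

    Assumes the tiling start/end span the full contig and that the
    contig is colinear with the reference (no indels, no rearrangements).
    """
    ref, start, end, orientation = placement
    if orientation == "+":
        return ref, start + pos - 1
    return ref, end - pos + 1      # '-' contigs run right to left
```

Contigs that show-tiling leaves out of the tiling get no placement this way, so their SNPs would have to be handled separately.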

2) I built a pseudomolecule with the show-tiling -p option and called the SNPs against it, which gave me the positions. However, when I ran gene prediction on the pseudomolecule I obtained only ~1300 genes out of the 5600 in this yeast (which were successfully predicted from my original assembly).

So, am I missing something? Is there a better way to do all this?

Thank you!


Given the positions of the SNPs in the contigs of the alignment,

What is this file? Does it contain CHROM and POS?

I'm now trying to plot the distribution of found variants along each chromosome.

If so, why do you need the other steps (MUMmer3, etc.)?


Because the contigs of my assembly do not correspond to the chromosomes themselves. I have ~200 contigs and 17 chromosomes.

The VCF file locates the variants in the contigs, and I was trying to use MUMmer3 to map them onto the real chromosomes.

6.9 years ago

Perhaps an alternative method might be helpful: first map the contigs/scaffolds to a related species, then align the other data and call SNPs.

There are a few programs for this. I have used the first two semi-successfully (but with big, complex plant genomes, so you should do better).

https://github.com/gtamazian/chromosomer

https://github.com/combogenomics/medusa

https://github.com/fenderglass/Ragout
