Question

How are SNP distance matrices made?

3.1 years ago

braun_tube ▴ 30

I used the CFSAN SNP Pipeline to generate a SNP distance matrix for my bacterial isolates using a reference sequence.

I am wondering how to interpret the output when my matrix tells me that two isolates have a genetic distance of 1 SNP. Surely this cannot mean that across the whole genome there is only one base where they differ. I know this because the reads for my isolates do not cover every single nucleotide in the genome. Are these SNPs instead based on specific alleles? If so, how many different bases/alleles are used, and by what logic are they chosen?

If anyone could explain simply how these matrices are made it would be greatly appreciated!

matrix SNP distance CFSAN pipeline • 2.5k views

updated 3.1 years ago by Istvan Albert 100k • written 3.1 years ago by braun_tube ▴ 30

score 0 · Answer 1 · 2021-03-31

Surely this cannot mean that across the whole genome there is only one base where they differ. I know this because the reads for my isolates do not cover every single nucleotide in the genome.

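A SNP caller can call SNPs only over the regions that contain data. If some regions are not covered, those will not be included in any SNP analysis. A distance of 1 SNP therefore means one difference among the positions that could actually be compared, not across every base of the genome.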
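To make the idea concrete, here is a minimal sketch of a pairwise SNP distance that skips positions without data. It is not the CFSAN SNP Pipeline's own code. It assumes each isolate has already been reduced to an aligned string with one character per variable site, and that `-` or `N` marks a site with no call. The function names and the example isolates are made up for illustration.

```python
from itertools import combinations

MISSING = {"-", "N"}  # assumed symbols for "no call" at a site


def snp_distance(seq_a, seq_b):
    """Count sites where both isolates have a base call and the calls differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a not in MISSING and b not in MISSING and a != b
    )


def distance_matrix(isolates):
    """Build a symmetric dict-of-dicts distance matrix from {name: sequence}."""
    matrix = {name: {name: 0} for name in isolates}
    for a, b in combinations(isolates, 2):
        d = snp_distance(isolates[a], isolates[b])
        matrix[a][b] = d
        matrix[b][a] = d
    return matrix


# Hypothetical data: one column per variable site found against the reference.
isolates = {
    "isolate_1": "ACGTA",
    "isolate_2": "ACGTC",
    "isolate_3": "A-GGC",  # second site has no call, so it is never compared
}
matrix = distance_matrix(isolates)
```

In this toy input, `isolate_1` and `isolate_2` differ at one compared site, so their distance is 1 even though nothing is known about the bases outside the listed sites.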

I would recommend consulting the original publication for more details.

https://peerj.com/articles/cs-20/

In addition, open your BAM alignments in IGV; you will gain a better understanding of how many SNPs you ought to have, with no need for guessing.
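If you also want a number for how much of the reference your reads cover, a rough sketch like the one below may help alongside IGV. It assumes you have three-column, tab-separated depth lines (contig, position, depth) such as those written by `samtools depth -a`. The function name and the `min_depth` threshold are hypothetical choices, not part of any pipeline.

```python
def covered_fraction(depth_lines, min_depth=1):
    """Return the fraction of listed positions with depth >= min_depth."""
    total = 0
    covered = 0
    for line in depth_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        _contig, _pos, depth = line.split("\t")[:3]
        total += 1
        if int(depth) >= min_depth:
            covered += 1
    return covered / total if total else 0.0


# Hypothetical depth lines for a four-base reference.
example = ["ref\t1\t12", "ref\t2\t0", "ref\t3\t7", "ref\t4\t0"]
fraction = covered_fraction(example)
```

With a real file you could pass the open file handle in place of the `example` list, since the function only iterates over lines.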