How are SNP distance matrices made?
3.1 years ago
braun_tube ▴ 30

I used the CFSAN SNP Pipeline to generate a SNP distance matrix for my bacterial isolates using a reference sequence.

I am wondering how to interpret the output when my matrix tells me that two isolates have a genetic distance of 1 SNP. Surely this cannot mean that across the whole genome there is only one base where they differ. I know this because the reads for my isolates do not cover every single nucleotide in the genome. Are these SNPs then based on specific alleles? If so, how many different bases/alleles are used, and by what logic are they chosen?

If anyone could explain simply how these matrices are made it would be greatly appreciated!

matrix SNP distance CFSAN pipeline • 2.5k views
3.1 years ago

Surely this cannot mean that across the whole genome there is only one base where they differ. I know this because the reads for my isolates do not cover every single nucleotide in the genome.

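A SNP caller can call SNPs only in regions that contain data. If some regions are not covered, they will not be included in any SNP analysis, so a distance of 1 SNP means one difference among the positions that could be compared, not across every base of the genome.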
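
To make the idea concrete, here is a minimal sketch in Python of how a pairwise SNP distance could be counted from aligned allele strings. It is not taken from the CFSAN SNP Pipeline source. The isolate names, the toy sequences, and the choice to mark uncovered positions with "-" or "N" and skip them are all assumptions made for illustration; check the publication and the pipeline documentation for how missing data is actually handled.

```python
from itertools import combinations

# Assumed placeholders for positions with no usable call (hypothetical convention).
MISSING = {"-", "N"}


def snp_distance(seq_a, seq_b):
    """Count positions where both isolates have a call and the calls differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a not in MISSING and b not in MISSING and a != b
    )


def distance_matrix(alleles):
    """Build a symmetric pairwise distance matrix as a nested dict."""
    names = sorted(alleles)
    matrix = {name: {other: 0 for other in names} for name in names}
    for first, second in combinations(names, 2):
        d = snp_distance(alleles[first], alleles[second])
        matrix[first][second] = d
        matrix[second][first] = d
    return matrix


# Toy data: one column per candidate SNP position, "-" where an isolate has no coverage.
alleles = {
    "isolate_A": "ACGT-A",
    "isolate_B": "ACGATA",
    "isolate_C": "ATGTTG",
}

matrix = distance_matrix(alleles)
for name in sorted(matrix):
    print(name, [matrix[name][other] for other in sorted(matrix)])
```

In this toy example isolate_A and isolate_B end up 1 SNP apart even though isolate_A has no call at the fifth position: that position is simply left out of the comparison rather than counted as a difference.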

I would recommend consulting the original publication for more details.

https://peerj.com/articles/cs-20/

In addition, open your BAM alignments in IGV and you will gain a better understanding of how many SNPs you ought to have; no need for guessing.
