SNP difference between isolates
1
0
Entering edit mode
5.7 years ago

Hello, I have two .vcf files containing annotated SNP lists obtained from SnpEff for two different isolates of tubercle bacilli. I now wish to compare the SNP differences between these two isolates. Are there any tools I can use for this task?

Thanks, Sanjay

SNP SNPeff Mycobacterium tuberculosis • 1.5k views

Hello sanjay.bpkihs,

"compare" is such an elastic word. Could you please explain in detail what your output should look like?

fin swimmer

5.7 years ago
Medhat 9.7k

You can use bcftools:

bcftools isec [OPTIONS] A.vcf.gz B.vcf.gz […]

Creates intersections, unions and complements of VCF files. Depending on the options, the program can output records from one (or more) files which have (or do not have) corresponding records with the same position in the other files.
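
If it helps to see what a site-level comparison amounts to, here is a minimal Python sketch of the same idea. It is not how bcftools isec is implemented, and it assumes two things not stated above: that a variant is identified by CHROM, POS, REF and ALT, and that a multiallelic record should be split into one key per ALT allele. The function names read_sites and compare_sites are made up for illustration.

```python
import gzip
from pathlib import Path


def read_sites(vcf_path):
    """Return the set of (CHROM, POS, REF, ALT) keys found in a VCF file."""
    path = Path(vcf_path)
    # Assumption: a ".gz" suffix means the file is gzip/bgzip compressed.
    opener = gzip.open if path.suffix == ".gz" else open
    sites = set()
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue  # skip header and blank lines
            chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
            # One key per ALT allele, so multiallelic records are
            # compared allele by allele (an assumption of this sketch).
            for allele in alt.split(","):
                sites.add((chrom, int(pos), ref, allele))
    return sites


def compare_sites(vcf_a, vcf_b):
    """Split the variants of two VCF files into private and shared sets."""
    a, b = read_sites(vcf_a), read_sites(vcf_b)
    return {
        "only_a": sorted(a - b),  # complement: present in A, absent from B
        "only_b": sorted(b - a),  # complement: present in B, absent from A
        "shared": sorted(a & b),  # intersection
    }
```

Calling compare_sites("A.vcf.gz", "B.vcf.gz") returns three sorted lists of keys. For real work bcftools isec is the better choice, since it handles indexing, allele normalisation choices and writes proper VCF output.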

Another tool for comparison is vcf-compare:

vcf-compare

Compares positions in two or more VCF files and outputs the numbers of positions contained in one but not the other files, two but not the other files, etc., which comes in handy when generating Venn diagrams. The script also computes numbers such as nonreference discordance rates (including multiallelic sites), compares actual sequence (useful when comparing indels), etc.

Or you can use bedtools intersect.

Lastly, there is CombineVariants from GATK.

And for statistics:

bcftools stats [OPTIONS] A.vcf.gz [B.vcf.gz]
Parses VCF or BCF and produces text file stats which is suitable for machine processing and can be plotted using plot-vcfstats. When two files are given, the program generates separate stats for intersection and the complements. By default only sites are compared, -s/-S must be given to include also sample columns. When one VCF file is specified on the command line, then stats by non-reference allele frequency, depth distribution, stats by quality and per-sample counts, singleton stats, etc. are printed. When two VCF files are given, then stats such as concordance (Genotype concordance by non-reference allele frequency, Genotype concordance by sample, Non-Reference Discordance) and correlation are also printed. Per-site discordance (PSD) is also printed in --verbose mode.
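
To give a rough feel for "separate stats for intersection and the complements", here is a small Python sketch that counts variants in each of the three partitions and splits the counts into SNPs and everything else. It does not reproduce the bcftools stats output format or its genotype concordance numbers. The helper names (parse_records, variant_type, summarize) are hypothetical, and the sketch assumes a variant is identified by CHROM, POS, REF and ALT.

```python
from collections import Counter

BASES = {"A", "C", "G", "T"}


def parse_records(vcf_text):
    """Yield (CHROM, POS, REF, ALT) for each ALT allele in VCF-formatted text."""
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank and header lines
        chrom, pos, _id, ref, alt = line.split("\t")[:5]
        for allele in alt.split(","):
            yield chrom, int(pos), ref, allele


def variant_type(ref, alt):
    """Return 'snp' for a single-base substitution, 'other' for anything else."""
    return "snp" if ref.upper() in BASES and alt.upper() in BASES else "other"


def summarize(text_a, text_b):
    """Count SNPs and other variants private to A, private to B, and shared."""
    a, b = set(parse_records(text_a)), set(parse_records(text_b))
    partitions = {"only_a": a - b, "only_b": b - a, "shared": a & b}
    return {
        name: Counter(variant_type(ref, alt) for _, _, ref, alt in keys)
        for name, keys in partitions.items()
    }
```

Passing the text of both VCF files, for example summarize(open("A.vcf").read(), open("B.vcf").read()), gives one Counter per partition. For two isolates, the "only_a" and "only_b" counts are the number of isolate-specific variants under the assumptions above.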
