Having Issues with Variant Calling
1
0
Entering edit mode
6.1 years ago
shola_aleru ▴ 10

Hi! I am a beginner in bioinformatics so I apologize if I am not too clear on the terminology.
I am trying to variant call some transcriptome data. So far I have:

  • mapped the reads to a refrence genome using Bowtie2
  • converted the .sam files into .bam files using samtools view
  • sorted and indexed the bamfiles using samtools

Next I know I will use the mpileup function but whenever I specify a region, the resulting vcf only gives me this when I open in excel: contig= ID=NW_019351050.1,length=6341

When I do not specify a region, I still get that same contig line but I get other regions that look like this in excel.

NW_019316579.1  613728  .   C   <*> 0   .   DP=1;I16=0,1,0,0,40,1600,0,0,42,1764,0,0,25,625,0,0;QS=1,0;MQ0F=0   PL  0,3,40

I am a bit confused on what I am actually viewing. Any help would be appreciated! Thank you in advance

The code I used for mpileup: samtools mpileup -v -r NW_019351050.1 -f genomic.fna sorted.bam > variant.vcf.gz

sequencing SNP samtools variant calling • 1.5k views
ADD COMMENT
0
Entering edit mode

what do you open in excel ? variant.vcf.gz ? the compressed file variant.vcf.gz ?

ADD REPLY
1
Entering edit mode

I gunzip the vcf.gz file and then I view the vcf in excel.

ADD REPLY
0
Entering edit mode

I almost forgot : don't use excel.

ADD REPLY
0
Entering edit mode

and what is the version of samtools ?

ADD REPLY
0
Entering edit mode

I am using samtools 1.5

ADD REPLY
1
Entering edit mode
6.1 years ago

samtools mpileup produces a raw vcf file that must be called with "bcftools call"

see http://www.htslib.org/workflow/

to convert your BAM file into genomic positions we first use mpileup to produce a BCF file that contains all of the locations in the genome. We use this information to call genotypes and reduce our list of sites to those found to be variant by passing this file into bcftools call.

You can do this using a pipe as shown here:

bcftools mpileup -Ou -f <ref.fa> <sample1.bam> <sample2.bam> <sample3.bam> | bcftools call -vmO z -o <study.vcf.gz>
ADD COMMENT
0
Entering edit mode

and do not use excel. Never.

ADD REPLY
0
Entering edit mode

What is the reason why to never use excel? I don't know any other way to see what I actually have.

ADD REPLY
0
Entering edit mode

What is the reason why to never use excel?

excel introduces error in gene names, excel prevents you from having good practices with the command line

I don't know any other way to see what I actually have.

gunzip -c  variant.vcf.gz   | more
ADD REPLY
1
Entering edit mode

Our less can read compressed files without needing a gunzip

ADD REPLY
0
Entering edit mode

Thank you!

Maybe I am not understanding it well but I followed that pipeline to get the vcf.gz file and then used the gunzip -c command, sent that to a file and the region I am interested in still looks like this :

contig= ID=NW_019351050.1,length=6341

ADD REPLY
1
Entering edit mode

Hello,

it is better to post every single command you use and also the first line of your resulting file, because "I followed that pipeline" and "looks like this" let so much space for a misinterpretation.

fin swimmer

ADD REPLY

Login before adding your answer.

Traffic: 2657 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6