Hi! I am a beginner in bioinformatics so I apologize if I am not too clear on the terminology.
I am trying to call variants in some transcriptome data.
So far I have:
- mapped the reads to a reference genome using Bowtie2
- converted the .sam files into .bam files using samtools view
- sorted and indexed the .bam files using samtools
Next, I know I will use the mpileup function, but whenever I specify a region, the resulting VCF shows only this when I open it in Excel: contig= ID=NW_019351050.1,length=6341
When I do not specify a region, I still get that same contig line, but I also get records for other regions that look like this in Excel:
NW_019316579.1 613728 . C <*> 0 . DP=1;I16=0,1,0,0,40,1600,0,0,42,1764,0,0,25,625,0,0;QS=1,0;MQ0F=0 PL 0,3,40
I am a bit confused about what I am actually viewing. Any help would be appreciated! Thank you in advance.
The code I used for mpileup: samtools mpileup -v -r NW_019351050.1 -f genomic.fna sorted.bam > variant.vcf.gz
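For anyone reading along, the record pasted above follows the standard tab-separated VCF column layout, so it can be labeled field by field. Below is a minimal Python sketch of that, not part of the original workflow. The helper name `label_record` and the generic `SAMPLE` label for the last column are my own assumptions (in a real file the last column is named after the sample).

```python
# Column names fixed by the VCF format; "SAMPLE" is a placeholder label
# for the per-sample column (an assumption, real files use the sample name).
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT",
               "QUAL", "FILTER", "INFO", "FORMAT", "SAMPLE"]


def label_record(line):
    """Split one VCF data line into a dict keyed by column name (hypothetical helper)."""
    # Real VCF lines are tab-separated; split() on any whitespace also
    # copes with a line that was pasted with spaces instead of tabs.
    fields = line.split()
    record = dict(zip(VCF_COLUMNS, fields))
    # INFO is a ';'-separated list of KEY=VALUE pairs (flags have no '=').
    info = {}
    for item in record["INFO"].split(";"):
        key, _, value = item.partition("=")
        info[key] = value
    record["INFO"] = info
    return record


line = ("NW_019316579.1 613728 . C <*> 0 . "
        "DP=1;I16=0,1,0,0,40,1600,0,0,42,1764,0,0,25,625,0,0;QS=1,0;MQ0F=0 "
        "PL 0,3,40")
rec = label_record(line)
for name in VCF_COLUMNS:
    print(name, rec[name], sep="\t")
```

Read this way, the line says that one read (DP=1) covers position 613728 of NW_019316579.1, where the reference base is C. As far as I understand mpileup output, the `<*>` in the ALT column is a placeholder for any non-reference allele, so the line reports genotype likelihoods (PL) for a covered position rather than a called variant.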
What do you open in Excel? The compressed file variant.vcf.gz?
I gunzip the .vcf.gz file and then view the .vcf in Excel.
I almost forgot: don't use Excel.
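As one alternative to a spreadsheet, a VCF can be inspected as plain text. Here is a rough Python sketch, assuming the file is either plain or gzip/bgzip-compressed VCF. The function names `open_vcf` and `summarize_vcf` are hypothetical, made up for this illustration.

```python
import gzip
import sys


def open_vcf(path):
    """Open a VCF as text, whether or not it is gzip/bgzip compressed."""
    with open(path, "rb") as fh:
        magic = fh.read(2)
    # gzip (and bgzip) files start with the two bytes 0x1f 0x8b.
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt")
    return open(path, "rt")


def summarize_vcf(path):
    """Return (number of '##' metadata lines, column header, data records)."""
    meta_count, header, records = 0, None, []
    with open_vcf(path) as fh:
        for line in fh:
            line = line.rstrip("\n")
            if line.startswith("##"):
                # Metadata lines such as the contig lines.
                meta_count += 1
            elif line.startswith("#"):
                # The single '#CHROM ...' line naming the columns.
                header = line.lstrip("#").split("\t")
            elif line:
                # Data records, one per reported position.
                records.append(line.split("\t"))
    return meta_count, header, records


if __name__ == "__main__" and len(sys.argv) > 1:
    meta_count, header, records = summarize_vcf(sys.argv[1])
    print("metadata lines:", meta_count)
    print("columns:", header)
    print("data records:", len(records))
    for rec in records[:5]:
        print("\t".join(rec))
```

Saved as, say, `peek_vcf.py` (a made-up name), it could be run as `python peek_vcf.py variant.vcf.gz`. If it reports metadata lines but zero data records for the region, one possible explanation is that no reads cover that contig, though that is a guess that would need checking against the BAM file.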
And what version of samtools are you using?
I am using samtools 1.5