Biostar Beta. Not for public use.
Number of snps are very low
0
Entering edit mode
21 months ago
waqasnayab • 180
Pakistan

Hi,

Dear All,

I have WGS data of one healthy individual. I want to make a VCF file. I am running the following pipeline but the number of SNPs are low, only in thousands. I have a big bam file (256GB) with depth of 38X.

java -Xmx32g -Djava.io.tmpdir=java_tmp -XX:MaxPermSize=512m -XX:-UseGCOverheadLimit -jar /home/lifescope/Desktop/picard-1.133/dist/picard.jar MarkDuplicates MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=16000 REMOVE_DUPLICATES=true INPUT=gcxall_sort_mapped.bam OUTPUT=gcxall_sort_mapped_dedup.bam METRICS_FILE=gcx.mat ASSUME_SORTED=true

nohup java -Xmx32g -Djava.io.tmpdir=java_tmp -XX:MaxPermSize=512m -XX:-UseGCOverheadLimit -jar /home/lifescope/Desktop/picard-1.133/dist/picard.jar AddOrReplaceReadGroups INPUT=gcxall_sort_mapped_dedup.bam OUTPUT=gcxall_sort_mapped_dedup_crctd.bam SORT_ORDER=coordinate RGLB=MP RGPL=SOLID RGPU=NOBARCODE RGSM=GCXI

java -Xmx32g -Djava.io.tmpdir=java_tmp -XX:MaxPermSize=512m -XX:-UseGCOverheadLimit -jar /home/lifescope/Desktop/picard-1.133/dist/picard.jar CreateSequenceDictionary REFERENCE=new_newhg38.fa OUTPUT=new_newhg38.dict

java -Xmx32g -Djava.io.tmpdir=java_tmp -XX:MaxPermSize=512m -XX:-UseGCOverheadLimit -jar /home/lifescope/Desktop/picard-1.133/dist/picard.jar ReorderSam I=gcxall_sort_mapped_dedup_crctd.bam O=gcxall_sort_mapped_dedup_crctd_rordr.bam REFERENCE=new_newhg38.fa

/home/lifescope/Desktop/usr/java/jdk1.7.0_79/bin/java -Xmx32g -Djava.io.tmpdir=java_tmp -XX:MaxPermSize=512m -XX:-UseGCOverheadLimit -jar /home/lifescope/Desktop/GenomeAnalysisTK.jar -T RealignerTargetCreator -R /scratch/gcxi_all_bam/new_newhg38.fa -I gcxall_sort_mapped_dedup_crctd_rordr.bam --known Mills_and_1000G_gold_standard.indels.hg19.sites.vcf --known 1000G_phase1.indels.hg19.sites.vcf -o gcxall_sort_mapped_dedup_crctd_rordr.intervals --defaultBaseQualities 35

/home/lifescope/Desktop/usr/java/jdk1.7.0_79/bin/java -Xmx32g -Djava.io.tmpdir=java_tmp -XX:MaxPermSize=512m -XX:-UseGCOverheadLimit -jar /home/lifescope/Desktop/GenomeAnalysisTK.jar -T IndelRealigner -R /scratch/gcxi_all_bam/new_newhg38.fa -I gcxall_sort_mapped_dedup_crctd_rordr.bam -targetIntervals gcxall_sort_mapped_dedup_crctd_rordr.intervals -o gcxall_sort_mapped_dedup_crctd_rordr_interval.bam -known Mills_and_1000G_gold_standard.indels.hg19.sites.vcf -known 1000G_phase1.indels.hg19.sites.vcf -l INFO --defaultBaseQualities 35

/home/lifescope/Desktop/usr/java/jdk1.7.0_79/bin/java -Xmx32g -Djava.io.tmpdir=java_tmp -XX:MaxPermSize=512m -XX:-UseGCOverheadLimit -jar /home/lifescope/Desktop/GenomeAnalysisTK.jar -T BaseRecalibrator -R /scratch/gcxi_all_bam/new_newhg38.fa -I gcxall_sort_mapped_dedup_crctd_rordr_interval.bam -knownSites 1000G_phase1.indels.hg19.sites.vcf -knownSites Mills_and_1000G_gold_standard.indels.hg19.sites.vcf -knownSites dbsnp_138.hg19_out.vcf -o gcxi_recal_data.table --covariate QualityScoreCovariate --covariate ReadGroupCovariate --covariate ContextCovariate --covariate CycleCovariate --solid_nocall_strategy PURGE_READ --solid_recal_mode SET_Q_ZERO_BASE_N

/home/lifescope/Desktop/usr/java/jdk1.7.0_79/bin/java -Xmx32g -Djava.io.tmpdir=java_tmp -XX:MaxPermSize=512m -XX:-UseGCOverheadLimit -jar /home/lifescope/Desktop/GenomeAnalysisTK.jar -T PrintReads -R /scratch/gcxi_all_bam/new_newhg38.fa -I gcxall_sort_mapped_dedup_crctd_rordr_interval.bam -BQSR gcxi_recal_data.table -o gcxall_sort_mapped_dedup_crctd_rordr_interval_recal.bam -allowPotentiallyMisencodedQuals

/home/lifescope/Desktop/usr/java/jdk1.7.0_79/bin/java -Xmx32g -Djava.io.tmpdir=java_tmp -XX:MaxPermSize=512m -XX:-UseGCOverheadLimit -jar /home/lifescope/Desktop/GenomeAnalysisTK.jar -T UnifiedGenotyper -glm BOTH -R /scratch/gcxi_all_bam/new_newhg38.fa -I gcxall_sort_mapped_dedup_crctd_rordr_interval_recal.bam -o raw_variants.vcf -D dbsnp_138.hg19_out.vcf -allowPotentiallyMisencodedQuals

Kindly help me in this regard, what I am missing?

Thanks.

SNP VCF • 1.1k views
ADD COMMENTlink
2
Entering edit mode
Looks like you are mapping to hg38 and recalibrating using hg19 snps


-R /scratch/gcxi_all_bam/new_newhg38.fa 
--known Mills_and_1000G_gold_standard.indels.hg19.sites.vcf
ADD REPLYlink
0
Entering edit mode

Are you talking about novel SNPs or all SNPs detected (known + unknown)? The only thing I can suggest is trying a different tool for SNP detection such as FreeBayes

ADD REPLYlink

Login before adding your answer.

Similar Posts
Loading Similar Posts
Powered by the version 2.3.1