How to make interval list for GBS sequencing?
2
0
Entering edit mode
5.1 years ago
rimgubaev ▴ 330

Hello everyone! I am pretty new to GATK. And now I face the following problem: I got g.vcf files for my samples produced by HaplotypeCaller and then I want to consolidate GVCFs using GenomicsDBImport which requires interval list and then proceed to GenotypeGVCFs in order to make joint VCF. How can I make the interval list? My object is rapeseed (Brassica oleracea). Many thanks in advance.

GBS GATK4 • 2.7k views
ADD COMMENT
1
Entering edit mode

If I understand you correctly, you are analyzing your Genotyping-by-sequencing data with GATK? I think it might be prudent to use a reduced-representation analysis method such as STACKS (http://catchenlab.life.illinois.edu/stacks/) or ipyrad(https://ipyrad.readthedocs.io/) for this purpose.

ADD REPLY
0
Entering edit mode

Thanks, I will try! And why GATK could be inappropriate for genotyping-by-sequencing technique on your opinion?

ADD REPLY
0
Entering edit mode

I don't think it would be inappropriate for variant-calling per se, but I wonder about the down-stream analysis leading up to GATK variant calling. Others with more knowledge could pipe in if you make a separate thread or quickly Google pros and cons of reduced-representation analysis pipelines versus say the good-ol' GATK best practices. Presumably you have a reference genome that you like to get this far, so I would recommend the reference-based GBS analysis pipeline in STACKS.

ADD REPLY
1
Entering edit mode
5.1 years ago
rimgubaev ▴ 330

I tried to solve the problem in 2 ways:

1) Using CombineGVCFs which produced a combined vcf file:

samples=$(find . | sed 's/.\///' | grep -E 'g.vcf$' | sed 's/^/--variant /') # place sample paths into variable
path/to/gatk --java-options "-Xmx36G" CombineGVCFs \
       $(echo $samples) \
       -O path/to/combined.vcf \
       -R path/to/ref.fa

2) Using GenomicsDBImport (after making bed file for my reference) which produced a my_database folder:

samtools faidx ref.fna # make .fai file 
awk '{print $1 "\t0\t" $2}' ref.fna.fai > ref.fna.bed # make .bed file

samples=$(find . | sed 's/.\///' | grep -E 'g.vcf$' | sed 's/^/--variant /') # place sample paths into variable
path/to/gatk --java-options "-Xmx36G" GenomicsDBImport \
          $(echo $samples)\
          --genomicsdb-workspace-path my_database \
          --intervals path/to/ref.fna.bed

After that, I run GenotypeGVCFs using "my_database" or "combined.vcf"

 path/to/gatk --java-options "-Xmx36G" GenotypeGVCFs \
           -R path/to/ref.fna \
           -V gendb://path/to/my_database OR -V path/to/combined.vcf \
           -O path/to/genotypes.vcf

In both cases, I got the same number of SNPs in genotypes.vcf.

ADD COMMENT
0
Entering edit mode

Does this answer your question then?

ADD REPLY
0
Entering edit mode

Yes, it is exactly what I have looking for.

ADD REPLY
1
Entering edit mode
5.1 years ago

which requires interval list

for which option ? -L ? it is well described in https://software.broadinstitute.org/gatk/documentation/article?id=11009

ADD COMMENT
0
Entering edit mode

Thanks, I've already read this article and still do not understand where I can download interval list for the rapeseed? Or is it appropriate to produce interval list by myself using reference fasta and bed tools and then take it as an input?

ADD REPLY

Login before adding your answer.

Traffic: 2926 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6