Why does GATK RealignerTargetCreator output an empty .intervals file?
12.1 years ago
Chris ▴ 40

Hello, everyone! When I run RealignerTargetCreator, the first step of GATK local realignment, to create the .intervals file from a raw .BAM file (the .bai file, REF.fasta, REF.fai and REF.dict are all present), I get an empty .intervals file after hours of running, with no error reported. The command is:

java -jar /path/GenomeAnalysisTK.jar -T RealignerTargetCreator -R /home/chris/data/hg/hg19.fasta -I /home/chris/data/reorder.test.sorted.bam -o reorder.test.sorted.intervals

GATK runs with no errors until the process ends, but the output file is empty (0 bytes).

I also tried the sample files in the resources/ folder of the GATK distribution:

java -jar /path/GenomeAnalysisTK.jar -T RealignerTargetCreator -R resources/exampleFASTA.fasta -I resources/exampleBAM.bam -o example.intervals

The output is still empty.

Has anyone run into this problem? I am new to GATK and would be grateful if someone could tell me why this happens and how I can solve it.

chris@chris-OptiPlex-780:~/install/GenomeAnalysisTK-1.5-9-ga05a7f2$ java -jar GenomeAnalysisTK.jar -I /home/chris/data/rat_rel65_MT_validated.bam -R /home/chris/data/rat_rel65_MT_validated.fasta -T RealignerTargetCreator -o 123456.intervals

INFO  17:17:14,667 HelpFormatter - --------------------------------------------------------------------------------
INFO  17:17:14,686 HelpFormatter - The Genome Analysis Toolkit (GATK) v1.5-9-ga05a7f2,   Compiled 2012/03/17 00:05:08 
INFO  17:17:14,686 HelpFormatter - Copyright (c) 2010 The Broad Institute 
INFO  17:17:14,686 HelpFormatter - Please view our documentation at    http://www.broadinstitute.org/gsa/wiki 
INFO  17:17:14,686 HelpFormatter - For support, please view our support site at http://getsatisfaction.com/gsa 
INFO  17:17:14,687 HelpFormatter - Program Args: -I /home/chris/data/rat_rel65_MT_validated.bam -R /home/chris/data/rat_rel65_MT_validated.fasta -T RealignerTargetCreator -o 123456.intervals 
INFO  17:17:14,687 HelpFormatter - Date/Time: 2012/04/07 17:17:14 
INFO  17:17:14,687 HelpFormatter - --------------------------------------------------------------------------------
INFO  17:17:14,688 HelpFormatter - -------------------------------------------------------------------------------- 
INFO  17:17:14,703 GenomeAnalysisEngine - Strictness is SILENT 
INFO  17:17:14,882 SAMDataSource$SAMReaders - Initializing SAMRecords in serial 
INFO  17:17:14,976 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.08 
INFO  17:17:16,658 TraversalEngine - [INITIALIZATION COMPLETE; TRAVERSAL STARTING] 
INFO  17:17:16,659 TraversalEngine -        Location processed.sites  runtime per.1M.sites completed total.runtime remaining 
INFO  17:17:46,578 TraversalEngine -  chr10:17409001        1.74e+07   30.0 s        1.7 s      0.6%        78.1 m    77.6 m 
INFO  17:18:16,762 TraversalEngine -  chr10:37666001        3.77e+07   60.2 s        1.6 s      1.4%        72.4 m    71.4 m 
INFO  17:18:46,763 TraversalEngine -  chr10:54320001        5.43e+07   90.2 s        1.7 s      2.0%        75.2 m    73.7 m 
INFO  17:19:17,276 TraversalEngine -  chr10:69417001        6.94e+07    2.0 m        1.7 s      2.6%        78.8 m    76.8 m 
INFO  17:19:47,533 TraversalEngine -  chr10:70190001        7.02e+07    2.5 m        2.2 s      2.6%        97.5 m    94.9 m 
INFO  17:20:17,534 TraversalEngine -  chr10:90199001        9.02e+07    3.0 m        2.0 s      3.3%        90.9 m    87.9 m 
..............................................
..............................................
INFO  18:03:31,900 TraversalEngine -   chr9:93072115        2.54e+09   71.9 m        1.7 s     93.3%        77.0 m     5.1 m 
INFO  18:04:01,912 TraversalEngine -  chr9:109658115        2.55e+09   72.4 m        1.7 s     93.9%        77.1 m     4.7 m 
INFO  18:04:09,814 TraversalEngine - Total runtime 4352.99 secs, 72.55 min, 1.21 hours 
INFO  18:04:09,814 TraversalEngine - 180568 reads were filtered out during traversal out of 26944143 total (0.67%) 
INFO  18:04:09,815 TraversalEngine -   -> 180568 reads (0.67% of total) failing MappingQualityZeroFilter 
INFO  18:04:16,423 GATKRunReport - Uploaded run statistics report to AWS S3

As the log above shows, another dataset still gave an empty .intervals file.

I am so sad!

Thank you!

12.1 years ago
Johan ▴ 890

RealignerTargetCreator needs a set of known indels to realign against. This is set with the "--known" option. The indels can come either from an external source such as dbSNP or from your own raw indel calling.

Here is a link explaining it in more detail: http://www.broadinstitute.org/gsa/wiki/index.php/Local_realignment_around_indels.

Hope this helps. :)


Thanks a lot! Actually, I have no dbSNP file or any other file that the --known argument needs, and --known is optional. Even without it I should get the right output, am I right?


My guess is that you will only get intervals for realignment if the walker detects a region that is in need of realignment. You might try playing around with the rest of the parameters as described here: http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_gatk_walkers_indels_RealignerTargetCreator.html. Not having seen your data, it's difficult to say whether there is a need for realignment or not.
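One rough way to test this guess is to count how many reads in the BAM already carry an insertion or deletion in their CIGAR string. If that number is zero (or tiny), an empty .intervals file would not be surprising when no known indels are supplied. This is only a sketch: the helper name `count_indel_reads` is made up for illustration, and it assumes samtools is installed to dump the reads as SAM text.

```shell
# count_indel_reads: hypothetical helper, not part of GATK or samtools.
# Reads SAM records on stdin, skips header lines (those starting with "@"),
# and counts the records whose CIGAR field (column 6) contains I or D.
count_indel_reads() {
  awk -F'\t' '!/^@/ && $6 ~ /[ID]/ { n++ } END { print n + 0 }'
}

# Usage (assumes samtools is on your PATH; the path is the one from the question):
#   samtools view /home/chris/data/reorder.test.sorted.bam | count_indel_reads
```

A count of 0 does not prove the BAM is fine or broken; it only suggests the tool has no indel-containing reads from which to derive targets.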


By the way, I now see that your analysis took a really long time to run. If you can give the Java VM more memory by adding an -Xmx flag, e.g. java -Xmx4g [-jar GenomeAnalysisTK.jar etc...], it might run faster.
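Spelled out in full, the suggestion would look roughly like the sketch below. The paths are the ones from the question, the 4g heap size is just the example value from this comment, and the function name `rtc_cmd` is made up; it only prints the command so you can inspect it before running. Note that -Xmx has to come before -jar, because it is an option for the JVM and not for GATK.

```shell
# Placeholders taken from the original question; adjust to your own files.
GATK_JAR=/path/GenomeAnalysisTK.jar
REF=/home/chris/data/hg/hg19.fasta
BAM=/home/chris/data/reorder.test.sorted.bam
OUT=reorder.test.sorted.intervals

# rtc_cmd: hypothetical helper that prints the command line.
# $1 = heap size handed to the JVM via -Xmx (for example 4g).
rtc_cmd() {
  printf 'java -Xmx%s -jar %s -T RealignerTargetCreator -R %s -I %s -o %s\n' \
    "$1" "$GATK_JAR" "$REF" "$BAM" "$OUT"
}

# Print the command for inspection; copy it into the terminal to run it.
rtc_cmd 4g
```

Whether more heap actually shortens the run depends on the machine and the data, so treat it as something to try, not a guaranteed fix.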


Thanks! You mean my .BAM file may not need to be realigned? But what about the sample data? The result is still empty. What result do you get for the sample?


I get the same result as you for the sample file with the same settings.


From the docs:

Fully local realignment uses mismatching bases to determine if a site should be realigned, and relies on sufficient coverage to discover the correct indel allele in the reads for alignment. It is much slower (involves SW step) but can discover new indel sites in the reads. If you have a database of known indels (for human, this database is extensive) then at this stage you would also include these indels during realignment, which vastly improves sensitivity, specificity, and speed.


My interpretation of this is that you either include previously known indels, or you will have to change the --mismatchFraction parameter to get it to realign regions where indels might have messed up your raw alignments.
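As a sketch of what those two options could look like on the command line: the value 0.15 and the file name known_indels.vcf below are purely illustrative assumptions, not recommended settings, and `rtc_tuned_cmd` is a made-up helper that only prints the command. Only --known and --mismatchFraction, the two options mentioned in this thread, are used.

```shell
# Placeholders; adjust to your own files.
GATK_JAR=/path/GenomeAnalysisTK.jar
REF=ref.fasta
BAM=input.bam
OUT=output.intervals

# rtc_tuned_cmd: hypothetical helper that prints the command line.
# $1 = value for --mismatchFraction (illustrative, not a recommendation)
# $2 = optional VCF of known indels, passed with --known
rtc_tuned_cmd() {
  cmd="java -jar $GATK_JAR -T RealignerTargetCreator -R $REF -I $BAM -o $OUT --mismatchFraction $1"
  if [ -n "${2:-}" ]; then
    cmd="$cmd --known $2"
  fi
  printf '%s\n' "$cmd"
}

# Mismatch-based detection only:
#   rtc_tuned_cmd 0.15
# Mismatch-based detection plus a (hypothetical) known-indel file:
#   rtc_tuned_cmd 0.15 known_indels.vcf
```

Check the RealignerTargetCreator documentation for your GATK version before relying on either option, since defaults and accepted file formats may differ between releases.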


Thanks for the timely reply, and sorry for my delay. I downloaded the data, hg19.20.bam and its fasta file, from the GATK resource bundle b37 to check the approach. In the end I got a very good result, the same as the given one, and the indels can be called. Also, the official reply to me said that the problem may be caused by my .BAM file. I think so too.


Hi Chris, I have the same problem. How did you fix it? With additional indel files? Could you post your code for it? And what do you mean when you say it may be caused by the .BAM file? How can I check whether there are problems in a BAM file?


Hi C Shao, I got the same problem: an empty result file from RealignerTargetCreator. How did you fix it?
