GATK HC mis-reading chromosomes
2
0
Entering edit mode
5.0 years ago
Ace ▴ 90

My command:

$GATK HaplotypeCaller -R $ref -L $locations -I Bamfiles.txt -ploidy 1

The user error looks like this:

A USER ERROR has occurred:
Input files reference and reads have incompatible contigs: No overlapping contigs found.
reference contigs = [All of the correct contigs are listed out]
reads contigs = []

Meaning, the reads contigs are being read-in as blank

I thought maybe one of my files was aligned incorrectly. I looked at their headers

for file in $(cat Bamfiles.txt); do samtools view -H $file; done | sort | uniq -c > Out.txt

The counts for each chromosomal header were consistent and the same numbers as I have samples. The only thing that looked different: about 1/4 of my samples (from an earlier study) had VN (sam version): 1.3 while the rest had VN: 1.5. Could this be enough to cause GATK to reject my samples and if so is there an easy way to convert between samfile 1.3 and 1.5 or should I re-align them?

Variant-Calling software-error snp Bam • 1.1k views
ADD COMMENT
0
Entering edit mode

Hello Ace!

We believe that this post does not fit the main topic of this site.

Error resolved

For this reason we have closed your question. This allows us to keep the site focused on the topics that the community can help with.

If you disagree please tell us why in a reply below, we'll be happy to talk about it.

Cheers!

ADD REPLY
0
Entering edit mode

No need to close the post. That is an action used by moderators for inappropriate, incomplete, off-topic posts.

You can accept your own answer below to provide closure to this thread.

ADD REPLY
1
Entering edit mode
5.0 years ago
Ace ▴ 90

Update: I've fixed the error, and boy do I feel silly. I totally forgot how essential it is to have your bamfile list end in ".list". Poor GATK was reading my list as a bam file; no wonder it couldn't find any chromosomes! Closing the question now, thanks for your help!

ADD COMMENT
0
Entering edit mode
5.0 years ago

the sequence dictionaries (lines in the bam headers starting with @SQ and the lines in the .dict file) are not the same.

and looking at reads contigs = [] , i wonder if there is any dict in your bam files ?

ADD COMMENT
0
Entering edit mode

Right, so when I looked at this initially my first step was checking that the headers in the bam file were right. Indeed, each line looks something like this @SQ SN:$chromosome LN:$length and the chromosomes correspond correctly to the chromosome in the reference fasta.

I thought it might be an error with a missing index, so I tried re-indexing all of them with

 for file in $(cat $Bamfiles); do echo $file; samtools index $file; done

Unfortunately, this alone did not fix the error. Is there a way to check their content?

ADD REPLY

Login before adding your answer.

Traffic: 1974 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6