ASEReadCounter More then one variant context at position
1
0
Entering edit mode
5.1 years ago
mb2subi ▴ 10

Hi everyone,

I am doing an allelic imbalance of ATAC-seq data using a vcf generated with Strelka2. I used the ASEReadCounter of GATK:

java -jar $GenomeAnalysisTK \
    -R $reference \
    -T ASEReadCounter \
    -o $out \
    -I $bam \
    -sites $vcf

But this error arosed:

##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 3.8-0-ge9d806836): 
##### ERROR
##### ERROR This means that one or more arguments or inputs in your command are incorrect.
##### ERROR The error message below tells you what is the problem.
##### ERROR
##### ERROR If the problem is an invalid argument, please check the online documentation guide
##### ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
##### ERROR
##### ERROR Visit our website and forum for extensive documentation and answers to 
##### ERROR commonly asked questions https://software.broadinstitute.org/gatk
##### ERROR
##### ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
##### ERROR
##### ERROR MESSAGE: More then one variant context at position: chr1:1056601
##### ERROR ------------------------------------------------------------------------------------------

Then I applied the SelectVariants function of GATK:

java -jar ${GenomeAnalysisTK} \
    -T SelectVariants \
    -R ${reference} \
    -V ${vcf} \
    -o ${vcfNonExt}_Selected.vcf \
    -sn ${sample}
    -selectType SNP \
    -restrictAllelesTo BIALLELIC

And I used the vcf resulting from this SelectVariants function, how ever I obtained the same error.

Any advice?

ASEReadCounter GATK SelectVariants • 3.2k views
ADD COMMENT
1
Entering edit mode

what is the output of

 grep  -P 'chr1\t1056601' the.vcf | cut -f1-5
ADD REPLY
0
Entering edit mode
chr1    1056601 .   A   G
chr1    1056601 .   A   ACGTGTGGGTGCGTGTG
chr1    105660121   .   C   T
ADD REPLY
0
Entering edit mode

GATK doesn't like when there are two variants with the same CHROM/POS/REF

More then one variant context at position: chr1:1056601

ADD REPLY
0
Entering edit mode

But theoretically with SelectVariants I should have corrected this "error", right?

If not, how I can avoid this positions with more than one variant?

ADD REPLY
0
Entering edit mode
5.1 years ago

I'm not sure what extra complications you might encounter with ATAC-Seq data versus RNA-Seq data. However, I have the following suggestions:

1) Last time I checked, I thought the GATK ASEReadCounter only worked for SNPs. If you are already throwing out all indels, then only focusing on unambiguous SNP sites may be less of a big deal.

2) You can try some other allele counting programs, such as allelecounter and PhASER. Again, last time I checked, allelecounter covers some indels without phasing, and PhASER does phasing but doesn't include indels (and I can't really vouch for the extra haplotype files, even though they should theoretically be a possible advantage to that program).

For #2, I also don't know about complications for ATAC-Seq. If you have a Bowtie2 alignment, I think you may need to use the --local parameter to avoid having a substantially lower alignment rate. However, if you want to keep more reads for mutation calling, I think you probably should do some trimming (GATK has options to do things like remove soft-clipped bases, but I might be more worried about the specific alignment positions with the ATAC-Seq alignment...however, that is just a guess, and I don't really have a comparison to back up that claim).

ADD COMMENT
0
Entering edit mode

Theoretically the vcf is of SNPs only.

I align using bwa -mem but I can try with bowtie2 --local

I used allelecounter but the follow advice arose:

Traceback (most recent call last):
  File "/imppc/labs/lplab/share/marc/repos/allelecounter/allelecounter.py", line 177, in <module>
    main()
  File "/imppc/labs/lplab/share/marc/repos/allelecounter/allelecounter.py", line 110, in main
    block = int(block_str) - 1
ValueError: invalid literal for int() with base 10: ''

I will try with PhASER if not.

ADD REPLY
0
Entering edit mode

I'm not entirely sure about the cause of that error message, but it is admittedly usually best to work with the data type from which an algorithm was developed (I believe I've tested those allele counters with TopHat2 and STAR, but not with BWA-MEM or Bowtie2). So, I think at least one of those would typically work without errors for a given RNA-Seq alignment, but I think that may have actually varied between projects (which one could work and/or provide the results that seemed most appropriate).

The Bowtie2 "local" alignment was to get an alignment rate more similar to BWA-MEM, but I think there are still some formatting differences. For example, I think you may find different insert pairing/distributions with Picard (assuming you have paired-end data) if you use BWA-MEM versus Bowtie2.

ADD REPLY

Login before adding your answer.

Traffic: 3008 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6