Biostar Beta. Not for public use.
ASEReadCounter More then one variant context at position
0
Entering edit mode
14 months ago
mb2subi • 0

Hi everyone,

I am doing an allelic imbalance of ATAC-seq data using a vcf generated with Strelka2. I used the ASEReadCounter of GATK:

java -jar $GenomeAnalysisTK \
    -R $reference \
    -T ASEReadCounter \
    -o $out \
    -I $bam \
    -sites $vcf

But this error arosed:

##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 3.8-0-ge9d806836): 
##### ERROR
##### ERROR This means that one or more arguments or inputs in your command are incorrect.
##### ERROR The error message below tells you what is the problem.
##### ERROR
##### ERROR If the problem is an invalid argument, please check the online documentation guide
##### ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
##### ERROR
##### ERROR Visit our website and forum for extensive documentation and answers to 
##### ERROR commonly asked questions https://software.broadinstitute.org/gatk
##### ERROR
##### ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
##### ERROR
##### ERROR MESSAGE: More then one variant context at position: chr1:1056601
##### ERROR ------------------------------------------------------------------------------------------

Then I applied the SelectVariants function of GATK:

java -jar ${GenomeAnalysisTK} \
    -T SelectVariants \
    -R ${reference} \
    -V ${vcf} \
    -o ${vcfNonExt}_Selected.vcf \
    -sn ${sample}
    -selectType SNP \
    -restrictAllelesTo BIALLELIC

And I used the vcf resulting from this SelectVariants function, how ever I obtained the same error.

Any advice?

ADD COMMENTlink
1
Entering edit mode

what is the output of

 grep  -P 'chr1\t1056601' the.vcf | cut -f1-5
ADD REPLYlink
0
Entering edit mode
chr1    1056601 .   A   G
chr1    1056601 .   A   ACGTGTGGGTGCGTGTG
chr1    105660121   .   C   T
ADD REPLYlink
0
Entering edit mode

GATK doesn't like when there are two variants with the same CHROM/POS/REF

More then one variant context at position: chr1:1056601

ADD REPLYlink
0
Entering edit mode

But theoretically with SelectVariants I should have corrected this "error", right?

If not, how I can avoid this positions with more than one variant?

ADD REPLYlink
0
Entering edit mode
13 months ago
Duarte, CA

I'm not sure what extra complications you might encounter with ATAC-Seq data versus RNA-Seq data. However, I have the following suggestions:

1) Last time I checked, I thought the GATK ASEReadCounter only worked for SNPs. If you are already throwing out all indels, then only focusing on unambiguous SNP sites may be less of a big deal.

2) You can try some other allele counting programs, such as allelecounter and PhASER. Again, last time I checked, allelecounter covers some indels without phasing, and PhASER does phasing but doesn't include indels (and I can't really vouch for the extra haplotype files, even though they should theoretically be a possible advantage to that program).

For #2, I also don't know about complications for ATAC-Seq. If you have a Bowtie2 alignment, I think you may need to use the --local parameter to avoid having a substantially lower alignment rate. However, if you want to keep more reads for mutation calling, I think you probably should do some trimming (GATK has options to do things like remove soft-clipped bases, but I might be more worried about the specific alignment positions with the ATAC-Seq alignment...however, that is just a guess, and I don't really have a comparison to back up that claim).

ADD COMMENTlink
0
Entering edit mode

Theoretically the vcf is of SNPs only.

I align using bwa -mem but I can try with bowtie2 --local

I used allelecounter but the follow advice arose:

Traceback (most recent call last):
  File "/imppc/labs/lplab/share/marc/repos/allelecounter/allelecounter.py", line 177, in <module>
    main()
  File "/imppc/labs/lplab/share/marc/repos/allelecounter/allelecounter.py", line 110, in main
    block = int(block_str) - 1
ValueError: invalid literal for int() with base 10: ''

I will try with PhASER if not.

ADD REPLYlink
0
Entering edit mode

I'm not entirely sure about the cause of that error message, but it is admittedly usually best to work with the data type from which an algorithm was developed (I believe I've tested those allele counters with TopHat2 and STAR, but not with BWA-MEM or Bowtie2). So, I think at least one of those would typically work without errors for a given RNA-Seq alignment, but I think that may have actually varied between projects (which one could work and/or provide the results that seemed most appropriate).

The Bowtie2 "local" alignment was to get an alignment rate more similar to BWA-MEM, but I think there are still some formatting differences. For example, I think you may find different insert pairing/distributions with Picard (assuming you have paired-end data) if you use BWA-MEM versus Bowtie2.

ADD REPLYlink

Login before adding your answer.

Similar Posts
Loading Similar Posts
Powered by the version 2.1