Error when trying to fix the contigs order in the reference and vcf for FastaAlternateReferenceMaker
8.7 years ago
tiago211287 ★ 1.4k

I am trying to use a VCF of BALB/cJ SNPs to substitute those variants into the mouse reference genome (GRCm38, C57BL/6J).

After running this command:

java \
  -jar ~/programs/GenomeAnalysisTK.jar \
  -T FastaAlternateReferenceMaker \
  -R ~/genome/mouse_GRCm38.p4/GRCm38.primary_assembly/GRCm38.primary_assembly.fa \
  -o ~/BALBcJ.snp.primary.fa \
  -V ~/BALB_cJ.snps.vcf

The following ERROR shows up:

ERROR MESSAGE: Input files /home/tiagocastro/BALB_cJ.snps.vcf and reference have incompatible contigs: The contig order in /home/tiagocastro/BALB_cJ.snps.vcf and referenceis not the same; to fix this please see: (https://www.broadinstitute.org/gatk/guide/article?id=1328), which describes reordering contigs in BAM and VCF files..
ERROR /home/tiagocastro/BALB_cJ.snps.vcf contigs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, X, Y]
ERROR reference contigs = [1, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 2, 3, 4, 5, 6, 7, 8, 9, MT, X, Y, JH584299.1, GL456233.1, JH584301.1, GL456211.1, GL456350.1, JH584293.1, GL456221.1, JH584297.1, JH584296.1, GL456354.1, JH584294.1, JH584298.1, JH584300.1, GL456219.1, GL456210.1, JH584303.1, JH584302.1, GL456212.1, JH584304.1, GL456379.1, GL456216.1, GL456393.1, GL456366.1, GL456367.1, GL456239.1, GL456213.1, GL456383.1, GL456385.1, GL456360.1, GL456378.1, GL456389.1, GL456372.1, GL456370.1, GL456381.1, GL456387.1, GL456390.1, GL456394.1, GL456392.1, GL456382.1, GL456359.1, GL456396.1, GL456368.1, JH584292.1, JH584295.1]

To fix this, I used the Perl script from that link to sort the VCF into the same contig order as the reference.

I did this:

./sortByRef.pl \
  ~/BALB_cJ.snps.vcf \
  /home/tiagocastro/genome/mouse_GRCm38.p4/GRCm38.primary_assembly/GRCm38.primary_assembly.fa.fai > ~/BALB_cJ.snps_sorted.vcf

Using the new vcf file, a new error is shown:

ERROR MESSAGE: Invalid command line: No tribble type was provided on the command line and the type of the file '/home/tiagocastro/BALB_cJ.snps_sorted.vcf' could not be determined dynamically. Please add an explicit type tag :NAME listing the correct type from among the supported types:
ERROR Name FeatureType Documentation
ERROR BCF2 VariantContext (this is an external codec and is not documented within GATK)
ERROR VCF VariantContext (this is an external codec and is not documented within GATK)
ERROR VCF3 VariantContext (this is an external codec and is not documented within GATK)

Comparing the head of the original and the sorted VCF, I can see one difference: the sorted file does not have the header.

Can someone help me?

RNA-Seq GATK FastaAlternateReferenceMaker

Try copying the header of the original vcf file into the new vcf file and run again.
8.7 years ago
tiago211287 ★ 1.4k

I fixed the problem by doing what Ashutosh Pandey suggested.

I copied the header to the sorted file and it solved all errors.

For copying I used this little bash command:

{ head -n69 original.vcf; cat sorted.vcf; } >tmp$$ && mv tmp$$ sorted.vcf
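
The one-liner above hard-codes the header length (69 lines), which only holds for this particular file. Below is a minimal sketch of the same idea that selects header lines by their leading `#` instead, plus an alternative that sorts the records into `.fai` order while keeping the header in one pass. The function names `prepend_vcf_header` and `sort_vcf_by_fai` are made up for illustration, and the sketch assumes an uncompressed, tab-delimited VCF and a standard `.fai` index whose first column is the contig name.

```shell
#!/bin/sh
# Hypothetical helper: write the header of ORIGINAL followed by the
# records of SORTED into OUT, without counting header lines by hand.
prepend_vcf_header() {
  original=$1
  sorted=$2
  out=$3
  {
    grep '^#' "$original"    # all header lines (## meta lines and #CHROM)
    grep -v '^#' "$sorted"   # records only, in case some header lines survived
  } > "$out"
}

# Hypothetical helper: print VCF with its header intact and its records
# ordered by the contig order of FAI, then by position.
# Records on contigs that are absent from FAI are dropped.
sort_vcf_by_fai() {
  vcf=$1
  fai=$2
  tab=$(printf '\t')
  grep '^#' "$vcf"
  grep -v '^#' "$vcf" |
    # First input (the .fai): remember each contig's rank.
    # Second input (stdin): prefix each record with its contig's rank.
    awk -F'\t' -v OFS='\t' '
      NR == FNR  { rank[$1] = FNR; next }
      $1 in rank { print rank[$1], $0 }
    ' "$fai" - |
    sort -t "$tab" -k1,1n -k3,3n |   # rank, then POS (3rd field after the prefix)
    cut -f2-                          # strip the temporary rank column
}
```

Usage would look like `prepend_vcf_header original.vcf sorted.vcf fixed.vcf` or `sort_vcf_by_fai original.vcf reference.fa.fai > sorted.vcf`. Note that any `##contig` lines in the header keep their original order; I have not checked whether that matters for a given GATK version.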
Great. Keep going.