LiftOver a VCF file
1
0
Entering edit mode
5.6 years ago

Hey all,

I have been trying to lift over a VCF file from GRCm38 to NCBIM37. I have tried the UCSC LiftOver tool, the Ensembl API, CrossMap and Picard, but none of them lifts the file over completely: either they do not work at all, or they reject some of the variants. In Picard LiftoverVcf in particular, the rejected variants are those that have NoTarget in them, and I have no idea why. The reference FASTA file I am using is Mus_musculus.NCBIM37.61.dna.toplevel.fa and the liftover chain file is GRCm38_to_NCBIM37.chain.gz.

The vcf file is from:

ftp://ftp-mouse.sanger.ac.uk/current_snps/strain_specific_vcfs/129S1_SvImJ.mgp.v5.snps.dbSNP142.vcf

Any leads will be helpful. Thanks in advance.

Best,

Susmita

VCF Picard CrossMap LiftoverVcf • 6.0k views

"Especially in Picard LiftoverVcf, the rejected variants are those that have NoTarget in them"

What is 'NoTarget'?


I have no idea. It's something like this:

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 129S1_SvImJ
1 3000023 . C A 109 NoTarget CSQ=A||||intergenic_variant||||||||;DP=6;DP4=0,0,6,0 GT:GQ:DP:MQ0F:GP:PL:AN:MQ:DV:DP4:SP:SGB:PV4:FI 1/1:22:6:0.166667:152,22,0:137,18,0:2:36:6:0,0,6,0:0:-0.616816:.:1


It's the FILTER column, and it should be defined in the VCF header.
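To see which FILTER values account for the rejected records, something like the following could be run over the rejects file. This is only a minimal sketch: it assumes an uncompressed, tab-separated VCF, and the function name `count_filters`, the example records and the `OtherReason` filter name are made up for illustration.

```python
from collections import Counter


def count_filters(vcf_lines):
    """Count FILTER values (7th column) across the data lines of a VCF."""
    counts = Counter()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        # FILTER may hold several semicolon-separated names
        for name in fields[6].split(";"):
            counts[name] += 1
    return counts


# Hypothetical records; "OtherReason" is a made-up filter name.
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t3000023\t.\tC\tA\t109\tNoTarget\tDP=6\n",
    "1\t3000185\t.\tG\tT\t50\tNoTarget\tDP=4\n",
    "1\t3000300\t.\tA\tG\t60\tOtherReason\tDP=9\n",
]
filter_counts = count_filters(example)
print(filter_counts)
```

For a real file, the lines could come from `open("rejected.vcf")`, where the file name is again just a placeholder.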


I just want the VCF to be lifted over!


Searching online would help (keywords for Google: picard liftover notarget): https://github.com/broadinstitute/picard/blob/master/src/main/java/picard/vcf/LiftoverVcf.java

    /**
     * Filter name to use when a target cannot be lifted over.
     */
    public static final String FILTER_NO_TARGET = "NoTarget";

NoTarget is not the main issue. The issue is why the VCF is not getting lifted over completely. Is there any tool that can do this?


The VCF from the link posted in the OP is huge; even the gzipped VCF is ~200 MB (on http://crispor.tefor.net/genomes/mm10/orig/). It would help if you could post, along with the headers, some example records that are not lifted over between the builds. In general, there are always discrepancies between builds: some records get merged and some get dropped. However, this percentage is small between consecutive builds.


I don't think I can copy that many lines here.
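A short excerpt is usually enough for a post. As a rough sketch (assuming an uncompressed VCF read line by line; the function name `vcf_excerpt` and its parameters are made up for illustration), one could keep only a few header lines plus the first handful of rejected records:

```python
def vcf_excerpt(lines, n_records=5,
                header_prefixes=("##fileformat", "##FILTER", "#CHROM")):
    """Return selected header lines plus the first n_records data lines."""
    header, records = [], []
    for line in lines:
        if line.startswith("#"):
            # keep only the header lines that matter for diagnosing rejects
            if line.startswith(header_prefixes):
                header.append(line)
        elif len(records) < n_records:
            records.append(line)
        else:
            break  # enough records collected; stop reading
    return header + records


# Hypothetical input, for illustration only.
demo = [
    "##fileformat=VCFv4.2\n",
    "##INFO=<ID=DP,Number=1,Type=Integer,Description=\"Depth\">\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t3000023\t.\tC\tA\t109\tNoTarget\tDP=6\n",
    "1\t3000185\t.\tG\tT\t50\tNoTarget\tDP=4\n",
    "1\t3000300\t.\tA\tG\t60\tNoTarget\tDP=9\n",
]
excerpt = vcf_excerpt(demo, n_records=2)
print("".join(excerpt), end="")
```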

5.6 years ago
Emily 23k

Genome assemblies do not 100% map to one another. Newer assemblies will have novel regions that were not found in the older assemblies, and older assemblies will have incorrectly assembled regions that cannot easily be mapped across to the correctly assembled regions. If the variants were called on loci in GRCm38 that did not have coverage in NCBIM37, then there will be no mapping.
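As a toy illustration of why some positions have no mapping, the sketch below treats a liftover as a set of ungapped aligned blocks between the two assemblies. It is not a chain-file parser: the block coordinates and the function name `lift_position` are invented for the example, and strand is ignored.

```python
def lift_position(pos, blocks):
    """Map a 0-based source position through ungapped aligned blocks.

    blocks is a list of (source_start, dest_start, size) tuples.
    Returns the destination position, or None when no block covers pos.
    """
    for src_start, dst_start, size in blocks:
        if src_start <= pos < src_start + size:
            return dst_start + (pos - src_start)
    return None  # position falls in a gap: nothing to map it to


# Hypothetical alignment: source 0-49 maps to 100-149 and source 80-119
# maps to 150-189; source 50-79 has no counterpart in the other assembly.
blocks = [(0, 100, 50), (80, 150, 40)]

print(lift_position(10, blocks))   # inside the first block
print(lift_position(60, blocks))   # inside the gap
print(lift_position(85, blocks))   # inside the second block
```

A variant called at a position like 60 here is the kind of record that a liftover tool has to reject, however the tool is configured.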

Why do you want to map your VCFs back to an old assembly? Would it not be better to map your other data forward onto the newer assembly?


Or use VCFs relevant to that build/assembly.


I took your advice. Earlier I couldn't find VCFs of 129S1/Sv mapped onto mm9. I did find some files last night, though: a tab-delimited file with the columns #CHROM, POS, REF and 129S1/Sv, plus a .tbi file. I have no idea how to get a VCF file from these two. Any ideas?
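For what it's worth, a .tbi file is a tabix index for the compressed table, so only the table itself carries data. A minimal sketch of turning such a table into a bare-bones VCF might look like the following. It rests on several assumptions that would need checking against the real file: that the fourth column holds the strain's base call, that calls differing from REF can be treated as homozygous (1/1), and that anything other than A, C, G or T can be skipped. The function name `table_to_vcf` and the example rows are made up.

```python
def table_to_vcf(rows, sample="129S1_SvImJ"):
    """Convert '#CHROM POS REF <strain allele>' rows into minimal VCF lines."""
    out = [
        "##fileformat=VCFv4.2",
        '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t" + sample,
    ]
    for row in rows:
        if not row.strip() or row.startswith("#"):
            continue  # skip blank lines and the column header
        chrom, pos, ref, allele = row.rstrip("\n").split("\t")[:4]
        ref, allele = ref.upper(), allele.upper()
        # Assumption: skip calls identical to REF and any non-ACGT codes.
        if allele not in {"A", "C", "G", "T"} or allele == ref:
            continue
        # Assumption: report the strain as homozygous for the alternate allele.
        out.append("\t".join(
            [chrom, pos, ".", ref, allele, ".", ".", ".", "GT", "1/1"]))
    return out


# Hypothetical rows, for illustration only.
rows = [
    "#CHROM\tPOS\tREF\t129S1/Sv",
    "1\t3000023\tC\tA",
    "1\t3000100\tG\tG",
    "1\t3000200\tT\tR",
]
vcf_lines = table_to_vcf(rows)
print("\n".join(vcf_lines))
```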


I am creating custom in silico parental genomes, and for that I need VCFs from both parents mapped onto the same reference genome.
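The reason both VCFs need the same coordinates can be seen in a small sketch of the substitution step. It assumes SNPs only, 1-based positions, and a single sequence held in memory; the function name `apply_snps` and the toy sequence are made up.

```python
def apply_snps(reference, snps):
    """Return a copy of reference with (pos, ref, alt) SNPs substituted.

    Positions are 1-based, as in VCF. Raises ValueError when the REF
    allele does not match the sequence, which is what tends to happen
    when the variants were called against a different assembly.
    """
    seq = list(reference)
    for pos, ref, alt in snps:
        found = reference[pos - 1]
        if found.upper() != ref.upper():
            raise ValueError(
                f"REF mismatch at {pos}: expected {ref}, found {found}")
        seq[pos - 1] = alt
    return "".join(seq)


# Toy example: two SNPs applied to an 8-base sequence.
parental = apply_snps("ACGTACGT", [(2, "C", "T"), (8, "T", "A")])
print(parental)
```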
