Ideas on how to evaluate genomic coordinates remapping tools?
4.8 years ago
juliette ▴ 50

Hi,

I know about the various remapping tools that allow you to remap features and coordinates from one genome assembly to another.
I was wondering if anyone has ideas on how to compare these tools, apart from runtime and % of remapped features. (Just to clarify, I'm specifically working on remapping variants, so my input is VCF files.)

remap remapping liftover variants assembly • 1.4k views

I am not sure how much of a difference the tools introduce, especially for single-nucleotide positions. The chain files are probably the main factor and I haven't seen many versions of those (for the same reference pairs).


I understand that; however, I'm developing a remapping tool that doesn't use chain files, so I'd like to compare it to existing tools in more ways than just runtime and % of remapped variants. But yeah, so far the existing tools don't seem to differ much on either of those measures.


Interesting. How would it work without chain files?


The basic idea is to create "reads" from the flanking sequences of each variant (using the old assembly), and then map them to the new assembly using traditional mapping tools (e.g. Bowtie). I can then extract the coordinates and other relevant information to recreate a VCF output.
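
To make the idea concrete, here is a minimal sketch of the "read" construction step, not the actual tool. The function names (`flanking_read`, `lifted_pos`), the 50 bp default flank, and the in-memory reference string are all assumptions for illustration. The position recovery at the end only holds if the read maps on the forward strand without gaps upstream of the variant.

```python
def flanking_read(ref_seq, pos, ref_allele, flank=50):
    """Build a pseudo-read around a variant on the OLD assembly.

    ref_seq    : sequence of the contig (plain string, hypothetical input)
    pos        : 1-based VCF POS of the variant
    ref_allele : VCF REF allele
    Returns (read, offset), where offset is the 0-based index of the
    first REF base inside the read.
    """
    start = pos - 1                      # 0-based index of first REF base
    end = start + len(ref_allele)
    # Sanity check: the VCF REF allele should match the old assembly.
    if ref_seq[start:end].upper() != ref_allele.upper():
        raise ValueError(f"REF mismatch at position {pos}")
    left_start = max(0, start - flank)   # clip at the contig start
    read = ref_seq[left_start:end + flank]
    return read, start - left_start


def lifted_pos(alignment_start, offset):
    """Recover the variant's 1-based position on the NEW assembly.

    alignment_start is the 1-based leftmost mapped position of the read
    (the POS field of a SAM record). Assumes a forward-strand alignment
    with no indels or clipping before the variant.
    """
    return alignment_start + offset


# Toy example on a 10 bp "contig"
read, offset = flanking_read("ACGTACGTAC", pos=5, ref_allele="A", flank=3)
new_pos = lifted_pos(alignment_start=101, offset=offset)
```

Reverse-strand hits, soft clipping, and indels in the flank would need the CIGAR string to be walked instead of the simple addition used here.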

4.8 years ago

Liftover is what you would do if you cannot realign the data to the other reference genome, because you don't have the fastq files or don't want to process all genomes.

Ideally, I think you would also need the fastq/bam file aligned to the reference genomes you are comparing and call variants from that. That would be your truth dataset.


I do have the genomes in fasta files as well; I was just specifying what data I'm working on. This doesn't have much to do with the question though.
I was simply asking what could be some good ways of comparing how different remapping tools perform on the same dataset (CrossMap, UCSC LiftOver, LiftoverVcf, etc.).


I think what Wouter meant is that you have to call variants using the other version of the reference (the one you are lifting over to). You should compare the liftover variants to those since they are the closest to the truth.
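
As a rough sketch of that comparison (an illustration under assumptions, not an established benchmark): treat each variant as a `(chrom, pos, ref, alt)` tuple, take the calls made directly against the new reference as the truth set, and score each tool's lifted-over output against it. The function name `concordance` and the exact-match criterion are hypothetical choices.

```python
def concordance(lifted, truth):
    """Score lifted-over variants against variants called on the new assembly.

    Both inputs are iterables of (chrom, pos, ref, alt) tuples.
    A lifted variant counts as correct only on an exact match.
    """
    lifted, truth = set(lifted), set(truth)
    matched = len(lifted & truth)
    return {
        "matched": matched,
        "only_lifted": len(lifted - truth),   # lifted, but not in the truth set
        "only_truth": len(truth - lifted),    # missed or lost by the tool
        "precision": matched / len(lifted) if lifted else 0.0,
        "recall": matched / len(truth) if truth else 0.0,
    }


# Toy example: one tool's output versus calls made on the new reference
truth_calls = [("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"), ("chr2", 40, "G", "A")]
tool_output = [("chr1", 100, "A", "G"), ("chr1", 251, "C", "T")]
scores = concordance(tool_output, truth_calls)
```

Running the same function on the output of each tool gives numbers that can be compared directly. Exact matching is strict: indels may need normalisation (left-alignment) before comparison, otherwise equivalent representations count as mismatches.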


Oh I see, thank you for explaining. It is an interesting idea; however, I only have the variant coordinates for the "old" assembly in a VCF file. I don't know much about variant calling, but I think I'd need the original fastq reads to be able to call variants against the "new" assembly, correct?


You would need the fastq or the bam.
