Genome assembly vs. reference genome for slicing around a mutation location
5.3 years ago

I have a BAM file and the corresponding VCF file from some source. I am trying to slice the DNA sequence around each mutation location so that I can feed the slices into an algorithm.

I would like to know the right way to do this. I see two options:

  1. I can create a genome assembly (contigs) from the BAM file and then slice it using the mutation information from the VCF file.
  2. I can take a reference genome and slice it at the positions given in the VCF file.

What are the pros and cons of either method?

Assembly genome rna-seq mutation • 1.1k views

Please tell us more about the goal of your "algorithms". Depending on that, one way or the other might be better.

In general: if your variants are not phased and you create a consensus sequence from which you then slice a region, you cannot be sure that variants lying next to each other are on the same haplotype (the same copy of the chromosome). You need to know whether this matters for your "algorithms".
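To make the phasing point concrete, here is a minimal sketch in plain Python. The function name `candidate_haplotypes` and the toy window are made up for illustration, and it assumes only heterozygous, unphased SNVs given as 0-based offsets within the window. It lists every sequence that is consistent with the calls; a consensus with all ALT alleles applied is only one of them.

```python
from itertools import product


def candidate_haplotypes(window, variants):
    """List every sequence consistent with unphased heterozygous SNVs.

    window   -- reference sequence of the sliced region
    variants -- list of (offset, ref, alt); offset is 0-based within window
    (Hypothetical helper for illustration; SNVs only, no indels.)
    """
    haplotypes = []
    # Each variant is either absent (False) or present (True) on a haplotype.
    for choice in product((False, True), repeat=len(variants)):
        seq = list(window)
        for use_alt, (offset, ref, alt) in zip(choice, variants):
            if seq[offset] != ref:
                raise ValueError(f"REF mismatch at offset {offset}")
            if use_alt:
                seq[offset] = alt
        haplotypes.append("".join(seq))
    return haplotypes


# Two unphased heterozygous SNVs in a toy 5-base window.
for hap in candidate_haplotypes("ACGTA", [(1, "C", "T"), (3, "T", "G")]):
    print(hap)
```

With two variants there are four candidates. Without phase information you cannot tell whether the sample carries both ALT alleles on one copy (`ATGGA` plus the reference `ACGTA`) or one ALT allele on each copy (`ATGTA` plus `ACGGA`).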


Specifically, I am trying to learn a distributed representation of variants using a strategy similar to word2vec. So I want to slice a DNA segment of length 2*K+1 centered on each mutation. My dilemma is how to slice the DNA so that most of the information is preserved.

To state my doubt more clearly: is it wise to slice the reference genome at the positions in a patient-specific VCF file, or should one first create a genome assembly/contigs from the patient-specific BAM (given that the read length is short and K > 200)?
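As a rough sketch of the reference-based option, the window could be built like this. Everything here is illustrative: `slice_window` is a made-up name, the reference is a plain Python string, and `pos`, `ref` and `alt` are assumed to come from the VCF record's 1-based POS, REF and ALT fields. For a SNV the result has length 2*K+1; for an indel the length changes with the ALT allele.

```python
def slice_window(reference, pos, ref, alt, k, apply_alt=True):
    """Return k flanking bases on each side of a variant.

    reference -- chromosome sequence as a string (assumed already loaded)
    pos       -- 1-based position of the first REF base, as in VCF POS
    ref, alt  -- REF and ALT alleles of the variant
    k         -- number of flanking bases on each side
    apply_alt -- put ALT in the centre instead of REF
    (Hypothetical helper for illustration only.)
    """
    start = pos - 1          # convert to 0-based index
    end = start + len(ref)   # first base after the REF allele
    if reference[start:end].upper() != ref.upper():
        raise ValueError("REF allele does not match the reference sequence")
    if start - k < 0 or end + k > len(reference):
        raise ValueError("window runs past the end of the sequence")
    left = reference[start - k:start]
    right = reference[end:end + k]
    centre = alt if apply_alt else ref
    return left + centre + right


toy_reference = "ACGTACGTAC"
print(slice_window(toy_reference, pos=5, ref="A", alt="G", k=2))                   # GTGCG
print(slice_window(toy_reference, pos=5, ref="A", alt="G", k=2, apply_alt=False))  # GTACG
```

Note that the flanks here are pure reference sequence: any other patient variants that fall within the K bases on either side are ignored unless they are applied as well.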

How much information loss will occur in either case?
