Hi,
I have a VCF file, a_filtered_ann10.vcf, annotated for 10 genes by bcftools. It contains 5 samples (details here). I have loaded the VCF in IGV and can see the variants with respect to the reference.
I would like to retrieve the DNA sequences of the 10 genes for all 5 samples, as well as their alignment with the reference, so that I can translate them into proteins and later compare them with the reference protein sequences.
I have gone through this, but it was not helpful.
Any guidance would be appreciated.
Thanks!
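To make the "per-sample sequence, then translate" idea concrete, here is a minimal sketch in plain Python. It assumes a heavily simplified case: the gene's coding sequence (CDS) is a single forward-strand exon, the variants are biallelic SNVs given as 1-based positions within that CDS, and a sample receives the ALT base whenever its genotype carries any ALT allele (so heterozygotes are treated like homozygous ALT; a diploid-aware approach would build two haplotypes instead). The names `apply_snvs`, `has_alt` and `translate`, and the toy sequence and sample IDs, are hypothetical. For real data, a dedicated tool such as `bcftools consensus` (given a reference FASTA and a sample name) also handles indels and full regions; the sketch only illustrates the idea.

```python
from itertools import product

# Standard genetic code, built from codons in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def has_alt(genotype: str) -> bool:
    """True if a VCF GT string such as '0/1' or '1|1' carries an ALT allele.

    Assumption: biallelic sites, so any allele other than '0' or '.' is ALT.
    """
    alleles = genotype.replace("|", "/").split("/")
    return any(a not in ("0", ".") for a in alleles)


def apply_snvs(ref_cds: str, variants, sample: str) -> str:
    """Return the sample's CDS with its ALT SNVs substituted into the reference."""
    seq = list(ref_cds)
    for pos, ref, alt, genotypes in variants:
        # Sanity check: the REF allele must match the reference base.
        if seq[pos - 1] != ref:
            raise ValueError(f"REF mismatch at {pos}: {seq[pos - 1]} != {ref}")
        if has_alt(genotypes[sample]):
            seq[pos - 1] = alt
    return "".join(seq)


def translate(cds: str) -> str:
    """Translate complete codons; unknown codons (e.g. containing N) become 'X'."""
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3))


# Hypothetical toy data: CDS encoding M A W *, and two SNVs across two samples.
ref_cds = "ATGGCTTGGTAA"
variants = [
    (5, "C", "A", {"S1": "0/1", "S2": "0/0"}),  # GCT -> GAT (A -> D)
    (9, "G", "A", {"S1": "0/0", "S2": "1/1"}),  # TGG -> TGA (W -> stop)
]

print("REF", ref_cds, translate(ref_cds))  # REF ATGGCTTGGTAA MAW*
for s in ("S1", "S2"):
    cds = apply_snvs(ref_cds, variants, s)
    print(s, cds, translate(cds))
# S1 ATGGATTGGTAA MDW*
# S2 ATGGCTTGATAA MA**
```

Because each sample's sequence keeps the reference coordinates (SNVs only), the resulting proteins line up residue by residue with the reference protein, so no separate alignment step is needed in this simplified case.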
How about just using something like VEP or SnpEff?
Would it be easier to just annotate the VCF (e.g. using SnpEff or VEP) to get the changes to the protein sequences? Based on the information you gave, you are overcomplicating this.
Should I use the ensembl-vep package or its web-based tool? Will it retain the sample names?
It will just add a lot of information to your VCF.
You can try FastaAlternateReferenceMaker from GATK.