DNA sequence of genes from vcf file
6.7 years ago
bioinfo8 ▴ 230

Hi,

I have a VCF file, a_filtered_ann10.vcf, which was annotated for 10 genes with bcftools and contains 5 samples (details here). I have visualized the VCF file in IGV and can see the variation with respect to the reference.

I would like to retrieve the DNA sequences of 10 genes for all 5 samples as well as their alignment with the reference, so that I can translate them to proteins and later compare with that of reference protein sequences.

I have gone through this, but it was not helpful.

Any guidance would be appreciated.

Thanks!

vcf variant calling gene bcftools ensembl • 3.5k views

, so that I can translate them to proteins and later compare with that of reference protein sequences.

How about just using something like VEP or SnpEff?


so that I can translate them to proteins and later compare with that of reference protein sequences.

Would it be easier to just annotate the VCF (e.g. using SnpEff or VEP) to get the changes to the protein sequences? Based on the information you gave, you are overcomplicating this.


Should I use the ensembl-vep package or its web-based tool? Will it retain the sample name information?


It will just add a lot of information to your vcf.


You can try FastaAlternateReferenceMaker from GATK.
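
To make the idea concrete, here is a minimal Python sketch of what "substitute a sample's ALT alleles into the reference" means. It is not GATK's implementation. It assumes simple, non-overlapping variants with 1-based positions relative to the sequence you pass in; the function name `apply_variants` and the toy sequence and variants are made up for illustration.

```python
def apply_variants(reference, variants):
    """Return `reference` with each (pos, ref, alt) variant substituted in.

    Hypothetical helper for illustration only.
    pos is 1-based, as in VCF. Variants must not overlap.
    """
    pieces = []
    cursor = 0  # 0-based index of the next unconsumed reference base
    for pos, ref, alt in sorted(variants):
        start = pos - 1
        if start < cursor:
            raise ValueError(f"overlapping variant at position {pos}")
        if reference[start:start + len(ref)] != ref:
            raise ValueError(f"REF allele mismatch at position {pos}")
        pieces.append(reference[cursor:start])  # unchanged bases before the variant
        pieces.append(alt)                      # the sample's allele
        cursor = start + len(ref)               # skip the replaced reference bases
    pieces.append(reference[cursor:])           # unchanged tail
    return "".join(pieces)


# Toy data (assumed, not from the original post): one SNV and one 1-bp deletion.
reference = "ATGGCCTTAGGA"
variants = [(4, "G", "T"), (7, "TT", "T")]
consensus = apply_variants(reference, variants)
print(consensus)
```

For real data you would run this per sample and per gene region, and a dedicated tool will handle overlapping and multi-allelic records far more carefully than this sketch does.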

6.7 years ago
Emily 23k

There's a new tool in the VEP package called Haplosaurus (still just in script mode, hence it has not been fully shouted from the rooftops), which will give you individual gene sequences from phased VCFs: https://github.com/Ensembl/ensembl-vep#haplo
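
Why phasing matters here: only a phased genotype (written with `|`, e.g. `0|1`) tells you which allele sits on which copy of the gene. The following Python sketch shows that mapping for a single VCF record. It is an illustration under stated assumptions, not how Haplosaurus works internally, and the function name `haplotype_alleles` is hypothetical.

```python
def haplotype_alleles(ref, alt_field, gt):
    """Map a phased GT such as '0|1' to the allele carried on each haplotype.

    Hypothetical helper for illustration only.
    ref       -- REF column of the VCF record
    alt_field -- ALT column, comma-separated for multi-allelic sites
    gt        -- the sample's GT value; must use '|' (phased)
    A missing allele ('.') is returned as None.
    """
    if "|" not in gt:
        raise ValueError(f"genotype {gt!r} is not phased")
    alleles = [ref] + alt_field.split(",")  # index 0 = REF, 1.. = ALT alleles
    return tuple(None if i == "." else alleles[int(i)] for i in gt.split("|"))


# Toy records (assumed, not from the original post).
print(haplotype_alleles("G", "T", "0|1"))    # first haplotype keeps REF
print(haplotype_alleles("G", "T,C", "2|1"))  # multi-allelic site
```

Collecting the first element across all records in a gene gives the variants for haplotype 1, the second element those for haplotype 2, and each set can then be applied to the reference separately.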


I used the VEP web-based tool, which gave a pie chart distribution for most (not all) of the genes. I uploaded an individual VCF file for each gene (e.g. gene1.vcf, gene2.vcf, etc.), extracted from a_filtered_ann10.vcf. Further, I can see only very few entries in the output corresponding to the pie chart distribution.


If there is a problem with your VEP output, please email details to helpdesk [at] ensembl.org and we'll look into it

6.7 years ago
Whoknows ▴ 960

Hi

You could create a table consisting of 3 columns, like the one below, and then go to Ensembl -> BioMart. In the Filters section you can import this list and get a FASTA file for the selected regions.

Chr   Start   End
12    100000  200000
6.5 years ago

I have software designed for this purpose.

https://github.com/baoxingsong/AnnotationLiftOver
