Hello everyone,
I'm new to this community and looking for some help. I ran a variant calling pipeline on RNA-seq data from four plant samples. After mapping each sample to a reference genome using TopHat, I used MPileup for variant calling across all four samples. While I am content with the results, I wonder whether it is possible to add sequence information from the BAM files to the VCF output of MPileup. If I understand it correctly, the VCF output gives me the position of each SNP/variant on the reference chromosome/contig. Since my goal is to create KASP assays from the SNP information, I'd like to add 100 bp before and after each called SNP. Is this possible without having to find each position in the contig by hand?
Maybe someone had the same issue before?
Thanks in advance. Cheers, Helge
Extract SNP flanking sequences based on VCF and genome FASTA files:
https://bedtools.readthedocs.io/en/latest/content/tools/flank.html
https://genome.sph.umich.edu/wiki/Arf
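If a scripted route is an option, here is a minimal Python sketch of the same idea: read the reference FASTA, walk through the VCF records, and pull out the bases on either side of each variant. It is not taken from any of the tools linked above. The function names `read_fasta` and `flanking_sequences`, the tiny demo contig, and the `LEFT[REF/ALT]RIGHT` output layout are all assumptions made for illustration, so check what format your KASP provider actually expects. It also assumes the VCF contig names match the FASTA headers and that the reference fits in memory.

```python
import io


def read_fasta(handle):
    """Parse FASTA text into a dict of {contig_name: sequence}."""
    seqs, name, chunks = {}, None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            # Keep only the first word of the header as the contig name.
            name = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def flanking_sequences(vcf_handle, genome, flank=100):
    """Yield (chrom, pos, 'LEFT[REF/ALT]RIGHT') for each VCF record.

    The bracket layout is an assumed output format, not a fixed standard.
    """
    for line in vcf_handle:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        pos = int(pos)
        seq = genome[chrom]
        start = pos - 1  # VCF positions are 1-based; Python slices are 0-based
        left = seq[max(0, start - flank):start]
        right = seq[start + len(ref):start + len(ref) + flank]
        yield chrom, pos, f"{left}[{ref}/{alt}]{right}"


# Tiny in-memory demo (hypothetical data); swap in open("your.fa") / open("your.vcf").
demo_fasta = io.StringIO(">contig1\nAAAAACGGGGG\n")
demo_vcf = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\n"
    "contig1\t6\t.\tC\tT\n"
)
genome = read_fasta(demo_fasta)
for chrom, pos, assay in flanking_sequences(demo_vcf, genome, flank=3):
    print(chrom, pos, assay)  # contig1 6 AAA[C/T]GGG
```

Flanks are simply truncated when a variant sits closer than `flank` bases to a contig end, so short results are worth filtering out before ordering assays. Note also that the flanks come from the reference, so any other SNPs inside the 100 bp window will not be visible in the output.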
Thanks a lot, this should provide exactly what I need. I should have mentioned, though, that I work within Galaxy and don't have much experience with Python or coding in general. I'll give the Java tool a try and hope I can get it to work. Wish me luck :)
May the force be with you. Check whether your plant genome is listed in the drop-down list, and if so, upload the VCF file here to get the flanking regions: http://sniplay.southgreen.fr/cgi-bin/analysis_v3.cgi
You may want to post this over at Galaxy Biostars to see if they have any other suggestions for doing this in Galaxy.