I have a large dataset of nearly 200 individuals who were sequenced by NGS after a GWAS was performed to identify top variants. I have fine-mapped this region (including filtering steps) using GATK, samtools, bwa, etc., and am currently working with VCF files of nearly 6000 variants. I have removed all multi-allelic variants manually and used VCFtools to filter by minor allele frequency. I now need to prioritize these variants for subsequent genotyping with TaqMan, because there are far too many. I should also mention that all the sequenced individuals had the phenotype of interest, so no case/control comparisons can be made. Do you have any suggestions for filtering these variants (all of which are in a noncoding region) by regulatory influence or any other parameter?
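Removing multi-allelic sites does not have to be done by hand. A minimal sketch, assuming an uncompressed VCF read line by line: a record is treated as multi-allelic when its ALT column (the 5th tab-separated field) lists more than one allele, separated by commas. The function name `biallelic_only` is hypothetical.

```python
from typing import Iterable, Iterator


def biallelic_only(lines: Iterable[str]) -> Iterator[str]:
    """Yield VCF header lines unchanged and keep only data records whose
    ALT field contains a single allele (no comma)."""
    for line in lines:
        if line.startswith("#"):
            # Meta-information (##) and column header (#CHROM) lines pass through.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        alt = fields[4]  # VCF columns: CHROM POS ID REF ALT ...
        if "," not in alt:
            yield line


if __name__ == "__main__":
    import sys
    # Usage: python biallelic.py < in.vcf > out.vcf
    sys.stdout.writelines(biallelic_only(sys.stdin))
```

From the command line, `bcftools view -m2 -M2` restricts output to sites with exactly two alleles and does the same job on compressed or indexed files.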
Thank you.
Maybe start with FunSeq2. It includes modules to prioritize variants, e.g. by calculating scores based on transcription factor motif disruption.
@dunya1001: get a BED file of the regions you are interested in (e.g. regulatory annotations) and intersect your VCF with it using bedtools.
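To show what the intersection step does, here is a plain-Python sketch that keeps VCF records overlapping any BED interval, roughly what `bedtools intersect -header -u -a variants.vcf -b regions.bed` reports. The file names and function names are hypothetical. It assumes BED coordinates are 0-based half-open, converts each VCF record (1-based `POS`) to the 0-based span `[POS-1, POS-1+len(REF))`, and assumes chromosome names match between files (`chr1` vs `1` will silently produce no overlaps).

```python
import bisect
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Index = Dict[str, Tuple[List[int], List[int]]]


def load_bed(lines: Iterable[str]) -> Index:
    """Build per-chromosome sorted, merged interval lists (starts, ends)."""
    raw: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        raw[chrom].append((int(start), int(end)))
    index: Index = {}
    for chrom, ivs in raw.items():
        ivs.sort()
        merged: List[Tuple[int, int]] = []
        for s, e in ivs:
            if merged and s <= merged[-1][1]:
                # Overlapping or book-ended: extend the previous interval.
                merged[-1] = (merged[-1][0], max(merged[-1][1], e))
            else:
                merged.append((s, e))
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def overlaps(index: Index, chrom: str, qs: int, qe: int) -> bool:
    """True if half-open [qs, qe) overlaps any interval on chrom."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    # Last merged interval starting before qe; since merged intervals are
    # disjoint and sorted, it also has the largest end among such intervals.
    i = bisect.bisect_left(starts, qe) - 1
    return i >= 0 and ends[i] > qs


def intersect_vcf(vcf_lines: Iterable[str], index: Index) -> Iterator[str]:
    """Yield header lines and each VCF record overlapping the BED index once."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref = fields[0], int(fields[1]), fields[3]
        start = pos - 1  # VCF POS is 1-based
        if overlaps(index, chrom, start, start + len(ref)):
            yield line
```

A variant at `chr1` POS 100 falls inside the BED line `chr1 99 100`, while POS 101 does not. In practice bedtools itself is the faster and better-tested option; the sketch is only meant to make the coordinate handling explicit.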