I am trying to call variants in specific genes using the Apis mellifera samples in the NCBI SRA database (approximately 8,000 samples). The dataset consists of both whole-genome sequencing (WGS) and RNA-seq samples. I am using STAR to align the RNA-seq samples to the reference genome; however, the mapping rate varies widely between samples (from 5% to 95% uniquely mapped reads). Is it recommended to filter out samples based on alignment quality before variant calling, and if so, what threshold would be reasonable?
Many thanks,
Gilles
Why are you using the RNA-seq data for variant calling when you have WGS data?
The RNA-seq samples either cover regions of the world that the WGS samples do not, or serve as additional data on top of my core WGS dataset.
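For anyone wanting to try the filtering step described in the question, here is a minimal sketch that reads the "Uniquely mapped reads %" line from each STAR `Log.final.out` file and splits samples at a cutoff. It assumes each sample was run with its own `--outFileNamePrefix`, so the logs are named `<sample>Log.final.out` in one directory. The function names and the cutoff value are placeholders of mine, not a recommendation: the thread does not settle what threshold is appropriate.

```python
import re
from pathlib import Path

# Matches the summary line STAR writes to Log.final.out, e.g.
#   "        Uniquely mapped reads % |	95.12%"
UNIQ_RE = re.compile(r"Uniquely mapped reads %\s*\|\s*([\d.]+)%")


def uniquely_mapped_pct(log_text):
    """Return the uniquely mapped percentage as a float, or None if absent."""
    match = UNIQ_RE.search(log_text)
    return float(match.group(1)) if match else None


def collect_rates(star_dir):
    """Map sample name -> uniquely mapped % for every *Log.final.out in star_dir.

    Assumes the file name is '<sample>Log.final.out' (hypothetical layout).
    """
    rates = {}
    suffix = "Log.final.out"
    for log in Path(star_dir).glob("*" + suffix):
        sample = log.name[: -len(suffix)].rstrip("._")
        rates[sample] = uniquely_mapped_pct(log.read_text())
    return rates


def split_by_threshold(rates, min_pct):
    """Split samples into (keep, drop) lists; unparsable logs go to drop."""
    keep = sorted(s for s, p in rates.items() if p is not None and p >= min_pct)
    drop = sorted(s for s in rates if s not in keep)
    return keep, drop
```

Calling `split_by_threshold(collect_rates("star_out"), min_pct)` with a cutoff of your choosing gives the two sample lists; `star_out` is a hypothetical directory name. Looking at the distribution of rates before picking `min_pct` is probably more informative than any fixed number.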