filtering the somatic variants
3
5
Entering edit mode
8.0 years ago
Bogdan ★ 1.4k

Dear all,

after surveying the literature (http://www.ncbi.nlm.nih.gov/pubmed/26647970), some groups describe the following way to filter the somatic variants : wondering if any of you has any comments/suggestions. thanks ! bogdan

-- for point mutations :

 $GATK -T VariantFiltration with --clusterWindowSize 10 --clusterSize 3 \
--filterExpression "MQ0 >= 4 && ((MQ0 / (1.0 * DP)) > 0.1)"  --maskName InDel \
--filterExpression "DP < 5 " --filterName "LowCoverage" \
--filterExpression "QUAL < 30.0 " -- filterName "VeryLowQual" \
-- filterExpression "QUAL > 30.0 && QUAL < 50.0 " --filterName "LowQual" \
--filterExpression "QD < 2.0 " -- filterName "LowQD" \
--filterExpression "FS > 60.0" --filterName "StrandBiasFishers" \
--filterExpression "SB > -0.1" --filterName "StrandBias" \
--filterExpression "DP > 500 " --filterName "ExcessiveDepth"

-- for indels :

$GATK -T VariantFiltration with --filterExpression "MQ0 >= 4 && ((MQ0 / (1.0 * DP)) > 0.1)" \
--filterExpression "FS > 60.0" --filterName "StrandBiasFishers" \
--filterExpression "SB > -10.0" --filterName "StrandBias" \
--filterExpression "DP < 5" --filterName "LowCoverage" \
-- filterExpression "QUAL < 30.0" --filterName "VeryLowQual" \
--filterExpression "QUAL > 30.0 && QUAL < 50.0 " --filterName "LowQual"
SNP snp • 4.9k views
ADD COMMENT
0
Entering edit mode

Possibly I could add if anyone knows a good publication that details the ways we can do a good filtering of somatic SNV.

ADD REPLY
0
Entering edit mode

Hello I am wondering if there's any updates on how to filter the somatic variants? do you have any more insight? thanks

ADD REPLY
0
Entering edit mode

yes, using VariantAnnotation package from BioC : https://bioconductor.org/packages/release/bioc/html/VariantAnnotation.html

I could post an R script a bit later during the day ..

ADD REPLY
0
Entering edit mode

Hi jackyen,

what kind of data do you have? Matched samples or only a tumor or disease sample, aiming to call somatic events without a normal control?

ADD REPLY
0
Entering edit mode

hi ATpoint, we only have tumor sample. My understanding is that the standard GATK workflow works a little better with a matched normal but in real world it's difficult to design an experiment because the challenges getting both. What would you suggest how to approach this? thanks

ADD REPLY
0
Entering edit mode
ADD REPLY
0
Entering edit mode

Hi Bogdan,

I am wondering if there's any updates or publications on how to filter the somatic variants to lower the false positive rate?

ADD REPLY
0
Entering edit mode

we can search the latest publications on pubmed please. Recently, i've been doing mainly scRNA-seq analyses.

ADD REPLY
0
Entering edit mode
5.5 years ago
Bogdan ★ 1.4k

An example of filtering a VCF based on Allele Depth, Allele Fraction, FS, and SOR (metrics from GATK):

AF_AD_filters = function(x) {
                             af <- geno(x)$AF[,"TUMOR"] >= 0.05
                             ad <- min(as(geno(x)$AD[,"TUMOR"], "List")) >= 5
                             af & ad
                            }

AF_AD_rules <- FilterRules(list(AF_AD_filters = AF_AD_filters))

##### To filter a VCF file, use the filter as the argument to filterVcf() :

vcf_filtered <- filterVcf( "AML.vcf.bgzip", "hg38", 
                           "AML_out_AF_and_AD_filtered.vcf", 
                            filters=AF_AD_rules)

#######################################################################
#######################################################################

compressVcf <- bgzip("AML_out_AF_and_AD_filtered.vcf", 
                     "AML_out_AF_and_AD_filtered.vcf.bgzip")
idx <- indexTabix(compressVcf, "vcf")
tab <- TabixFile(compressVcf, idx)

#######################################################################
#######################################################################

FS_SOR_filters = function(x) {
                               fs  <- info(x)$FS <= 60
                               sor <- info(x)$SOR <= 4
                               fs & sor | !isSNV(x) 
                             }

FS_SOR_rules <- FilterRules(list(FS_SOR_filters = FS_SOR_filters))

vcf_filtered <- filterVcf( "AML_out_AF_and_AD_filtered.vcf.bgzip", "hg38", 
                           "AML_out_AF_and_AD_FS_and_SOR_filtered_way1.vcf", 
                            filters=FS_SOR_rules)
ADD COMMENT
0
Entering edit mode
5.5 years ago
Bogdan ★ 1.4k

A slightly modified version :

################################### FILTERING a VCF OBJECT :


AF_AD_filters = function(x) {
                             af <- geno(x)$AF[,"TUMOR"] >= 0.05
                             ad <- min(as(geno(x)$AD[,"TUMOR"], "List")) >= 5
                             af & ad
                            }

AF_AD_rules <- FilterRules(list(AF_AD_filters = AF_AD_filters))

############################

FS_SOR_filters = function(x) {
                               fs  <- info(x)$FS <= 60
                               sor <- info(x)$SOR <= 4
                               fs & sor | !isSNV(x) 
                             }

FS_SOR_rules <- FilterRules(list(FS_SOR_filters = FS_SOR_filters))

##########################

vcf2 <- vcf[AF_AD_filters(vcf)]
vcf3 <- vcf2[FS_SOR_filters(vcf2)]

###############################
writeVcf(vcf3,"AML_AF_and_AD_FS_SOR_filtered_AS_OBJ.vcf")
ADD COMMENT
0
Entering edit mode
5.5 years ago
Bogdan ★ 1.4k

Or :

###########################################################

vcf <- readVcf("AML.vcf",genome="hg38")


######################################################################
################################### FILTERING a VCF OBJECT :

AF_AD_filters = function(x) {
                             af <- geno(x)$AF[,"TUMOR"] >= 0.05
                             ad <- min(as(geno(x)$AD[,"TUMOR"], "List")) >= 5
                             af & ad
                            }

AF_AD_rules <- FilterRules(list(AF_AD_filters = AF_AD_filters))

############################
############################
############################

FS_SOR_filters = function(x) {
                               fs  <- info(x)$FS <= 60
                               sor <- info(x)$SOR <= 4
                               fs & sor | !isSNV(x) 
                             }

FS_SOR_rules <- FilterRules(list(FS_SOR_filters = FS_SOR_filters))

##########################
#######################################################################
#######################################################################


vcf2 <- subsetByFilter(vcf, filter=FilterRules(list(AF_AD_filter=AF_AD_filters)))
vcf3 <- subsetByFilter(vcf2, filter=FilterRules(list(FS_SOR_filtersr=FS_SOR_filters)))

writeVcf(vcf3,"AML_out_AF_and_AD_FS_SOR_filtered_AS_OBJ_subsetbyfilter.vcf")
ADD COMMENT

Login before adding your answer.

Traffic: 2002 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6