Biostar Beta. Not for public use.
Tool to identify recurrent mutations directly from VCF
10 weeks ago
ATpoint 17k
Germany

Is anyone aware of a tool that accepts multiple VCF files and checks for recurrence of mutations, preferably using flexible definitions of recurrence? A simple example might be: strict = mutation at the exact same position, relaxed = within a certain window, feature-based = within the same genomic feature (intron, exon, gene, promoter etc.). So far I have been using custom combinations of VCFtools and BEDtools together with annotation tools such as VEP, but maybe there is a comprehensive solution out there?
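To make the three definitions concrete, here is a minimal pure-Python sketch, not an existing tool. All function names are made up for illustration, and several simplifications are assumptions: only CHROM and POS are read (alleles are ignored), the "relaxed" mode chains neighbouring sites whose gap is at most `window` (so a cluster can end up longer than the window), and features are given as plain 1-based inclusive intervals rather than taken from an annotation tool.

```python
from collections import defaultdict

def read_sites(vcf_lines):
    """Return the set of (chrom, pos) found in VCF text lines; header lines are skipped."""
    sites = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos = line.split("\t")[:2]
        sites.add((chrom, int(pos)))
    return sites

def strict_recurrence(samples, min_samples=2):
    """Strict: the exact same position is mutated in at least min_samples samples."""
    seen = defaultdict(set)
    for name, sites in samples.items():
        for site in sites:
            seen[site].add(name)
    return {site: names for site, names in seen.items() if len(names) >= min_samples}

def window_recurrence(samples, window=100, min_samples=2):
    """Relaxed: chain sites whose gap to the previous site is <= window,
    then keep clusters that contain at least min_samples different samples."""
    by_chrom = defaultdict(list)
    for name, sites in samples.items():
        for chrom, pos in sites:
            by_chrom[chrom].append((pos, name))
    clusters = []
    for chrom, hits in by_chrom.items():
        hits.sort()
        start = end = hits[0][0]
        names = {hits[0][1]}
        for pos, name in hits[1:]:
            if pos - end <= window:
                end = pos
                names.add(name)
            else:
                if len(names) >= min_samples:
                    clusters.append((chrom, start, end, names))
                start = end = pos
                names = {name}
        if len(names) >= min_samples:
            clusters.append((chrom, start, end, names))
    return clusters

def feature_recurrence(samples, features, min_samples=2):
    """Feature-based: features is a list of (chrom, start, end, label),
    1-based inclusive; a feature is recurrent if >= min_samples samples hit it."""
    hit = defaultdict(set)
    for name, sites in samples.items():
        for chrom, pos in sites:
            for f_chrom, start, end, label in features:
                if chrom == f_chrom and start <= pos <= end:
                    hit[label].add(name)
    return {label: names for label, names in hit.items() if len(names) >= min_samples}
```

Usage would be something like `samples = {"A": read_sites(open("A.vcf")), ...}` followed by any of the three functions. The linear scan over features is fine for a handful of intervals; for genome-wide annotation an interval index or BEDtools would be the more realistic choice.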


A combination of the VariantAnnotation package, to read the VCF files (excluding some info fields to reduce size), and GenomicRanges/GenomicFeatures Bioconductor packages can provide the flexibility, annotation, and performance you want, I suspect.


Hi ATpoint, I'm a newbie trying to do just what you describe. Would you share your code for handling this? It would probably save me many hours. Many thanks.

13 months ago
jared.andrews07 ♦ 2.4k
St. Louis, MO

I'm not a huge fan of the tool in general, but FunSeq2 kind of does what you want, though it doesn't have quite the level of precision it seems you're looking for.

RECUR (recurrent genes, regulatory elements and mutations within samples) Example: ‘RECUR=Pseudogene(ENST00000467115.1|chr1:568914-569121):PR1783(chr1:568941,chr1:569004),PR2832(chr1:569004)’ When analyzing multiple genomes, if genes or regulatory elements are shown in >= 2 samples, they are annotated as ‘gene/regulatory element name: recurrent samples (variants in corresponding samples (position is 1-based))’. If it is a same site mutation, ‘*’ is tagged.

DBRECUR (Recurrence database) Example: ‘DBRECUR=Enhancer(chmm/segway|chr15:22517400-22521103):Lung_Adeno(Altered in 4/24(16.67%) samples.)| Prostate(Altered in 2/64(3.12%) samples.),Enhancer(drm|chr15:22517700-22521100):Lung_Adeno(Altered in 4/24(16.67%) samples.)| Prostate(Altered in 2/64(3.12%) samples.)’ If genes, regulatory elements or mutations are observed in the recurrence database (currently including 570 samples of 10 cancer types and COSMIC), the recurrence information is shown here. ‘recurrent element(name|coordinates):cancer type(recurrence information in this cancer type)’. Recurrence information is separated by ‘,’.
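If you want to pull the RECUR annotation apart downstream, a rough Python sketch follows. It is written against the single RECUR example quoted above and nothing else, so treat the layout as an assumption: one element per value, in the form `type(name|coordinates):sample(variants),sample(variants)`. The function name `parse_recur` is made up, and the ‘*’ same-site tag and multi-element values are not handled.

```python
import re

def parse_recur(value):
    """Split one RECUR value into element type, name, coordinates and
    a dict of sample -> list of variant positions.
    Assumes the single-element layout of the quoted example."""
    if value.startswith("RECUR="):
        value = value[len("RECUR="):]
    # Everything up to the first '):' describes the recurrent element.
    element, _, samples_part = value.partition("):")
    elem_type, _, rest = element.partition("(")
    name, _, coords = rest.partition("|")
    # Each sample looks like 'SAMPLE(pos1,pos2,...)'.
    samples = {
        m.group(1): m.group(2).split(",")
        for m in re.finditer(r"([^,()]+)\(([^()]*)\)", samples_part)
    }
    return {"type": elem_type, "name": name, "coords": coords, "samples": samples}
```

The number of keys in `samples` is then the number of samples in which the element recurs.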

Be warned that its VCF output probably won't conform to the VCF format and likely won't run through other tools as-is. I've had to go back and manually fix issues in the header to get it accepted by other programs.


Thanks for the suggestion. Already had a look at it earlier, but the point with FS2 is that it does not provide the required prebuilt genomic context for hg38 (which would take weeks to calculate according to the manual), so it is not an option for me.


Ah, tough luck there. Kinda surprised they haven't done that themselves given what it is.
