Is there a way other than vcftools to separate CNVs from SNVs and MNVs in an Ion Reporter VCF file?
7.8 years ago
ivivek_ngs ★ 5.2k

I am currently working on Ion Torrent exome data, and I realized that the VCF produced by our in-house Ion Reporter software contains CNVs, SNVs and MNVs in a single file. I do not yet have an Ion Torrent account, so I cannot get the raw files or work with the tool itself; I have only received the (somatic) VCFs. I tried to separate the variant types with vcftools, but it does not seem to work. Below is my command:

vcftools --vcf IonXpress_001_somatic_v5.0.vcf --remove-filtered CNV --out IonXpress_001_somatic_SNVs

Log details
VCFtools - v0.1.9.0
(C) Adam Auton 2009

Parameters as interpreted:
    --vcf IonXpress_001_somatic_v5.0.vcf
    --out IonXpress_001_somatic_SNVs
    --remove-filtered CNV

VCF index is older than VCF file. Will regenerate.
Building new index file.
    Scanning Chromosome: chr1
    Scanning Chromosome: chr2
    Scanning Chromosome: chr3
    Scanning Chromosome: chr4
    Scanning Chromosome: chr5
    Scanning Chromosome: chr6
    Scanning Chromosome: chr7
    Scanning Chromosome: chr8
    Scanning Chromosome: chr9
    Scanning Chromosome: chr10
    Scanning Chromosome: chr11
    Scanning Chromosome: chr12
    Scanning Chromosome: chr13
    Scanning Chromosome: chr14
    Scanning Chromosome: chr15
    Scanning Chromosome: chr16
    Scanning Chromosome: chr17
    Scanning Chromosome: chr18
    Scanning Chromosome: chr19
    Scanning Chromosome: chr20
    Scanning Chromosome: chr21
    Scanning Chromosome: chr22
    Scanning Chromosome: chrX
Writing Index file.
File contains 7789 entries and 2 individuals.
Applying Required Filters.
Filtering sites by FILTER Status.
After filtering, kept 2 out of 2 Individuals
After filtering, kept 7789 out of a possible 7789 Sites
Run Time = 1.00 seconds

The input VCF contains CNV records such as this one:

chr1    68928   .   T   <CNV>   100.0   PASS    PRECISE=FALSE;SVTYPE=CNV;END=10684538;LEN=10615610;NUMTILES=3786;CONFIDENCE=0;PRECISION=1908.24;FUNC=[{'gene':'OR4F5'},{'gene':'LOC729737'},{'gene':'LOC100133331'},{'gene':'RP4-669L17.10'},{'gene':'OR4F16'},{'gene':'OR4F3'},{'gene':'OR4F29'},{'gene':'MIR6723'},{'gene':'LOC100288069'},{'gene':'FAM87B'},{'gene':'LINC00115'},{'gene':'LINC01128'},{'gene':'FAM41C'},{'gene':'LOC100130417'},{'gene':'SAMD11'},{'gene':'NOC2L'},{'gene':'KLHL17'},{'gene':'PLEKHN1'},{'gene':'PERM1'},{'gene':'HES4'},{'gene':'ISG15'},{'gene':'AGRN'},{'gene':'RNF223'},{'gene':'C1orf159'},{'gene':'RP11-465B22.5'},{'gene':'MIR200B'},{'gene':'MIR200A'},{'gene':'MIR429'},{'gene':'TTLL10'},{'gene':'TNFRSF18'},{'gene':'TNFRSF4'},{'gene':'SDF4'},{'gene':'B3GALT6'},{'gene':'FAM132A'},{'gene':'UBE2J2'},{'gene':'SCNN1D'},{'gene':'ACAP3'},{'gene':'MIR6726'},{'gene':'PUSL1'},{'gene':'CPSF3L'},{'gene':'MIR6727'},{'gene':'GLTPD1'},{'gene':'TAS1R3'},{'gene':'DVL1'},{'gene':'MIR6808'},{'gene':'MXRA8'},{'gene':'AURKAIP1'},{'gene':'CCNL2'},{'gene':'LOC148413'},{'gene':'MRPL20'},{'gene':'ANKRD65'},{'gene':'TMEM88B'},{'gene':'VWA1'},{'gene':'ATAD3C'},{'gene':'ATAD3B'},{'gene':'ATAD3A'},{'gene':'TMEM240'},{'gene':'SSU72'},{'gene':'C1orf233'},{'gene':'MIB2'},{'gene':'MMP23B'},{'gene':'MMP23A'},{'gene':'CDK11B'},{'gene':'SLC35E2B'},{'gene':'CDK11A'},{'gene':'SLC35E2'},{'gene':'NADK'},{'gene':'GNB1'},{'gene':'CALML6'},{'gene':'TMEM52'},{'gene':'KIAA1751'},{'gene':'GABRD'},{'gene':'PRKCZ'},{'gene':'C1orf86'},{'gene':'SKI'},{'gene':'MORN1'},{'gene':'LOC100129534'},{'gene':'RER1'},{'gene':'PEX10'},{'gene':'PLCH2'},{'gene':'PANK4'},{'gene':'HES5'},{'gene':'LOC115110'},{'gene':'LOC100133445'},{'gene':'TNFRSF14'},{'gene':'FAM213B'},{'gene':'MMEL1'},{'gene':'TTC34'},{'gene':'ACTRT2'},{'gene':'LINC00982'},{'gene':'PRDM16'},{'gene':'MIR4251'},{'gene':'ARHGEF16'},{'gene':'MEGF6'},{'gene':'MIR551A'},{'gene':'TPRG1L'},{'gene':'WRAP73'},{'gene':'TP73'},{'gene':'TP73-AS1'},{'gene':'CCDC2
7'},{'gene':'SMIM1'},{'gene':'LRRC47'},{'gene':'CEP104'},{'gene':'DFFB'},{'gene':'C1orf174'},{'gene':'LINC01134'},{'gene':'RP13-614K11.1'},{'gene':'RP5-1166F10.1'},{'gene':'AJAP1'},{'gene':'MIR4417'},{'gene':'MIR4689'},{'gene':'NPHP4'},{'gene':'KCNAB2'},{'gene':'CHD5'},{'gene':'RPL22'},{'gene':'RNF207'},{'gene':'ICMT'},{'gene':'LINC00337'},{'gene':'HES3'},{'gene':'GPR153'},{'gene':'ACOT7'},{'gene':'HES2'},{'gene':'ESPN'},{'gene':'MIR4252'},{'gene':'TNFRSF25'},{'gene':'PLEKHG5'},{'gene':'NOL9'},{'gene':'TAS1R1'},{'gene':'ZBTB48'},{'gene':'KLHL21'},{'gene':'PHF13'},{'gene':'THAP3'},{'gene':'DNAJC11'},{'gene':'LOC100505887'},{'gene':'CAMTA1'},{'gene':'VAMP3'},{'gene':'PER3'},{'gene':'UTS2'},{'gene':'TNFRSF9'},{'gene':'PARK7'},{'gene':'ERRFI1'},{'gene':'SLC45A1'},{'gene':'RERE'},{'gene':'ENO1'},{'gene':'MIR6728'},{'gene':'ENO1-AS1'},{'gene':'CA6'},{'gene':'SLC2A7'},{'gene':'SLC2A5'},{'gene':'GPR157'},{'gene':'MIR34A'},{'gene':'H6PD'},{'gene':'SPSB1'},{'gene':'LOC100506022'},{'gene':'SLC25A33'},{'gene':'TMEM201'},{'gene':'PIK3CD'},{'gene':'C1orf200'},{'gene':'CLSTN1'},{'gene':'CTNNBIP1'},{'gene':'LZIC'},{'gene':'NMNAT1'},{'gene':'RBP7'},{'gene':'UBE4B'},{'gene':'KIF1B'},{'gene':'PGD'},{'gene':'APITD1-CORT'},{'gene':'APITD1'},{'gene':'CORT'},{'gene':'DFFA'},{'gene':'PEX14'}]   GT:GQ:CN    ./.:0:2 ./.:.:.

I have shown just one CNV record. It carries a lot of information, but how can I remove such records so that the VCF contains only SNVs and INDELs? I would rather not use awk and grep, because I would then have to rebuild the VCF format myself. Is there a tool that can do this and write a VCF as output? vcftools does not seem to work here. Any help will be appreciated. Thanks.


From the VCFtools manual (http://vcftools.sourceforge.net/man_latest.html):

--remove-filtered <string>

Includes or excludes all sites marked with a specific FILTER flag. These options may be used more than once to specify multiple FILTER flags.

--remove-filtered CNV

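There is no FILTER value named "CNV" in your VCF file: the FILTER column of that record is PASS. CNV appears instead as a symbolic ALT allele (<CNV>) and in the INFO field (SVTYPE=CNV), so --remove-filtered CNV matches nothing and all 7789 sites are kept.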
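To make the column mix-up concrete, here is a minimal Python sketch that reports where "CNV" shows up in a single VCF data line. The helper name `where_is_cnv` is made up for illustration, and the record is a shortened copy of the one in the question (the long FUNC annotation is left out).

```python
# Hypothetical helper: report which columns of a VCF data line mention CNV.
def where_is_cnv(record_line):
    cols = record_line.rstrip("\n").split("\t")
    # VCF columns: CHROM POS ID REF ALT QUAL FILTER INFO ...
    alt, filt, info = cols[4], cols[6], cols[7]
    # INFO is a ';'-separated list of KEY=VALUE pairs (flags have no '=').
    info_map = dict(kv.split("=", 1) for kv in info.split(";") if "=" in kv)
    return {
        "FILTER": "CNV" in filt.split(";"),
        "ALT": "<CNV>" in alt.split(","),
        "INFO/SVTYPE": info_map.get("SVTYPE") == "CNV",
    }

# Shortened version of the record from the question (FUNC omitted).
record = (
    "chr1\t68928\t.\tT\t<CNV>\t100.0\tPASS\t"
    "PRECISE=FALSE;SVTYPE=CNV;END=10684538\tGT:GQ:CN\t./.:0:2\t./.:.:."
)
print(where_is_cnv(record))
# -> {'FILTER': False, 'ALT': True, 'INFO/SVTYPE': True}
```

Only the FILTER entry is false, which is why a FILTER-based option leaves every site in place.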


Yes, my mistake. I have to filter on column 5 (ALT) or on the INFO field. I will check the other options now. Thanks for pointing out the mistake.
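Filtering on column 5 or on INFO does not have to mean rebuilding the VCF by hand. As a rough sketch (not a replacement for a dedicated tool), the Python below copies every header line through untouched and skips only records whose ALT contains <CNV> or whose INFO contains SVTYPE=CNV. The function name `drop_cnv_records` and the two demo records are illustrative assumptions; the SNV line is invented for the demo and does not come from the real file.

```python
import io

def drop_cnv_records(src, dst):
    """Copy VCF lines from src to dst, skipping CNV records.

    Returns a (kept, dropped) tuple counting data records only.
    """
    kept = dropped = 0
    for line in src:
        # Header lines ("##..." and "#CHROM...") and blank lines pass through.
        if line.startswith("#") or not line.strip():
            dst.write(line)
            continue
        cols = line.rstrip("\n").split("\t")
        alts = cols[4].split(",")        # column 5: ALT alleles
        info_items = cols[7].split(";")  # column 8: INFO entries
        if "<CNV>" in alts or "SVTYPE=CNV" in info_items:
            dropped += 1
            continue
        dst.write(line)
        kept += 1
    return kept, dropped

# Tiny made-up input: one CNV record and one invented SNV record.
demo = io.StringIO(
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr1\t68928\t.\tT\t<CNV>\t100.0\tPASS\tPRECISE=FALSE;SVTYPE=CNV\n"
    "chr1\t69511\t.\tA\tG\t50.0\tPASS\tDP=100\n"
)
out = io.StringIO()
counts = drop_cnv_records(demo, out)
print(counts)  # -> (1, 1)
```

For real files, the same function can be called with two handles from `open(...)`, one for reading and one for writing.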

7.8 years ago
William ★ 5.3k

BCFTools should do the job:

https://samtools.github.io/bcftools/bcftools.html#view

-v, --types snps|indels|mnps|other comma-separated list of variant types to select.

or

-V, --exclude-types snps|indels|mnps|other comma-separated list of variant types to exclude.

It looks at the alleles to compute the variant type.

Site is selected if any of the ALT alleles is of the type requested. Types are determined by comparing the REF and ALT alleles in the VCF record not INFO tags like INFO/INDEL or INFO/VT. Use --include to select based on INFO tags.

Note that selecting SNPs also selects mixed SNP+INDEL sites. Exclude all other types if you want only the non-mixed SNPs.
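The idea of deriving the type from REF and ALT alone can be sketched in a few lines of Python. This is a deliberately simplified approximation and not the actual bcftools logic (which, for instance, handles shared flanking bases); the function names `classify_allele` and `site_types` are made up for illustration.

```python
def classify_allele(ref, alt):
    """Simplified variant type for one REF/ALT pair (an approximation only)."""
    if alt == ".":
        return "ref"    # no alternate allele
    if alt.startswith("<") or alt == "*" or "[" in alt or "]" in alt:
        return "other"  # symbolic alleles such as <CNV>, breakends, etc.
    if len(ref) == len(alt):
        return "snp" if len(ref) == 1 else "mnp"
    return "indel"

def site_types(ref, alt_field):
    """Sorted list of types seen across all comma-separated ALT alleles."""
    return sorted({classify_allele(ref, a) for a in alt_field.split(",")})

print(site_types("T", "<CNV>"))  # -> ['other']
print(site_types("A", "G"))      # -> ['snp']
print(site_types("AT", "GC"))    # -> ['mnp']
print(site_types("A", "G,AT"))   # -> ['indel', 'snp']
```

The last call is a mixed site: it counts as a SNP site because one of its ALT alleles is a SNP, which is the behaviour the note above warns about. Under this scheme the <CNV> record from the question falls into "other".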

You can also use bcftools stats to look at the number of SNPs, INDELs etc.

https://samtools.github.io/bcftools/bcftools.html#stats


Actually, the motivation is to keep SNVs and INDELs, or just SNVs, for driver mutation analysis with Intogen. The web platform cannot process my VCF files and gives no output for the driver analysis, so I downloaded it locally. I will separate the SNVs (or SNVs+INDELs) and then run the driver analysis to see whether I get something that supports my experimental design. I will take a look tonight and report back here if it works, which it seems it should. I had completely forgotten about bcftools. Thanks a lot.


It worked for me with bcftools, thanks a lot.
