Biostar Beta. Not for public use.
Keeping only common variants in the merged VCF file
14 months ago
seta ♦ 1.2k
Sweden

Hi all,

After merging my VCF file, which contains a specific set of variants, with the 1000 Genomes VCF, the ID column of the merged VCF file looks like this:

chr1:39440410:SG

rs6722104

rs60323161;chr1:39244787:SG

Only entries like rs60323161;chr1:39244787:SG (an rs ID together with a chr ID) are common variants. Could you please let me know how I can keep only the common variants in the merged VCF file?

I used bcftools view -T to keep just the common variants, but it didn't work well. Variants like the ones below still exist in the file, where the chr1:39448418:SG record should be removed:

rs3118014;chr1:39448418:SG

chr1:39448418:SG

I also tested grep -Fwvf and grep -vf to remove those variants, but neither worked well. Could you please share your solution?

Thanks

17 months ago
husensofteng • 80
Sweden

I am not sure I understand the question correctly, but it sounds like a line-filtering issue to me. So:

awk -F'\t' '/^#/ || ($3 ~ /rs/ && $3 ~ /chr/)' inputfile > outputfile

*This keeps only lines that start with # (header lines) or that have both an rs ID and chr info in the third column of the file.
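
To check the filter before running it on real data, here is a minimal sketch on a toy merged VCF. The file names (demo_merged.vcf, demo_common.vcf) and the positions and alleles are made up for illustration; only the ID strings come from the question. It uses the same filter, written with explicit tab splitting.

```shell
# Toy merged VCF, tab-separated. Positions and alleles are made up;
# only the ID strings are taken from the question.
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf 'chr1\t39440410\tchr1:39440410:SG\tA\tG\t.\t.\t.\n'
  printf 'chr2\t100\trs6722104\tC\tT\t.\t.\t.\n'
  printf 'chr1\t39244787\trs60323161;chr1:39244787:SG\tG\tA\t.\t.\t.\n'
  printf 'chr1\t39448418\trs3118014;chr1:39448418:SG\tT\tC\t.\t.\t.\n'
  printf 'chr1\t39448418\tchr1:39448418:SG\tT\tG\t.\t.\t.\n'
} > demo_merged.vcf

# Keep header lines, plus records whose ID column (field 3) carries
# both an rs ID and a chr-style ID.
awk -F'\t' '/^#/ || ($3 ~ /rs/ && $3 ~ /chr/)' demo_merged.vcf > demo_common.vcf

# Show the ID column of the records that survived the filter.
grep -v '^#' demo_common.vcf | cut -f3
```

On this toy input only the two records with a combined ID remain. As a side note, bcftools view -T selects by position, so it keeps every record at a listed site, and grep -Fwv with a pattern such as chr1:39448418:SG also matches the combined ID (the ';' counts as a word boundary), which may be why those two attempts did not behave as hoped.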

Many thanks for your nice solution.

13 months ago
United States

Two options:

1) use BEDtools 'intersect' on the two original VCFs.

2) use VCFtools 'vcf-annotate' to add the 1000 Genomes rs numbers, then 'grep' to keep the variants that were annotated as such.
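
The idea behind option 1, going back to the two original VCFs and keeping what they share, can be sketched without extra tools. Note that this is an awk stand-in for the intersection, not BEDtools itself. It assumes both files are uncompressed, tab-separated, and use the same chromosome naming. The file names (demo_mine.vcf, demo_kg.vcf, demo_shared.vcf) and the toy records are hypothetical.

```shell
# Two toy input VCFs; the records are made up for illustration.
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf 'chr1\t39244787\tchr1:39244787:SG\tG\tA\t.\t.\t.\n'
  printf 'chr1\t39440410\tchr1:39440410:SG\tA\tG\t.\t.\t.\n'
} > demo_mine.vcf
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf 'chr1\t39244787\trs60323161\tG\tA\t.\t.\t.\n'
  printf 'chr2\t100\trs6722104\tC\tT\t.\t.\t.\n'
} > demo_kg.vcf

# Pass 1 (first file): remember CHROM, POS, REF, ALT of every reference record.
# Pass 2 (second file): print headers, plus records whose key was seen in pass 1.
awk -F'\t' '
  FNR == NR { if ($0 !~ /^#/) seen[$1 FS $2 FS $4 FS $5] = 1; next }
  /^#/      { print; next }
  { key = $1 FS $2 FS $4 FS $5; if (key in seen) print }
' demo_kg.vcf demo_mine.vcf > demo_shared.vcf

# With BEDtools installed, a position-only version of the same step
# would look roughly like:
#   bedtools intersect -a demo_mine.vcf -b demo_kg.vcf -u -header
```

Matching on alleles as well as position is a design choice of this sketch. A purely positional intersection would also keep records that sit at the same site but carry different alleles.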
