VCF file processing

5.9 years ago

neeraj4biotech • 0

I have a VCF file that I have annotated with SnpEff. I need to group the SNPs by the gene they belong to, e.g. SNPs x, y and z belong to gene w, and so on for every gene.

SNP next-gen gene • 1.4k views

ADD COMMENT • link updated 5.9 years ago by GenoMax 141k • written 5.9 years ago by neeraj4biotech • 0

Are you trying to extract them into separate files per gene or are you trying to run a burden test or something sophisticated?

ADD REPLY • link 5.9 years ago by Vivek ★ 2.7k

Thanks, Vivek, for the quick response. I have a VCF file and a BED/GFF file as input. What I want is a separate file per gene.

ADD REPLY • link 5.9 years ago by neeraj4biotech • 0

There are more elegant solutions if you can do some scripting, but here's a crude workflow:

If you have one line per gene in the BED file, you can first split the BED file into one file per gene like this:

split -l 1 Genes.bed Genes-

Depending on the number of genes, this can produce a lot of files.

Rename them so they carry the .bed extension:

for file in Genes-*; do mv "$file" "$file.bed"; done

Then use tabix to split your VCF. The VCF has to be bgzip-compressed and tabix-indexed first (bgzip variants.vcf; tabix -p vcf variants.vcf.gz). Current htslib releases of tabix take the BED file via -R; older releases used -B:

for bed in Genes-*.bed; do tabix -h -R "$bed" variants.vcf.gz > "variants-${bed%.bed}.vcf"; done
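As one possible version of the "more elegant" scripted route, here is a minimal sketch that skips the intermediate per-gene BED files and does the split in a single awk pass. Note that it uses awk rather than tabix, so it works on an uncompressed VCF but compares every variant against every BED interval, which is slow for large inputs. It assumes a 4-column BED file (chrom, start, end, gene name); the function name and file names are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: split a VCF into one file per gene using only awk.
# Assumption: the BED file has 4 tab-separated columns: chrom, start, end, gene name.
split_vcf_by_gene() {
    local bed=$1 vcf=$2 outdir=$3
    mkdir -p "$outdir"
    awk -F'\t' -v outdir="$outdir" '
        # First file (BED): remember every gene interval.
        NR == FNR {
            n++
            chrom[n] = $1; start[n] = $2 + 0; stop[n] = $3 + 0; gene[n] = $4
            next
        }
        # Second file (VCF): collect header lines so each output starts with them.
        /^#/ { header = header $0 "\n"; next }
        # Data lines: BED starts are 0-based, VCF POS is 1-based.
        {
            pos = $2 + 0
            for (i = 1; i <= n; i++) {
                if ($1 == chrom[i] && pos > start[i] && pos <= stop[i]) {
                    out = outdir "/" gene[i] ".vcf"
                    if (!(out in seen)) {
                        # First variant for this gene: (re)create the file with the header.
                        printf "%s", header > out
                        close(out)
                        seen[out] = 1
                    }
                    print >> out
                    close(out)   # avoid running out of open file handles
                }
            }
        }
    ' "$bed" "$vcf"
}
```

Usage would be something like `split_vcf_by_gene Genes.bed variants.vcf per_gene`, which writes `per_gene/<gene>.vcf` for every gene that has at least one variant. If a gene has several overlapping BED lines, a variant inside the overlap is written once per line.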

ADD REPLY • link 5.9 years ago by Vivek ★ 2.7k

It always helps if you can post some example data. Use datamash to group by gene and collapse all SNPs.

output:

$ datamash -H -g 1 collapse 2 < snps.txt 
GroupBy(gene)   collapse(snp)
x   a,b,c
y   d,e
z   f,g,h

input:

$ cat snps.txt 
gene    snp
x   a
x   b
x   c
y   d
y   e
z   f
z   g
z   h
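Since the VCF was annotated with SnpEff, a two-column table like snps.txt could be derived from it directly. The sketch below assumes the annotations are in the ANN INFO field, where Gene_Name is the 4th '|'-separated subfield of each annotation; older SnpEff output that uses the EFF field would need different parsing. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print "gene<TAB>snp" pairs from a SnpEff-annotated VCF.
# Assumption: annotations are in the ANN INFO field (Gene_Name = 4th '|' subfield).
vcf_to_gene_snp() {
    awk -F'\t' '
        BEGIN { OFS = "\t"; print "gene", "snp" }
        /^#/ { next }                              # skip VCF header lines
        {
            # Use the ID column when present, otherwise chrom:pos.
            snp = ($3 != "." && $3 != "") ? $3 : $1 ":" $2
            split("", seen)                        # reset the per-variant gene set
            n = split($8, info, ";")
            for (i = 1; i <= n; i++) {
                if (info[i] !~ /^ANN=/) continue
                # One comma-separated entry per annotation (transcript/effect).
                m = split(substr(info[i], 5), ann, ",")
                for (j = 1; j <= m; j++) {
                    split(ann[j], f, "|")
                    if (f[4] != "" && !(f[4] in seen)) {
                        seen[f[4]] = 1
                        print f[4], snp
                    }
                }
            }
        }
    ' "$1"
}
```

The output is in variant order, not gene order, so it would need sorting before grouping, for example `vcf_to_gene_snp annotated.vcf | datamash -s -H -g 1 collapse 2` (`-s` tells datamash to sort the input first). A SNP annotated against several genes appears once per gene.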

Install datamash either from its website or from your distribution's repositories (on Debian-based systems: sudo apt install datamash -y; with conda: conda install datamash -y).
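If installing datamash is not an option, the same grouping can be approximated with plain awk. This is a stand-in sketch, not what datamash does internally: it assumes a tab-separated file with one header line, prints no header of its own, and, unlike datamash without `-s`, also merges rows of the same gene that are not adjacent. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: group by column 1 and collapse column 2 into a comma-separated list.
# Assumption: tab-separated input whose first line is a header.
collapse_by_gene() {
    awk -F'\t' '
        NR == 1 { next }                           # skip the "gene  snp" header
        # First time we see a gene: remember its position and its first SNP.
        !($1 in snps) { order[++n] = $1; snps[$1] = $2; next }
        # Later rows for the same gene: append to its list.
        { snps[$1] = snps[$1] "," $2 }
        # Print genes in order of first appearance.
        END { for (i = 1; i <= n; i++) print order[i] "\t" snps[order[i]] }
    ' "$1"
}
```

Running `collapse_by_gene snps.txt` on the example input gives one line per gene with its SNPs joined by commas.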

ADD REPLY • link 5.9 years ago by cpad0112 21k

Neeraj, can you post a few lines of the data? I know it should be a standard VCF, but it still helps!

ADD REPLY • link 5.9 years ago by lakhujanivijay 5.8k
