Biostar Beta. Not for public use.
Question: How to subset the variants in a VCF on a specific chromosome and between two positions?

Hi,

I'm a complete beginner with bash, so my question may seem basic to some of you. I have an annotated VCF file with a large number of samples. I want to extract from it all the variants of one gene, located on chromosome 9 ($1 = chr9) between positions ($2 = POS) 81583683 and 81689305. I tried a modified awk command, awk '{$1== "chr9" && 81583683 <$2< 81689305}' VCF1 > VCF2, but I always get an error message.

Can anyone please tell me whether this awk command is correct for selecting on two conditions, or whether I should use another command?

Thank you

8 months ago by Defne • updated 8 months ago by finswimmer

You want:

 awk -F '\t' '($0 ~ /^#/ || ($1 == "chr9" && 81583683 < $2 && $2 < 81689305))' VCF1 > VCF2

or, better, after compressing VCF1 with bgzip and indexing it (for example with tabix or bcftools index):

bcftools view vcf1.vcf.gz "chr9:81583683-81689305"
8 months ago by Pierre Lindenbaum
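
For anyone who wants to check the awk filter before running it on real data, here is a minimal sketch on a made-up toy file. The file names demo.vcf and demo.subset.vcf, the four records, and the awk variable names chrom, lo and hi are assumptions for illustration only; they are not from the original posts. It also shows the two things missing from the command in the question: awk does not accept a chained comparison such as a < $2 < b, so the range has to be written as two tests joined with &&, and the condition goes outside the braces so that matching lines get printed.

```shell
# Build a tiny tab-separated VCF to try the filter on (made-up records).
printf '##fileformat=VCFv4.2\n' > demo.vcf
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' >> demo.vcf
printf 'chr9\t81583683\t.\tA\tG\t.\t.\t.\n' >> demo.vcf   # exactly on the lower bound
printf 'chr9\t81600000\t.\tC\tT\t.\t.\t.\n' >> demo.vcf   # inside the interval
printf 'chr9\t81700000\t.\tG\tA\t.\t.\t.\n' >> demo.vcf   # past the upper bound
printf 'chr1\t81600000\t.\tT\tC\t.\t.\t.\n' >> demo.vcf   # right position, wrong chromosome

# Keep every header line (starting with #), plus the records whose first
# column equals chrom and whose second column lies strictly between lo and hi.
# A pattern with no action block prints the matching lines.
awk -F '\t' -v chrom="chr9" -v lo=81583683 -v hi=81689305 \
    '/^#/ || ($1 == chrom && $2 > lo && $2 < hi)' demo.vcf > demo.subset.vcf

cat demo.subset.vcf
```

On this toy input only the chr9 record at 81600000 is kept, together with the two header lines. The record sitting exactly on the lower bound is dropped because the comparisons are strict; bcftools regions include both endpoints, so switch to >= and <= if you want the awk result to match the bcftools one.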
