How to get a subset of a VCF file for specific SNPs
7.6 years ago
yingw7373 ▴ 20

Hi

I have no experience creating VCF files. I have a list of specific SNPs (a CSV file of SNP IDs) and would like to create a VCF file containing only these target SNPs. Could you please let me know how to do it? Thank you!

Kate

7.6 years ago

If you want to use vcftools, you can select SNPs either by ID or by position:

with --snps file_listing_snpIDs or with --positions file_listing_chr_and_positions

Check the manual for more information: http://vcftools.sourceforge.net/man_latest.html

For example, this could be a command:

vcftools --vcf input_file.vcf --snps mySNPs.txt --recode --recode-INFO-all --out SNPs_only

where mySNPs.txt looks like this:

rs12121
rs242343
rs2348724
.
.
.
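Since the question starts from a CSV file rather than a plain list, here is a minimal Python sketch for turning one CSV column into the one-ID-per-line file shown above. The function name csv_to_id_list, the default column index, and the assumption that the CSV has a header row are all hypothetical; adjust them to the actual file.

```python
import csv


def csv_to_id_list(csv_path, out_path, column=0, has_header=True):
    """Write one SNP ID per line, taken from a single column of a CSV file.

    Assumptions (not from the original post): the IDs sit in the column
    with index `column`, and the first row is a header when has_header=True.
    Returns the list of IDs written, in order of first appearance.
    """
    seen = set()
    ids = []
    with open(csv_path, newline="") as handle:
        reader = csv.reader(handle)
        if has_header:
            next(reader, None)  # skip the header row if present
        for row in reader:
            if len(row) <= column:
                continue  # blank or short row
            snp_id = row[column].strip()
            if snp_id and snp_id not in seen:  # drop empties and duplicates
                seen.add(snp_id)
                ids.append(snp_id)
    with open(out_path, "w") as out:
        for snp_id in ids:
            out.write(snp_id + "\n")
    return ids
```

Calling csv_to_id_list("snps.csv", "mySNPs.txt") would then produce a file that can be passed to --snps in the command above (both file names are placeholders).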

What should the file passed to --positions look like? I couldn't find it in the manual.

Maybe like this?

1 15342 15563
Y 1513212 1516246

Edit: OK, I thought I could select SNPs within a certain genomic range with that list, but apparently that is not the case.
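For reference, as far as I can tell from the vcftools manual, --positions expects one site per line as a tab-separated chromosome and position (two columns, no range), while ranges are handled by --bed, which takes chromosome, start, and end columns and expects a header line. The sketch below writes both kinds of file; the function names are hypothetical and the exact format requirements are worth checking against the manual for your vcftools version.

```python
def write_positions_file(sites, out_path):
    """Write single sites for --positions: chromosome<TAB>position per line.

    sites: iterable of (chromosome, position) pairs.
    """
    with open(out_path, "w") as out:
        for chrom, pos in sites:
            out.write(f"{chrom}\t{int(pos)}\n")


def write_bed_file(regions, out_path):
    """Write ranges for --bed: chromosome, start, end, with a header line.

    regions: iterable of (chromosome, start, end) triples. Coordinates are
    written as given; no conversion between coordinate conventions is done.
    """
    with open(out_path, "w") as out:
        out.write("chrom\tchromStart\tchromEnd\n")  # header line (assumed required)
        for chrom, start, end in regions:
            out.write(f"{chrom}\t{int(start)}\t{int(end)}\n")
```

With the ranges from the comment above, write_bed_file([("1", 15342, 15563), ("Y", 1513212, 1516246)], "regions.bed") would give a file to try with --bed instead of --positions.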

7.6 years ago

Assuming you have a list of human rs IDs, you can just collect the matching lines from the NCBI VCF (ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/VCF/) using GATK SelectVariants with the option --keepIDs. From the documentation:

List of variant IDs to select: If a file containing a list of IDs is provided to this argument, the tool will only select variants whose ID field is present in this list of IDs. The matching is done by exact string matching. The expected file format is simply plain text with one ID per line.
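To make the "exact string matching" behaviour concrete, here is a rough Python stand-in that does the same kind of filtering on a plain-text or gzipped VCF. It is not GATK and not how GATK is implemented; the function names are hypothetical, and it compares the whole ID column as one string, so a record whose ID field holds several IDs joined by ";" would not match a single rs ID.

```python
import gzip


def load_ids(path):
    """Read a plain-text file with one variant ID per line into a set."""
    with open(path) as handle:
        return {line.strip() for line in handle if line.strip()}


def filter_vcf_by_id(vcf_path, ids, out_path):
    """Copy header lines plus every record whose ID column is in `ids`.

    Matching is an exact, case-sensitive comparison on column 3 (ID).
    Returns the number of records kept.
    """
    opener = gzip.open if vcf_path.endswith(".gz") else open
    kept = 0
    with opener(vcf_path, "rt") as vcf, open(out_path, "w") as out:
        for line in vcf:
            if line.startswith("#"):
                out.write(line)  # keep meta-information and column header
                continue
            fields = line.split("\t", 3)  # CHROM, POS, ID, rest
            if len(fields) > 2 and fields[2] in ids:
                out.write(line)
                kept += 1
    return kept
```

For example, filter_vcf_by_id("input_file.vcf", load_ids("mySNPs.txt"), "subset.vcf") writes only the listed variants; for large or indexed files a dedicated tool is likely the better choice.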
