Question: Multiple rsIDs at chromosomal location?

In the VCF format, the ID field may contain multiple semicolon-separated values. In theory, a single line could therefore carry two dbSNP rsIDs (e.g. two indels at the same chr:pos), but for programming purposes that should not happen, correct? Hasn't dbSNP merged all variants at a given position into a common rsID?

Asked 9 months ago by rrbutleriii; last updated by Pierre Lindenbaum.

dbSNP has merged all variants for a given position to a common rsID?

I'm afraid not:

$ wget -q -O - "ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/VCF/All_20180418.vcf.gz" | gunzip -c | grep -v "^#" | cut -f 1,2 | uniq -d | head
1   10051
1   10055
1   10108
1   10109
1   10128
1   10132
1   10177
1   10228
1   10229
1   10235


$ wget -q -O - "ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/VCF/All_20180418.vcf.gz" | gunzip -c | grep -v "^#" | cut -f 1-5 | grep -w 10051 -m2
1   10051   rs1052373574    A   G
1   10051   rs1326880612    A   AC
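The `grep -w 10051` lookup above inspects one position at a time. To list every shared position together with all of its rsIDs, a small awk helper can collapse consecutive records with the same CHROM and POS into one line with semicolon-joined IDs. This is only a sketch: `group_ids` is a hypothetical name, and like `uniq -d` it assumes the input is sorted by position.

```shell
#!/usr/bin/env bash
# group_ids (hypothetical helper): read VCF lines on stdin and print
# "CHROM<TAB>POS<TAB>ID1;ID2;..." for every position carried by more than one
# record. Like `uniq -d`, it assumes records are sorted by CHROM and POS.
group_ids() {
  awk 'BEGIN { FS = OFS = "\t" }
    /^#/ { next }                      # skip meta-information and header lines
    {
      key = $1 OFS $2                  # CHROM and POS identify a position
      if (key == prev) {
        ids = ids ";" $3               # same position: append this rsID
        n++
      } else {
        if (n > 1) print prev, ids     # flush the previous position if shared
        prev = key; ids = $3; n = 1
      }
    }
    END { if (n > 1) print prev, ids } # flush the last position
  '
}

# Example:
# wget -q -O - "ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/VCF/All_20180418.vcf.gz" | gunzip -c | group_ids | head
```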
Answered 9 months ago by Pierre Lindenbaum.

Follow-up: so when parsing a VCF, should I anticipate some variant callers giving me a record like this?

1   10051   rs1052373574;rs1326880612   A   G,AC

I haven't ever seen that before, but I don't see anything to prohibit it.
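If such a record does turn up, one defensive option is to split it into one line per ALT allele before further processing. The sketch below is illustrative only: `split_multi` is a hypothetical helper, and pairing the i-th ID with the i-th ALT is an assumption, because the VCF specification does not tie ID entries to individual ALT alleles. When the counts differ, it repeats the whole ID field on every line.

```shell
#!/usr/bin/env bash
# split_multi (hypothetical helper): read VCF lines on stdin and write one
# record per ALT allele. Header lines pass through unchanged.
# ASSUMPTION: if the ID field has exactly as many ';'-separated entries as there
# are ','-separated ALT alleles, the i-th ID is paired with the i-th ALT. The
# VCF spec does not guarantee this, so otherwise the full ID field is repeated.
# Columns after ALT (QUAL, FILTER, INFO, FORMAT, samples) are copied verbatim.
split_multi() {
  awk 'BEGIN { FS = OFS = "\t" }
    /^#/ { print; next }
    {
      n_alt = split($5, alts, ",")
      n_id  = split($3, ids, ";")
      for (i = 1; i <= n_alt; i++) {
        id = (n_id == n_alt) ? ids[i] : $3
        line = $1 OFS $2 OFS id OFS $4 OFS alts[i]
        for (j = 6; j <= NF; j++) line = line OFS $j
        print line
      }
    }
  '
}

# Example:
# printf '1\t10051\trs1052373574;rs1326880612\tA\tG,AC\n' | split_multi
```

Because INFO and genotype columns are copied as-is, per-allele INFO values (Number=A or Number=R) and genotype allele indices are not re-coded, so this is only suitable for inspecting or keying on CHROM, POS, ID, REF and ALT.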

Reply 9 months ago by rrbutleriii.

Correct, nothing prohibits it; however, such records can cause problems for downstream analysis tools, many of which do not support multi-allelic records like this.
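Before handing a VCF to such a tool, it may be worth checking whether it contains any multi-allelic or multi-ID records at all. A minimal sketch follows; `count_multi` is a hypothetical helper that reads an uncompressed VCF on stdin. If the counts are non-zero, splitting multi-allelic sites with a dedicated normalizer such as `bcftools norm -m -any` is one common option.

```shell
#!/usr/bin/env bash
# count_multi (hypothetical helper): read an uncompressed VCF on stdin and
# report how many data records have more than one ALT allele (a ',' in ALT)
# and how many have more than one ID (a ';' in ID).
count_multi() {
  awk 'BEGIN { FS = "\t"; multi_alt = 0; multi_id = 0 }
    /^#/ { next }                          # skip header lines
    index($5, ",") { multi_alt++ }         # multi-allelic record
    index($3, ";") { multi_id++ }          # record with several IDs
    END { printf "multi_alt=%d multi_id=%d\n", multi_alt, multi_id }
  '
}

# Example (file name is a placeholder):
# gunzip -c calls.vcf.gz | count_multi
```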

Reply 9 months ago by Kevin Blighe.
