Annotate genomic positions with dbSNP rsIds
20 months ago
Jimbou • 690
@Jimbou9847

I have already found some ways to annotate genomic positions with rsIDs, e.g. the UCSC Table Browser, but I'm not happy with them: I want an all-in-one Linux script that also takes strand issues into account (flipped alleles, e.g. A/T vs. T/A, or switched reference alleles).

What I have:

chr position ref alt
10  169560   G   T
10  171117   G   A
10  171126   G   A
10  172995   A   C
10  178499   C   T

What I want:

chr position ref alt rsID
10  169560   G   T   rsXXX
10  171117   G   A   rsXXX, rsXXX
10  171126   G   A   rsXXX
10  172995   A   C   rsXXX
10  178499   C   T   rsXXX
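
The strand and allele-switch cases mentioned in the question can be made concrete with a small sketch. The helper name classify_alleles and its labels are hypothetical, and it only handles single-base alleles. It compares one ref/alt pair against a reference pair (for example the REF/ALT of a dbSNP record at the same position):

```shell
#!/usr/bin/env bash
# classify_alleles MY_REF MY_ALT DB_REF DB_ALT
# Hypothetical helper: reports how a single-base allele pair relates to a reference pair.
classify_alleles() {
    awk -v r="$1" -v a="$2" -v dr="$3" -v da="$4" '
        # complement of a single base; anything else becomes "N"
        function comp(b) {
            return (b == "A") ? "T" : (b == "T") ? "A" : (b == "C") ? "G" : (b == "G") ? "C" : "N"
        }
        BEGIN {
            if (r == dr && a == da) {
                print "match"
            } else if (r == da && a == dr) {
                print "ref_alt_switched"
            } else if (comp(r) == dr && comp(a) == da) {
                print "strand_flipped"
            } else if (comp(r) == da && comp(a) == dr) {
                print "strand_flipped_and_switched"
            } else {
                print "mismatch"
            }
        }'
}

classify_alleles G T G T
classify_alleles G T C A
classify_alleles G T T G
```

Palindromic pairs (A/T and C/G) cannot be resolved this way: A/T against T/A is reported as ref_alt_switched here, but it is equally consistent with a strand flip.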

Thanks

rsID dbSNP position chr tool • 206 views
20 months ago
Jimbou • 690
@Jimbou9847

I will write down my solution as an answer for documentation purposes. I started as Pierre recommended, but then I used bcftools instead of GATK.

First, I created a header file (header.txt) for the custom VCF file:

##fileformat=VCFv4.0
##fileDate=09052019
##source=allchr_allvsall_sex_adjusted
##reference=GRCh37.p13
##phasing=partial
##INFO=<ID=AA,Number=1,Type=String,Description="Ancestral Allele">
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO
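
The #CHROM line has to be tab-separated, and tabs are easily turned into spaces when a header is copied and pasted. A minimal sketch that writes the same header with explicit tabs via printf (write_vcf_header is a hypothetical helper name; the header values are the ones from this answer):

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the VCF header with explicit tab characters in the #CHROM line.
write_vcf_header() {
    printf '##fileformat=VCFv4.0\n'
    printf '##fileDate=09052019\n'
    printf '##source=allchr_allvsall_sex_adjusted\n'
    printf '##reference=GRCh37.p13\n'
    printf '##phasing=partial\n'
    printf '##INFO=<ID=AA,Number=1,Type=String,Description="Ancestral Allele">\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
}

write_vcf_header > header.txt
```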

Then I used awk to generate the VCF records according to the specification (8 columns), setting ID to "." (missing), QUAL to 100 and FILTER to PASS for all positions. Note that my_chr_pos_alt_ref.out.gz contains only autosomal SNVs!

zcat my_chr_pos_alt_ref.out.gz | awk -v OFS='\t' '{print $1, $2, ".", $3, $4, 100, "PASS", "AA="$3}' > tmp.vcf
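
The VCF column order is CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, so the "." placeholder for the ID belongs in the third column. A small sketch of the conversion that can be checked on the rows from the question (to_vcf_body is a hypothetical helper name; skipping lines whose second field is not numeric is an assumption, meant to drop a header line such as "chr position ref alt"):

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert "chr position ref alt" rows on stdin into 8-column VCF records.
to_vcf_body() {
    awk -v OFS='\t' '
        # only lines with a numeric position are converted (skips a text header, if any)
        $2 ~ /^[0-9]+$/ {
            print $1, $2, ".", $3, $4, 100, "PASS", "AA=" $3
        }'
}

printf '%s\n' 'chr position ref alt' '10 169560 G T' '10 171117 G A' | to_vcf_body
```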

Add the header:

cat header.txt tmp.vcf > mydata.vcf
rm tmp.vcf

Compress with bgzip and index with tabix:

bgzip mydata.vcf
tabix -p vcf mydata.vcf.gz
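
tabix expects the records to be sorted by position, so if the input table is not already ordered, the VCF needs sorting before the bgzip step. A minimal sketch, assuming an uncompressed VCF file (sort_vcf is a hypothetical helper name; chromosomes are ordered lexically here, so "10" comes before "2"):

```shell
#!/usr/bin/env bash
# Hypothetical helper: print header lines first, then records sorted by chromosome and position.
# $1: path to an uncompressed VCF file
sort_vcf() {
    grep '^#' "$1"
    grep -v '^#' "$1" | LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k2,2n
}

# small demonstration on a throwaway file
demo="$(mktemp)"
printf '#CHROM\tPOS\n10\t171117\n10\t169560\n' > "$demo"
sort_vcf "$demo"
rm -f "$demo"
```

In the workflow above this would be run as sort_vcf mydata.vcf > mydata.sorted.vcf, followed by bgzip and tabix on the sorted file.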

Finally, I annotated the rsIDs with bcftools annotate:

bcftools annotate \
-a 00-common_all.vcf.gz \
-c ID \
--output-type z \
-o mydata_dbSNP151.vcf.gz \
mydata.vcf.gz

The dbSNP file (00-common_all.vcf.gz) comes from ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b151_GRCh37p13/VCF/
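
To get back from the annotated VCF to the table asked for in the question (chr, position, ref, alt, rsID), the ID column can be moved to the end. A minimal sketch in awk (vcf_to_table is a hypothetical helper name; the rsXXX value in the demonstration is the placeholder from the question, not a real ID). Positions without a dbSNP match keep "." as their ID:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read an uncompressed VCF on stdin, print chr, position, ref, alt, rsID.
vcf_to_table() {
    awk -F'\t' -v OFS='\t' '
        BEGIN { print "chr", "position", "ref", "alt", "rsID" }
        !/^#/ { print $1, $2, $4, $5, $3 }   # skip header lines, reorder columns
    '
}

# demonstration on one record with a placeholder ID
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n10\t169560\trsXXX\tG\tT\t100\tPASS\tAA=G\n' | vcf_to_table
```

On the real output this would be zcat mydata_dbSNP151.vcf.gz | vcf_to_table > mydata_rsid.tsv (the output file name is arbitrary). bcftools query -f '%CHROM\t%POS\t%REF\t%ALT\t%ID\n' mydata_dbSNP151.vcf.gz should give the same rows without the header line.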

20 months ago
@Pierre Lindenbaum30

Use awk to convert the table to VCF, then run GATK VariantAnnotator (https://software.broadinstitute.org/gatk/documentation/tooldocs/3.8-0/org_broadinstitute_gatk_tools_walkers_annotator_VariantAnnotator.php) with the --dbsnp option.


Thanks a lot. Started as you recommended, but switched to bcftools in the end.
