From genomic coordinate to Reference base
3
0
Entering edit mode
9.2 years ago

Hi, I have an extensive set of genomic coordinates (i.e. chr1:9071988) and I need to automatically get the reference (Human Genome hg19 ) base corresponding to that positions.

Could you provide me some hints to solve this issue please?

Thank you in advance!

genome • 3.2k views
ADD COMMENT
0
Entering edit mode

Thank you everybody for the useful comments and suggestions, really appreciated! samtools faidx and bedtools getfasta work both perfectly for my purpose. Many thanks again!

ADD REPLY
3
Entering edit mode
9.2 years ago

samtools faidx and some shell scripting should work.

ADD COMMENT
0
Entering edit mode

What? Sorry, I don't get it, again. samtools? Why samtools? That makes no sense! ;)

ADD REPLY
0
Entering edit mode

Samtools can be used to extract subsequences from a fasta file. That's what the faidx command does.

ADD REPLY
0
Entering edit mode

Wow! Didn't know that! That is a nice way of doing that! Thank you for that information!

Just looked it up:

samtools faidx hg19.fasta chr1:9071988,9071988
ADD REPLY
1
Entering edit mode

You may also find the bedtools getfasta command useful.

ADD REPLY
1
Entering edit mode
9.2 years ago

You could easily script the following DAS lookup for each of your coordinates:

$ wget -qO- http://genome.ucsc.edu/cgi-bin/das/hg19/dna?segment=chr1:9071988,9071988 | grep -v '^<'
c
ADD COMMENT
1
Entering edit mode
9.2 years ago

You can try fastacmd.

Create DB (run this only once):

​formatdb -i hg19.fa -o T -p F -V

Get nucleotide:

​fastacmd -d hg19.fa -L 9071988,9071989 -s "chr1" (-S 2 if neg. Strang)
ADD COMMENT

Login before adding your answer.

Traffic: 1926 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6