I'm working with a fasta file that isn't directly organized by chromosome. In the fasta, the chromosome info is held in the range= section:
>hg19_refGene_NM_032291 range=chr1:66999725-67210868 5'pad=0 3'pad=0 strand=+ repeatMasking=none
ggagatgcaggctggct etc.
The index file looks like this:
hg19_refGene_NM_032291 4894 97 60 61
so that when I search for a chromosome:position, i get an error message saying 'chr1' cannot be found. I've had this problem with a tool built on the c version of samtools, and in htsjdk. I know java so I tried working through the htsjdk code, and it is trying to find 'chr1' in a dictionary where the keys all look like the index entries, ie hg19_refGene_NM_032291.
any help is greatly appreciated. thanks.
you can clearly see that the entries don't contain the word 'chr1' , why would you expect to find such word ?
BTWn, most indexers (samtools faidx, stop reading the name after the first blank)
Thanks. I don't expect to find it directly (now that I understand how the file is indexed). I had hoped that someone would be familiar with this kind of index organization and explain how I could locate a coordinate in the reference,