tabix indexing on a non vcf/bed/sam txt file
5.2 years ago
PedroBarbosa ▴ 20

Hello,

I'm struggling to build a tabix index on a simple three-column bgzipped text file:

#chr    pos     score
1       1       0.061011
1       2       0.061011
1       3       0.061011
...

Oddly, the indexing step is really fast (about 2 seconds) considering the file size (9 GB), and when I query a position I get no result and no warning. Has anyone faced a similar issue?

tabix -s1 -b2 file.txt.bgz

tabix file.txt.bgz 1:2-3 -> empty result

Thanks in advance,

Pedro

next-gen software error • 3.3k views

This works for me. Some troubleshooting questions:

  • Are there any messages during the index creation?
  • Is the file tab delimited?
  • Is the file sorted by the first and second column?
  • Is the file compressed by bgzip?
  • Have you tried your little example as well, or just your large data file?

fin swimmer
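
Some of the checks above can be scripted. This is a minimal sketch, assuming GNU sort (for the `V` key modifier) and a made-up four-line example file; the variable names are hypothetical, and on real data the `grep` would read from `zcat` of the compressed file instead.

```shell
# Build a tiny tab-delimited example in a temporary directory (made-up data).
tmpdir=$(mktemp -d)
example="$tmpdir/example.txt"
printf '#chr\tpos\tscore\n1\t1\t0.061011\n1\t2\t0.061011\n1\t3\t0.061011\n' > "$example"

# A literal tab character, used as the field separator for sort.
tab=$(printf '\t')

# Count data lines that do not have exactly three tab-separated fields.
bad_fields=$(grep -v '^#' "$example" | awk -F'\t' 'NF != 3' | wc -l | tr -d ' ')

# sort -c only checks the order; it exits non-zero if the input is not sorted.
if grep -v '^#' "$example" | sort -c -t "$tab" -k1,1V -k2,2n 2>/dev/null; then
    sorted=yes
else
    sorted=no
fi

echo "lines with wrong field count: $bad_fields"
echo "sorted by chromosome and position: $sorted"
```

Because `sort -c` does not re-sort anything, it is a much cheaper test on a large file than a full sort.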


Indeed, it worked for my little example. I'm now running a large sort on the file (sort -V -k1,1 -k2,2) to see if this was the problem, although I wasn't expecting that: I zcatted all chromosome files in the proper order, and in theory I downloaded them already sorted.

Thanks for the suggestions, I'll let you know how it goes.


@finswimmer, it didn't work, unfortunately.

These are my full commands; if you see any possible source of error, let me know. This "wrong" index takes 2 seconds to be created. This has never happened to me before.

header_file=$(head -n1 $files) 
zcat $header_file | head -1 | cut -f1,2,3 | bgzip > fitcons_v1.01_header.txt.bgz
srun cat $files | xargs zcat | grep -v "^#" | sort -V -k1,1 -k2,2 | awk -v OFS='\t' '{print $1,$2,$3}' |  bgzip > fitcons_v1.01.txt.gz
srun cat fitcons_v1.01_header.txt.bgz fitcons_v1.01.txt.gz > fitcons_v1.01.txt.bgz
tabix -s 1 -b 2 fitcons_v1.01.txt.bgz
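
Not confirmed in this thread, but one thing that might be worth ruling out: the final file is a concatenation of two separately bgzipped files, and bgzip ends each file with an empty end-of-file block, so such a block would sit right after the header. If the tabix build in use stops reading at that block (an assumption), the index would cover only the header, which would fit the 2-second run. Below is a minimal sketch of a variant that writes header and data through a single bgzip pass. It assumes GNU sort; the two tiny chunk files and the names `workdir` and `combined_stream` are made up for illustration.

```shell
# Hypothetical stand-ins for the per-chromosome downloads: two small gzipped chunks.
workdir=$(mktemp -d)
printf '#chr\tpos\tscore\n2\t1\t0.07\n2\t2\t0.07\n' | gzip > "$workdir/chr2.txt.gz"
printf '#chr\tpos\tscore\n1\t1\t0.061011\n1\t2\t0.061011\n' | gzip > "$workdir/chr1.txt.gz"

# A literal tab character, used as the field separator for sort.
tab=$(printf '\t')

# Emit one header line, then all data lines sorted by chromosome and position.
combined_stream() {
    gzip -dc "$workdir/chr1.txt.gz" | head -n 1 | cut -f1-3
    gzip -dc "$workdir"/chr*.txt.gz | grep -v '^#' | cut -f1-3 |
        sort -t "$tab" -k1,1V -k2,2n
}

combined_stream > "$workdir/combined.txt"

# Compress header and data in one bgzip pass, then index and query.
# This part is skipped when the htslib tools are not installed.
if command -v bgzip >/dev/null 2>&1 && command -v tabix >/dev/null 2>&1; then
    bgzip -c "$workdir/combined.txt" > "$workdir/combined.txt.bgz"
    tabix -s 1 -b 2 "$workdir/combined.txt.bgz"
    tabix "$workdir/combined.txt.bgz" 1:1-2
fi
```

For the real 9 GB file, the output of the function could be piped straight into `bgzip` rather than written to an intermediate text file.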