Restriction in read name length
5.5 years ago

Hi all,

Does anybody know if there is a restriction on read name length (say, up to 10k characters)? Have you ever seen any software (samtools, bwa, STAR, etc.) complain about long read names?

Thanks

read name • 1.2k views

Please define long. 100 characters, 10000 characters?

See the edited version.

What's your motivation for doing this?

I am writing a pipeline with a read simulation step, and I want to include some information in the read name, such as the chromosome name (which can be long in some cases), coordinates, strand, etc.

5.5 years ago

In htslib's sam.h (https://github.com/samtools/htslib/blob/develop/htslib/sam.h), the length of the read name is stored as:

/* @field  l_qname length of the query name */
 uint8_t l_qname;

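which is an unsigned byte, so it can hold values from 0 to 2^8 - 1 = 255. Since l_qname also counts the NUL terminator, the read name itself shouldn't be longer than 254 characters, which matches the QNAME limit in the SAM specification.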

The limit is even lower in the SAM text parser at this commit:

https://github.com/samtools/htslib/blob/f2150106bafda80ccd97970637091bb2799bc426/sam.c#L1305

 _parse_err(p - q > 252, "query name too long");
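
For a simulation pipeline, it may be simplest to validate names before writing them out. Below is a minimal sketch in Python, assuming the QNAME pattern from the SAM specification ([!-?A-~]{1,254}). The field layout in make_read_name (chrom:start-end:strand:index) and both function names are made up for illustration, not something htslib or samtools provides.

```python
import re

# QNAME pattern from the SAM specification: 1 to 254 printable ASCII
# characters, excluding '@' and whitespace.
SAM_QNAME_RE = re.compile(r"[!-?A-~]{1,254}")


def is_valid_qname(name: str) -> bool:
    """Return True if the whole string matches the SAM QNAME pattern."""
    return SAM_QNAME_RE.fullmatch(name) is not None


def make_read_name(chrom: str, start: int, end: int, strand: str, index: int) -> str:
    """Build a simulated read name (hypothetical layout) and check it."""
    name = f"{chrom}:{start}-{end}:{strand}:{index}"
    if not is_valid_qname(name):
        raise ValueError(
            f"read name is not a valid SAM QNAME ({len(name)} characters)"
        )
    return name


if __name__ == "__main__":
    print(make_read_name("chr1", 100, 200, "+", 7))
```

If a chromosome name is long enough to push the read name over the limit, one option is to store a short identifier in the name and keep the full description in a separate lookup table.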

Thanks, very helpful.
