Question

Spot Group in SRA

Asked 6.9 years ago by ognjen011

I have recently dabbled in read barcoding, and it seems I don't understand many of the basic concepts I need before I can go further. I started by downloading an SRA run from a paper that used Unique Molecular Identifiers (UMIs), but I could not find any UMIs in the reads. My questions:

Am I correct to assume that if most reads are fully matched, my reads do not contain adapters?
SRA's fastq-dump can include the spot group (barcode) in the read name via the $sg variable (as shown here https://edwards.sdsu.edu/research/fastq-dump-options/). In this case the spot group is only 12 bases long. What EXACTLY is this barcode? Standard Illumina adapters seem to be a bit longer than that, and it should also contain a UMI. How can this sequence be so short?
Who trimmed these adapters, and why? Is this customary when depositing reads in archives?

I apologize if the questions seem trivial, but I couldn't find the answers anywhere.

EDIT: The samples are from this study: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4907374/

Tags: sequencing

Last updated 4.4 years ago by Haodong Chen
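One way to look at the spot groups directly is to tally them from the FASTQ headers. This is a minimal, hypothetical sketch: it assumes the reads were dumped with a defline in which the spot group is the last dot-separated field of the read name (something like `--defline-seq '@$ac.$si.$sg'`; check the exact format string against the fastq-dump documentation), and it assumes plain 4-line FASTQ records. The helper names and the example records are made up for illustration.

```python
from collections import Counter
from typing import Iterable, Iterator


def fastq_headers(lines: Iterable[str]) -> Iterator[str]:
    """Yield each record's header (without '@'), assuming 4-line FASTQ records."""
    for i, line in enumerate(lines):
        if i % 4 == 0:
            yield line.rstrip("\n")[1:]


def spot_group(header: str) -> str:
    """Return the spot group, assumed to be the last '.'-separated field of the name."""
    name = header.split()[0]  # drop comments such as "length=96"
    return name.rsplit(".", 1)[-1]


def tally_spot_groups(lines: Iterable[str]) -> Counter:
    """Count how many reads carry each spot group."""
    return Counter(spot_group(h) for h in fastq_headers(lines))


if __name__ == "__main__":
    # Hypothetical records; accession, barcodes and sequences are invented.
    records = [
        "@SRR0000001.1.ACGTACGTTTGC\n", "ACGT\n", "+\n", "IIII\n",
        "@SRR0000001.2.ACGTACGTAAGC\n", "TTGA\n", "+\n", "IIII\n",
        "@SRR0000001.3.ACGTACGTTTGC\n", "GGCA\n", "+\n", "IIII\n",
    ]
    for barcode, n in tally_spot_groups(records).most_common():
        print(barcode, len(barcode), n)
```

On a real file, `tally_spot_groups(open("reads.fastq"))` (hypothetical filename) would give the same summary; printing the length next to each barcode makes it easy to confirm that every spot group is 12 bases.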

Define what you mean by "fully matched" in this context.

Reply by Devon Ryan, 6.9 years ago

E.g. a CIGAR of 96M, or any alignment without soft clipping.

Reply by ognjen011, 6.9 years ago
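The "fully matched" criterion can be made concrete by scanning CIGAR strings for soft-clipped (`S`) operations. A minimal sketch, assuming the reads were aligned in a local mode that soft-clips unaligned read ends; an aligner run in end-to-end mode (e.g. Bowtie 2's default) does not soft-clip, so there the check says nothing about adapters. The function names are hypothetical.

```python
import re
from typing import Iterable

# A CIGAR operation is a length followed by one of the SAM op codes.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def has_soft_clip(cigar: str) -> bool:
    """True if any operation in the CIGAR string is a soft clip (S)."""
    return any(op == "S" for _, op in CIGAR_OP.findall(cigar))


def fraction_unclipped(cigars: Iterable[str]) -> float:
    """Fraction of mapped reads (CIGAR other than '*') with no soft clipping."""
    mapped = [c for c in cigars if c != "*"]
    if not mapped:
        return 0.0
    return sum(not has_soft_clip(c) for c in mapped) / len(mapped)


if __name__ == "__main__":
    # Made-up CIGARs: two end-to-end matches, one clipped read, one unmapped.
    print(fraction_unclipped(["96M", "90M6S", "96M", "*"]))  # 2 of 3 mapped reads
```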

The UMIs aren't part of the reads in these datasets; they're part of the barcodes, and several different strategies were used throughout the paper (see the methods). You'll find the UMIs in the read-name lines.

Reply by Devon Ryan, 6.9 years ago

Yes, as I said, I obtained them as part of the read names by explicitly requesting them with fastq-dump. I was hoping to use this dataset as a concrete example for understanding these barcodes.

Reply by ognjen011, 6.9 years ago

I take that back: in at least some of the cases described in the paper, part of the UMI is in the reads. It's completely unclear whether they've moved the UMI into the barcode (that is the only explanation for it being 12 bases, since the index read was apparently 8 bases). It would have been nicer if they'd just provided the FASTQ files as they came off the machines, without screwing with the read names and such.

Reply by Devon Ryan, 6.9 years ago
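The reasoning above (an 8-base index read plus 4 more bases to reach 12) suggests a simple check. A minimal sketch, assuming the 12-base spot group is an 8-base sample index concatenated with a 4-base UMI; that layout is a hypothesis, not something the paper or the SRA metadata confirms, and the order of the two parts is unknown, so both orders are tried. Within a single-sample run the true index part should take very few distinct values while the UMI part should be diverse. The helper names and barcodes are made up.

```python
from typing import Iterable, Tuple


def split_barcode(barcode: str, index_len: int = 8,
                  umi_first: bool = False) -> Tuple[str, str]:
    """Split a spot-group barcode into (index, umi) under a hypothetical layout.

    The index is `index_len` bases and the UMI is whatever remains;
    `umi_first` selects which part comes first in the barcode.
    """
    if len(barcode) <= index_len:
        raise ValueError(f"{barcode!r} is too short for a {index_len}-base index")
    umi_len = len(barcode) - index_len
    if umi_first:
        return barcode[umi_len:], barcode[:umi_len]
    return barcode[:index_len], barcode[index_len:]


def distinct_parts(barcodes: Iterable[str], index_len: int = 8,
                   umi_first: bool = False) -> Tuple[int, int]:
    """Count distinct index and UMI sequences under one assumed layout."""
    indexes, umis = set(), set()
    for bc in barcodes:
        index, umi = split_barcode(bc, index_len, umi_first)
        indexes.add(index)
        umis.add(umi)
    return len(indexes), len(umis)


if __name__ == "__main__":
    # Invented barcodes sharing an 8-base prefix.
    barcodes = ["ACGTACGTTTGC", "ACGTACGTAAGC", "ACGTACGTCTGA"]
    for umi_first in (False, True):
        n_idx, n_umi = distinct_parts(barcodes, umi_first=umi_first)
        print(f"umi_first={umi_first}: {n_idx} distinct indexes, {n_umi} distinct UMIs")
```

Run on the real spot groups, the layout that yields one (or a handful of) index sequences is the more plausible one; only the authors can confirm it.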

Exactly. I really wasn't being lazy: I read the whole paper and fiddled around with the barcodes, but I can't get to the bottom of this. Thank you for taking a look, though!

Reply by ognjen011, 6.9 years ago

Yeah, they made a real mess out of that upload. I suspect you'll need to contact the authors. Sorry I don't have better news there :(

Reply by Devon Ryan, 6.9 years ago

Unfortunately, I already did that two weeks ago, but neither corresponding author has replied. I try to exhaust all other options before taking up the community's time. Thanks, though!

Reply by ognjen011, 6.9 years ago

Answer (2019-11-14)

Answered 4.4 years ago by Haodong Chen

You can use sam-dump to download the data in SAM format; the UMI information is in the RG tag.
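If you go the sam-dump route, the RG values can be pulled out of the SAM text with the standard library alone. A minimal sketch, assuming each alignment line carries the spot group as an `RG:Z:` optional field, as described above; the helper names and the example line are hypothetical. With pysam, `AlignedSegment.get_tag("RG")` would replace the manual parsing.

```python
from collections import Counter
from typing import Iterable, Optional


def rg_tag(sam_line: str) -> Optional[str]:
    """Return the RG:Z value of one SAM alignment line, or None if absent."""
    if sam_line.startswith("@"):  # header lines such as @HD, @SQ, @RG
        return None
    fields = sam_line.rstrip("\n").split("\t")
    # Optional TAG:TYPE:VALUE fields follow the 11 mandatory columns.
    for field in fields[11:]:
        if field.startswith("RG:Z:"):
            return field[len("RG:Z:"):]
    return None


def count_read_groups(lines: Iterable[str]) -> Counter:
    """Tally alignment lines per read group, skipping lines without RG."""
    counts: Counter = Counter()
    for line in lines:
        rg = rg_tag(line)
        if rg is not None:
            counts[rg] += 1
    return counts


if __name__ == "__main__":
    # An invented alignment line with an RG tag among its optional fields.
    line = "\t".join(["r1", "0", "chr1", "100", "60", "4M", "*", "0", "0",
                      "ACGT", "IIII", "NM:i:0", "RG:Z:ACGTACGTTTGC"])
    print(count_read_groups(["@HD\tVN:1.6", line, line]))
```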
