Spot Group in SRA
6.9 years ago
ognjen011 ▴ 250

I have recently started dabbling in read barcoding, and it seems I'm missing some basic concepts that I need before I can tackle the more advanced ones. I started by downloading an SRA run from a paper that used Unique Molecular Identifiers (UMIs), but the reads didn't contain any observable UMIs. My questions:

  1. Am I correct to assume that if most reads are fully matched, my reads do not have adapters?
  2. The SRA Toolkit's fastq-dump has an option to include the spot group (barcode) in the read name via $sg (as shown here https://edwards.sdsu.edu/research/fastq-dump-options/). In this case it is only 12 bases long. What is this barcode EXACTLY? Standard Illumina adapters seem to be a bit longer than that, and this should also contain a UMI. How come this sequence is so short?
  3. Who trimmed these adapters and why? Is this customary in storing reads in archives?

I apologize if the questions seem trivial, but I couldn't find the answers anywhere.

EDIT: The paper is here, samples are from this study https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4907374/

sequencing • 2.7k views

Define what you mean by "fully matched" in this context.


A CIGAR string such as 96M, or any alignment in which no soft clipping occurs.
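
To make that definition concrete, here is a minimal Python sketch that checks a CIGAR string for soft clipping. The function name `has_soft_clip` is made up for illustration; only the CIGAR operation letters come from the SAM format.

```python
import re

# One CIGAR operation: a length followed by an operation letter (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def has_soft_clip(cigar: str) -> bool:
    """Return True if the CIGAR string contains a soft-clip (S) operation.

    Hypothetical helper for illustration. An unmapped read ("*") has no
    operations at all and therefore returns False.
    """
    return any(op == "S" for _length, op in CIGAR_OP.findall(cigar))

print(has_soft_clip("96M"))    # no clipping
print(has_soft_clip("5S91M"))  # five soft-clipped bases at the start
```

Counting how many records return False gives the fraction of "fully matched" reads in the sense used above.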


The UMIs aren't part of the read in these datasets; they're part of the barcodes, and several different strategies were used throughout the paper (look at the methods). You'll find the UMIs in the lines with the read names.


Yes, as I said, I obtained them as part of the read name by explicitly requesting them with SRA fastq-dump. I was hoping to use this particular dataset as a worked example for understanding these barcodes.
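
For reference, pulling the spot group back out of such a read name might look like the Python sketch below. It assumes a hypothetical defline layout in which the spot group is the last underscore-separated field of the read name; the actual layout depends on the defline template passed to fastq-dump, and the example header is made up.

```python
def spot_group_from_header(header: str) -> str:
    """Extract the spot group from a FASTQ header line.

    Assumption (not from the dataset): the read name ends in "_<spot group>".
    Returns an empty string when no underscore is present.
    """
    # Drop the leading "@" and anything after the first whitespace.
    name = header.lstrip("@").split()[0]
    _prefix, sep, spot_group = name.rpartition("_")
    return spot_group if sep else ""

# Made-up header for illustration only.
print(spot_group_from_header("@EXAMPLE.1_ACGTACGTACGT length=96"))
```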


I take that back: the UMIs are part of the reads in at least some of the cases described in the paper. It's completely unclear whether they've moved the UMI into the barcode (that would be the only explanation for it being 12 bases, since the index read was apparently 8 bases). It would have been nicer if they'd just provided the FASTQ files as they came off the machine, without messing with the read names and such.
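
One way to probe that hypothesis is to split each 12-base spot group and count the distinct values on each side: a sample index should show only a handful of distinct sequences, while a UMI should show many. The Python sketch below assumes the 8-base index comes first and the remaining bases follow it; that ordering is a guess, not something the paper or the SRA record confirms, and the function names are hypothetical.

```python
from collections import Counter

def split_spot_group(spot_group: str, index_len: int = 8) -> tuple[str, str]:
    """Split a spot group into (assumed index, assumed remainder)."""
    return spot_group[:index_len], spot_group[index_len:]

def part_diversity(spot_groups, index_len: int = 8) -> tuple[int, int]:
    """Count distinct prefixes and distinct suffixes across spot groups."""
    prefixes, suffixes = Counter(), Counter()
    for sg in spot_groups:
        prefix, suffix = split_spot_group(sg, index_len)
        prefixes[prefix] += 1
        suffixes[suffix] += 1
    return len(prefixes), len(suffixes)

# Toy input: one shared 8-base prefix, two different 4-base suffixes.
print(part_diversity(["AAAAAAAACGTA", "AAAAAAAATTGC", "AAAAAAAACGTA"]))
```

If neither side shows low diversity, trying other split positions (or the reverse order) with `index_len` would be the next thing to check.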


Exactly. I really wasn't lazy, I read the whole paper, I fiddled around with the barcodes, but I can't get to the bottom of this. Thank you for taking a look though!


Yeah, they made a real mess out of that upload. I suspect you'll need to contact the authors. Sorry I don't have better news there :(


Unfortunately, I did that two weeks ago as well, but got no reply from either corresponding author. I try to exhaust all the options before wasting the community's time. Thanks, though!

4.4 years ago

You can use sam-dump to get the data in SAM format; the UMI information is in the RG tag of each record.
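
As a rough sketch of reading that tag, the Python function below extracts the `RG:Z:` value from one SAM alignment line. The function name and the example record are made up; the tab-separated layout with optional tags starting at the twelfth column follows the SAM format.

```python
from typing import Optional

def read_group(sam_line: str) -> Optional[str]:
    """Return the RG tag value of a SAM alignment line, or None if absent."""
    if sam_line.startswith("@"):
        return None  # header lines carry no per-read tags
    fields = sam_line.rstrip("\n").split("\t")
    # The first 11 columns are mandatory; optional TAG:TYPE:VALUE fields follow.
    for tag in fields[11:]:
        if tag.startswith("RG:Z:"):
            return tag[len("RG:Z:"):]
    return None

# Made-up record for illustration only.
record = "\t".join(
    ["read1", "0", "chr1", "100", "60", "4M", "*", "0", "0", "ACGT", "IIII",
     "RG:Z:ACGTACGTACGT"]
)
print(read_group(record))
```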
