Question

SRA and the relationship between biosamples, experiments and runs

1

5.2 years ago

JJ ▴ 670

Dear all,

Obviously I've read this (https://www.ddbj.nig.ac.jp/faq/en/biosample-bioproject-sra-e.html):

What is the relationship between BioSamples, SRA Experiments, SRA Runs, and my data files?

BioSample is descriptive information about the biological source materials, or samples, used to generate experimental data in any of primary data archives. Biological and technical replicates need to be registered as separate BioSamples distinguished by the "replicate" attribute having values such as "biological replicate 1" and "biological replicate 2".

Each SRA Experiment is a unique sequencing library for a specific sample. Importantly, much of the descriptive information that is displayed in the public record of your data is captured at the level of the DRA Experiment.

SRA Runs are simply a manifest of data file(s) that should be linked to a given sequencing library – no information present in the Run is displayed on the public record of your project. Note that all data files listed in a Run will be merged into a single SRA archive file (and fastq file for distribution), so files from different samples should not be grouped in the same Run. Paired-end data files (forward/reverse), conversely, MUST be listed in a single run in order for the two files to be correctly processed as paired-end. Do not divide a sample for a paired-end library (for example, forward and reverse).
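As a rough sketch of how I read the hierarchy described above (BioSample, then Experiment as one library, then Run as a file manifest): the class names, field names, accessions and file names below are made up for illustration and are not the actual SRA schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Run:
    # A Run is a manifest of the data file(s) for one sequencing library.
    accession: str
    files: List[str]


@dataclass
class Experiment:
    # One Experiment = one sequencing library made from one sample.
    accession: str
    library_name: str
    runs: List[Run] = field(default_factory=list)


@dataclass
class BioSample:
    # Describes the biological source material.
    accession: str
    description: str
    experiments: List[Experiment] = field(default_factory=list)


# Placeholder accessions and file names, not real records.
sample = BioSample("SAMPLE_1", "hypothetical tissue sample")
library = Experiment("EXPERIMENT_1", "hypothetical paired-end library")

# Forward and reverse files of a paired-end library are listed in ONE run.
library.runs.append(Run("RUN_1", ["lib1_R1.fastq.gz", "lib1_R2.fastq.gz"]))
sample.experiments.append(library)

n_experiments = len(sample.experiments)
n_runs = sum(len(e.runs) for e in sample.experiments)
n_files = sum(len(r.files) for e in sample.experiments for r in e.runs)
print(n_experiments, n_runs, n_files)
```

In this reading, one biosample can own several experiments (libraries), and one experiment can own several runs, but a run never mixes files from different samples.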

Still, I am struggling to understand. Take, for example, this study: https://www.ncbi.nlm.nih.gov/Traces/study/?acc=ERP004697

It has 2 biosamples, 8 experiments and 8 runs.
It's clear to me that the 8 runs are 8 different data files.
As there are 8 experiments, there must be 4 distinct sequencing libraries for each biosample. They could, for example, have different insert sizes (this is not the case for this study, though).
In the corresponding paper, it's stated that there are two technical replicates per sample, which I also found under the experiment factor [replicate]. OK.
The paper also says that they multiplexed 4 on one lane. From this I would conclude that two libraries were created for each biosample and that all four together were sequenced on two lanes. Is this wrong?
Still, why aren't there only 2 experiments (hence 2 distinct sequencing libraries) per biosample?
And I thought replicates should be different biosamples?
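Counts like these can be tallied from a run table. Below is a minimal sketch that assumes a CSV with `Run`, `Experiment` and `BioSample` columns (the column names are an assumption about the export format). The rows are made-up placeholders that only mimic the shape of the counts above; they are not the real ERP004697 accessions.

```python
import csv
import io

# Made-up run table: 2 biosamples, 8 experiments, 8 runs.
# Accessions are placeholders; column names are assumed.
RUN_TABLE = """\
Run,Experiment,BioSample
RUN_1,EXP_1,SAMPLE_A
RUN_2,EXP_2,SAMPLE_A
RUN_3,EXP_3,SAMPLE_A
RUN_4,EXP_4,SAMPLE_A
RUN_5,EXP_5,SAMPLE_B
RUN_6,EXP_6,SAMPLE_B
RUN_7,EXP_7,SAMPLE_B
RUN_8,EXP_8,SAMPLE_B
"""

rows = list(csv.DictReader(io.StringIO(RUN_TABLE)))

# Distinct identifiers at each level of the hierarchy.
n_biosamples = len({r["BioSample"] for r in rows})
n_experiments = len({r["Experiment"] for r in rows})
n_runs = len({r["Run"] for r in rows})

# How many distinct experiments (libraries) hang off each biosample.
experiments_per_sample = {}
for r in rows:
    experiments_per_sample.setdefault(r["BioSample"], set()).add(r["Experiment"])

print(n_biosamples, n_experiments, n_runs)
for sample_id, experiments in sorted(experiments_per_sample.items()):
    print(sample_id, len(experiments))
```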

Another example is the Illumina BodyMap data (only the 16 Tissues mixture samples): https://www.ncbi.nlm.nih.gov/Traces/study/?acc=ERP000546

There are 3 biosamples.
There are 16 runs, so 16 data files.
Each is a separate experiment.
The experimental factor I could find is LIBRARYPREP, which has 3 different values.
Again, I don't get why each run is a separate experiment. Some, I would say, are the same...

What am I not getting here? Thanks!!!

sequencing • 3.5k views

ADD COMMENT • link updated 5.2 years ago by JC 13k • written 5.2 years ago by JJ ▴ 670

score 1 · Answer 1 · 2019-02-15

1

5.2 years ago

JC 13k

In this case, the statement "Each SRA Experiment is a unique sequencing library for a specific sample." answers your questions. In ERP004697 there are 2 biological samples (liver and brain); then before sequencing, the samples were prepared using 4 indexes, that means the library preparation was done individually, generating the 8 experiments (4 per sample). This is done to enable multiplexing, and because you can have a problem in library preparation or in sequencing. Hope that helps.

ADD COMMENT • link 5.2 years ago by JC 13k

0

Thank you very much for your answer. However, I am still not getting it. Sorry... I am trying to wrap my head around it!

then before sequencing, the samples were prepared using 4 indexes, that means the library preparation was done individually, generating the 8 experiments.

There are 2 biosamples (liver and brain). The paper states they did 2 replicates each (not 4, which would indicate 8 library preps), and this is also reflected in the metadata. They sequenced two lanes. The 8 runs have 4 different indexes, and the runs belonging to the same replicate have the same index. I would have interpreted this as 4 library preps (each with a different index) spread over 2 lanes. Hence this would add up to 4 experiments. Can you explain again how you arrived at 8 experiments? Thanks!
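To spell out the arithmetic behind my reading (a sketch only; it assumes each library was loaded on both lanes, which is my interpretation rather than something stated in the metadata):

```python
biosamples = 2   # liver and brain
replicates = 2   # technical replicates per biosample, per the paper
lanes = 2        # lanes sequenced

# One library prep (with its own index) per biosample/replicate combination.
libraries = biosamples * replicates

# Assumption: every library was sequenced on both lanes, one data file each.
runs = libraries * lanes

# If an experiment is a unique library, I would expect this many experiments.
experiments_expected = libraries

# What the study page actually lists.
experiments_listed = 8

print(libraries, runs, experiments_expected, experiments_listed)
```

So the 8 runs fit my reading, but the 8 experiments do not, unless each library/lane combination was registered as its own experiment.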

ADD REPLY • link 5.2 years ago by JJ ▴ 670

1

Which paper is it? Do you have a link?

From SRA, it looks like they have 2 samples (liver and brain), each one with 4 experiments, but it's unclear to me whether the replicates are biological (which I don't see) or technical.

ADD REPLY • link 5.2 years ago by JC 13k

0

Hi, thank you for taking another look! They are technical replicates according to the paper. Thanks!!!

ADD REPLY • link 5.2 years ago by JJ ▴ 670