SRA and the relationship between biosamples, experiments and runs
16 months ago
JJ • 430

Dear all,

Obviously I've read this (

What is the relationship between BioSamples, SRA Experiments, SRA Runs, and my data files?

BioSample is descriptive information about the biological source materials, or samples, used to generate experimental data in any of primary data archives. Biological and technical replicates need to be registered as separate BioSamples distinguished by the "replicate" attribute having values such as "biological replicate 1" and "biological replicate 2".

Each SRA Experiment is a unique sequencing library for a specific sample. Importantly, much of the descriptive information that is displayed in the public record of your data is captured at the level of the SRA Experiment.

SRA Runs are simply a manifest of data file(s) that should be linked to a given sequencing library – no information present in the Run is displayed on the public record of your project. Note that all data files listed in a Run will be merged into a single SRA archive file (and fastq file for distribution), so files from different samples should not be grouped in the same Run. Paired-end data files (forward/reverse), conversely, MUST be listed in a single run in order for the two files to be correctly processed as paired-end. Do not divide a sample for a paired-end library (for example, forward and reverse).
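To make the quoted description concrete, here is a minimal sketch of the hierarchy as nested records: a BioSample holds Experiments (one per library), and each Experiment holds Runs (one manifest of data files each). The class names, field names, accessions, and file names are all hypothetical and chosen only for illustration; this is not the actual SRA schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Run:
    # A Run is just a manifest of files; paired-end mates go in the SAME run.
    accession: str
    files: List[str]


@dataclass
class Experiment:
    # One Experiment per sequencing library of one sample.
    accession: str
    library_name: str
    runs: List[Run] = field(default_factory=list)


@dataclass
class BioSample:
    # Descriptive information about the biological source material.
    accession: str
    attributes: Dict[str, str]
    experiments: List[Experiment] = field(default_factory=list)


def count_levels(samples: List[BioSample]) -> Tuple[int, int, int]:
    """Return (number of biosamples, experiments, runs)."""
    n_experiments = sum(len(s.experiments) for s in samples)
    n_runs = sum(len(e.runs) for s in samples for e in s.experiments)
    return len(samples), n_experiments, n_runs


# Hypothetical example: one sample, one paired-end library, one run.
sample = BioSample(
    accession="SAMPLE_A",
    attributes={"tissue": "liver", "replicate": "biological replicate 1"},
    experiments=[
        Experiment(
            accession="EXP_1",
            library_name="lib_1",
            runs=[Run("RUN_1", ["lib_1_R1.fastq.gz", "lib_1_R2.fastq.gz"])],
        )
    ],
)

print(count_levels([sample]))
```

Counting a study's levels this way makes it easier to see whether the submitter created one Experiment per library or split things more finely.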

Still I am struggling to understand. e.g., this study:

  • It has 2 biosamples, 8 experiments and 8 runs
  • It's clear to me that the 8 runs are 8 different data files.
  • As there are 8 experiments, there must be 4 distinct sequencing libraries for each biosample; they could, e.g., have different insert sizes (though this is not the case for this study)
  • In the corresponding paper, it's stated that there are two technical replicates per sample, which I found also stated under the experiment factor [replicate]. Ok.
  • The paper also says that they have multiplexed 4 on one lane. Now I would conclude that two libs were created for each biosample and all four together have been sequenced on two lanes. Is this wrong?
  • Still, why aren't there only 2 experiments (hence 2 distinct sequencing libraries) per biosample?
  • And, I thought replicates should be different biosamples?

e.g., this study: the Illumina BodyMap Data (only the 16 Tissues mixture samples):

  • There are 3 biosamples
  • there are 16 runs, so 16 data files.
  • each is a separate experiment.
  • the experimental factor I could find is LIBRARYPREP, which has 3 different values
  • again, I don't get why each run is a separate experiment. Some I would say are the same...

What am I not getting here? Thanks!!!

sequencing • 264 views
11 months ago
JC 7.9k

In this case, the statement "Each SRA Experiment is a unique sequencing library for a specific sample." answers your questions. In ERP004697 there are 2 biological samples (liver and brain). Before sequencing, the samples were prepared using 4 indexes, which means the library preparation was done individually, generating the 8 experiments. This is done to enable multiplexing and because problems can occur in library preparation or in sequencing. Hope that helps.

Thank you very much for your answer. However, I am still not getting it. Sorry ... I am trying to wrap my head around it!

then before sequencing, the samples were prepared using 4 indexes, that means the library preparation was done individually, generating the 8 experiments.

There are 2 biosamples (liver and brain). The paper states they did 2 replicates each, which is also reflected in the metadata (not 4, which would indicate 8 lib preps). They sequenced two lanes. The 8 runs have 4 different indexes, and the runs belonging to the same replicate have the same index. I would have interpreted this as 4 lib preps (each with a different index) spread over 2 lanes. Hence this would add up to 4 experiments. Can you explain to me again why you came up with 8 experiments? Thanks!
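The arithmetic behind this interpretation can be written out as a small sketch. The inputs (2 biosamples, 2 technical replicates, 2 lanes) are the numbers stated above; the two counting conventions are assumptions for comparison only, and the "one Experiment per library per lane" convention is a hypothesis about how 8 could arise, not something the submitters have confirmed.

```python
def expected_counts(biosamples: int, replicates: int, lanes: int) -> dict:
    """Compare two hypothetical ways of registering the same data."""
    libraries = biosamples * replicates  # one indexed library per replicate
    runs = libraries * lanes             # each library yields one file set per lane
    return {
        "libraries": libraries,
        "runs": runs,
        # Convention A: one Experiment per library, several Runs each.
        "experiments_per_library": libraries,
        # Convention B (assumed): one Experiment per library per lane.
        "experiments_per_library_and_lane": libraries * lanes,
    }


counts = expected_counts(biosamples=2, replicates=2, lanes=2)
for name, value in counts.items():
    print(f"{name}: {value}")
```

Under convention A the study would show 4 experiments with 2 runs each; only convention B reproduces the 8 experiments and 8 runs actually listed.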

Which one is the paper? link?

From SRA it looks like they have 2 samples (liver and brain), each one with 4 experiments, but it's unclear to me whether the replicates are biological (which I don't see) or technical.

Hi, thank you for taking another look! They are technical according to the paper. Thanks!!!


