SRA data selection bias?
6.4 years ago
Light ▴ 20

Hi all, if I select 10 SRA datasets from the same SRPxxxxx, from the same lab and the same cell line, with good reads ranging from 1.7 to 2.2 MBases, will there be any bias? If there is, how can I avoid it?

Thank you in advance

RNA-Seq Bias Data selection • 1.6k views
6.4 years ago
Michael 54k

Yes, obviously. You named the biases yourself:

from same lab, same cell line, with good reads ranging from 1.7 to 2.2MBases

The question is whether that bias is relevant, which depends on what you want to achieve by analysing the data. It seems that your problem is with the semantics of "bias" rather than with the data. What do you want to use the data for?

How to avoid bias:

  • randomize, e.g. select random samples
  • stratify, i.e. make sure your sample is representative of the population structure (e.g. it contains good as well as bad samples, and represents the taxon structure, the different technologies, etc.)
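
The two strategies above can be sketched in a few lines of Python. This is only an illustration under assumptions: the metadata records, the `lab` field, and the run names are hypothetical placeholders, not real SRA accessions, and the stratum you actually care about (lab, technology, taxon, quality class) depends on your question.

```python
import random
from collections import defaultdict

def random_selection(samples, k, seed=0):
    """Pick k samples uniformly at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(samples, k)

def stratified_selection(samples, key, k_per_stratum, seed=0):
    """Pick up to k_per_stratum random samples from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for s in samples:
        strata[key(s)].append(s)
    chosen = []
    for name in sorted(strata):
        group = strata[name]
        chosen.extend(rng.sample(group, min(k_per_stratum, len(group))))
    return chosen

# Hypothetical metadata: a run label plus the lab that produced it
runs = [{"run": f"RUN{i:03d}", "lab": f"lab{i % 4}"} for i in range(40)]

simple = random_selection(runs, k=10)
picked = stratified_selection(runs, key=lambda r: r["lab"], k_per_stratum=3)
```

With purely random selection a small sample can still miss a stratum by chance; the stratified version guarantees that every lab is represented.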

Biases in your data will most certainly manifest in various parameters, such as:

  • contaminants (you might be able to distinguish labs based on "metagenomics" of contaminants)
  • mappability
  • GC content, k-mer content, read quality, RNA degradation profiles, coverage, etc.
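
As a rough illustration of how one of these parameters could be compared between samples, here is a minimal sketch that computes mean GC content per sample. The read sequences are toy data invented for the example; in practice you would parse the reads from FASTQ files and most likely use an established QC tool instead.

```python
def gc_content(seq):
    """Fraction of G or C among the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0

def mean_gc(reads):
    """Average GC fraction over a list of read sequences."""
    values = [gc_content(r) for r in reads]
    return sum(values) / len(values) if values else 0.0

# Hypothetical toy reads from two samples
sample_a = ["ACGTACGT", "GGCCGGCC", "ATATATAT"]
sample_b = ["GGGGCCCC", "GCGCGCGC", "CCGGAATT"]

gc_a = mean_gc(sample_a)
gc_b = mean_gc(sample_b)
```

A consistent difference in such a summary between groups of samples (for example between labs) would hint at a technical bias rather than a biological effect.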

I was about to find DEGs and study GO. Thanks for the answers.


Please explain.


I want to identify differentially expressed genes between 5 vs 5 samples (infected and normal) and then run a gene ontology and pathway enrichment analysis on them.


And what does this have to do with bias? As I said, the problem is one of semantics. Bias (my simplified definition) means that an estimate obtained from the sampled data or method deviates, with respect to some variable, from the expected value in the whole population. For example, suppose you sample 10 people and record their age and education. If you sample only from your university course, you will probably get a very biased estimate for the population, such as: 100% of humans have higher education and 99% of people are below 30 years of age.

That does not mean that your approach is invalid: if you are interested specifically in your classmates, this sample may well be valid. A bias is therefore always a relation between two things: an estimate of a value based on a sample, and the "truth" with respect to some population. Asking whether 30 years of age is biased or not is meaningless per se.
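
The age example can be made concrete with a small simulation. This is only a sketch under invented assumptions: the population (10,000 people with ages spread evenly over 0 to 89) and the age window used for the "university course" (18 to 29) are made up for illustration.

```python
import random

rng = random.Random(42)

# Hypothetical population: 10,000 people with ages spread over 0-89
population = [rng.randint(0, 89) for _ in range(10_000)]
true_mean = sum(population) / len(population)

# Biased sampling frame: only people aged 18-29 (the "university course")
course = [age for age in population if 18 <= age <= 29]

def mean_of_sample(frame, n):
    """Mean age of n people drawn at random from the given frame."""
    sample = rng.sample(frame, n)
    return sum(sample) / n

# Repeat the 10-person sample many times to approximate the expected value
reps = 2000
est_random = sum(mean_of_sample(population, 10) for _ in range(reps)) / reps
est_course = sum(mean_of_sample(course, 10) for _ in range(reps)) / reps

# Bias = expected value of the estimate minus the population value
bias_random = est_random - true_mean
bias_course = est_course - true_mean
```

Sampling from the whole population gives a bias close to zero, while sampling from the course underestimates the mean age by roughly two decades. The same course sample would be essentially unbiased if the population of interest were the course itself.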

With respect to DEGs and GO terms, there will definitely be a bias towards:

  • The genes that come up under the conditions the lab's PI is most interested in
  • The GO terms that the lab is specifically provoking to come up because of the conditions and cell lines tested

You might also want to read this: https://en.wikipedia.org/wiki/Bias_of_an_estimator


By the way, what I meant by a semantic problem is this: your question shows that you are not using the term "bias" correctly and need to define it properly.


OK. I think I am starting to get it.
