A question about some "basic numbers" for RNA-seq experiments
8.8 years ago
CrazyB ▴ 280

From my limited experience with genomic DNA NGS, I can calculate how many genomic samples (WES (~50 Mb), WGS (~3 Gb), or targeted sequencing) to pool in one HiSeq 2500 lane and how many lanes a project needs (assuming ~50-80 Gb of data per lane), and use that to estimate the total cost of the experiments. This does not seem to translate well to RNA-seq. Could anyone point me to some (for lack of a better word) "RNA-seq statistics" that would help me estimate the cost of an RNA-seq experiment? For example:

(a) In one HiSeq 2500 lane, how many reads in total could I get from 2 µg of mRNA with a standard cDNA PCR cycle (5 min reverse transcription + 3 min PCR extension)?

(b) Nature Methods 6, 377-382 (2009) reports, for example, that with a modified cDNA PCR cycle (reaction time extended from 5 min to 30 min for reverse transcription and from 3 min to 6 min for PCR extension) the authors obtained 100 million 50-base reads, i.e. ~5 Gb. Assuming 50-80 Gb of data per HiSeq 2500 lane, does this mean that with the Nature Methods protocol one could pool 10 different samples in one lane?

(c) What is the average number of transcript reads for a sample with 2 µg of mRNA as starting material?

(d) The UCSC hg19 statistics report a total of 34,702 genes. On average, how many of these 34,702 genes are transcribed? What would be the average number of transcripts per gene?

Any information on any of these, or any other references on RNA-seq experiments, would be greatly appreciated.

Thank you


Please learn to ask one question per post.

8.8 years ago
Adamc ▴ 680

Regarding (a) and (c), more information is required: a number of variables can affect the number of reads per run. Besides factors such as library prep and the amount of input material (which I haven't been able to strongly associate with the number of output reads, though that may just be because we don't run enough samples to establish a correlation), you'll need to know the run mode (high output or rapid run) and the version of the chemistry the instrument is set up for. There was a huge jump in the number of reads between v3 and v4 for the HiSeq, and with v4 we can get more high-quality 125 bp reads than we could get 100 bp reads on v3, for pretty much the same cost.

On (d): from the standpoint of annotation, it's not just protein-coding transcripts that get called "genes". You'll want to count only the ones annotated as "protein coding". You should be able to get that information through BioMart, and I imagine there's also some way to do it through the UCSC browser.
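If you already have an annotation file on disk, the same count can be approximated without BioMart or the UCSC browser. The snippet below is only a rough sketch of that alternative, assuming a GTF file in which gene records carry a `gene_type` (GENCODE-style) or `gene_biotype` (Ensembl-style) attribute. The function name `count_gene_biotypes` and the example records are made up for illustration and are not real annotation entries.

```python
import re
from collections import Counter

# Matches either 'gene_type "..."' or 'gene_biotype "..."' in the GTF attribute column.
BIOTYPE_RE = re.compile(r'gene_(?:bio)?type "([^"]+)"')


def count_gene_biotypes(gtf_lines):
    """Count 'gene' records per biotype from an iterable of GTF lines."""
    counts = Counter()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # header/comment line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # only count gene-level records, not transcripts/exons
        match = BIOTYPE_RE.search(fields[8])
        counts[match.group(1) if match else "unknown"] += 1
    return counts


# Hypothetical records, for illustration only.
example = [
    '#!made-up header',
    'chr1\tSRC\tgene\t100\t900\t.\t+\t.\tgene_id "G1"; gene_type "protein_coding";',
    'chr1\tSRC\ttranscript\t100\t900\t.\t+\t.\tgene_id "G1"; gene_type "protein_coding";',
    'chr1\tSRC\tgene\t2000\t2500\t.\t-\t.\tgene_id "G2"; gene_type "lincRNA";',
    'chr2\tSRC\tgene\t10\t500\t.\t+\t.\tgene_id "G3"; gene_biotype "protein_coding";',
]

biotype_counts = count_gene_biotypes(example)
print(biotype_counts["protein_coding"])
```

For a real file you would pass an open file handle instead of the list, e.g. `count_gene_biotypes(open(path))`, where `path` points at whatever GTF you downloaded.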

8.8 years ago
Ido Tamir 5.2k

You can expect 250-300M reads per lane on a HiSeq 2500. This is independent of the input to the library preparation step. The ENCODE guidelines suggest 30M reads per experiment, so by this standard 8-10 samples can be pooled per lane.
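The arithmetic behind that estimate, as a small sketch using only the figures quoted in this thread (250-300M reads per lane, 30M reads per sample, and the 100 million 50-base reads from the question). The helper names `samples_per_lane` and `yield_gb` are made up for illustration, and actual lane output will depend on the instrument setup and the sequencing center.

```python
def samples_per_lane(lane_reads_millions, reads_per_sample_millions):
    """Whole number of samples that fit in one lane at the target depth."""
    return int(lane_reads_millions // reads_per_sample_millions)


def yield_gb(reads_millions, read_length_bases):
    """Total sequenced bases, in gigabases, for a given read count and length."""
    return reads_millions * 1e6 * read_length_bases / 1e9


# Lane output range quoted above, at 30M reads per sample.
for lane_reads in (250, 300):
    print(lane_reads, "M reads/lane ->", samples_per_lane(lane_reads, 30), "samples")

# The question's example: 100 million 50-base reads.
print(yield_gb(100, 50), "Gb")
```

Planning in reads per sample rather than in gigabases is usually the more natural unit for RNA-seq, since the depth guidelines are stated in reads.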

8.8 years ago
Gary ▴ 480

In our lab we pool 6 samples per lane, but it depends on the case. The best approach is to look at recently published papers whose biological questions and experimental design are very similar to yours. You can also ask the technicians at the sequencing center you will send your samples to. In fact, not only the price but also the number of reads per lane varies between sequencing centers.
