How To Best Get Up To Speed In Dealing With RNA-seq Data?
11.9 years ago
Wayne ★ 1.0k

Hello all, I have some experience with next-generation DNA exome sequencing, but in the next few weeks I will be receiving RNA-seq data, which I have no experience with. The data will already be mapped. I want to hit the ground running when the data gets here, so I want to prepare by getting familiar with the programs I'll need to use. The goals are to:

1. Validate mutations identified in DNA from exome sequencing
2. Check expression levels
3. Check for fusions
4. Anything else people suggest (I'm open to ideas)

I've really tried to find reviews and tutorials to get me up to speed but haven't had much luck. Any reviews, tutorials, or software recommendations of things I should definitely study or practice with would be extremely appreciated! Thanks so much for your time.

rna-seq rna sequencing expression • 6.0k views
11.9 years ago
User 59 13k

I'll go for the obvious recent paper:

http://www.nature.com/nprot/journal/v7/n3/full/nprot.2012.016.html


Nice find, I haven't seen this one. I will post it in the tutorial section as well.

11.9 years ago

This is how I would explain RNA-seq to someone who is new to the area.

Step 0: You have a hypothesis. You have decided that RNA-seq will be an ideal/novel experiment to investigate your hypothesis.

Step 1: Get your samples (case/control, tumor/normal, time-series, ...), extract your RNA, and make sure you do all QC.

  • Library preparation: this is a key experimental step of RNA-seq, and it determines the outcome of your experiment.

Step 2: Deep sequencing. Read up on next-generation sequencing; you may use one of the recent NGS platforms for your sequencing (read about them here). Make sure that you understand the lingua franca of NGS (for example: single-end vs. paired-end, coverage, etc.).
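As a small illustration of the "coverage" term, the usual back-of-the-envelope estimate is sequenced bases divided by target size. The sketch below assumes that definition; the function name `mean_coverage` and the example numbers are made up for illustration. For RNA-seq the result is only an average, since depth at each transcript depends on how strongly it is expressed.

```python
def mean_coverage(num_fragments: int, read_length: int, target_size: int,
                  paired_end: bool = False) -> float:
    """Estimate mean depth as (sequenced bases) / (target size in bases).

    num_fragments is the number of sequenced fragments; a paired-end run
    yields two reads per fragment, a single-end run yields one.
    """
    if target_size <= 0:
        raise ValueError("target_size must be positive")
    reads_per_fragment = 2 if paired_end else 1
    return num_fragments * reads_per_fragment * read_length / target_size


# Hypothetical run: 30 million fragments, 100 bp reads, 60 Mb target.
print(mean_coverage(30_000_000, 100, 60_000_000, paired_end=True))   # 100.0
print(mean_coverage(30_000_000, 100, 60_000_000, paired_end=False))  # 50.0
```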

Step 3: Analysis pipeline. The typical output from an RNA-seq experiment is a .fastq file of sequence reads (two files for a paired-end experiment). The downstream analysis is then designed around the biological question.
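To get a feel for what a .fastq file contains, here is a minimal reader sketch. It assumes the common four-line record layout (header starting with `@`, sequence, a `+` separator line, quality string) with no line wrapping; the function name `read_fastq` and the two toy reads are illustrative only. In practice you would use an established parser.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality_string) from 4-line FASTQ records."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:  # end of file (or a trailing blank line)
            return
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header[1:], seq, qual


# Two toy reads held in memory instead of a real file.
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\n!!II\n")
for read_id, seq, qual in read_fastq(example):
    print(read_id, seq, qual)
```

For a paired-end run you would iterate over the two files in parallel (for example with `zip`), since mates appear in the same order in both files.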

Here is a highly simplified conceptual framework for understanding RNA-seq analysis:

Primary analysis:

  • QC: Quality control and removal of poor-quality reads, adapters and linkers
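The QC step can be sketched roughly as follows. This assumes Phred+33 quality encoding, exact-match adapter removal, and simple tail trimming, all of which are simplifications; the function names, thresholds, and the toy read are illustrative, and dedicated trimming tools handle mismatches and partial adapters that this sketch does not.

```python
from typing import List, Tuple


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(char) - offset for char in quality]


def trim_adapter(seq: str, qual: str, adapter: str) -> Tuple[str, str]:
    """Cut the read at the first exact occurrence of the adapter, if any."""
    index = seq.find(adapter)
    if index == -1:
        return seq, qual
    return seq[:index], qual[:index]


def trim_low_quality_tail(seq: str, qual: str, min_q: int = 20) -> Tuple[str, str]:
    """Remove trailing bases whose Phred score is below min_q."""
    scores = phred_scores(qual)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], qual[:end]


def passes_qc(seq: str, qual: str, min_length: int = 25, min_mean_q: float = 20.0) -> bool:
    """Keep a read only if it is long enough and its mean quality is adequate."""
    if len(seq) < min_length:
        return False
    scores = phred_scores(qual)
    return sum(scores) / len(scores) >= min_mean_q


# Toy read: 8 informative bases followed by an adapter sequence.
adapter = "AGATCGGAAGAGC"
seq = "ACGTACGT" + adapter
qual = "IIIIII##" + "I" * len(adapter)

seq, qual = trim_adapter(seq, qual, adapter)
seq, qual = trim_low_quality_tail(seq, qual, min_q=20)
print(seq, qual, passes_qc(seq, qual, min_length=5))
```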

Secondary analysis

  • Mapping: Find the location where each short read best matches the reference sequence. Start with a simple mapping strategy and progressively increase its complexity to handle the reads that remain unaligned. Mapping is the first step in turning millions of short reads into a quantification of expression.

  • Summarization: Aggregate sequence reads over biological units (exons, transcripts, genes). This is where you bring biological context to your sequencing reads.

  • Normalization: This step lets you compare expression levels between samples (for example, cases vs. controls) and within your samples (biological vs. technical replicates). Several statistical approaches are available, e.g. RPKM (single-read), FPKM (paired-end), quantile normalization, housekeeping-gene normalization, etc.

  • Differential expression testing: This step helps to identify genes whose expression has changed significantly. Here you take the table of summarized count data and perform statistical tests between the samples of interest (pairwise or multiple-group comparisons). You can use statistical techniques based on empirical Bayes estimation, the negative binomial distribution, etc.
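The summarization and normalization bullets above can be sketched in a few lines, assuming reads are already mapped and reduced to a (chromosome, position) pair. The gene coordinates, function names, and numbers are all hypothetical. RPKM is computed with its usual definition, reads per kilobase of gene per million mapped reads. The log2 fold change at the end is only a descriptive ratio, not a substitute for the negative binomial or empirical Bayes tests mentioned above.

```python
import math
from typing import Dict, Iterable, List, Tuple

# (gene_id, chromosome, start, end) with 0-based, half-open coordinates.
Gene = Tuple[str, str, int, int]


def count_reads_per_gene(alignments: Iterable[Tuple[str, int]],
                         genes: List[Gene]) -> Dict[str, int]:
    """Summarization: count mapped read positions falling inside each gene.

    A read inside two overlapping genes is counted for both; real tools
    let you choose how to resolve such ambiguity.
    """
    counts = {gene_id: 0 for gene_id, _, _, _ in genes}
    for chrom, pos in alignments:
        for gene_id, gene_chrom, start, end in genes:
            if chrom == gene_chrom and start <= pos < end:
                counts[gene_id] += 1
    return counts


def rpkm(count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Normalization: reads per kilobase of gene per million mapped reads."""
    return count * 1e9 / (gene_length_bp * total_mapped_reads)


def log2_fold_change(case: float, control: float, pseudocount: float = 1.0) -> float:
    """Descriptive log2 ratio; the pseudocount avoids division by zero."""
    return math.log2((case + pseudocount) / (control + pseudocount))


genes = [("geneA", "chr1", 0, 1000), ("geneB", "chr1", 2000, 4000)]
alignments = [("chr1", 10), ("chr1", 2500), ("chr1", 4000), ("chr2", 100)]
print(count_reads_per_gene(alignments, genes))

# 10 reads on a 2 kb gene in a library of 1 million mapped reads.
print(rpkm(10, 2000, 1_000_000))  # 5.0
print(log2_fold_change(3.0, 1.0))  # 1.0
```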

Tertiary analysis

  • Downstream analysis: Creating lists of DE genes gives you an estimate of expression trends. You can now take the list(s) and run functional, pathway-centric or network analyses on them. Remember that most existing downstream analysis tools were designed for gene expression data from microarray experiments. For RNA-seq you should use tools designed for RNA-seq data (for example, fusion transcript detection tools, or enrichment tools designed to use RNA-seq output). The other option is to use only the gene lists for such analyses.
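For the "gene lists only" option, a common approach is an over-representation test: given a list of DE genes, ask whether a pathway or gene set contains more of them than expected by chance. Below is a minimal sketch using a one-sided hypergeometric test; the function names and gene identifiers are hypothetical, and the test ignores RNA-seq-specific biases, which is exactly why RNA-seq-aware enrichment tools exist.

```python
from math import comb
from typing import Iterable


def enrichment_p_value(n_universe: int, n_in_set: int, n_de: int, n_overlap: int) -> float:
    """P(overlap >= n_overlap) when n_de genes are drawn at random from a
    universe of n_universe genes, n_in_set of which belong to the gene set."""
    max_overlap = min(n_in_set, n_de)
    total = comb(n_universe, n_de)
    tail = sum(
        comb(n_in_set, k) * comb(n_universe - n_in_set, n_de - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


def gene_set_enrichment(de_genes: Iterable[str], gene_set: Iterable[str],
                        universe: Iterable[str]) -> float:
    """Over-representation p-value for one gene set, restricted to the universe."""
    background = set(universe)
    de = set(de_genes) & background
    members = set(gene_set) & background
    return enrichment_p_value(len(background), len(members), len(de), len(de & members))


# Toy example: 10 measured genes, 3 DE genes, all 3 in the same pathway.
universe = [f"g{i}" for i in range(10)]
p = gene_set_enrichment(["g0", "g1", "g2"], ["g0", "g1", "g2"], universe)
print(p)
```

With many gene sets tested at once, the resulting p-values would also need a multiple-testing correction.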

Step 4: Interpretation of your results: Use the results to assess your hypothesis

Step 5: Validation using alternative techniques (resequencing of your gene of interest, quantifying transcript levels, functional studies, etc.)

PS. This answer is based on references in my citeulike library (See rnaseq)


“step 0” very optional though. ;-) RNA-seq and other high-throughput techniques make it very tempting to do hypothesis-free research. Of course there’s some controversy about this but I think it’s clear that not every experiment starts with a hypothesis to be tested firmly in mind.


IMHO the main reason that high throughput sequencing makes hypothesis-free research tempting is that people know way too little about data analysis. The more you know the less tempting it is.


@op, That pretty much sums it up. However, for more detailed info on each of these steps, you could refer to the Wiki from my reply.

11.9 years ago
Arun 2.4k

I love this wiki on RNA-seq from the SEQanswers community; had to mention it!

11.3 years ago
boczniak767 ▴ 850

You could also check the material from the Bioconductor course (with exercises) here
