Deleted:Guide to RNA-seq mapping with TopHat2: example of gene expression in human brain
8.2 years ago

Analysis of RNA expression is one of the most important bioinformatics tasks. However, many things can go wrong with RNA-seq, which makes expression analysis tricky. In this tutorial we provide a detailed guide to RNA-seq mapping and explain some of the important factors you need to consider when mapping. You will work with an RNA-seq dataset obtained from human brain tissue and used to study changes in gene expression patterns during aging in humans.

RNA-seq is a next-generation sequencing method that allows us to obtain a snapshot of the RNA present in a sample and estimate its abundance. RNA-seq provides a comprehensive gene expression profile and helps to quantify and annotate genes and isoforms. The ability to quantify the level at which a particular gene is expressed in a cell, tissue or organism provides us with valuable biological information. For example, measuring gene expression can help to:

  • Identify viral infection of a cell (viral protein expression);
  • Determine an individual's susceptibility to cancer (oncogene expression);
  • Find if a bacterial strain is resistant to penicillin (beta-lactamase expression).

Ideally, measurement of expression should be done by detecting the final gene product (for many genes this is a protein); however, technically it is often easier to detect one of the protein precursors - typically mRNA - and infer gene expression level from there.

Several important factors make analysis of RNA-seq data complex:

  1. Most sequencing platforms only produce reads of up to about 400 bp (PacBio and Oxford Nanopore are exceptions). Reads are therefore generally too short to cover an expressed gene region entirely, which is why they are called 'partial transcripts'.
  2. Some fraction of the sequencing reads in an RNA-seq experiment align to non-contiguous segments of the genome. Such reads are called "junction reads" - that is, reads that span the site of a splice in mRNA. Junction reads allow us to identify sites of alternative splicing, but can be complex to map and identify.
  3. RNA-seq data contain sources of systematic variation that should be removed before differential expression (DE) analysis. These include between-sample differences, such as library size (sequencing depth), and within-sample differences, such as gene length and guanine-cytosine (GC) content, as well as unwanted variation introduced by batch effects.
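
To make the third point concrete, here is a minimal Python sketch of two common ways to correct raw read counts: counts per million (CPM), which adjusts for library size only, and transcripts per million (TPM), which also adjusts for gene length. The gene names, counts and lengths are invented for illustration and do not come from the brain dataset, and this sketch does not address GC content or batch effects.

```python
def counts_per_million(counts):
    """Scale raw counts by library size (total mapped reads in the sample)."""
    total = sum(counts.values())
    return {gene: c * 1e6 / total for gene, c in counts.items()}


def transcripts_per_million(counts, lengths_bp):
    """Correct for gene length first, then rescale so values sum to one million."""
    # Reads per kilobase: longer genes collect more reads at equal expression.
    rates = {gene: counts[gene] / (lengths_bp[gene] / 1000) for gene in counts}
    scale = sum(rates.values())
    return {gene: r * 1e6 / scale for gene, r in rates.items()}


# Hypothetical toy data (assumption, for illustration only).
counts = {"geneA": 100, "geneB": 300, "geneC": 600}
lengths = {"geneA": 1000, "geneB": 3000, "geneC": 2000}

cpm = counts_per_million(counts)
tpm = transcripts_per_million(counts, lengths)

for gene in counts:
    print(gene, round(cpm[gene]), round(tpm[gene]))
```

In this toy example geneB has three times the raw count of geneA but is also three times longer, so the two genes end up with the same TPM even though their CPM values differ.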

A critical step in RNA-seq data analysis is the alignment of partial transcript reads to a reference genome sequence. Reference-based alignment methods use the sequence of each read to find a potential mapping location, either by an exact match to the reference or by scoring sequence similarity.
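
The two strategies can be illustrated with a toy Python sketch: try an exact match first, and otherwise pick the position with the fewest mismatches. This is only an illustration of the idea. Real aligners rely on genome indexes and also handle gaps and splice junctions, which this brute-force scan does not; the function names, the `max_mismatches` parameter and the sequences are all made up for the example.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def map_read(read, reference, max_mismatches=2):
    """Return (position, mismatches) for the best placement, or None."""
    # Strategy 1: exact match against the reference.
    pos = reference.find(read)
    if pos != -1:
        return pos, 0

    # Strategy 2: score similarity at every position, keep the best one.
    best = None
    for start in range(len(reference) - len(read) + 1):
        mismatches = hamming(read, reference[start:start + len(read)])
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (start, mismatches)
    return best


# Hypothetical reference and reads (assumption, for illustration only).
reference = "ACGTTGCAGGCTAAGCTTACGGATCC"

exact_hit = map_read("GGCTAAGC", reference)    # identical to reference[8:16]
inexact_hit = map_read("GGCTTAGC", reference)  # one substitution
no_hit = map_read("TTTTTTTT", reference)       # too dissimilar to place

print(exact_hit, inexact_hit, no_hit)
```

A read that maps only after allowing mismatches may reflect a sequencing error or a genuine variant, which is why aligners report the score along with the position.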

More at: https://insidedna.me/tutorials/view/tophat2-analysis-of-rna-expression-is

genomics rna-seq • 4.2k views