Question

normalization with RPKM

Asked 4.9 years ago by Chironex

Hello! I'm having difficulty normalizing my data. I searched for transposable elements in my genome, and after this step I counted the reads falling in some transcripts. The result looks like this:

head(table_tissues_filtered_TE)
                  Lengths ova testes lobe retina suckers brain1 brain2 skin stage axial gland viscera
Simple_repeat_80      134  58     77   48     69     115    137    131  195    75    99    35     557
tRNA_1                 59   0     14   12      1      19     12     14   21   104     9     0       3
Simple_repeat_87       26   1     33   12      3      15     24     21   19   180     9     0       4
Simple_repeat_114      22   0      0    0      1       0      0      0    2     7     0   204       0
Simple_repeat_115      30   0      0    0      0       0      0      0    0     1     0    42       0
Simple_repeat_123      22   2      3  317     45      13    652    651   15    21   333     5       4

where Lengths is the length of each element (simple repeats, etc.), and the other columns are the read counts from featureCounts. I have another table with the total number of reads for each tissue:

head(reads_table)
    ova  testes    lobe  retina suckers  brain1  brain2    skin   stage   axial   gland viscera
 522444  310243  226146  102307  126055  489389  668243  372728  262536  233754   24817   25689

I would like to normalize the data to RPKM using R, but I don't know exactly how to do it. Can anyone help me? Thank you!
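
A minimal sketch of the RPKM arithmetic, assuming the per-tissue totals in reads_table are the library sizes (total mapped reads) to divide by: RPKM = count / (feature length in kb × total reads in millions). It is written in Python with only the standard library so each step is explicit, and it uses just the first three rows and two tissues from the tables above; the names `lengths`, `counts`, `library_size` and `rpkm` are illustrative, not from any package.

```python
# Minimal RPKM sketch on a few rows from the tables above.
# Assumption: reads_table holds the library size (total mapped reads) per tissue.

lengths = {  # feature length in bp (the "Lengths" column)
    "Simple_repeat_80": 134,
    "tRNA_1": 59,
    "Simple_repeat_87": 26,
}

counts = {  # featureCounts read counts for two of the tissues
    "Simple_repeat_80": {"ova": 58, "testes": 77},
    "tRNA_1": {"ova": 0, "testes": 14},
    "Simple_repeat_87": {"ova": 1, "testes": 33},
}

library_size = {"ova": 522444, "testes": 310243}  # from reads_table


def rpkm(count, length_bp, total_reads):
    """Reads per kilobase of feature per million reads in the library."""
    return count / ((length_bp / 1e3) * (total_reads / 1e6))


# Apply the formula to every feature x tissue cell.
rpkm_table = {
    feature: {
        tissue: rpkm(c, lengths[feature], library_size[tissue])
        for tissue, c in per_tissue.items()
    }
    for feature, per_tissue in counts.items()
}

for feature, values in rpkm_table.items():
    print(feature, {t: round(v, 2) for t, v in values.items()})
```

In R the same arithmetic fits in one line on a numeric count matrix whose columns are in the same order as the library-size vector, e.g. `sweep(counts, 2, lib_size / 1e6, "/") / (lengths / 1e3)`, where `counts`, `lib_size` and `lengths` are again hypothetical object names.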

Tags: normalization, RNA-Seq data, rpkm, transposon

RPKM/FPKM is a unit, not a method or analysis. Nowadays people usually use TPM instead of R(F)PKM. To calculate TPM, you can run your BAM files through StringTie, or run Salmon or Kallisto directly on the FASTQ files.
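
To show how TPM differs from RPKM, here is a minimal Python sketch: TPM divides each count by feature length first, then rescales within the sample so the values sum to one million. It assumes the counts and lengths cover every feature quantified in that sample (TPM computed on the six rows shown is only an illustration), and `tpm` is a hypothetical helper. Tools such as Salmon and Kallisto use an effective length rather than the raw feature length, so their numbers will differ somewhat.

```python
# Minimal TPM sketch for one sample.
# Assumption: counts and lengths cover all features quantified in the sample.

def tpm(counts, lengths_bp):
    """Transcripts per million for one sample."""
    # Step 1: reads per base for each feature (length normalization first).
    rates = [c / l for c, l in zip(counts, lengths_bp)]
    # Step 2: rescale so the rates sum to one million within the sample.
    total = sum(rates)
    return [r / total * 1e6 for r in rates]


# "ova" column and lengths for the six rows shown in the question (illustration only).
ova_counts = [58, 0, 1, 0, 0, 2]
lengths_bp = [134, 59, 26, 22, 30, 22]

ova_tpm = tpm(ova_counts, lengths_bp)
print([round(x, 1) for x in ova_tpm])
print(sum(ova_tpm))  # 1e6 up to floating-point rounding
```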

Reply by grant.hovhannisyan, 4.9 years ago

Stringtie is not implemented in R, right?

Reply by Chironex, 4.9 years ago

No, it's a standalone program. Here is the link: https://ccb.jhu.edu/software/stringtie/

Reply by grant.hovhannisyan, 4.9 years ago

Answer 1 · 2019-06-09

Answered 4.9 years ago by vin.darb

The normalisation method in DESeq2 (an R package) seems to be better than RPKM/FPKM/TPM for comparing gene counts between samples (source: https://hbctraining.github.io/DGE_workshop/lessons/02_DGE_count_normalization.html).

The package takes a featureCounts table as input, and then you can retrieve the normalised counts:

https://bioc.ism.ac.jp/packages/2.14/bioc/vignettes/DESeq2/inst/doc/beginner.pdf
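
For comparison, a minimal Python re-implementation of DESeq2's default median-of-ratios size factors: take each feature's geometric mean across samples, divide each count by it, and use the median of those ratios (over features that are nonzero in every sample) as the sample's size factor. This is an illustration, not DESeq2 itself, and `size_factors` is a hypothetical helper; in DESeq2 the corresponding steps are `estimateSizeFactors()` on a `DESeqDataSet` followed by `counts(dds, normalized = TRUE)`. These factors correct for sequencing depth and library composition between samples but not for feature length, so normalised counts compare the same feature across samples rather than different features within one sample. With only a few rows the factors are unstable; in practice you would use the whole table.

```python
# Sketch of median-of-ratios size factors (the DESeq2 default), for illustration.
import math
from statistics import median


def size_factors(count_matrix):
    """count_matrix: list of rows (features), each a list of counts per sample."""
    n_samples = len(count_matrix[0])
    # Only features with a nonzero count in every sample have a finite log geometric mean.
    usable = [row for row in count_matrix if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no feature has a nonzero count in every sample")
    # log of the geometric mean across samples = mean of the logs
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # log(count / geometric mean) for every usable feature in sample j
        log_ratios = [math.log(row[j]) - lg for row, lg in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Rows from the question; columns: ova, testes, lobe, retina (illustration only).
counts = [
    [58, 77, 48, 69],   # Simple_repeat_80
    [0, 14, 12, 1],     # tRNA_1 (zero in ova, so excluded from the factors)
    [1, 33, 12, 3],     # Simple_repeat_87
    [2, 3, 317, 45],    # Simple_repeat_123
]

sf = size_factors(counts)
normalized = [[c / s for c, s in zip(row, sf)] for row in counts]
print([round(s, 3) for s in sf])
```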

Is it usable even if the elements are not genes but transposons? The GTF file I use with featureCounts is the output of a RepeatMasker analysis.

Reply by Chironex, 4.9 years ago