Question

which genes have altered transcript proportions?

0

Entering edit mode

5.6 years ago

mamillerpa ▴ 40

I collected some rsem isoforms.results files and created a data frame with the transcript ID, the isoform percentage, and an experimental condition (mutant or wildtype)

> head(one.var, n=8)
             transcript_id IsoPct GENO.COHORT
3246628 ENSMUST00000000028  62.32      MUTANT
3381904 ENSMUST00000000028  61.87      MUTANT
3517180 ENSMUST00000000028  46.42    WILDTYPE
3652456 ENSMUST00000000028  53.99    WILDTYPE
3787732 ENSMUST00000000028  77.04      MUTANT
3923008 ENSMUST00000000028  37.67      MUTANT
4058284 ENSMUST00000000028  88.63    WILDTYPE
4193560 ENSMUST00000000028  69.87    WILDTYPE

> dput(head(one.var, n=8))
structure(list(transcript_id = c("ENSMUST00000000028", "ENSMUST00000000028", 
"ENSMUST00000000028", "ENSMUST00000000028", "ENSMUST00000000028", 
"ENSMUST00000000028", "ENSMUST00000000028", "ENSMUST00000000028"
), IsoPct = c(62.32, 61.87, 46.42, 53.99, 77.04, 37.67, 88.63, 
69.87), GENO.COHORT = c("MUTANT", "MUTANT", "WILDTYPE", "WILDTYPE", 
"MUTANT", "MUTANT", "WILDTYPE", "WILDTYPE")), row.names = c(3246628L, 
3381904L, 3517180L, 3652456L, 3787732L, 3923008L, 4058284L, 4193560L
), class = "data.frame")

Now I'd like to see what genes have altered transcript proportions in the mutant condition. So I thought I'd start by looking for changes in the percent abundance for individual transcripts, and then merge the gene infraction back in afterwards.

transcript.list <- sort(unique(as.character(one.var$transcript_id)))
last.transcript <- tail(transcript.list, n = 1)

test.res.frame <-
  lapply(transcript.list, function(one.transcript) {
    temp <- one.var[one.var$transcript_id == one.transcript ,]
    print(paste0(one.transcript, " ", last.transcript))
    if (length(unique(temp$IsoPct)) > 1) {
      test.res <- wilcox.test(IsoPct ~ GENO.COHORT, temp)
      return(list(transcript_id = one.transcript, p.value = test.res$p.value))
    } else {
      return(list(transcript_id = one.transcript, p.value = NA))
    }
  })

test.res.frame <- do.call(rbind.data.frame, test.res.frame)
test.res.frame$p.adj <- p.adjust(test.res.frame$p.value)

I think this is going to take ~ 40 minutes to loop through 135k transcripts on a single core, and I have other datsets that I want to process in a similar fashion.

Is there some existing base R or Bioconductor function that can do a two-way test like this on all transcripts in bulk? Should I just use something like mclapply?

RNA-Seq star rsem • 1.3k views

ADD COMMENT • link updated 5.6 years ago by h.mon 35k • written 5.6 years ago by mamillerpa ▴ 40

score 3 · Accepted Answer · 2018-08-27

3

Entering edit mode

5.6 years ago

h.mon 35k

If I understand correctly, you want to test for differential isoform usage. If so, there are the R packages called RATs ( https://github.com/bartongroup/Rats ) and IsoformSwitchAnalyzeR ( https://bioconductor.org/packages/release/bioc/html/IsoformSwitchAnalyzeR.html ). Also, I think Cufflinks / Cuffdiff also reports "isoform switches" in the file splicing.diff.

ADD COMMENT • link 5.6 years ago by h.mon 35k

0

Entering edit mode

So far I have only tried RATS, but it's clearly more sophisticated than anything I could have thrown together. It's also faster than what I described above, but it still takes several minutes on a modern Windows laptop. I haven't looked into multi-threading for tximport or RATS yet. I also need to do more reading about the consequences of different kinds of normalization during tximport.

I found browseVignettes("rats") to be the best way to study RATS

I'm looking forward to trying isoformSwitchAnalyzeR since it offers Functional Consequences analysis.

ADD REPLY • link 5.6 years ago by mamillerpa ▴ 40