I'm quite new to RNA-seq and am playing around with data to get a handle on it. I have quantified with kallisto and am using tximport to summarize transcript counts for differential gene expression analysis.
I am running into a problem associating gene IDs with my transcripts for the summarization step. I suspect the cause is the TxDb package I am using: its annotation may not match the transcriptome file I quantified against. I am not sure, though, and my attempts at fixing this haven't been successful.
I am working with human samples. I quantified my transcripts using this transcriptome file for Homo sapiens. I have 6 samples: 3 WT replicates and 3 KO replicates.
I created a vector pointing to my kallisto files as detailed in the tximport manual.
files <- file.path(dir, "kallisto", samples$run, "abundance.tsv")
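The tximport vignette also names this vector and checks that every path exists before importing. A minimal sketch of that step, assuming a hypothetical samples data.frame with a run column; the run names and the toy directories created under tempdir() exist only so the example runs on its own:

```r
# Hypothetical sample sheet: 3 WT and 3 KO runs (run names are placeholders)
samples <- data.frame(run = paste0("run", 1:6),
                      condition = rep(c("WT", "KO"), each = 3))

# Demo only: create empty kallisto output files under a temporary directory
# so that file.exists() has something to find
dir <- file.path(tempdir(), "tximport_demo")
for (run in samples$run) {
  run_dir <- file.path(dir, "kallisto", run)
  dir.create(run_dir, recursive = TRUE, showWarnings = FALSE)
  file.create(file.path(run_dir, "abundance.tsv"))
}

files <- file.path(dir, "kallisto", samples$run, "abundance.tsv")
# The names become the column names of txi$counts, txi$abundance, etc.
names(files) <- paste("sample", 1:6)

# Fail early on a bad path instead of inside tximport()
stopifnot(all(file.exists(files)))
```

With real data, dir and samples come from your own project and the file-creation loop is dropped.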
I created a data.frame from a TxDb object to construct the tx2gene table.
library(TxDb.Hsapiens.UCSC.hg38.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene
k <- keys(txdb, keytype = "GENEID")
df <- AnnotationDbi::select(txdb, keys = k, keytype = "GENEID", columns = "TXNAME") # namespaced to avoid a clash with dplyr::select
tx2gene <- df[, 2:1] # tx ID, then gene ID
But head(tx2gene) produces:
      TXNAME GENEID
1 uc002qsd.4      1
2 uc002qsf.2      1
3 uc003wyw.1     10
4 uc002xmj.3    100
5 uc010xbn.1   1000
6 uc002kwg.2   1000
This obviously isn't right.
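For reference, the GENEID column of TxDb.Hsapiens.UCSC.hg38.knownGene holds Entrez Gene IDs, and in the output above TXNAME holds UCSC transcript IDs (uc...), so numeric gene IDs like 1 or 1000 are what this package returns. A quicker thing to check is whether the transcript IDs use the same system as the kallisto index. The helper below is a hypothetical sketch that guesses the source from common ID prefixes; the prefix list (ENST for Ensembl, uc for UCSC, NM_/NR_/XM_/XR_ for RefSeq) is an assumption and not exhaustive:

```r
# Hypothetical helper: guess which annotation a set of transcript IDs
# comes from, based on common prefixes (prefix list is an assumption)
guess_id_source <- function(ids) {
  ids <- as.character(ids)
  id_source <- ifelse(grepl("^ENST", ids), "Ensembl",
               ifelse(grepl("^uc[0-9]", ids), "UCSC",
               ifelse(grepl("^[NX][MR]_", ids), "RefSeq", "unknown")))
  # Tabulate so that a mixture of sources is easy to spot
  table(id_source)
}

# TXNAME values copied from the head(tx2gene) output above
guess_id_source(c("uc002qsd.4", "uc002qsf.2", "uc003wyw.1"))
```

Running the same function on the target_id column of one kallisto abundance.tsv (for example readr::read_tsv(files[1])$target_id) shows which system the index was built from; if the two tables disagree, the IDs cannot match.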
Using tximport's tximport() function:
library(tximport)
library(readr)
txi <- tximport(files, type = "kallisto", tx2gene = tx2gene, reader = read_tsv)
names(txi)
Printing txi then gives:
$abundance
     sample 1 sample 2 sample 3 sample 4 sample 5 sample 6

$counts
     sample 1 sample 2 sample 3 sample 4 sample 5 sample 6

$length
     sample 1 sample 2 sample 3 sample 4 sample 5 sample 6

$countsFromAbundance
[1] "no"
and head(txi$counts) returns only the header:
     sample 1 sample 2 sample 3 sample 4 sample 5 sample 6
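Matrices with column names but no rows are consistent with none of the kallisto transcript IDs matching the first column of tx2gene. A minimal sketch for counting that overlap directly; the function name is hypothetical, and stripping trailing version suffixes (".4", ".2") from both sides is an assumption, included because mismatched versions are a second way otherwise identical IDs can fail to match:

```r
# Hypothetical helper: how many quantified transcript IDs can be mapped
# to a gene via the first column of tx2gene?
count_tx_overlap <- function(target_ids, tx2gene, strip_version = TRUE) {
  quant_ids <- as.character(target_ids)
  map_ids <- as.character(tx2gene[[1]])
  if (strip_version) {
    # Drop a trailing ".<digits>" suffix, e.g. "uc002qsd.4" -> "uc002qsd"
    quant_ids <- sub("\\.[0-9]+$", "", quant_ids)
    map_ids <- sub("\\.[0-9]+$", "", map_ids)
  }
  c(quantified = length(quant_ids),
    matched = sum(quant_ids %in% map_ids))
}

# Toy inputs: UCSC IDs from the head(tx2gene) output, plus Ensembl-style IDs
# standing in for kallisto target_ids (hypothetical, for illustration only)
tx2gene_demo <- data.frame(TXNAME = c("uc002qsd.4", "uc002qsf.2", "uc003wyw.1"),
                           GENEID = c("1", "1", "10"))
count_tx_overlap(c("ENST00000456328.2", "ENST00000450305.2"), tx2gene_demo)
# expected: quantified = 2, matched = 0
```

On the real data the first argument would be readr::read_tsv(files[1])$target_id. A matched count of 0 would suggest the tx2gene table needs to come from the same annotation as the transcriptome used to build the kallisto index.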
I'm not sure what I'm doing wrong. I'll give it another shot after lunch; it might just be frustration at this point, but any help is appreciated.
Link to Bioc post: https://support.bioconductor.org/p/81012/#81016