Genes with identical read counts across all samples (kallisto)
3.0 years ago

Dear All,

This is my first time using kallisto to quantify gene expression from RNA-seq data (a bacterial strain). I noticed that 5 genes have exactly the same number of mapped reads in every sample (n = 36). I observed the same behavior for other tRNA genes:

```
Gene_1,49,49.4,71.2,80.6,62.4,61.8,52.6,68.2,105.2,118.6,113.2,117.6,98.8,90.8,133.2,102.6,97.2,100.2,115,139,103.2,84,82,59.6,104.8,112,63,67.6,112.8,95.6,87.6,68.2,81
Gene_2,49,49.4,71.2,80.6,62.4,61.8,52.6,68.2,105.2,118.6,113.2,117.6,98.8,90.8,133.2,102.6,97.2,100.2,115,139,103.2,84,82,59.6,104.8,112,63,67.6,112.8,95.6,87.6,68.2,81
Gene_3,49,49.4,71.2,80.6,62.4,61.8,52.6,68.2,105.2,118.6,113.2,117.6,98.8,90.8,133.2,102.6,97.2,100.2,115,139,103.2,84,82,59.6,104.8,112,63,67.6,112.8,95.6,87.6,68.2,81
Gene_4,49,49.4,71.2,80.6,62.4,61.8,52.6,68.2,105.2,118.6,113.2,117.6,98.8,90.8,133.2,102.6,97.2,100.2,115,139,103.2,84,82,59.6,104.8,112,63,67.6,112.8,95.6,87.6,68.2,81
Gene_5,49,49.4,71.2,80.6,62.4,61.8,52.6,68.2,105.2,118.6,113.2,117.6,98.8,90.8,133.2,102.6,97.2,100.2,115,139,103.2,84,82,59.6,104.8,112,63,67.6,112.8,95.6,87.6,68.2,81
```
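To see which other genes behave like this, one option is to search the whole count matrix for rows that are exactly identical in every sample. The base-R sketch below assumes the matrix is loaded as a numeric matrix with gene names as row names; the helper name `find_identical_rows` and the toy values are hypothetical.

```r
# Hypothetical helper: group genes whose estimated counts are identical in every sample.
# `counts` is a numeric matrix with genes in rows (named) and samples in columns.
find_identical_rows <- function(counts) {
  # One string key per row; "%.17g" prints doubles exactly, so two rows
  # share a key only if all of their values are identical
  keys <- apply(counts, 1, function(r) paste(sprintf("%.17g", r), collapse = ","))
  groups <- split(rownames(counts), keys)
  # Keep only keys shared by two or more genes
  unname(groups[lengths(groups) > 1])
}

# Toy matrix (hypothetical values, 3 samples)
counts <- rbind(
  Gene_1 = c(49, 49.4, 71.2),
  Gene_2 = c(49, 49.4, 71.2),
  Gene_3 = c(10, 12, 15),
  Gene_4 = c(49, 49.4, 71.2)
)
dup_groups <- find_identical_rows(counts)
print(dup_groups)  # one group: "Gene_1" "Gene_2" "Gene_4"
```

Genes with zero counts in every sample will also be grouped together, so it may be worth filtering those out first; an identical count profile is also only a hint, not proof, that the sequences are identical.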

This is how the expression matrix was exported:

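```r
library(tximport)  # reading abundance.h5 files also requires the rhdf5 package
# One kallisto output directory per sample; base_dir and samples are defined elsewhere
files <- file.path(base_dir, "kallisto", samples$sample, "abundance.h5")
names(files) <- paste0("sample", 1:36)
# txOut = TRUE keeps transcript-level estimates (no tx2gene summarisation)
txi.kallisto <- tximport(files, type = "kallisto", txOut = TRUE)
write.table(txi.kallisto$counts, file = "countData")
```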

Each gene encodes a tRNA-Glu. The genes are located at different chromosomal positions (I have the complete genome sequence), but they all have the same sequence.
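Because the five copies are identical, every read compatible with one copy is compatible with all five, so kallisto's EM step has nothing to favour one copy over another and, assuming equal effective lengths (which holds for identical sequences), should split those reads evenly. One consequence can be checked directly on the posted numbers: if the copies share a whole number of reads in a sample, each copy should get a multiple of 1/5 = 0.2. The sketch below runs that check on the Gene_1 values from the table; it illustrates the reasoning and is not a description of kallisto's internals.

```r
# Gene_1 estimated counts across samples, as posted in the question
gene_1 <- c(49, 49.4, 71.2, 80.6, 62.4, 61.8, 52.6, 68.2, 105.2, 118.6,
            113.2, 117.6, 98.8, 90.8, 133.2, 102.6, 97.2, 100.2, 115, 139,
            103.2, 84, 82, 59.6, 104.8, 112, 63, 67.6, 112.8, 95.6,
            87.6, 68.2, 81)
n_copies <- 5  # the five identical tRNA-Glu genes

# Reads shared by all five copies in each sample, if they were split evenly
shared_reads <- gene_1 * n_copies

# An even split of whole reads means every total should be a whole number;
# the tolerance absorbs floating-point error (e.g. 49.4 * 5)
is_whole <- abs(shared_reads - round(shared_reads)) < 1e-8
print(round(shared_reads))
print(all(is_whole))  # TRUE for the values above
```

Every value multiplied by 5 gives a whole number (245, 247, 356, ...), which is consistent with kallisto dividing a single pool of reads equally among the five identical copies.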

Since I am building a co-expression network, what should I do?

Thank you for your time!

Andrea

3.0 years ago
Michael

If the transcripts have fully identical sequences, there is no information Kallisto could use to distinguish them, so in a sense the result you are presenting is correct. This is probably a marginal case, though, so you could de-duplicate the transcriptome before running Kallisto, or simply collapse identical sequences before constructing the network to avoid these perfectly correlated nodes. On the other hand, this might affect only a handful of genes, and tRNAs are perhaps not the most exciting genes either (excuse me if I offended you as a tRNA researcher), so I guess it could simply be ignored. If you get a module with a lot of highly correlated tRNAs, you would know where that comes from.
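De-duplicating the transcriptome before building the kallisto index, the first option above, could be sketched in base R as follows. The helpers `read_fasta`, `dedup_fasta` and `write_fasta`, the toy sequences, and the choice of keeping the first ID as the representative are all assumptions for illustration; Bioconductor's Biostrings package offers more robust FASTA handling.

```r
# Minimal FASTA reader (hypothetical helper): named character vector, ID -> sequence
read_fasta <- function(path) {
  lines <- trimws(readLines(path))
  is_header <- startsWith(lines, ">")
  # Use the first word after ">" as the ID
  ids <- sub("^>([^[:space:]]+).*", "\\1", lines[is_header])
  # Record index of each line = number of headers seen so far
  record <- cumsum(is_header)
  keep <- !is_header & record > 0
  body <- tapply(lines[keep], factor(record[keep], levels = seq_along(ids)),
                 paste, collapse = "")
  body[is.na(body)] <- ""  # records without sequence lines
  seqs <- toupper(as.vector(body))
  names(seqs) <- ids
  seqs
}

# Keep one representative (the first ID) per distinct sequence and record
# which original IDs were merged into it
dedup_fasta <- function(seqs) {
  first <- match(seqs, seqs)  # index of the first ID with the same sequence
  list(
    unique_seqs = seqs[!duplicated(seqs)],
    mapping = data.frame(id = names(seqs), representative = names(seqs)[first],
                         stringsAsFactors = FALSE)
  )
}

write_fasta <- function(seqs, path) {
  writeLines(paste0(">", names(seqs), "\n", seqs), path)
}

# Toy transcriptome (hypothetical IDs and sequences)
fa <- tempfile(fileext = ".fa")
writeLines(c(">tRNA_Glu_1", "GTCCCCTTCG", ">tRNA_Glu_2", "GTCCCCTTCG",
             ">geneX", "ATGAAACGC"), fa)
res <- dedup_fasta(read_fasta(fa))
out_fa <- sub("\\.fa$", ".dedup.fa", fa)
write_fasta(res$unique_seqs, out_fa)
print(res$mapping)
```

The de-duplicated file would then be given to `kallisto index` instead of the original, so kallisto reports a single estimate for the merged entry, and `mapping` records which genes that entry stands for. Only exact full-length duplicates are merged; near-identical copies stay separate and remain ambiguous.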


Thank you, Michael,

> excuse me if I offended you as a tRNA researcher

No offense taken; I am not a tRNA researcher. I am going to collapse the identical sequences and run the network analysis again.

Best,
Andrea
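Collapsing the identical copies into one node before the network analysis amounts to summing their rows in the count matrix. A minimal sketch with base R's `rowsum()`, assuming you already have a vector assigning each gene to a group (for example, the representative ID of its sequence); the helper `collapse_counts`, the group labels and the toy counts are hypothetical.

```r
# Sum the rows of a count matrix that belong to the same group.
# `group` must have one entry per row, in the same order as rownames(counts).
collapse_counts <- function(counts, group) {
  stopifnot(length(group) == nrow(counts))
  # rowsum() adds up rows sharing a label; result rows are sorted by label
  rowsum(counts, group = group)
}

# Toy matrix (hypothetical values)
counts <- rbind(
  Gene_1 = c(49, 49.4, 71.2),
  Gene_2 = c(49, 49.4, 71.2),
  Gene_3 = c(10, 12, 15)
)
colnames(counts) <- paste0("sample", 1:3)

# Hypothetical grouping: Gene_1 and Gene_2 share one sequence
group <- c(Gene_1 = "tRNA-Glu", Gene_2 = "tRNA-Glu", Gene_3 = "Gene_3")
collapsed <- collapse_counts(counts, group)
print(collapsed)
```

Because the copies have identical profiles, the summed row is just one copy's row multiplied by the number of copies, so its Pearson or Spearman correlation with any other gene is the same whether you sum the copies or keep a single representative.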
