RNA-seq: tximport transcript counts to gene counts
6.0 years ago

Hi!

I have quant.sf files generated by Salmon by mapping 100 bp single-end Illumina libraries against primary transcripts. The genome assembly for this species is incomplete and is missing some genes of interest, so I am using both transcriptome and genome data for this RNA-seq analysis.

Before passing this quantification data downstream, I needed to run tximport, so I generated a table mapping transcript IDs to gene IDs from the GFF3 file of the genome annotation. Then I realized that many of the primary transcripts were missing from the genome.
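A transcript-to-gene table like this can be built by reading the Parent attribute of each transcript feature in the GFF3. The sketch below is a minimal illustration, assuming transcripts are annotated as `mRNA` or `transcript` features whose `ID` is the transcript ID and whose `Parent` is the gene ID; real annotations vary, so the feature types (and the helper name `tx2gene_from_gff3`, which is hypothetical) are assumptions to adjust.

```python
def tx2gene_from_gff3(lines, feature_types=("mRNA", "transcript")):
    """Map transcript IDs to gene IDs using the Parent attribute of GFF3 transcript features.

    Assumption: transcripts are 'mRNA' or 'transcript' features with ID=<tx> and Parent=<gene>.
    """
    tx2gene = {}
    for line in lines:
        # Skip blank lines, directives (##gff-version) and comments
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # GFF3 has 9 tab-separated columns; column 3 is the feature type
        if len(fields) != 9 or fields[2] not in feature_types:
            continue
        # Column 9 holds key=value attribute pairs separated by ';'
        attrs = dict(kv.split("=", 1) for kv in fields[8].split(";") if "=" in kv)
        if "ID" in attrs and "Parent" in attrs:
            # Parent may list several comma-separated IDs; keep the first one
            tx2gene[attrs["ID"]] = attrs["Parent"].split(",")[0]
    return tx2gene


# Toy example annotation (made-up IDs)
gff = [
    "##gff-version 3",
    "chr1\tsrc\tgene\t1\t1000\t.\t+\t.\tID=gene1",
    "chr1\tsrc\tmRNA\t1\t1000\t.\t+\t.\tID=tx1;Parent=gene1",
    "chr1\tsrc\tmRNA\t1\t900\t.\t+\t.\tID=tx2;Parent=gene1",
]
print(tx2gene_from_gff3(gff))
```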

I was planning to add each missing transcript ID to the table together with an arbitrary gene ID. But is this okay? What would be the standard way to deal with this?
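The approach described here amounts to letting every unannotated transcript stand as its own "gene". The sketch below illustrates that idea and then sums Salmon's `NumReads` per gene, which is roughly the count part of tximport's default gene-level summary (tximport also computes gene-level abundances and lengths, which this sketch does not). The placeholder prefix `UNANNOT_` and the helper names are hypothetical; the only thing taken from Salmon is the standard quant.sf header (`Name`, `Length`, `EffectiveLength`, `TPM`, `NumReads`).

```python
import csv
import io
from collections import defaultdict


def read_quant_sf(text):
    """Return {transcript: NumReads} from the text of a Salmon quant.sf file."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["Name"]: float(row["NumReads"]) for row in reader}


def gene_counts(quant, tx2gene, placeholder_prefix="UNANNOT_"):
    """Sum transcript-level NumReads per gene.

    Transcripts absent from tx2gene get a hypothetical placeholder gene ID
    (prefix + transcript ID), so each one is kept as a gene of its own.
    Returns (gene counts, completed tx2gene map).
    """
    filled = dict(tx2gene)
    for tx in quant:
        filled.setdefault(tx, placeholder_prefix + tx)
    totals = defaultdict(float)
    for tx, reads in quant.items():
        totals[filled[tx]] += reads
    return dict(totals), filled


# Toy quant.sf content (made-up numbers); 'txN' is not in the annotation
quant_text = (
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "tx1\t1000\t850.0\t10.0\t100.0\n"
    "tx2\t900\t750.0\t5.0\t40.0\n"
    "txN\t500\t350.0\t2.0\t12.5\n"
)
counts, filled = gene_counts(read_quant_sf(quant_text), {"tx1": "gene1", "tx2": "gene1"})
print(counts)
```

The completed map (`filled`) could be written out as a two-column transcript/gene table for tximport's `tx2gene` argument. Whether placeholder genes are biologically sensible depends on the question raised in the replies below: transcripts that are really fragments or isoforms of annotated genes would be split off and counted separately.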

Thanks!

RNA-Seq Salmon Tximport • 2.1k views

So I am using both transcriptome and genome data for this RNA-seq.

How are you avoiding double counting entities shared between those two?

6.0 years ago
h.mon 35k

I don't see a simple solution to your problem. Why is the genome missing these genes of interest? Are the genes absent from the reference, or are they present (you can find them with, e.g., BLAST) but unannotated? Are these genes present in your transcriptome assembly?

The simplest solution is to use only the transcriptome. You may use Corset to build a transcript-to-gene map. You can also map the transcripts to the genome and use bedtools intersect, subtract and overlap to annotate the transcripts and to find which annotated genes are (or are not) found in your transcriptome, and vice versa.
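The overlap step can be pictured with a toy example. The sketch below is a pure-Python illustration of the interval logic, not a replacement for bedtools: it assumes transcripts have already been mapped to the genome and that both sets of intervals use BED-style 0-based, half-open coordinates. It reports transcripts that overlap an annotated gene (similar in spirit to `bedtools intersect -u`), transcripts with no overlap (similar to `-v`), and annotated genes with no overlapping transcript. Like bedtools' default, it ignores strand. The helper names are hypothetical, and the all-pairs loop is only for illustration; bedtools handles genome-scale inputs efficiently.

```python
def overlaps(a, b):
    """True if half-open intervals a and b, each (chrom, start, end), overlap."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def classify(transcripts, genes):
    """Compare mapped transcript intervals with annotated gene intervals.

    transcripts, genes: dicts of name -> (chrom, start, end), 0-based half-open.
    Returns (transcript -> overlapping genes, transcripts with no gene,
    genes with no transcript). Strand is ignored.
    """
    hits, novel = {}, []
    matched_genes = set()
    for tx, iv in transcripts.items():
        matched = [g for g, giv in genes.items() if overlaps(iv, giv)]
        if matched:
            hits[tx] = matched
            matched_genes.update(matched)
        else:
            novel.append(tx)
    unmatched_genes = [g for g in genes if g not in matched_genes]
    return hits, novel, unmatched_genes


# Toy intervals (made-up coordinates)
transcripts = {"tx1": ("chr1", 100, 500), "tx2": ("chr2", 0, 50)}
genes = {"geneA": ("chr1", 400, 900), "geneB": ("chr1", 2000, 3000)}
print(classify(transcripts, genes))
```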
