How to analyze unannotated lncRNA using RNA-seq data?
7.5 years ago
xiaoyonf ▴ 60

Hi all,

I have a ~2 kb sequence of an unannotated lncRNA taken from the published literature. Since it is unannotated, I cannot search for it by name in a genome browser or data portal (e.g. UCSC, TCGA) to check its expression in RNA-seq datasets.

How can I analyze such an unannotated lncRNA using RNA-seq data, e.g. its expression across different subtypes of BC in the TCGA dataset?

Thanks, Xiaoyong

RNA-Seq • 2.3k views
7.5 years ago
tiago211287 ★ 1.4k

If this feature is not annotated, the programs for counting and quantification will not 'see' it. I would first check its expression visually by looking at the coordinates of this unannotated lncRNA in IGV or any other genome viewer. If no reads map to this position, there is nothing to quantify, because it is not expressed in your dataset.

If it is expressed, you can add a 'fake' row to the annotation file (GTF file) using the coordinates you have for this lncRNA, so that HTSeq or Kallisto can see it.
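As a rough sketch of what such a 'fake' row could look like: the snippet below writes a transcript line and an exon line in the 9-column, tab-separated GTF layout and, if an `annotation.gtf` is present, writes a combined copy. The coordinates, the `manual` source label, the IDs and all file names are placeholders invented for illustration, and the lncRNA is assumed to be a single exon.

```shell
#!/bin/sh
# Placeholder values (assumptions): replace with the real location of the lncRNA.
# The chromosome name must match the naming used in your reference (chr1 vs 1).
chrom="chr1"
start=100001
end=102000
strand="+"
gene_id="NOVEL_LNCRNA"      # hypothetical ID
tx_id="NOVEL_LNCRNA_T1"     # hypothetical ID
extra_gtf="novel_lncRNA.gtf"

# Start with an empty file, then write one 'transcript' row and one 'exon' row.
# GTF columns: seqname, source, feature, start, end, score, strand, frame, attributes.
: > "$extra_gtf"
for feature in transcript exon; do
    printf '%s\tmanual\t%s\t%d\t%d\t.\t%s\t.\tgene_id "%s"; transcript_id "%s";\n' \
        "$chrom" "$feature" "$start" "$end" "$strand" "$gene_id" "$tx_id" >> "$extra_gtf"
done

# Combine with the existing annotation only if it is present in this directory.
if [ -f annotation.gtf ]; then
    cat annotation.gtf "$extra_gtf" > annotation.with_lncRNA.gtf
fi
```

The exon row matters because htseq-count by default counts `exon` features and groups them by `gene_id`. If you use the combined file, pass `annotation.with_lncRNA.gtf` to the downstream tools instead of the original GTF.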

Afterwards, you can use a differential expression package (DESeq2, edgeR) to test whether it is over- or under-expressed.

For Kallisto, you can convert your modified GTF to a transcriptome FASTA file using gffread from the cufflinks package like this:

gffread -w transcriptome.fa -g Reference.genome.fa annotation.gtf

Afterwards, you can build an index from this transcriptome.fa with kallisto index and perform the quantification with kallisto quant.
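A minimal sketch of those two steps for one paired-end sample follows. The index, FASTQ and output names are placeholders, and the commands only run if kallisto and the input files are actually present; otherwise the script just reports that it skipped them.

```shell
#!/bin/sh
# Placeholder file names (assumptions): adjust to your own data.
fasta="transcriptome.fa"        # output of the gffread step
index="transcriptome.idx"
r1="sample_R1.fastq.gz"
r2="sample_R2.fastq.gz"
out="kallisto_out/sample"

if command -v kallisto >/dev/null 2>&1 && [ -f "$fasta" ] && [ -f "$r1" ] && [ -f "$r2" ]; then
    # Build the index once per transcriptome.
    kallisto index -i "$index" "$fasta"
    # Quantify one paired-end sample; results go to $out/abundance.tsv.
    kallisto quant -i "$index" -o "$out" "$r1" "$r2"
    status="quantified"
else
    status="skipped"
    echo "kallisto or input files not found; nothing was run" >&2
fi
```

For single-end reads, kallisto quant additionally needs `--single` together with the fragment length (`-l`) and its standard deviation (`-s`).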

PS: Kallisto gives you both normalized values (TPM) and estimated raw counts. If you are going to use DESeq2, keep in mind that you must give it only the raw counts as input, never normalized data.
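To make the distinction concrete, here is a small helper (the function name is made up for this sketch) that pulls one transcript out of a kallisto `abundance.tsv`, whose columns are `target_id`, `length`, `eff_length`, `est_counts` and `tpm`. It assumes the FASTA record for the lncRNA is named after the `transcript_id` in the GTF, which is worth checking in your own `transcriptome.fa`.

```shell
# Print target_id, est_counts and tpm for one transcript, plus the header line.
# est_counts (column 4) is the un-normalized estimate; tpm (column 5) is normalized.
lncrna_abundance() {
    abundance_file=$1   # e.g. kallisto_out/sample/abundance.tsv (placeholder path)
    target_id=$2        # transcript ID of the lncRNA (placeholder)
    awk -F'\t' -v id="$target_id" '
        NR == 1 || $1 == id { print $1 "\t" $4 "\t" $5 }
    ' "$abundance_file"
}
```

It would be called as `lncrna_abundance kallisto_out/sample/abundance.tsv NOVEL_LNCRNA_T1`. Note that `est_counts` can be fractional, so these values are usually imported with a helper such as tximport, or rounded, before being passed to DESeq2.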
