Question

How to separate htseq-count table to coding RNAs and non-coding RNAs

8.6 years ago

Naresh D J ▴ 110

Hi,

I have generated raw read counts for genes from RNA-seq data using htseq-count. Now I want to separate this table into coding RNAs and non-coding RNAs.

I am new to NGS data analysis.

Can anyone help me or suggest how to do it?

Thank you,
Naresh

RNA-Seq htseq-count • 4.6k views

updated 19 months ago by Ram 43k • written 8.6 years ago by Naresh D J ▴ 110

What do you mean by coding and non-coding RNAs? Do you mean separating counts for coding and non-coding transcripts? Or do you mean separating counts for coding (exonic) and non-coding (intronic, UTR) regions of a given transcript?

updated 19 months ago by Ram 43k • written 8.6 years ago by Ashutosh Pandey 12k

@Ashutosh Pandey, yes I want to separate the counts for coding and non-coding transcripts.

For separating coding and non-coding regions there is already a tool, RSeQC.

updated 19 months ago by Ram 43k • written 8.6 years ago by Naresh D J ▴ 110

RSeQC will give you the numbers or fractions of reads aligned to different genic features, but it won't separate them. What you need is an annotation of the transcripts (genes) by biotype. If these are Ensembl genes or gene IDs, you can use BioMart (http://www.ensembl.org/biomart) to download the "Biotype" for each gene and then annotate the Ensembl genes in the count file as protein-coding, rRNA, tRNA, snoRNA, miRNA, etc.
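As a rough illustration of the annotation step described above, here is a minimal Python sketch (standard library only). It assumes a two-column, tab-separated biotype table with a header row, as one might export from BioMart, and the usual two-column htseq-count output. The gene IDs, counts, column headers, and the choice to treat only `protein_coding` as coding are all illustrative assumptions, not something specified in this thread.

```python
import csv
import io

# Assumption: only this biotype counts as "coding"; adjust to your own definition.
CODING_BIOTYPES = {"protein_coding"}


def load_biotypes(handle):
    """Read a TSV with a header row and two columns: gene ID, biotype."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # skip the header row
    return {row[0]: row[1] for row in reader if len(row) >= 2}


def split_counts(handle, biotypes):
    """Split htseq-count rows into coding, non-coding and unannotated genes."""
    coding, noncoding, unknown = {}, {}, {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and htseq-count summary rows such as __no_feature.
        if len(row) < 2 or row[0].startswith("__"):
            continue
        gene_id, count = row[0], int(row[1])
        biotype = biotypes.get(gene_id)
        if biotype is None:
            unknown[gene_id] = count
        elif biotype in CODING_BIOTYPES:
            coding[gene_id] = count
        else:
            noncoding[gene_id] = count
    return coding, noncoding, unknown


# Made-up, in-memory stand-ins for the real files.
biotype_tsv = io.StringIO(
    "Gene stable ID\tGene type\n"
    "geneA\tprotein_coding\n"
    "geneB\tlncRNA\n"
)
counts_tsv = io.StringIO(
    "geneA\t120\n"
    "geneB\t45\n"
    "geneC\t3\n"
    "__no_feature\t1000\n"
)

coding, noncoding, unknown = split_counts(counts_tsv, load_biotypes(biotype_tsv))
print("coding:", coding)
print("non-coding:", noncoding)
print("unannotated:", unknown)
```

Genes missing from the biotype table are kept in a separate group rather than silently counted as non-coding, so ID mismatches between the count file and the annotation are easy to spot.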

updated 19 months ago by Ram 43k • written 8.6 years ago by Ashutosh Pandey 12k

Thank you. I will try your suggestion and let you know.

written 8.6 years ago by Naresh D J ▴ 110

Ram · Answer 1 · 2015-09-28

What is your model organism?

If you are using a genome from Ensembl and ran htseq-count with its GTF annotation file, you can import all the count tables (the txt files) into a data.frame in R.

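With Bioconductor, install the biomaRt package:

biocLite("biomaRt")

(In current Bioconductor releases, biocLite has been replaced by BiocManager::install("biomaRt").)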

With this package you can retrieve a dataset from Ensembl based on several filters, for example the biotype (whether a gene is coding or non-coding).

Then you can simply merge the two tables on the Ensembl IDs and separate them according to your criteria. If you do not want to use R, Ensembl has a graphical web interface at http://www.ensembl.org/biomart, although I recommend R because it will be easier later to create better graphics and statistics.
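The import step in this answer (gathering the per-sample count files into one table keyed by gene ID before merging with the biotypes) can also be sketched outside R. The Python below uses only the standard library and is a hedged illustration, not the answer's own method. The sample names, gene IDs, and counts are made up, and the function names are hypothetical.

```python
import csv
import io


def read_htseq_counts(handle):
    """Return {gene_id: count} from one htseq-count output, skipping '__' summary rows."""
    counts = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2 or row[0].startswith("__"):
            continue
        counts[row[0]] = int(row[1])
    return counts


def build_count_table(samples):
    """Combine {sample_name: {gene_id: count}} into a header plus one row per gene."""
    names = list(samples)
    # Union of gene IDs across all samples, sorted for a stable row order.
    gene_ids = sorted(set().union(*(samples[name] for name in names)))
    header = ["gene_id"] + names
    # A gene absent from a sample's file is filled with 0 (an assumption).
    rows = [[gene] + [samples[name].get(gene, 0) for name in names] for gene in gene_ids]
    return header, rows


# Made-up stand-ins for two htseq-count output files.
sample_files = {
    "sample1": io.StringIO("geneA\t10\ngeneB\t0\n__ambiguous\t7\n"),
    "sample2": io.StringIO("geneA\t12\ngeneB\t5\n__ambiguous\t9\n"),
}
samples = {name: read_htseq_counts(handle) for name, handle in sample_files.items()}
header, rows = build_count_table(samples)

print("\t".join(header))
for row in rows:
    print("\t".join(str(value) for value in row))
```

With real data, each `io.StringIO` would be replaced by an opened count file, and the resulting rows can then be split by biotype once the annotation has been merged in.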

Links:

biomaRt manual
bioconductor website
R website

P.S.: biomaRt can also handle the UniProt and HapMap databases.

score 1 · Answer 2 · 2015-09-28

8.6 years ago

Antonio R. Franco ★ 5.1k

To do so, you need a file listing the coordinates (base ranges) of the sequences that are coding and non-coding. Mapping reads to the reference genome or transcriptome does not by itself provide this information.
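One file that usually carries this information is the GTF annotation itself. The sketch below (Python, standard library only) pulls a gene-to-biotype map out of GTF `gene` lines. It assumes an Ensembl-style GTF in which gene lines carry a `gene_biotype` attribute; GENCODE files use `gene_type` instead, hence the parameter. The two GTF lines are invented for illustration.

```python
import io
import re

# Matches attribute pairs such as: gene_id "geneA"; gene_biotype "protein_coding";
ATTRIBUTE_RE = re.compile(r'(\S+) "([^"]*)"')


def gene_biotypes_from_gtf(handle, biotype_key="gene_biotype"):
    """Return {gene_id: biotype} from the 'gene' feature lines of a GTF file."""
    biotypes = {}
    for line in handle:
        if line.startswith("#"):
            continue  # header / comment lines
        fields = line.rstrip("\n").split("\t")
        # GTF has 9 tab-separated columns; column 3 is the feature type.
        if len(fields) < 9 or fields[2] != "gene":
            continue
        attributes = dict(ATTRIBUTE_RE.findall(fields[8]))
        if "gene_id" in attributes and biotype_key in attributes:
            biotypes[attributes["gene_id"]] = attributes[biotype_key]
    return biotypes


# Invented two-gene annotation standing in for a real GTF file.
gtf_text = io.StringIO(
    "#!genome-build example\n"
    '1\tensembl\tgene\t100\t500\t.\t+\t.\tgene_id "geneA"; gene_version "1"; gene_biotype "protein_coding";\n'
    '1\tensembl\texon\t100\t200\t.\t+\t.\tgene_id "geneA"; gene_biotype "protein_coding";\n'
    '2\tensembl\tgene\t900\t1500\t.\t-\t.\tgene_id "geneB"; gene_biotype "lncRNA";\n'
)

biotypes = gene_biotypes_from_gtf(gtf_text)
print(biotypes)
```

Using the same GTF that was passed to htseq-count keeps the gene IDs in the biotype map consistent with those in the count table. The gene coordinates are in columns 1, 4 and 5 of the same lines if the base ranges are needed as well.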

written 8.6 years ago by Antonio R. Franco ★ 5.1k

@Antonio R. Franco, could you kindly elaborate on your thoughts?

written 8.6 years ago by Naresh D J ▴ 110

What other information would you want?

written 5.4 years ago by scchess ▴ 640
