Question: How do you convert featureCounts output into composition of mapped reads by RNA class?

Hey guys, I'm new to RNA-seq. I used featureCounts to generate a table of gene IDs and counts for my untreated and treated samples (small RNA-seq). I want to convert this data into a summary by RNA class (i.e. what % of the reads are miRNA, snoRNA, rRNA, etc.). Can someone share how to do this or point me in the right direction? I've read a few things online, but they make no sense to me.

To complicate things, I also have a separate table of tRNA genes and their counts for untreated vs. treated. My end objective is to be able to say x% of reads were tRNAs, x% were miRNAs, x% were snoRNAs, etc.

18 months ago, 2405592M • 30

Hi EagleEye, I've already generated a table in the terminal that looks like the following:

Geneid           Ctrl  Treated
ENSG00000223972  0     0
ENSG00000227232  0     0
ENSG00000278267  0     0

It is saved as a .txt file. Would I still have to carry out step 2), or could I go straight to step 3)?

18 months ago, 2405592M • 30

Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized.

This comment belongs under @EagleEye's answer.

18 months ago, genomax 68k

You mean you got a matrix like this?

Geneid           Ctrl  Treated
ENSG00000223972  0     0
ENSG00000227232  0     0
ENSG00000278267  0     0

18 months ago, EagleEye 6.4k

Yes, exactly, that's the matrix I've got!

18 months ago, 2405592M • 30

Treat this matrix as 'featurecounts.matrix' in the example below, then follow the other steps I mentioned.

18 months ago, EagleEye 6.4k

If you used a GTF file as the reference annotation:

1) You can just convert the annotation into table format.

Example: C: How do I get the gene annotation for the latest version (GRCh38)?
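For step 1, here is a minimal base-R sketch of the GTF-to-table conversion. It assumes an Ensembl-style GTF whose "gene" rows carry gene_id, gene_name and gene_biotype attributes (GENCODE files use gene_type instead, which you can pass as class_key). The function name gtf_to_table and the file names are hypothetical, and this is not necessarily how the linked post does it.

```r
# Hypothetical helper: turn the "gene" rows of a GTF into an annotation table.
# Assumes attributes look like: gene_id "X"; gene_name "Y"; gene_biotype "Z";
gtf_to_table <- function(gtf_file, class_key = "gene_biotype") {
  # quote = "" is needed because the attribute column contains double quotes.
  gtf <- read.table(gtf_file, sep = "\t", quote = "", comment.char = "#",
                    header = FALSE, stringsAsFactors = FALSE)
  genes <- gtf[gtf$V3 == "gene", ]

  # Pull the quoted value that follows `key` in the attribute column (V9);
  # returns NA where the key is absent.
  get_attr <- function(key) {
    pattern <- paste0('.*\\b', key, ' "([^"]*)".*')
    ifelse(grepl(pattern, genes$V9, perl = TRUE),
           sub(pattern, "\\1", genes$V9, perl = TRUE),
           NA)
  }

  data.frame(
    Geneid     = get_attr("gene_id"),
    GeneSymbol = get_attr("gene_name"),
    Chromosome = genes$V1,
    Start      = genes$V4,
    End        = genes$V5,
    Class      = get_attr(class_key),
    Strand     = genes$V7,
    Length     = genes$V5 - genes$V4 + 1,  # gene span, not summed exon length
    stringsAsFactors = FALSE
  )
}
```

The result could then be saved for step 2 with something like write.table(gtf_to_table("annotation.gtf"), "annotation.txt", sep = "\t", quote = FALSE, row.names = FALSE), where "annotation.gtf" stands in for your own file.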

2) Import your GTF-converted table (Geneid GeneSymbol Chromosome Start End Class Strand Length) and your matrix from featureCounts (Geneid sample1expr sample2expr sample3expr) into R and use 'merge' on the 'Geneid' column.

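# Read the featureCounts matrix and the GTF-derived annotation table (both tab-separated)
x <- read.table("featurecounts.matrix", header=TRUE, sep="\t")

annotation <- read.table("annotation.txt", header=TRUE, sep="\t")

# Join the two tables on their shared 'Geneid' column
featurecounts_annotated <- merge(annotation, x, by="Geneid")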
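One thing worth checking after the merge: by default merge() silently drops any Geneid that appears in only one of the two tables, for example when one file uses versioned Ensembl IDs (like ENSG00000223972.5) and the other does not. A small sketch on toy data (all IDs and counts are made up for illustration) that harmonizes the IDs and keeps unmatched count rows under an "unannotated" label instead:

```r
# Toy tables; the IDs, classes and counts are made up for illustration.
x <- data.frame(Geneid  = c("ENSG01", "ENSG02", "ENSG03.2", "ENSG04"),
                Ctrl    = c(10, 0, 5, 3),
                Treated = c(20, 1, 0, 7),
                stringsAsFactors = FALSE)
annotation <- data.frame(Geneid = c("ENSG01", "ENSG02", "ENSG03"),
                         Class  = c("miRNA", "snoRNA", "rRNA"),
                         stringsAsFactors = FALSE)

# Strip a trailing version suffix (".2") so both tables use the same key.
x$Geneid <- sub("\\.[0-9]+$", "", x$Geneid)

# all.y = TRUE keeps every row of the counts table (the second argument),
# even when the annotation has no entry for that Geneid.
featurecounts_annotated <- merge(annotation, x, by = "Geneid", all.y = TRUE)

# Give rows without annotation an explicit class so their reads stay visible.
featurecounts_annotated$Class[is.na(featurecounts_annotated$Class)] <- "unannotated"

# How many count rows found no annotation?
n_unannotated <- sum(featurecounts_annotated$Class == "unannotated")
```

If n_unannotated is large on real data, the two files probably come from different annotation releases or use different ID formats.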

3) Then sum the counts in each sample column by the RNA class you are interested in.

Two-step:

### Two-step 1) sum the reads per RNA class (column 'Class')

sample1_countSum <- aggregate(sample1expr ~ Class, data = featurecounts_annotated, FUN = sum)

### Two-step 2) convert the per-class sums into percentages of the total

sample1_countSum[,"percentage"] <- (sample1_countSum$sample1expr / sum(sample1_countSum$sample1expr)) * 100

Single-step:

### Compute each gene's percentage of the sample total, then sum those percentages per class

sample1_result <- aggregate(sample1pct ~ Class, data = transform(featurecounts_annotated, sample1pct = 100 * sample1expr / sum(sample1expr)), FUN = sum)

The final output lists each RNA class with its corresponding percentage of mapped reads from sample1.
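To cover every sample column at once, and to fold in a separate tRNA count table as the question describes, something along these lines may work. It is a sketch on made-up toy data: the column names Ctrl and Treated come from the matrix posted in this thread, while trna_counts and all the values are hypothetical. It also assumes the tRNA reads are not already counted under another gene in the main table; otherwise they would be counted twice.

```r
# Toy stand-in for the merged table from step 2 (values are made up).
featurecounts_annotated <- data.frame(
  Geneid  = c("g1", "g2", "g3", "g4"),
  Class   = c("miRNA", "miRNA", "snoRNA", "rRNA"),
  Ctrl    = c(30, 10, 20, 20),
  Treated = c(10, 10, 50, 10),
  stringsAsFactors = FALSE
)

# Hypothetical separate tRNA table with the same sample columns.
trna_counts <- data.frame(
  Geneid  = c("tRNA-Ala", "tRNA-Gly"),
  Ctrl    = c(15, 5),
  Treated = c(15, 5),
  stringsAsFactors = FALSE
)
trna_counts$Class <- "tRNA"

# Stack both tables, keeping only the class label and the sample columns.
samples  <- c("Ctrl", "Treated")
combined <- rbind(featurecounts_annotated[, c("Class", samples)],
                  trna_counts[, c("Class", samples)])

# Sum counts per class for all samples, then scale each column to 100%.
class_sums <- rowsum(as.matrix(combined[, samples]), group = combined$Class)
class_pct  <- round(100 * prop.table(class_sums, margin = 2), 1)
print(class_pct)
```

Note that these percentages are relative to the reads featureCounts assigned to a gene (plus the tRNA counts), not to all mapped reads; unassigned reads do not appear in the count table.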

18 months ago, EagleEye 6.4k
