How do you convert featureCounts output into composition of mapped reads by RNA class?
5.7 years ago
2405592M ▴ 140

Hey guys, I'm new to RNA-seq. I used featureCounts to generate a table of gene IDs and counts for my untreated and treated samples (small RNA-seq). I want to convert this data into a summary by RNA class (i.e. what % of the reads are miRNA, snoRNA, rRNA, etc.). Can someone share how to do this or point me in the right direction? I've read a few things online, but they make no sense to me.

To complicate things, I also have a separate table of tRNA genes and their counts for untreated vs treated. My end goal is to be able to say x% of reads were tRNAs, x% were miRNAs, x% were snoRNAs, etc.

RNA-Seq featureCounts DESeq2 • 3.2k views

Hi EagleEye, I've already generated a table in the terminal that looks like the following:

Geneid           Ctrl  Treated
ENSG00000223972  0     0
ENSG00000227232  0     0
ENSG00000278267  0     0

saved as a .txt file. Would I still have to carry out 2), or could I go straight to 3)?


Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized.

This comment belongs under @EagleEye's answer.


You mean you got a matrix like this?

Geneid  Ctrl    Treated         
ENSG00000223972 0       0
ENSG00000227232 0       0 
ENSG00000278267 0       0

Yes, exactly, that's the matrix I've got!


Treat this matrix as 'featurecounts.matrix' in the example below, then follow the other steps I mentioned.

5.7 years ago
EagleEye 7.5k

If you used a GTF file as the reference annotation:

1) You can just convert the annotation into table format.

Example: C: How do I get the gene annotation for the latest version (GRCh38)?
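In case it helps, here is a minimal base-R sketch of step 1. It assumes an Ensembl- or GENCODE-style GTF that contains "gene" rows and stores the RNA class under gene_biotype (Ensembl) or gene_type (GENCODE). The helper names get_attr and gtf_to_annotation are made up for this illustration, the GTF lines are toy input, and Length here is only the genomic span of the gene, not an exonic length.

```r
# Pull the value of one key out of a GTF attribute string (hypothetical helper).
get_attr <- function(attributes, key) {
  pattern <- paste0('.*\\b', key, ' "([^"]*)".*')
  ifelse(grepl(pattern, attributes, perl = TRUE),
         sub(pattern, "\\1", attributes, perl = TRUE),
         NA_character_)
}

# Turn GTF lines into a Geneid/Class table (hypothetical helper).
gtf_to_annotation <- function(gtf_lines) {
  body <- gtf_lines[!grepl("^#", gtf_lines)]          # drop header comments
  gtf <- read.table(
    text = body, sep = "\t", quote = "", comment.char = "",
    header = FALSE, stringsAsFactors = FALSE,
    col.names = c("seqname", "source", "feature", "start", "end",
                  "score", "strand", "frame", "attributes"),
    colClasses = c("character", "character", "character", "integer", "integer",
                   "character", "character", "character", "character")
  )
  genes <- gtf[gtf$feature == "gene", ]               # one row per gene

  # Assumption: Ensembl uses gene_biotype, GENCODE uses gene_type.
  biotype <- get_attr(genes$attributes, "gene_biotype")
  no_biotype <- is.na(biotype)
  biotype[no_biotype] <- get_attr(genes$attributes[no_biotype], "gene_type")

  data.frame(
    Geneid     = get_attr(genes$attributes, "gene_id"),
    GeneSymbol = get_attr(genes$attributes, "gene_name"),
    Chromosome = genes$seqname,
    Start      = genes$start,
    End        = genes$end,
    Class      = biotype,
    Strand     = genes$strand,
    Length     = genes$end - genes$start + 1L,        # genomic span only
    stringsAsFactors = FALSE
  )
}

# Toy GTF lines for illustration; in practice use readLines("your.gtf").
gtf_lines <- c(
  '#!genome-build GRCh38',
  '1\thavana\tgene\t11869\t14409\t.\t+\t.\tgene_id "ENSG00000223972"; gene_name "DDX11L1"; gene_biotype "transcribed_unprocessed_pseudogene";',
  '1\thavana\ttranscript\t11869\t14409\t.\t+\t.\tgene_id "ENSG00000223972"; transcript_id "ENST00000456328"; gene_biotype "transcribed_unprocessed_pseudogene";',
  '1\tmirbase\tgene\t17369\t17436\t.\t-\t.\tgene_id "ENSG00000278267"; gene_name "MIR6859-1"; gene_biotype "miRNA";'
)
annotation <- gtf_to_annotation(gtf_lines)

# To save it for step 2:
# write.table(annotation, "annotation.txt", sep = "\t", quote = FALSE, row.names = FALSE)
```

If your GTF uses a different attribute name for the RNA class, change the key passed to get_attr.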

2) Import your GTF-converted table (Geneid GeneSymbol Chromosome Start End Class Strand Length) and your matrix from featureCounts (Geneid sample1expr sample2expr sample3expr) into R and use 'merge' on the 'Geneid' column.

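# Count matrix from featureCounts (tab-separated, with a header row)
x <- read.table("featurecounts.matrix", header=TRUE, sep="\t")

# Annotation table converted from the GTF
annotation <- read.table("annotation.txt", header=TRUE, sep="\t")

# Join the two tables on the shared 'Geneid' column
featurecounts_annotated <- merge(annotation, x, by="Geneid")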
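One thing to watch: by default 'merge' keeps only Geneids present in both tables, so genes that fail to match are dropped silently and the percentages shift. A common cause is a version suffix on the IDs (for example ENSG00000223972.5 in one table and ENSG00000223972 in the other). The sketch below uses toy stand-ins for the two tables, with made-up counts, and shows one way to guard against this. The "unannotated" label is just a placeholder name chosen for this example.

```r
# Toy stand-ins for the two tables (values are illustrative only).
x <- data.frame(
  Geneid  = c("ENSG00000223972.5", "ENSG00000227232.5", "ENSG00000278267.1"),
  Ctrl    = c(0, 12, 30),
  Treated = c(1, 8, 45),
  stringsAsFactors = FALSE
)
annotation <- data.frame(
  Geneid = c("ENSG00000223972", "ENSG00000278267"),
  Class  = c("transcribed_unprocessed_pseudogene", "miRNA"),
  stringsAsFactors = FALSE
)

# Drop a trailing version suffix such as ".5" so both tables share the same key.
x$Geneid <- sub("\\.[0-9]+$", "", x$Geneid)
annotation$Geneid <- sub("\\.[0-9]+$", "", annotation$Geneid)

# all.x = TRUE keeps genes that have counts but no annotation row.
featurecounts_annotated <- merge(x, annotation, by = "Geneid", all.x = TRUE)

# Report and label the genes that found no class.
n_unmatched <- sum(is.na(featurecounts_annotated$Class))
message(n_unmatched, " gene(s) had no annotation row")
featurecounts_annotated$Class[is.na(featurecounts_annotated$Class)] <- "unannotated"
```

If many genes end up unmatched, the annotation table was probably built from a different GTF than the one given to featureCounts.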

3) Then sum the counts in each sample column by the RNA class you are interested in.

Two-step:

### Two-step 1) sum the reads per RNA class

sample1_countSum <- aggregate(sample1expr ~ Class, data = featurecounts_annotated, FUN = sum)

### Two-step 2) convert the class sums to percentages

sample1_countSum$percentage <- (sample1_countSum$sample1expr / sum(sample1_countSum$sample1expr)) * 100

Single-step:

sample1_result <- aggregate((cbind(featurecounts_annotated$sample1expr)/sum(featurecounts_annotated$sample1expr))*100 ~ Class, data = featurecounts_annotated, sum)

The final output lists each RNA class with its percentage of the counted reads for sample1.
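For the matrix in the question (columns Ctrl and Treated) plus the separate tRNA table, the same idea can be applied to all samples at once. This is only a sketch under a few assumptions: the counts are toy values, trna_counts is a hypothetical name for the tRNA table, and the tRNA table is assumed to have the same sample columns and no genes that are already counted in the main matrix (otherwise those reads would be counted twice).

```r
# Toy annotated table standing in for featurecounts_annotated (illustrative values).
featurecounts_annotated <- data.frame(
  Geneid  = c("g1", "g2", "g3", "g4"),
  Class   = c("miRNA", "miRNA", "snoRNA", "rRNA"),
  Ctrl    = c(10, 30, 40, 20),
  Treated = c(5, 5, 60, 30),
  stringsAsFactors = FALSE
)

# Hypothetical separate tRNA table with the same sample columns.
trna_counts <- data.frame(
  Geneid  = c("tRNA-Ala-1", "tRNA-Gly-2"),
  Ctrl    = c(60, 40),
  Treated = c(50, 50),
  stringsAsFactors = FALSE
)

sample_cols <- c("Ctrl", "Treated")

# Give the tRNA rows a class and stack them under the annotated genes.
trna_counts$Class <- "tRNA"
combined <- rbind(
  featurecounts_annotated[, c("Geneid", "Class", sample_cols)],
  trna_counts[, c("Geneid", "Class", sample_cols)]
)

# Sum counts per class for every sample (rows = classes, columns = samples).
class_counts <- rowsum(combined[, sample_cols], group = combined$Class)

# Convert each column to percentages of that sample's total.
class_percent <- 100 * prop.table(as.matrix(class_counts), margin = 2)
print(round(class_percent, 1))
```

Note that these percentages are relative to the reads that were assigned to a feature. Reads that featureCounts could not assign are tallied in its .summary file, so add them to the denominator if you want percentages of all mapped reads.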
