Question

Using featureCounts Output for DE analysis in edgeR


5.2 years ago

markm014 ▴ 40

I get the following error in RStudio when I try to import a featureCounts count matrix into edgeR. Do I have to modify the featureCounts output manually before using it in edgeR? All the documentation I have found suggests that I can load the featureCounts output directly. If the fix is as simple as switching negative values to zero (I think featureCounts outputs '-1' when there are no overlaps), I can handle that, but it seems like a good way to distort the statistics.

The error is:

Error: Negative counts not allowed

The command I am running to generate this error is:

group = c(0,1,2,3,4,5,3,4,5,3,4,5)
dge = DGEList(counts = 'file_path', group = group)

where the file path listed is the output of featureCounts run on 12 bam files. I ran featureCounts with no issue and about 60% of my reads overlapped features using the following command:

featureCounts -a 'Mus_musculus.GRCm38.95.gtf' -o features_count_all/total_file.count Sample1-1 Sample2-1 Sample3-1 Sample4-1 Sample4-2 Sample4-3 Sample5-1 Sample5-2 Sample5-3 Sample6-1 Sample6-2 Sample6-3

My matrix file looks like this. Do I need to reduce it to a single raw count per cell, rather than the semicolon-separated fields?

ENSMUSG00000088159  1   15019040    15019159    -   120 0   0   0   0   0   0   0   0   0   0   0   0
ENSMUSG00000073737  1   15268802    15269797    -   996 2   1   2   4   1   3   2   2   2   6   2   1
ENSMUSG00000092083  1;1;1;1;1   15287254;15312363;15312452;15709485;15709485    15287484;15313030;15313030;15712548;15723750    +;+;+;+;+   15165   3   6   3   0   0   0   2   0   0   1   0   0
ENSMUSG00000102937  1   15364302    15365834    +   1533    0   0   0   0   0   0   0   0   0   0   0   0
ENSMUSG00000104149  1   15556249    15558337    +   2089    0   0   0   0   0   0   0   0   0   0   0   0
ENSMUSG00000088829  1   15685935    15686046    -   112 0   0   0   0   1   0   0   0   0   0   0   0
ENSMUSG00000077377  1   15757832    15757963    +   132 0   0   0   0   0   0   0   0   0   0   0   0
ENSMUSG00000101652  1;1 15760122;15760560   15760263;15760668   -;- 251 0   1   1   2   1   1   2   3   1   3   4

RNA-Seq featureCounts edgeR • 5.8k views


score 6 · Accepted Answer · 2019-02-25


5.2 years ago

Friederike 8.9k

featureCounts adds several columns of gene annotation to the beginning of the matrix, i.e., the file does not contain only the counts. You need to remove the columns containing the gene position, strand and length (e.g. 1 15019040 15019159 - 120), and the gene IDs should be assigned to the row names. If you read in the featureCounts results first, this will become clear:

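# read in the results (not tested, you may need to play around with read.table parameters);
# the leading '#' comment line that featureCounts writes is skipped by default
fc_res <- read.table('file_path', header = TRUE)

# assign row.names (the featureCounts header names the ID column 'Geneid')
row.names(fc_res) <- fc_res$Geneid

# exclude superfluous columns (Geneid, Chr, Start, End, Strand, Length)
fc_res <- fc_res[, -c(1:6)]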
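
To make the steps above concrete, here is a minimal sketch that runs end to end on a toy file. The two-sample table, the `fc_file` temp file and the group labels `A`/`B` are made up for illustration; a real featureCounts file has one count column per BAM file. The `DGEList` call only runs if edgeR is installed.

```r
# Toy stand-in for a featureCounts table (hypothetical two-sample example).
fc_file <- tempfile(fileext = ".count")
writeLines(c(
  "# Program:featureCounts (comment line, skipped by read.table)",
  paste("Geneid", "Chr", "Start", "End", "Strand", "Length",
        "Sample1-1", "Sample2-1", sep = "\t"),
  paste("ENSMUSG00000088159", "1", "15019040", "15019159", "-", "120",
        "0", "0", sep = "\t"),
  paste("ENSMUSG00000073737", "1", "15268802", "15269797", "-", "996",
        "2", "1", sep = "\t")
), fc_file)

# check.names = FALSE keeps sample names such as 'Sample1-1' unchanged.
fc_res <- read.table(fc_file, header = TRUE, sep = "\t",
                     stringsAsFactors = FALSE, check.names = FALSE)

# Keep only the count columns and use the gene IDs as row names.
counts <- as.matrix(fc_res[, -(1:6), drop = FALSE])
rownames(counts) <- fc_res$Geneid

# Sanity checks before handing the matrix to edgeR.
stopifnot(is.numeric(counts), all(counts >= 0))

# One group label per count column (hypothetical labels for the toy data).
group <- factor(c("A", "B"))

if (requireNamespace("edgeR", quietly = TRUE)) {
  dge <- edgeR::DGEList(counts = counts, group = group)
}
```

The point is that `DGEList` receives a numeric matrix of counts, not the file path and not the annotation columns.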



Thank you a bunch, this worked like a charm.

For anyone trying to replicate this solution, note that you should not use quotes when dropping columns using '-c'.

Reply by markm014, 5.2 years ago


lol, I hardly use data.frames these days so I tend to forget the syntax details, thanks for hanging in there -- I've updated the code snippet

Reply by Friederike, 5.2 years ago


Should it be

fc_res <- fc_res[, -c(2:6)]

because you don't want to remove the gene ID column?

Reply by n.tear, 4.8 years ago


The previous line assigns the gene IDs as the row.names, so they are not lost.

Reply by Friederike, 4.1 years ago
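
A small sketch of the point about row names, using a made-up four-column data frame (`df`, `geneA` and `geneB` are illustrative only). It shows that the IDs survive as row names once the ID column is dropped, and that keeping the ID column would turn the matrix into character data.

```r
# Hypothetical miniature of a featureCounts table: ID, one annotation column, two samples.
df <- data.frame(Geneid = c("geneA", "geneB"),
                 Length = c(120L, 996L),
                 s1 = c(0L, 2L),
                 s2 = c(0L, 1L),
                 stringsAsFactors = FALSE)

# Copy the IDs into the row names first.
row.names(df) <- df$Geneid

# Dropping the ID and annotation columns keeps the row names.
kept <- df[, -c(1:2)]
print(row.names(kept))

# Keeping the ID column instead leaves a character column in the table,
# so as.matrix() yields a character matrix rather than numeric counts.
with_id <- df[, -2]
print(is.numeric(as.matrix(kept)))
print(is.numeric(as.matrix(with_id)))
```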