I am getting the following error in RStudio when I try to import a featureCounts count matrix into edgeR for analysis. My question is: do I have to manually modify the featureCounts output before using it in edgeR? Everything I find in the documentation suggests that I can load the featureCounts output directly into edgeR. If the fix is as simple as switching negative values to zero (I think featureCounts outputs '-1' when there are no overlaps), I can handle that, but it seems like a good way to distort the statistics.
The error is:
Error: Negative counts not allowed
The command I am running to generate this error is:
group = c(0,1,2,3,4,5,3,4,5,3,4,5)
dge = DGEList(counts = 'file_path', group = group)
where the file path is the output of featureCounts run on 12 BAM files. featureCounts ran with no issues, and about 60% of my reads overlapped features. The command I used was:
featureCounts -a 'Mus_musculus.GRCm38.95.gtf' -o features_count_all/total_file.count Sample1-1 Sample2-1 Sample3-1 Sample4-1 Sample4-2 Sample4-3 Sample5-1 Sample5-2 Sample5-3 Sample6-1 Sample6-2 Sample6-3
My matrix file looks like this. Do I need to consolidate it so that every cell holds a single raw count, rather than the semicolon-separated fields?
ENSMUSG00000088159 1 15019040 15019159 - 120 0 0 0 0 0 0 0 0 0 0 0 0
ENSMUSG00000073737 1 15268802 15269797 - 996 2 1 2 4 1 3 2 2 2 6 2 1
ENSMUSG00000092083 1;1;1;1;1 15287254;15312363;15312452;15709485;15709485 15287484;15313030;15313030;15712548;15723750 +;+;+;+;+ 15165 3 6 3 0 0 0 2 0 0 1 0 0
ENSMUSG00000102937 1 15364302 15365834 + 1533 0 0 0 0 0 0 0 0 0 0 0 0
ENSMUSG00000104149 1 15556249 15558337 + 2089 0 0 0 0 0 0 0 0 0 0 0 0
ENSMUSG00000088829 1 15685935 15686046 - 112 0 0 0 0 1 0 0 0 0 0 0 0
ENSMUSG00000077377 1 15757832 15757963 + 132 0 0 0 0 0 0 0 0 0 0 0 0
ENSMUSG00000101652 1;1 15760122;15760560 15760263;15760668 -;- 251 0 1 1 2 1 1 2 3 1 3 4
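For reference, here is a minimal sketch of one common way to get a table like this into edgeR. It assumes the usual featureCounts layout: a '#' comment line, a header line, then six annotation columns (Geneid, Chr, Start, End, Strand, Length) followed by one count column per sample. The semicolon-separated fields sit in those annotation columns, so dropping them leaves one plain count per cell. DGEList expects the counts themselves (a numeric matrix), not a path to the file. The sample names S1 to S12 and the comment-line text are placeholders, not taken from the real file; the three data rows are copied from the excerpt above.

```r
# Stand-in for the featureCounts output: a '#' comment line, a header line,
# then one row per gene. S1..S12 are placeholder sample names.
txt <- paste(
  "# featureCounts writes its program/command line here",
  "Geneid Chr Start End Strand Length S1 S2 S3 S4 S5 S6 S7 S8 S9 S10 S11 S12",
  "ENSMUSG00000088159 1 15019040 15019159 - 120 0 0 0 0 0 0 0 0 0 0 0 0",
  "ENSMUSG00000073737 1 15268802 15269797 - 996 2 1 2 4 1 3 2 2 2 6 2 1",
  "ENSMUSG00000088829 1 15685935 15686046 - 112 0 0 0 0 1 0 0 0 0 0 0 0",
  sep = "\n"
)

# For the real file, something along the lines of
#   read.delim("features_count_all/total_file.count", comment.char = "#")
raw <- read.table(text = txt, header = TRUE, comment.char = "#",
                  stringsAsFactors = FALSE)

rownames(raw) <- raw$Geneid          # keep the gene IDs as row names
counts <- as.matrix(raw[, -(1:6)])   # drop the six annotation columns

group <- factor(c(0, 1, 2, 3, 4, 5, 3, 4, 5, 3, 4, 5))  # grouping from the post

# edgeR is only needed for this last step
if (requireNamespace("edgeR", quietly = TRUE)) {
  dge <- edgeR::DGEList(counts = counts, group = group)
}
```

After this, counts is a purely numeric matrix with gene IDs as row names and one column per sample, which is the shape DGEList works with.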
Thank you a bunch, this worked like a charm.
For anyone trying to replicate this solution, note that you should not use quotes when dropping columns using '-c'.
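To illustrate the quoting point, here is a small sketch, assuming '-c' refers to R's negative column selection. The data frame and its column names are made up for the example.

```r
# Toy data frame; the names are hypothetical
df <- data.frame(Geneid = c("g1", "g2"), Chr = c("1", "1"), S1 = c(0L, 2L),
                 stringsAsFactors = FALSE)

# Unquoted names work inside subset(), which evaluates them as column positions
kept <- subset(df, select = -c(Geneid, Chr))

# Numeric positions work with plain indexing
kept_by_pos <- df[, -c(1, 2), drop = FALSE]

# Quoted names fail, because unary minus is not defined for character vectors
err <- tryCatch(subset(df, select = -c("Geneid", "Chr")),
                error = function(e) conditionMessage(e))

# To drop by quoted name, select the complement instead
kept_by_name <- df[, setdiff(names(df), c("Geneid", "Chr")), drop = FALSE]
```

Here err ends up holding the error message rather than a data frame, while the other three calls all return just the S1 column.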
lol, I hardly use data.frames these days, so I tend to forget the syntax details. Thanks for hanging in there; I've updated the code snippet.
Should it be
because you don't want to remove the geneID column?
The previous line assigns the gene IDs as the row.names, so they're not lost.