Similar to this post, I want to filter out rows that contain zeros in all columns. I have a file with transcript counts for each sample and replicate. Some transcripts have 0 counts in every sample and replicate, and in other cases only one sample has non-zero counts while all the other samples are zero. So I want to filter out:

1) all transcripts with zero counts in all samples and replicates

2) all transcripts with zero counts in all samples except one (e.g., zeros in A and B but not C, in A and C but not B, or in B and C but not A)

For example, input:

| | A_rep1 | A_rep2 | B_rep1 | B_rep2 | C_rep1 | C_rep2 |
|---|---|---|---|---|---|---|
| s1 | 0 | 6 | 5 | 3 | 0 | 9 |
| s2 | 66 | 0 | 5 | 32 | 8 | 0 |
| s3 | 0 | 0 | 0 | 0 | 0 | 0 |
| s4 | 8 | 22 | 0 | 4 | 5 | 5 |

Output of task 1:

| | A_rep1 | A_rep2 | B_rep1 | B_rep2 | C_rep1 | C_rep2 |
|---|---|---|---|---|---|---|
| s1 | 0 | 6 | 5 | 3 | 0 | 9 |
| s2 | 66 | 0 | 5 | 32 | 8 | 0 |
| s4 | 8 | 22 | 0 | 4 | 5 | 5 |
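A minimal sketch of task 1 on the example table, typed in by hand as a matrix (in practice it would come from read.table()). Since counts are non-negative, a row is all zeros exactly when its sum is zero, so rowSums() can stand in for an apply() loop:

```r
# Example counts from the table above, with transcripts as row names
counts <- matrix(
  c( 0,  6, 5,  3, 0, 9,
    66,  0, 5, 32, 8, 0,
     0,  0, 0,  0, 0, 0,
     8, 22, 0,  4, 5, 5),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("s1", "s2", "s3", "s4"),
                  c("A_rep1", "A_rep2", "B_rep1", "B_rep2", "C_rep1", "C_rep2"))
)

# Task 1: drop rows whose counts are zero in every column.
# Counts are non-negative, so "all zero" is the same as "row sum is zero".
task1 <- counts[rowSums(counts) > 0, , drop = FALSE]
print(task1)
```

`drop = FALSE` keeps the result a matrix even if only one row survives.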

I've tried several ways to automate this instead of filtering manually in Excel. My first attempt was in R. It works well for the first task, but when I process the file for the second task it doesn't work.

data <- read.table("genes.counts.matrix", header = TRUE)

set1 <- as.matrix(data[, -1])  # count columns only

row.names(set1) <- data[, 1]  # transcript IDs as row names

allzero <- apply(set1, 1, function(x) all(x == 0))  # TRUE for rows with zero counts in every column

newdata <- set1[!allzero, ]

write.table(newdata, "genes.counts.matrix.modified", sep="\t")

Another problem: in the output file the header row starts at column 1, above the transcript IDs, instead of sitting on top of the count columns. It looks like this:

A_rep1 | A_rep2 | B_rep1 | B_rep2 | C_rep1 | C_rep2 | |
---|---|---|---|---|---|---|
s1 | 0 | 6 | 5 | 3 | 0 | 9 |
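The shift comes from write.table(): with the default row.names = TRUE it writes the row names as an extra first column but gives that column no header field. Its documented col.names = NA option adds a blank header field above the row names. A small sketch, with a one-row toy matrix and a temporary file standing in for genes.counts.matrix.modified:

```r
# Toy stand-in for `newdata` (values copied from row s1 of the example)
newdata <- matrix(c(0, 6, 5, 3, 0, 9), nrow = 1,
                  dimnames = list("s1", c("A_rep1", "A_rep2", "B_rep1",
                                          "B_rep2", "C_rep1", "C_rep2")))
out <- tempfile(fileext = ".tsv")

# col.names = NA (with the default row.names = TRUE) writes an empty first
# header field, so the sample names line up with the count columns.
write.table(newdata, out, sep = "\t", quote = FALSE, col.names = NA)

header <- readLines(out)[1]
fields <- strsplit(header, "\t")[[1]]
print(fields)
```

quote = FALSE is an optional choice here; it only stops R from wrapping the names in double quotes.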

Then I tried a Perl one-liner, but it doesn't work:

perl -a -nle 'print if "$F[1-16] != 0" ' genes.counts.matrix > genes.counts.matrix.modified

or

perl -a -nle 'print if "$F[1]:$F[16] != 0" ' genes.counts.matrix > genes.counts.matrix.modified

My idea is to filter out, first, rows where all columns are zero; next, rows where columns 1:12 are zero; next, rows where columns 1:4 and 9:16 are zero; next, rows where columns 1:8 and 13:16 are zero; and finally rows where columns 5:16 are zero. (The real file has 16 count columns, four per sample.)
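Instead of hard-coding column ranges, one option is to derive each column's sample from its name and count, per row, how many samples have any non-zero count. Task 1 removes rows with no such sample and task 2 rows with exactly one, so keeping rows expressed in two or more samples does both. A sketch on the example table, assuming columns are named `<sample>_rep<N>`; row s5 is made up to exercise task 2:

```r
counts <- matrix(
  c( 0,  6, 5,  3, 0, 9,
    66,  0, 5, 32, 8, 0,
     0,  0, 0,  0, 0, 0,
     8, 22, 0,  4, 5, 5,
     0,  0, 0,  0, 7, 1),   # s5: made-up row, non-zero only in sample C
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("s", 1:5),
                  c("A_rep1", "A_rep2", "B_rep1", "B_rep2", "C_rep1", "C_rep2"))
)

# Sample name = column name without the "_repN" suffix (assumed naming scheme)
sample_of <- sub("_rep[0-9]+$", "", colnames(counts))

# Sum replicates per sample: rowsum() groups rows, so transpose in and out
per_sample <- t(rowsum(t(counts), sample_of))

# Number of samples with at least one non-zero count in each row
n_expressed <- rowSums(per_sample > 0)

# Keep rows expressed in at least two samples (covers tasks 1 and 2)
filtered <- counts[n_expressed >= 2, , drop = FALSE]
print(filtered)
```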

This was my attempt in R, and it didn't work:

to_drop <- apply(set1, 1, function(x) (all(x[1:16] == 0) | all(x[5:16] == 0) | all(x[1:12] == 0) | (all(x[1:4] == 0) & all(x[9:16] == 0)) | (all(x[1:8] == 0) & all(x[13:16] == 0))))

newdata <- set1[!to_drop, ]
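Hand-typed ranges like these are easy to get off by one. A sketch that keeps the apply() approach but generates the "zero outside one sample" checks from a block layout, assuming the 16 count columns are four consecutive blocks of four (one block per sample); the toy rows t1 to t4 are made up:

```r
# Assumed layout: 16 count columns, 4 samples x 4 columns each, in order
block <- rep(1:4, each = 4)

# Made-up toy rows: t1 all zero, t2 non-zero only in block 3,
# t3 non-zero in blocks 1 and 4, t4 non-zero only in block 1
set1 <- rbind(
  t1 = rep(0, 16),
  t2 = c(rep(0, 8), 1, 0, 2, 0, rep(0, 4)),
  t3 = c(5, rep(0, 14), 3),
  t4 = c(0, 0, 4, 1, rep(0, 12))
)

# TRUE when the row is zero everywhere outside at most one block;
# an all-zero row also satisfies this, so task 1 is covered too
zero_outside_one_block <- function(x) {
  any(vapply(unique(block), function(b) all(x[block != b] == 0), logical(1)))
}

to_drop <- apply(set1, 1, zero_outside_one_block)
newdata <- set1[!to_drop, , drop = FALSE]
print(rownames(newdata))
```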

Linu

Yes!! Your bracket changes worked!! I still get the headers shifted to the left, but I am more than satisfied to have the function working. Thanks a million!!

https://www.biostars.org/u/14211/ I have edited my answer to account for the problem with the header.

This is not working; I get one of two outcomes:

One version applies the function but messes up the headers.

The other returns the file with the header in place but does not apply the function.

Umm, of course it will not apply the function! The order of the R commands is incorrect: apply the function first, then use cbind(), and then write the result out. My bad, I used 'set1' instead of 'newdata' as the object to write out.
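The order described here can be sketched in R under a few assumptions: the toy matrix stands in for set1, and `transcript` is a placeholder name for the ID column. The filtered object is what gets cbind()-ed and written, and row.names = FALSE stops write.table() from adding the IDs a second time:

```r
# Toy stand-in for set1 (rows taken from the example table)
set1 <- matrix(
  c( 0,  6, 5,  3, 0, 9,
     0,  0, 0,  0, 0, 0,
     8, 22, 0,  4, 5, 5),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("s1", "s3", "s4"),
                  c("A_rep1", "A_rep2", "B_rep1", "B_rep2", "C_rep1", "C_rep2"))
)

# 1. Apply the filter first (here: task 1, drop all-zero rows)
newdata <- set1[rowSums(set1) > 0, , drop = FALSE]

# 2. Put the transcript IDs back as an ordinary, named first column
#    ("transcript" is a placeholder header name)
out_df <- cbind(transcript = rownames(newdata), as.data.frame(newdata))

# 3. Write the filtered data, without row names
out <- tempfile(fileext = ".tsv")
write.table(out_df, out, sep = "\t", quote = FALSE, row.names = FALSE)
lines <- readLines(out)
print(lines)
```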

I have made that change. Just follow the order I have shown.

Excellent, that worked!! Thanks a million!