filter data based on values
5.8 years ago
kanwarjag ★ 1.2k

I have a very simple question, but sometimes one gets stuck on simple problems. I have a data file with 500 rows and 2000 columns. I want to filter and subset this data based on whether the cell values are greater than or equal to 2. I know filtering on columns is easy, but how can I subset this data while retaining the original header information?

Thanks

Need some more information. If you find one cell in a row is less than 2, do you want to remove the entire row? Same question for the columns as well.

5.8 years ago

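If you need to preserve the header, split it off first, keep only the rows in which every field is at least 2, and then put the header back: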

$ head -1 input.mtx > header.txt
$ awk -vOFS="\t" '{ flag = 1; for (i = 1; i <= NF; i++) { if ($i < 2) { flag = 0; break; } } if (flag == 1) { print $0 } }' <(tail -n+2 input.mtx) > output.woHeader.txt
$ cat header.txt output.woHeader.txt > output.mtx
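
The same row filter can also be sketched as a single awk pass, which avoids the intermediate files. This is only a sketch under assumptions: the file is tab-separated, the first line is the header, and every other field is numeric. The sample data and the `tmpdir` path are made up for illustration.

```shell
# Hypothetical tab-separated sample: one header line, then numeric rows.
tmpdir=$(mktemp -d)
printf 'g1\tg2\tg3\n2\t5\t3\n1\t4\t2\n7\t2\t9\n' > "$tmpdir/input.mtx"

# NR == 1 prints the header unchanged; every later row is printed
# only if all of its fields are >= 2.
awk -F'\t' '
NR == 1 { print; next }
{
    keep = 1
    for (i = 1; i <= NF; i++) {
        # Adding 0 forces a numeric comparison; non-numeric text counts as 0.
        if (($i + 0) < 2) { keep = 0; break }
    }
    if (keep) print
}' "$tmpdir/input.mtx" > "$tmpdir/output.mtx"

cat "$tmpdir/output.mtx"
```

If the first column holds row names rather than values, starting the loop at `i = 2` would skip it.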
5.8 years ago
goodez ▴ 640

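Using R, this keeps only the rows where all values are greater than or equal to 2. Column names are carried over by the subsetting, so the header is retained.

# Logical matrix: TRUE where a cell is >= 2
index <- data >= 2
# TRUE for rows in which every cell passes
index <- apply(index, 1, all)

# drop = FALSE keeps the result two-dimensional even if only one row remains
data <- data[index, , drop = FALSE]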
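
For the column-wise case raised in the comment under the question, a rough awk sketch is to read the file twice: the first pass marks every column that contains a value below 2, and the second pass prints the remaining columns, header included. It assumes a tab-separated file with a single header line and numeric cells; the sample data and file names are invented for illustration.

```shell
# Hypothetical tab-separated sample: one header line, then numeric rows.
tmpdir=$(mktemp -d)
printf 'g1\tg2\tg3\n2\t5\t3\n1\t4\t2\n7\t2\t9\n' > "$tmpdir/input.mtx"

awk -F'\t' -v OFS='\t' '
NR == FNR {
    # First pass: skip the header, mark columns with any value < 2.
    if (FNR > 1) {
        for (i = 1; i <= NF; i++) {
            if (($i + 0) < 2) drop[i] = 1
        }
    }
    next
}
{
    # Second pass: rebuild each line from the columns that were not marked.
    line = ""; first = 1
    for (i = 1; i <= NF; i++) {
        if (i in drop) continue
        line = (first ? "" : line OFS) $i
        first = 0
    }
    print line
}' "$tmpdir/input.mtx" "$tmpdir/input.mtx" > "$tmpdir/output.cols.mtx"

cat "$tmpdir/output.cols.mtx"
```

Passing the same file name twice is what gives awk its two passes; `NR == FNR` is true only while the first copy is being read.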