Grep A Pattern From File
10.9 years ago

I am trying to use grep to pull out eXpress values.

I ran eXpress and have my tab-separated .xprs file, which looks like this:

bundle_id   target_id   length  eff_length  tot_counts  uniq_counts est_counts  eff_counts  ambig_distr_alpha   ambig_distr_beta    fpkm    fpkm_conf_low   fpkm_conf_high  solvable
1   Contig14365 310 106.787904  85  85  85.000000   246.750792  0.000000e+00    0.000000e+00    147.370523  147.370523  147.370523  T
2   Singlet_45262   346 232.432874  109 37  89.933541   133.875234  1.998601e+00    7.198885e-01    71.637085   51.273440   92.000730   T
2   Singlet_68764   236 119.092916  74  2   21.066459   41.746263   6.254955e+00    1.736541e+01    32.750608   0.142967    65.358248   T
3   Contig1270  736 500.694431  50  0   0.125252    0.184116    1.000000e+00    1.000000e+00    0.046316    0.000000    0.759071    F
3   Contig1271  851 628.717767  57  9   43.657462   59.092492   4.701649e-01    1.810055e-01    12.856315   4.051524    21.661106   T
3   Singlet_69558   790 555.880836  50  0   15.217286   21.626318   1.000000e+00    1.000000e+00    5.068381    0.000000    12.670313   F

I want to get the eXpress values for non-coding RNAs (ncRNAs) only, so I thought to use:

grep -f <list of ncRNAs contigs> <express file>

I made a file of ncRNA contig IDs, one per line, which looks like this:

Singlet_51268
Singlet_63946
Singlet_70630
Singlet_72272
Singlet_60543
Contig11105
Singlet_18043
Singlet_64779
Singlet_50335
Singlet_39678
Singlet_21655
Singlet_5438
Singlet_6400
Contig4197
Singlet_17193
Singlet_55710
Singlet_70948
Singlet_25172
Singlet_65515
Singlet_30239
Singlet_54617
Singlet_11188
Contig14540

Since I have 577 ncRNAs, I expected to end up with a .xprs file of 577 rows, but I got 701 contigs instead.

So 124 contigs in the output do not correspond to my ncRNAs.

How can I pull out only the ncRNA-specific values? I tried playing around with grep but could not fix it.

Any suggestions?

Thanks

command-line • 2.8k views
10.9 years ago
rbagnall ★ 1.8k

I think you need to add -w (match whole words only).

Without it, each ID is matched as a substring, so grepping Singlet_51268 will also pull out Singlet_512681, Singlet_512682, Singlet_512683, etc.

Try:

grep -w -f <list of ncRNAs contigs> <express file>
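
To see the difference, here is a minimal sketch on a made-up three-row file. The file names (ncrna_ids.txt, results.xprs) and the extra ID Singlet_512681 are assumptions for illustration only, not taken from the original data.

```shell
# Hypothetical miniature inputs; names and contents are illustrative only.
tmpdir=$(mktemp -d)
printf 'Singlet_51268\n' > "$tmpdir/ncrna_ids.txt"
printf 'bundle_id\ttarget_id\tlength\n1\tSinglet_51268\t310\n2\tSinglet_512681\t346\n3\tContig1270\t736\n' \
    > "$tmpdir/results.xprs"

# Plain -f treats each ID as a substring pattern, so Singlet_512681 matches too.
plain_count=$(grep -c -f "$tmpdir/ncrna_ids.txt" "$tmpdir/results.xprs")

# With -w the match must be bounded by non-word characters (here, the tabs),
# so only the exact ID is kept.
word_count=$(grep -c -w -f "$tmpdir/ncrna_ids.txt" "$tmpdir/results.xprs")

echo "without -w: $plain_count rows; with -w: $word_count rows"
```

Note that grep treats letters, digits and underscores as word characters, so -w would still match an ID followed by a dot or a hyphen (for example a hypothetical Contig1270.1).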

It worked perfectly. Thanks for the right and fast answer!

Please use grep -wFf <list>. The -F flag treats each ID as a fixed string rather than a regular expression, which will be much faster given a long list.
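
A small sketch of that command with a sanity check on the row count might look like this. The file names and the toy IDs are assumptions for illustration; with real data, substitute the ncRNA list and the .xprs file.

```shell
# Hypothetical miniature inputs; names and contents are illustrative only.
tmpdir=$(mktemp -d)
printf 'Singlet_51268\nContig1270\n' > "$tmpdir/ncrna_ids.txt"
printf 'bundle_id\ttarget_id\tlength\n1\tSinglet_51268\t310\n2\tSinglet_512681\t346\n3\tContig1270\t736\n3\tContig12701\t851\n' \
    > "$tmpdir/results.xprs"

# -w: whole-word matches, -F: fixed strings (no regex), -f: read patterns from a file.
grep -wFf "$tmpdir/ncrna_ids.txt" "$tmpdir/results.xprs" > "$tmpdir/ncrna.xprs"

# Compare the number of unique IDs with the number of rows extracted.
n_ids=$(sort -u "$tmpdir/ncrna_ids.txt" | wc -l | tr -d ' ')
n_rows=$(wc -l < "$tmpdir/ncrna.xprs" | tr -d ' ')

echo "IDs: $n_ids, rows extracted: $n_rows"
```

If the two numbers differ on real data, some IDs are probably missing from the .xprs file or appear more than once. The header line is not copied to the output, because it matches none of the IDs.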
