Question

how to print a specific column condition in multiple files

6.6 years ago

Ana ▴ 200

I have 2181 files (anacovis2_1_summary_betai_reg.out to anacovis2_2181_summary_betai_reg.out), which are the results of my genome scan test. Below are a few of them:

anacovis2_2_summary_betai_reg.out
anacovis2_3_summary_betai_reg.out
anacovis2_4_summary_betai_reg.out
anacovis2_5_summary_betai_reg.out
anacovis2_6_summary_betai_reg.out
anacovis2_7_summary_betai_reg.out
.
.
.
anacovis2_2179_summary_betai_reg.out
anacovis2_2180_summary_betai_reg.out
anacovis2_2181_summary_betai_reg.out

and each file looks like this:

COVARIABLE MRK M_Pearson SD_Pearson BF(dB) Beta_is SD_Beta_is eBPis
  1       1    -0.03252566     0.19829865    -7.79643409    -0.00109073     0.00858566     0.04628457
  1       2     0.01711174     0.18940858   -10.81988136     0.00048600     0.00852316     0.02021122
  1       3    -0.01249828     0.18648579   -11.20068061    -0.00045063     0.00736952     0.02170898

I want to go through each file (1 to 2181) and, wherever the first column (COVARIABLE) equals 1, print only the last column, saving the output of each input file in a separate file. I know this should be done with something like awk, but I do not know exactly how. Does anyone have any idea how to do that? Any suggestion is appreciated.

Unix text-processing

written 6.6 years ago by Ana • updated 6.6 years ago by Kevin Blighe

for i in *.out; do echo "$i"; awk '{if ($1 == 1) print $NF}' "$i" > "$(basename "$i" .out)"; done

This is a bash one-liner. It assumes that the only files with a .out extension in the directory are your _reg.out files. It prints the last column whenever the first column equals 1, and each output file gets the same name as its input without the .out extension: for example, anacovis2_1_summary_betai_reg.out produces anacovis2_1_summary_betai_reg. Make sure the input files use consistent delimiters. The echo is only there so you can check that every file was processed.
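With 2181 files the loop above starts 2181 awk processes. A single awk call can instead read all the files and pick each output name from its built-in FILENAME variable. A minimal sketch, assuming the same naming rule as above (output name = input name without .out); extract_cov1 is a hypothetical helper name. close() is called when awk moves on to the next input so it does not run out of open file descriptors.

```shell
#!/bin/sh
# Hypothetical helper: pass the input files as arguments.
# For each input, writes the last column of rows whose first column is 1
# to a file named like the input but without the .out extension.
extract_cov1() {
    awk '
        FNR == 1 {                      # first line of the next input file
            if (out != "") close(out)   # finish the previous output file
            out = FILENAME
            sub(/\.out$/, "", out)      # strip the .out extension
        }
        $1 == 1 { print $NF > out }     # last column of rows with COVARIABLE 1
    ' "$@"
}
```

Run it from the directory holding the files as extract_cov1 *_reg.out. Unlike the loop, it creates no output file for an input that has no COVARIABLE 1 rows.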

6.6 years ago by cpad0112

Answer 1 · 2017-09-09

Good evening,

This prints the header line plus every line that has 1 in the first column, file by file, to your screen. Save it as a shell script (e.g. MyScript.sh) and then run it with sh MyScript.sh

NB - the script first writes a list of all your files to list.list

NB - -maxdepth 1 instructs the find command to look only in the specified directory, i.e., /MyDirectory/, and not in its subdirectories

find /MyDirectory/ -maxdepth 1 -name "*_reg.out" > list.list

while read -r FILE ;
do
        awk 'NR==1 {print}; $1==1 {print}' "${FILE}" ;
done < list.list

This does the same as above, but prints only the final column (column #8) and saves the output for each input file to a new file with the same name plus a '.new' extension:

find /MyDirectory/ -maxdepth 1 -name "*_reg.out" > list.list

while read -r FILE ;
do
        awk 'NR==1 {print $8}; $1==1 {print $8}' "${FILE}" > "${FILE}".new ;
done < list.list

If you don't want the header in the new files, just remove NR==1 {print $8}; from the awk command.
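Because -maxdepth 1 only searches a single directory, a plain shell glob can do the same job without the intermediate list.list file, and it copes with spaces in file names. A minimal POSIX sh sketch, assuming the header is the first line of each file; extract_col8 and its second argument (1 keeps the header, 0 drops it) are hypothetical, used here so the header can be toggled without editing the awk command:

```shell
#!/bin/sh
# Hypothetical helper: for every *_reg.out file directly inside directory $1,
# write column 8 of the rows whose first column is 1 to <file>.new.
# Second argument: 1 (default) keeps the header line, 0 drops it.
extract_col8() {
    keep_header=${2:-1}
    for FILE in "$1"/*_reg.out; do
        [ -e "$FILE" ] || continue        # no match: the pattern stays literal
        awk -v hdr="$keep_header" '
            (hdr == 1 && FNR == 1) || $1 == 1 { print $8 }
        ' "$FILE" > "$FILE.new"
    done
}
```

For example, running extract_col8 /MyDirectory 0 would write the .new files without headers. The outputs end in _reg.out.new, so a second run does not pick them up as inputs.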

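These commands do not actually require tab-delimited data: awk's default field splitting treats any run of spaces or tabs as a single separator and ignores leading blanks, so they also work on space-padded files like yours. If you want tab-delimited files anyway, for example for other tools, GNU sed can convert the runs of spaces into tabs (removing the leading spaces first so the first field is not empty):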

sed -e 's/^ \+//' -e 's/ \+/\t/g' test.txt > test.tsv

...or comma-separated:

sed -e 's/^ \+//' -e 's/ \+/,/g' test.txt > test.csv
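awk can also do this conversion on its own: assigning $1 = $1 makes awk rebuild the line using the output field separator OFS, so runs of blanks become one separator and leading blanks are dropped. A minimal sketch using only standard awk; to_delim is a hypothetical helper name, and test.txt/test.tsv/test.csv are the example names from above.

```shell
#!/bin/sh
# Hypothetical helper: read whitespace-separated lines on stdin and re-join
# the fields with the separator given as $1 (awk -v expands escapes like \t).
to_delim() {
    awk -v OFS="$1" '{ $1 = $1; print }'   # $1 = $1 forces the line to be rebuilt with OFS
}
```

Usage: to_delim '\t' < test.txt > test.tsv for tabs, or to_delim , < test.txt > test.csv for commas.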

Kevin