Struggle with awk command
13 months ago
Limoges, CBRS, France

Hello, I've been fighting with my awk command since yesterday.

I have a file (locus.txt) containing some IgH loci from mm10. It has no header, but the tab-separated columns are: chr, start, end, strand and name_of_the_locus.

chr12   113363298   113365156   -   gamma3
chr12   113330756   113338695   -   gamma1
chr12   113308036   113314227   -   gamma2b
chr12   113274557   113277035   -   gammaepsilon
chr12   113260153   113264625   -   alpha
chr12   113289248   113295541   -   gamma2a
chr12   113423027   113426701   -   muIgh
chr12   113225832   113255223   -   3'RR
chr12   113416247   113418358   -   IgD

What I want to do is find the minimum position in this file, i.e. the minimum of the start column (second column: 113225832, for 3'RR).

Then I want to subtract this minimum from all my positions and rearrange the file like this:

gamma3  137466   139324
gamma1  104924   112863
...etc

What I have tried so far

Search for minimum value, saved in $min :

min=`awk -v min=1000000000 '{if($2<min){min=$2}}END{print min}' locus.txt`

Then subtract the minimum from the positions and rearrange the file:

awk -F $'\t' '{$1=$4=""; print $5"\t"$2-$min"\t"$3-$min}' locus.txt

But I got this :

gamma3  0   1858
gamma1  0   7939
gamma2b 0   6191
gammaepsilon    0   2478
alpha   0   4472
gamma2a 0   6293
muIgh   0   3674
3'RR    0   29391
IgD 0   2111

The only correct result is 29391 for 3'RR

It doesn't seem like a complex problem, but I can't find a way out of this...

I bet on a casting problem but I'm not even sure. Thanks for your help!

6 weeks ago
ATpoint 17k
Germany
## First get the minimum:
MIN=$(bc <<< $(sort -k2,2n in.file | awk 'NR == 1 {print $2}'))

## Then subtract and rearrange:
awk -v min="$MIN" 'BEGIN {OFS="\t"} {print $5, $2-min, $3-min}' in.file > out.rearranged
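
As a possible alternative to the two steps above, both can be folded into a single awk call that reads the file twice. This is only a sketch, not the method of this answer: it recreates three rows of the question's data in a temporary file, and the variable names `locus` and `rearranged` are made up for the illustration.

```shell
# Recreate three rows of the question's tab-separated data in a temporary file.
locus=$(mktemp)
printf '%s\t%s\t%s\t%s\t%s\n' \
    chr12 113363298 113365156 - gamma3 \
    chr12 113330756 113338695 - gamma1 \
    chr12 113225832 113255223 - "3'RR" > "$locus"

# The file is listed twice: NR == FNR is true only while reading the first copy.
rearranged=$(awk -F'\t' -v OFS='\t' '
    NR == FNR { if (min == "" || $2 + 0 < min) min = $2 + 0; next }  # pass 1: track the smallest start
    { print $5, $2 - min, $3 - min }                                 # pass 2: shift and reorder
' "$locus" "$locus")

printf '%s\n' "$rearranged"
rm -f "$locus"
```

For these three rows it should print gamma3 137466 139324, gamma1 104924 112863 and 3'RR 0 29391 (tab-separated), without needing a shell variable for the minimum.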

Thanks, could you explain the bc <<< part, please?


bc is a command-line calculator (a separate utility, not part of bash itself) that can deal with floating-point numbers, and <<< is a bash here-string that passes the string to bc on standard input. For example, bc <<< "scale=9;100/4545454" gives you the division result to nine decimal places. Admittedly it is not necessary in your situation: simply MIN=$(sort -k2,2n in.file | awk 'NR == 1 {print $2}') will do just fine, but I got used to always including that bc thing.


Outside awk, the min variable can also be obtained with datamash: min=$(datamash min 2 < test.txt)

13 months ago
WCIP | Glasgow | UK

To pass a shell variable to awk you can use the -v option, like this (not tested):

awk -v min=$min -F $'\t' '{$1=$4=""; print $5 "\t" $2 - min "\t" $3 - min}' locus.txt
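
For anyone wondering why the original command printed 0 in the second column, here is a small sketch on the gamma3 row only (the names `row`, `broken` and `fixed` are just for illustration). Inside awk, `$min` is not the shell variable: it means "the field whose number is stored in the awk variable min", and since that variable is unset it evaluates to `$0`, the whole record.

```shell
row=$(printf 'chr12\t113363298\t113365156\t-\tgamma3')

# Inside single quotes the shell never expands $min, so awk sees its own (unset)
# variable min. $min is then field number 0, i.e. the whole record. Because
# $1 and $4 were blanked, the rebuilt record starts with the start coordinate,
# so both subtractions are really "minus start".
broken=$(printf '%s\n' "$row" | awk -F'\t' '{ $1 = $4 = ""; print $2 - $min, $3 - $min }')

# With -v, min is a real awk variable and is referenced without the dollar sign.
fixed=$(printf '%s\n' "$row" | awk -F'\t' -v min=113225832 '{ print $2 - min, $3 - min }')

printf 'broken: %s\nfixed:  %s\n' "$broken" "$fixed"
```

The broken form gives 0 1858, which matches the gamma3 line in the question, and the fixed form gives 137466 139324. It also explains why 29391 for 3'RR was correct by coincidence: that row's own start is the minimum.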

Thought this was intuitive, thanks
