Biostar Beta. Not for public use.
How do I replace a value used in a script with a range of values from a file?
4.4 years ago
Italy

I recently enquired about an AWK script to keep summing a column until it reaches a certain value and then print that line.

The awk script I got from Alex Reynolds was very helpful and is shown below:

awk '
BEGIN {
    s = 0;
}
{
    s += $4;
    if (s >= 100) {
        print $0;
        exit;
    }
}' chr1.bedgraph

I would use this on chr1.bedgraph:

chr1 1000 2000 25
chr1 2000 3000 50
chr1 3000 4000 25
chr1 4000 5000 30

and the awk script would print the line where the sum of $4 reaches 100:

chr1 3000 4000 25
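To try this on the example data, a minimal sketch is below. It assumes a POSIX shell with mktemp, writes the four example lines to a scratch directory (the workdir and result names are just for illustration), and drops the BEGIN block, which is optional because awk starts numeric variables such as s at 0.

```shell
#!/bin/sh
# Scratch directory holding a copy of the example bedgraph from the question.
workdir=$(mktemp -d)
cat > "$workdir/chr1.bedgraph" <<'EOF'
chr1 1000 2000 25
chr1 2000 3000 50
chr1 3000 4000 25
chr1 4000 5000 30
EOF

# Add up column 4 line by line (s starts at 0) and print the first line
# where the running sum reaches 100, then stop reading the file.
result=$(awk '{ s += $4; if (s >= 100) { print; exit } }' "$workdir/chr1.bedgraph")
echo "$result"
```

Here 25 + 50 + 25 = 100, so this prints chr1 3000 4000 25, matching the output above.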


I now want to replace "100" in the line "if (s >= 100)" with every value in values.txt (apart from $1):

values.txt
sample1_chr1 200 50 90
sample2_chr1 300 60 40
sample3_chr1 400 20 40

So the script would use the numbers in values.txt from line 1 ($2, $3, $4), then move on to line 2, line 3 and so on.

The output would then be the line from chr1.bedgraph where the running sum reaches 200, below that the line where it reaches 50, then 90; after that, the lines where it reaches 300, 60 and 40, and so on.
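A single awk call can do this without a shell loop. The sketch below assumes each limit is compared against the running sum from the first line of chr1.bedgraph (the sum is not continued from one limit to the next) and that both files are whitespace-separated. It reads values.txt first (the NR == FNR test is only true while reading the first file), stores every number from column 2 onwards with its sample name, remembers the running sum of column 4 for each bedgraph line, and then, in the END block, prints the first line whose running sum reaches each limit, labelled with the sample and the limit. The example files are written to a scratch directory only to keep the sketch self-contained.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Example data copied from the question.
printf 'chr1 1000 2000 25\nchr1 2000 3000 50\nchr1 3000 4000 25\nchr1 4000 5000 30\n' > chr1.bedgraph
printf 'sample1_chr1 200 50 90\nsample2_chr1 300 60 40\nsample3_chr1 400 20 40\n' > values.txt

result=$(awk '
    # First file (values.txt): remember every limit and the sample it came from.
    NR == FNR {
        for (i = 2; i <= NF; i++) { n++; limit[n] = $i + 0; sample[n] = $1 }
        next
    }
    # Second file (bedgraph): keep each line and the running sum of column 4.
    {
        total += $4
        m++; line[m] = $0; cum[m] = total
    }
    # For each limit, print the first line whose running sum reaches it.
    END {
        for (k = 1; k <= n; k++)
            for (j = 1; j <= m; j++)
                if (cum[j] >= limit[k]) { print sample[k], limit[k], line[j]; break }
    }
' values.txt chr1.bedgraph)
echo "$result"
```

With the toy numbers, 200, 300 and 400 exceed the file's total of 130, so those limits print nothing; each of the other six limits prints one line.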

Any thoughts? I'm not a very experienced programmer and I have been trying to do this for a while now.

Many thanks


If it gets slightly more complicated (like your question now), I guess it is time to move away from awk to, e.g., Python. It's not completely clear what you are trying to accomplish (and for what purpose).


You are right, I feel I am pushing the limits of awk and it's probably time to move over to Python. My apologies for not being clearer; the example I had shown wasn't very well put together. I am basically obtaining "median" values over large peak domains in ChIP-seq, with the intention of showing movement at these genetic loci between individuals (sample1, sample2, sample3). I obtain a total read count across the domain and extract the point where the median read count value lies. Think of it as a centre-of-gravity value (the point at which 50% of the reads lie). I then want to find the 5% and 95% values. My values.txt file contains 4 columns: sample_chr, 5%, 50%, 95%. If I had a script that would use my 5%, 50% and 95% read count values to scan my predefined domains (like the awk script was doing), it would make the processing a lot faster. Is this a little clearer?

Note: the numbers in my values.txt example are not the real values, which probably makes things more confusing.
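Because the 5%, 50% and 95% points are fixed fractions of a domain's total read count, they can also be found directly from the bedgraph instead of being precomputed in values.txt. This is a sketch under assumptions the question does not state: the file covers exactly one domain, its bins are in genomic order, and column 4 is the read count of each bin. It stores the running sum of column 4, then in the END block prints, for each percentage, the first bin whose running sum reaches that share of the total.

```shell
#!/bin/sh
workdir=$(mktemp -d)
# Toy single-domain bedgraph (the example lines from the question).
printf 'chr1 1000 2000 25\nchr1 2000 3000 50\nchr1 3000 4000 25\nchr1 4000 5000 30\n' > "$workdir/chr1.bedgraph"

result=$(awk '
    # Keep every line and the running sum of column 4 up to that line.
    { total += $4; line[NR] = $0; cum[NR] = total }
    END {
        # For 5%, 50% and 95% of the total, print the first bin that reaches it.
        split("5 50 95", pct, " ")
        for (k = 1; k <= 3; k++) {
            target = pct[k] / 100 * total
            for (j = 1; j <= NR; j++)
                if (cum[j] >= target) { print pct[k] "%", line[j]; break }
        }
    }
' "$workdir/chr1.bedgraph")
echo "$result"
```

For the toy file the total is 130, so the targets are 6.5, 65 and 123.5, which pick the first, second and fourth bins. A genome-wide file containing many domains would have to be split into domains first; this sketch does not attempt that.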

2.8 years ago
tomc • 80
United States

You can pass your script an arbitrary variable, say LIMIT, with:

script.awk -v "LIMIT=73" chr1.bedgraph

and then use it inside the script:

if (s >= LIMIT)
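Put together, script.awk might look like the sketch below. The script body is the original logic with 100 replaced by LIMIT; running it as awk -f script.awk rather than as an executable is an assumption here (calling script.awk directly, as above, needs a #! line pointing at awk and execute permission). The example bedgraph is written to a scratch directory so the sketch runs on its own.

```shell
#!/bin/sh
workdir=$(mktemp -d)
printf 'chr1 1000 2000 25\nchr1 2000 3000 50\nchr1 3000 4000 25\nchr1 4000 5000 30\n' > "$workdir/chr1.bedgraph"

# script.awk: the original script with the hard-coded 100 replaced by LIMIT,
# a variable supplied on the command line with -v.
cat > "$workdir/script.awk" <<'EOF'
{
    s += $4              # running sum of column 4 (s starts at 0)
    if (s >= LIMIT) {    # first line where the sum reaches the limit
        print
        exit
    }
}
EOF

result=$(awk -v "LIMIT=73" -f "$workdir/script.awk" "$workdir/chr1.bedgraph")
echo "$result"
```

With LIMIT=73 the running sums are 25 and then 75, so this prints chr1 2000 3000 50.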

To get the sequence of limits from your second file (values.txt), filter out the first column (this assumes a tab-separated file; otherwise specify the delimiter with -d):

cut -f 2,3,4 values.txt
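The values.txt shown in the question looks space-separated rather than tab-separated. In that case cut needs the delimiter spelled out, as in this sketch (the file contents are copied from the question; -f 2-4 selects the same columns as -f 2,3,4). Note that cut -d ' ' treats every single space as a separator, so this assumes exactly one space between columns.

```shell
#!/bin/sh
workdir=$(mktemp -d)
printf 'sample1_chr1 200 50 90\nsample2_chr1 300 60 40\nsample3_chr1 400 20 40\n' > "$workdir/values.txt"

# Drop column 1 (the sample name) and keep columns 2 to 4,
# treating a single space as the field separator.
limits=$(cut -d ' ' -f 2-4 "$workdir/values.txt")
echo "$limits"
```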

As @decosterwouter says, it is not clear what you are hoping to achieve, how the values would be applied as limits, or to which range of your first file each limit would apply.

To apply each of the values to the whole of the first bedgraph file:

for v in $(cut -f 2,3,4 values.txt); do script.awk -v "LIMIT=${v}" chr1.bedgraph; done
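If you also want to see which sample and which limit each printed line belongs to, a while read loop over values.txt keeps the columns apart. This is a sketch assuming a whitespace-separated values.txt with exactly four columns and, like the one-liner above, that every limit is applied to the whole of chr1.bedgraph. The SAMPLE variable is a hypothetical addition for labelling, and the awk program is inlined instead of living in script.awk only to keep the example self-contained.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'chr1 1000 2000 25\nchr1 2000 3000 50\nchr1 3000 4000 25\nchr1 4000 5000 30\n' > chr1.bedgraph
printf 'sample1_chr1 200 50 90\nsample2_chr1 300 60 40\nsample3_chr1 400 20 40\n' > values.txt

# read splits each row of values.txt into the sample name and three limits.
result=$(while read -r sample p5 p50 p95; do
    for limit in "$p5" "$p50" "$p95"; do
        # Same running-sum logic as script.awk, labelled with sample and limit.
        awk -v LIMIT="$limit" -v SAMPLE="$sample" \
            '{ s += $4; if (s >= LIMIT) { print SAMPLE, LIMIT, $0; exit } }' chr1.bedgraph
    done
done < values.txt)
echo "$result"
```

Limits larger than the file's total (200, 300 and 400 with the toy data) print nothing, so only six lines come out.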

Hope that gets you closer, but if you are doing something more complicated, please explain it better and be ready to consider Python or another language of your choice.