Biostar Beta. Not for public use.
samtools piping with awk/ bash commands -> Wonky things happen!!!
0
Entering edit mode
13 months ago
USA SoCal

My problem is so weird.

I'm using samtools with two windows like

$ samtools view in.bam 1:1000-2000 1:3000-4000

And it will print out paired ends with the first mate mapping to the first window and second window

When I do this

$ samtools view in.bam 1:1000-2000 1:3000-4000 | sort 

or this

$ samtools view in.bam 1:1000-2000 1:3000-4000 | sort | uniq 

It only prints out the first window?

When I do this

$ samtools view in.bam 1:1000-2000 1:3000-4000 | awk '{ if($7 == "=") { print $0 }}'

It will print the first window as such

HS2000-123    145   1   1005   60   100M   ...

and the second ...

HS2000-456 145 1 3509 60 100M ...

The first is tab delimited and the second is space delimited

When I pipe on sort | uniq to the awk command I only return the first window.

Am I going crazy? It feels like it. I could fix this is perl, but want to know why.

What the phread is going on here?

ADD COMMENTlink
4
Entering edit mode

Do you see the same behavior if you run \samtools instead of samtools?

If not, maybe some pre-defined aliases are disrupting your workflow.

ADD REPLYlink
1
Entering edit mode

This is very good suggestion.

ADD REPLYlink
1
Entering edit mode

Thank you :)

ADD REPLYlink
0
Entering edit mode

Thank you!

This did work, however I'm still getting this weird delimiter issue where some reads have tabs and others have spaces

ADD REPLYlink
0
Entering edit mode

So, this processes both files and writes out all windows but you still see the delimiter issue, is it? (Just clearing stuff up)

ADD REPLYlink
2
Entering edit mode

I can't reproduce this. I get both intervals regardless of piping

ADD REPLYlink
2
Entering edit mode

put it into a file and process the file - then you can test what happens - is it the samtools or the other commands that misbehave - you should not fix it in perl ... this should work on its own

it is alway the simplest of mistakes that produce the most mindbendingly mysterious errors

ADD REPLYlink
0
Entering edit mode

The point of me piping commands is so I can do everything without writing to files. I have over 1.5 million regions in 250 genomes to look for mate !

ADD REPLYlink
1
Entering edit mode

The point of writing to files here is to drill down and figure out what's wrong with the pipeline, so you can correct it and get the intended use out of the pipeline. That's how testing works.

ADD REPLYlink

Login before adding your answer.

Similar Posts
Loading Similar Posts
Powered by the version 2.1