Biostar Beta. Not for public use.
GNU parallel on multiple files to re-header bam files
1
Entering edit mode
2.2 years ago

Hi All,

The purpose of my question is to generate a new header for a bam files. For one sample, i defined two command lines and its works:

#filter header
samtools view -H file1.bam | awk '(/^@SQ/ && /chrM/) || (!/^@SQ/) {print $0} ' > file1_filtered.txt

#re-header
cat file1_filtered.txt <(samtools view file1.bam) | samtools view -hb - > file1_filtered_re-header.bam

I would like to do the same thing for a lot of samples

#filter header
ls *.bam | parallel -j 8 'samtools view -H -@ 7 {} > {.}_filtered.txt'

After this step i didn't find the solution, if someone has an idea? Many thanks in advance.

GNU parallel • 838 views
ADD COMMENTlink
1
Entering edit mode

Keep in mind that working with files in parallel may or may not be faster than doing each independently. If the processing is IO-bound, then running in parallel may not be faster (though not likely slower).

ADD REPLYlink
0
Entering edit mode

Curious if you have done any recent testing with new high performance file systems such as isilon/NetApp (versions with spinning disks and pure SSD's)? At some point one of the connections (pci-e bus) may become the bottleneck.

ADD REPLYlink
0
Entering edit mode

I have not, but I am pretty sure the conclusion will be the same: It depends, and you should therefore test with different parallelization and measure.

ADD REPLYlink
0
Entering edit mode

Hi,

if it works with the command lines, why not putting these into a bash script and running this bash script with parallel?

Otherwise, you may use a tab-separated file where you have input and output variables stored like here.

Cheers, Michael

ADD REPLYlink
0
Entering edit mode

thanks for your help. I solve the issues for my first step "#filtered". I didn't find a solution to run the step "re-header" for a multiple files...

Many thanks for your help, Pierre

ADD REPLYlink
0
Entering edit mode

Hi all, Thanks a lot for your help and script, it's run perfectly. I better understand where was my mistake. I just discovered GNU parallel, I'm a bit confused with some syntaxes, i'm gonna to learn more about it.

Thanks again, Pierre

ADD REPLYlink
0
Entering edit mode

If an answer was helpful you should upvote it, if the answer resolved your question you should mark it as accepted (green check mark). Upvote|Bookmark|Accept

ADD REPLYlink
6
Entering edit mode
17 months ago
ole.tange ♦ 3.4k
Denmark
doit() {
  file="$1"
  out="$2"
  (samtools view -H $file | awk '(/^@SQ/ && /chrM/) || (!/^@SQ/) {print $0} ';
   samtools view $file) | samtools view -hb - > $out
}
export -f doit
parallel doit {} {.}_filtered_re-header.bam ::: *.bam

Or if you want to use the header from the first file for all the rest:

samtools view -H file1.bam | awk '(/^@SQ/ && /chrM/) || (!/^@SQ/) {print $0} ' >header
doit() {
  file="$1"
  out="$2"
  (cat header;
   samtools view $file) | samtools view -hb - > $out
}
export -f doit
parallel doit {} {.}_filtered_re-header.bam ::: *.bam
ADD COMMENTlink

Login before adding your answer.

Similar Posts
Loading Similar Posts
Powered by the version 2.1