Biostar Beta. Not for public use.
Question: GNU parallel on multiple files to re-header bam files
1
Entering edit mode

Hi All,

The purpose of my question is to generate a new header for a bam files. For one sample, i defined two command lines and its works:

#filter header
samtools view -H file1.bam | awk '(/^@SQ/ && /chrM/) || (!/^@SQ/) {print $0} ' > file1_filtered.txt

#re-header
cat file1_filtered.txt <(samtools view file1.bam) | samtools view -hb - > file1_filtered_re-header.bam

I would like to do the same thing for a lot of samples

#filter header
ls *.bam | parallel -j 8 'samtools view -H -@ 7 {} > {.}_filtered.txt'

After this step i didn't find the solution, if someone has an idea? Many thanks in advance.

ADD COMMENTlink 2.0 years ago pierre.justeau • 10
Entering edit mode
1

Keep in mind that working with files in parallel may or may not be faster than doing each independently. If the processing is IO-bound, then running in parallel may not be faster (though not likely slower).

ADD REPLYlink 2.0 years ago
Sean Davis
25k
Entering edit mode
0

Curious if you have done any recent testing with new high performance file systems such as isilon/NetApp (versions with spinning disks and pure SSD's)? At some point one of the connections (pci-e bus) may become the bottleneck.

ADD REPLYlink 23 months ago
genomax
68k
Entering edit mode
0

I have not, but I am pretty sure the conclusion will be the same: It depends, and you should therefore test with different parallelization and measure.

ADD REPLYlink 23 months ago
ole.tange
♦ 3.4k
Entering edit mode
0

Hi,

if it works with the command lines, why not putting these into a bash script and running this bash script with parallel?

Otherwise, you may use a tab-separated file where you have input and output variables stored like here.

Cheers, Michael

ADD REPLYlink 2.0 years ago
michael.ante
♦ 3.3k
Entering edit mode
0

thanks for your help. I solve the issues for my first step "#filtered". I didn't find a solution to run the step "re-header" for a multiple files...

Many thanks for your help, Pierre

ADD REPLYlink 2.0 years ago
pierre.justeau
• 10
Entering edit mode
0

Hi all, Thanks a lot for your help and script, it's run perfectly. I better understand where was my mistake. I just discovered GNU parallel, I'm a bit confused with some syntaxes, i'm gonna to learn more about it.

Thanks again, Pierre

ADD REPLYlink 2.0 years ago
pierre.justeau
• 10
Entering edit mode
0

If an answer was helpful you should upvote it, if the answer resolved your question you should mark it as accepted (green check mark). Upvote|Bookmark|Accept

ADD REPLYlink 2.0 years ago
genomax
68k
6
Entering edit mode
doit() {
  file="$1"
  out="$2"
  (samtools view -H $file | awk '(/^@SQ/ && /chrM/) || (!/^@SQ/) {print $0} ';
   samtools view $file) | samtools view -hb - > $out
}
export -f doit
parallel doit {} {.}_filtered_re-header.bam ::: *.bam

Or if you want to use the header from the first file for all the rest:

samtools view -H file1.bam | awk '(/^@SQ/ && /chrM/) || (!/^@SQ/) {print $0} ' >header
doit() {
  file="$1"
  out="$2"
  (cat header;
   samtools view $file) | samtools view -hb - > $out
}
export -f doit
parallel doit {} {.}_filtered_re-header.bam ::: *.bam
ADD COMMENTlink 2.0 years ago ole.tange ♦ 3.4k

Login before adding your answer.

Similar Posts
Loading Similar Posts
Powered by the version 2.0