Biostar Beta. Not for public use.
Parallel for bwa mem - problem with -R argument for ID and SM
0
Entering edit mode
12 months ago
Korsocius • 110

Dear all,

I need help with the Read group ID and SM. I have parallel syntax to alignment via bwa mem.

ls *R1_001.fastq | parallel 'bwa mem -k 19 -A 1 -B 4 -O 6 -L 5 -R '@RG\tID:'{.}'\tSM:'{.}'\tLB:'Trusight_custom_amplicon_CARR'\tPL:'ILLUMINA'\tPI:150' $REFERENCE {} {= s/_R1_001/_R2_001/ =} > {= s/_R1_001.fastq/.sam/ =}'

For the -R argument I would like to get name of sample.. I tried save the name like variable and echo it and so on, but still time doesn't work. Could you help me please to get name of sample to ID and SM ?

Thank you.

ADD COMMENTlink
3
Entering edit mode
12 months ago
France/Nantes/Institut du Thorax - INSE…

second answer, use a Makefile: the same way than in my previous answer C: Is it possible to run variant calling software in parallel, are there any Shell

to run 16 parallel jobs invoke with

make -j 16

ADD COMMENTlink
0
Entering edit mode

Make file looks like the best option for all these process :-), i tried it without quotes, but I received error log : No input for ID.....

ADD REPLYlink
0
Entering edit mode
12 months ago
France/Nantes/Institut du Thorax - INSE…

(not tested) try to remove those single quotes ?

'@RG\tID:'{.}'\tSM:'{.}'\tLB:'Trusight_custom_amplicon_CARR'\tPL:'ILLUMINA'\tPI:150'

to

'@RG\tID:{.}\tSM:{.}\tLB:Trusight_custom_amplicon_CARR\tPL:ILLUMINA\tPI:150'
ADD COMMENTlink
0
Entering edit mode
5 weeks ago
ATpoint 17k
Germany

For parallel jobs it is convenient to wrap the actual job script into a function and use parallel to parallelize this function:

## Want to do parallel alignments:
function BWAMEM {

  BASENAME=$1
  IDX=$2
  bwa mem (options...) -R '@RG\tID:'${BASENAME}'_ID\tSM:'${BASENAME}'_SM\tPL:Illumina' $IDX ${BASENAME}.R1_001.fastq | \
  samtools view -o ${BASENAME}_unsorted.bam

}; export -f BWAMEM

## Call the function within parallel using awk to extract the basename of the files without the ".R1_001.fastq":
ls *.R1_001.fastq | awk -F ".R1_001.fastq" '{print $1}' | parallel "BWAMEM {} /path/to/bwa_index 2> {}.log"

Using 2> {}.log will print all stderr messages from within the function to one log file per processed fastq file. Say you have a file test.R1_001.fastq, you'll get as output test_unsorted.bam and test.log. Using this approach you do not have to bother yourself with quotes within the parallel function. You can do basically everything you want inside the BWAMEM function and parallel only executes it. Try squeezing an awk command with all its single-and double quotes into parallel, it will be...fun. Or simply write it into a wrapper function to avoid the issues ;-)

ADD COMMENTlink

Login before adding your answer.

Similar Posts
Loading Similar Posts
Powered by the version 2.1