Alignment of single-ended library distributed among different lanes
1
0
Entering edit mode
5.1 years ago

I want to perform alignment using Hisat2 for single-ended thousands of samples and each sample distributed among different libraries.

I have modified this script (https://www.biostars.org/p/223404/#224169):

#!/bin/bash
for f in `ls data/1547_2015/Accessions/Alst-1_6989/transcriptome/fastq/trim/*.fastq.bz2_trimmed.fq.gz | sed 's/.fastq.bz2_trimmed.fq.gz//g' | sort -u`
do
echo HISAT/hisat2-2.1.0/hisat2 --p 8 --min-intronlen 60 --max-intronlen 6000 --dta -x Hisat2_index/arabidopsis -U ${f}.fastq.bz2_trimmed.fq.gz -S ${f}.bam
done

It gives me echo as:

HISAT/hisat2-2.1.0/hisat2 --p 8 --min-intronlen 60 --max-intronlen 6000 --dta -x Hisat2_index/arabidopsis -U data/1547_2015/Accessions/Alst-1_6989/transcriptome/fastq/trim/SRR3460433.fastq.bz2_trimmed.fq.gz -S data/1547_2015/Accessions/Alst-1_6989/transcriptome/fastq/trim/SRR3460433.bam
HISAT/hisat2-2.1.0/hisat2 --p 8 --min-intronlen 60 --max-intronlen 6000 --dta -x Hisat2_index/arabidopsis -U data/1547_2015/Accessions/Alst-1_6989/transcriptome/fastq/trim/SRR3460434.fastq.bz2_trimmed.fq.gz -S data/1547_2015/Accessions/Alst-1_6989/transcriptome/fastq/trim/SRR3460434.bam
HISAT/hisat2-2.1.0/hisat2 --p 8 --min-intronlen 60 --max-intronlen 6000 --dta -x Hisat2_index/arabidopsis -U data/1547_2015/Accessions/Alst-1_6989/transcriptome/fastq/trim/SRR3460435.fastq.bz2_trimmed.fq.gz -S data/1547_2015/Accessions/Alst-1_6989/transcriptome/fastq/trim/SRR3460435.bam
HISAT/hisat2-2.1.0/hisat2 --p 8 --min-intronlen 60 --max-intronlen 6000 --dta -x Hisat2_index/arabidopsis -U data/1547_2015/Accessions/Alst-1_6989/transcriptome/fastq/trim/SRR3460436.fastq.bz2_trimmed.fq.gz -S data/1547_2015/Accessions/Alst-1_6989/transcriptome/fastq/trim/SRR3460436.bam

As this is the same sample (Alst-1_6989) is distributed among different lanes (SRR3460433,SRR3460434,SRR3460435,SRR3460436) which should be joined by comma not as a separate command as follows and I want the name of the sample (Alst-1_6989) in the output file (Alst-1_6989.bam), currently its name of the distributed library. Its just one example I have thousands of sample with a variable number of distributed library, so we need to keep this thing in mind.

HISAT/hisat2-2.1.0/hisat2 --p 8 --min-intronlen 60 --max-intronlen 6000 --dta -x Hisat2_index/arabidopsis -U data/1547_2015/Accessions/Alst-1_6989/transcriptome/fastq/trim/SRR3460433.fastq.bz2_trimmed.fq.gz,data/1547_2015/Accessions/Alst-1_6989/transcriptome/fastq/trim/SRR3460434.fastq.bz2_trimmed.fq.gz,data/1547_2015/Accessions/Alst-1_6989/transcriptome/fastq/trim/SRR3460435.fastq.bz2_trimmed.fq.gz,data/1547_2015/Accessions/Alst-1_6989/transcriptome/fastq/trim/SRR3460436.fastq.bz2_trimmed.fq.gz -S data/1547_2015/Accessions/Alst-1_6989/transcriptome/fastq/trim/Alst-1_6989.bam

Any help will be highly appericated.

RNA-Seq alignment • 1.4k views
ADD COMMENT
1
Entering edit mode
5.1 years ago
ATpoint 81k

I would make a layout file layout.txt like

SRR3460433  lib1
SRR3460434  lib1
SRR3460435  lib1
SRR3460436  lib1
SRR3460437  lib2
SRR3460438  lib2
SRR3460439  lib3
SRR3460440  lib4

and then use something like this:

cut -f2 layout.txt | sort -u | \
  while read p
    do
    grep "$p" layout.txt | \
    cut -f1 | \
    awk '{print $1".fastq.gz"}' | \
    xargs cat | \
    hisat2 (options) -U - | \
    samtools view -bo ${p}.bam
    done < /dev/stdin

Edit: Added double quotes to $p in the grep part

This will take every lib-identifier from the file, cat the respective sequencing runs together and stream them into hisat2, outputting bam files like lib1.bam, lib2.bam.... Only thing you might have to change is the suffix in the awk command that is appended to the SRR identifier depending on how you named your files.

ADD COMMENT
0
Entering edit mode

I have made a layout.txt as:

data/1547_2015/Accessions/Alst-1_6989/transcriptome/fastq/trim/SRR3460433  Alst-1_6989
data/1547_2015/Accessions/Alst-1_6989/transcriptome/fastq/trim/SRR3460434  Alst-1_6989
data/1547_2015/Accessions/Alst-1_6989/transcriptome/fastq/trim/SRR3460435  Alst-1_6989
data/1547_2015/Accessions/Alst-1_6989/transcriptome/fastq/trim/SRR3460436  Alst-1_6989
data/1547_2015/Accessions/Bch-1_7028/transcriptome/fastq/trim/SRR3460454   Bch-1_7028
data/1547_2015/Accessions/Bch-1_7028/transcriptome/fastq/trim/SRR3460455   Bch-1_7028
data/1547_2015/Accessions/Bch-1_7028/transcriptome/fastq/trim/SRR3460456   Bch-1_7028
data/1547_2015/Accessions/Bch-1_7028/transcriptome/fastq/trim/SRR3460457   Bch-1_7028
data/1547_2015/Accessions/Bch-1_7028/transcriptome/fastq/trim/SRR3460458   Bch-1_7028
data/1547_2015/Accessions/Bch-1_7028/transcriptome/fastq/trim/SRR3460459   Bch-1_7028

Modified script:

#!/bin/bash
cut -f2 layout.txt | sort -u | \
  while read p
    do
    grep $p layout.txt | \
    cut -f1 | \
    awk '{print $1".fastq.bz2_trimmed.fq.gz"}' | \
    xargs cat | \
    HISAT/hisat2-2.1.0/hisat2 --min-intronlen 60 --max-intronlen 6000 --dta -x Hisat2_index/arabidopsis -U - | \
    samtools view -bo ${p}.bam
    done < /dev/stdin

Error:

grep: Alst-1_6989: No such file or directory
cat: 'layout.txt:data/1547_2015/Accessions/Alst-1_6989/transcriptome/fastq/trim/SRR3460433.fastq.bz2_trimmed.fq.gz': No such file or directory
[E::hts_open_format] Failed to open file Alst-1_6989.bam
samtools view: failed to open "Alst-1_6989.bam" for reading: No such file or directory
0 reads
0.00% overall alignment rate
(ERR): hisat2-align exited with value 141
grep: Alst-1_6989: No such file or directory
cat: 'layout.txt:data/1547_2015/Accessions/Alst-1_6989/transcriptome/fastq/trim/SRR3460434.fastq.bz2_trimmed.fq.gz': No such file or directory
[E::hts_open_format] Failed to open file Alst-1_6989.bam
samtools view: failed to open "Alst-1_6989.bam" for reading: No such file or directory
0 reads
0.00% overall alignment rate
(ERR): hisat2-align exited with value 141

ATpoint Can you please help with it?

ADD REPLY
0
Entering edit mode
grep "$p" layout.txt

should do it, so double quotes around the $p.

ADD REPLY
0
Entering edit mode

ATpoint I got this error now, does that mean we need to adjust .bam file in the code?

[E::hts_open_format] Failed to open file Alst-1_6989.bam
samtools view: failed to open "Alst-1_6989.bam" for reading: No such file or directory
Error: reads file does not look like a FASTQ file
terminate called after throwing an instance of 'int'
Aborted (core dumped)
(ERR): hisat2-align exited with value 134
xargs: cat: terminated by signal 13
[E::hts_open_format] Failed to open file Alst-1_6989.bam
samtools view: failed to open "Alst-1_6989.bam" for reading: No such file or directory
Error: reads file does not look like a FASTQ file
terminate called after throwing an instance of 'int'
Aborted (core dumped)
(ERR): hisat2-align exited with value 134
xargs: cat: terminated by signal 13
[E::hts_open_format] Failed to open file Alst-1_6989.bam
samtools view: failed to open "Alst-1_6989.bam" for reading: No such file or directory
Error: reads file does not look like a FASTQ file
terminate called after throwing an instance of 'int'
Aborted (core dumped)
(ERR): hisat2-align exited with value 134
xargs: cat: terminated by signal 13
[E::hts_open_format] Failed to open file Alst-1_6989.bam
samtools view: failed to open "Alst-1_6989.bam" for reading: No such file or directory
Error: reads file does not look like a FASTQ file
terminate called after throwing an instance of 'int'
Aborted (core dumped)
(ERR): hisat2-align exited with value 134
xargs: cat: terminated by signal 13
[E::hts_open_format] Failed to open file Bch-1_7028.bam
samtools view: failed to open "Bch-1_7028.bam" for reading: No such file or directory
Error: reads file does not look like a FASTQ file
terminate called after throwing an instance of 'int'
Aborted (core dumped)
(ERR): hisat2-align exited with value 134
xargs: cat: terminated by signal 13
[E::hts_open_format] Failed to open file Bch-1_7028.bam
samtools view: failed to open "Bch-1_7028.bam" for reading: No such file or directory
Error: reads file does not look like a FASTQ file
terminate called after throwing an instance of 'int'
Aborted (core dumped)
(ERR): hisat2-align exited with value 134
xargs: cat: terminated by signal 13
[E::hts_open_format] Failed to open file Bch-1_7028.bam
samtools view: failed to open "Bch-1_7028.bam" for reading: No such file or directory
Error: reads file does not look like a FASTQ file
terminate called after throwing an instance of 'int'
Aborted (core dumped)
(ERR): hisat2-align exited with value 134
xargs: cat: terminated by signal 13
[E::hts_open_format] Failed to open file Bch-1_7028.bam
samtools view: failed to open "Bch-1_7028.bam" for reading: No such file or directory
Error: reads file does not look like a FASTQ file
terminate called after throwing an instance of 'int'
Aborted (core dumped)
(ERR): hisat2-align exited with value 134
xargs: cat: terminated by signal 13
[E::hts_open_format] Failed to open file Bch-1_7028.bam
samtools view: failed to open "Bch-1_7028.bam" for reading: No such file or directory
Error: reads file does not look like a FASTQ file
terminate called after throwing an instance of 'int'
Aborted (core dumped)
(ERR): hisat2-align exited with value 134
xargs: cat: terminated by signal 13
[E::hts_open_format] Failed to open file Bch-1_7028.bam
samtools view: failed to open "Bch-1_7028.bam" for reading: No such file or directory
Error: reads file does not look like a FASTQ file
terminate called after throwing an instance of 'int'
Aborted (core dumped)
(ERR): hisat2-align exited with value 134
xargs: cat: terminated by signal 13
ADD REPLY
0
Entering edit mode

Output of samtools --version?

ADD REPLY
0
Entering edit mode

ATpoint the previous version of samtools was 1.7 but now i updated to latest version (1.9) but still getting this error:

[E::hts_open_format] Failed to open file Alst-1_6989.bam
samtools view: failed to open "Alst-1_6989.bam" for reading: No such file or directory
Error: reads file does not look like a FASTQ file
terminate called after throwing an instance of 'int'
Aborted (core dumped)
(ERR): hisat2-align exited with value 134
xargs: cat: terminated by signal 13
[E::hts_open_format] Failed to open file Alst-1_6989.bam
samtools view: failed to open "Alst-1_6989.bam" for reading: No such file or directory
Error: reads file does not look like a FASTQ file
terminate called after throwing an instance of 'int'
Aborted (core dumped)
(ERR): hisat2-align exited with value 134
xargs: cat: terminated by signal 13
ADD REPLY
0
Entering edit mode

The error means basically that no data arrive at samtools, so there is an issue with the alignment. The commands work fine for me, maybe something with your files not being properly catted. Difficult to tell without being in front of your computer, sorry. Maybe try to put all the relevant fastq files into one folder, then cd into that folder and run the script from there.

ADD REPLY
0
Entering edit mode

ATpoint I am trying to understand the script, can you please comment so that I can understand it properly. Normally we pass single ended fastq files seperated by comma, but I didn't see anything after -U - , can you please also explain this thing?

ADD REPLY
0
Entering edit mode

The - indicates reading from stdin. The cat command concatenated the individual fastq files that are then passed over to hisat (that is called a Unix pipe, stdin/stdout). This is a simplified version of dealing with multiple lane replicates (well at least if it works... :-D ) avoiding any comma-separated lists. I frequently use these kinds of scripts and have good experience with them.

ADD REPLY
0
Entering edit mode

But look, if it doesn't work for you skip my suggestion and try something else. You need a certain experience with these pipes and nested scripts to properly handle them. It is only one of many options to solve your task. Just try something else ;-) The simplest thing would be to cat together the individual runs into one fastq file, save that to disk and then align as you did with the individual runs.

ADD REPLY

Login before adding your answer.

Traffic: 2559 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6