Entering edit mode
5.1 years ago
juan.crescente
▴
110
I want to obtain a fastq file containing reads (RNASeq) that failed to map to a reference genome.
hisat2 -p 3 -x index -1 TD1.R1.fastq.gz -2 TD1.R2.fastq.gz -S out.sam --un-conc out.un.sam --summary-file out.info
I obtained that way 4 files (.sam + .info + out.un.1.sam + out.un.2.sam). Those ".un" files seems like the paired end reads that failed to map. The problem is that the bam2fastq scripts out there want only one bam and create two fastq files.
$ bedtools bamtofastq -i aln.qsort.bam \
-fq aln.end1.fq \
-fq2 aln.end2.fq
What can I use to take both .un files and create the two fastq files that I need? thanks!
Update: At the time I do the following
#align against first genome
hisat2 -p 3 -x index/svevo -1 reads/TD1.R1.fastq.gz -2 reads/TD1.R2.fastq.gz -S data/landic_svevo_1.sam --un-conc data/landic_svevo_1.unmapped.sam --summary-file data/landic_svevo_1.sam.info
# unmapped files by hisat2 are broken, error when converting to bam
#get unmapped reads with samtools
samtools view -h -f 4 data/landic_svevo_1.sam > data/landic_svevo_1.un.sam
#convert to bam
samtools view -b -S data/landic_svevo_1.un.sam > data/landic_svevo_1.un.bam
#get fastq from unmapped bam
samtools fastq -1 data/landic_svevo_1.un.1.fastq -2 data/landic_svevo_1.un.2.fastq data/landic_svevo_1.un.bam
#align unmapped against second genome
hisat2 -p 3 -x index/zavitan -1 data/landic_svevo_1.un.1.fastq -2 data/landic_svevo_1.un.2.fastq -S data/landic_un_zavitan_1.sam --un data/landic_un_zavitan_1.un.sam --summary-file data/landic_un_zavitan_1.sam.info
But get the next error with hisat2 at the end:
Error, fewer reads in file specified with -1 than in file specified
with -2 terminate called after throwing an instance of 'int' Aborted
(core dumped)