Error HISAT2 output for HTseq-count
0
0
Entering edit mode
6.8 years ago

Dear all,

I am analyzing paired-end stranded RNAseq data obtained from parasite cells. I would be interested in using HISAT2 for the alignment, and HTseq-count to count the features of the aligned reads.

Except for the intron min/max size, I used the default parameters for HISAT2. After getting the sam file from the alignment (therefore containing both mapped and unmapped reads), I converted sam file to bam and sorted the reads by name using Picard tool:

java -jar ~/Documents/Softwares/picard.jar AddOrReplaceReadGroups INPUT=$FILE/TopHat_vs_Hisat/Hisat/Plasmodium/accepted_hits.sam OUTPUT=$FILE/TopHat_vs_Hisat/Hisat/Plasmodium/accepted_hits_name.bam SORT_ORDER=queryname RGID=1 RGLB=Pberghei RGPL=Illumina RGPU=NextSeq RGSM=Pber-GFP RGCN=IPP

Next, I just used HTseq-count to count the reads:

htseq-count -f bam -r name -s reverse -a 10 -t exon -i Parent -m union $FILE/TopHat_vs_Hisat/Hisat/Plasmodium/accepted_hits_name.bam $FILE/Genome/PlasmoDB-33_PbergheiANKA.gff > $FILE/TopHat_vs_Hisat/Hisat/Plasmodium/counts_hisat.txt

However, I got this error:

38235 GFF lines processed.
Error occured when processing SAM input (record #194 in file /Volumes/@home/Plasmodium_strains_RNAseq/Raw_WEHI/Plasmo-SPZ-RNAseq/tests/Aligned/TopHat/Pber1/TopHat_vs_Hisat/Hisat/Plasmodium/accepted_hits_name.bam):
  my_showwarning() takes from 3 to 5 positional arguments but 6 were given
  [Exception type: TypeError, raised in warnings.py:99]

I tried to sort the reads from sam file with samtools sort and also got this error. I also tried to remove the unmapped reads from the HISAT2 output (reads with mapq = 0) and it didn't work better. Finally, I used TopHat to align the reads and used the same pipeline (with Picard tools) and here, it works and counted correctly my features.

Does anyone has an idea of the problem with the reads aligned with HISAT2? Are they compatible with HTseq-count?

Thank you very much for any help.

Nicolas

RNA-Seq HISAT2 HTseq-count • 3.4k views
ADD COMMENT
0
Entering edit mode

I converted sam file to bam and sorted the reads by name using Picard tool

If that was sorted on read names then you would need to co-ordinate sort the file instead. Use featureCounts instead of HTSeq-count. It can sort the BAM files as needed during counting.

ADD REPLY
1
Entering edit mode

I'd also recommend STAR for mapping and read summarization. In my experience it is faster and more accurate than HISAT, and produces the same count table as htseq in union mode.

ADD REPLY
0
Entering edit mode

Thank you for your replies. I tested by coordinate sorting but also got an error. I found in others topics that paired-end reads would rather require a name sorting for HTseq-count.

I am currently reading the manual for featureCounts (and, based on what I read before, really consider to use it rather than HTseq). But I was also interested in understanding the origin of this problem with HTseq-count, if someone also encountered it before.

I will also have a look to STAR. Thanks

ADD REPLY
0
Entering edit mode

Did you look at the record marked to see what, if anything, was wrong with it?

194 in file /Volumes/@home/Plasmodium_strains_RNAseq/Raw_WEHI/Plasmo-SPZ-RNAseq/tests/Aligned/TopHat/Pber1/TopHat_vs_Hisat/Hisat/Plasmodium/accepted_hits_name.bam

ADD REPLY
0
Entering edit mode

Yes, I checked directly on the bam file, including or excluding the header lines (I don't know if it was counted as rows by HTseq-count). In both cases, it is just a read, without anything special. In both cases, it has a mapq = 1 (multiple alignments), so it should not be included in the count (because -a 10 for HTseq-count). But it was not the first read with multiple alignments in the file, so I don't know why it stopped at this one.

ADD REPLY

Login before adding your answer.

Traffic: 1545 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6