hisat2 and RSEM pipeline
1
0
Entering edit mode
5.5 years ago
Shahzad ▴ 30

Hello All I am using hisat2 for alignment and RSEM for estimated counts as suggest by a previous post on this site. I need some guidance regarding 1> Does RSEM accepts hisat2 generated transcriptome indexes?

2> rsem-calculate-expression can be used alone without rsem-prepare-reference step and BAM from hisat can be accepted in RSEM count matrix.

I tried for my Q1 here and had error that RSEM was unable to find indexes

reference indexing showed following error. (rsem)> rsem-prepare-reference RNASeq_transcriptome.fa ref rsem-synthesis-reference-transcripts 0 0RNASeq_transcriptome.fa (ASCII code 13), at line 2, position 61!scriptome.fa contains an unknown character, "rsem-synthesis-reference-transcripts ref 0 0 RNASeq_transcriptome.fa" failed! Plase check if you provide correct parameters/options for the pipeline! have used same transcriptome to build hisat2 index successfully

My RNA-seq experiment is 2 Genotypes1control, genotype1treated, Genotypes2control, genotype2treated, >3 replicates-PE each i.e 12 samples. I have reference transcriptome but no GTF/gff files available.

will be very thankful for any suggestion regarding another pipeline or current problem. thank you

rna-seq software error hisat2 gtf RSEM • 6.0k views
ADD COMMENT
1
Entering edit mode
5.5 years ago
EagleEye 7.5k

This is the way I could integrate HISAT2 with RSEM,

RSEM pre-preparation:

cat Homo_sapiens.GRCh38.90.chr.gtf | grep -v "^#" | grep -w "transcript" | cut -f9 | sed 's/;/\t/g' | cut -f1,3 | sed 's/gene_id \"//' | sed 's/transcript_id \"//' | sed 's/\"//g' | awk '!x[$0]++' > transcript_gene_map.txt

RSEM indexing:

rsem-prepare-reference --star --star-path STAR_PATH/ --num-threads 8 --gtf Homo_sapiens.GRCh38.90.chr.gtf --transcript-to-gene-map transcript_gene_map.txt Homo_sapiens.GRCh38.dna.primary_assembly.fa rsem_index/hg38

HISAT2 pre-preparation SS:

hisat2_extract_splice_sites.py Homo_sapiens.GRCh38.90.chr.gtf > Homo_sapiens.GRCh38.90_spliceSites.txt

HISAT2 pre-preparation Exons:

hisat2_extract_exons.py Homo_sapiens.GRCh38.90.chr.gtf > Homo_sapiens.GRCh38.90_Exons.txt

HISAT2 indexing:

hisat2-build -p 32 --ss Homo_sapiens.GRCh38.90_spliceSites.txt --exon Homo_sapiens.GRCh38.90_Exons.txt Homo_sapiens.GRCh38.dna.primary_assembly.fa HISAT2_index/hg38

HISAT2 alignment:

hisat2 --dta --time --summary-file $outpath/${outname}_alignment_summary.txt --avoid-pseudogene --known-splicesite-infile Homo_sapiens.GRCh38.90_spliceSites.txt --phred33 -p 8 --qc-filter -x $ht2_index -U $inpath/${outname}.fastq.gz | samtools view -bS -o $outpath/${outname}.bam -

RSEM quantification:

rsem-calculate-expression --time --num-threads 2 --phred33-quals --alignments --no-bam-output $inpath/${outname}_sorted.bam $rsemIndex $outpath/${outname}
ADD COMMENT
0
Entering edit mode

Oops sorry that didn't work. I just checked output from this pipeline, I never succeeded in integrating HISAT2 + RSEM. Looks like RSEM cannot handle gapped alignments.

ADD REPLY
0
Entering edit mode

thank you for confirming. is there any suggestion about which pipeline will be appropriate. i have transcriptome.fa but no gtf file. RNAseq data from plant is paired end 4 conditions with 3 bioreps.

ADD REPLY

Login before adding your answer.

Traffic: 2898 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6