Question

RNA-seq alignment rate is too low.

zhul09, 5.4 years ago

Hello

I recently read this paper: Tiedt, S., et al. (2017). RNA-seq identifies circulating miR-125a-5p, miR-125b-5p, and miR-143-3p as potential biomarkers for acute ischemic stroke. Circulation Research, CIRCRESAHA-117. PMID: 28724745, DOI: 10.1161/CIRCRESAHA.117.311572. The data are available through GEO (SRA: SRP133275).

I want to obtain the miRNA expression matrix for human circulating blood after stroke. I downloaded the files (SRA format) via GEO, trimmed them with Trimmomatic, and aligned the reads to the genome with HISAT2. However, the alignment rate is very low:

6644136 reads; of these:
  6644136 (100.00%) were unpaired; of these:
    6631500 (99.81%) aligned 0 times
    2981 (0.04%) aligned exactly 1 time
    9655 (0.15%) aligned >1 times
    0.19% overall alignment rate

Here is the shell script:

hisat2 -p 4 --dta -x ./indexes/genome_tran -U ./samples/SRR6761159.fastq -S ./temp/SRR6761159.sam

The index files are "genome_tran.[1-8].ht2".

Does anyone have suggestions on how to fix this? Thank you.


Written by zhul09, 5.4 years ago; updated by ATpoint.

If this is miRNA data, you should not be using HISAT2 for the alignments. You want ungapped alignments, and Bowtie v1 would be more appropriate.

Reply by GenoMax, 5.4 years ago
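One quick way to check that the reads really look like mature miRNAs is a read-length histogram. This is a minimal sketch assuming plain 4-line FASTQ records on stdin; read_length_histogram is a hypothetical helper name, not part of Bowtie or HISAT2. After adapter trimming, reads from a miRNA library would be expected to cluster around 20-24 nt, while a peak at the full sequencing read length usually suggests the adapter is still attached.

```shell
# Hypothetical helper: print "<read length>\t<count>" for a FASTQ on stdin,
# sorted by length. Assumes 4-line FASTQ records (sequence on line 2 of every 4).
read_length_histogram() {
    awk 'NR % 4 == 2 { n[length($0)]++ }
         END { for (l in n) print l "\t" n[l] }' | sort -n
}

# Example usage (file name taken from the question):
#   read_length_histogram < SRR6761159.fastq
```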

Thank you. I will try it now.

Reply by zhul09, 5.4 years ago

Hi GenoMax, thanks for your help. Over the last two days I tried Bowtie, but the result was similar to the one above. Is the index appropriate? I downloaded it from ftp://ftp.ccb.jhu.edu/pub/data/bowtie_indexes/GRCh38_no_alt.zip. Is there an index made specifically for miRNA alignment? I also downloaded the hairpin.fa and mature.fa files from the miRBase website, built .ebwt index files from them with bowtie-build, and repeated the alignment. That didn't work either. I think this may be because these FASTA files are not human reference files.

Reply by zhul09, 5.4 years ago

Have you cleaned the data downloaded from SRA? It may still have adapter sequences in it. Sequence data from miRBase generally has U bases which have to be changed to T before you can do the alignments. Have you done that before creating the bowtie indexes?

Reply by GenoMax, 5.4 years ago
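The U-to-T conversion suggested above can be done before running bowtie-build. This is a sketch assuming the miRBase FASTA layout in which each header starts with the species-prefixed ID (for example ">hsa-miR-143-3p ..."); it also keeps only human ("hsa-") entries, which addresses the concern that the miRBase files are not human-only. The function name mirbase_to_dna and the output file and index names are hypothetical.

```shell
# Hypothetical helper: convert a miRBase FASTA (RNA alphabet) read from stdin into
# a DNA-alphabet FASTA for bowtie-build, keeping only one species.
# Assumes each header starts with the species-prefixed ID, e.g. ">hsa-miR-143-3p ...".
# $1: ID prefix to keep (default "hsa-", i.e. human).
mirbase_to_dna() {
    prefix="${1:-hsa-}"
    awk -v p=">$prefix" '
        /^>/  { keep = (index($0, p) == 1) }   # header: keep record only for wanted species
        keep && /^>/ { print; next }           # print kept headers unchanged
        keep  { gsub(/U/, "T"); gsub(/u/, "t"); print }   # sequence lines: U -> T
    '
}

# Example usage (output and index names are assumptions):
#   mirbase_to_dna hsa- < mature.fa > mature_hsa_dna.fa
#   bowtie-build mature_hsa_dna.fa mature_hsa
```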

Hi GenoMax, thanks for your help. Do you know how to find the adapter sequences? I took some from the FastQC report, trimmed the reads with Trimmomatic, and aligned the trimmed reads to the genome. Here is the report:

# reads processed: 5131255
# reads with at least one reported alignment: 198985 (3.88%)
# reads that failed to align: 4932270 (96.12%)
Reported 198985 alignments

The alignment rate increased, but it was still too low.

Reply by zhul09, 5.4 years ago; updated by ATpoint.
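To check whether trimming actually removed the adapter, one can count the reads that still contain its beginning. This sketch assumes 4-line FASTQ records and uses the first 12 nt of TGGAATTCTCGGGTGCCAAGG, the adapter reported in the answer further down this thread; the 12-nt prefix length and the name adapter_fraction are arbitrary, hypothetical choices. Reads that end in only a short fragment of the adapter are not counted, so this underestimates adapter content compared with FastQC.

```shell
# Hypothetical helper: fraction of reads in a FASTQ (stdin) that contain a given
# adapter prefix. Assumes 4-line FASTQ records. The default is the first 12 nt of
# TGGAATTCTCGGGTGCCAAGG; the 12-nt length is an arbitrary choice for this check.
adapter_fraction() {
    adapter="${1:-TGGAATTCTCGG}"
    awk -v a="$adapter" '
        NR % 4 == 2 { n++; if (index($0, a) > 0) hit++ }   # sequence lines only
        END { if (n == 0) print "0.0000"; else printf "%.4f\n", hit / n }
    '
}

# Example usage (the trimmed file name is an assumption):
#   adapter_fraction < SRR6761159.fastq
#   adapter_fraction < SRR6761159.trimmed.fastq
```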

You know that a .gtf file is not a genome, right?

Reply by swbarnes2, 5.4 years ago

Thank you, that was a typo and I have corrected it. The index files are in .ht2 format.

Reply by zhul09, 5.4 years ago

6631500 (99.81%) aligned 0 times

99% aligned zero times (not aligned at all)? How sure are you about the data?

Reply by lakhujanivijay, 5.4 years ago

That output comes from HISAT2; it seems these reads did not map to the genome at all. I am not sure about the data. The files were downloaded via GEO.

Reply by zhul09, 5.4 years ago

Which Hisat2 version are you using?

Reply by Kristoffer Vitting-Seerup, 5.4 years ago

HISAT2 version 2.1.0 by Daehwan Kim (infphilo@gmail.com, www.ccb.jhu.edu/people/infphilo). Operating system: Manjaro Linux 64-bit. HISAT2 works fine when I process another mouse RNA-seq dataset with mouse index files.

Reply by zhul09, 5.4 years ago

That sounds really strange. I am starting to suspect a mix-up with the samples. Could you pick a couple of random reads and BLAST their first 30-40 nucleotides (with the online tool), just to rule out a sample mix-up?

Reply by Kristoffer Vitting-Seerup, 5.4 years ago
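Picking reads for the BLAST check suggested above can be scripted with reservoir sampling, so the reads are not just the first ones in the file. This is a sketch assuming 4-line FASTQ records; sample_reads_for_blast and its defaults (5 reads, 35 nt, seed 42) are hypothetical choices. Different awk implementations (gawk, mawk) may pick different reads for the same seed.

```shell
# Hypothetical helper: reservoir-sample K reads from a FASTQ on stdin and print the
# first LEN nucleotides of each as FASTA, ready to paste into BLAST.
# Assumes 4-line FASTQ records. Defaults (K=5, LEN=35, SEED=42) are arbitrary.
sample_reads_for_blast() {
    k="${1:-5}"; len="${2:-35}"; seed="${3:-42}"
    awk -v k="$k" -v len="$len" -v seed="$seed" '
        BEGIN { srand(seed); k += 0; len += 0 }
        NR % 4 == 1 { id = substr($1, 2) }        # header line: read ID without "@"
        NR % 4 == 2 {
            n++
            rec = ">" id "\n" substr($0, 1, len)  # truncated read as a FASTA record
            if (n <= k) res[n] = rec              # fill the reservoir with the first K reads
            else {                                # then keep read n with probability K/n
                j = int(rand() * n) + 1
                if (j <= k) res[j] = rec
            }
        }
        END { if (n + 0 < k) k = n + 0; for (i = 1; i <= k; i++) print res[i] }
    '
}

# Example usage:
#   sample_reads_for_blast 5 35 < SRR6761159.fastq
```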

Have you tried using a miRNA reference, i.e. a reference containing only the miRNA sequences? You can get this data from the miRBase download page: http://www.mirbase.org/ftp.shtml

Reply by blur, 5.4 years ago

score 2 · Answer 1 · 2018-11-27

Hi zhul09,

the adapter sequence you are looking for is TGGAATTCTCGGGTGCCAAGG. If you run FastQC before and after trimming, you'll see that the adapter content disappears. I quickly downloaded one of the files of this dataset (SRR6761198) and can confirm that aligning against the combined mature and hairpin files from miRBase is problematic:

$ bowtie -S mirbase.fa SRR6761198-trimmed.fastq out.sam
# reads processed: 3790622
# reads with at least one reported alignment: 16997 (0.45%)
# reads that failed to align: 3773625 (99.55%)
Reported 16997 alignments

whereas against the full hg38:

$ bowtie --threads 16 -S hg38 SRR6761198-trimmed.fastq out.sam
# reads processed: 3790622
# reads with at least one reported alignment: 1632354 (43.06%)
# reads that failed to align: 2158268 (56.94%)
Reported 1632354 alignments

I only had a brief look at the methods, and this seems to be a plasma sample, so circulating small RNAs. This is not my expertise at all, but maybe this alignment result is expected, and the mere presence of miRs in this sample is already informative in terms of biomarkers; that is something you have to know or discuss with your supervisor. I could imagine (just thinking aloud) that upon stroke, necrotic cells release brain-specific miRs that one would normally not find in the blood. If that were true, finding some of them among the aligned reads from a blood plasma sample (even if only around 100 reads, or 0.01%) could already be valuable information. To increase your confidence, check whether the miRs that the paper describes as potential biomarkers are among the reads that align to miRBase. Hope this helps!
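Looking for the paper's candidate biomarkers among the miRBase-aligned reads can be sketched as a per-reference tally over a SAM file such as the out.sam produced above. This assumes one reported alignment per read (Bowtie 1's default); with -k or -a, a read is counted once per reported alignment. count_alignments_per_ref is a hypothetical helper name.

```shell
# Hypothetical helper: count alignments per reference sequence in a SAM file
# (e.g. Bowtie 1 "-S" output against a miRBase-derived index), most frequent first.
# Header lines ("@...") and unmapped records (FLAG bit 0x4) are skipped.
count_alignments_per_ref() {
    awk -F '\t' '
        /^@/ { next }                            # SAM header
        {
            flag = $2 + 0
            if (int(flag / 4) % 2 == 1) next     # bit 0x4 set: read is unmapped
            count[$3]++                          # column 3 = reference name (RNAME)
        }
        END { for (ref in count) print ref "\t" count[ref] }
    ' "$@" | sort -k2,2nr
}

# Example usage, looking for the three candidate biomarkers named in the paper:
#   count_alignments_per_ref out.sam | grep -E 'hsa-miR-(125a-5p|125b-5p|143-3p)'
```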

score 0 · Answer 2 · 2018-11-21

rbagnall, 5.4 years ago

I think your index file is incorrect. From the HISAT2 manual: -x <hisat2-idx> is "the basename of the index for the reference genome."

You need to index the genome with the hisat2-build command first. From the manual: hisat2-build builds a HISAT2 index from a set of DNA sequences.


Thank you, that was a typo. The index files (.ht2 format) were downloaded from "ftp://ftp.ccb.jhu.edu/pub/infphilo/hisat2/data/grch38_tran.tar.gz". Since they were already in .ht2 format, I didn't rebuild the index with hisat2-build.

Reply by zhul09, 5.4 years ago
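Because -x expects the index basename rather than a file name, a quick check that the basename resolves to every index file can rule out path problems like the one discussed in this answer. This is a sketch assuming the standard layout <basename>.1.ht2 to <basename>.8.ht2, matching the genome_tran.[1-8].ht2 files in the question; large indexes with the .ht2l extension are not handled. check_ht2_index is a hypothetical helper name.

```shell
# Hypothetical helper: verify that a HISAT2 index basename (the value given to -x)
# resolves to all eight non-empty files <basename>.1.ht2 ... <basename>.8.ht2.
# Large indexes (.ht2l) are not handled by this sketch.
check_ht2_index() {
    base="$1"
    missing=0
    for i in 1 2 3 4 5 6 7 8; do
        if [ ! -s "$base.$i.ht2" ]; then
            echo "missing or empty: $base.$i.ht2" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example usage with the paths from the question:
#   check_ht2_index ./indexes/genome_tran &&
#       hisat2 -p 4 --dta -x ./indexes/genome_tran -U ./samples/SRR6761159.fastq -S ./temp/SRR6761159.sam
```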