high q value of RNA-seq(By Hisat2-Stringtie-Ballgown)
0
0
Entering edit mode
5.1 years ago
245712608 • 0

Hello! I am a novice in RNA-seq analysis. Recently, I followed the protocol "Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown" to perform RNA-seq analysis, but got gene lists with their q values higher than 0.1. I was confused, cuz my mentor already did that once and got gene lists with acceptable q values. Then I went to check the sam file produced by Hisat2, and found that the mapQ values are too uniform (0,1,60 only). So I was wondering if I got something wrong with my Hisat2 analysis part.

The following is my bash code for this analysis. Hope you could provide me some valuable advices! Very appreciate that!

  1. export PATH=$HOME/bin:$PATH

  2. hisat2 -p 4 --dta -x mm10/genome -U WTBMCold0h1_1.fq.gz -S WTBMCold0h1_1.sam

  3. samtools view -h -q 15 WTBMCold0h1_1.sam -o 1WTBMCold0h1_1.sam

    After deleting the WTBMCold0h1_1.sam file I change the name of 1WTBMCold0h1_1.sam into WTBMCold0h1_1.sam*

  4. samtools sort -@ 4 -o WTBMCold0h1_1.bam WTBMCold0h1_1.sam

  5. sh stringtie1.sh

    stringtie1.sh:

    for i in {0h1,0h4,4h4,5d1,5d4,12h1,12h4};do
        stringtie -p 4 -G genes.gtf -o WTBMCold${i}_1.gtf -l WTBMCold${i}_1 WTBMCold${i}_1.bam
        rm WTBMCold${i}_1.bam
    done
    
  6. stringtie --merge -p 8 -G genes.gtf -o stringtie_merged.gtf mergelist.txt

  7. sh stringtie2.sh

    stringtie2.sh:

    for i in {0h1,0h4,4h1,4h4,5d1,5d4,12h1,12h4};do
        stringtie -e -B -p 4 -G stringtie_merged.gtf -o ballgown/WTBMCold${i}_1/WTBMCold${i}_1.gtf WTBMCold${i}_1.bam
    done
    
  8. R analysis:

    library(ballgown)
    
    library(RSkittleBrewer)
    
    library(genefilter)
    
    library(dplyr)
    
    library(devtools)
    
    pheno_data_file <- "pheno_list.txt"
    
    pheno_data <- read.table(pheno_data_file, header=TRUE, colClasses = rep("character", 2))
    
    pheno_data = pheno_data[order(pheno_data$ids),]
    
    bg <- ballgown(dataDir = "ballgown", samplePattern="WTBMCold", pData=pheno_data)
    
    bg_filt = subset(bg,"rowVars(texpr(bg)) >1",genomesubset=TRUE)
    
    results_transcripts = stattest(bg_filt, feature="transcript",covariate="timepoint", getFC=TRUE, meas="FPKM")
    
    results_genes = stattest(bg_filt, feature="gene", covariate="timepoint",  getFC=TRUE, meas="FPKM")
    
    results_transcripts = data.frame(geneNames=ballgown::geneNames(bg_filt), geneIDs=ballgown::geneIDs(bg_filt), results_transcripts)
    
    results_transcripts = arrange(results_transcripts,qval)
    
    results_genes = arrange(results_genes,qval)
    
    write.csv(results_transcripts, "transcript_results.csv", row.names=FALSE)
    
    write.csv(results_genes, "gene_results.csv", row.names=FALSE)
    
    subset(results_transcripts,results_transcripts$qval<0.05)
    
    subset(results_genes,results_genes$qval<0.05)
    
RNA-Seq • 1.9k views
ADD COMMENT
0
Entering edit mode

How many replicates do you have? I think ballgown is recommended for more than 5 replicates, why not try deseq2?

ADD REPLY
0
Entering edit mode

Ok I will give it a try. Thanks buffo.

ADD REPLY
0
Entering edit mode

I have edited your post to make it more readable. Numbered lists are created by using a number, then a period, then a space. You were missing the space. Short sections of code can be formatted with back-ticks, and longer sections by starting lines with four spaces (or 8 in a numbered list) or using the code format button (the one labelled 101010).

ADD REPLY
0
Entering edit mode

I wouldn't worry about the mapQ numbers, I each aligner sets these differently and I think that HiSat only ever uses those three values.

ADD REPLY
0
Entering edit mode

Thanks a lot. I have removed the —dta parameters and try it once.Hope I can get desirable results.

ADD REPLY

Login before adding your answer.

Traffic: 1556 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6