Question

ripseeker result interpretation help

0

Entering edit mode

4.9 years ago

bioinfo_ga ▴ 70

Hi

I am working on ripseq analysis and have used ripseeker tool for the same. I have used main function ripSeek to predict RIP for my data which gives output annotated file with the following headers . Can anyone please provide me description for the headers as i am not able to interpret the results.

seqnames    start    end    width    strand    ensembl_gene_id    external_gene_name    description    count    fpk    logOddScore    pval    pvalAdj    CTL_count    CTL_fpk    CTL_logOddScore    eFDR    peak    feature    start_position    end_position    feature_strand    insideFeature    distancetoFeature    shortestDistance    fromOverlappingOrNearest

It would be of great help.

rip-seeker • 1.4k views

ADD COMMENT • link updated 4.6 years ago by xiaocong3333 • 0 • written 4.9 years ago by bioinfo_ga ▴ 70

score 0 · Answer 1 · 2019-05-14

Using the help function ?ripSeek, you should see the following information:

The results as GRangesList generated from the RIP peak detection. Each list item represents the RIP peaks on a chromosome accompanied with statistical scores including (read) count, logOddScore, pval, pvalAdj, eFDR for the RIP and control (if available). Please refer to seekRIP for more details.

If you follow the instructions to look up ?seekRIP, you will find the following info:

The RIPScore is compupted in computeLogOdd as the log odd ratio of the posterior for the RIP state (zi = 2) over the posterior for the background state (zi = 1) in RIP library. When control is avail- able, the RIPScore is updated by the difference between the RIPScores between RIP and control. The adjacent bins with hidden states predicted by nbh_vit as the enriched state (corresponding to the NB with larger mean) are merged. The RIPSscores are averaged over the merged bins. To assess the statistical significance of the RIPScore for each region, we assume that the RIPScore follows a Gaussian (Normal) distribution with mean and standard deviation estimated using the RIPScores over all of the bins. The rationale is based on the assumption that most of the RIPScores correspond to the background state and together contribute to a stable estimate of the test statistics (TS) and p-value computed using the R built-in function pnorm. The p-value is adjusted by p.adjust with BH method by default. The same procedure is applied optionally to the control library. Only when the control is available, is an empirical false discovery rate (eFDR) estimated based on the idea of "sample swap" inspired by MACS (a ChIP-seq algorithm from Zhange el al. (2008). At each p-value, RIPSeeker finds the number of significnat RIP-regions over control (CTL) based on pvalRIP and the number of significant control regions over RIP based on pvalCTL. The eFDR is defined as the ratio of the number of "RIP" (false positive) regions iden- tified from CTL-RIP comparison over the number of RIP regions from the RIP-CTL comparison. The maximum value for eFDR is 1 and minimum value for eFDR is max(p-value, 0). The former takes care of the case where the numerator is bigger than the denominator, and the latter for zero numerator.

I'm sure, there are even more details in the RIPseeker vignette.

score 0 · Answer 2 · 2019-09-29

I'm using the ripseeker to analyze my RIPseq data. I'm studying on fly. My code as below:

------------------------------------------------------------------------------------------------library(RIPSeeker)library('biomaRt')library("curl")library("org.Dm.eg.db")outDir <- file.path(getwd(), "TEST6")# Parameters settingcNAME <- "Non-Arc-R1.sorted.bam"binSize <- NULL # set to NULL to automatically determine bin sizeminBinSize <- 11000 # min bin size in automatic bin size selectionmaxBinSize <- 12000multicore <- TRUEstrandType <- "-" # set strand type to minus strandbiomart <- "ensembl"biomaRt_dataset <- "dmelanogaster_gene_ensembl" goAnno <- "org.Dm.eg.db" # GO annotation database seekOut.PRC2 <- ripSeek(bamPath = "/home/congxiao/captain/RIPSEQ/Arc_Cop_RIP/test_test/test2/test", cNAME = cNAME, reverseComplement = TURE, strandType = strandType, paired = TRUE, uniqueHit = TRUE, assignMultihits = TRUE, rerunWithDisambiguatedMultihits = TRUE, binSize=binSize, minBinSize = minBinSize, maxBinSize = maxBinSize, biomart = biomart, biomaRt_dataset = biomaRt_dataset, outDir = outDir)

My library is a directional strand library. I used Hisat2, I tried both Dm3/BDGP6 as ref, I changed the strandType with “+”, “-”, “*”, and reverseComplement as TRUE or FALSE. There are always some problems. The main errors are: There are no more than 2 genes for annotationRIP; OR I delete the goAnno, the RIPregions_annotated.gff3 has only a few lines, it should not be like that. I set the genomeBiuld as 'dm', 'dm3', or just delete it, the final result seems no different.I’m wondering what’s wrong with my parameters.Could you give some suggestions?

Thank you!