Converting .narrowPeak file to GRanges for use in Bioconductor ChIPseeker
2
1
Entering edit mode
8.5 years ago
dally ▴ 210

I have a .narrowPeak file of Pol IIthat has called peaks using MACS2 and has annotated the peaks using HOMER. I am attempting to visualize the peaks using the covplot command in the Bioconductor package "ChIPseeker". I've looked at the user manual and seen that the 'readpeakfile' command is what I'm looking for (I think) but i've been unable to get it to work. Could anyone help me on how the command line would look so that I can accomplish what I'm trying to do?

R ChIP-Seq ChIPseeker • 11k views
ADD COMMENT
0
Entering edit mode

Could you give examples of how you've attempted to get it to work? i.e. post the specific series of commands you attempted (avoiding unwanted personal information).
From the manual, it looks like you can either import a peak file using readPeakFile (I suspect you need to specify as="GRanges"), or you can directly specify the peak file when invoking the covplot command.

Are you sure that your path to the peak file is correct?

The tricky bit is gettting the syntax right for specifying peakfile (as seen on page 24 in the readPeakFile section).

ADD REPLY
0
Entering edit mode
require(ChIPseeker)
require(TxDb.Hsapiens.UCSC.hg19.knownGene)

txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
covplot(home/carlos/data/Pol-II-Chip.MACS2_peaks.narrowPeak, weightCol ="score", xlab = "Chromosome Size (bp)", ylab =" ", title = "Pol-II Peaks over Chromosomes", chrs = NULL, xlim = NULL)

So I'm pretty sure my problem is that I need to specify the narrowPeak file as an object which I am unsure how to do.

ADD REPLY
0
Entering edit mode

And what's the error that you get?

It looks like the parameters you are setting are correct (and in fact, are mostly the defaults, so you don't even need to set them). I'm wondering if it's the way you are specifying the path to the file is correct. In the manual, the examples use the "system.file" command, maybe try that and see what the exact format of the file path that is saved.

You might just need quotes around your file path?

Edit: I think that if you specified the path to your .narrowPeak file using file.path, that should work. Could be wrong.

system.file (as used in the ChIPseeker examples) seems to just return a character vector of the path.

ADD REPLY
0
Entering edit mode

Error is:

Error in covplot("home/carlos/data/Pol-II-Chip.MACS2_peaks.narrowPeak", : peak should be a GRanges object or a peak file ...

I'm pretty sure it's the way I'm specifying the path file ... i've tried setting

peak <- system.file("home/data/Pol-II-Chip.MACS2_peaks.narrowPeak")

but that doesn't seem to be working either so I'm probably using it wrong.

EDIT: Tried using path.file("home/data/Pol-II-Chip.MACS2_peaks.narrowPeak") and then running the script again but received the same error except now it says:

Error in covplot(peak, weightCol = "score", xlab = "Chromosome size (bp)", : peak should be a GRanges object or a peak file ...
ADD REPLY
0
Entering edit mode

I actually got it to work, but now whenever I attempt

covplotpeakfile.gr, weightCol ="score", xlab = "Chromosome Size (bp)", ylab =" ", title = "Pol-II Peaks over Chromosomes", chrs = NULL, xlim = NULL)

I get the following:

Error in .noramrg_shift_or_weight(weight, "weight", x) : 'weight' must be a numeric vector, a single string, or a list-like object
ADD REPLY
0
Entering edit mode

I suspect that the weightCol parameter is supposed to specify the name of your GRanges metadata column, or the column name of your file.

1) Do you have a column in your file with the title "score"?

2) Have you tried importing your MACS data as a GenomicRanges, then running covplot with that object?

ADD REPLY
0
Entering edit mode
  1. I have no column in my file with the title "score", but I do have one in my annotated .txt file with a column "Score" but for some reason the readPeakFile to convert to GRanges doesn't seem to work on .txt files.
  2. I am running the covplot with only peakfile.gr) so lets see what happens. I will edit this once it's finished running.

EDIT: I ran the covplot data but without a weightCol all the peaks were of identical heights.

ADD REPLY
0
Entering edit mode

Probably the simplest thing (in the short run) is to read in your .narrowPeak file as a data.frame, then construct a GenomicRanges object from it, with the score column specified as metadata. Then, run covplot on your GRanges, and specify the weight column by name.

ADD REPLY
0
Entering edit mode

can't seem to construct the Genomic Ranges Object. I'm pretty bad at syntax in general.

I'm trying to follow this guide.

ADD REPLY
1
Entering edit mode

As a larf, imported a bed file (tab-delimited) with some additional columns into a data.frame.

My recommendation is to put a header line into your file, then import a data.frame as follows:

Because my bed file had headers, I put header=TRUE

peaks<- read.table("midEnriched.Full.txt",header=TRUE)

This makes a data.frame, where each column (variable) has a different name, which can be seen here when I type

> str(peaks)
'data.frame':   167 obs. of  8 variables:
 $ chrom : Factor w/ 5 levels "chr2L","chr2R",..: 5 3 3 5 3 3 1 4 3 4 ...
 $ left  : int  2690162 40353 26610 8190040 33472 1355033 18370795 8896863 12663057 8893104 ...
 $ right : int  2690879 40727 27041 8190186 33774 1355708 18371139 8897318 12663550 8893766 ...
 $ logFC : num  -11.9 -9.84 -9.7 -9.54 -9.24 ...
 $ logCPM: num  7.3 5.26 5.12 4.96 4.67 ...
 $ PValue: num  6.76e-21 1.11e-12 5.78e-12 2.06e-11 2.80e-09 ...
 $ FDR   : num  1.35e-16 7.42e-09 2.89e-08 8.23e-08 6.23e-06 ...
 $ peakID: Factor w/ 167 levels "sorted_68hr_sim_FAIRE_POOLED_MACS2_peak_10041",..: 45 112 108 53 110 120 73 12 143 11 ...

To make a GRanges, I typed

myRanges<- GRanges(seqnames=peaks$chrom,ranges=IRanges(peaks$left,peaks$right),strand=NULL,peaks$logFC,peaks$logCPM,peaks$PValue,peaks$FDR,peaks$peakID)

Where I used the column name from each data frame variable (e.g. chromosomes are peaks$chrom, because that's what my column header name was). This generated a GRanges object, This is what it looks like:

> myRanges
GRanges object with 167 ranges and 5 metadata columns:
        seqnames               ranges strand   | peaks.logFC peaks.logCPM peaks.PValue  peaks.FDR
           <Rle>            <IRanges>  <Rle>   |   <numeric>    <numeric>    <numeric>  <numeric>
    [1]     chrX   [2690162, 2690879]      *   |  -11.902233     7.301598     6.76e-21   1.35e-16
    [2]    chr3L   [  40353,   40727]      *   |   -9.844267     5.255242     1.11e-12   7.42e-09
    [3]    chr3L   [  26610,   27041]      *   |   -9.702037     5.122502     5.78e-12   2.89e-08
    [4]     chrX   [8190040, 8190186]      *   |   -9.541360     4.958998     2.06e-11   8.23e-08
    [5]    chr3L   [  33472,   33774]      *   |   -9.239363     4.673488     2.80e-09   6.23e-06
    ...      ...                  ...    ... ...         ...          ...          ...        ...
  [163]    chr2R [20778828, 20779622]      *   |   -1.626363     6.466925  0.002774428 0.04910492
  [164]    chr2L [ 8842609,  8843061]      *   |   -1.802049     5.857315  0.002782288 0.04920049
  [165]    chr3R [ 6318490,  6318881]      *   |   -1.844687     5.630014  0.002795912 0.04931061
  [166]     chrX [ 8530879,  8531472]      *   |   -1.685482     6.171483  0.002822427 0.04969062
  [167]    chr2R [15682073, 15682610]      *   |   -2.013013     5.481337  0.002844105 0.04997572
                                         peaks.peakID
                                             <factor>
    [1] sorted_68hr_sim_FAIRE_POOLED_MACS2_peak_13262
    [2]  sorted_68hr_sim_FAIRE_POOLED_MACS2_peak_5843
    [3]  sorted_68hr_sim_FAIRE_POOLED_MACS2_peak_5838
    [4] sorted_68hr_sim_FAIRE_POOLED_MACS2_peak_13905
    [5]  sorted_68hr_sim_FAIRE_POOLED_MACS2_peak_5840
    ...                                           ...
  [163]  sorted_68hr_sim_FAIRE_POOLED_MACS2_peak_5747
  [164]  sorted_68hr_sim_FAIRE_POOLED_MACS2_peak_1345
  [165] sorted_68hr_sim_FAIRE_POOLED_MACS2_peak_10066
  [166] sorted_68hr_sim_FAIRE_POOLED_MACS2_peak_13947
  [167]  sorted_68hr_sim_FAIRE_POOLED_MACS2_peak_5020
  -------
  seqinfo: 5 sequences from an unspecified genome; no seqlengths
ADD REPLY
5
Entering edit mode
6.8 years ago
zhangly811 ▴ 50

Install this package:

source("https://bioconductor.org/biocLite.R")
biocLite("rtracklayer")
library(rtracklayer)

And then load narrowPeak and the import command will do the conversion:

file_narrowPeak = "/Users/xxx/mypeak.narrowPeak" 
extraCols_narrowPeak <- c(signalValue = "numeric", pValue = "numeric",
                          qValue = "numeric", peak = "integer")

gr_narrowPeak <- import(file_narrowPeak, format = "BED",
                        extraCols = extraCols_narrowPeak)

The solution is adopted from https://charlesjb.github.io/How_to_import_narrowPeak/

ADD COMMENT
0
Entering edit mode

BiocManager::install("rtracklayer") install.packages("remotes") remotes::install_github("CharlesJB/GenomicFormatExamples")

library(rtracklayer) library(GenomicFormatExamples)

file_narrowPeak = "‪C:/Users/DCL/Documents/file_narrowPeak.narrowPeak" extraCols_narrowPeak <- c(signalValue = "numeric", pValue = "numeric", qValue = "numeric", peak = "integer") gr_narrowPeak <- import(file_narrowPeak, format = "bed", extraCols = extraCols_narrowPeak)

I am getting an error that says:

Error in open.connection(con, open) : cannot open the connection In addition: Warning message: In open.connection(con, open) : cannot open file '': Invalid argument

ADD REPLY
0
Entering edit mode
8.5 years ago
Guangchuang Yu ★ 2.6k

If you looking at the source code from covplot:

    if (is(peak, "GRanges")) {
        peak.gr <- peak
    }
    else if (file.exists(peak)) {
        peak.gr <- readPeakFile(peak, as = "GRanges")
    }
    else {
        stop("peak should be a GRanges object or a peak file...")
    }

You will found that the file you passed is not correct, so that it reach 'stop' to throw error.

Make sure you are using the correct path and correct file name.

ADD COMMENT

Login before adding your answer.

Traffic: 2234 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6