HOWTO extract regions from GRanges object
2
0
Entering edit mode
8.3 years ago
Assa Yeroslaviz ★ 1.8k

Hi,

I have a small set of bam files from centromere regions of S. cerevisiae. Those I have extracted from the complete bam files using the samtools view option based on a list of centromere positions as shown below. I have used the centromere middle point to go forward and reverse 30kb and extract the regions.

chromosome    CEN-[30kb]    CENmitte    CEN+[30kb]
I    121500    151500    181500
II    208300    238300    268300
III    84500    114500    144500
IV    419700    449700    479700 ...

I would like to see how many reads are mapped to each of the positions in these regions. This I would like to count into 50b bins of genome regions. For that I have used the tileGenome command to create a virtual GRanges object as shown below (this is why it has no actual information yet).

bins1 <- tileGenome(seqinfo(Scerevisiae), tilewidth=500, cut.last.tile.in.chrom=TRUE)
GRanges object with 24322 ranges and 0 metadata columns:
          seqnames         ranges strand
             <Rle>      <IRanges>  <Rle>
      [1]     chrI   [   1,  500]      *
      [2]     chrI   [ 501, 1000]      *
      [3]     chrI   [1001, 1500]      *
      [4]     chrI   [1501, 2000]      *
      [5]     chrI   [2001, 2500]      *
      ...      ...            ...    ...
  [24318]     chrM [83501, 84000]      *
  [24319]     chrM [84001, 84500]      *
  [24320]     chrM [84501, 85000]      *
  [24321]     chrM [85001, 85500]      *
  [24322]     chrM [85501, 85779]      *
  -------
  seqinfo: 17 sequences (1 circular) from sacCer3 genome

using the summarizeOverlaps command I can now summarize exactly how many reads are mapped into each of my bins in the genome

bamFiles <- list.files(path = ~/centromereBamFIles.withchrIII/", pattern = ".bam$", full.names = TRUE)
olapTable <- summarizeOverlaps(bins1, bamFiles, inter.feature=FALSE)

But the problem is that I still have the complete genome in the bins1 GRanges object. But I would like it to contain only the regions I am using for the centromeres.

Hence my question, is there a way to extract multiple regions from the GRanges object? I would like to extract the centromere regions (± a few 20kb) of each chromosome. e.g. these regions:

chrI:101500-201500
chrII:188300-288300
chrIII:64500-164500

But I can't find a way to do it?

I know I can remove all the rows with a sum of 0, but with that I will also remove the rows from the centromere regions where no reads were mapped. These I need to keep in my GRanges object.. So I will try to work it out with restrict or findOverlaps before I run the summarizeOverlaps with the bam files

Is there a more direct way to do it?

Thanks
Assa

GenomicRanges R GRanges IRanges • 5.5k views
ADD COMMENT
0
Entering edit mode

Do you have centromere coordinates ? What do you mean by "I would like to extract the centromere regions (± a few 20kb) of each chromosome"

ADD REPLY
3
Entering edit mode
8.3 years ago

It should be simple with bedtools.

regions.bed

chrI    101500    201500
chrII    188300    288300
chrIII    64500    164500

Try:

bedtools makewindows -b regions.bed -w 500 | less

To get the SAF format for featureCounts:

bedtools makewindows -b regions.bed -w 500 | awk -v OFS="\t" '{print $1":"$2"-"$3,$1,$2,$3,"+"}' > regions.saf

You don't need to first generate all the bins and then subset regions. You can use the regions.saf file with featureCounts to get the count matrix.

ADD COMMENT
0
Entering edit mode

thanks. I haven't thought of doing it like that.

ADD REPLY
1
Entering edit mode
8.3 years ago
Michael 54k

It is not clear why you would like to do this. Your GRanges do not contain any additional information, so there won't be anything to extract except the bins themselves. However there is no sequence information for the centromere itself I guess. So let's assume it's a toy example. There are two ways to approach this:

  • findOverlaps: find overlaps between the GRanges and the centromeric regions, also converted to GRanges
  • restrict: restrict the GRanges to the centromeric regions, will also truncate some ranges potentially

See the documentation of those functions.

ADD COMMENT
0
Entering edit mode

sorry yes I can see that the question might be a bit unclear. i have worked it out and made it more detailed, so that it will be easier to understand.

sorry about that.

ADD REPLY

Login before adding your answer.

Traffic: 2375 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6