Biostar Beta. Not for public use.
Question: Rsamtools-Countoverlaps
1
Entering edit mode

Hi, I'm a beginner analyzing genomic data and using R and I have a doubt that I don't know how to solve. I'm using Rsamtools to analyze mRNAseq_data but once I have the reads covering each gene I would like to add the information about their chromosome, start and stop of each gene. I have downloaded the gene information from the UCSC browser as a RangeList following this command line:

library(GenomicFeatures)
txdb = makeTranscriptDbFromUCSC(genome='hg19',tablename='ensGene')

I have tried to coerce the RangeList into a data frame in order to get this "extra data" but the number of observations changes so I don't know how to extract from the RangeList the information I need of each gene.

Right now I have this data frame:

Gene          Reads

AGR37373.F06    5

AGR37273.F04    0

And I would like to have this:

Gene          Chr     Start    Stop   Reads

AGR37373.F06    1      20       150     5

AGR37273.F04    1      300      410     0

I'd be very grateful if someone could give me any advice about how to solve this problem.

Thanks!

ADD COMMENTlink 8.5 years ago Cdiez • 150 • updated 8.5 years ago Michael Dondrup 46k
Entering edit mode
0

Did you try to merge the two dataframes? read ?merge and merge on the the geneid.

ADD REPLYlink 8.5 years ago
Michael Dondrup
46k
1
Entering edit mode

My guess is you do something like:

library(GenomicFeatures)
txdb <- makeTranscriptDbFromUCSC(genome='hg19',tablename='ensGene')
aligns <- readBamGappedAlignments("my.bam")
counts <- countOverlaps(transcripts(txdb), aligns)
names(counts) <- elementMetadata(tx)[,"tx_name"]

so change the last line to and following:

elementMetadata(txdb)[,"my.counts"] <- counts
df <- as.data.frame(txdb)

and voila you have a data frame with all genomic positions and metadata.

However I would suggest you to look at htseq-count because it has multiple resolution rules for situations that a simple count does not answer. call it from R with system("htseq-count ...") and read in the resulting data.frame.

ADD COMMENTlink 8.5 years ago Ido Tamir 5.0k • updated 9 months ago RamRS 21k
0
Entering edit mode

Hi! Thanks a lot for your useful comments!

I have tried the two suggested methods but I have not had any successful result :(

  1. Yes, you are right (Ido Tamir), I was trying this command line:

library(GenomicFeatures) txdb <- makeTranscriptDbFromUCSC(genome='hg19',tablename='ensGene') aligns <- readBamGappedAlignments("my.bam") counts <- countOverlaps(transcripts(txdb), aligns) counts.dframe <- data.frame(counts, stringsAsFactors=FALSE) rownames(counts.dframe) <- names(txdb)

Following this command line I obtained the number of reads overlapping each gene:

CC209858.4_FG002 214 CC209858.4_FG003 0 CC209858.4_FG004 17 CC209859.2_FG006 0 CC209859.2_FG008 0

Now I wanted to add an extra column with the information stored in the Metadata(txdb). I tried the two suggested options suggested command lines but I got an error:

rownames(counts.dframe) <- elementMetadata(txbd)[,"tx_name"]

Error in elementMetadata(txbd)[, "tx_name"] : selecting cols: cannot subset by character when names are NULL

elementMetadata(txbd)[,"my.counts"] <- count.dframe

Error in [<-(*tmp*, , "my.counts", value = list(counts = c(1L, : replacing cols: cannot subset by character when names are NULL

  1. About trying to merge both files. The problem to merge the two files(counts and genes) using the gene ID is that I have to coerce into data.frame both files, and when I do it the number of rows of the txbd.dataframe doesn't correspond with the number of observations this file has as GRangeList (I don't get why..). Then, I can not merge the two datafrmes.

i.e:

Original files Files coerced into data frame

counts (interger[110185]) counts.dframe (110185 obs. of 1 variables) txbd (GRangesList of length 110185) txbd.dframe (392978 obs. of 8 variables)

Then, I don't know how I can access to the information store in the metaData and use it to complete the information about the reads overlapping each gene. I just thinking about to export the files and write and script in Perl to do it, but should be a way to do it in R (I guess...).

Thanks!

ADD COMMENTlink 8.5 years ago Cdiez • 150
Entering edit mode
0
elementMetadata(txbd)[,"my.counts"] <- count.dframe

should be - I did not convert the counts vector into a data.frame:

elementMetadata(txbd)[,"my.counts"] <- count.dframe$Reads

I don't think countOverlaps is appropriate. Try htseq.

ADD REPLYlink 8.5 years ago
Ido Tamir
5.0k
• updated 9 months ago
RamRS
21k
0
Entering edit mode

The number of rows doesn't have to be equal

Try:

merge(counts, meta, all=TRUE, by.x=1, by.y=1) # set by.x by.y to the column number with the Geneids

That way you will get a full outer join like in SQL with NAs inserted in rows that don't in exist in one of the files.

ADD COMMENTlink 8.5 years ago Michael Dondrup 46k • updated 9 months ago RamRS 21k

Login before adding your answer.

Similar Posts
Loading Similar Posts
Powered by the version 2.0