Error when counting reads in genes with summarizeOverlaps (Genomic-Alignments package)
2
1
Entering edit mode
9.4 years ago

Hello,

I'm following the RNA-seq workflow for differential gene expression white paper by Michael Love, Simon Anders, and Wolfgang Huber (https://www.bioconductor.org/help/course-materials/2014/BioC2014/RNA-Seq-Analysis-Lab.pdf)

This is what I'm doing:

## Use the function summarizeOverlaps to count reads in the gene
library("GenomicAlignments")
se <- summarizeOverlaps(exonsByGene, BamFileList(bamFiles), mode="Union", singleEnd=TRUE, ignore.strand=FALSE, fragments=FALSE);

However I got this error and I have not idea how to fix it:

Error in .summarizeOverlaps_BamFileList(features, reads, mode, ignore.strand = ignore.strand, :
duplicate 'names(reads)' not allowed

Can someone help please!!

Here are all the previous step before trying to create the object se

### read the table: sampleTable.csv
sampleTable <- read.csv("sampleTable.csv", header=TRUE);

### build the full path to the tophat produced bam files
bamFiles <- file.path(".", sampleTable$dirName, sampleTable$fileName);

### see the created vector with paths
bamFiles

##### Use the BamFile function from the RsamTools to se if these paths are functional
library ("Rsamtools");
seqinfo(BamFile(bamFiles[1]));

#Counting reads in genes
library("GenomicFeatures");

hse <-makeTranscriptDbFromGFF("/proj/seq/data/TAIR10_Ensembl/Annotation/Genes/genes.gtf", format="gtf")
exonsByGene <- exonsBy(hse, by="gene");

## Use the function summarizeOverlaps to count reads in the gene
library("GenomicAlignments")
se <- summarizeOverlaps(exonsByGene, BamFileList(bamFiles), mode="Union", singleEnd=TRUE, ignore.strand=FALSE, fragments=FALSE);
GenomicAlignments summarizeOverlaps rna-seq R • 6.3k views
ADD COMMENT
0
Entering edit mode

It would seem that your BAM files have overlaps in the reads that they contain.

Also, your link is broken - you might wanna update your question with the corrected link.

ADD REPLY
0
Entering edit mode

it's discouraged to cross post on mulitple forum sites (Bioc support site, seqanswers, biostars), as it makes multiple people answer your question, and also makes it difficult to track down an answer which might have been given on another site.

One of the GenomicAlignments maintainers answered here: https://support.bioconductor.org/p/62966/

ADD REPLY
0
Entering edit mode
9.4 years ago
Chris S. ▴ 320

Can you paste the output from

names(BamFileList(bamFiles))
[1] "SRR479052.bam" "SRR479053.bam" "SRR479054.bam"
any(duplicated( names(BamFileList(bamFiles)) )
FALSE

I think that warning should only be caused by duplicate BAM file names

ADD COMMENT
0
Entering edit mode
8.7 years ago
scchess ▴ 640

The error was caused by duplicate BAM file names. For example, in my script:

files <- c('/Volumes/SSHFS/Sources/QA/scripts/RNA_A1/aligned/accepted_hits.bam',
           '/Volumes/SSHFS/Sources/QA/scripts/RNA_A2/aligned/accepted_hits.bam',
           '/Volumes/SSHFS/Sources/QA/scripts/RNA_A3/aligned/accepted_hits.bam',
           '/Volumes/SSHFS/Sources/QA/scripts/RNA_B1/aligned/accepted_hits.bam',
           '/Volumes/SSHFS/Sources/QA/scripts/RNA_B2/aligned/accepted_hits.bam',
           '/Volumes/SSHFS/Sources/QA/scripts/RNA_B3/aligned/accepted_hits.bam')

# .... load the files into BAM objects ....

> names(bams)
[1] "accepted_hits.bam" "accepted_hits.bam" "accepted_hits.bam" "accepted_hits.bam" "accepted_hits.bam" "accepted_hits.bam"

Even though the files are physically different, they all have the same key. We can rename those files or we can do:

names(bams) <- c("A1.bam", "A2.bam", "A3.bam", "B1.bam", "B2.bam", "B3.bam")

This will fix the problem.

ADD COMMENT

Login before adding your answer.

Traffic: 3458 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6