R: extract interest nucleotide from BAM files (BS-Seq)
1
3
Entering edit mode
8.6 years ago
Shicheng Guo ★ 9.4k

Hi colleagues,

I want to extract the nucleotide (C or T) for each reads from BAM files (BS-Seq) in some interesting positions (CpG sites). Do you know any R script can do it with well speed?

As many colleagues mentioned that why I insist to use R. Sure, I can use perl and samtools to extract these informations, but It would be complicated to use many programming language. I know maybe be R would be slow to deal with this problem, but anyway, it is in the one platform and it would be easy for the user. Acutally, I have finished the Perl script to do it. I just want to transfer the code to R.

library("Rsamtools")
library("GenomicRanges")
library("GenomicAlignments")

setwd("/home/shg047/bam")
# quickBamFlagSummary("Indx16_S12.sorted.clipped.bam", main.groups.only=TRUE)
flag1 <- scanBamFlag(isFirstMateRead=NA, isSecondMateRead=NA,isDuplicate=FALSE, isNotPassingQualityControls=FALSE)
param1<-ScanBamParam(flag=flag1, what="seq")
gal1 <- readGAlignments("Indx16_S12.sorted.clipped.bam", use.names=TRUE, param=param1)
seq<-mcols(gal1)$seq
is_on_minus <- as.logical(strand(gal1) == "-")
seq[is_on_minus] <- reverseComplement(seq[is_on_minus])

bf <- BamFile("Indx16_S12.sorted.clipped.bam", asMates=F)
paln <- readGAlignmentsList(bf)
j <- junctions(paln, with.revmap=TRUE)
roi <- GRanges("chr1", IRanges(10056, width=500))
findOverlaps(paln,roi)
j_overlap <- paln[paln %over% roi]
paln[j_overlap$revmap[[1]]]

Best regards

alignment • 2.5k views
ADD COMMENT
1
Entering edit mode

Do you absolutely have to do this with R? It's great for some things, but terrible for things like this.

ADD REPLY
0
Entering edit mode

R is intended for statistical data analysis. You can use samtools mpileup to get the base calls for a base position

ADD REPLY
0
Entering edit mode
8.6 years ago
glihm ▴ 660

Hi Shicheng Guo,

like posted by Devon Ryan, you have lot's of opportunities in order to do that. The thing is: "What orders are?" :D

  1. You have to do it with R -> This package from bioconductor will be helpful!
  2. You want to try an other solution, perhaps you have some knowledge with Python -> pysam is your man.
  3. With other languages, you can use SamTools in order to extract sequences and then process them with bash scripts for example or languages like C -> samtools.

Depending on the R bioconductor package performance, other programming languages may be faster and easier to extend/develop than R scripts.

However, if you are a pro R, enjoy it with Rsamtools package!

ADD COMMENT
1
Entering edit mode

More specifically, if R isn't needed then PileOMeth might be the simplest tool.

ADD REPLY
0
Entering edit mode

I did not know this one, thank you!

ADD REPLY

Login before adding your answer.

Traffic: 1909 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6