How to extend ChIP-seq reads from BAM file ?
3
0
Entering edit mode
3.0 years ago
jseg • 0

Hi all,

I'm doing some ChIP-seq and I merged my reads with new data from a rerun.

While my original reads were around 86bp-long, the reads from the rerun are around 50bp-long.

Since I intend to do peak-calling, I woud like to extend my reads so that everything is the same size.

Some tools like bamCoverage from deeptools extend the reads from the BAM file, which is what I want. However, I would like extended reads also in a BAM format and not in a bigWig format...

Do you guys know any tools/approaches that could allow me to extend my reads after I've mapped them to the genome?

Kind regards.

read-extension NGS ChIP-seq • 2.2k views
ADD COMMENT
1
Entering edit mode
3.0 years ago
ATpoint 81k

Peak calling (e.g. macs2) does not use the sequenced read length but the inferred fragment length which it internally calculates. You can also give it a fragment length, e.g. based on the wetlab results (Bioanalyzer/Tapestation results). So from that perspective you do not have to do anything. It is good practice though to use the same read length within an experiment to avoid mappability bias (the shorter ones may align differently than the longer ones). As you cannot extend (so making up data) the reads you shozld trim the 86er to 50bp, repeat alignment and feed it to macs2 or any other downstream analysis. That is what I'd do.

ADD COMMENT
0
Entering edit mode

Thanks for your answer, you're right, trimming sounds good.

ADD REPLY
1
Entering edit mode
3.0 years ago
seidel 11k

What is your analysis environment? Are you sure you really need to extend them? If you think of the reads as observed events on the genome, it doesn't really matter how long they are for many purposes. Peaks can be determined by evaluating event density. Nonetheless, if you plan to use MACs to find peaks, and you're familiar with R, you can read in your BAM files from both data sets - which will result in a GRanges object for each BAM file (essentially an efficient way to manage chr,start,end), and you can then make all your "reads" the same length using the resize() function. It's pretty easy:

gr <- resize(gr, newlength)

From there you can find coverage, create bigWigs, or export the results as a BED file that can be read by MACS, etc.

ADD COMMENT
0
Entering edit mode

Thank you :) So can I export the resized GRanges to BAM format?

ADD REPLY
1
Entering edit mode

No. BAM format is for storing sequence data, aligned or unaligned. By extending the reads, you're only extending their alignment coordinates in the genome, but not actually getting the sequence. Indeed, it's common to follow the step above with gr <- trim(gr), because reasd can be extended right of the ends of the chromosomes. I'm not sure why you would need BAM format at this point, when other export formats would be more appropriate (e.g. BED), unless you need it to fool some upstream program that takes no other input format.

ADD REPLY
0
Entering edit mode

Ok thanks for your help :)

Bonus question: do you have any nice tutorial on Bioconductor/GenomicRanges ?

ADD REPLY
1
Entering edit mode
3.0 years ago
heskett ▴ 110

Bedtools slop can add bases in both or either directions to a read. very easy to use

ADD COMMENT
0
Entering edit mode

Thanks! I didn't know this, I'll try :)

ADD REPLY

Login before adding your answer.

Traffic: 2976 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6