Normalizing ChIP-seq and ATAC-seq data?
1
1
Entering edit mode
5.2 years ago
star ▴ 350

I have some Single end and Paired end ChIP-seq and ATAC-seq data. I have done aligning using Bowtie2 and ignored duplicated reads using Picard, then performed peak calling on Bam file using MACS2 (I just need peaks and I will not continue with Differential analysis).

I would like to normalize data based on sequence depth and I found I can do it using bamCoverage based on bam file and using cqn packages based on number of reads for each peaks. I prefer to do on bam file using bamCoverage but its out put is bigwig/bedgraph that is not suitable for peak calling by MACS2.

I would like to know is there any solution for that?

ChIP-Seq alignment normalization MACS2 • 3.2k views
ADD COMMENT
0
Entering edit mode
5.2 years ago
ATpoint 81k

You can simply subsample the BAM files to match the one with the fewest reads determined by samtools flagstat

Once you have that you can use this code, e.g. to subsample all files to 10mio reads:

#!/bin/bash
## Subsample BAM to a given number of reads:

function SubSample {
## code inspired by: http://crazyhottommy.blogspot.com/2016/05/downsampling-for-bam-files-to-certain.html
FACTOR=$(samtools idxstats $1 | cut -f3 | awk -v COUNT=$2 'BEGIN {total=0} {total += $1} END {print COUNT/total}')

if [[ $FACTOR > 1 ]]
  then 
  echo '[ERROR]: Requested number of reads exceeds total read count in' $1 '-- exiting' && exit 1
fi

sambamba view -s $FACTOR -t 2 -f bam -l 5 $1

}

export -f SubSample

ls *.bam | parallel "SubSample {} 10000000 > {.}_subsampled.bam"
ADD COMMENT
0
Entering edit mode

Thank for your reply.

For ChIp-seq that we use Control for peak calling using MACS2, I think the MACS2 does normalizing, in that case is it necessary to do normalizing on bam file before peak calling?

But for ATAC seq, can we do normalizing after peak calling?e.g. using RPM?

ADD REPLY
0
Entering edit mode

I think the MACS2 does normalizing

True

But for ATAC seq, can we do normalizing after peak calling?e.g. using RPM?

To do what, displaying the data on a browser? You said you are only interested in peak presence. Please explain.

ADD REPLY
0
Entering edit mode

Thanks, yes I got different ATAC seq data and ChIP seq data and re-analyzing them and I want to see just their peaks in browser but if I want to compare them together.Can I normalize ATAC seq peak?

ADD REPLY
0
Entering edit mode

Compare them by eye? I do not understand.

ADD REPLY

Login before adding your answer.

Traffic: 2139 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6