Biostar Beta. Not for public use.
Normalizing ChIP-seq and ATAC-seq data?
0
Entering edit mode
13 months ago
star • 150
Netherlands

I have some Single end and Paired end ChIP-seq and ATAC-seq data. I have done aligning using Bowtie2 and ignored duplicated reads using Picard, then performed peak calling on Bam file using MACS2 (I just need peaks and I will not continue with Differential analysis).

I would like to normalize data based on sequence depth and I found I can do it using bamCoverage based on bam file and using cqn packages based on number of reads for each peaks. I prefer to do on bam file using bamCoverage but its out put is bigwig/bedgraph that is not suitable for peak calling by MACS2.

I would like to know is there any solution for that?

ADD COMMENTlink
0
Entering edit mode
10 weeks ago
ATpoint 17k
Germany

You can simply subsample the BAM files to match the one with the fewest reads determined by samtools flagstat

Once you have that you can use this code, e.g. to subsample all files to 10mio reads:

#!/bin/bash
## Subsample BAM to a given number of reads:

function SubSample {
## code inspired by: http://crazyhottommy.blogspot.com/2016/05/downsampling-for-bam-files-to-certain.html
FACTOR=$(samtools idxstats $1 | cut -f3 | awk -v COUNT=$2 'BEGIN {total=0} {total += $1} END {print COUNT/total}')

if [[ $FACTOR > 1 ]]
  then 
  echo '[ERROR]: Requested number of reads exceeds total read count in' $1 '-- exiting' && exit 1
fi

sambamba view -s $FACTOR -t 2 -f bam -l 5 $1

}

export -f SubSample

ls *.bam | parallel "SubSample {} 10000000 > {.}_subsampled.bam"
ADD COMMENTlink
0
Entering edit mode

Thank for your reply.

For ChIp-seq that we use Control for peak calling using MACS2, I think the MACS2 does normalizing, in that case is it necessary to do normalizing on bam file before peak calling?

But for ATAC seq, can we do normalizing after peak calling?e.g. using RPM?

ADD REPLYlink
0
Entering edit mode

I think the MACS2 does normalizing

True

But for ATAC seq, can we do normalizing after peak calling?e.g. using RPM?

To do what, displaying the data on a browser? You said you are only interested in peak presence. Please explain.

ADD REPLYlink
0
Entering edit mode

Thanks, yes I got different ATAC seq data and ChIP seq data and re-analyzing them and I want to see just their peaks in browser but if I want to compare them together.Can I normalize ATAC seq peak?

ADD REPLYlink
0
Entering edit mode

Compare them by eye? I do not understand.

ADD REPLYlink

Login before adding your answer.

Similar Posts
Loading Similar Posts
Powered by the version 2.1