Biostar Beta. Not for public use.
Calculating amount of reads on certain chromosomes in bam file
1
Entering edit mode
13 months ago
caggtaagtat • 620

Hi,

I'm looking for a nice way to calculate the percentage of reads on specifique chromosomes (MT und (1-23,X,Y) und unplaced scaffold).

I know I can get the read counts per chromosome with samtools idxstats, however I have a lot of bam files and I would like to automate the calculation. The problem is, I'm struggling with basic batch text maniplulation and would therefore appreciate any help you can give me (specific or general).

Edit: I forgot to mention, that I have multiple bam files, which are indexed and only consist of uniquely mapped reads.

ADD COMMENTlink
0
Entering edit mode

Do you have some familiarity with R or python?

ADD REPLYlink
0
Entering edit mode

Yes, I always work with R

ADD REPLYlink
1
Entering edit mode

So, can you update your question with where, specifically, you are getting stuck with processing many files with samtools idxstats? Ideally, post any code you have tried.

ADD REPLYlink
0
Entering edit mode

I will try to do it with R now. I just thought, there might be a short command in bash :)

ADD REPLYlink
1
Entering edit mode

You can certainly loop over your files in bash with something like (untested):

for i in `ls *bai`; do j=`echo $i | sed s/\.bai//`; samtools idxstats $i > $j.idxstats; done

I suspect you will still want to "analyze" the data, so R will likely be involved at some point.

ADD REPLYlink
2
Entering edit mode
14 months ago
National Institutes of Health, Bethesda…

General workflow you could try given that you use R.

  1. Look at using dir() to get a list of index files.
  2. Look at using sprintf() to create a command string to run samtools idxstats given a filename (which you have from #1), piping the results to a unique file for each sample.
  3. Look at using system() to run the command string from 2.
  4. Look at using a for loop loop to run #3 over your list of index files from #1.
  5. Define a function to read a single idxstats output file.
  6. Loop using sapply or lapply over all the idxstat output files
  7. Combine output of 6 into a single matrix, with samples in columns and chromosomes in rows.
ADD COMMENTlink
0
Entering edit mode

Thank you, I will do that

ADD REPLYlink
3
Entering edit mode
13 months ago
Belgium

I'd suggest to also having a look at mosdepth and indexcov.

ADD COMMENTlink
1
Entering edit mode

You have the same link for both programs mentioned.

ADD REPLYlink
0
Entering edit mode

Good catch, fixed it. Thanks!

ADD REPLYlink

Login before adding your answer.

Similar Posts
Loading Similar Posts
Powered by the version 2.1