Question

bbmap for FAIRE-seq and other line than reference

0

Entering edit mode

3.1 years ago

boczniak767 ▴ 850

Hi,

I've read the post bbmap and chip, I'm sure that bbmap is ok also for other chromatin-level applications. I'm trying to map some maize lines (DNA from FAIRE experiment) to reference line. So there are problem, that the aligner must cope with a number of indels, snp and so on. I wonder, how could I allow bbmap for this variation. Now, setting mapq=10 filter in samtools I see, that my mapping percentage is only 15-30 ! The maize line in question is somehow different from the reference.

My current code is for i in {38..39}; do ~/bin/bbmap/bbmap.sh in=../reads_bbtrim/hds${i}_clean.fastq.gz ref=Zea_mays.B73_RefGen_v4.dna.toplevel.fa outm=../bbmaped/vs${i}.bam vslow k=8 maxindel=20 ambig=random minratio=0.1 32bit showprogress=100000 -Xmx100g statsfile=../bbmaped/stats${i}vs covstats=../bbmaped/covstats${i}vs ; done

I thought, that very sensitive setting and some parameters from the cited post would be ok. So there are two problems: 1. use bbmap for FAIRE-seq reads, 2. equalize settings for maize line not-very-similar to the reference maize line.

aligner mapping sequencing • 2.1k views

ADD COMMENT • link 3.1 years ago by boczniak767 ▴ 850

0

Entering edit mode

ok, I have to stop my script, it would end in 21 days. It's too long for me.

ADD REPLY • link 3.1 years ago by boczniak767 ▴ 850

0

Entering edit mode

Before making any changes to the default you should simply adjust maxindel= as Brian had suggested in other thread. You should preferably index the maize genome separately if you are going to reuse the index again. Please use threads=N if you have more cores available that will speed things up significantly. This is a tough problem for any aligner (because of ploidy etc) so be patient.

ADD REPLY • link 3.1 years ago by GenoMax 141k

0

Entering edit mode

Thanks, I've generated index in advance and use all available cores. I'll try adjusting maxindel

ADD REPLY • link 3.1 years ago by boczniak767 ▴ 850

0

Entering edit mode

If you have generated index in advance then use path= instead of ref=. Using ref= will recreate the index each time. I don't see threads= option in your command line so I asked about that. Include it explicitly. Why do you have the word 32-bitin the command line?

ADD REPLY • link 3.1 years ago by GenoMax 141k

0

Entering edit mode

Thanks, I must misunderstood the meaning of path and ref, in fact index has been recreated each time. I don't use threads because I saw, that bbmap uses all threads by default, I'll add this option, certainly it won't hurt. Most of the time I've also not set memory usage. I've used 32bit because, as written in the documentation:

set to true if you need per base coverage over 64k

In fact I don't know if I need this, I want that coverage is calculated properly. I've read the bbmap documentation but, as you see, not all parameters are clear for me.

ADD REPLY • link 3.1 years ago by boczniak767 ▴ 850

0

Entering edit mode

Ok, maxindel=20 didn't help. Still majority of the reads have mapq<10. Or maybe I'm interpreting this value incorrectly and I should expect that when there are snps and indels?

ADD REPLY • link 3.1 years ago by boczniak767 ▴ 850

0

Entering edit mode

Still majority of the reads have mapq<10

That is because most of your reads are likely multi-mapping. Since maize has a large genome and you have short reads. You could get around that problem by using ambig=random. This will pick a random location to place a read that is otherwise multi-mapping. Decide if that makes sense in your specific use case. maxindel= parameter controls length of opening a gap when aligning two parts of a read. This will generally be determined by how far apart your spliced exons are.

ADD REPLY • link 3.1 years ago by GenoMax 141k

0

Entering edit mode

I've tried both ambig=random and default setting, which is best and I believe chooses first best match. My full code, is given below. It give mapq=3 for majority of the reads (55%). bbmap.sh in=hds38_clean.fastq.gz path=b73_2020 outm=mapped38.bam maxindel=20 ambig=random threads=24 -Xmx100g showprogress=250000. I think I must read something about mapq values meaning.

ADD REPLY • link 3.1 years ago by boczniak767 ▴ 850

0

Entering edit mode

Take a look at this blog post for mapq values. I don't remember how bbmap implements these. You can take a look at code if you are a programmer. Brian Bushnell (author of BBMap) unfortunately no longer participates on public forums but I will try to email him and ask.

ADD REPLY • link 3.1 years ago by GenoMax 141k

0

Entering edit mode

Ok, I'll look at it. I'm not a programmer. BTW, I wanted to use bbmap mainly because it seems like cohesive package, i.e. can do many operations, e.g. trimming. I also like bbwrap because it allows to use both pe and se reads at the same time - I think it is quite unique capability. bbmap has also good references CIPHER

ADD REPLY • link 3.1 years ago by boczniak767 ▴ 850

0

Entering edit mode

In fact I also have maize genome (ph207) which is closer to the line I work on (s68). Surprisingly, the mapping is even worse than for the best known reference genome (b73). I suppose it could need some experimentation with the alignment parameters. Second thing, I also have resequenced genome of my investigated line (s68). So potentially I could use it somehow for alignment of FAIRE-seq reads from that line (s68). However I don't know, how to integrate info. from resequencing and variant-calling and FAIRE-seq alignment.

ADD REPLY • link 3.1 years ago by boczniak767 ▴ 850