I am analyzing ChIP-Seq data from Plasmodium falciparum, whose genome is well known for its extreme base composition (~80% AT, ~20% GC). The reads are 75 bp paired-end and were mapped to the genome using Bowtie (1 and 2) and BWA. I am getting a very low alignment rate, under 10%, for the sample of interest. According to the FastQC report the data quality seems good, although it flags a few other modules, such as sequence duplication and GC content, which I suppose is expected for this AT-biased genome.
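FastQC's per-sequence GC module warns whenever the observed distribution departs from a modelled normal curve, so a warning by itself is expected for a ~20% GC genome. The shape of the distribution is still informative: reads from P. falciparum should cluster near 20% GC, whereas reads from a bacterium such as E. coli (about 50% GC genome-wide) would form a second peak near 50%. The sketch below computes per-read GC from a FASTQ file; the file name in the usage line and the 0.30 "AT-rich" cutoff are hypothetical choices, not values from this thread.

```python
import gzip
from statistics import mean


def gc_fraction(seq: str) -> float:
    """Fraction of G/C among called bases (A/C/G/T); N and other codes are ignored."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / called


def read_gc_values(path: str):
    """Yield the GC fraction of every read in a plain or gzipped FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # the sequence is the 2nd line of each 4-line record
                yield gc_fraction(line.strip())


def summarize(gc_values, at_rich_cutoff=0.30):
    """Return (mean GC, fraction of reads below a hypothetical AT-rich cutoff)."""
    values = list(gc_values)
    if not values:
        return 0.0, 0.0
    low = sum(1 for v in values if v < at_rich_cutoff)
    return mean(values), low / len(values)
```

Usage (hypothetical file name): `mean_gc, at_rich = summarize(read_gc_values("sample_R1.fastq.gz"))`. A low `at_rich` fraction for the ChIP sample compared with the control would suggest that most reads do not come from the parasite.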
I tried BLASTing the reads, and the majority of the queries match E. coli and many other bacterial sequences, even though the postdoc who performed the assay says no plasmids were ever used in the pipeline!
I welcome any suggestions on how else we could improve the alignments or troubleshoot this.
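To turn a BLAST run into a number rather than an impression, one option is to keep only the best hit per read and count how many best hits fall on each organism. The sketch below assumes BLAST+ tabular output with the default 12 columns of `-outfmt 6`; the `label_of` function that maps a subject ID to an organism is a hypothetical helper you would supply, for example from a lookup table of accessions.

```python
import csv
from collections import Counter

# Default column order of BLAST+ "-outfmt 6".
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(lines):
    """Return {query: subject}, keeping the highest-bitscore hit for each query."""
    best = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):  # skip blanks and comment lines
            continue
        rec = dict(zip(FIELDS, row))
        query, score = rec["qseqid"], float(rec["bitscore"])
        if query not in best or score > best[query][1]:
            best[query] = (rec["sseqid"], score)
    return {query: subject for query, (subject, _) in best.items()}


def tally_subjects(hits, label_of):
    """Count best hits per organism; label_of maps a subject ID to a label."""
    return Counter(label_of(subject) for subject in hits.values())
```

Counting only the best hit per read avoids inflating the bacterial share when a single read matches many related bacterial entries.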
PS: The control sample's alignment rate was 50%, compared with under 10% for the treated one. The samples are blood cells infected with P. falciparum, so there should be no other source of genomic contamination either. This is the second time we have run the ChIP-Seq; last time the alignment rate was around 22%. :( I just read about the GEM mappability tool and am planning to try it.
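Comparing the control (50%) and treated (<10%) rates is only meaningful if both are computed the same way. The sketch below derives the rate directly from SAM FLAG bits: secondary (0x100) and supplementary (0x800) records are skipped so each read is counted once, and a record is unmapped when 0x4 is set. Each mate of a pair is counted separately. The optional `min_mapq` filter is an assumption added for illustration; in an AT-rich, repetitive genome, many reads may align but receive low MAPQ, which is the same concern a mappability track such as GEM's addresses.

```python
def alignment_rate(sam_lines, min_mapq=0):
    """Fraction of primary SAM records that are mapped with MAPQ >= min_mapq."""
    total = mapped = 0
    for line in sam_lines:
        if line.startswith("@"):  # header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag, mapq = int(fields[1]), int(fields[4])
        if flag & 0x100 or flag & 0x800:  # secondary or supplementary record
            continue
        total += 1
        if not flag & 0x4 and mapq >= min_mapq:
            mapped += 1
    return mapped / total if total else 0.0
```

It can be fed the output of `samtools view -h`, or any SAM file opened as text.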
Update: Mapping to the host (human) genome was not fruitful, but the BLAST results strongly point to E. coli, so we are now mapping to E. coli. Just curious: has anyone handled low-complexity libraries like those from Plasmodium falciparum? We would appreciate any advice, since we suspect that is now the problem. This is our first time doing ChIP-Seq, and the first time the facility that ran the experiment has handled an AT-rich genome! Thanks
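"Low complexity" can mean two different things here: low sequence complexity (the AT-richness of the genome itself) and low library complexity (few distinct DNA fragments, so many duplicate reads, which fits the FastQC duplication warning). As a rough, alignment-free proxy for the second, the sketch below computes the fraction of distinct read-1/read-2 sequence pairs. ENCODE's ChIP-Seq quality metrics include a related non-redundant fraction computed on aligned positions rather than raw sequences; the version below is a simplification under that assumption, and sequencing errors will make it slightly optimistic.

```python
import gzip
from collections import Counter
from itertools import islice


def fastq_sequences(path):
    """Yield read sequences from a plain or gzipped FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        # Sequence lines are lines 2, 6, 10, ... (0-based index 1, step 4).
        for seq in islice(handle, 1, None, 4):
            yield seq.strip()


def read_pairs(r1_path, r2_path):
    """Pair up mates from two FASTQ files, assuming they are in the same order."""
    return zip(fastq_sequences(r1_path), fastq_sequences(r2_path))


def non_redundant_fraction(pairs):
    """Distinct (read1, read2) sequence pairs divided by the total number of pairs."""
    counts = Counter(pairs)
    total = sum(counts.values())
    return len(counts) / total if total else 0.0
```

A value far below that of the control library would point to a library-complexity problem rather than a purely compositional one.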
Could some of the samples have high amounts of host DNA carry-over? Try aligning against the host species as well.
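Aligning to each genome separately lets the same read count toward several references. One common alternative is to concatenate the P. falciparum, human and E. coli sequences into one reference, align once, and let the aligner place each read at its best location; reads that fit equally well in several places then get low MAPQ and can be filtered out. The sketch below tallies primary records per organism from such an alignment. The `contig_to_organism` dictionary is a hypothetical mapping you would build from the FASTA headers of your own combined reference.

```python
from collections import Counter


def reads_per_organism(sam_lines, contig_to_organism, min_mapq=0):
    """Tally primary SAM records by the organism of the contig they align to."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag, rname, mapq = int(fields[1]), fields[2], int(fields[4])
        if flag & 0x900:  # skip secondary (0x100) and supplementary (0x800)
            continue
        if flag & 0x4 or mapq < min_mapq:
            counts["unmapped_or_low_mapq"] += 1
        else:
            counts[contig_to_organism.get(rname, "unknown")] += 1
    return counts
```

Lumping unmapped and low-MAPQ records together keeps the sketch short; splitting them into separate bins would show how much of the loss comes from ambiguous placement rather than from reads matching none of the references.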
Thanks Ryan, I will try it. We tried that with the previous data set, but had no luck!