BagFoot TF foot printing
3
1
Entering edit mode
6.5 years ago
rbronste ▴ 420

Wondering if anything has been able to successfully run this TF foot printing package outlined in the following paper?

http://www.cell.com/cell-reports/fulltext/S2211-1247(17)30609-5

R bagfoot bagfootr ChIP-Seq • 8.5k views
ADD COMMENT
0
Entering edit mode

Hi Ricardo,

Did you publish the comparison of TF footprinting methods, particularly related to CENTIPEDE's performance? Or do you happen to know of any other published work on this?

Thank you! Duygu

ADD REPLY
0
Entering edit mode

I've managed part of the way using my own data, but I'm stuck on generating the hexamer table for the hotspot regions. The relevant part of the demo code is:

gfootoption_fed_fasted <- 
    GFootOption(biasfile = "example_data/Hexamer_fed_fasted_merged_withMap.csv",  
        hexamer_pattern= "example_data/nuccode/nuccode_mm9_6mer_{chr}.mm9.dat")

Moreover, I don't see anything in the package that would generate this file: "example_data/Hexamer_fed_fasted_merged_withMap.csv".

Perhaps this is just concatenating the hexamer files, but this would not produce as csv file.

It's a nice paper, and so close to being a useful too...

As far as other tools, I've found HINT to be useful, and it does bias correction for ATAC-seq data. The package authors are very responsive as well.

ADD REPLY
0
Entering edit mode

Hi rbronste,
I have managed it to work recently after painful 5 days on debugging, I noticed this post is 15 months ago, So, if you still need some help, please email me at hughiez047@gmail.com and I will provide you my detailed records :-)
Hope this helps, best wishes!

ADD REPLY
4
Entering edit mode
5.9 years ago
Ricardo A. ▴ 90

EDIT: I called victory too soon... It took me two full days to figure out why this package would not run on our data. It was the first time I've ever had to really use the debugging functions in RStudio (thanks, Hadley!). After extensive tinkering and innumerous tracebacks, I found that the problem was one the input files generated out of of macs_output2csv.R had a duplicated column. This fixed the package on my end, but I have a lab mate that is trying to use it and currently running into a different issue.

Bottom line: make absolutely sure that your input matches exactly the example files down to the minimum details. Pay specific attention to the .meme files that you want use as input. They must have all the fields in the example files (i.e. alength=, w= , nsites=, and E=) down to the space between the equal sign and the value (this will change depending on your MEME version, so be aware).


I managed to get it working in here (v0.9.6). You need to run the steps in bagfoot_prep_example.R using your own BAM files and then use the output in bagfoot_run_example.R, changing the necessary file paths, of course. The main issue I found is that parallel processing in the run() function used in the run script is broken for unknown reasons, so you have to specify mc.cores=1. Other than that, the necessary scripts seem to working as expected, without having to dig too much into the code.

On a separate note, if you only want to do TF binding predictions on a single sample, you might want to use some other software. My understanding is that BaGFoot is made to provide a framework for comparing footprints between two (or more?) datasets. Gordon Hager's lab already have a footprinting method called DNase2TF, which was written by the same guy (Songjoon Baek). Part of my thesis work is comparing TF binding prediction methods, and so far CENTIPEDE is very solid compared to what's out there - that would be my recommendation. We used it for a couple of recent papers (see Varshney et al, 2017 PNAS, Tomonori & Albanus et al 2018 Sci Reports).

Hope this helps.

Cheers

ADD COMMENT
0
Entering edit mode

Hi,Ricardo

How do you get all the meme files for runFIMOandMakeMotifList.R?

ADD REPLY
1
Entering edit mode

It looks as though the author posted meme files on sourceforge, but you can also do something like the following.

I used a motifDB object that contains the homer motifs. This is available in the motifbreakR package. Then, I export motifs as meme format, using motifDB.

library(motifbreakR)
data(homer)
meme.text = MotifDb::export(homer, "homer.meme", format= 'meme')

Then for bagfoot:

 motifDBhomer_bagfoot = MotifDB(motiflistfile = "homer.meme") 
# use motifDB=motifDBhomer_bagfoot as argument in GFoot function call.

Good luck!!!

ADD REPLY
0
Entering edit mode

Hi guys,

thank you very much for the meme motif solution, Ricardo! I am very close to run bagfoot... there is still on step I can not get through: how to merge the 2 Hexamer bias frequency tables (that I already produced)? Do I have to calculate a mean of "correctionFactor"? Add up some numbers? I am completly newbie to ATACseq. Thanks a lot in advance,

Mathilde

[UPDATE] In case someone is interested, I think I found the answer: I just add the observed cuts in the 2 files (2nd column), and I recalculated the different ratio (found in bagfoot.R code)

ADD REPLY
0
Entering edit mode

Hi Mathilde,

Could you please provide more details on how to concatenate the two Hexamer tables? I'm still confused. What's more, do you find any difference when generating the hexamer bias frequency table with or without mappability? Thanks a lot in advance!

Zihan

ADD REPLY
0
Entering edit mode

Hi Zihan,

I generated the hexamer tables using the mappability only. The output of bagfoot is nice and makes sense with the ATACseq data I used. Regarding the merging of the tables, basically for each hexamer you have to add up the observed cuts (2nd column) and then recalculate the ratio. Be careful to not use the hexamer "other" when you calculate the total of observed cuts and the total of genomic positioin count. Here a peace of my code to merge the Hexamer tables (in Java), let me know if you have question

double genPosSum=0;
    double cutSum=0;
    for (String hexamer: hexamerToGenomicPosCount.keySet()){
        if (!hexamer.contains("other")) {
            cutSum += hexamerToObCuts1.get(hexamer) + hexamerToObCuts2.get(hexamer);
            genPosSum +=hexamerToGenomicPosCount.get(hexamer);
        }
    }

    BufferedWriter out = new BufferedWriter(new FileWriter(outFile));
    out.write(header+Consts.ls);
    for (String hexamer: hexamerToGenomicPosCount.keySet()){
        out.write(hexamer);
        int sum =hexamerToObCuts1.get(hexamer) + hexamerToObCuts2.get(hexamer);
        out.write(" "+sum+" "+hexamerToGenomicPosCount.get(hexamer));
        double ObCutRatio = (double) sum / cutSum;
        double GPRatio = (double)hexamerToGenomicPosCount.get(hexamer)/genPosSum;
        double corrFactor=GPRatio/ObCutRatio;
        if (!hexamer.contains("other")) {
            out.write(" " + String.format("%.12f",ObCutRatio) + " " + String.format("%.12f",GPRatio) + " " + String.format("%.12f",corrFactor) + Consts.ls);
        }else{
            out.write(" NA NA 1"+Consts.ls);
        }
    }
    out.close();

best, Mathilde

ADD REPLY
0
Entering edit mode

Hi Mathilde,

Thanks a lot for the quick reply and code. One more question, I'm not sure where to download the mappability file that ends with .out format as mentioned in the bagfoot_prep_example.R. The only website that I found is this: http://archive.gersteinlab.org/proj/PeakSeq/Mappability_Map/M.musculus/, but it comes from a different package called PeakSeq. Should I generate the .out files by myself? If so, how to do that? Thank you so much!

Zihan.

ADD REPLY
0
Entering edit mode

Hi Zihan,

yes you have to make them, it took me one entire week on my server... I did hg19. Here some tips to make them:

MAPPABILITY files: - download all .fa files for each chr (hg19 genome): http://hgdownload.cse.ucsc.edu/goldenpath/hg19/chromosomes/ - download mappability scripts: http://archive.gersteinlab.org/proj/PeakSeq/Mappability_Map/Code/ -> Make the three executables chr2hash, oligoFindPLFFile, mergeOligoCounts, using the Makefile -> In the directory containing the fa files, do: python compile.py 35 (35 for 35mer) * VERY LONG: 1 WEEK! **

I have one question for you, now I would like to use a more recent motif database. I got the meme files, but I dont know how to get/generate the related meme_index.file. Would you know by chance or did you use only the meme files provided by Bagfoot? Thanks! Mathilde

ADD REPLY
0
Entering edit mode

Hi Mathilde,

Thanks for the tips, very useful. For the meme files, I download it from JASPAR: http://jaspar.genereg.net/search?q=Mus+musculus&collection=CORE&tax_group=vertebrates&tax_id=all&type=all&class=all&family=all&version=all This is the example for mouse, you can do the same for human genome. At the bottom left of this page, you can see a csv button, I download this csv file as my index file. I guess you just need to change the row names of this csv file as the meme_index files that the author provided. Hope this helps.

Zihan

ADD REPLY
0
Entering edit mode

Thanks a lot Zihan! Since I want to use other DBs, I will have to make my own script to make this file. That's OK, I will just skip the description fields for now. All the best, Mathilde

ADD REPLY
0
Entering edit mode

Hi Mathilde,
Does your index file function normally if you just skip the description field?

ADD REPLY
0
Entering edit mode

Hi Hughie, yes it does!

ADD REPLY
0
Entering edit mode

Thanks for reply,Mathilde,
I exported meme file from MotifDb of JASPAR2018 and I downloaded index file from JASPAR website, the motif number and rank is a little different and I failed some of the motifs (Actually I indeed got some Fimo results) when runFIMOandMakeMotifList.R. So, I want to ask whether you re-ranked your index file to coordinate with your meme file? Thank you!

ADD REPLY
0
Entering edit mode

I have re-ranked my index file to fit .meme file and got normal result, Thank you!

ADD REPLY
0
Entering edit mode

First of all many thanks to Mathilde, Zihan and other contributors to this really helpful discussion, which I essentially used as a manual for using BaGFoot.

I have a follow-up question. Did anyone figure out how to calculate p- and q-values for each TF as the authors did in Fig 3D (https://ars.els-cdn.com/content/image/1-s2.0-S2211124717306095-gr3.jpg) and other figures? I see a text on that in the supplementary material, but the complex statistics that they have used is hard for me to understand. I wonder if a piece of code for calculating p- and q-values and plotting them on Bagplot already exists and if anyone is willing to share that!

I would highly appreciate any help. Thank you.

Best wishes, Ravi

ADD REPLY
1
Entering edit mode

Hi Ravi,
I have sucessfully used this package before and I may give you some suggestions.
The default result of bagfoot is without p-value / q-value, however, the code for statistical analysis exists in original bagfoot code. Just use gen_bagplot_chisq() instead of gen_bagplot() .......

ADD REPLY
0
Entering edit mode

Perfect! Thanks Hughie!

ADD REPLY
2
Entering edit mode
6.5 years ago
Ian 6.0k

I spent about a week (off and on) and gave up. A colleague of mine has apparently managed to get usable output; after tweaking the R code. I don't like saying bad things about peoples efforts, but the documentation was atrocious.

EDITED (March 2019):

Possible alternative: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6099720/

DAStk: https://biof-git.colorado.edu/dowelllab/DAStk

ADD COMMENT
2
Entering edit mode
6.5 years ago

Not sure how well it compares, but I found http://pythonhosted.org/pyDNase/ to be very straightforward to run.

ADD COMMENT
0
Entering edit mode

Yes Ive seen this one before however I think its more tuned towards DNase-seq data and I am working with ATAC-seq.

ADD REPLY
1
Entering edit mode

It does have support for ATAC-seq last I checked. Maybe not in the official release, might be in a branch in the github repo.

ADD REPLY
1
Entering edit mode

As Simon said it does have support for ATAC-seq.

ADD REPLY

Login before adding your answer.

Traffic: 1500 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6