Good morning,
I'm currently trying to use TASSEL to generate a linkage disequilibrium (LD) plot with the "All" LD type, but because I have a lot of SNPs I'd like to use the "retainRareAlleles = false" option, which apparently only exists in TASSEL 4. I'm using a genotype file and a map file generated by the Export function of Flapjack 1.13.03.19.
Here's my XML file as generated by TASSEL 4 (which, by the way, came without the runfork1 element, so nothing ever ran until I added it myself):
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<TasselPipeline>
<fork1>
<flapjack>
<geno>a.genotype</geno>
<map>a.map</map>
<retainRareAlleles>false</retainRareAlleles>
</flapjack>
<ld>
<ldType>All</ldType>
</ld>
<td_csv>ld_out.csv</td_csv>
<ldd>svg
<ldplotlabels>false</ldplotlabels>
<o>ld_output.svg</o>
</ldd>
</fork1>
<runfork1/>
</TasselPipeline>
generated by the command:
perl tassel4-standalone/run_pipeline.pl -createXML mycleanconf.xml -fork1 -flapjack -geno a.genotype -map a.map -retainRareAlleles false -ld -ldType All -td_csv ld_out.csv -ldd svg -ldplotlabels false -o ld_output.svg
and the XML is run using:
perl tassel4-standalone/run_pipeline.pl -configFile mycleanconf.xml
Now the problem: when I run the XML with TASSEL 4, I get this:
net.maizegenetics.baseplugins.FlapjackLoadPlugin
net.maizegenetics.baseplugins.LinkageDisequilibriumPlugin
net.maizegenetics.baseplugins.TableDisplayPlugin
net.maizegenetics.baseplugins.LinkageDiseqDisplayPlugin
[Thread-2] ERROR net.maizegenetics.baseplugins.FlapjackLoadPlugin - Flapjack files a.genotype and a.map failed to load. Make sure the import options are properly set.
The kicker is that TASSEL 3 loads the files without complaining (after I removed the retainRareAlleles line), but it has now been running for about 5 days without producing any result at all.
I know that the "All" LD type (comparing every SNP against every other) understandably takes forever (a sliding window of 50 takes only 1-2 hours) and isn't the best choice for my ~30,000 SNPs, but I'm still curious why TASSEL 4 doesn't work here, and whether anyone else has ever encountered this.
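For comparison, the sliding-window run that does finish uses essentially the same pipeline with only the ld block changed. A sketch of that fragment, assuming the parameter names I've seen in the TASSEL pipeline documentation (ldType and ldWinSize; the window size of 50 is just what I used):

```xml
<ld>
    <!-- SlidingWindow compares each SNP only to its nearest neighbours
         within the window, instead of all-against-all -->
    <ldType>SlidingWindow</ldType>
    <ldWinSize>50</ldWinSize>
</ld>
```

With ~30,000 SNPs this scales roughly linearly in the number of sites rather than quadratically, which is presumably why it completes in 1-2 hours.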
Edit: Or even better, does anyone know a faster way to do a full LD analysis on such a large dataset, or any alternative approaches? I'm a bit new to LD analysis. Thanks!