Good morning,
I'm currently trying to use TASSEL to generate a linkage disequilibrium (LD) plot with the "All" LD type, but because I have a lot of SNPs I'd like to use the "retainRareAlleles = false" option, which apparently only exists in TASSEL 4. I'm using a genotype file and a map file generated by the Export function of Flapjack 1.13.03.19.
Here's my XML file as generated by TASSEL 4 (which, by the way, came without the runfork1 element, so nothing ever ran until I added it myself):
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<TasselPipeline>
<fork1>
<flapjack>
<geno>a.genotype</geno>
<map>a.map</map>
<retainRareAlleles>false</retainRareAlleles>
</flapjack>
<ld>
<ldType>All</ldType>
</ld>
<td_csv>ld_out.csv</td_csv>
<ldd>svg
<ldplotlabels>false</ldplotlabels>
<o>ld_output.svg</o>
</ldd>
</fork1>
<runfork1/>
</TasselPipeline>
generated by the command:
perl tassel4-standalone/run_pipeline.pl -createXML mycleanconf.xml -fork1 -flapjack -geno a.genotype -map a.map -retainRareAlleles false -ld -ldType All -td_csv ld_out.csv -ldd svg -ldplotlabels false -o ld_output.svg
and the XML is run using:
perl tassel4-standalone/run_pipeline.pl -configFile mycleanconf.xml
Now the problem: when I run the XML with TASSEL 4, I get this:
net.maizegenetics.baseplugins.FlapjackLoadPlugin
net.maizegenetics.baseplugins.LinkageDisequilibriumPlugin
net.maizegenetics.baseplugins.TableDisplayPlugin
net.maizegenetics.baseplugins.LinkageDiseqDisplayPlugin
[Thread-2] ERROR net.maizegenetics.baseplugins.FlapjackLoadPlugin - Flapjack files a.genotype and a.map failed to load. Make sure the import options are properly set.
The kicker is that TASSEL 3 loads the files without complaining (after I removed the retainRareAlleles line), but it has now been running for about 5 days without producing any result at all.
I know that the "All" LD type (comparing every SNP against every other) understandably takes forever (a sliding window of 50 takes only 1-2 hours) and isn't the best choice for my ~30,000 SNPs, but I'm still curious why TASSEL 4 doesn't work here, and whether anyone else has ever encountered this.
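For comparison, the sliding-window run that does finish uses essentially the same pipeline with only the ld block changed. A sketch of that fragment, assuming the parameter names I've seen in the TASSEL pipeline documentation (ldType and ldWinSize; the window size of 50 is just what I used):

```xml
<ld>
    <!-- SlidingWindow compares each SNP only to its nearest neighbours
         within the window, instead of all-against-all -->
    <ldType>SlidingWindow</ldType>
    <ldWinSize>50</ldWinSize>
</ld>
```

With ~30,000 SNPs this scales roughly linearly in the number of sites rather than quadratically, which is presumably why it completes in 1-2 hours.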
Edit: Or even better, does anyone know a faster way to do a full LD analysis on such a large dataset, or any alternative approaches? I'm a bit new to LD analysis. Thanks!