Question

Genotyping by sequencing in Arundo donax


8.5 years ago

fvalli84

Hi all,

I'm working with Arundo donax, a polyploid species that does not produce viable seeds and has very low genetic variability. To increase that variability, I produced almost 1,000 independent mutants using gamma rays and fast neutrons as irradiation sources.

What we would like to do is a molecular fingerprint of each mutant using Illumina sequencing. I performed genotyping by sequencing (GBS) and used Stacks to build consensus tags de novo (the species is polyploid and lacks a reference sequence).

So my question is whether anyone has suggestions on how to use the GBS data to characterize the mutants.

For example, I was thinking of counting the reads that align to a given locus in the reference genome of a related species, and checking whether differences in those counts among mutants could be attributed to deletions caused by the mutagenic treatment.
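
A minimal sketch of this read-count comparison, assuming you have already counted reads per locus (per Stacks catalog tag, or per window of a related reference) for every mutant. The dictionary layout, the function names and the thresholds (`ratio_threshold`, `min_median_cpm`) are hypothetical choices for illustration, and counts-per-million scaling is just one simple normalization:

```python
from statistics import median


def normalize_counts(counts):
    """Scale raw per-locus read counts to counts per million (CPM) so that
    mutants sequenced to different depths can be compared."""
    normalized = {}
    for sample, loci in counts.items():
        total = sum(loci.values())
        normalized[sample] = {
            locus: (n * 1e6 / total if total else 0.0) for locus, n in loci.items()
        }
    return normalized


def flag_candidate_deletions(counts, ratio_threshold=0.25, min_median_cpm=10.0):
    """Return (sample, locus, ratio) for every case where a sample's CPM at a
    locus is below ratio_threshold times the median CPM of all other samples.
    Thresholds are hypothetical and need tuning on real data."""
    cpm = normalize_counts(counts)
    samples = sorted(cpm)
    loci = sorted({locus for s in samples for locus in cpm[s]})
    flagged = []
    for locus in loci:
        for sample in samples:
            others = [cpm[o].get(locus, 0.0) for o in samples if o != sample]
            if not others:
                continue
            ref = median(others)
            if ref < min_median_cpm:
                continue  # too little coverage in the other mutants to judge
            ratio = cpm[sample].get(locus, 0.0) / ref
            if ratio < ratio_threshold:
                flagged.append((sample, locus, round(ratio, 3)))
    return flagged


# Toy counts (made up): M1 has no reads at tagC.
toy = {
    "M1": {"tagA": 50, "tagB": 50, "tagC": 0},
    "M2": {"tagA": 50, "tagB": 50, "tagC": 50},
    "M3": {"tagA": 50, "tagB": 50, "tagC": 50},
}
print(flag_candidate_deletions(toy))
```

Keep in mind that in a (pseudo)triploid, losing one of three copies only lowers the expected count to about 2/3 of normal, so a threshold like 0.25 mostly catches loci where all copies are gone; at low coverage the counts are also very noisy.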

I really appreciate any help!

Tags: next-gen, alignment, blast



Interesting problem, but IMHO some vital information is missing:

How large is the genome and what is the level of polyploidy?
Are you sequencing with enough coverage to detect any of the changes?
What types and rates of mutations are expected from the radiation treatments you used? For example, similar to those reported in rice: http://www.ncbi.nlm.nih.gov/pubmed/20154423
How similar at the nucleotide level is the related species' genome? To assess that, you need some known genes or sequenced BACs from Arundo, or any non-repetitive contigs (if you can assemble any) from your NGS data.
Is there a repeat library for Arundo or another species?

Without this, it will be quite hard to guess what to expect.


written 8.5 years ago by Darked89

Ram · Answer 1 · 2015-10-15

The genome size is 1C = 2.744 pg, and it should be a pseudo-triploid (the ploidy level has not been determined yet; it could also be hexaploid).
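
For comparison with the genome sizes of related species, the 1C value can be converted to base pairs with the commonly used factor 1 pg ≈ 978 Mbp. A small sketch, assuming 2.744 pg is the 1C (holoploid) value and that the ploidy is 3x or 6x; the monoploid (1Cx) size is then the 2C value divided by the ploidy:

```python
MBP_PER_PG = 978  # commonly used conversion: 1 pg of DNA ~ 978 Mbp


def pg_to_mbp(pg):
    """Convert a DNA amount in picograms to megabase pairs."""
    return pg * MBP_PER_PG


def monoploid_size_mbp(one_c_pg, ploidy):
    """Monoploid genome size (1Cx) in Mbp. 1C is half of the somatic 2C
    content, so 1Cx = 2C / ploidy = 2 * 1C / ploidy."""
    return pg_to_mbp(2 * one_c_pg / ploidy)


ONE_C_PG = 2.744  # value reported above, assumed to be the holoploid 1C

print(f"1C: {pg_to_mbp(ONE_C_PG):.0f} Mbp")
for ploidy in (3, 6):  # assumption: the two ploidy levels mentioned above
    print(f"1Cx if {ploidy}x: {monoploid_size_mbp(ONE_C_PG, ploidy):.0f} Mbp")
```

This gives about 2.7 Gbp for 1C, and roughly 1.8 Gbp (3x) or 0.9 Gbp (6x) per basic chromosome set.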

The coverage is around 5x.
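
To get a feel for what 5x means for detecting deletions, a back-of-the-envelope sketch assuming reads fall on a locus according to a Poisson distribution with mean 5, and a triploid locus where one copy is lost. The "≤ 3 reads" cut-off is an arbitrary illustration, not a recommended threshold:

```python
from math import exp, factorial


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam)."""
    return sum(exp(-lam) * lam**i / factorial(i) for i in range(k + 1))


MEAN_DEPTH = 5.0  # average coverage reported above
PLOIDY = 3        # assumption: (pseudo)triploid; re-run with 6 for hexaploid

# Expected depth at a locus where one of the PLOIDY copies has been deleted.
depth_one_copy_lost = MEAN_DEPTH * (PLOIDY - 1) / PLOIDY

# Chance that a fully intact locus gets no reads at all (sampling dropout).
p_dropout = poisson_cdf(0, MEAN_DEPTH)

# If a "deletion" were called whenever a locus has <= 3 reads:
false_positive = poisson_cdf(3, MEAN_DEPTH)        # intact loci flagged
sensitivity = poisson_cdf(3, depth_one_copy_lost)  # one-copy deletions caught

print(f"P(0 reads | intact)          = {p_dropout:.4f}")
print(f"P(<=3 reads | intact)        = {false_positive:.2f}")
print(f"P(<=3 reads | one copy lost) = {sensitivity:.2f}")
```

Under these assumptions about 27% of intact loci would be flagged while only about 57% of one-copy deletions would be caught, so a single locus in a single mutant cannot separate the two at 5x; only complete deletions, or signals aggregated over many neighbouring loci, look realistic. GBS coverage is much less uniform than Poisson, so the real picture is probably worse.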

I don't know the expected mutation rate for gamma rays in Arundo; in the literature it varies considerably depending on the characteristics of the mutagenized species.

We tried aligning the reads to the Setaria italica genome, but only 1.8% of the reads aligned. Since the Arundo donax transcriptome sequence is available, I'm thinking of using it as the reference instead.
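
When comparing references (Setaria genome vs. Arundo transcriptome), the same alignment-rate number is worth computing consistently. `samtools flagstat` reports this directly; as a minimal sketch of what is being counted, assuming SAM output from the aligner:

```python
def mapped_fraction(sam_lines):
    """Fraction of reads with at least one alignment, from SAM text lines.

    Each read is counted once via its primary record; secondary (0x100) and
    supplementary (0x800) records are skipped. For paired-end data each mate
    counts as a separate read."""
    total = mapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # header or blank line
        flag = int(line.split("\t")[1])
        if flag & 0x100 or flag & 0x800:
            continue
        total += 1
        if not flag & 0x4:  # 0x4 = segment unmapped
            mapped += 1
    return mapped / total if total else 0.0


# Example (hypothetical file name):
# with open("arundo_vs_setaria.sam") as fh:
#     print(f"{100 * mapped_fraction(fh):.1f}% of reads aligned")
```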

Unfortunately, no other genomic information is available for A. donax or for other species in the same genus.

Ram · Answer 2 · 2015-10-15

Re mapping genomic DNA to the transcriptome: I would try LAST. For any read spanning an intron-exon boundary, I believe the more mainstream mappers will reject the alignment because of the mismatches (intronic sequence in the read vs. the next exon in your transcript). LAST, given a reasonably long exonic match, should accept the alignment and truncate your read. I have not done this myself in this exact scenario, but I have mapped RNA-seq reads carrying a trans-spliced leader to a genome with LAST. Close enough, I hope.
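
The reason a local aligner can cope with this is that local alignment only scores the best-matching segment and leaves the rest of the read unaligned (clipped). A toy Smith-Waterman sketch with linear gap penalties; this is not LAST's actual algorithm (LAST uses adaptive seeds and its own scoring schemes), and the sequences and scores are made up:

```python
def local_align(read, ref, match=2, mismatch=-3, gap=-5):
    """Smith-Waterman local alignment with a linear gap penalty.
    Returns (score, start, end) such that read[start:end] is the aligned part."""
    n, m = len(read), len(ref)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best_score, best_i, best_j = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = H[i - 1][j - 1] + (match if read[i - 1] == ref[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best_score:
                best_score, best_i, best_j = H[i][j], i, j
    # Trace back from the best cell until the score drops to zero.
    i, j = best_i, best_j
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if read[i - 1] == ref[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1  # gap in the reference
        else:
            j -= 1  # gap in the read
    return best_score, i, best_i


exon1 = "ACGTTGCAAGGCTTACGGATCCTA"
intron = "GTAAGTATTTTATATTTAG"
exon2 = "CCCGGGCCCGGGCCCGG"
read = exon1 + intron       # genomic read crossing the exon-intron border
transcript = exon1 + exon2  # spliced transcript used as the reference
score, start, end = local_align(read, transcript)
print(score, read[start:end] == exon1)
```

The intronic tail of the read is simply not part of the reported alignment, which is the "truncate your read" behaviour described above; a global (end-to-end) aligner would instead have to pay for every intronic mismatch.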

Re mapping to a closely related genome: in this kind of scenario the choice of mapper is crucial. You need one that accepts and reports alignments with higher mismatch rates, without going overboard and placing almost every read anywhere. Again, check out LAST and GEM.
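
One way to express "tolerate divergence, but don't place reads anywhere" as a post-mapping filter is to keep alignments with a relaxed identity cut-off while still requiring a minimum mapping quality. A sketch assuming SAM records that carry an `NM:i` tag (not every aligner or format converter writes one); the thresholds are hypothetical, and MAPQ means different things in different mappers:

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def alignment_identity(cigar, nm):
    """Approximate identity = 1 - NM / alignment columns, counting M/=/X/I/D
    columns only (clipped and skipped bases are excluded)."""
    cols = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "M=XID")
    return 1 - nm / cols if cols else 0.0


def passes_divergent_filter(sam_line, min_identity=0.85, min_mapq=10):
    """Keep mapped records that are at least min_identity identical and
    reasonably unique (MAPQ >= min_mapq). Thresholds are illustrative."""
    fields = sam_line.rstrip("\n").split("\t")
    flag, mapq, cigar = int(fields[1]), int(fields[4]), fields[5]
    if flag & 0x4 or cigar == "*":
        return False  # unmapped
    nm = next((int(t[5:]) for t in fields[11:] if t.startswith("NM:i:")), None)
    if nm is None:
        return False  # no edit distance reported; cannot judge identity
    return mapq >= min_mapq and alignment_identity(cigar, nm) >= min_identity
```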

Also, because taxonomies are still not based on sequence similarity, I would download all the available genomes (just 5) from the same PACMAD clade: http://www.ncbi.nlm.nih.gov/genome/?term=txid147370[Organism:exp]

I think only the maize genome is of comparable size to Arundo's. Pick the single FASTQ with the best quality values from your data set and map it to all 5 genomes with at least the two mappers listed above. If you can get soft-masked sequences for these 5 genomes, repeat the mapping against them. Hopefully you will map more than 2% of your reads, but obviously I cannot guarantee it.
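
Picking "the FASTQ with the best quality values" can be as simple as comparing mean Phred scores. A minimal sketch, assuming plain 4-line FASTQ records (no wrapped sequence lines) with Phred+33 encoding; the function names are hypothetical:

```python
def mean_phred(fastq_lines, offset=33):
    """Mean base quality over all records of a 4-line-per-record FASTQ."""
    total = count = 0
    for i, line in enumerate(fastq_lines):
        if i % 4 == 3:  # the 4th line of each record holds the qualities
            q = line.rstrip("\n")
            total += sum(ord(c) - offset for c in q)
            count += len(q)
    return total / count if count else 0.0


def best_fastq(samples):
    """samples: mapping of sample name -> iterable of FASTQ lines.
    Returns the name of the sample with the highest mean base quality."""
    return max(samples, key=lambda name: mean_phred(samples[name]))
```

For real files each value can be an open file handle (e.g. from `gzip.open(path, "rt")`), since each sample's lines are read only once.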

Very long shot (a very drafty genome assembly): if you have 5x for each of your individual mutants and mutations are rare, you could pool all this data together, preferably after getting some $$$ for PacBio sequencing of the unmutated strain, and see what comes out of the assembly. Even if you only get shattered mitochondrial and plastid sequences plus a big swarm of pathetically small contigs, you can map your individual samples back to this and maybe get some idea of differences in coverage. Then cluster your mutants based on this (e.g. RPKMs of, say, 0.5M contigs per sample) and check whether there are any patterns (assuming deletions).
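
A rough sketch of the RPKM-and-cluster step, assuming read counts per contig are already available for each mutant. RPKM is computed as reads × 10^9 / (contig length × total mapped reads); the log2(RPKM + 1) transform, the 1 − Pearson correlation distance and average-linkage clustering are illustrative choices, not part of the suggestion above, and the toy numbers are made up:

```python
from math import log2


def rpkm(counts, lengths):
    """counts: sample -> reads per contig; lengths: contig lengths in bp."""
    out = {}
    for sample, c in counts.items():
        total = sum(c)
        out[sample] = [n * 1e9 / (L * total) if total else 0.0
                       for n, L in zip(c, lengths)]
    return out


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return sxy / (sx * sy) if sx and sy else 0.0


def cluster(profiles, k):
    """Average-linkage agglomerative clustering on 1 - Pearson r of
    log2(RPKM + 1) profiles; merges clusters until k remain."""
    logp = {s: [log2(v + 1) for v in vals] for s, vals in profiles.items()}
    names = sorted(logp)
    dist = {(a, b): 1 - pearson(logp[a], logp[b]) for a in names for b in names}
    clusters = [[n] for n in names]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = sum(dist[a, b] for a in clusters[i] for b in clusters[j])
                d /= len(clusters[i]) * len(clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
        clusters[i] = merged
    return clusters


# Toy data: M1/M2 lack coverage on contig 0, M3/M4 on contig 3.
toy_lengths = [1000, 1000, 1000, 1000]
toy_counts = {"M1": [0, 100, 100, 100], "M2": [1, 100, 100, 100],
              "M3": [100, 100, 100, 0], "M4": [100, 100, 100, 2]}
print(cluster(rpkm(toy_counts, toy_lengths), 2))
```

Pure Python like this will not scale to 0.5M contigs for ~1,000 mutants; for real data you would use NumPy arrays and something like `scipy.cluster.hierarchy.linkage(..., method="average")`.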

Hope it helps.