Biostar Beta. Not for public use.
Question: Find gene duplications in a draft genome assembly

Hi everyone, I have recently assembled a draft genome. I consider its quality acceptable given the assessment I made and my biological goals. Before the genome assembly was ready, I performed a de novo transcriptome assembly of this species and, after some analysis, I found some candidate genes that appear to be duplicated in the genome. My idea is to check this by mapping the reads both to these candidate transcripts and to the genome, and then checking the genomic positions of the reads that mapped to both. I am a beginner in this area, so I would like to know your ideas and advice about it.

Thanks in advance.

Sounds like a valid approach indeed.

Could you nonetheless add some details on how specifically you want to do this? E.g. which programs would you use, and with which parameter settings?

20 months ago
lieven.sterck
5.1k
Thanks for your reply.

I have both short and long reads, but I think it would be better to use the short ones in this case. So I would map the reads to the transcripts (candidate genes) with bowtie2, then keep only the mapped reads and map them to the genome. After that, I will need to find the primary and secondary alignments (if any) of each read and compare the transcripts to these regions. I have not thought about parameter settings yet, but I have lost the pairing information after the first mapping of reads to the candidate genes (in some pairs only one read maps). I need to run some tests to define the parameters.
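The last step (finding reads with both a primary and a secondary alignment on the genome) could be sketched roughly as below. This is only an illustration in plain Python, not a tested pipeline: it assumes the aligner was run so that it reports more than one alignment per read (for bowtie2, the `-k` option) and that the extra alignments carry the secondary bit of the SAM FLAG field. The function names and the read and contig names are hypothetical.

```python
from collections import defaultdict

# SAM FLAG bits, as defined in the SAM specification
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def collect_hits(sam_lines):
    """Group mapped alignments by read name as (contig, position, kind) tuples."""
    hits = defaultdict(list)
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):  # skip blanks and header lines
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag, contig, pos = fields[0], int(fields[1]), fields[2], int(fields[3])
        if flag & UNMAPPED:
            continue
        if flag & SECONDARY:
            kind = "secondary"
        elif flag & SUPPLEMENTARY:
            kind = "supplementary"
        else:
            kind = "primary"
        hits[name].append((contig, pos, kind))
    return hits


def multi_locus_reads(hits):
    """Keep reads that have a primary alignment plus at least one secondary one."""
    return {
        name: sorted(locs)
        for name, locs in hits.items()
        if any(kind == "primary" for _, _, kind in locs)
        and any(kind == "secondary" for _, _, kind in locs)
    }


# Tiny hand-made example (hypothetical read and contig names)
example_sam = [
    "@SQ\tSN:contig_1\tLN:50000",
    "read_a\t0\tcontig_1\t1200\t1\t50M\t*\t0\t0\t*\t*",
    "read_a\t256\tcontig_7\t88010\t1\t50M\t*\t0\t0\t*\t*",
    "read_b\t16\tcontig_1\t1300\t42\t50M\t*\t0\t0\t*\t*",
    "read_c\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
candidates = multi_locus_reads(collect_hits(example_sam))
print(candidates)
```

Reads that end up in `candidates` only point to regions worth a closer look. Repeats and shared domains also produce secondary alignments, so the positions still have to be compared against the candidate transcripts.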

20 months ago
niconm89
• 10
Did you check for duplicated contigs in the genome assembly? Specifically, did you check if the candidate duplicated transcripts fall into truly unique contigs?
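A very quick first check, as a minimal sketch: group contigs whose sequences are exactly identical on either strand. The function names and the example contigs are hypothetical, and this only catches exact copies. Near-identical contigs (for example uncollapsed haplotypes) would need an alignment-based comparison of the assembly against itself.

```python
import hashlib
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(lines):
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # keep only the first word of the header as the contig name
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def canonical_key(seq):
    """Hash the smaller of a sequence and its reverse complement (strand-independent)."""
    seq = seq.upper()
    rc = seq.translate(_COMPLEMENT)[::-1]
    return hashlib.sha1(min(seq, rc).encode()).hexdigest()


def exact_duplicate_contigs(lines):
    """Return groups of contig names whose sequences are identical on either strand."""
    groups = defaultdict(list)
    for name, seq in read_fasta(lines):
        groups[canonical_key(seq)].append(name)
    return [names for names in groups.values() if len(names) > 1]


# Toy example: contig_2 is the reverse complement of contig_1
fasta = [">contig_1", "ACGTTG", ">contig_2", "CAACGT", ">contig_3", "ACGTAA"]
print(exact_duplicate_contigs(fasta))
```

For a real assembly the lines would come from the FASTA file, e.g. `exact_duplicate_contigs(open("assembly.fasta"))`, where the file name is a placeholder.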

19 months ago
h.mon
25k
I do not think I have, but is it possible to check this easily? I ask because I do not have many contigs (~300), and they are longer than the duplications I am looking for... I used a hybrid approach for the assembly, so I would expect all duplicated contigs to have been merged. What do you think? I tried to map the candidate transcripts to the genome contigs with GMAP, but I am not sure how to analyze the spliced alignments of each transcript. A duplication should show up as primary and secondary alignments that are similar, right? But in this case I could (probably) get different alignments because of alternative splicing.
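One way to separate the two cases, sketched under assumptions: alternative alignment paths of a transcript at the same locus overlap on the genome, while a duplicated copy sits at a separate, non-overlapping locus. The code below merges overlapping alignments of each transcript and keeps transcripts with two or more distinct loci. The input records (dicts with `transcript`, `contig`, `start`, `end`, `identity`, `coverage`) are a hypothetical format that would have to be extracted from the aligner output, and the thresholds are arbitrary values for illustration.

```python
from collections import defaultdict


def distinct_loci(alignments, min_identity=0.95, min_coverage=0.80):
    """Merge overlapping high-quality alignments of each transcript into loci.

    `alignments` is an iterable of dicts with the keys: transcript, contig,
    start, end (genomic span), identity and coverage (both between 0 and 1).
    """
    spans = defaultdict(list)
    for aln in alignments:
        if aln["identity"] >= min_identity and aln["coverage"] >= min_coverage:
            spans[(aln["transcript"], aln["contig"])].append((aln["start"], aln["end"]))

    loci = defaultdict(list)
    for (transcript, contig), intervals in spans.items():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:  # overlaps the current locus: extend it
                cur_end = max(cur_end, end)
            else:  # gap: close the current locus and start a new one
                loci[transcript].append((contig, cur_start, cur_end))
                cur_start, cur_end = start, end
        loci[transcript].append((contig, cur_start, cur_end))
    return loci


def duplication_candidates(alignments, **thresholds):
    """Transcripts whose good alignments fall into two or more separate loci."""
    loci = distinct_loci(alignments, **thresholds)
    return {t: sorted(spans) for t, spans in loci.items() if len(spans) >= 2}


# Hypothetical records: tx_1 has two paths at one locus plus a second locus,
# tx_2 has one good hit and one partial hit that is filtered out.
alns = [
    {"transcript": "tx_1", "contig": "contig_4", "start": 1000, "end": 5200, "identity": 0.99, "coverage": 0.98},
    {"transcript": "tx_1", "contig": "contig_4", "start": 1100, "end": 5200, "identity": 0.98, "coverage": 0.90},
    {"transcript": "tx_1", "contig": "contig_9", "start": 300, "end": 4400, "identity": 0.97, "coverage": 0.95},
    {"transcript": "tx_2", "contig": "contig_4", "start": 20000, "end": 23000, "identity": 0.99, "coverage": 0.99},
    {"transcript": "tx_2", "contig": "contig_2", "start": 50, "end": 400, "identity": 0.99, "coverage": 0.20},
]
print(duplication_candidates(alns))
```

Copies that were collapsed into a single contig during assembly will not appear as separate loci here, so this complements the read-based check rather than replacing it.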

Thanks for your advice!

19 months ago
niconm89
• 10
