OrthoFinder software problem
16 months ago

Dear all, I have two questions about the OrthoFinder software.

My first question: I have 3 species (A, B, C), and for each species I selected 4 time points for RNA sequencing. Species A and B don't have a reference genome; species C does. Now that I have finished the analysis, including differential expression within each species, I want to find the genes that exist in species A and B but not in species C. Will OrthoFinder work for that?

My second question: if OrthoFinder can help find those genes, what should the input files be? My guess is that for species A and B, which have no reference genome, I can assemble with Trinity to produce a FASTA file, use TransDecoder to convert that FASTA file into a protein sequence file, and then use the protein sequences as input for OrthoFinder. (I also did a UniProt annotation, so I am not sure whether it would be better to download the protein sequences from UniProt.)

I would really appreciate any help or suggestions!

5 months ago
VIB, Ghent, Belgium

Yes, OrthoFinder will work for this indeed.

And yes again, the approach you describe (Trinity + TransDecoder) is a correct way to include species A and B in your OrthoFinder analysis. One thing to keep in mind is that you are mixing transcriptomics with genomics data, which can (and will) cause some issues in the interpretation of the results. Genes can be missing from your transcriptomics dataset, for instance.
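
Once OrthoFinder has run, the "in A and B but not in C" question comes down to filtering its orthogroup table. Below is a minimal sketch in Python, assuming the OrthoFinder 2.x `Orthogroups.tsv` layout (tab-separated, one column per input proteome, gene IDs within a cell separated by commas). The column names `A`, `B`, `C` and the gene IDs are made up for illustration; in a real run the columns are named after your input FASTA files.

```python
import csv
import io


def genes_in_cell(cell):
    """Split one Orthogroups.tsv cell into a list of gene IDs."""
    return [g.strip() for g in (cell or "").split(",") if g.strip()]


def ab_only_orthogroups(tsv_text, present=("A", "B"), absent=("C",)):
    """Return IDs of orthogroups that have genes in every `present`
    species and no genes in any `absent` species."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    hits = []
    for row in reader:
        has_all = all(genes_in_cell(row.get(s)) for s in present)
        has_none = not any(genes_in_cell(row.get(s)) for s in absent)
        if has_all and has_none:
            hits.append(row["Orthogroup"])
    return hits


# Toy table (hypothetical data) in the assumed Orthogroups.tsv layout.
example = (
    "Orthogroup\tA\tB\tC\n"
    "OG0000000\tA_g1, A_g2\tB_g1\tC_g1\n"
    "OG0000001\tA_g3\tB_g2, B_g3\t\n"
    "OG0000002\tA_g4\t\t\n"
)

print(ab_only_orthogroups(example))
```

For a real run you would pass in the contents of the `Orthogroups.tsv` file instead of the toy string. Keep in mind that an orthogroup with no gene from C only means OrthoFinder did not place a C protein there; treat the resulting list as candidates to check further.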


Mmh, I have a somewhat stronger opinion on the (in my opinion) flawed approach of starting with a genome/transcriptome mix, but I guess you're right in the sense that technically it will "work".

You can't say whether a gene that was not measured is absent or just not expressed in your experimental conditions. That means you might not find the correct ortholog and instead define a paralog with modified function as the ortholog. Given the usually high number of genes not expressed under any given condition, I wouldn't want to evaluate those results.


Correct indeed, and I have the same reservation about it.

But if this is the data you have to work with, there is not much else you can do except keep in mind that it's suboptimal (I guess quickly sequencing/assembling/annotating A and B will not be an option ;) ).

OP is also looking for genes in A and B that are not in C, so in that sense it is not all that bad (the reverse, looking for things in C that are not in A or B, is much more tricky).


Agreed, you certainly showed OP options while I focused on the limitations (a bit one-sided...).


Hi, thank you! But OrthoFinder needs protein sequences as input, so I need to get the protein sequences for each species (FASTA format). Will that also cause issues for the interpretation of the results?


Indeed, OrthoFinder needs protein input, but that's what TransDecoder will give you. For the genome I bluntly assumed that you had an annotation; if that's not the case, you're not there yet, unfortunately.

An inaccurate annotation or protein set will influence the results, yes, and that's something you will have to deal with. Under normal circumstances, however, TransDecoder will do a decent job for this.


I will try it! Thank you and have a great day!


Hi, sorry to disturb, but I found a new problem with the input files for OrthoFinder. For the RNA sequencing data without a reference genome, do I need to run CD-HIT before running TransDecoder (or should I run TransDecoder first and then CD-HIT)?

Another question: TransDecoder actually has two steps. The first is TransDecoder.LongOrfs (output is "longest_orfs.pep") and the next is TransDecoder.Predict (output is "transcripts.fasta.transdecoder.pep"). I assume that "transcripts.fasta.transdecoder.pep" is the final output and should be used as the input for OrthoFinder? I am not sure if my opinion is correct, and any suggestion would be much appreciated!


Hmm, 'raw' RNA-seq data will not be suited for this purpose; you will need to do a transcriptome assembly first, I'm afraid. The output of TransDecoder should be non-redundant, in the sense that shorter ORFs on the same transcript/reading frame will be removed so that only the longest one is retained.

Yes, it's the transcripts.fasta.transdecoder.pep file that you will need to use for OrthoFinder.
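
If several Trinity isoforms of the same gene still end up in the .pep file, one option is to keep only the longest protein per gene before handing the file to OrthoFinder. Here is a rough Python sketch of that idea. It assumes Trinity-style IDs such as `TRINITY_DN10_c0_g1_i2.p1`, where everything before `_i<number>` identifies the gene; the example records are made up, and real TransDecoder headers carry more annotation after the ID.

```python
import re


def read_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Assumption: the gene ID is the part of the sequence ID before "_i<number>",
# optionally followed by a ".p<number>" ORF suffix.
GENE_RE = re.compile(r"^(.*)_i\d+(?:\.p\d+)?$")


def longest_per_gene(text):
    """Map gene ID -> (header, sequence) of its longest protein."""
    best = {}
    for header, seq in read_fasta(text):
        seq_id = header.split()[0]
        match = GENE_RE.match(seq_id)
        gene = match.group(1) if match else seq_id
        if gene not in best or len(seq) > len(best[gene][1]):
            best[gene] = (header, seq)
    return best


# Hypothetical, simplified records for illustration only.
example_pep = """>TRINITY_DN10_c0_g1_i1.p1 type:complete len:5
MKTA*
>TRINITY_DN10_c0_g1_i2.p1 type:complete len:8
MKTAYLL*
>TRINITY_DN11_c0_g1_i1.p1 type:complete len:4
MKT*
"""

for gene, (header, seq) in longest_per_gene(example_pep).items():
    print(gene, header.split()[0], len(seq))
```

Choosing by length is a simple heuristic, not the only reasonable one; the point is just that each gene contributes one protein to the comparison.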


Thanks for your reply! Yes, I have already assembled a FASTA transcriptome; I ran CD-HIT to remove redundancy and then ran TransDecoder. But in my case I actually don't need to run CD-HIT, right? Because TransDecoder retains only the longest one. So I think it means I can use the Trinity-assembled FASTA directly for TransDecoder and then use transcripts.fasta.transdecoder.pep for OrthoFinder, without any redundancy removal using CD-HIT.

I am not sure if my reasoning is correct, and I look forward to your suggestions! Thank you!


Ah, yes, your approach looks sound. On the other hand, it likely won't hurt to include the CD-HIT step anyway (on the transcript level, that is, before running TransDecoder).
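
To make the order of the steps concrete, here is a small Python sketch that only builds the command lines (it does not run anything). The file names and the 0.95 identity threshold are placeholders I picked for illustration, not recommendations; check each tool's help for the options that suit your data.

```python
from pathlib import Path


def pipeline_commands(trinity_fasta, proteome_dir, identity=0.95):
    """Build the commands, in order, for one species without a genome.

    Order: CD-HIT on transcripts, then TransDecoder, then OrthoFinder.
    """
    transcripts = Path(trinity_fasta)
    # Hypothetical output name for the clustered transcripts.
    clustered = transcripts.with_name(transcripts.stem + ".cdhit.fasta")
    return [
        # 1. Cluster near-identical transcripts (nucleotide level).
        ["cd-hit-est", "-i", str(transcripts), "-o", str(clustered),
         "-c", str(identity)],
        # 2. Extract long ORFs, then predict the likely coding regions.
        ["TransDecoder.LongOrfs", "-t", str(clustered)],
        ["TransDecoder.Predict", "-t", str(clustered)],
        # 3. After copying each species' *.transdecoder.pep file (plus the
        #    protein set for species C) into proteome_dir, run OrthoFinder.
        ["orthofinder", "-f", str(proteome_dir)],
    ]


for cmd in pipeline_commands("A.Trinity.fasta", "proteomes"):
    print(" ".join(cmd))
```

The first three commands would be repeated for species B; OrthoFinder is then run once on the folder that holds all three protein FASTA files.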

good luck


Great! Thank you and have a great day!! :)
