Do you agree that having as few transcripts with putative frameshifts as possible can be considered one of the quality factors in de novo transcriptome assembly? Could you please share your experience with this issue and describe the approach (tool) you use to detect frameshifts in an assembled transcriptome? Any feedback is warmly welcome.
updated 22 months ago by Ram • written 9.1 years ago by seta
Did you check my suggestion to your previous question? What about an example? If you get comments or suggestions you should follow them up before asking approximately the same question again.
Yes, I ran blastx for one of the transcripts with putative frameshifts against the nr database and checked the first 50 hits; all of them were in frame -2, so it seems there is no frameshift, am I right? For this reason, I would like to check this issue with another tool. Any suggestions?
It looks like the tool you used produces a lot of false positives; I wouldn't trust it too much. Instead, we could try to automate the blast method:
Run blastx on all transcripts
Retain all best hits or all hits with score > some threshold
Retain all hits with more than one HSP
Retain all hits where at least one HSP has different frame from the others
All queries that pass these filters are frame shift candidates
This can be implemented easily using BioPerl or BioPython.
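The filtering steps above can also be sketched in plain Python without any Bio* library. This is only a minimal sketch under stated assumptions, not the script referred to in this thread: it assumes blastx was run with tabular output that includes the query frame, for example `-outfmt "6 qseqid sseqid evalue bitscore qframe"`, and the function name `frameshift_candidates`, the e-value cutoff, and the example rows are all hypothetical. Only HSPs below the e-value cutoff are compared, which seems to match the `*` marking in the output shown further down.

```python
from collections import defaultdict

# Assumed input: blastx tabular output with these five columns, e.g. from
#   blastx -query transcripts.fasta -db nr \
#          -outfmt "6 qseqid sseqid evalue bitscore qframe"
# (file names are placeholders; the column order is an assumption of this sketch).


def frameshift_candidates(lines, max_evalue=1e-10):
    """Return {query: {subject: [frames]}} for hits whose significant HSPs
    do not all share the same query frame."""
    frames_per_hit = defaultdict(list)  # (query, subject) -> frames of significant HSPs
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        qseqid, sseqid, evalue, _bitscore, qframe = line.split("\t")[:5]
        if float(evalue) <= max_evalue:  # keep only significant HSPs
            frames_per_hit[(qseqid, sseqid)].append(int(qframe))

    candidates = defaultdict(dict)
    for (query, subject), frames in frames_per_hit.items():
        # More than one HSP, and at least one HSP in a different frame.
        # Note: this also flags strand differences (e.g. 1 vs. -1).
        if len(frames) > 1 and len(set(frames)) > 1:
            candidates[query][subject] = frames
    return dict(candidates)


# Made-up example rows, for illustration only.
example = [
    "tx1\tprotA\t1e-50\t200\t1",
    "tx1\tprotA\t1e-30\t120\t3",
    "tx1\tprotA\t5.0\t20\t-2",    # not significant, ignored
    "tx2\tprotB\t1e-40\t180\t-2",
    "tx2\tprotB\t1e-20\t90\t-2",  # same frame as the other HSP
]
print(frameshift_candidates(example))  # {'tx1': {'protA': [1, 3]}}
```

Queries returned by this function are only candidates; each one should still be inspected in the alignment before concluding that the transcript contains a real frameshift.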
Here is an example in BioPerl that you can adjust for your needs.
Output:
./filterBlastFrameShift.pl ~/Downloads/JR132X8F11N-Alignment.txt
lcl|Query_197237 has 6 hsps with the following frames: (*: e < 1e-10)
1*-3-3-2-23
Query gi|23274247|gb|BC035912.1| has 0 out of 1 hits with frameshifts
lcl|Query_197237 has 5 hsps with the following frames: (*: e < 1e-10)
3*-1-3-32
Query query_1_no_frameshift has 0 out of 1 hits with frameshifts
lcl|Query_197237 has 5 hsps with the following frames: (*: e < 1e-10)
3*-1-3-32
Query query_2_insert_no_frameshift has 0 out of 1 hits with frameshifts
lcl|Query_197237 has 6 hsps with the following frames: (*: e < 1e-10)
1*3*-1-32-1
frame mismatch 0 vs. 2
Query query_2_insert_with_frameshift has 1 out of 1 hits with frameshifts
And the blast example output with fabricated frame shift:
Yes, I agree with you about the tool. Since running blastx is really time-consuming, I was looking for another tool that could retrieve this information without blastx. Many thanks for your suggestion, though. Could you please share your BioPerl or BioPython script for evaluating the transcripts?
updated 23 months ago by Ram • written 9.1 years ago by seta
I don't have such a script. It would be easy to write given the API documentation, but not on the iPad; sorry, you'll have to wait.
BTW, as you are located in Sweden, you might have access to SweGrid or SweHPC to run the computations; see http://www.snic.vr.se/. There is a similar infrastructure here in Norway (even though I usually run blastx on a single server with 90 threads/40 cores; it took ~1 week for 40k transcripts, if I remember correctly).
Thanks so much for sharing your experience. I have access to a server with 140 GB of RAM and 32 cores, so I am a bit concerned about running blastx. Is there any command to estimate the time required to finish a blastx job?
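One rough way to get such an estimate is to time blastx on a small random subset of the transcripts and extrapolate. The sketch below is only a back-of-the-envelope calculation: it assumes runtime scales linearly with the number of transcripts and that the full run uses the same database and thread count as the timed subset. The function name `estimate_hours` and the numbers in the example are made up for illustration.

```python
def estimate_hours(sample_size, sample_seconds, total_sequences):
    """Linearly extrapolate total runtime (in hours) from a timed subset.

    Assumption: the subset is a random sample, so its length distribution
    resembles that of the full transcriptome.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    seconds_per_sequence = sample_seconds / sample_size
    return seconds_per_sequence * total_sequences / 3600.0


# Hypothetical timing: 100 transcripts took 1500 s; the full set has 40,000.
hours = estimate_hours(sample_size=100, sample_seconds=1500, total_sequences=40_000)
print(round(hours, 1))  # 166.7 hours, i.e. roughly one week
```

Because long transcripts and queries with many hits take disproportionately longer, the result is best treated as an order-of-magnitude figure.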
updated 22 months ago by Ram • written 9.0 years ago by seta
Many thanks for your script. It is really kind of you to come back to the post and reply.