Tool to detect transcripts with putative frameshifts in a de novo assembled transcriptome
9.1 years ago
seta ★ 1.9k

Hi everybody,

Do you agree that having as few transcripts with putative frameshifts as possible can be considered a quality criterion for a de novo transcriptome assembly? Could you please share your experience with this issue and describe how (and with which tool) you detect them in an assembled transcriptome? Any feedback is warmly welcome.

Assembly RNA-Seq frame-shift blast alignment • 2.5k views

Did you check my suggestion to your previous question? What about an example? If you get comments or suggestions you should follow them up before asking approximately the same question again.

Yeah, I ran blastx for one of the transcripts with putative frameshifts against the nr database and checked the first 50 hits. All of them were in frame -2, so it seems there is no frameshift, am I right? For this reason, I would like to check this issue using another tool. Any suggestions?

9.1 years ago
Michael 54k

It looks like the tool you used produces a lot of false positives; I wouldn't trust it too much. Instead, we could try to automate the blast method. Run blastx on all transcripts, then:

  • Retain all best hits or all hits with score > some threshold
  • Retain all hits with more than one HSP
  • Retain all hits where at least one HSP has a different frame from the others
  • All queries that pass these filters are frame shift candidates

This can be implemented easily using BioPerl or Biopython.
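The filter steps above can also be sketched in plain Python without any Bio* library. This is only a minimal illustration, not the BioPerl script referred to in this answer. It assumes blastx was run with `-outfmt "6 qseqid sseqid evalue bitscore qframe"` (one tab-separated line per HSP, columns in that order); the function name `frameshift_candidates`, the `1e-10` cut-off (taken from the e-value marker in the output shown in this answer) and the demo rows are all hypothetical.

```python
import csv
import io
from collections import defaultdict


def frameshift_candidates(handle, max_evalue=1e-10):
    """Return {query: {subject: sorted frames}} for hits whose
    significant HSPs fall in more than one reading frame.

    Assumes tab-separated columns: qseqid, sseqid, evalue, bitscore, qframe.
    """
    # (query, subject) -> set of frames seen among significant HSPs
    frames = defaultdict(set)
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        qseqid, sseqid, evalue, _bitscore, qframe = row[:5]
        if float(evalue) <= max_evalue:
            frames[(qseqid, sseqid)].add(int(qframe))

    candidates = defaultdict(dict)
    for (qseqid, sseqid), seen in frames.items():
        if len(seen) > 1:  # significant HSPs disagree on the frame
            candidates[qseqid][sseqid] = sorted(seen)
    return dict(candidates)


# Made-up demo rows (not real BLAST results): tx1 has two significant
# HSPs in frames 1 and 3; tx2 has two significant HSPs, both in frame 2.
demo = io.StringIO(
    "tx1\tprotA\t1e-50\t200\t1\n"
    "tx1\tprotA\t1e-30\t150\t3\n"
    "tx1\tprotA\t0.5\t30\t-2\n"
    "tx2\tprotB\t1e-40\t180\t2\n"
    "tx2\tprotB\t1e-20\t90\t2\n"
)
print(frameshift_candidates(demo))  # {'tx1': {'protA': [1, 3]}}
```

Weak HSPs are ignored before frames are compared, so a single strong HSP surrounded by low-scoring ones in other frames is not reported. Candidates found this way still need manual inspection, since significant HSPs in different frames can have other causes than a frameshift.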

Here is an example in BioPerl that you can adjust for your needs.

Output:

 ./filterBlastFrameShift.pl ~/Downloads/JR132X8F11N-Alignment.txt 
lcl|Query_197237 has 6 hsps with the following frames: (*: e < 1e-10)
1*-3-3-2-23
Query gi|23274247|gb|BC035912.1| has 0 out of 1 hits with frameshifts
lcl|Query_197237 has 5 hsps with the following frames: (*: e < 1e-10)
3*-1-3-32
Query query_1_no_frameshift has 0 out of 1 hits with frameshifts
lcl|Query_197237 has 5 hsps with the following frames: (*: e < 1e-10)
3*-1-3-32
Query query_2_insert_no_frameshift has 0 out of 1 hits with frameshifts
lcl|Query_197237 has 6 hsps with the following frames: (*: e < 1e-10)
1*3*-1-32-1
frame mismatch 0 vs. 2
Query query_2_insert_with_frameshift has 1 out of 1 hits with frameshifts

And the blast example output with fabricated frame shift:

Yeah, I agree with you about the tool. Since running blastx is really time-consuming, I was looking for another tool to retrieve this information instead of using blastx. However, many thanks for your suggestion. Could you please share your BioPerl or Biopython script to evaluate them?

I don't have such a script. It would be easy to write, given the API documentation, but not on the iPad. Sorry, you'll have to wait.

I have added an example for you to check.

BTW, as you are located in Sweden, you might have access to SweGrid or SweHPC to run the computations; see http://www.snic.vr.se/. There is a similar infrastructure here in Norway (even so, I usually run blastx on a single server with 90 threads/40 cores; it took ~1 week for 40k transcripts, if I remember correctly).

Thanks so much for sharing your experience. I have access to a server with 140 GB of RAM and 32 cores, which makes me a bit concerned about running blastx. Is there any command to estimate the time required to finish a blastx job?

Many thanks for your script. It is really kind of you to come back to the post and reply.
