Predicting Full-Length cDNA Sequences From a FASTA Sequence File
11.3 years ago
Prakki Rama ★ 2.7k

Hi,

Can I consider a sequence to be a full-length cDNA if its ORF aligns to the reference protein with ~99-100% identity and 100% coverage? The species I work on does not have a closely related reference.
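
To make the criterion concrete, here is a minimal sketch of the check, assuming the alignment statistics come from a single BLASTX hit against the reference protein. The function name, argument names and the 99% default threshold are illustrative assumptions, not part of any tool mentioned in this thread.

```python
def covers_reference(pident, sstart, send, slen, min_identity=99.0):
    """Return True if one hit spans the whole reference protein at high identity.

    pident       -- percent identity of the alignment (0-100)
    sstart, send -- 1-based alignment coordinates on the reference protein
    slen         -- length of the reference protein in residues
    min_identity -- assumed identity cut-off (99% here, adjust as needed)
    """
    first, last = min(sstart, send), max(sstart, send)
    # 100% coverage means the alignment runs from residue 1 to the last residue.
    full_coverage = first == 1 and last == slen
    return pident >= min_identity and full_coverage
```

A hit that starts at residue 5 of a 250-residue protein would be rejected by this check even at 100% identity, because the first residues of the reference are not covered.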

Could you please tell me whether there are any tools or scripts that can be run locally to predict full-length cDNA sequences? I know of tools like TargetIdentifier and Full-lengther, which run online, but they take a lot of time (for running BLASTX). (Edit: I am a bit confused by the TargetIdentifier terminology, e.g. "sense complete" and "partial"; Full-lengther does not work any more.)

I would appreciate it if someone could guide me on this issue. Please excuse me if I have not explained it properly.

Thanks in advance.

cdna • 4.0k views

As far as I understand, you want to find genes in your FASTA file? Then Glimmer is a good option.


@pappu: Thank you, but isn't Glimmer designed specifically for microbial sequences? I used AUGUSTUS for that instead, which outputs the most probable ORFs.


Did you check out the papers by Gunnar Rätsch on ORF predictions?


Do you mean mGene? My species does not have a genome to train the model on, either.

11.2 years ago
Raghul ▴ 200

Hi. There is one easy way to get predicted full-length sequences: OrfPredictor. However, OrfPredictor needs a BLASTX result. So find a closely related, well-annotated or completely sequenced genome for your dataset. Collect all of its protein sequences and create a database using standalone BLAST (the makeblastdb command). Then run blastx against this newly created protein database. Standalone BLAST will finish in a few seconds. Remember to set the parameters -num_alignments 1 and -num_descriptions 1. Uploading the result to Full-lengther will return the output in 30 minutes or so.
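
The two BLAST steps above could be sketched roughly as follows. This only assembles the command lines; the helper name, the file names and the database name are placeholders I made up for illustration, and the only options used are -in, -dbtype, -out, -query, -db, -num_alignments and -num_descriptions.

```python
import shlex


def blast_commands(protein_fasta, cdna_fasta,
                   db_name="ref_proteins", out_file="blastx.out"):
    """Build (not run) the makeblastdb and blastx command lines.

    All file and database names are hypothetical placeholders.
    """
    # Step 1: turn the collected reference proteins into a protein database.
    makedb = ["makeblastdb", "-in", protein_fasta,
              "-dbtype", "prot", "-out", db_name]
    # Step 2: search the cDNA sequences against it, keeping only the top hit.
    blastx = ["blastx", "-query", cdna_fasta, "-db", db_name,
              "-out", out_file,
              "-num_alignments", "1", "-num_descriptions", "1"]
    return makedb, blastx


def as_shell(cmd):
    """Render one command as a copy-pasteable shell line."""
    return shlex.join(cmd)
```

Each returned list can be handed to subprocess.run(cmd, check=True) once BLAST+ is installed, or printed with as_shell() and pasted into a terminal.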

This method works when you have complete sequences; if you have partial sequences you get partial CDS (naturally!). I got my results this way and they are fine. The output includes both protein and CDS. http://proteomics.ysu.edu/tools/OrfPredictor.html
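
To tell complete from partial CDS in the output, a quick check could look like the sketch below. It assumes the standard genetic code and that the CDS is given in frame on the sense strand; the function name and the labels are my own and are not OrfPredictor's terminology.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)


def classify_cds(cds):
    """Label an in-frame CDS by the presence of its start and stop codons."""
    seq = cds.upper()
    has_start = seq.startswith("ATG")
    # The final codon only counts if the sequence is a whole number of codons.
    has_stop = len(seq) >= 3 and len(seq) % 3 == 0 and seq[-3:] in STOP_CODONS
    if has_start and has_stop:
        return "complete"
    if has_stop:
        return "missing start"
    if has_start:
        return "missing stop"
    return "missing both"
```

A CDS labelled "complete" here still needs the alignment evidence from BLASTX, since an internal ATG in a truncated transcript can look like a start codon.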

Good luck,
Raghul


Thank you, Raghul. I tried a similar approach using BLAT. It helped me get the results quickly.
