9.7 years ago
halima.loulou
I have a sequence that has homologs in several species, along with a score for each homolog. For example, this is a record from the GFF file:
4592637 Beutenbergia_cavernae_DSM_12333 TILL 70731 70780 . 0 . clst_id=429;SubjectOrganism=Thermofilum_pendens_Hrk_5;SubjectScore=0.343373493975904;SubjectOrganism=Ignicoccus_hospitalis_KIN4_I;SubjectScore=0.323293172690763;SubjectOrganism=Burkholderia_pseudomallei_MSHR346;SubjectScore=0.343373493975904;SubjectOrganism=Burkholderia_mallei_SAVP1;SubjectScore=0.343373493975904;SubjectOrganism=Enterobacter_638;SubjectScore=0.343373493975904;SubjectOrganism=Rickettsia_felis_URRWXCal2;SubjectScore=0.343373493975904;SubjectOrganism=Gemmatimonas_aurantiaca_T_27;SubjectScore=0.343373493975904;SubjectOrganism=Streptomyces_coelicolor;SubjectScore=0.363453815261044;SubjectOrganism=Beutenbergia_cavernae_DSM_12333;SubjectScore=1;SubjectOrganism=Kocuria_rhizophila_DC2201;SubjectScore=0.343373493975904;SubjectOrganism=Rhodococcus_jostii_RHA1;SubjectScore=0.383534136546185;SubjectOrganism=Symbiobacterium_thermophilum_IAM14863;SubjectScore=0.363453815261044;
where:
- 4592637 => NAPP (Nucleic Acid Phylogenetic Profiling database) ID of the sequence (not a GenBank ID)
- Beutenbergia_cavernae_DSM_12333 => species name of the sequence
- TILL => type of sequence
- 70731 .. 70780 => start and end of the sequence
- clst_id=429 => ID of the cluster this sequence belongs to
- SubjectOrganism => name of a species in which the sequence has a homolog
- SubjectScore => score of the homolog in that species (Blastn score)
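Because SubjectOrganism and SubjectScore repeat within the attribute column, loading it into a plain dictionary would keep only the last pair. Below is a minimal sketch in plain Python that pairs each organism with the score that follows it and picks the best-scoring organism other than the query's own species. The function names are hypothetical, and the attribute string is a shortened copy of the record above.

```python
def parse_subject_scores(attributes):
    """Pair each SubjectOrganism with the SubjectScore that follows it."""
    pairs = []
    organism = None
    for field in attributes.strip().split(";"):
        if not field:
            continue  # skip the empty piece after the trailing ';'
        key, _, value = field.partition("=")
        if key == "SubjectOrganism":
            organism = value
        elif key == "SubjectScore" and organism is not None:
            pairs.append((organism, float(value)))
            organism = None
    return pairs


def best_other_organism(attributes, query_organism):
    """Return the (organism, score) pair with the highest score, ignoring the self-hit."""
    candidates = [(org, score)
                  for org, score in parse_subject_scores(attributes)
                  if org != query_organism]
    return max(candidates, key=lambda pair: pair[1]) if candidates else None


# Shortened attribute column taken from the record above.
attributes = (
    "clst_id=429;"
    "SubjectOrganism=Thermofilum_pendens_Hrk_5;SubjectScore=0.343373493975904;"
    "SubjectOrganism=Streptomyces_coelicolor;SubjectScore=0.363453815261044;"
    "SubjectOrganism=Beutenbergia_cavernae_DSM_12333;SubjectScore=1;"
    "SubjectOrganism=Rhodococcus_jostii_RHA1;SubjectScore=0.383534136546185;"
)

pairs = parse_subject_scores(attributes)
best = best_other_organism(attributes, "Beutenbergia_cavernae_DSM_12333")
print(best)
```

This only tells you which organism has the best score. It does not give coordinates in that organism's genome, since the record does not store them.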
I want to extract, from each SubjectOrganism, the sequence that is similar to my sequence (4592637).
How can I extract the homologous sequence from a genome using Biopython?
Your file does not contain information on where the sequences align in the other organisms. So how would one extract that?
I want to extract the sequence with the highest score...
Like I said, your file does not contain information on what the alignment is so you cannot extract the sequence because the sequence is not present in the file.
It is possible that what you want is actually something else but you just call it the "sequence".
I am trying to BLAST the sequence against the entire genome of each organism and pick the hit with the highest score.
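That approach could look roughly like the sketch below. It assumes blastn was run separately with tabular output and the default 12 columns (for example `blastn -query query.fasta -subject genome.fasta -outfmt 6`), and that the subject genome is already available as a string. The file names, function names, and toy data are hypothetical. The code is plain Python and only picks the hit with the highest bit score and cuts it out of the genome, reverse-complementing minus-strand hits (where sstart is greater than send).

```python
def reverse_complement(seq):
    """Reverse-complement a DNA string (A, C, G, T, N only)."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def best_hit(tabular_text):
    """Return the hit with the highest bit score from BLAST tabular (-outfmt 6) text."""
    best = None
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Default columns: qseqid sseqid pident length mismatch gapopen
        #                  qstart qend sstart send evalue bitscore
        hit = {
            "sseqid": cols[1],
            "sstart": int(cols[8]),
            "send": int(cols[9]),
            "bitscore": float(cols[11]),
        }
        if best is None or hit["bitscore"] > best["bitscore"]:
            best = hit
    return best


def extract_hit(genome_seq, hit):
    """Cut the aligned region out of the subject sequence (BLAST coordinates are 1-based, inclusive)."""
    start, end = hit["sstart"], hit["send"]
    if start <= end:
        return genome_seq[start - 1:end]
    # Minus-strand hit: coordinates are reversed, so take the reverse complement.
    return reverse_complement(genome_seq[end - 1:start])


# Toy data standing in for a real genome and real blastn output.
genome = "TTTTACGTACGGATCCAAAA"
blast_output = (
    "query1\tchr\t100.000\t8\t0\t0\t1\t8\t5\t12\t1e-05\t16.4\n"
    "query1\tchr\t100.000\t6\t0\t0\t1\t6\t14\t9\t0.01\t12.1\n"
)

top = best_hit(blast_output)
homolog = extract_hit(genome, top)
print(top["sseqid"], top["sstart"], top["send"], homolog)
```

With real data, the genome string could be loaded with Biopython, e.g. `str(SeqIO.read("genome.fasta", "fasta").seq)` for a single-record FASTA file, and `sseqid` would tell you which record to slice if the genome has several.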