Question

Quick help: How can I find and replace a specific nucleotide within a gene sequence?

0

6.3 years ago

mokunf • 0

So I have a

#!/usr/bin/python
from collections import defaultdict
import re, sys, random
from Bio import SeqIO

# - - - - - H E A D E R - - - - - - - - - - - - - - - - -
#
# Objectives:
# 1. Read in a sequence
# 2. Find a specific segment of that sequence
# 3. Change a letter (mutation)
# 4. Output the sequence with the mutation
#
# - - - - - U S E R V A R I A B L E S - - - - - - - -

mssg = " Search and Destroy"
genFile  = 'P1.txt'
inFile   = 'P1.txt'
inFolder = '.'
site   = ""  # what the mutation is 
outFile  = "Project1-Out.txt"
GenSeqs = defaultdict(lambda: "my own unknown" )
CT        = defaultdict(lambda: 'nada')
GeneSeqs   = defaultdict(lambda: 'nada')
CX  = defaultdict(lambda: 0)
count = defaultdict(lambda: 0)
NTs    = ['a', 't', 'g', 'c']
afreq        = defaultdict(lambda: -1.1 )
tfreq        = defaultdict(lambda: -1.1 )
gfreq        = defaultdict(lambda: -1.1 )
cfreq       = defaultdict(lambda: -1.1 )
seq = 'P1.txt'

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# - - - - - M A I N - - - - - - - - - - - - - - - - - - - -
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

print("\n\n", mssg, ". . . . ")

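number = 0       # number of FASTA records seen so far
header = None    # header of the record currently being read
with open(inFile, 'r') as IN1:
    for line in IN1:
        if re.match('>', line):
            header = line.rstrip()   # remove right white space
            GenSeqs[header] = ''     # start a new, empty sequence for this header
            number += 1
        elif header is not None:
            # a FASTA sequence is usually wrapped over several lines, so append
            # each line instead of overwriting the previous one
            GenSeqs[header] += line.rstrip()   # Dict[key] = value, key = header, value = sequence

print("There are %d gene sequences in file %s" % (number, inFile))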


for record in SeqIO.parse(inFile, 'fasta'):
    if 'a' in record.seq:
        print(record)

RNA-Seq gene sequence genome • 2.0k views

ADD COMMENT • link 6.3 years ago by mokunf • 0

5

To answer your question, you can use CRISPR.

ADD REPLY • link 6.3 years ago by theobroma22 ★ 1.2k

1

I prefer CRISPR with vim bindings.

ADD REPLY • link 6.3 years ago by kloetzl ★ 1.1k

2

I'm a big fan of cRispr, crispy, and crispl

ADD REPLY • link 6.3 years ago by WouterDeCoster 47k

1

Format your code properly so we can read it easily, and be more specific with your question. You'll get a lot more help that way.

ADD REPLY • link 6.3 years ago by jared.andrews07 ★ 16k

0

I have tidied it a fair bit to how [I believe..] it should look

ADD REPLY • link 6.3 years ago by Kevin Blighe 87k

1

Now we need to figure out what the actual question is. So much for quick help...

ADD REPLY • link 6.3 years ago by WouterDeCoster 47k

0

It does not appear that you are searching for a pattern, but rather replacing a base at a specific location. Do you need to use a Python program for this (unless this is an assignment)?

ADD REPLY • link 6.3 years ago by GenoMax 141k
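
If the goal really is to replace a base at a known location rather than to search for a pattern, plain string slicing may be enough. The following is only a minimal sketch under that assumption: `mutate_at` is a hypothetical helper (not part of Biopython or of the original script), positions are assumed to be 1-based, and position 20 is picked purely for illustration.

```python
def mutate_at(seq, position, new_base):
    """Return a copy of seq with the base at a 1-based position replaced."""
    if not 1 <= position <= len(seq):
        raise ValueError(
            "position %d is outside the sequence (length %d)" % (position, len(seq))
        )
    index = position - 1  # convert to Python's 0-based index
    # Strings are immutable, so build a new one around the changed base.
    return seq[:index] + new_base + seq[index + 1:]


# First 30 bases of the example sequence posted in this thread.
original = "atggtgcatctgaccccggaagaaaaaagc"
mutant = mutate_at(original, 20, "t")  # position 20 is an arbitrary illustration

print(original[18:21], "->", mutant[18:21])  # prints: gaa -> gta
```

The rest of the sequence is untouched, so the mutant has the same length as the original and differs from it at exactly one position.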

GenoMax · Answer 1 · 2018-01-16

I apologize for not responding sooner. I'm not sure how to enter the script that I've created so far, so here it is again. Below the code is the sequence that I'm trying to analyze. I've installed Biopython and it seems to be helpful in my analysis, though I could be mistaken as I am still new to this. Any help that's offered is greatly appreciated.

#!/usr/bin/python
from Bio import SeqIO

# Bio.Alphabet (including IUPAC) was removed in Biopython 1.78, so no alphabet
# is set here. SeqIO is a module and is only used through SeqIO.parse().

print("......... START ...........")

OUT = open('P1-Out.txt', 'w')
OUT.write('Gene_Name\tA\tC\tG\tT\tTotal_Length\tCG%\n')

counter = 0  # running total of 'a' bases across all records
for cur_record in SeqIO.parse('P1.txt', 'fasta'):
    # count nucleotides in this record...
    gene_name = cur_record.name
    A_count = cur_record.seq.count('a')
    C_count = cur_record.seq.count('c')
    G_count = cur_record.seq.count('g')
    T_count = cur_record.seq.count('t')
    length = len(cur_record.seq)
    cg_percentage = 100 * float(C_count + G_count) / length

    print("A_Count %d" % A_count)
    print("C_Count %d" % C_count)
    print("G_Count %d" % G_count)
    print("T_Count %d" % T_count)
    print("CG_Percentage %d" % cg_percentage)

    counter += A_count

    # write one row per record, inside the loop, so every record is reported
    OUT.write('%s\t%i\t%i\t%i\t%i\t%i\t%f\n' % (gene_name, A_count, C_count, G_count, T_count, length,
              cg_percentage))

OUT.close()
print("............. FINISH .............")

Example sequence

>reverse translation of sp|P68871|HBB_HUMAN Hemoglobin subunit beta OS=Homo sapiens GN=HBB PE=1 SV=2 to a 441 base sequence of most likely codons.
atggtgcatctgaccccggaagaaaaaagcgcggtgaccgcgctgtggggcaaagtgaac
gtggatgaagtgggcggcgaagcgctgggccgcctgctggtggtgtatccgtggacccag
cgcttttttgaaagctttggcgatctgagcaccccggatgcggtgatgggcaacccgaaa
gtgaaagcgcatggcaaaaaagtgctgggcgcgtttagcgatggcctggcgcatctggat
aacctgaaaggcacctttgcgaccctgagcgaactgcattgcgataaactgcatgtggat
ccggaaaactttcgcctgctgggcaacgtgctggtgtgcgtgctggcgcatcattttggc
aaagaatttaccccgccggtgcaggcggcgtatcagaaagtggtggcgggcgtggcgaac
gcgctggcgcataaatatcat
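
The four objectives listed in the question (read a sequence, find a segment, change a letter, output the result) could be sketched roughly as follows. This is a standard-library-only illustration, not the poster's method and not a Biopython API: `read_fasta`, `mutate_segment` and `write_fasta` are hypothetical helpers, the record name `HBB_HUMAN_demo` is a shortened stand-in for the real header, and the segment `ccggaagaa` with offset 4 is chosen only as an example. An in-memory `io.StringIO` stands in for `P1.txt` so the sketch runs without any files.

```python
import io


def read_fasta(handle):
    """Parse FASTA text into {header: sequence}, joining wrapped lines."""
    records = {}
    header = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


def mutate_segment(seq, segment, offset, new_base):
    """Find the first occurrence of segment and replace the base at a
    0-based offset inside it. Returns (new_sequence, absolute_index)."""
    start = seq.find(segment)
    if start == -1:
        raise ValueError("segment %r not found" % segment)
    if not 0 <= offset < len(segment):
        raise ValueError("offset must fall inside the segment")
    index = start + offset
    return seq[:index] + new_base + seq[index + 1:], index


def write_fasta(records, handle, width=60):
    """Write records as FASTA, wrapping sequence lines at `width` bases."""
    for header, seq in records.items():
        handle.write(">%s\n" % header)
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")


# Stand-in for P1.txt: the first two sequence lines of the example record.
fasta_text = """>HBB_HUMAN_demo
atggtgcatctgaccccggaagaaaaaagcgcggtgaccgcgctgtggggcaaagtgaac
gtggatgaagtgggcggcgaagcgctgggccgcctgctggtggtgtatccgtggacccag
"""

records = read_fasta(io.StringIO(fasta_text))           # 1. read
seq = records["HBB_HUMAN_demo"]
mutant, index = mutate_segment(seq, "ccggaagaa", 4, "t")  # 2. find, 3. change
records["HBB_HUMAN_demo"] = mutant

out = io.StringIO()
write_fasta(records, out)                                # 4. output
print("changed base at 1-based position", index + 1)
print(out.getvalue())
```

To work on real files, the two `io.StringIO` objects could be swapped for `open('P1.txt')` and `open('Project1-Out.txt', 'w')`. Note that `str.find` only reports the first match, so a segment that occurs more than once would need extra handling.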