Insillico dual restriction enzyme reference genome digestion.
4
1
Entering edit mode
7.8 years ago
William ★ 5.3k

I am looking for a tool that can do a insillico digestion of a reference genome given 2 restriction enzymes (or the pattern at which both cut).

The output that I would like to have is a BED file with the start and end position on the reference genome of all the produced RE DNA fragments. (or only those that are larger than a specific lenght).

The tool should take into account that both the forward and reverse DNA strand could be cut.

Is there any such tool that I can download?

ddradseq insillico • 2.9k views
ADD COMMENT
0
Entering edit mode

You can try restrict from EMBOSS. You would need to make the BED format file from the output of restrict.

ADD REPLY
0
Entering edit mode
ADD REPLY
1
Entering edit mode
7.8 years ago
William ★ 5.3k

I found the R package SimRad which

provides a number functions to simulate restriction enzyme digestion, library construction and fragments size selection.

https://cran.r-project.org/web/packages/SimRAD/SimRAD.pdf

A nice feature is that a filter step can also be done on restriction site combination to only select fragments that start with enzyme A restriction site and end with enzyme B restriction site.

adapt.select(sequences, type = "AB+BA", cut_site_5prime1, cut_site_3prime1,
cut_site_5prime2, cut_site_3prime2)

The output is the DNA fragments produced by the digestion and size selection.

The output does not contain were the fragments are located on the genome.

To create the BED file I need to BLAST the fragments versus the reference genome I guess.

ADD COMMENT
2
Entering edit mode

In this case, Blast will be complete overkill. All your fragments are EXACT substrings of your ref sequence. Thus you can find their location with any scripting language which offers a string.find() method.

ADD REPLY
0
Entering edit mode
7.8 years ago

This is a hack, but it might work. I wrote a script to find regular expressions in fasta files here fastaRegexFinder.

To find all the restriction fragments from two enzymes, run fastaRegexFinder.py for each combination of enzyme 1 and enzyme 2 in forward and reverse (= 16 times). E.g. say restriction sites are TTCC and ACTG, do:

res="TTCC GGAA ACTG CAGT"
for r1 in $res
do
    for r2 in $res
    do
    echo "$r1 and $r2"
    fastaRegexFinder.py -q --noreverse -f test_data/synth.fa -r $r1.*?(?=$r2)
    done
done

This compiles and runs the following commands:

fastaRegexFinder.py -q --noreverse -f test_data/synth.fa -r TTCC.*?(?=TTCC)
fastaRegexFinder.py -q --noreverse -f test_data/synth.fa -r TTCC.*?(?=GGAA)
fastaRegexFinder.py -q --noreverse -f test_data/synth.fa -r TTCC.*?(?=ACTG)
fastaRegexFinder.py -q --noreverse -f test_data/synth.fa -r TTCC.*?(?=CAGT)
fastaRegexFinder.py -q --noreverse -f test_data/synth.fa -r GGAA.*?(?=TTCC)
fastaRegexFinder.py -q --noreverse -f test_data/synth.fa -r GGAA.*?(?=GGAA)
fastaRegexFinder.py -q --noreverse -f test_data/synth.fa -r GGAA.*?(?=ACTG)
fastaRegexFinder.py -q --noreverse -f test_data/synth.fa -r GGAA.*?(?=CAGT)
fastaRegexFinder.py -q --noreverse -f test_data/synth.fa -r ACTG.*?(?=TTCC)
fastaRegexFinder.py -q --noreverse -f test_data/synth.fa -r ACTG.*?(?=GGAA)
fastaRegexFinder.py -q --noreverse -f test_data/synth.fa -r ACTG.*?(?=ACTG)
fastaRegexFinder.py -q --noreverse -f test_data/synth.fa -r ACTG.*?(?=CAGT)
fastaRegexFinder.py -q --noreverse -f test_data/synth.fa -r CAGT.*?(?=TTCC)
fastaRegexFinder.py -q --noreverse -f test_data/synth.fa -r CAGT.*?(?=GGAA)
fastaRegexFinder.py -q --noreverse -f test_data/synth.fa -r CAGT.*?(?=ACTG)
fastaRegexFinder.py -q --noreverse -f test_data/synth.fa -r CAGT.*?(?=CAGT)

The output (to stdout by default) will be in bed format.

ADD COMMENT
0
Entering edit mode
5.8 years ago
c_u ▴ 520

Another good option is a utility called digest_genome(.py) that comes with HicPro. It can digest the reference genome by the provided restriction enzymes(s) and generate a BED file with the list of restriction fragments after digestion. You can specify multiple restriction enzymes too. here is the link

ADD COMMENT
0
Entering edit mode
5 months ago
tshtatland ▴ 190

Use EMBOSS: restrict. From the docs:

Report restriction enzyme cleavage sites in a nucleotide sequence

You have to write a trivial parser to convert the output to bed format.

ADD COMMENT

Login before adding your answer.

Traffic: 1816 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6