How do I download a portion of a random bacterial genome?
6.6 years ago
13dsc • 0

I want to download a snippet, roughly 20 kb to 60 kb long, of a random bacterial genome. How would I go about choosing the genome at random? I think that once I understand how to pick one genome from the collection of available bacteria, I should be able to figure out how to take a random portion of it.

Cheers, Dragos

python biopython

Is it possible to download a random set of proteins? (fasta files)
Alternatively, you could probably use a random number generator together with NCBI E-utilities.
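The comment above suggests combining a random number generator with E-utilities. Below is a minimal sketch of that idea, under some assumptions:

- It uses the standard library (`urllib`) rather than Biopython's `Bio.Entrez`, so it has no dependencies.
- The search term in `TERM` is a hypothetical query, not an official "all bacterial genomes" filter.
- The `Length` field is read from the classic esummary XML for nuccore. Check it against a live response before relying on it.

The sketch first asks ESearch for the hit count and picks a random `retstart` offset to get one ID. It then uses EFetch's `seq_start`/`seq_stop` parameters so that only the chosen 20-60 kb window is downloaded.

```python
"""Sketch: random nuccore record via NCBI E-utilities, then a 20-60 kb window."""
import random
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"
# Hypothetical query (an assumption): RefSeq bacterial "complete genome" records.
TERM = 'bacteria[organism] AND "complete genome"[title] AND refseq[filter]'


def eutils_url(tool, **params):
    """Build an E-utilities URL, e.g. eutils_url("esearch.fcgi", db="nuccore")."""
    return EUTILS + tool + "?" + urllib.parse.urlencode(params)


def random_window(seq_len, rng, min_len=20_000, max_len=60_000):
    """Return a 1-based, inclusive (start, stop) window that fits in seq_len."""
    if seq_len < min_len:
        raise ValueError(f"sequence of {seq_len} bp is shorter than {min_len} bp")
    length = rng.randint(min_len, min(max_len, seq_len))
    start = rng.randint(1, seq_len - length + 1)
    return start, start + length - 1


def _get_xml(url):
    """Download a URL and parse the response body as XML."""
    with urllib.request.urlopen(url) as resp:
        return ET.fromstring(resp.read())


def random_snippet(term=TERM, rng=None):
    """Return FASTA text for a random window of a random matching record."""
    rng = rng or random.Random()
    # 1) Total number of matching records (retmax=0 returns only the count).
    root = _get_xml(eutils_url("esearch.fcgi", db="nuccore", term=term, retmax=0))
    count = int(root.findtext("Count"))
    if count == 0:
        raise RuntimeError("search returned no records")
    # 2) One ID taken at a random offset into the result list.
    root = _get_xml(eutils_url("esearch.fcgi", db="nuccore", term=term,
                               retstart=rng.randrange(count), retmax=1))
    uid = root.findtext("IdList/Id")
    # 3) Record length from the document summary (assumed "Length" item).
    summary = _get_xml(eutils_url("esummary.fcgi", db="nuccore", id=uid))
    seq_len = int(summary.findtext(".//Item[@Name='Length']"))
    # 4) Fetch only the chosen window as FASTA.
    start, stop = random_window(seq_len, rng)
    url = eutils_url("efetch.fcgi", db="nuccore", id=uid, rettype="fasta",
                     retmode="text", seq_start=start, seq_stop=stop)
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode()
```

Calling `random_snippet()` makes four requests. NCBI asks heavy users to pass `tool`/`email` parameters or an API key, so add those to `eutils_url` calls if you run this in a loop.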


Once you have a bacterial genome, you can grab a random piece of it with BBMap like this:

mutate.sh in=bacteria.fa out=snippet.fa fraction=0.01

That yields a piece of the genome whose length is 1% of the original genome size (about 50 kb for a 5 Mb genome). The tool can also add SNPs and indels if you want them. If you only want one piece, from the primary chromosome, add the flag "reads=1" to stop processing after the first contig; otherwise it will give you a random 1% of every contig.
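If BBMap isn't installed, the same idea can be sketched in plain Python. This is not BBMap's implementation and adds no SNPs or indels. It reads only the first record of a FASTA file, mirroring what the comment says "reads=1" does, and returns one random substring whose length is a fraction of that record. The function names are hypothetical helpers for illustration.

```python
"""Sketch: one random piece (a fraction of its length) from the first FASTA record."""
import random


def read_first_fasta_record(lines):
    """Return (header, sequence) for the first record in an iterable of lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                break  # stop at the second record (first contig only)
            header = line[1:]
        elif header is not None:
            chunks.append(line)
    if header is None:
        raise ValueError("no FASTA record found")
    return header, "".join(chunks)


def random_piece(seq, fraction=0.01, rng=None):
    """Return (start, piece): a random substring of int(len(seq) * fraction) bases.

    start is a 0-based offset into seq.
    """
    if not seq:
        raise ValueError("empty sequence")
    rng = rng or random.Random()
    length = max(1, int(len(seq) * fraction))
    start = rng.randrange(len(seq) - length + 1)
    return start, seq[start:start + length]
```

For example:

1. Open `bacteria.fa` and pass the file handle to `read_first_fasta_record`.
2. Call `random_piece(seq, fraction=0.01)` on the returned sequence.

For a 5,000,000 bp chromosome this gives a 50,000 bp piece.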

6.6 years ago
5heikki 11k

This writes a random bacterial genome (picked from the RefSeq bacteria assembly summary) to stdout.

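# Needs bash (process substitution). Column 20 of assembly_summary.txt is ftp_path;
# header lines start with "#", and rows without a download path contain "na".
awk -F'\t' '!/^#/ && $20 != "na" {print $20}' \
    <(wget -qO- ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteria/assembly_summary.txt) \
    | shuf -n1 \
    | awk 'BEGIN{OFS=FS="/"}{print $0,$NF"_genomic.fna.gz"}' \
    | wget -i - -qO- \
    | gunzip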
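For anyone who prefers to stay in Python (the question is tagged python), here is a rough stdlib-only equivalent of the pipeline above. It is a sketch, not the answerer's code. It assumes that:

- `ftp_path` is column 20 of the summary file;
- header lines start with "#";
- the genome file is named `<ftp_path>/<last path component>_genomic.fna.gz`, as the second awk call builds it.

It reads the whole summary into memory, which is large for RefSeq bacteria.

```python
"""Sketch: random RefSeq bacterial assembly -> decompressed genomic FASTA text."""
import gzip
import random
import urllib.request

SUMMARY_URL = "ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteria/assembly_summary.txt"
FTP_PATH_COL = 19  # 0-based index of column 20 (ftp_path)


def ftp_paths(summary_lines):
    """Yield ftp_path for every assembly row, skipping '#' headers and 'na'."""
    for line in summary_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) > FTP_PATH_COL and fields[FTP_PATH_COL] != "na":
            yield fields[FTP_PATH_COL]


def genomic_fasta_url(ftp_path):
    """Build <ftp_path>/<basename>_genomic.fna.gz, like the second awk call."""
    base = ftp_path.rstrip("/")
    return f"{base}/{base.rsplit('/', 1)[-1]}_genomic.fna.gz"


def random_genome(rng=None):
    """Download one random assembly and return its FASTA as text."""
    rng = rng or random.Random()
    with urllib.request.urlopen(SUMMARY_URL) as resp:
        lines = resp.read().decode().splitlines()
    url = genomic_fasta_url(rng.choice(list(ftp_paths(lines))))
    with urllib.request.urlopen(url) as resp:
        return gzip.decompress(resp.read()).decode()
```

The text returned by `random_genome()` can then be split into lines and cut down to a random piece, as the comment below suggests doing with Brian's command.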

@Brian's solution can be tacked on at the end to get a random piece of that random genome :)
