Making a haploid genome
1
1
Entering edit mode
8.8 years ago
Jautis ▴ 530

Hi, I have a reference genome (fasta) and list of genotypes (vcf) for one individual. I want to make a haploid genome file by randomly selecting one of the individual's alleles at each variable site. I would use GATK's alternate reference maker, but I do not want to bias my choice to either the reference or non-reference allele. Anybody know how I can do this?

Thank you!

vcf haploid-genome variant • 2.2k views
ADD COMMENT
3
Entering edit mode
8.8 years ago
JC 13k
#!/usr/bin/perl

use strict;
use warnings;
use autodie;

$ARGV[2] or die "use newGenome.pl <VCF> <FASTA> <OUTPUT>\n";

my $vcf = shift @ARGV;
my $fas = shift @ARGV;
my $out = shift @ARGV;

# load sequences
my %seq;
$/ = "\n>";
open (my $fh, "<", $fas);
while (<$fh>) {
    s/>//g;
    my ($id, @seq) = split (/\n/, $_);
    $seq{$id} = join "", @seq;
}
close $fh;

# parse VCF
$/ = "\n";
open (my $vh, "<", $vcf);
while (<$vh>) {
    next if (m/^#/);
    chomp;
    my ($chr, $pos, $id, $ref, $alt, $qual, $fil, $info) = split (/\t/, $_);
    next unless ($alt =~ m/^[ACGT]$/); # only valid SNPs
    my $afrq = 0.5; # in case of missing values use a 50% chance
    if ($info =~ m/AF=(0\.\d+)/) {
        $afrq = $1;
    }

    my $prob = rand();
    if ($prob < $afrq) {
        substr ($seq{$chr}, $pos - 1, 1) = $alt);
        warn "modified $chr:$pos $ref -> $alt\n";
    }
}
close $vh;

# Write modified sequences
open (my $oh, ">", $out);
while (my ($id, $seq) = each %seq) {
    print $oh ">$id\n";
    while ($seq) {
        print $oh substr ($seq, 0, 80), "\n";
        substr ($seq, 0, 80) = '';
    }
}
close $oh;

* Cold night, procastinating work, perl lover

ADD COMMENT
0
Entering edit mode

Very useful! Will you teach me the ways of the force?

ADD REPLY
0
Entering edit mode

You don't need a teacher, just start coding to solve problems like this.

ADD REPLY

Login before adding your answer.

Traffic: 2790 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6