Question

Beagle files using the latest 1000 genomes

0

8.4 years ago

jamespoweraid2 • 0

Hi,

I would like to get the latest BEAGLE files built from the phase 3 1000 Genomes VCF files (2504 unrelated individuals). The BEAGLE reference panel is here:

http://bochet.gcc.biostat.washington.edu/beagle/1000_Genomes_phase3_v5a/, which uses these 1000 Genomes vcf files: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/

In particular, I am trying to create files like those that were available for the previous releases:

ALL.chr1.phase1_release_v2.20101123.filt.bgl.gz
ALL.chr16.phase1_release_v2.20101123.filt.tabix.gz
ALL.chr1.phase1_release_v2.20101123.filt.markers

Would I need to use the script here with the BEAGLE utilities?

https://data.broadinstitute.org/srlab/BEAGLE/1kG-beagle-release3/READ_ME_beagle_phase1_v3

Thank you so much for any advice on how to get these files in BEAGLE format; it would be very much appreciated.

1000Genomes • 3.5k views

Ram · Accepted Answer · 2015-12-02

3

8.4 years ago

Kamil ★ 2.3k

Use the BEAGLE tools to change the file format. Here's an example that should get you started:

wget https://faculty.washington.edu/browning/beagle/bref.09Nov15.d2a.jar
wget http://bochet.gcc.biostat.washington.edu/beagle/1000_Genomes_phase3_v5a/individual_chromosomes/chr22.1kg.phase3.v5a.bref
java -jar bref.09Nov15.d2a.jar chr22.1kg.phase3.v5a.bref | gzip > chr22.1kg.phase3.v5a.vcf.gz
zcat chr22.1kg.phase3.v5a.vcf.gz | head -n6 | cut -c1-100 | grep -v '^#' | perl -ane 'print join("\t",@F[0..4]),"\t"; $i=0; foreach $G (@F[9..$#F]) { @A = split("\\|", $G, 2); print " " if $i++; print $F[3+$A[0]]," ",$F[3+$A[1]]; }; print "\n"'

Output

22    16050115    rs587755077    G    A    G G G G G G G G G G G G G G G G G G G G G G G G G G G G G G
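
If the goal is a Beagle 3 style .markers file, one possible starting point is to derive it from the converted VCF. This is only a sketch: it assumes the Beagle 3 markers layout (marker ID, position, then the alleles, whitespace separated), the function name vcf_to_markers is made up for illustration, and I have not checked that the result matches the old .filt.markers files or that downstream tools accept it.

```shell
# Hypothetical helper: read VCF text on stdin, write one markers line per variant.
# Assumed output layout (Beagle 3 style): ID POSITION ALLELE1 ALLELE2 [...]
vcf_to_markers() {
  awk -F'\t' '
    /^#/ { next }                       # skip VCF header lines
    {
      id = $3
      if (id == ".") id = $1 ":" $2     # no rsID: fall back to chrom:pos
      n = split($5, alt, ",")           # ALT may list several alleles
      line = id " " $2 " " $4
      for (i = 1; i <= n; i++) line = line " " alt[i]
      print line
    }'
}
```

It could then be used on the file produced above (the output filename here is just an example):

zcat chr22.1kg.phase3.v5a.vcf.gz | vcf_to_markers > chr22.1kg.phase3.v5a.markers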

0

Thank you very much Kamil!

Would this give me the same files as running the script linked below? I am trying to get the .filt.bgl.gz, .filt.tabix.gz, and .filt.markers files so that I can run EPIGWAS.

https://data.broadinstitute.org/srlab/BEAGLE/1kG-beagle-release3/READ_ME_beagle_phase1_v3

But using this version of the genome instead?

wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL*
wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/phase1*
wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/README*

Thanks again!!

Reply updated 4.4 years ago by Ram 43k • written 8.4 years ago by jamespoweraid2 • 0

0

For filtered variants, you might consider taking the files from the BEAGLE website instead of the 1000 Genomes website. The developer of BEAGLE filtered the variants from 1000 Genomes.

Reply written 8.4 years ago by Kamil ★ 2.3k