loadling bcf2 file in plink
1
0
Entering edit mode
6.6 years ago
janhuang.cn ▴ 210

I am using bcftools to save a bcf2 file (chr22.bcf2) for a later analysis using plink.

bcftools view --samples-file sample.txt --force-samples --types snps --output-type b chr22_genotypes.vcf.gz | bcftools norm -Ob -m+any  | bcftools norm -Ob --fasta-ref human_g1k_v37.fasta | bcftools annotate --output-type b --output chr22.bcf2  -I '%CHROM:%POS:%ID'

But when I load this file into plink, invalid chromosome was found.

plink --bcf chr22.bcf2 --make-bed --out chr22

Error: Invalid chromosome code 'hs37d5' in .bcf file.

This error was not returned when I save the file as vcf and load it with plink. And when I checked which line has a chromosome code of "hs37d5", the below message was returned.

bcftools view  chr22.bcf2  | grep -e "hs37d5" | cut -f1-7

##reference=ftp://ftp.1000genomes.ebi.ac.uk//vol1/ftp/technical/reference/phase2                 _reference_assembly_sequence/hs37d5.fa.gz
##contig=<ID=hs37d5,assembly=b37,length=35477943>

Are these the header lines? Can exclude them when loading with plink? or exclude them when saving as bcf2?

plink bcf • 2.9k views
ADD COMMENT
1
Entering edit mode
6.6 years ago

I was able to replicate your error and also prove that it works when it's a VCF (but not BCF).

Error: Invalid chromosome code 'hs37d5' in .bcf file.

(Use --allow-extra-chr to force it to be accepted.)

The contig parameter in the VCF header just lists all of the contigs/chromosomes that should be present in your file. You can safely remove it manually.

Afer removing it, if you convert your file back to vcf with bcftools view -Ov chr22.bcf2 > chr22.vcf and then index with bgzip chr22.vcf followed by tabix -p chr22.vcf.gz, VCFtools will automatically add new contig tags for whatever contigs are present. In my example, ##contig= < ID=22 > is added.

I then convert this back to BCF and index it:

bcftools view -Ob chr22.vcf.gz > chr22.bcf
bcftools index chr22.bcf

Once you do that, it reads into PLINK:

/Programs/plink1.90/plink --bcf chr22.bcf2 --make-bed --out chr22
PLINK v1.90b3.38 64-bit (7 Jun 2016)       https://www.cog-genomics.org/plink2
(C) 2005-2016 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to chr22.log.
Options in effect:
  --bcf chr22.bcf2
  --make-bed
  --out chr22

15037 MB RAM detected; reserving 7518 MB for main workspace.
--bcf: chr22-temporary.bed + chr22-temporary.bim + chr22-temporary.fam written.
12 variants loaded from .bim file.
2504 people (0 males, 0 females, 2504 ambiguous) loaded from .fam.
Ambiguous sex IDs written to chr22.nosex .
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 2504 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is 0.999434.
12 variants and 2504 people pass filters and QC.
Note: No phenotypes present.
--make-bed to chr22.bed + chr22.bim + chr22.fam ... done.
ADD COMMENT
0
Entering edit mode

I am not sure about the steps you mentioned.

My original data is chr22_genotypes.vcf.gz. Do you mean I can remove the contig parameter in chr22_genotypes.vcf.gz and save as chr22.vcf And then apply bgzip chr22.vcf | tabix -p chr22.vcf.gz to chr22.vcf?

And then apply this command to convert to bcf?

bcftools view -Ob chr22.vcf > chr22.bcf
bcftools index chr22.bcf

Also, do you mean remove the contig parameter using bcftools? Could you suggest the command I should be using?

ADD REPLY
0
Entering edit mode

Hey, no, I only convert the BCF (binary compressed) back to VCF (plain text) so that I can manually remove the line using the vi editor (or any other text editor). This is a perfectly reasonable thing to do, provided that you understand.

ADD REPLY
1
Entering edit mode

I see. Thank you very much.

ADD REPLY

Login before adding your answer.

Traffic: 1473 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6