Cannot merge BCF files with `bcftools` because "Index required, expected .vcf.gz or .bcf file"?
7.0 years ago
jespinoz ▴ 20

I can't merge my BCF files using bcftools. The details of my pipeline are below. After running the pipeline, I created a subdirectory containing 2 *.bcf files to try merging them as a test set, but it fails.

My commands to merge 2 *.bcf files

# Directory contents
-bash-4.1$ cd bcf_files/testing/
-bash-4.1$ ls
S-1409-57.B_RD1.kneaddata.paired.human.bowtie2.R1-R2.sam.bam.sorted.bcf  S-1410-81.A_RD1.kneaddata.paired.human.bowtie2.R1-R2.sam.bam.sorted.bcf   

# Attempting to merge 2 bcf files
-bash-4.1$ bcftools view S-1409-57.B_RD1.kneaddata.paired.human.bowtie2.R1-R2.sam.bam.sorted.bcf  S-1410-81.A_RD1.kneaddata.paired.human.bowtie2.R1-R2.sam.bam.sorted.bcf > testing.merged.bcf

# Error below
Index required, expected .vcf.gz or .bcf file: S-1409-57.B_RD1.kneaddata.paired.human.bowtie2.R1-R2.sam.bam.sorted.bcf
Failed to open or the file not indexed: S-1409-57.B_RD1.kneaddata.paired.human.bowtie2.R1-R2.sam.bam.sorted.bcf

I tried indexing them, but that fails as well:

$ bcftools index S-1409-57.B_RD1.kneaddata.paired.human.bowtie2.R1-R2.sam.bam.sorted.bcf
[E::main_vcfindex] bcf_index_build failed for S-1409-57.B_RD1.kneaddata.paired.human.bowtie2.R1-R2.sam.bam.sorted.bcf

My pipeline: I have 88 samples whose reads total about 746 G.

I used HISAT2 to map against the human assembly hg38, with the prebuilt index that HISAT2 supplies at ftp://ftp.ccb.jhu.edu/pub/infphilo/hisat2/data/grch38.tar.gz

The genome assembly used for that index was retrieved from ftp://ftp.ensembl.org/pub/release-84/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz

Create the sam file

hisat2 -q -p 2 --fast -x ./grch38/genome -1 {r1_path} -2 {r2_path} -S ./sam_files/{lib}.kneaddata.paired.human.bowtie2.R1-R2.sam

Sam => Sorted-bam

samtools view -bS ./sam_files/{lib}.kneaddata.paired.human.bowtie2.R1-R2.sam | samtools sort -@ 16 -o ./sorted_bam_files/{lib}.kneaddata.paired.human.bowtie2.R1-R2.sam.bam.sorted

Sorted-bam => BCF

samtools mpileup -uf ./grch38/Homo_sapiens.GRCh38.dna.primary_assembly.fa -C 50 --BCF -o ./bcf_files/{lib}.kneaddata.paired.human.bowtie2.R1-R2.sam.bam.sorted.bcf ./sorted_bam_files/{lib}.kneaddata.paired.human.bowtie2.R1-R2.sam.bam.sorted

$ du -sh *
5.8T    bcf_files
8.5G    grch38
4.7G    grch38.tar.gz
435K    reads
34K     run_tmp.sh
176G    sam_files
38G     sorted_bam_files
bcf vcf merge index snps • 4.1k views
6.9 years ago
jespinoz ▴ 20

The BCF files weren't generated correctly for some reason, so I converted each one to VCF with bcftools view, compressed it with bgzip, and then indexed the compressed file with bcftools index.

bcftools view ./bcf_files/1054.2_RD1.kneaddata.paired.human.bowtie2.R1-R2.sam.bam.sorted.bcf | bgzip -c > ./vcf_bgz_files/1054.2_RD1.kneaddata.paired.human.bowtie2.R1-R2.sam.bam.sorted.vcf.bgz
bcftools index ./vcf_bgz_files/1054.2_RD1.kneaddata.paired.human.bowtie2.R1-R2.sam.bam.sorted.vcf.bgz
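
To apply the same convert, compress, and index steps to every sample, a loop along these lines might work. This is only a sketch: the function names (`vcf_bgz_path`, `convert_and_index`, `convert_all`) are hypothetical helpers made up for illustration, the directory names are taken from the command above, and it assumes `bcftools` and `bgzip` are on the PATH.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not part of bcftools.

# Map an input .bcf path to the .vcf.bgz path used in the command above.
vcf_bgz_path() {
    local bcf="$1" out_dir="$2"
    local base
    base="$(basename "$bcf" .bcf)"
    printf '%s/%s.vcf.bgz\n' "$out_dir" "$base"
}

# Convert one BCF to bgzip-compressed VCF, then index the result.
# Runs in a subshell so pipefail does not leak into the caller's shell.
convert_and_index() (
    set -o pipefail
    bcf="$1"
    out_dir="$2"
    out="$(vcf_bgz_path "$bcf" "$out_dir")"
    mkdir -p "$out_dir" || exit 1
    bcftools view "$bcf" | bgzip -c > "$out" && bcftools index "$out"
)

# Convert every *.bcf file in a directory, stopping at the first failure.
convert_all() {
    local in_dir="$1" out_dir="$2" bcf
    for bcf in "$in_dir"/*.bcf; do
        [ -e "$bcf" ] || continue   # glob matched nothing
        convert_and_index "$bcf" "$out_dir" || {
            echo "failed: $bcf" >&2
            return 1
        }
    done
}
```

After sourcing the block, `convert_all ./bcf_files ./vcf_bgz_files` would process the whole directory. Note also that `bcftools view` takes a single input file and treats any further positional arguments as regions, which is probably why the original command demanded an index. Combining several indexed files is what `bcftools merge` is for, e.g. `bcftools merge -O b -o testing.merged.bcf a.vcf.bgz b.vcf.bgz` (file names here are placeholders).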
