Question

Generate VCF from different .bam files with different chromosome names

0

Entering edit mode

4.2 years ago

evoclive • 0

I have two resources of .bam files. One is generated by our lab (1 sample = 1 bam). One is downloaded online (again 1 sample = 1 bam). For the downloaded samples the chromosomes are labelled: chr1, chr2, chr3 etc For our lab samples, the chromosomes are labelled: 1, 2, 3 etc. I want to generate a single VCF file of variants across all samples.

I'm using bcftools: bcftools mpileup -Ov -f ref.fasta -b samples.txt | bcftools call -mv -o bamMge.vcf

However, I get no calls and the repeated error: [E::faidx_adjust_position] The sequence "1" was not found

I have two questions:

Is my strategy correct (i.e. bcftools mpileup)? Do I need to incorporate extra steps (NB I also have matching gVCF files)
How can I either (i) alter the chromosome labeling of one of the subsets of bam files, or, (ii) use a mapping file to match chromosome labels during the mpileup run?

Thanks

bcftools mpileup vcf • 1.8k views

ADD COMMENT • link updated 4 months ago by Ram 43k • written 4.2 years ago by evoclive • 0

0

Entering edit mode

I have an associated question. I am creating a vcf from 128 bams. All of the bams were aligned to the same reference genome, but the reference genome and paths are different in some of the bams. Furthermore, some of the bams were created with bwa-mem and others were created with bwa-mem2. Will these slight differences in bam creation affect my vcf?

For example the bam headers are slightly different:

BAM1

@PG ID:bwa-mem2 PN:bwa-mem2 VN:2.2.1 CL:bwa-mem2 mem -t 8 /path/to/indexed/ref/genome /path/to/1.fq.gz /path/to/2.fq.gz
@PG ID:samtools PN:samtools PP:bwa-mem2 VN:1.10 CL:samtools view -H BAM1.bam

BAM2

@PG ID:bwa PN:bwa VN:0.7.17-r1188 CL:bwa mem -t 20 /path/to/nonindexed/genome/different_name.fasta /path/to/combined.fq
@PG ID:samtools PN:samtools PP:bwa VN:1.10 CL:samtools sort -@20 -0 bam -o /path/to/BAM1.bam
@PG ID:samtools.1 PN:samtools PP:samtools VN:1.10 CL:samtools view -H BAM1.bam

The contig names of the genome files are the same! Any help would be greatly appreciated.

ADD REPLY • link updated 4 months ago by Ram 43k • written 4 months ago by emily.c.giles • 0

0

Entering edit mode

Ask a new question and maybe reference this post in that question.

ADD REPLY • link 4 months ago by Ram 43k

score 0 · Answer 1 · 2020-03-25

The contig names in your BAM files should match those in your file, ref.fasta; so, you need to solve this issue. The question that worries me is this: were the BAMs produced by aligning reads to different reference genomes? If this is the case, which seems to be the likely scenario, you should re-do the alignment to ensure that the data is aligned to the same genome.

Alternatively, try to determine the respective genomes to which your BAMs were aligned, and use these respective genomes for bcftools mpileup. You can almost certainly find out the reference genome used by simply looking at the information in the BAM headers, via:

bcftools view -h my.bam

Kevin