How to do a pairwise alignment and convert the result to a VCF file?
1
0
Entering edit mode
3.0 years ago
arturo.marin ▴ 20

Hi,

I need to do a lot of pairwise alignments and obtain a VCF file of each alignment. The purpose of this is to find the mutations between different ORFs of a reference genome and this ORFs in other genomes (one by one). Is there some combination of software that can do this? I am not sure that this can be done with MAFFT or muscle, since they are multiple aligner, and I am not sure that this can be done with a mapper like minimap2 or BWA, since mappers works with reads.

Maybe I could use MAFFT and build a parser in Python to see the mutations in each alignment, but maybe there is a program (or combination of program) that does this more easily.

Thanks,

VCF Genomics mutations • 1.3k views
ADD COMMENT
0
Entering edit mode
3.0 years ago

You could use minimap2 and paftools, some inspiration can be found here: https://github.com/lh3/CHM-eval/blob/master/dip-call/README.md

ADD COMMENT
0
Entering edit mode

Following the Minimap2 cookbook:

minimap2 -cx asm10 --cs "$ref_fasta" "$asm_fasta" \
  | sort -k6,6 -k8,8n \
  | paftools.js call -L 1000 -f "$ref_fasta" - > "$out_vcf"

Breaking this down:

Align using minimap2. Include the options -c: output CIGAR -x asm10: Preset parameter for alignment between assemblies assuming <1% sequence divergence. (Could use asm5 or asm20 if your sequences are more/less similar.) --cs: output cs tag

Sort the PAF output

Redirect to paftools.js call. By default, alignments less than L=50000 are not included for variant calling. Set L to lower value if your assembly is smaller than this. Provide reference fasta sequence with -f.

ADD REPLY

Login before adding your answer.

Traffic: 2339 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6