Biostar Beta. Not for public use.
Mpileup generation using Bisulfite converted reference
0
Entering edit mode
15 months ago
kspata • 50
Chicago

I have a bisulfite converted reference genome (used bismark_genome_preparation on the reference file). Can I use this bisulfite converted reference genome to create index file using samtools faidx? I want to create a samtools mpileup file for downstream analysis.

The bismark_genome_preparation creates following folders and files:

Bisulfite-Genome: 1. CT_conversion -BS_CT.1.bt2 -BS_CT.3.bt2 -BS_CT.rev.1.bt2 -BS_CT.2.bt2 - BS_CT.rev.2.bt2 - BS_CT.4.bt2 - genome_mfa.CT_conversion.fa 2. GA_conversion -BS_GA.1.bt2 -BS_GA.3.bt2 -BS_GA.rev.1.bt2 -BS_GA.2.bt2 - BS_GA.rev.2.bt2 - BS_GA.4.bt2 - genome_mfa.GA_conversion.fa

Can I use the files genome_mfa.CT_conversion.fa and genome_mfa.GA_conversion.fa to create index files using samtools faidx command?

My other question is

How to generate mpileup files from these faidx reference files? Because it will generate two mpileup files for each reference sequence? Can I then merge two mpileup files for downstream analysis?

 samtools mpileup -d <integer> -f <ref1>  <input.bam> >  output_1.mpileup
samtools mpileup -d <integer> -f <ref2>  <input.bam> >  output_2.mpileup


Can I then merge the two mpileup output files?

Is there any other approach?

0
Entering edit mode

That would be a yes on the first question. As for the second question, that would be a bit more dedicated. I assume, you'd need to check each position from both mpileups and convert the information so that the variant information conforms to the original reference (for mpileup the CT and AG converted references are two different references, so the information provided in the files is not compatible.)

What I am wondering, though - why don't you use bismark and its downstream analysis? The whole genome methylation report will then give you exactly what you seemingly want to have (at least if I infer it correctly from your question.)

Edit: bismark-comment

0
Entering edit mode

Thank you for this insight. I really appreciate it. I did use bismark pipeline for downstream analysis instead of the mpileup generation.

I have another question where I am studying the methylation status of a gene DMPK. For this purpose is it better to map samples across whole genome reference sequence (hg38) or only DMPK gene region?

I ran my samples with both and observed that coverage after mapping to entire human genome is very less (<7X) and when I map same samples with DMPK gene sequence the coverage increases to upto 100X. The samples were sequenced on MiSeq and are bisulfite converted amplicons.

Which one of the two approaches is correct?