Phasing trios for identification of de novo variants
8.9 years ago
newbee ▴ 40

Hello,

I am trying to identify de novo mutations in my trio data (the parents are unaffected and the child is affected). I used PhaseByTransmission. However, none of the candidate de novo mutations (child heterozygous, both parents homozygous reference) were phased, i.e. I am getting '/' instead of '|'. Is this an error? When I search for autosomal recessive variants, they are phased correctly. What is the problem with my analysis? I am pasting the command and the summary printed by PhaseByTransmission below. Please also comment on the summary: does anything look odd?

Please help.

java -jar /gatk_3.3/GenomeAnalysisTK.jar -R /reference_sequence/human_g1k_v37.fasta -T PhaseByTransmission -V trio1.vcf -ped trio1.ped --DeNovoPrior 0.00001 -o trio_out.vcf --MendelianViolationsFile mendelian_violation.vcf

INFO 20:04:04,201 GenomeAnalysisEngine - Strictness is SILENT
INFO 20:04:04,341 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
INFO 20:04:04,453 PedReader - Reading PED file trio1.ped with missing fields: []
INFO 20:04:04,457 PedReader - Phenotype is other? false
INFO 20:04:04,510 GenomeAnalysisEngine - Preparing for traversal
INFO 20:04:04,530 GenomeAnalysisEngine - Done preparing for traversal
INFO 20:04:04,531 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 20:04:04,531 ProgressMeter - | processed | time | per 1M | | total | remaining
INFO 20:04:04,532 ProgressMeter - Location | sites | elapsed | sites | completed | runtime | runtime
INFO 20:04:34,824 ProgressMeter - 15:96876611 147844.0 30.0 s 3.4 m 77.5% 38.0 s 8.0 s
INFO 20:04:43,701 PhaseByTransmission - Number of complete trio-genotypes: 139299
INFO 20:04:43,702 PhaseByTransmission - Number of trio-genotypes containing no call(s): 0
INFO 20:04:43,703 PhaseByTransmission - Number of trio-genotypes phased: 124651
INFO 20:04:43,703 PhaseByTransmission - Number of resulting Het/Het/Het trios: 13391
INFO 20:04:43,704 PhaseByTransmission - Number of remaining single mendelian violations in trios: 937
INFO 20:04:43,704 PhaseByTransmission - Number of remaining double mendelian violations in trios: 12
INFO 20:04:43,704 PhaseByTransmission - Number of complete pair-genotypes: 0
INFO 20:04:43,705 PhaseByTransmission - Number of pair-genotypes containing no call(s): 0
INFO 20:04:43,705 PhaseByTransmission - Number of pair-genotypes phased: 0
INFO 20:04:43,705 PhaseByTransmission - Number of resulting Het/Het pairs: 0
INFO 20:04:43,706 PhaseByTransmission - Number of remaining mendelian violations in pairs: 0
INFO 20:04:43,706 PhaseByTransmission - Number of genotypes updated: 4395
INFO 20:04:45,481 ProgressMeter - done 201351.0 40.0 s 3.4 m 100.0% 40.0 s 0.0 s
INFO 20:04:45,482 ProgressMeter - Total runtime 40.95 secs, 0.68 min, 0.01 hours
INFO 20:04:47,002 GATKRunReport - Uploaded run statistics report to AWS S3
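One way to read the summary is to put each count in proportion to the number of complete trio-genotypes. The sketch below only re-parses the figures already printed in the log above; the helper name `parse_counts` is made up for illustration and is not part of GATK.

```python
import re

# Summary lines copied from the PhaseByTransmission log above.
LOG = """\
PhaseByTransmission - Number of complete trio-genotypes: 139299
PhaseByTransmission - Number of trio-genotypes phased: 124651
PhaseByTransmission - Number of resulting Het/Het/Het trios: 13391
PhaseByTransmission - Number of remaining single mendelian violations in trios: 937
PhaseByTransmission - Number of remaining double mendelian violations in trios: 12
"""


def parse_counts(log_text):
    """Return {description: count} for every 'Number of <description>: N' line."""
    counts = {}
    for line in log_text.splitlines():
        match = re.search(r"Number of (.+): (\d+)\s*$", line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


counts = parse_counts(LOG)
total = counts["complete trio-genotypes"]
for name, n in counts.items():
    # Express each category as a share of the complete trio-genotypes.
    print(f"{name}: {n} ({100 * n / total:.2f}% of complete trio-genotypes)")
```

With these numbers, about 89.5% of the complete trio-genotypes were phased, and the single and double Mendelian violations together (949 sites) make up under 1% of them.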
phasing trios denovo-variant sequencing • 4.1k views

Hi Vivek,

I really appreciate your help. Could you please give me some additional guidance?

  1. Is it okay to run only PhaseByTransmission and skip ReadBackedPhasing, or do I need to run both? I read on the GATK forum that the two tools do not have to be run in any particular order because the results would be the same. What I am wondering is whether running PhaseByTransmission alone is sufficient.
  2. Should I use the file 'mendelian_violation.vcf' to extract all de novo variants?

Thanks a lot.


ReadBackedPhasing and PhaseByTransmission are two entirely different modules in how they work, and they do not necessarily have to be used together. PBT phases genotypes using the rules of Mendelian transmission within the family, weighted by a statistical prior on de novo mutation. RBP, on the other hand, builds haplotypes from reads that span multiple heterozygous variant sites, so it can only link variants that lie close enough together to be covered by the same reads.
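To make the transmission idea concrete, here is a minimal sketch that phases a child's hard-called genotype from the parents' genotypes using Mendelian rules alone. It is not how PhaseByTransmission is implemented (the real tool works on genotype likelihoods and a de novo prior); the function name `phase_child` and the paternal|maternal allele ordering are assumptions chosen for this example.

```python
from itertools import product


def phase_child(father, mother, child):
    """Phase a child's genotype from parental genotypes using Mendelian rules only.

    Genotypes are 2-tuples of allele indices (0 = ref, 1 = alt).
    Returns "p|m" (paternal|maternal, an ordering chosen for this sketch)
    when exactly one assignment is possible, otherwise the unphased "a/b".
    """
    # Every (paternal allele, maternal allele) pair that reproduces the child.
    options = {
        (p, m)
        for p, m in product(father, mother)
        if sorted((p, m)) == sorted(child)
    }
    if len(options) == 1:
        p, m = next(iter(options))
        return f"{p}|{m}"
    # Zero options: Mendelian violation. Two options: het/het/het ambiguity.
    a, b = sorted(child)
    return f"{a}/{b}"


print(phase_child((0, 1), (0, 0), (0, 1)))  # alt can only come from the father
print(phase_child((0, 1), (0, 1), (0, 1)))  # het/het/het: ambiguous
print(phase_child((0, 0), (0, 0), (0, 1)))  # candidate de novo: no valid transmission
```

The last two calls both return an unphased genotype, but for different reasons: in the het/het/het trio both assignments are possible, whereas in the candidate de novo case none is.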


OK...got it...:)

Thank you very much, Vivek and donfreed.

8.9 years ago
Vivek ★ 2.7k

Those are your likely candidate de novo mutations. The tool uses '/' instead of '|' to indicate a violation of the Mendelian inheritance pattern and the presence of an un-inherited allele at these loci.
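A rough way to pull out these candidates is to scan the output VCF for sites where the child is an unphased heterozygote and both parents are homozygous reference. The sketch below is plain Python over tab-separated VCF lines; the function name and the sample names `child`, `father` and `mother` are placeholders, and it only handles biallelic sites. A real analysis would also filter on genotype quality and read depth.

```python
HOM_REF = {"0/0", "0|0"}
UNPHASED_HET = {"0/1", "1/0"}


def candidate_de_novo(vcf_lines, child, father, mother):
    """Yield (chrom, pos, ref, alt) where the child is an unphased het
    and both parents are homozygous reference."""
    columns = None
    for raw in vcf_lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # Map each column name (including sample names) to its index.
            columns = {name: i for i, name in enumerate(fields)}
            continue
        if columns is None:
            raise ValueError("no #CHROM header line seen before the records")
        # Locate GT within the FORMAT column rather than assuming it is first.
        gt_index = fields[8].split(":").index("GT")
        genotype = {
            sample: fields[columns[sample]].split(":")[gt_index]
            for sample in (child, father, mother)
        }
        if (
            genotype[child] in UNPHASED_HET
            and genotype[father] in HOM_REF
            and genotype[mother] in HOM_REF
        ):
            yield fields[0], int(fields[1]), fields[3], fields[4]


# Toy records, invented for illustration only.
header = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO",
          "FORMAT", "father", "mother", "child"]
example = [
    "##fileformat=VCFv4.1",
    "\t".join(header),
    "\t".join(["1", "1000", ".", "A", "G", "50", "PASS", ".", "GT", "0/0", "0/0", "0/1"]),
    "\t".join(["1", "2000", ".", "C", "T", "50", "PASS", ".", "GT", "0|1", "0|0", "1|0"]),
]
hits = list(candidate_de_novo(example, "child", "father", "mother"))
print(hits)
```

On a real file, the same function could be fed an open file handle, for example `candidate_de_novo(open("trio_out.vcf"), ...)` with the sample names taken from the PED file.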

8.9 years ago
donfreed ★ 1.6k

The program is working correctly. De novo variants cannot be phased by transmission because they are not transmitted from the parents (they arise "de novo"). You also cannot phase de novo variants using imputation, as imputation also depends on the variant being transmitted from a parent.

To phase de novo variants, you need to use sequence reads to phase the de novo variant to a nearby inherited heterozygous variant, which you can phase by either transmission or imputation.
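The read-based step can be sketched as a simple counting exercise: among reads (or read pairs) that cover both sites, check which allele of the already-phased inherited variant appears together with the de novo allele. This is a toy illustration rather than the ReadBackedPhasing algorithm; the function name, the `min_reads` threshold and the paternal|maternal ordering are assumptions made for the example.

```python
from collections import Counter


def parent_of_origin(read_alleles, inherited_phase, min_reads=3):
    """Assign a de novo alt allele to the paternal or maternal haplotype.

    read_alleles: iterable of (de_novo_allele, inherited_allele) pairs, one per
        read (or read pair) covering both sites; alleles are 0 (ref) or 1 (alt).
    inherited_phase: phased genotype of the nearby inherited het site, written
        paternal|maternal (this sketch's convention), e.g. "1|0".
    Returns "paternal", "maternal", or None if the reads are too few or tied.
    """
    paternal_allele, maternal_allele = (int(a) for a in inherited_phase.split("|"))
    if paternal_allele == maternal_allele:
        raise ValueError("the anchor site must be heterozygous")
    # Count which inherited-site allele travels with the de novo alt allele.
    support = Counter(inh for dn, inh in read_alleles if dn == 1)
    with_paternal = support[paternal_allele]
    with_maternal = support[maternal_allele]
    if max(with_paternal, with_maternal) < min_reads or with_paternal == with_maternal:
        return None
    return "paternal" if with_paternal > with_maternal else "maternal"


# Toy data: four reads carry both alt alleles, one read disagrees.
reads = [(1, 1)] * 4 + [(0, 0)] * 5 + [(1, 0)]
print(parent_of_origin(reads, "1|0"))
```

The approach only works when an informative inherited heterozygous site lies within reach of the same read or read pair, which is why some de novo variants remain unphased even after this step.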
