Whole Genome Analysis Pipeline (Illumina)
12.0 years ago
NB ▴ 960

Hi,

I would like to know which algorithm is feasible for mapping human whole-genome sequences (Illumina), and what general pipeline is followed for variant calling in whole-genome analysis.

Thank you, Nandini

illumina pipeline • 10k views
12.0 years ago

You might take a look here for some ideas:

http://www.broadinstitute.org/gsa/wiki/index.php/Best_Practice_Variant_Detection_with_the_GATK_v3

If you have not done this yourself before, I highly suggest getting a collaborator to work with you on these data.


Yes, is there a bioinformatics group where you work? There are many things to consider, and it would help a lot if you could discuss this with people who have experience working with next-generation sequencing data.


Thank you for your reply. I have worked on SOLiD whole-genome data before using BioScope, and I have now switched to Illumina (each sample was sequenced on a flow cell with 7 or 8 lanes, with 2 reads each).
So I was wondering whether BWA -> base quality score recalibration -> local realignment -> MarkDuplicates -> variant calling is a good option.


Local realignment should probably come after BWA and before marking duplicates and recalibration.
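
As a rough illustration of that ordering, here is a minimal Python sketch that only builds and prints the commands; it does not run anything. Several things are assumptions that do not come from this thread: the file names (`ref.fa`, `dbsnp.vcf`, the jar paths) are placeholders, alignment uses `bwa mem`, and the GATK steps use GATK 3.x-style `-T` syntax, so older or newer releases will need different tool names and options. The reference is assumed to be already indexed for BWA, samtools and GATK.

```python
from typing import List, Tuple

# Hypothetical placeholder paths; replace with your own files.
REF = "ref.fa"            # reference FASTA, assumed already indexed
KNOWN_SITES = "dbsnp.vcf"  # known variant sites for recalibration
GATK = "java -jar GenomeAnalysisTK.jar"  # GATK 3.x-style launcher (assumption)
PICARD = "java -jar picard.jar"


def build_pipeline(sample: str, fq1: str, fq2: str) -> List[Tuple[str, str]]:
    """Return (stage, shell command) pairs in the order suggested above:
    align -> local realignment -> mark duplicates -> recalibration -> calling.
    """
    s = sample
    # Read group header; bwa expects the literal two characters "\t" here.
    rg = f"@RG\\tID:{s}\\tSM:{s}\\tPL:ILLUMINA"
    return [
        ("align", f"bwa mem -R '{rg}' {REF} {fq1} {fq2} > {s}.sam"),
        ("sort", f"samtools sort -o {s}.sorted.bam {s}.sam"),
        ("index", f"samtools index {s}.sorted.bam"),
        ("realign_targets",
         f"{GATK} -T RealignerTargetCreator -R {REF} "
         f"-I {s}.sorted.bam -o {s}.intervals"),
        ("realign",
         f"{GATK} -T IndelRealigner -R {REF} -I {s}.sorted.bam "
         f"-targetIntervals {s}.intervals -o {s}.realigned.bam"),
        ("mark_duplicates",
         f"{PICARD} MarkDuplicates I={s}.realigned.bam "
         f"O={s}.dedup.bam M={s}.dup_metrics.txt"),
        ("index_dedup", f"samtools index {s}.dedup.bam"),
        ("recal_table",
         f"{GATK} -T BaseRecalibrator -R {REF} -I {s}.dedup.bam "
         f"-knownSites {KNOWN_SITES} -o {s}.recal.table"),
        ("recalibrate",
         f"{GATK} -T PrintReads -R {REF} -I {s}.dedup.bam "
         f"-BQSR {s}.recal.table -o {s}.recal.bam"),
        ("call_variants",
         f"{GATK} -T UnifiedGenotyper -R {REF} "
         f"-I {s}.recal.bam -o {s}.vcf"),
    ]


if __name__ == "__main__":
    # Print the commands for one hypothetical sample for review.
    for stage, command in build_pipeline("sample1", "s1_R1.fastq", "s1_R2.fastq"):
        print(f"# {stage}\n{command}\n")
```

With 7 or 8 lanes per sample, the alignment step would typically be repeated per lane with a distinct read group ID, and the per-lane BAMs merged before the later steps; the sketch leaves that out to keep the ordering visible.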

12.0 years ago
eonsim ▴ 100

The GATK pipeline in the previous post is pretty good, but implementing the whole thing can be a bit painful, and it is CPU- and I/O-intensive. I've been using http://www.realtimegenomics.com a lot for our sequencing project (1200x coverage of the bovine genome). Their pipeline is a lot cleaner and more ergonomic (4 commands: format, map, coverage, and snp or cnv) and faster (5-10x on our cluster) than the BWA/GATK pipeline, while giving comparable results (both gave 99.6% concordance with SNP chip calls). Their documentation is pretty good too. Note that while they are commercial, there is a free license that's suitable for most research and for commercial use on a small to medium scale, and they support their software very well.
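
For anyone wanting to run the same kind of check, here is a minimal Python sketch of one way to compute concordance against chip calls. The definition is an assumption and not necessarily how the 99.6% figure above was obtained: it takes the sites genotyped by both methods and reports the fraction with identical unordered genotypes. The data structures and the example genotypes are made up for illustration.

```python
from typing import Dict, Tuple

Site = Tuple[str, int]  # (chromosome, position)


def normalise(genotype: str) -> str:
    """Make genotypes comparable regardless of allele order or phasing,
    so that 'G/A', 'A/G' and 'A|G' all compare equal."""
    return "/".join(sorted(genotype.replace("|", "/").split("/")))


def genotype_concordance(seq_calls: Dict[Site, str],
                         chip_calls: Dict[Site, str]) -> float:
    """Fraction of sites called by both methods whose genotypes agree."""
    shared = seq_calls.keys() & chip_calls.keys()
    if not shared:
        raise ValueError("no sites in common between the two call sets")
    matches = sum(
        1 for site in shared
        if normalise(seq_calls[site]) == normalise(chip_calls[site])
    )
    return matches / len(shared)


if __name__ == "__main__":
    # Made-up example: three shared sites, two of which agree.
    chip = {("1", 100): "A/G", ("1", 200): "C/C", ("2", 50): "T/T"}
    seq = {("1", 100): "G/A", ("1", 200): "C/T", ("2", 50): "T/T",
           ("3", 10): "A/A"}  # extra site not on the chip is ignored
    print(f"concordance: {genotype_concordance(seq, chip):.3f}")
```

Sites called by only one method are ignored here; whether to count missing calls as discordant is a design choice that changes the number noticeably.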

The output from the RTG pipeline can also be fed into GATK if you want; you just need to filter the BAMs slightly.

11.8 years ago

We just finished up our own automated pipeline, which uses BWA, GATK, ANNOVAR and samtools to process FASTQ through to annotated VCF. It was designed for our Illumina human whole-genome data, so it assumes paired-end data at the moment, but it might be of use. It can download and compile/install each of the components (except ANNOVAR, for which you'll need to give its authors your email address to get access) and allows a very high level of control over each of the programs via a single configuration file (which makes it easier to add data later on). It should run on PBS and SGE clusters as well as in serial, and it helps ease the hassle of managing all of those jobs.

It's open source and pretty extendable, but we haven't really put much effort into documenting how to do that just yet :) But if you have another program that you prefer for variant calls or alignment, you can probably reuse one of the templates to have it use the alternate program. There are instructions on doing just that in the user's guide.

Anyway, if you are interested, have a look at ASAP. If anyone has ideas or questions relating to ASAP, I'd be happy to answer them.

11.8 years ago

We've created a pipeline that calls SNPs and SVs. The results are presented to users in Excel tables, with effect annotation for each variation. Data about protein function, pathways and diseases is also presented. The pipeline integrates GATK best practices, Pindel, Ensembl variant effect prediction, PolyPhen and SIFT: http://code.google.com/p/ngs-pipeline/
