Genomics (DNA) Pipeline - Example
5.2 years ago
caspase8mach ▴ 20

Hello all,

Is there a good, ready-to-use example of a genomics pipeline for mapping/alignment of NGS data (preferably whole genome), followed by variant calling and annotation, along with generation and visualization of quality metrics? It would be even better if the suggested pipeline were Python-based.

I would like to use publicly available FASTQ and/or BAM files to learn and demonstrate the entire DNA analysis workflow.

Your help and suggestions will be greatly appreciated.

Thanks much.

alignment variant calling annotation Python • 3.0k views

Not a ready-to-use workflow, but if your goal is to learn, you might want to have a look at the tutorial Creating workflows with snakemake and conda, which I wrote some time ago.


Thanks finswimmer for the workflow ... certainly will help me to learn.

By any chance, do you have links to the .fa and multiple FASTQ files so that I can give this example a try?

Do I also have to provide an index file?

TIA


Hello caspase8mach ,

You can search the European Nucleotide Archive for a suitable public dataset. (This tutorial by ATpoint might be useful for you as well.)

Do I also have to provide an index file?

What index file do you mean?

fin swimmer


Awesome, thanks a lot for the link to the nice tutorial! It's great!

What index file do you mean?

For mapping the Fastq file using a reference genome, do I need to create an index first?

Thanks a lot.


Yes, you need to create an index for the reference genome. How you create this index depends on the aligner you want to use. For bwa, for example, it is simply: bwa index genome.fa
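Since the thread asks for Python-based examples, here is a minimal sketch of the indexing step wrapped in Python. It assumes the classic bwa (not bwa-mem2) is installed and on the PATH. The helper names (expected_index_files, missing_index_files, build_index) are made up for illustration and are not part of bwa or snakemake.

```python
from pathlib import Path
import shutil
import subprocess

# Suffixes of the files that classic `bwa index` writes next to the reference.
BWA_INDEX_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")


def expected_index_files(reference):
    """Return the index file paths bwa is expected to create for `reference`."""
    return [Path(str(reference) + suffix) for suffix in BWA_INDEX_SUFFIXES]


def missing_index_files(reference):
    """Return the expected index files that do not exist yet."""
    return [path for path in expected_index_files(reference) if not path.exists()]


def build_index(reference):
    """Run `bwa index` only if at least one index file is missing.

    Returns True if the index was built, False if it was already complete.
    """
    if not missing_index_files(reference):
        return False
    if shutil.which("bwa") is None:
        raise RuntimeError("bwa was not found on the PATH")
    subprocess.run(["bwa", "index", str(reference)], check=True)
    return True
```

Calling build_index("genome.fa") a second time should return False without running bwa again, which is handy when the same reference is shared by many samples.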


Thanks a lot. As suggested, I created the index using bwa index hg19.fasta and now have the following files:

hg19.fasta
hg19.fasta.amb
hg19.fasta.ann
hg19.fasta.bwt
hg19.fasta.pac
hg19.fasta.sa

I did manage to align a pair of FASTQ files using your Snakemake tutorial, hurray ... my first NGS DNA analysis pipeline!
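For readers who want to see what the alignment step boils down to outside of Snakemake, here is a minimal Python sketch of a paired-end bwa mem call. It is not the rule from the tutorial; the function names and the FASTQ file names in the usage note are hypothetical, and bwa is assumed to be on the PATH.

```python
import subprocess


def bwa_mem_command(reference, fastq1, fastq2, threads=1):
    """Build the argument list for a paired-end `bwa mem` run."""
    return ["bwa", "mem", "-t", str(threads),
            str(reference), str(fastq1), str(fastq2)]


def align_pair(reference, fastq1, fastq2, out_sam, threads=1):
    """Run bwa mem and save the SAM records it prints to stdout in `out_sam`."""
    with open(out_sam, "w") as sam:
        subprocess.run(
            bwa_mem_command(reference, fastq1, fastq2, threads),
            stdout=sam,
            check=True,  # raise if bwa exits with an error
        )
```

With the index from above in place, a call such as align_pair("hg19.fasta", "sample_R1.fastq", "sample_R2.fastq", "sample.sam", threads=4) would write the alignments to sample.sam.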

Now my question is: how is the analysis done in production? To analyze several samples, is it possible to run in parallel, use cloud computing, etc.? Are there any examples?

Thanks a ton for your help.


If you start snakemake with the --cores parameter, e.g. --cores 4, it uses up to 4 cores, so up to 4 single-threaded jobs run in parallel.

snakemake can also be used with cluster and cloud support; see the manual for details. Unfortunately, I have no experience with this.
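To make the idea of running several samples in parallel concrete, here is a rough plain-Python analogue of limiting a run to 4 concurrent jobs. This is only an illustration of the concept, not how snakemake schedules jobs internally. The names run_samples and fake_align are hypothetical, and fake_align is a placeholder for a real per-sample step.

```python
from concurrent.futures import ThreadPoolExecutor


def run_samples(samples, process_one, max_parallel=4):
    """Apply `process_one` to every sample, at most `max_parallel` at a time.

    Results are returned in the same order as `samples`.
    """
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(process_one, samples))


def fake_align(sample):
    """Placeholder step; a real pipeline would launch the aligner here."""
    return f"{sample}.bam"


if __name__ == "__main__":
    print(run_samples(["sampleA", "sampleB", "sampleC"], fake_align))
```

Threads are sufficient in this sketch because the heavy work would happen in external programs started via subprocess. A workflow manager such as snakemake adds what this sketch lacks: dependency tracking between steps, restarts of failed jobs, and submission to clusters.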


Certainly helpful, I will give it a try and let you know. Does anyone have experience with Apache Spark based DNA NGS pipelines?

Thanks


Thanks for the info, but somehow I am not able to access the URL you suggested. Could you please give me the correct URL? Thanks


Thanks a lot for the information. I am going through the suggested resources to learn and build genomics pipelines.
