For the longest time I've been using BaseSpace to process my TruSeq Amplicon data; however, my company wants to move away from it for data security and reliability reasons.
So I've been tasked with figuring out how to streamline, and eventually automate, the processing of TruSeq Cancer Panel and TruSight Myeloid data generated on a NextSeq 500. The samples are run 2x150 bp, with an average coverage of about 5,000x and a maximum of about 50,000x. The NextSeq flowcell has 4 lanes, so each sample produces 8 fastq files (4 lanes x 2 reads), and there are ~40 samples per run. The TruSeq Cancer Panel samples are all FFPE and the TruSight Myeloid samples are all whole blood.
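To make the file layout concrete, here is a minimal Python sketch of how the 8 fastq files per sample could be grouped into per-lane R1/R2 pairs. It assumes bcl2fastq2's default file naming (`<sample>_S<n>_L00<lane>_R<read>_001.fastq.gz`); the function name `group_fastqs` is made up for illustration.

```python
import re
from collections import defaultdict

# Assumed default bcl2fastq2 naming: <sample>_S<n>_L00<lane>_R<read>_001.fastq.gz
FASTQ_RE = re.compile(
    r"^(?P<sample>.+)_S(?P<index>\d+)_L(?P<lane>\d{3})_R(?P<read>[12])_001\.fastq\.gz$"
)


def group_fastqs(filenames):
    """Return {sample: [(lane, r1_name, r2_name), ...]} ordered by lane."""
    found = defaultdict(dict)  # (sample, lane) -> {"1": r1_name, "2": r2_name}
    for name in filenames:
        m = FASTQ_RE.match(name)
        if m is None or m["sample"] == "Undetermined":
            continue  # ignore unrelated files and unassigned reads
        found[(m["sample"], int(m["lane"]))][m["read"]] = name

    grouped = defaultdict(list)
    for (sample, lane), reads in sorted(found.items()):
        if "1" in reads and "2" in reads:  # keep complete pairs only
            grouped[sample].append((lane, reads["1"], reads["2"]))
    return dict(grouped)
```

For one sample this should give four `(lane, R1, R2)` tuples, which can then be aligned lane by lane and merged, or concatenated per read before alignment.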
I have some familiarity with Linux and have been using Fedora to demultiplex NextSeq data with bcl2fastq2. I've also looked into the whole workflow: aligning the reads with BWA, converting the .sam files to .bam with Samtools, filtering with Samtools/Picard, and variant calling with GATK. However, I've never been able to process a single sample successfully all the way through, probably because of incorrect filtering or a mistake in some other early step.
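The workflow described above can be sketched roughly as follows. This is only an outline under several assumptions, not a validated pipeline: it assumes GATK4 command syntax, a reference FASTA that is already indexed for BWA, Samtools, and GATK, and a BED file of the panel targets. `plan_sample`, `run_steps`, the read-group values, and the output file names are all made up for illustration. `HaplotypeCaller` is used as a placeholder caller; a somatic FFPE panel may call for a somatic caller such as Mutect2 instead. No duplicate-marking step is included, since reads from the same amplicon share start and end positions by design.

```python
import subprocess
from pathlib import Path


def plan_sample(sample, lane_pairs, ref_fasta, targets_bed, out_dir, threads=4):
    """Build the command list for one sample without running anything.

    lane_pairs is a list of (lane, r1_path, r2_path). Each step is a tuple of
    one argv list, or two argv lists meaning "pipe the first into the second".
    """
    out = Path(out_dir)
    steps, lane_bams = [], []
    for lane, r1, r2 in lane_pairs:
        # bwa expands the literal "\t" sequences in the read-group string itself.
        rg = f"@RG\\tID:{sample}.L{lane}\\tSM:{sample}\\tLB:{sample}\\tPL:ILLUMINA"
        lane_bam = str(out / f"{sample}.L{lane}.bam")
        steps.append((
            ["bwa", "mem", "-t", str(threads), "-R", rg, ref_fasta, r1, r2],
            ["samtools", "sort", "-o", lane_bam, "-"],  # SAM on stdin, sorted BAM out
        ))
        lane_bams.append(lane_bam)

    merged = str(out / f"{sample}.bam")
    vcf = str(out / f"{sample}.vcf.gz")
    steps.append((["samtools", "merge", merged, *lane_bams],))
    steps.append((["samtools", "index", merged],))
    # Placeholder caller (assumption); restricted to the panel targets with -L.
    steps.append((
        ["gatk", "HaplotypeCaller", "-R", ref_fasta, "-I", merged,
         "-L", targets_bed, "-O", vcf],
    ))
    return steps


def run_steps(steps):
    """Run each step in order and stop at the first failure."""
    for step in steps:
        if len(step) == 1:
            subprocess.run(step[0], check=True)
            continue
        producer = subprocess.Popen(step[0], stdout=subprocess.PIPE)
        consumer = subprocess.Popen(step[1], stdin=producer.stdout)
        producer.stdout.close()  # let the producer see SIGPIPE if the consumer dies
        if consumer.wait() != 0 or producer.wait() != 0:
            raise RuntimeError(f"pipeline step failed: {step}")
```

Keeping the planning separate from the execution means the exact commands can be printed and checked by hand for a single sample before anything is automated.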
What I'm asking is: which commands and options should I use with each program to process the samples from beginning to end? From there I'll work on a bash or Python script that pipes the output of one program into the next, and eventually on a daemon that automatically detects new data and starts processing it.
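For the daemon part, this is the kind of polling loop I have in mind, as a minimal Python sketch. It assumes the instrument writes an `RTAComplete.txt` file into the run folder once the run has finished; the `pipeline.started` marker file and the function names are made up for illustration, and `process_run` stands in for whatever ends up running the demultiplexing and per-sample steps.

```python
import time
from pathlib import Path

DONE_MARKER = "RTAComplete.txt"      # assumed to appear when the run is finished
CLAIMED_MARKER = "pipeline.started"  # hypothetical marker written by this script


def find_ready_runs(runs_root):
    """Return run folders that are finished but not yet picked up."""
    ready = []
    for run_dir in sorted(Path(runs_root).iterdir()):
        if not run_dir.is_dir():
            continue
        finished = (run_dir / DONE_MARKER).exists()
        claimed = (run_dir / CLAIMED_MARKER).exists()
        if finished and not claimed:
            ready.append(run_dir)
    return ready


def watch(runs_root, process_run, poll_seconds=300, max_polls=None):
    """Poll runs_root and call process_run(run_dir) once per finished run.

    max_polls=None loops forever; a number makes the loop stop after that
    many polls, which is handy for a dry run.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        for run_dir in find_ready_runs(runs_root):
            (run_dir / CLAIMED_MARKER).touch()  # claim first so it is not reprocessed
            process_run(run_dir)
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(poll_seconds)
```

Something like `watch("/data/runs", print, poll_seconds=0, max_polls=1)` (with a hypothetical path) would just list the finished runs once without processing anything.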