Question

Solid And Hiseq

0

11.8 years ago

Liyf ▴ 30

Dear all: I have a few questions and would appreciate your help. I have been reading papers about exome sequencing and found that very few cancer sequencing papers seem to use SOLiD. What are your opinions on SOLiD sequencing? Is it not good, or does it have some problems? Also, when using a chip to capture exome regions, how about 1 chip for 4 samples? Are there any problems with that? Thanks.

solid hiseq • 2.9k views

ADD COMMENT • link updated 4.5 years ago by Biostar 20 • written 11.8 years ago by Liyf ▴ 30

0

Which paper did you read? What is your question? Could you please explain it more clearly? What do you mean by "when using chip to capture exome regions, how about 1 chip corresponds to 4 samples"?

ADD REPLY • link 11.8 years ago by Vikas Bansal ★ 2.4k

0

Excuse my poor English. I just mean that some people, in order to save money, pool samples together, capture the exon regions, and then sequence them.

ADD REPLY • link 11.8 years ago by Liyf ▴ 30

score 4 · Answer 1 · 2012-07-20

4

11.8 years ago

Istvan Albert 100k

From my own experience and empirical observation, most (if not all) bioinformaticians seem to try to avoid working with the SOLiD sequencing platform because it produces data in color space format, and that precludes them from using the majority of existing tools and techniques. This can be very frustrating.

The second part of your question has to do with barcoding the samples. The only thing that needs to be determined is the coverage each barcoded sample will get. As long as you get sufficient coverage, you can add as many samples as the platform supports.
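To make the coverage question concrete, here is a rough back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration: the function names, the even pooling, and especially the on-target and usable-read fractions, which vary by capture kit and platform and should be replaced with numbers from your own runs.

```python
def mean_target_coverage(reads_per_lane, n_samples, read_length_bp,
                         target_size_bp, on_target_fraction=0.6,
                         usable_fraction=0.9):
    """Rough mean depth over the capture target for one sample.

    Assumes samples are pooled evenly. `on_target_fraction` and
    `usable_fraction` are illustrative placeholders, not platform specs.
    """
    reads_per_sample = reads_per_lane * usable_fraction / n_samples
    on_target_bases = reads_per_sample * read_length_bp * on_target_fraction
    return on_target_bases / target_size_bp


def max_samples_per_lane(reads_per_lane, read_length_bp, target_size_bp,
                         min_coverage, **fractions):
    """Largest number of evenly pooled samples that still reaches min_coverage."""
    single_sample_depth = mean_target_coverage(
        reads_per_lane, 1, read_length_bp, target_size_bp, **fractions)
    return int(single_sample_depth // min_coverage)


if __name__ == "__main__":
    # Hypothetical lane: 200M reads of 100 bp, 50 Mb exome target, 4 samples.
    depth = mean_target_coverage(200_000_000, 4, 100, 50_000_000)
    print(f"estimated mean depth per sample: {depth:.1f}x")
```

This only gives a mean depth. Real captures are uneven, so a fraction of the target will sit well below the mean, and it is worth leaving some margin above the minimum coverage your variant caller needs.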

ADD COMMENT • link 11.8 years ago by Istvan Albert 100k

1

Having worked predominantly with SOLiD reads (both 4 and 5), I agree with Istvan that it's a bitch to work with. The only effective aligners you can use are LifeScope/BioScope or Bowtie. ABI's software is also difficult to access as their page is constantly moving or down. And don't even get me started on their XSQ format...

ADD REPLY • link 11.8 years ago by Damian Kao 16k

1

I did a fairly thorough comparison of aligners for colorspace and found that BFAST is quite good. Novoalign also did pretty well: https://github.com/brentp/bowfast/tree/master/aligner-compare#solid-3

I agree that it's best to avoid SOLiD if possible. I get my reads out of XSQ before doing anything else.

ADD REPLY • link 11.8 years ago by brentp 24k

0

I like the bowfast pipeline. It looks promising. Definitely going to give it a try. Thanks.

ADD REPLY • link 11.8 years ago by Damian Kao 16k

0

I sorta abandoned bowfast because BFAST does a decent enough job and it's not too slow as long as you parallelize enough.

ADD REPLY • link 11.8 years ago by brentp 24k

0

I agree with you all. However, some students have told me that mutations called from SOLiD data are false positives. Is that right?

ADD REPLY • link 11.8 years ago by Liyf ▴ 30

1

Mutations are called after the alignment is done. The number of mutations will depend on the quality and quantity of the data, the calling algorithm used, and the parameters for the algorithm (the more stringent, the fewer false positives, but also the more true positives missed). With similar calling algorithms and parameters, false positives will depend on the depth at the location of calling and on the sequencing error rate in the reads.
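As a toy illustration of how depth and error rate interact, the sketch below uses a plain binomial model: the chance that at least `k` reads at a site show the same wrong base purely from sequencing error. This model is my own simplification, not something any particular caller does. It assumes independent errors split evenly across the three alternative bases, and it ignores alignment artifacts and systematic or color-space-specific errors, which in practice are often the bigger problem.

```python
from math import comb


def prob_at_least_k_errors(depth, k, per_base_error):
    """P(at least k of `depth` reads show one specific wrong base at a site).

    Simplifying assumptions: errors are independent between reads and are
    split evenly among the 3 non-reference bases.
    """
    p = per_base_error / 3.0
    return sum(
        comb(depth, i) * p**i * (1 - p) ** (depth - i)
        for i in range(k, depth + 1)
    )


if __name__ == "__main__":
    # Compare two hypothetical error rates at the same depth and threshold.
    for err in (0.01, 0.05):
        prob = prob_at_least_k_errors(depth=30, k=3, per_base_error=err)
        print(f"error rate {err}: P(>=3 matching error reads at 30x) = {prob:.2e}")
```

Multiplying such a per-site probability by the number of target bases gives a feel for how many spurious calls a loose threshold could let through, which is why stringency trades false positives against missed true variants.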

ADD REPLY • link 11.8 years ago by Laurent Gautier ▴ 810

1

In my experience, yes. You'll have to do a lot of downstream filtering with non-conventional tools to remove false positives from SOLiD data.

ADD REPLY • link 11.8 years ago by brentp 24k

0

For the second part of your question, it might be worth having some sense of the number of reads going to unassigned barcodes.

For example, if the runs often have 10,000s of reads (or more) going to unassigned barcodes, then I might be cautious about trying to sequence 1,000 reads per sample. I also sometimes get nervous when the number of observed reads varies a lot from the number of expected reads. Sometimes you can figure out when a barcode has been mixed up (for example, if genomic coverage is very different, and one sample has >10M reads and the other has <100k reads), but that gets harder as the number of missing barcodes increases.
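A minimal sketch of the kind of sanity check described above, in Python. The function name, the input dictionaries, and both thresholds (`max_fold`, `max_unassigned_ratio`) are hypothetical choices for illustration, not established QC cutoffs. The read counts would come from whatever demultiplexing summary your pipeline produces.

```python
def barcode_qc(observed, expected, unassigned_reads,
               max_fold=10.0, max_unassigned_ratio=0.1):
    """Return (sample, reason) flags for suspicious per-barcode read counts.

    observed / expected: dicts mapping sample name -> read count.
    Both thresholds are illustrative assumptions; tune them on your own runs.
    """
    flags = []
    for sample, exp in expected.items():
        if exp <= 0:
            raise ValueError(f"expected read count for {sample} must be positive")
        obs = observed.get(sample, 0)
        if obs == 0:
            flags.append((sample, "no reads observed"))
            continue
        # Fold difference in either direction between observed and expected.
        fold = max(obs / exp, exp / obs)
        if fold > max_fold:
            flags.append((sample, "observed reads far from expected"))
        # Unassigned reads that are large relative to this sample's own reads.
        if unassigned_reads > max_unassigned_ratio * obs:
            flags.append((sample, "unassigned reads large relative to sample"))
    return flags


if __name__ == "__main__":
    # Hypothetical run: one sample far below target, one very small sample.
    observed = {"A": 12_000_000, "B": 80_000, "C": 1_000}
    expected = {"A": 10_000_000, "B": 10_000_000, "C": 1_000}
    for sample, reason in barcode_qc(observed, expected, unassigned_reads=20_000):
        print(sample, reason)
```

A flag here only says a sample deserves a closer look. It does not tell you whether the cause is a barcode mix-up, a failed library, or uneven pooling.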

I think there may also be some complications, depending upon the types of samples that you mix. However, I am still trying to understand those trends better, and I am hesitant to say you absolutely can't do something.

I also have some notes on barcoding here:

Calling Single-Barcode Samples from Mixed Runs as Dual-Barcode Samples | Possible Illumina Run QC Flags?

However, you may want to run fewer samples per lane (or less diversity of library/barcode/adapter types per lane) than you may technically be allowed to.

Also, I think it matters for what you are doing. I would guess germline exome variants are probably not a huge problem (although I could improve concordance of my own Exome versus WGS datasets with reprocessing), but the barcoding may matter more if you are looking for somatic variants (which could be consistent with the greater discordance among variant callers reported in Figure 2 from this paper on TCGA data, or low-frequency variants for HiSeq-X versus NovaSeq in Figure 5 of this pre-print).

ADD REPLY • link 4.5 years ago by Charles Warden 8.2k