Sample contamination level over 30%
6.6 years ago
haiying.kong ▴ 360

I have a pair of samples (normal and tumor) that were whole-exome sequenced. I checked the contamination level of the samples, and the tumor is contaminated at a level of 30%. If I run validation experiments for any findings from this pair of samples, can I get my work published?

next-gen • 2.1k views

Tumor is contaminated with what?


Cross-individual contamination.


How did you determine that?


I ran ContEst from the GATK tools.


It might be an idea to add this to your original post, along with the commands you ran and your experimental context.


No, that is not the point!

My point is not how to estimate the contamination level. Assume the computation is correct.

If a tumor sample has such a high level of contamination, can I still use any findings from it? The sample is WES, and if I validate any findings from this highly contaminated tumor sample with extra experiments on other samples, can I still publish?


What do you think? If you were reviewing such a paper, would you consider this acceptable? How did the contamination occur in the first place?


I am asking for someone else, because I saw it published in a journal with an impact factor above 5. They did not mention the contamination level.


I don't think we can answer this question without full context, as @andrew said above. In general, if something was contaminated, then your conclusions are always going to have a cloud hanging over them (unless there is a clear experimental explanation for the presence of that contamination).


The point is:

Whatever they find from the contaminated sample is just a suggestion of a possible finding. All suggested findings are validated on other samples.

Does this make the work qualified for publication?


If there is independent experimental validation across many samples, then it may be acceptable, but you would still have to explain why contamination exists at such a high level.

That said, consider this quote from the ContEst paper:

"a typical cancer project might expect >10% of the samples to have 1.5% contamination, causing ~0.2 errors/Mb per sample, which is a significant fraction of the typical somatic mutation rate of 1/Mb per sample."

If you had 30% contamination then ...
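
To put a rough number on that, here is a back-of-the-envelope sketch. It is my own simple two-individual mixing model, not something taken from the ContEst paper: if a fraction c of the reads comes from another person, a germline variant that is heterozygous in the contaminant and absent from the patient shows up at an allele fraction of roughly c/2, and a homozygous one at roughly c. The function name is hypothetical, and copy number changes and tumor purity are ignored.

```python
def expected_contaminant_vaf(contamination, contaminant_alt_copies, host_alt_copies=0):
    """Expected ALT allele fraction at a diploid site in a two-individual read mixture.

    contamination: fraction of reads from the contaminating individual (0..1)
    contaminant_alt_copies / host_alt_copies: ALT allele count at the site (0, 1 or 2)
    Assumption: both genomes are diploid and sequenced without bias.
    """
    if not 0.0 <= contamination <= 1.0:
        raise ValueError("contamination must be between 0 and 1")
    host_fraction = 1.0 - contamination
    return (contamination * contaminant_alt_copies / 2.0
            + host_fraction * host_alt_copies / 2.0)


# Compare the 1.5% level from the quote with the 30% level in this thread.
for c in (0.015, 0.30):
    het = expected_contaminant_vaf(c, 1)
    hom = expected_contaminant_vaf(c, 2)
    print(f"contamination={c:.1%}: het in contaminant -> VAF ~{het:.2%}, "
          f"hom in contaminant -> VAF ~{hom:.2%}")
```

Under this model, 30% contamination puts the contaminant's private germline SNPs at about 15% (heterozygous) to 30% (homozygous) allele fraction in the tumor. That is the same range where real subclonal or low-purity somatic mutations tend to sit, so a tumor-versus-normal comparison could easily report them as somatic.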


If I were a reviewer, I would consider data with 30% cross-contamination to be completely useless and evidence of a lack of concern for accuracy, so I'd reject it.

That said, just because some tool reported 30% cross-contamination does not mean you actually have 30% cross-contamination.


In fact, I used ContEst and verifyBamID to estimate contamination. Both gave about 30%. I used the same software to test other samples; none are bad at this level.
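
The cross-check described here (two estimators, same pipeline run on every sample) could be summarized with something like the sketch below. The sample names, the numbers and the 5% cutoff are made up for illustration; they are not the actual data from this study and not a recommended threshold. Parsing of the real ContEst and verifyBamID output files is left out, and the function name is hypothetical.

```python
def flag_contaminated(estimates, threshold=0.05):
    """Return samples whose contamination estimate exceeds threshold in every tool.

    estimates: {sample_name: {tool_name: estimated contamination fraction}}
    Requiring agreement between tools guards against one estimator misbehaving.
    """
    flagged = []
    for sample, by_tool in sorted(estimates.items()):
        if by_tool and all(value > threshold for value in by_tool.values()):
            flagged.append(sample)
    return flagged


# Hypothetical values, for illustration only.
example = {
    "tumor_A": {"ContEst": 0.30, "verifyBamID": 0.29},
    "normal_A": {"ContEst": 0.01, "verifyBamID": 0.008},
    "tumor_B": {"ContEst": 0.02, "verifyBamID": 0.015},
}
print(flag_contaminated(example))
```

When two independent estimators agree on one sample and the same pipeline gives low values everywhere else, a tool artifact becomes a less likely explanation than real contamination of that sample.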


If you have a good number of samples, you can then discard the problematic sample and proceed with downstream analyses.

The question of whether a sample with 30% contamination is still publishable is not a bioinformatics question, but if there is one thing we have learned from the last few years, it is that anything is publishable; you just have to find the "appropriate" venue.


The thing is that only one pair of samples, normal and tumor, was whole-exome sequenced for this study. After interesting mutations were identified, they were validated on a larger number of samples with Sanger sequencing, which is much cheaper than WES. You are right about the "appropriate" venue. This is exactly what I saw. Some people do research to make a living, I think.
