Samtools variant caller for each individual sample
14 months ago
bharata1803 • 420
Japan

Hello. I want to call variants for each individual sample (human cancer samples) and then compare the samples with each other. What I mean is: if I compare one cancer sample with the others, I want to see whether it has variants that the others do not.

Can I use mpileup in samtools to call variation for a single sample and then compare the result after that?

I tried running samtools on all samples in one go, but it gives only one list of variants (a single VCF file). I think that VCF contains the variants common to the cancer samples.

Snippet of vcf result : https://docs.google.com/spreadsheets/d/1TbGlLLjSKoVlI5YtPt3ujyDjKG8I_UKQDj5u22kYYoU/edit?usp=sharing

Could you please explain your input and desired output with example snippets of files? Thank you

Well, basically the VCF file is what I need. I just want to know the variants that each individual has. Let's say that at chromosome X, position N, individual A has a SNP G where the reference is C. I want to check whether individuals B, C and D also have that SNP or not. A, B, C and D are all cancer samples. It is really simple, I think. I just want to know whether samtools mpileup will produce a good result if only a single BAM file is given. I think samtools and bcftools try to calculate some statistics based on the average across samples.

It depends on whether you are looking only for SNPs, what specificity and sensitivity you want, and your ability to pay for software and for lab/chemistry optimization.

Running samtools on multiple BAM files to make a multisample VCF is a very good starting point for understanding the data you are working with.

13 months ago
United States / Los Angeles / ALAPY.com

You should get one VCF file that has the variant data of all samples. How many columns do you have in the multisample VCF from variant calling on multiple samples? You should have several, with fields containing genotypes like 1/1, 0/1, etc. If you have only one such column, could you please tell us the command you used and your samtools version?
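A quick way to check how many sample columns a VCF has is to look at the `#CHROM` header line. Here is a minimal Python sketch of that check. The nine fixed column names come from the VCF specification; the sample names (`tumorA` and so on) and the helper name `sample_names` are made up for illustration.

```python
# Sketch: list the per-sample columns of a multisample VCF header line.
# The nine fixed columns are defined by the VCF spec; everything after
# FORMAT is one column per sample.
FIXED_COLUMNS = ["#CHROM", "POS", "ID", "REF", "ALT",
                 "QUAL", "FILTER", "INFO", "FORMAT"]


def sample_names(header_line):
    """Return the sample names that follow the FORMAT column."""
    fields = header_line.rstrip("\n").split("\t")
    if fields[:len(FIXED_COLUMNS)] != FIXED_COLUMNS:
        raise ValueError("not a VCF header line with genotype columns")
    return fields[len(FIXED_COLUMNS):]


# Hypothetical header line for four samples (names are invented).
header = "\t".join(FIXED_COLUMNS + ["tumorA", "tumorB", "tumorC", "tumorD"])
print(sample_names(header))
```

If this returns a single name for a run that was given several BAM files, the input files or read groups would be the first thing to check.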

I can see the columns you are referring to. Can they be used to get the individual genotypes?

Sure. See https://samtools.github.io/hts-specs/VCFv4.3.pdf, section 1.6.2 "Genotype fields" (page 9) and section 1.5 "Header line syntax" (page 7). Each genotype column has a name (the sample ID), and its GT field (always the first field when present) holds the genotype data of that individual (referred to by name or ID).

So when you see

.......  NA12878   NA12877
.......  1/2:546   0/1:7657

1/2 corresponds to sample NA12878 and 0/1 to sample NA12877. / means unphased diploid; | means phased (at least locally). 0 means the reference allele; 1, 2 and so on are the alternative alleles. See the REF and ALT columns. If ALT=A,T and REF=C, then 1/2=A/T and 0/1=C/A.
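The index-to-allele lookup described here can be written out as a small Python sketch. The function name `decode_genotype` is made up for illustration, and it only handles simple cases like the SNP example above (no symbolic ALT alleles).

```python
import re


def decode_genotype(gt, ref, alt):
    """Translate allele indices in a GT string into bases.

    The / or | separator is kept, and "." (missing) is passed through.
    """
    alleles = [ref] + alt.split(",")   # index 0 is REF, 1.. are the ALT alleles
    parts = re.split(r"([/|])", gt)    # capturing group keeps the separators
    decoded = []
    for part in parts:
        if part in ("/", "|", "."):
            decoded.append(part)
        else:
            decoded.append(alleles[int(part)])
    return "".join(decoded)


# The example from the text: REF=C, ALT=A,T
print(decode_genotype("1/2", "C", "A,T"))  # A/T
print(decode_genotype("0/1", "C", "A,T"))  # C/A
```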

There are some more rules about REF and ALT and how they correspond to the real alleles, but for SNPs you are good to go.
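Coming back to the original question (individual A has a SNP at some position, do B, C and D have it too?), a rough Python sketch of reading that off a single multisample VCF record could look like this. The record, its values and the function name `carriers` are invented for illustration, and the code assumes tab-separated lines with GT as the first FORMAT field.

```python
import re


def carriers(header_line, data_line, alt_index=1):
    """Return the names of samples whose GT contains the given ALT allele index."""
    names = header_line.rstrip("\n").split("\t")[9:]   # sample columns start at 10
    fields = data_line.rstrip("\n").split("\t")
    if fields[8].split(":")[0] != "GT":
        raise ValueError("no GT field in the FORMAT column")
    found = []
    for name, column in zip(names, fields[9:]):
        gt = column.split(":")[0]                      # GT is the first sub-field
        if str(alt_index) in re.split(r"[/|]", gt):
            found.append(name)
    return found


# Hypothetical record: REF=C, ALT=G, four samples named A, B, C, D.
header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tA\tB\tC\tD"
record = "chrX\t1000\t.\tC\tG\t50\tPASS\t.\tGT:DP\t0/1:30\t0/0:28\t1/1:35\t./.:0"
print(carriers(header, record))  # ['A', 'C']
```

Note that a sample with a missing genotype (./.) is not reported as a carrier here, which is not the same as it being confirmed reference.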
