Question

From genotype raw data .idat to PLINK files

0

10.0 years ago

Armand ▴ 20

Dear all,

I have several raw data files (exome genotyping): *_Red.idat and *_Grn.idat

... and also the Illumina data mapping file, with these columns:

"Family ID","Individual ID","Sample ID","Genotyping Chip Barcode","Genotyping Chip Type","Final Report Name","Sex","Study Role","Birth Year Month"

....

(where Genotyping Chip Barcode is something like 4252475888_A and Genotyping Chip Type like 1M-Duov3)

I have data from different platforms, but for now I am focusing on the data from 1M-Duov3.

I would like to generate the PLINK files. I am using the crlmm R package to try to get, at least, the .ped PLINK genotype file, and I am trying to figure out how to run the genotype.Illumina function successfully.

I am following : http://master.bioconductor.org/packages/release/bioc/manuals/crlmm/man/crlmm.pdf

cnSet <- genotype.Illumina(sampleSheet=samplesheet_subset,
                             arrayNames=samplesheet_subset$Sample.ID,
                             path=datadir,
                             arrayInfoColNames=samplesheet[wh_array_name_pos,"Genotyping.Chip.Barcode"],
                             cdfName="human1mduov3b",
                             batch=rep("1", nrow(samplesheet_subset)))

It seems that the cdfName corresponding to 1M-Duov3 should be human1mduov3b.

samplesheet_subset: a data.frame holding a subset of the Illumina data mapping file, matching a subset of the .idat files (I am using 38 samples: parents, probands, siblings, ...)
arrayNames: I don't know what it refers to (I tried passing the sample IDs: samplesheet_subset$Sample.ID)
batch: following the example (one entry per row of samplesheet_subset)

When I launch it, I get this error:

Instantiate CNSet container.
Error en constructInf(sampleSheet = sampleSheet, arrayNames = arrayNames,  : 
  Missing some of the *Grn.idat files

But I think that all the .idat files are there (`*_R01C01_Grn.idat, *_R01C02_Grn.idat, *_R01C01_Red.idat, *_R01C02_Red.idat`)

[... and I suppose that every .idat file contains various samples ...]

Thanks for your help,

Cheers,

genotype sequencing R • 8.9k views

0

I encountered the same error.

Have you managed to solve it?

8.2 years ago by nadne ▴ 40

score 1 · Answer 1 · 2019-12-29

1

4.3 years ago

Giulio Genovese ▴ 390

If, instead of using CRLMM, you are okay with using Illumina's proprietary GenCall algorithm to generate GTC files from the IDAT files, there are now two approaches:

(i) using the Illumina Array Analysis Platform

(ii) using the Illumina Beeline/AutoConvert software

I describe how to use either approach on Linux here

You can use my own bcftools plugin gtc2vcf to convert GTC files to VCF

Then it is easy to convert a VCF file to PLINK format using best practices

0

Thanks for the guide and the software, really helpful! I am a little confused, though. After converting the .gtc files to a .vcf file, as in your blog, do we still have to normalize the .vcf file?

20 months ago by Dimas • 0

score 0 · Answer 2 · 2019-05-20

0

4.9 years ago

eva.gradovich • 0

I had the same issue. Make sure your files have a .idat extension and are not gzip-compressed, and that the directory path begins with a / and is passed as a string (in double quotes). That worked for me.
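
The checks above can be sketched in a few lines of base R, run before calling genotype.Illumina. This is only a rough sketch, not part of crlmm: `check_idat_files` is a hypothetical helper name, the directory and array prefixes are placeholders, and it assumes that each array has a pair of files named `<prefix>_Grn.idat` and `<prefix>_Red.idat` directly inside the data directory.

```r
# Hypothetical helper (not part of crlmm): report, for each array prefix,
# whether the expected green/red IDAT files exist in 'datadir'.
# Assumption: files are named "<prefix>_Grn.idat" and "<prefix>_Red.idat".
check_idat_files <- function(datadir, array_prefixes) {
  grn <- file.path(datadir, paste0(array_prefixes, "_Grn.idat"))
  red <- file.path(datadir, paste0(array_prefixes, "_Red.idat"))
  data.frame(
    prefix    = array_prefixes,
    grn_found = file.exists(grn),
    red_found = file.exists(red),
    # A "<file>.idat.gz" next to the expected name suggests the file
    # is still gzip-compressed and needs to be decompressed first.
    gzipped   = file.exists(paste0(grn, ".gz")) | file.exists(paste0(red, ".gz")),
    stringsAsFactors = FALSE
  )
}

# Placeholder values: replace with your own absolute path and prefixes.
datadir <- "/path/to/idat_dir"
report <- check_idat_files(datadir, c("4252475888_A_R01C01", "4252475888_A_R01C02"))

# Show only the arrays for which at least one of the two files is missing.
print(report[!(report$grn_found & report$red_found), ])
```

If any rows are printed, the prefixes passed in do not match the file names on disk, which may be worth ruling out before looking at anything else.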
