From genotype raw data .idat to PLINK files
10.0 years ago
Armand ▴ 20

Dear all,

I have several raw data files (exome genotyping): *_Red.idat and *_Grn.idat

... and also the Illumina data mapping, a file with these columns:

"Family ID","Individual ID","Sample ID","Genotyping Chip Barcode","Genotyping Chip Type","Final Report Name","Sex","Study Role","Birth Year Month"

....

(where Genotyping Chip Barcode is something like 4252475888_A and Genotyping Chip Type is something like 1M-Duov3)

I have data from different platforms, but for now I am focusing on the data from 1M-Duov3.

I would like to generate the PLINK files. I am using the crlmm R package to try to get, at least, the .ped PLINK genotype file, and I am trying to figure out how to run the genotype.Illumina function successfully.

I am following : http://master.bioconductor.org/packages/release/bioc/manuals/crlmm/man/crlmm.pdf

cnSet <- genotype.Illumina(sampleSheet=samplesheet_subset,
                             arrayNames=samplesheet_subset$Sample.ID,
                             path=datadir,
                             arrayInfoColNames=samplesheet[wh_array_name_pos,"Genotyping.Chip.Barcode"],
                             cdfName="human1mduov3b",
                             batch=rep("1", nrow(samplesheet_subset)))

It seems that the cdfName corresponding to 1M-Duov3 should be human1mduov3b.

  • samplesheet_subset: a data.frame holding a subset of the Illumina data mapping file, matching a subset of the .idat files (I am using 38 samples: parents, probands, siblings, ...)
  • arrayNames: I don't know what it refers to (I tried passing the sample IDs: samplesheet_subset$Sample.ID)
  • batch: following the example (one entry per row of samplesheet_subset)

When I run it, I get this error:

Instantiate CNSet container.
Error in constructInf(sampleSheet = sampleSheet, arrayNames = arrayNames,  : 
  Missing some of the *Grn.idat files

But I think that all the .idat files are there (`*_R01C01_Grn.idat`, `*_R01C02_Grn.idat`, `*_R01C01_Red.idat`, `*_R01C02_Red.idat`).

[... and I suppose that every .idat file contains several samples ...]

Thanks for your help,

Cheers,


I encounter the same error.

Have you managed to solve it?

4.3 years ago

If, instead of using CRLMM, you are okay with using Illumina's proprietary GenCall algorithm to generate GTC files from the IDAT files, there are now two approaches:

(i) using the Illumina Array Analysis Platform

(ii) using the Illumina Beeline/AutoConvert software

I describe how to use either approach on Linux here

You can use my own bcftools plugin gtc2vcf to convert GTC files to VCF

Then it is easy to convert a VCF file to PLINK format using best practices


Thanks for the guide and the software, really helpful! I am a little confused, though. After converting the .gtc files to a .vcf file, as in your blog, do we still have to normalize the .vcf file?

4.9 years ago

I had the same issue. Make sure your files have a .idat extension and are not gzip-compressed, and that the directory path begins with a / and is passed as a string (in " "). That worked for me.
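
These checks can be sketched in base R before calling genotype.Illumina. This is only a rough sketch, not part of crlmm: `check_idat_files` is a hypothetical helper, the directory and array names in the usage lines are placeholders, and it assumes the files are named `<array name>_Grn.idat` and `<array name>_Red.idat`, which is what the default `sep` and `fileExt` arguments in the crlmm manual suggest.

```r
# Hypothetical helper (not part of crlmm): report which expected IDAT files
# are present, missing, or only available as gzip-compressed copies.
check_idat_files <- function(datadir, array_names, sep = "_") {
  stopifnot(is.character(datadir), length(datadir) == 1)
  array_names <- as.character(array_names)

  # Assumed naming scheme: <array name>_Grn.idat / <array name>_Red.idat
  grn <- file.path(datadir, paste(array_names, "Grn.idat", sep = sep))
  red <- file.path(datadir, paste(array_names, "Red.idat", sep = sep))

  data.frame(
    array       = array_names,
    grn_found   = file.exists(grn),
    red_found   = file.exists(red),
    grn_gzipped = file.exists(paste0(grn, ".gz")),  # compressed copy only
    red_gzipped = file.exists(paste0(red, ".gz")),
    stringsAsFactors = FALSE
  )
}

# Usage with placeholder values: an absolute path given as a string,
# and array names that match the file name prefixes on disk.
report <- check_idat_files("/path/to/idat", c("ARRAY1_R01C01", "ARRAY1_R01C02"))
print(report[!(report$grn_found & report$red_found), ])
```

Any row printed at the end corresponds to an array whose green or red file could not be found under the assumed name, which is the situation the "Missing some of the *Grn.idat files" error points at.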
