Converting Microarray SNP data into VCF format
22 months ago
Bangalore

Hello everyone,

I have a single microarray sample, consisting of one .CEL file and an intensity .csv file, and I would like to convert either of the two files into VCF format. After going through the Biostars and Bioconductor forums, I found pd.genomewidesnp.6@getdb, crlmm and a few others. crlmm doesn't work properly because it needs a larger sample size, and pd.genomewidesnp.6@getdb has no proper documentation or example scripts. Is there any way I can convert the microarray data to VCF format?

Thanks,

Susmita

13 months ago
Republic of Ireland

Take a look here, where you already commented: A: How To Convert Illumina/Affy Array Data To Vcf Format - did that not work?

If not, you can also use Affymetrix Power Tools (APT): http://media.affymetrix.com/support/developer/powertools/changelog/apt-format-result.html

apt-format-result is an application which allows for the creation of VCF, or PLINK file formats as well as IGV compatible files from Axiom analyzed batches.

Yet another related answer: A: How to convert and annotate apt-probeset-genotype into PLINK format

Kevin


Yes, I tried affy2vcf and APT, but neither worked because I don't have the annotation file. However, apart from the .CEL file, I do have a CSV file of genotype calls for that .CEL file, generated with BRLMM-P-Plus. So right now I want to convert that genotype call file into VCF format.


Which specific annotation file? APT should download it automatically, no?


I don't know. When I run apt-format-result, it shows an error that the annotation file is missing. I guess it requires the annotation file for the SNP 6.0 array.


You likely need one of these: https://www.thermofisher.com/order/catalog/product/901153?SID=srch-srp-901153

Please try to search a bit in the APT options where you can specify the annotation file. It has been a good few years since I last used APT.


I don't have that file, because I am analysing data from another paper that is available in GEO.


So I created an account and downloaded the annotation file. The command ran, but it showed the warning "Missing value in identifier column, SNP will be excluded for text export by calls file." Any idea why that could be?


Okay, getting somewhere... Could you try to use the snp-identifier-column parameter?

Additional Options

The following optional commands may be added to the command string as desired:

  • snp-identifier-column: allows a column name to be used as SNP identifier for export. This must be one of the annotation columns, e.g. {Affy_SNP_ID, dbSNP_RS_ID}. The default value for this is 'probeset_id'.

  • pedigree-file: this file can be used if you wish to include basic pedigree information such as family, sample, maternal, etc.

  • export-call-format: sets format of export calls. Available formats: call_code {AA/AB/BB} or base_call {CC/CT/TT} or translated {0/1/2/-1}. Default is call_code.
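
To make the three export-call-format values concrete, here is a minimal Python sketch of how one call could be expressed in each form. It is not APT code. The function names are made up for illustration, and the numeric mapping (-1 = no call, 0 = AA, 1 = AB, 2 = BB) is the convention used in Affymetrix calls files, which I am assuming the "translated" format follows.

```python
# Numeric codes as written in Affymetrix calls files (assumed to match the
# "translated" export format): -1 = no call, 0 = AA, 1 = AB, 2 = BB.
CALL_CODE_TO_NUMERIC = {"AA": 0, "AB": 1, "BB": 2}


def to_base_call(call_code, allele_a, allele_b):
    """Turn a call_code (AA/AB/BB) into a base_call such as CC/CT/TT.

    allele_a and allele_b are the bases from the annotation's
    Allele_A / Allele_B columns. Returns None for anything else (no call).
    """
    if call_code not in CALL_CODE_TO_NUMERIC:
        return None
    lookup = {"A": allele_a, "B": allele_b}
    return "".join(lookup[letter] for letter in call_code)


def to_translated(call_code):
    """Turn a call_code into the numeric 0/1/2 form, or -1 for a no call."""
    return CALL_CODE_TO_NUMERIC.get(call_code, -1)
```

For a SNP with Allele_A = C and Allele_B = T, `to_base_call("AB", "C", "T")` gives `"CT"` and `to_translated("AB")` gives `1`.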


Hello Kevin, I know it's been a long time. I got fed up with this and moved to another project, and now I have come back to the same problem. I'm trying to do as you said, but I'm still getting errors.

#%%field-000=date-time,string8
#%%field-001=facility,string8
#%%field-002=level,int8
#%%field-003=mem usage (MB),string8
#%%field-004=message,string8
#
date-time   facility    level   mem usage (MB)  message
03/02/2019 19:21:33 ERROR   1   503 Failed to execute sql statement, load chromosome table failed.\tSql: select * from Chromosome, ErrMsg: Failed to prepare sql statement for querying.\tSQL: select * from Chromosome, ErrCode: 26 (https://www.sqlite.org/rescode.html)
03/02/2019 19:21:33 WARNING 1   503 Snp identifier column does not exist in annotation file, using 'ProbeSet_ID' instead.\tSnpIdentifier: probeset_id, AnnotationFile: /home2/Project_2/Affy/GenomeWideSNP_6-na35-annot-csv/GenomeWideSNP_6.na35.annot.csv
03/02/2019 19:21:33 ERROR   1   503 Failed to execute sql statement, export failed. Check input annotation columns.\tSql: select ProbeSet_ID,Allele_A,Allele_B,Chr_id,Start,dbSNP_RS_ID,ChrX_PAR,Strand from Annotations, ErrMsg: Failed to prepare sql statement for querying.\tSQL: select ProbeSet_ID,Allele_A,Allele_B,Chr_id,Start,dbSNP_RS_ID,ChrX_PAR,Strand from Annotations, ErrCode: 26 (https://www.sqlite.org/rescode.html)
03/02/2019 19:21:33 ERROR   1   75  MainNode run failed with one or more errors. ErrCode: 1000

Any ideas how to proceed?
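
One possible reading of the log: SQLite result code 26 is SQLITE_NOTADB ("file is not a database"), and the AnnotationFile in the log ends in .annot.csv, so the tool may have been given a CSV where it expected an SQLite annotation database. A quick way to check any file is to look at its first 16 bytes, which in an SQLite 3 database are always `SQLite format 3` followed by a NUL byte. The helper name below is made up for illustration.

```python
SQLITE_MAGIC = b"SQLite format 3\x00"  # first 16 bytes of every SQLite 3 database


def looks_like_sqlite(path):
    """Return True if the file at `path` starts with the SQLite 3 header."""
    with open(path, "rb") as handle:
        return handle.read(len(SQLITE_MAGIC)) == SQLITE_MAGIC
```

Calling `looks_like_sqlite(...)` on the annotation file passed to apt-format-result would return False for a plain CSV.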


So I downloaded the SQLite db and tried again. This time the VCF file is created, but with lots of warnings.

#%%field-000=date-time,string8
#%%field-001=facility,string8
#%%field-002=level,int8
#%%field-003=mem usage (MB),string8
#%%field-004=message,string8
#
date-time   facility    level   mem usage (MB)  message
03/02/2019 19:38:58 WARNING 1   503 Missing value in identifier column, SNP will be excluded for text export by calls file.\tSnpIdentifierColumn: probeset_id, ProbeSetId:  (calls file)
[the same warning line appears eight times in the posted log]
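
The warning shows an empty ProbeSetId coming from the calls file, so it may help to find which rows of that file have a blank identifier. The sketch below is a rough check, not part of APT. It assumes a delimited text file whose metadata lines start with '#' and whose header row contains a `probeset_id` column. For a comma-separated file, pass `delimiter=","`.

```python
import csv


def rows_missing_identifier(path, id_column="probeset_id", delimiter="\t"):
    """Return 1-based data-row numbers whose identifier field is empty."""
    missing = []
    with open(path, newline="") as handle:
        # Skip the '#'-prefixed metadata lines that precede the column header.
        data_lines = (line for line in handle if not line.startswith("#"))
        reader = csv.DictReader(data_lines, delimiter=delimiter)
        for row_number, row in enumerate(reader, start=1):
            if not (row.get(id_column) or "").strip():
                missing.append(row_number)
    return missing
```

If the returned list covers every row, the header name probably does not match `id_column` (for example `ProbeSet_ID` versus `probeset_id`) rather than the identifiers being blank.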

Okay, getting somewhere again... keep trying!


Yes, I understand the frustration, and I do not know what those error messages relate to. At least one of them seems to indicate that the 'snp-identifier-column' is not present in the annotation file, /home2/Project_2/Affy/GenomeWideSNP_6-na35-annot-csv/GenomeWideSNP_6.na35.annot.csv. Going by the option description, this should be an annotation column that identifies each SNP, such as Affy_SNP_ID or dbSNP_RS_ID.
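
To see whether an SQLite annotation database actually has the columns that the failing query in the log asks for, something like the following could be used. It is only a sketch. The table name `Annotations` and the column list are copied from the logged SQL statement, and the function name is made up.

```python
import sqlite3

# Columns named in the failing query from the log.
REQUIRED_COLUMNS = [
    "ProbeSet_ID", "Allele_A", "Allele_B", "Chr_id",
    "Start", "dbSNP_RS_ID", "ChrX_PAR", "Strand",
]


def missing_annotation_columns(db_path, table="Annotations"):
    """Return the required columns that `table` in the SQLite file lacks."""
    connection = sqlite3.connect(db_path)
    try:
        # PRAGMA table_info yields one row per column; index 1 is the name.
        rows = connection.execute(f"PRAGMA table_info({table})").fetchall()
    finally:
        connection.close()
    # SQLite column names are case-insensitive, so compare in lower case.
    present = {row[1].lower() for row in rows}
    return [name for name in REQUIRED_COLUMNS if name.lower() not in present]
```

An empty list means every column is there. If the file is not an SQLite database at all, the `execute` call raises `sqlite3.DatabaseError` instead.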

Another option: export the data using Affymetrix Genotyping Console (or Power Tools) in A, T, G, C format and then manually convert to VCF or PLINK (followed by exporting from PLINK to VCF).
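
As a rough idea of what a manual conversion could look like, here is a minimal Python sketch that writes single-sample VCF lines from AA/AB/BB calls. It is not a validated converter. The big assumption, flagged in the code as well, is that Allele A is written as REF and Allele B as ALT; in real data the reference base has to be checked against the genome build. All function names are made up.

```python
def call_to_vcf_line(chrom, pos, snp_id, allele_a, allele_b, call_code):
    """Format one SNP call as a VCF data line.

    ASSUMPTION: Allele A is written as REF and Allele B as ALT. The true
    reference base must be verified against the genome build.
    """
    genotype = {"AA": "0/0", "AB": "0/1", "BB": "1/1"}.get(call_code, "./.")
    fields = [chrom, str(pos), snp_id, allele_a, allele_b,
              ".", ".", ".", "GT", genotype]
    return "\t".join(fields)


def write_vcf(records, sample_name, handle):
    """Write a minimal VCF; each record is a tuple of call_to_vcf_line arguments."""
    handle.write("##fileformat=VCFv4.2\n")
    handle.write('##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">\n')
    handle.write("#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t"
                 + sample_name + "\n")
    # Sort by chromosome name (as text) and then by integer position.
    for record in sorted(records, key=lambda r: (r[0], r[1])):
        handle.write(call_to_vcf_line(*record) + "\n")
```

Usage would be along the lines of `write_vcf([("1", 100, "rs1", "A", "G", "BB")], "sample1", open("out.vcf", "w"))`, with the chromosome, position and alleles taken from the annotation file.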


A further update: I did genotype calling on the .CEL file using apt-probeset-genotype, and then used those calls to create a VCF file with apt-format-result. But the VCF file I'm getting is somewhat useless. Apparently I'm getting a warning because the SNPs are detected on the reverse strand when they should be on the forward strand. Moreover, the VCF file is created without any REF/ALT, QUAL or INFO values.


You may have to include only those SNPs on the forward strand. For importing microarray data to PLINK, for example, we filter out 1000s of SNPs because they were called on the reverse strand. I also know that working with Affymetrix data is cumbersome...
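
The forward-strand filter described above, plus the alternative of flipping reverse-strand alleles to their complement, can be sketched in Python as follows. This is an illustration only. It assumes the annotation's Strand column holds '+' or '-' and that each SNP is available as a (snp_id, allele_a, allele_b, strand) tuple; the function names are made up.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def keep_forward_only(snps):
    """Keep only the (snp_id, allele_a, allele_b, strand) tuples on '+'."""
    return [snp for snp in snps if snp[3] == "+"]


def to_forward_strand(allele_a, allele_b, strand):
    """Return (allele_a, allele_b) expressed on the forward strand.

    Returns None when the strand or an allele is not recognised, so the
    caller can drop that SNP.
    """
    if strand == "+":
        return allele_a, allele_b
    if strand == "-" and allele_a in COMPLEMENT and allele_b in COMPLEMENT:
        return COMPLEMENT[allele_a], COMPLEMENT[allele_b]
    return None
```

Filtering is the safer of the two. Flipping keeps more SNPs, but A/T and C/G SNPs look the same on both strands, so a strand error in the annotation cannot be detected for them.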
