Converting Microarray SNP data into VCF format
5.3 years ago

Hello everyone,

I have a single microarray sample, consisting of one .CEL file and an intensity .csv file, and I want to convert either of the two files into VCF format. After going through the Biostars and Bioconductor forums, I found pd.genomewidesnp.6@getdb, crlmm, and a few others. crlmm does not work properly because it needs a larger sample size, and pd.genomewidesnp.6@getdb has no proper documentation or example scripts. Is there any way to convert the microarray data to VCF format?

Thanks,

Susmita

Affymetrix SNP VCF RNA-Seq Microarray • 6.2k views
ADD COMMENT
5.3 years ago

Take a look here, where you already commented: A: How To Convert Illumina/Affy Array Data To Vcf Format - did that not work?

If not, you can also use Affymetrix Power Tools (APT): http://media.affymetrix.com/support/developer/powertools/changelog/apt-format-result.html

apt-format-result is an application which allows for the creation of VCF, or PLINK file formats as well as IGV compatible files from Axiom analyzed batches.

Yet another related answer: A: How to convert and annotate apt-probeset-genotype into PLINK format

Kevin

ADD COMMENT

Yes, I tried affy2vcf and APT, but those didn't work because I don't have the annotation file. Apart from the .CEL file, though, I do have a csv file of genotype calls for that .CEL file, generated with BRLMM-P-Plus. So right now I want to convert that genotype call file into VCF format.

ADD REPLY

Which specific annotation file? APT should download it automatically, no?

ADD REPLY

I don't know. When I run apt-format-result, it shows an error that the annotation file is missing. It requires the annotation file for the SNP 6.0 array, I guess.

ADD REPLY

You likely need one of these: https://www.thermofisher.com/order/catalog/product/901153?SID=srch-srp-901153

Please try to search a bit in the APT options where you can specify the annotation file. It has been a good few years since I last used APT.

ADD REPLY

I won't have that file, because I am analysing data available in GEO from another paper.

ADD REPLY

So I created an account and downloaded the annotation file. The command also ran, but it showed 'Missing value in identifier column, SNP will be excluded for text export by calls file'. Any idea why that could be?

ADD REPLY

Okay, getting somewhere... Could you try to use the snp-identifier-column parameter?

Additional Options

The following optional commands may be added to the command string as desired:

  • snp-identifier-column: allows a column name to be used as SNP identifier for export. This must be one of annotation columns, e.g. {Affy_SNP_ID, dbSNP_RS_ID}. The default value for this is 'probeset_id'

  • pedigree-file: this file can be used if you wish to include basic pedigree information such as family, sample, maternal, etc.

  • export-call-format: sets format of export calls. Available formats: call_code {AA/AB/BB} or base_call {CC/CT/TT} or translated {0/1/2/-1}. Default is call_code.
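
To make the three export-call-format options concrete, here is a minimal Python sketch of how they relate to each other. It is not part of APT. The helper names are hypothetical, and the numeric mapping (AA=0, AB=1, BB=2, no call=-1) is my assumption about the 'translated' encoding, so please check it against your own output before relying on it.

```python
# Hypothetical helpers, not part of APT.
# Assumed encoding for the 'translated' format: AA=0, AB=1, BB=2, no call=-1.
CALL_CODE_TO_TRANSLATED = {"AA": 0, "AB": 1, "BB": 2}


def to_translated(call_code):
    """Map a call_code such as 'AB' to the assumed numeric code (-1 for anything else)."""
    return CALL_CODE_TO_TRANSLATED.get(call_code, -1)


def to_base_call(call_code, allele_a, allele_b):
    """Spell out a call_code with the probe set's A and B alleles, e.g. 'AB' -> 'CT'."""
    if call_code not in CALL_CODE_TO_TRANSLATED:
        return None  # no call or unrecognised code
    lookup = {"A": allele_a, "B": allele_b}
    return "".join(lookup[letter] for letter in call_code)


# Example with made-up alleles (Allele_A = C, Allele_B = T).
print(to_translated("AB"))
print(to_base_call("AB", "C", "T"))
```

The A and B alleles would have to come from the annotation file (the log further down mentions Allele_A and Allele_B columns).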

ADD REPLY

Hello Kevin, I know it's been a long time. I got fed up with this and shifted to another project. Now I have come back to the same problem. I'm trying to do as you said, but I'm still getting errors.

#%%field-000=date-time,string8
#%%field-001=facility,string8
#%%field-002=level,int8
#%%field-003=mem usage (MB),string8
#%%field-004=message,string8
#
date-time   facility    level   mem usage (MB)  message
03/02/2019 19:21:33 ERROR   1   503 Failed to execute sql statement, load chromosome table failed.\tSql: select * from Chromosome, ErrMsg: Failed to prepare sql statement for querying.\tSQL: select * from Chromosome, ErrCode: 26 (https://www.sqlite.org/rescode.html)
03/02/2019 19:21:33 WARNING 1   503 Snp identifier column does not exist in annotation file, using 'ProbeSet_ID' instead.\tSnpIdentifier: probeset_id, AnnotationFile: /home2/Project_2/Affy/GenomeWideSNP_6-na35-annot-csv/GenomeWideSNP_6.na35.annot.csv
03/02/2019 19:21:33 ERROR   1   503 Failed to execute sql statement, export failed. Check input annotation columns.\tSql: select ProbeSet_ID,Allele_A,Allele_B,Chr_id,Start,dbSNP_RS_ID,ChrX_PAR,Strand from Annotations, ErrMsg: Failed to prepare sql statement for querying.\tSQL: select ProbeSet_ID,Allele_A,Allele_B,Chr_id,Start,dbSNP_RS_ID,ChrX_PAR,Strand from Annotations, ErrCode: 26 (https://www.sqlite.org/rescode.html)
03/02/2019 19:21:33 ERROR   1   75  MainNode run failed with one or more errors. ErrCode: 1000

Any ideas how to proceed?

ADD REPLY

So I downloaded the sqlite db and tried again. This time the vcf file is being created, but with lots of warnings.

#%%field-000=date-time,string8
#%%field-001=facility,string8
#%%field-002=level,int8
#%%field-003=mem usage (MB),string8
#%%field-004=message,string8
#
date-time   facility    level   mem usage (MB)  message
03/02/2019 19:38:58 WARNING 1   503 Missing value in identifier column, SNP will be excluded for text export by calls file.\tSnpIdentifierColumn: probeset_id, ProbeSetId:  (calls file)
03/02/2019 19:38:58 WARNING 1   503 Missing value in identifier column, SNP will be excluded for text export by calls file.\tSnpIdentifierColumn: probeset_id, ProbeSetId:  (calls file)
03/02/2019 19:38:58 WARNING 1   503 Missing value in identifier column, SNP will be excluded for text export by calls file.\tSnpIdentifierColumn: probeset_id, ProbeSetId:  (calls file)
03/02/2019 19:38:58 WARNING 1   503 Missing value in identifier column, SNP will be excluded for text export by calls file.\tSnpIdentifierColumn: probeset_id, ProbeSetId:  (calls file)
03/02/2019 19:38:58 WARNING 1   503 Missing value in identifier column, SNP will be excluded for text export by calls file.\tSnpIdentifierColumn: probeset_id, ProbeSetId:  (calls file)
03/02/2019 19:38:58 WARNING 1   503 Missing value in identifier column, SNP will be excluded for text export by calls file.\tSnpIdentifierColumn: probeset_id, ProbeSetId:  (calls file)
03/02/2019 19:38:58 WARNING 1   503 Missing value in identifier column, SNP will be excluded for text export by calls file.\tSnpIdentifierColumn: probeset_id, ProbeSetId:  (calls file)
03/02/2019 19:38:58 WARNING 1   503 Missing value in identifier column, SNP will be excluded for text export by calls file.\tSnpIdentifierColumn: probeset_id, ProbeSetId:  (calls file)
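
Since the warning reports an empty ProbeSetId, one thing that could be checked is whether the calls file itself contains rows with a blank identifier. The Python sketch below is only a rough check under assumptions: it assumes the calls file is tab-delimited, that metadata lines start with '#', and that the identifier column is named probeset_id. The function name and the example rows are made up.

```python
import csv
import io


def count_blank_ids(handle, id_column="probeset_id"):
    """Return (total_rows, rows_with_blank_id) for a tab-delimited calls table."""
    # Skip metadata lines, which are assumed to start with '#'.
    data_lines = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(data_lines, delimiter="\t")
    total = blank = 0
    for row in reader:
        total += 1
        if not (row.get(id_column) or "").strip():
            blank += 1
    return total, blank


# Tiny made-up example, not real array data: the second data row has no identifier.
example = io.StringIO(
    "#%chip_type=Example\n"
    "probeset_id\tsample1.CEL\n"
    "SNP_A-0000001\t0\n"
    "\t1\n"
)
print(count_blank_ids(example))
```

For a real file, pass an open file handle instead of the StringIO object. If the blank count is zero, the missing values are more likely on the annotation side than in the calls file.
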
ADD REPLY

Okay, getting somewhere again... keep trying!

ADD REPLY

Yes, I understand the frustration, and I do not know what those error messages relate to. At least one seems to indicate that the 'snp-identifier-column' does not exist in the annotation file, /home2/Project_2/Affy/GenomeWideSNP_6-na35-annot-csv/GenomeWideSNP_6.na35.annot.csv. Going by the option description, this should be an annotation column that identifies each SNP, such as Affy_SNP_ID or dbSNP_RS_ID.

Another option: export the data using Affymetrix Genotyping Console (or Power Tools) in A, T, G, C format and then manually convert to VCF or PLINK (followed by exporting from PLINK to VCF).
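
As a rough illustration of the manual route, the Python sketch below builds a minimal single-sample VCF record from a two-letter base call. It assumes you already know the chromosome, position, REF and ALT alleles for each SNP (the array only reports A/B alleles, so REF would have to come from the annotation or a reference genome). The function name, header constant and example values are hypothetical.

```python
# Minimal header for a single-sample VCF that carries only genotypes.
VCF_HEADER = "\n".join([
    "##fileformat=VCFv4.2",
    '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1",
])


def vcf_record(chrom, pos, snp_id, ref, alt, base_call):
    """Build one VCF data line from a two-letter base call such as 'CT'."""
    index = {ref: "0", alt: "1"}
    alleles = [index.get(base) for base in base_call]
    if len(alleles) != 2 or None in alleles:
        gt = "./."  # no call, or an allele that is neither REF nor ALT
    else:
        gt = "/".join(sorted(alleles))
    # QUAL, FILTER and INFO are left as '.' because the array export does not provide them.
    fields = [chrom, str(pos), snp_id, ref, alt, ".", ".", ".", "GT", gt]
    return "\t".join(fields)


# Made-up SNP, for illustration only.
print(VCF_HEADER)
print(vcf_record("1", 12345, "rs0000000", "C", "T", "CT"))
```

Genotypes are written unphased (with '/'), and any call that does not match REF or ALT is emitted as missing rather than guessed.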

ADD REPLY

A further update: I did genotype calling on the .CEL file with apt-probeset-genotype and used those calls to create a vcf file with apt-format-result. But the vcf file I'm getting is somewhat useless. Apparently I'm getting warnings because the SNPs are detected on the reverse strand and should be on the forward strand. Moreover, the vcf file is created without any REF/ALT, QUAL or INFO values.

ADD REPLY

You may have to include only those on the forward strand. For importing microarray data to PLINK, for example, we filter out thousands of SNPs because they were called on the reverse strand. I also know that working with Affymetrix data is cumbersome...
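
A small Python sketch of the two usual ways to deal with strand: keep only forward-strand SNPs, or complement the alleles of reverse-strand SNPs. It assumes the annotation has a strand value of '+' or '-' per SNP and that alleles are single bases. The names and example tuples are made up.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def to_forward(allele_a, allele_b, strand):
    """Return the allele pair as it would read on the forward strand."""
    if strand == "+":
        return allele_a, allele_b
    if strand == "-":
        return COMPLEMENT[allele_a], COMPLEMENT[allele_b]
    raise ValueError(f"unexpected strand value: {strand!r}")


def is_strand_ambiguous(allele_a, allele_b):
    """A/T and C/G SNPs look the same on both strands, so flipping cannot be verified."""
    return COMPLEMENT.get(allele_a) == allele_b


# Made-up SNPs as (id, allele A, allele B, strand).
snps = [("SNP_1", "A", "G", "+"), ("SNP_2", "C", "T", "-"), ("SNP_3", "A", "T", "-")]

# Option 1: keep only SNPs reported on the forward strand.
forward_only = [snp for snp in snps if snp[3] == "+"]

# Option 2: flip reverse-strand SNPs, dropping the ambiguous A/T and C/G ones.
flipped = [
    (snp_id, *to_forward(a, b, strand))
    for snp_id, a, b, strand in snps
    if not is_strand_ambiguous(a, b)
]
print(forward_only)
print(flipped)
```

Filtering is the safer of the two. Flipping keeps more SNPs but depends on the strand annotation being correct.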

ADD REPLY

The system ignores this parameter. I tried it that way and I get the error 'Database Error: no such column: Affy_SNP_ID'. I really don't know where the software gets this parameter from, because I already changed the snp-identifier-column parameter.
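
For a 'no such column' error like this, it may help to list which columns the annotation database actually contains before choosing a value for snp-identifier-column. Below is a minimal Python sketch using the standard sqlite3 module. The table name Annotations is taken from the SQL in the log posted earlier in this thread. The function name is hypothetical, and the demo runs on a throwaway in-memory table, not on a real annotation file.

```python
import sqlite3


def table_columns(conn, table="Annotations"):
    """Return the column names of a table in an open SQLite connection."""
    rows = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return [row[1] for row in rows]


# Demo on a made-up in-memory table with two columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Annotations (ProbeSet_ID TEXT, dbSNP_RS_ID TEXT)")
columns = table_columns(conn)
print(columns)
print("Affy_SNP_ID" in columns)
conn.close()
```

For a real annotation database, open it with sqlite3.connect() and the path to your own file. An empty list would mean the table does not exist under that name.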

ADD REPLY

Did you figure this out later?

ADD REPLY
