SNP Questions: Mendelian Error Rates and SNP Orientation relative to genome.
9.9 years ago
Tom ▴ 40

Hi everyone,

Could I please ask two questions, both related to SNPs.

Question 1:

I have .bim/.bed files for a set of SNPs. I want to remove SNPs from my data set that have a >2% Mendelian error rate.

I tried:

./plink --bfile IN_FILE --me 0.02 0.02 --noweb --make-bed --out X

But the output tells me 0 SNPs were removed. I am working with 500K, 100K and 50K arrays and would have expected at least one SNP to be removed. I tried increasing and decreasing the two --me parameters (0.02 0.02), but nothing is ever removed.

Could someone tell me the correct command to remove SNPs from a data set that have >2% Mendelian error rate?
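
For reference, the quantity being filtered on can be sketched outside PLINK. The snippet below is a minimal illustration only, assuming genotypes are coded as the count of one allele (0, 1 or 2), missing calls are None, and the SNP is autosomal; the function names and the toy trios are hypothetical, and PLINK's own denominator and handling of missing data may differ.

```python
from itertools import product

def is_mendel_error(father, mother, child):
    """Return True if the child's genotype cannot be inherited from the parents.

    Genotypes are counts of one allele (0, 1 or 2); None means missing.
    Trios with any missing genotype are never counted as errors.
    """
    if None in (father, mother, child):
        return False
    # Allele counts each parent can transmit, given their own genotype.
    transmit = {0: (0,), 1: (0, 1), 2: (1,)}
    possible = {f + m for f, m in product(transmit[father], transmit[mother])}
    return child not in possible

def snp_error_rate(trios):
    """Fraction of fully genotyped trios showing a Mendelian error at one SNP."""
    complete = [t for t in trios if None not in t]
    if not complete:
        return 0.0
    return sum(is_mendel_error(*t) for t in complete) / len(complete)

# Hypothetical (father, mother, child) genotypes at a single SNP.
trios = [(0, 0, 1), (1, 1, 2), (2, 0, 1), (2, 2, 2)]
rate = snp_error_rate(trios)
print(rate)  # 0.25: only the first trio is inconsistent
```

Note that a rate like this only exists if parent-offspring trios are present in the data. As far as I recall from the PLINK 1.07 documentation, the first --me value is a per-family threshold and the second is the per-SNP threshold, so that is worth confirming before interpreting the output.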

Question 2:

I have a set of SNPs (it's an Affymetrix 100K). How do I tell whether a SNP is on the plus or minus strand?

For example, I have a set of SNPs. The information I have for each SNP is:

Chr,Pos,Submitter_snp_name,Ss#,Rs#,Genome_build_id,ALLELE1_genome_orien,ALLELE2_genome_orien,ALLELE1_orig_assay_orien,ALLELE2_orig_assay_orien,QC_TYPE,SNP_flank_sequence,SOURCE,Ss2rs_orientation,Rs2genome_orienation,Orien_flipped_assay_to_genome

This is an example of a SNP (let's call it SNPx):

12,744051,SNP_A287197,ss7481221,rs31368,36.2,G,A,C,T,A,TCGGCCTGCAGTCCTCC[A/G]CTCTCAGGTTTGCAC,HuGeneFocused,+,-,y.

And I have a set of genes and their location in the genome, for example:

EntrezID  Chr  GeneStart  GeneEnd    Strand
1         12   744000     9067900    minus
4         12   744001     130887675  plus
111       3    123282296  123449077  minus
142       1    226360691  226408100  minus
185       3    148697871  148743003  plus

The columns are the Entrez ID, the chromosome number, the gene's start and end position in the genome, and whether the gene is on the plus or minus strand.

I need to find out whether SNPx lies within any of these genes.
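
The containment check itself can be sketched as a simple interval lookup. This is only an illustration, assuming the SNP position and the gene coordinates are on the same genome build and use 1-based inclusive coordinates (not confirmed in this post); the function name `genes_containing` is hypothetical. The strand column is carried along but not used in the test, in line with the answer further down the thread.

```python
# (entrez_id, chrom, start, end, strand), copied from the gene table in this post.
genes = [
    (1, "12", 744000, 9067900, "minus"),
    (4, "12", 744001, 130887675, "plus"),
    (111, "3", 123282296, 123449077, "minus"),
    (142, "1", 226360691, 226408100, "minus"),
    (185, "3", 148697871, 148743003, "plus"),
]

def genes_containing(chrom, pos, genes):
    """Return Entrez IDs of genes whose [start, end] interval contains pos."""
    return [
        entrez_id
        for entrez_id, g_chrom, start, end, _strand in genes
        if g_chrom == chrom and start <= pos <= end
    ]

# SNPx from the example row: chromosome 12, position 744051.
hits = genes_containing("12", 744051, genes)
print(hits)  # [1, 4]
```

For a full array against a full gene list, a linear scan per SNP gets slow; sorting the genes by start position or using an interval tree would be the usual next step.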

I am confused about how to tell whether the SNP is on the plus or minus strand.

(1) In this case, would SNPx belong to Entrez gene ID 1 or 4? They are both on the same chromosome at about the same position, but one is on the plus strand and the other is on the minus strand.

(2) Do I need to account for ss to rs orientation, and rs to genome orientation? How should I do this, if so?

Thanks
Aoife

orientation SNP error plink mendelian • 3.4k views

How many samples are in your bed file?


Well, I'm using three data sets (50K, 100K and 500K). After other quality filtering measures, there are 35,000 SNPs in the 50K, 90,000 in the 100K and 300,000 in the 500K data set, for about 1,000 people.

Thanks.


Two quick questions:

(i) are you certain that Mendelian errors weren't filtered out before this dataset got to you?

(ii) how many families/trios does PLINK report being present?
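
Point (ii) can also be checked directly from the .fam file, since Mendelian errors can only be scored where a child and both parents are genotyped. A minimal sketch, assuming the standard six-column .fam layout (family ID, individual ID, paternal ID, maternal ID, sex, phenotype) with "0" meaning an unknown parent; the function name and the example lines are hypothetical.

```python
def count_trios(fam_lines):
    """Count individuals whose father and mother are both present in the .fam data."""
    records = [line.split() for line in fam_lines if line.strip()]
    # Individuals are identified by (family ID, individual ID).
    present = {(r[0], r[1]) for r in records}
    trios = 0
    for fid, iid, pid, mid, *_ in records:
        has_parent_ids = pid != "0" and mid != "0"
        if has_parent_ids and (fid, pid) in present and (fid, mid) in present:
            trios += 1
    return trios

# Made-up .fam content: one complete trio, one unrelated individual,
# and one child whose listed parents are not in the file.
example = [
    "F1 dad 0 0 1 -9",
    "F1 mum 0 0 2 -9",
    "F1 kid dad mum 1 -9",
    "F2 solo 0 0 2 -9",
    "F3 orphan pa ma 1 -9",
]
n_trios = count_trios(example)
print(n_trios)  # 1
```

On a real file this could be called as `count_trios(open("IN_FILE.fam"))`. If it returns 0, no Mendelian error filter can remove anything, whatever thresholds are given.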

9.9 years ago

Regarding question 2, SNPs don't (normally*) have strands, since a G->A on one strand means a C->T on the other. Thus, both genes are affected in your example (likely in their UTRs). The ss/rs orientation fields just tell you the orientation of the probe and of the reference SNP sequence relative to the genomic sequence.

*I wrote "normally" above since if one is doing single-cell sequencing, then one could observe occasional losses of strand complementarity.
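
The complementarity point can be illustrated with the example row from the question. This is a sketch only, assuming that a "y" in Orien_flipped_assay_to_genome means the assay alleles are reported on the opposite strand to the genome (an interpretation that is consistent with the example row but not confirmed here); the function name is hypothetical.

```python
# Base-by-base complement for single-nucleotide alleles.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def to_genome_orientation(alleles, flipped):
    """Complement each allele if the assay is on the opposite strand to the genome."""
    if flipped:
        return tuple(a.translate(COMPLEMENT) for a in alleles)
    return tuple(alleles)

# SNPx: assay-orientation alleles are C and T, and the flipped flag is "y".
assay_alleles = ("C", "T")
genome_alleles = to_genome_orientation(assay_alleles, flipped=True)
print(genome_alleles)  # ('G', 'A'), matching the genome-orientation columns
```

The same variant is therefore described as C/T on one strand and G/A on the other, which is why the gene's strand does not decide whether the SNP falls within it.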
