Hi everybody,
while parsing Affymetrix's NetAffx transcript annotation file (HuGene-2_0-st-v1.na35.hg19.transcript.csv) for HuGene-2-0-st chips, I discovered that some of the gene_assignment and mrna_assignment fields in the file are incomplete and contain the string [WARNING: THIS FIELD TRUNCATED]. It appears that all affected fields (I only checked the gene_assignment and mrna_assignment columns) are 32532 characters long, including the warning message. I was wondering whether anyone else has already run into this issue and maybe has an official or unofficial explanation for these entries.
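For anyone who wants to check their own copy of the file, here is a minimal Python sketch of how the affected fields can be located. It is not the exact script I used. It assumes that the metadata lines at the top of the file start with '#' and that the ID column is called transcript_cluster_id; the function name find_truncated and the sample rows are made up for illustration.

```python
import csv
import io

MARKER = "[WARNING: THIS FIELD TRUNCATED]"


def find_truncated(handle, columns=("gene_assignment", "mrna_assignment"),
                   id_column="transcript_cluster_id"):
    """Yield (row id, column name, field length) for each field containing MARKER."""
    # Assumption: header/metadata lines at the top of the file start with '#'.
    data_lines = (line for line in handle if not line.startswith("#"))
    # The csv defaults match the file format: comma separator, double-quote quoting.
    reader = csv.DictReader(data_lines)
    for row in reader:
        for col in columns:
            value = row.get(col) or ""
            if MARKER in value:
                yield row.get(id_column), col, len(value)


# Tiny made-up sample standing in for the real annotation file.
sample = (
    "#%made-up-header=1\n"
    '"transcript_cluster_id","gene_assignment","mrna_assignment"\n'
    '"1","NM_0001 // GENE1","NM_0001 // RefSeq"\n'
    '"2","NM_0002 // GENE2 [WARNING: THIS FIELD TRUNCATED]","NM_0002 // RefSeq"\n'
)
for hit in find_truncated(io.StringIO(sample)):
    print(hit)
```

On the real file, open it with open(path, newline="") and pass the handle to find_truncated instead of the in-memory sample; collecting the reported lengths shows whether every affected field really has the same size.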
Some background (can be safely skipped - TLDR): The NetAffx transcript annotation file for HuGene-2_0-st is (like other NetAffx annotation files) CSV formatted, using commas as separators and double quotes as quote characters. Each line holds 18 columns, among them the aforementioned gene_assignment and mrna_assignment columns. Both of these columns hold structured annotation (Affymetrix refers to it as multipart) and can hold multiple annotation entries. ⍽///⍽ is used to separate multiple annotation entries and ⍽//⍽ is used as the separator within an annotation entry. The documentation provides the following descriptions for the two columns:
gene_assignment: >>Gene information for each assigned mRNA for mRNAs that corresponds to known genes.<<
mrna_assignment: >>Description of the public mRNAs that should be detected by the sets within this transcript cluster based on sequence alignment.<<
Consequently, for each annotation entry in the gene_assignment column a corresponding entry should be available in the mrna_assignment column, but because some mrna_assignment values have been truncated as well, this does not hold true. Maybe I am being overly pedantic here. However, it is not the violation of this basic assumption that bugs me, but the remote chance that a gene assignment could be missing, and that for certain transcripts listed in the gene_assignment column detailed information such as assignment score and coverage is not available.
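To make the concern concrete, here is a rough Python sketch of how the multipart fields can be split and how gene_assignment entries without a matching mrna_assignment entry can be listed. It assumes that the first sub-field of an entry in both columns is the mRNA accession and that an empty field is written as ---; both should be checked against the actual file. The helper names and the example values are made up.

```python
MARKER = "[WARNING: THIS FIELD TRUNCATED]"


def split_multipart(value):
    """Split a multipart field into entries, and each entry into its sub-fields."""
    value = value.replace(MARKER, "")
    if value.strip() in ("", "---"):  # assumption: '---' denotes an empty field
        return []
    entries = []
    for entry in value.split(" /// "):      # ' /// ' separates annotation entries
        if entry.strip():
            # ' // ' separates the sub-fields within one entry
            entries.append([part.strip() for part in entry.split(" // ")])
    return entries


def accessions_without_mrna_details(gene_field, mrna_field):
    """Return accessions that occur in gene_assignment but not in mrna_assignment."""
    # Assumption: the first sub-field of every entry is the mRNA accession.
    gene_accessions = {entry[0] for entry in split_multipart(gene_field)}
    mrna_accessions = {entry[0] for entry in split_multipart(mrna_field)}
    return sorted(gene_accessions - mrna_accessions)


# Made-up values: the mrna_assignment field was cut off after its first entry.
gene = "NM_0001 // GENE1 // gene one /// NM_0002 // GENE2 // gene two"
mrna = "NM_0001 // RefSeq // some description [WARNING: THIS FIELD TRUNCATED]"
print(accessions_without_mrna_details(gene, mrna))
```

Note that the last entry of a truncated field may itself be cut off in the middle, so an accession can be present while its score and coverage sub-fields are missing; the sketch does not detect that case.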