9.6 years ago
biorepine
Dear Biostars,
I have a set of noncoding transcript genomic coordinates in BED format, but without strand information. I need to calculate the enrichment of Sox2 and Oct4 binding sites at the promoters of these transcripts. Without strand information, how could I do this analysis? Any ideas?
Thanks in advance.
This lincRNA set came from unstranded published RNA-seq data ( http://www.ncbi.nlm.nih.gov/pubmed/22403033 ).
I don't understand what you are suggesting (do you mean taking the transcript end that is highly enriched for a given transcription factor as the promoter?).
No, I'm suggesting that you take both ends if you aren't sure which one is actually the promoter. If you first look for enrichment and then test for enrichment using only the end(s) that showed it, the analysis becomes circular and you're quite likely to bias things. The safest bet is to just assume either end of the transcript could be the beginning.
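As a rough sketch of what "take both ends" could look like, assuming a tab-separated BED file with at least three columns: the function name `candidate_promoters`, the `_left`/`_right` suffixes, and the 1 kb `flank` default are all made up for illustration, so pick a window size that suits your own definition of a promoter. The right-hand window is not clipped to the chromosome length, because that isn't available from the BED file alone.

```python
def candidate_promoters(bed_lines, flank=1000):
    """Yield a window around BOTH ends of each unstranded BED interval.

    BED coordinates are 0-based and half-open. Each interval produces two
    candidate promoter windows, since either end could be the real 5' end.
    """
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and the usual BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        # Use the name column if present, otherwise build one from the coordinates.
        name = fields[3] if len(fields) > 3 else f"{chrom}:{start}-{end}"
        # Window centred on the left end (promoter if the transcript is on '+').
        yield (chrom, max(0, start - flank), start + flank, name + "_left")
        # Window centred on the right end (promoter if the transcript is on '-').
        yield (chrom, max(0, end - flank), end + flank, name + "_right")


if __name__ == "__main__":
    example = ["chr1\t5000\t9000\tlinc1"]
    for window in candidate_promoters(example):
        print("\t".join(str(x) for x in window))
```

The resulting windows can then be intersected with the Sox2 and Oct4 peaks. Every transcript contributes two windows, so the background set used for the enrichment test should be built the same way.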
OK, I get it. Thanks. Do you think this is the best solution for this problem?
The only other solution that I can think of is to align those lincRNAs against another organism where they've been better annotated. Alternatively, perhaps you can find some directional public datasets and determine things that way.