Question

How do I identify and differentiate between unidirectional and bidirectional promoters?

8.2 years ago

cbio ▴ 450

I have a set of genes with a protein of interest bound at the TSS. I would like to separate these genes into two classes: genes with a unidirectional promoter and genes with a bidirectional promoter.

I have access to paired-end GRO-seq data, but no RNA-seq data. Is there a way to do this?

ChIP-Seq GRO-seq next-gen bidirectional promoters • 2.5k views

ADD COMMENT • link updated 8.1 years ago by ivivek_ngs ★ 5.2k • written 8.2 years ago by cbio ▴ 450

Technically, do you want to find the 5' reads that go in opposite directions and either overlap or lie within a certain distance of each other (say, 400 bp), like enhancer RNAs, which are transcribed bidirectionally?

ADD REPLY • link 8.1 years ago by GouthamAtla 12k

Yes, this is what I'd like to do. I had previously thought I could simply use bedtools to look for overlapping regions in the GRO-seq plus- and minus-strand coverage bedGraphs within 1 kb of annotated TSSs, but this did not work.

ADD REPLY • link 8.1 years ago by cbio ▴ 450

Do you have separate files for the 5' reads? When you say paired-end data, do you know which reads originate from the 5' end of a transcript?

ADD REPLY • link 8.1 years ago by GouthamAtla 12k

I do not have a separate file for these. What I currently have is a bedtools genomecov bedGraph of the entire read coverage (not restricted to 5' ends with the -5 option), which I generated using:

genomeCoverageBed -bg -strand + -ibam $infile -g $genome > $outdir/genomecoveragebed/$outfile3

genomeCoverageBed -bg -strand - -ibam $infile -g $genome | awk -F '\t' -v OFS='\t' '{ $4 = - $4 ; print $0 }' > $outdir/genomecoveragebed/$outfile4

I'm very new to GRO-seq, and the data wasn't generated by my lab, so getting information about its generation has been difficult at best.

ADD REPLY • link 8.1 years ago by cbio ▴ 450

If you have paired-end data, you need some way to separate the reads that originate from the 5' end. Otherwise you will not be able to pinpoint bidirectional transcripts. Anyway, if you would like to check which forward-strand regions lie close to reverse-strand regions, you could use closestBed:

closestBed -a Fw_strand.bed -b Rv_strand.bed -d | awk -v OFS="\t" '{ if ($NF<=400) print $1, $2, $3}' | sort -k1,1 -k2,2n | uniq | wc -l

But this won't be exclusive to bidirectional transcripts. In fact, it is not meaningful at all because paired-end reads generally map in fr or rf orientation, so you will definitely end up with many regions that are close to each other on the forward and reverse strands.

Ask the people who generated the data whether they can tell you how to separate the reads that originate from 5' ends. Then I can tell you how to get bidirectional transcripts.
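If the data producers confirm which reads mark the 5' end of the nascent transcript, collapsing those reads to their 5' base could look roughly like the sketch below. It assumes the reads have already been converted to BED6 with the transcript strand in column 6; the function name five_prime_ends and the example coordinates are made up for illustration. Which mate carries the 5' end depends on the library protocol, so that part is not something this sketch can decide.

```shell
#!/bin/sh
# Collapse BED6 read intervals (read from stdin) to the single base at
# their five-prime end.
# Assumption: column 6 holds the strand of the nascent transcript.
five_prime_ends() {
    awk -F '\t' -v OFS='\t' '
        # Plus strand: the five-prime end is the leftmost base.
        $6 == "+" { print $1, $2, $2 + 1, $4, $5, $6 }
        # Minus strand: the five-prime end is the rightmost base.
        $6 == "-" { print $1, $3 - 1, $3, $4, $5, $6 }
    '
}

# Made-up example: one plus-strand and one minus-strand read.
printf 'chr1\t100\t150\tr1\t0\t+\nchr1\t300\t350\tr2\t0\t-\n' | five_prime_ends
```

bedtools genomecov also has a -5 option that computes coverage of 5' positions only, which may be enough if the BAM already contains just the informative mate.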

ADD REPLY • link 8.1 years ago by GouthamAtla 12k

Ram · Answer 1 · 2016-02-29

I believe that when you extract the list of genes from your data you have the strand information, right? Then you will know which genes lie on which strand, + or -, which gives you a strand-specific feature. You can then grep your output based on that strand feature.

This will give you two lists of promoters, one with + and one with - strandedness. Once you have these, you can overlap the two lists to find bidirectional genes, since entries that overlap at RefSeq IDs or gene symbols should be shared between both strands. I believe this will help.
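A positional variant of this idea, rather than the ID overlap described above, is to look for head-to-head gene pairs directly in the annotation: a minus-strand gene whose TSS sits a short distance upstream of a plus-strand TSS. The sketch below is only an illustration under assumptions not stated in the thread: a BED6 annotation with one line per gene, a 1000 bp cutoff, and a hypothetical function name divergent_pairs.

```shell
#!/bin/sh
# List head-to-head (divergent) gene pairs from a BED6 annotation on stdin.
# Assumptions: one BED6 line per gene; $1 is the maximum TSS-to-TSS
# distance in bp (1000 if omitted).
# Output columns: chrom, minus-strand gene, plus-strand gene, distance.
divergent_pairs() {
    max_dist=${1:-1000}
    # Reduce each gene to its TSS base: start for "+", end - 1 for "-".
    awk -F '\t' -v OFS='\t' '
        $6 == "+" { print $1, $2, $4, $6 }
        $6 == "-" { print $1, $3 - 1, $4, $6 }
    ' |
    sort -k1,1 -k2,2n |
    awk -F '\t' -v OFS='\t' -v max="$max_dist" '
        # Forget stored minus-strand TSSs when a new chromosome starts.
        $1 != chrom { chrom = $1; n = 0; head = 1 }
        # Remember every minus-strand TSS seen so far on this chromosome.
        $4 == "-" { n++; pos[n] = $2; name[n] = $3; next }
        {
            # Drop minus-strand TSSs too far upstream of this plus-strand TSS.
            while (head <= n && $2 - pos[head] > max) head++
            for (i = head; i <= n; i++)
                print $1, name[i], $3, $2 - pos[i]
        }
    '
}

# Made-up example: geneB (-) and geneA (+) are head-to-head, 301 bp apart.
printf 'chr1\t1000\t2000\tgeneA\t0\t+\nchr1\t200\t700\tgeneB\t0\t-\nchr1\t5000\t6000\tgeneC\t0\t+\nchr2\t100\t500\tgeneD\t0\t-\n' | divergent_pairs 1000
```

Genes that never appear in the output would fall into the unidirectional class by this annotation-only criterion. It says nothing about unannotated divergent transcription, which is where the GRO-seq signal would still be needed.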