Target genes of position-specific transcription factors
5.5 years ago

I have a list of transcription factors with the following information:

chrom  position  rs_ids       cosmic_id     motif_id  motif_alt_id  matched_sequence
chr3   71435739  rs201008547  COSN16951339  MA0528.1  ZNF263        GAGGGAGGAAGGGACGGAGGG

I want to know which target genes are regulated by these transcription factors. Can anyone please suggest how I can do this, and what kind of tools I should use?

I have also tried previously described tools and online browsers, but they return thousands of target genes for a single transcription factor, whereas I want target genes related to the specific binding position.

Thanks in advance!!!!!


"I have also tried previously described tools and online browsers, but they return thousands of target genes for a single transcription factor, whereas I want target genes related to the specific binding position."

Some transcription factors do, literally, have many thousands of targets. Look up oestrogen ('estrogen' in US English) receptor α (alpha) and Myc, for example. Keep in mind that a transcription factor doesn't 'know' what its targets are: it binds wherever binding is energetically favourable, which is determined by the fit between DNA sequence motifs and the DNA-binding domain of the transcription factor. Where binding is sufficiently strong, it may exert its effects; where binding is weak, the effect may be weaker or non-existent. The target regions also have to be accessible for binding to occur: different regions of chromatin are 'open' (accessible) in different tissues. Accessibility can be gauged by ATAC-seq.

Using the programs that you have already tried, you should be able to order the targets by some sort of score and/or decide whether tissue-specific differences may be at play.
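As a rough illustration of ordering targets by score, here is a minimal sketch. The three-column layout (TF, gene, score), the file names and the values are all made up for the example; the real column holding the score depends on whichever tool produced the list.

```shell
# Toy target list (hypothetical values): TF, target gene, score
{
  printf 'ZNF263\tgeneA\t3.1\n'
  printf 'ZNF263\tgeneB\t8.0\n'
  printf 'ZNF263\tgeneC\t12.5\n'
} > toy.targets.tsv

# Sort on the score column (3rd field), highest first, and keep the top 2
TAB="$(printf '\t')"
LC_ALL=C sort -t "$TAB" -k3,3nr toy.targets.tsv | head -n 2 > toy.top_targets.tsv

cat toy.top_targets.tsv
```

If the tool reports p-values in scientific notation instead of scores, a plain numeric sort will not order them correctly, so the sort key would need adapting.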

5.5 years ago

It's a bit of work, maybe, but perhaps the following set operations could guide some investigation.

Your example TF is MA0528.1, which is a JASPAR identifier.

For your genome of interest, you could run that genome's sequence through FIMO to call binding sites of JASPAR TF models at some p-value threshold, say 1e-4 or 1e-5. Say the resulting hits, converted to sorted BED, are in a file called tbfs.jaspar.1e-5.bed.

Given a set of whole-genome binding sites, you can then filter that set using the proximal promoters of all genes of interest (genes.bed). These could be GENCODE gene annotations in GFF format, converted to BED via gff2bed or a similar approach.

Proximal promoters could be defined as a 1kb region upstream of the gene's TSS:

$ bedops -u <( awk '($6=="+")' genes.bed | bedops --range -1000:0 --everything - ) <( awk '($6=="-")' genes.bed | bedops --range 0:1000 --everything - ) > promoters.bed
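Note that padding with --range extends the whole gene interval, so the result covers the gene body plus 1 kb upstream. If only the upstream window is wanted, one option is to build it directly from the TSS. The following is a minimal awk-only sketch, assuming a six-column BED with strand in column 6; the toy file names and coordinates are made up for illustration.

```shell
# Toy genes file (hypothetical coordinates): chrom, start, end, id, score, strand
{
  printf 'chr1\t5000\t9000\tgeneA\t.\t+\n'
  printf 'chr1\t20000\t26000\tgeneB\t.\t-\n'
  printf 'chr2\t300\t900\tgeneC\t.\t+\n'
} > toy.genes.bed

# For "+" genes the TSS is the BED start; for "-" genes it is the BED end.
# Emit only the 1 kb upstream of the TSS, clamping the start at 0.
awk -F'\t' -v OFS='\t' -v W=1000 '
  $6 == "+" { s = $2 - W; if (s < 0) s = 0; print $1, s, $2, $4, $5, $6 }
  $6 == "-" { print $1, $3, $3 + W, $4, $5, $6 }
' toy.genes.bed | LC_ALL=C sort -k1,1 -k2,2n > toy.promoters.bed

cat toy.promoters.bed
```

The final sort keeps the output ordered by chromosome and start position, which set-operation tools generally expect.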

Then filter the whole-genome TFBS set:

$ bedops --element-of 1 tbfs.jaspar.1e-5.bed promoters.bed > tbfs.jaspar.1e-5.subset.bed

Then grep this subset for MA0528.1:

$ grep -F MA0528.1 tbfs.jaspar.1e-5.subset.bed > MA0528.1.hits.bed

and map these hits back to the genes:

$ bedmap --range 1000 --echo --skip-unmapped genes.bed MA0528.1.hits.bed > answer.bed
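To make the position-specific idea concrete, here is a small awk-only sketch of the same kind of mapping: it reports genes whose interval, padded by 1 kb on each side, overlaps a motif hit. The hit position is the one from the question; the gene names and gene coordinates are invented for illustration, and the padding width W is an assumption to adjust.

```shell
# Toy inputs: gene coordinates are hypothetical, the hit is the site from the question
{
  printf 'chr3\t71435000\t71440000\tgeneX\t.\t+\n'
  printf 'chr3\t71500000\t71510000\tgeneY\t.\t-\n'
} > toy.map.genes.bed
printf 'chr3\t71435739\t71435740\tMA0528.1\t.\t+\n' > toy.map.hits.bed

# First file (hits) is loaded into arrays; each gene from the second file is
# padded by W bp on both sides and tested for overlap with every hit.
awk -F'\t' -v OFS='\t' -v W=1000 '
  NR == FNR { hc[NR] = $1; hs[NR] = $2; he[NR] = $3; hid[NR] = $4; n = NR; next }
  {
    lo = $2 - W; hi = $3 + W
    for (i = 1; i <= n; i++)
      if (hc[i] == $1 && hs[i] < hi && he[i] > lo)
        print $4, hid[i], hc[i], hs[i], he[i]
  }
' toy.map.hits.bed toy.map.genes.bed > toy.gene_hits.tsv

cat toy.gene_hits.tsv
```

This loops over every hit for every gene, which is fine for a handful of sites but slow genome-wide; for large inputs the sorted-input tools are the better fit.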

You might add overlaps with TF-specific ChIP-seq data as experimental evidence that the TFs of interest actually bind the gene promoters derived from answer.bed.


Alex Reynolds, thanks for your reply. I downloaded the file "gencode.v29.chr_patch_hapl_scaff.annotation.gff3" from GENCODE and converted it into BED format:

chr1    1320455 1320529 exon:ENST00000435064.5:3        .       -       HAVANA  exon    .       ID=exon:ENST00000435064.5:3;Parent=ENST00000435064.5;gene_id=ENSG00000127054.20;transcript_id=ENST00000435064.5;gene_type=protein_coding;gene_name=INTS11;transcript_type=protein_coding;transcript_name=INTS11-208;exon_number=3;exon_id=ENSE00003666435.1;level=2;protein_id=ENSP00000413493.1;transcript_support_level=1;tag=basic,appris_principal_1,CCDS;ccdsid=CCDS21.1;havana_gene=OTTHUMG00000003330.13;havana_transcript=OTTHUMT00000009360.2

When I run the bedops command:

bedops -u <( awk ($6="+") gencode.v29.chr_patch_hapl_scaff.annotation.bed | bedops --range -1000:0 - ) <( awk ($6="-") gencode.v29.chr_patch_hapl_scaff.annotation.bed | bedops --range 0:1000 - ) > promoters.bed

it gives me the following error:

-bash: command substitution: line 15: syntax error near unexpected token `$6="+"'
-bash: command substitution: line 15: ` awk ($6="+") gencode.v29.chr_patch_hapl_scaff.annotation.bed | bedops --range -1000:0 - )'

Could you check whether there is a problem with the BED format?


Sorry, try adding single quotes around the awk condition (the comparison should also be ==, not =):

bedops -u <( awk '($6=="+")' gencode.v29.chr_patch_hapl_scaff.annotation.bed | ... )

It gives an error again:

bedops -u <( awk '($6="+")' gencode.v29.chr_patch_hapl_scaff.annotation.bed | bedops --range -1000:0 - ) <( awk '($6="-")' gencode.v29.chr_patch_hapl_scaff.annotation.bed | bedops --range 0:1000 - ) > promotor.bed
    May use bedops --help for more help.

    Error: Bad Input
    No operation argument given.
    May use bedops --help for more help.

    Error: Bad Input
    No operation argument given.
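
For what it's worth, "No operation argument given" is consistent with the inner bedops calls having no set operation: --range only modifies coordinates, so it needs to be paired with an operation such as --everything. Separately, $6="+" in awk is an assignment, not a comparison, so it overwrites the strand column on every row; == is the comparison. The converted GENCODE file also holds exon, transcript and other records alongside genes, so presumably only gene rows should feed the promoter step. Below is a minimal sketch of that filtering, assuming the gff2bed layout seen in the exon line quoted above (feature type in column 8), with made-up toy records and file names.

```shell
# Toy stand-in for the gff2bed output (hypothetical records, 10 tab-separated columns)
{
  printf 'chr1\t1000\t9000\tENSG_TOY1\t.\t+\tHAVANA\tgene\t.\tgene_name=TOY1\n'
  printf 'chr1\t1000\t1200\texon:TOY1:1\t.\t+\tHAVANA\texon\t.\tgene_name=TOY1\n'
  printf 'chr1\t20000\t26000\tENSG_TOY2\t.\t-\tHAVANA\tgene\t.\tgene_name=TOY2\n'
} > toy.annotation.bed

# Keep gene rows only; "==" compares, whereas "=" would overwrite the column
awk -F'\t' '$8 == "gene"' toy.annotation.bed > toy.gene_records.bed

# Split by strand with a real comparison; a bare pattern prints rows unchanged
awk -F'\t' '$6 == "+"' toy.gene_records.bed > toy.plus.bed
awk -F'\t' '$6 == "-"' toy.gene_records.bed > toy.minus.bed

wc -l toy.gene_records.bed toy.plus.bed toy.minus.bed
```

With BEDOPS installed, each strand file could then be padded with a call of the form bedops --range -1000:0 --everything toy.plus.bed (and --range 0:1000 for the minus strand).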