Separating gene list into chromatin domains and analysing separately within each domain
20 months ago
biostart • 290
Germany

Hello,

I have gene expression data from RNA-seq and want to group genes by the TAD they belong to. Say I have gene coordinates together with expression values in one file and TAD coordinates in another. I want to intersect the two files and produce a new gene file with an extra column giving the number of the TAD each gene belongs to.

The next step is to compare gene expression inside and outside each TAD.

Is there an existing solution for this?

Thanks!

Have you tried bedtools intersect?

    $ bedtools intersect -a <genes> -b <tads> -loj > genes_at_tads.bed

Yes, I actually ended up sorting both files and then running intersectBed with the -wo option, which is equivalent to what you proposed. This, however, does not label TADs with numbers (1, 2, 3, etc.), so any downstream analysis requires an additional step of reading the TAD coordinates and comparing them. Does that mean there is no ready-made solution for comparing gene expression inside and outside each TAD, and that it has to be written manually?

Can you show how your TADs are saved? The intersect command prints, for each gene, the TAD it overlaps, including the ID of the TAD. I assumed that each of your TADs had an ID. You can add a number to each TAD as follows:

    perl -lane '$count++; $,="\t",$F[3]=$count; print @F' TADS.bed > TADS_with_number.bed

Note that I assume you already have the TADs as a .bed file in which the 4th column corresponds to the ID.

Are you familiar with any particular programming language, such as R?

The question is whether a solution already exists, so as not to reinvent it. The task seems to be quite common. Any language would be fine: Perl, etc.

GenomicRanges in Bioconductor supports this type of operation in all its simplicity or complexity (you would roll your own solution).

18 months ago
Seattle, WA USA

If your TADs are in a BED file whose ID column contains a unique label (such as a unique number or other string that acts as a unique identifier), then you can use BEDOPS bedmap with --echo-map-id-uniq to get a unique list of IDs of mapping TADs. For example:

    $ bedmap --echo --echo-map-id-uniq --delim '\t' genes.bed TADS.bed > answer.bed


The first columns of answer.bed contain each gene from genes.bed. The last column contains a semicolon-delimited list of unique TAD IDs for the TADs that overlap the gene by one or more bases (when there are overlaps).
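To make that layout concrete, here is a minimal Python sketch of reading such lines. It assumes that the last tab-separated field holds the TAD IDs, that the field is empty when nothing overlaps, and that genes.bed has five columns (chromosome, start, end, gene ID, expression). The function name `parse_bedmap_line` and the example lines are made up for illustration and are not taken from a real answer.bed.

```python
def parse_bedmap_line(line):
    """Split one answer.bed line into the gene's own fields and its TAD IDs.

    Assumption: the last tab-separated field is the semicolon-delimited
    TAD ID list written by --echo-map-id-uniq (empty if no TAD overlaps).
    """
    fields = line.rstrip("\n").split("\t")
    gene_fields, tad_field = fields[:-1], fields[-1]
    # Drop empty strings so a gene without any TAD gets an empty list.
    tad_ids = [tad for tad in tad_field.split(";") if tad]
    return gene_fields, tad_ids


# Hypothetical lines: chrom, start, end, gene ID, expression, TAD IDs.
example_lines = [
    "chr1\t100\t500\tgeneA\t12.5\t1\n",
    "chr1\t900\t2100\tgeneB\t3.0\t1;2\n",
    "chr2\t50\t80\tgeneC\t7.0\t\n",
]

parsed = [parse_bedmap_line(line) for line in example_lines]
for gene_fields, tad_ids in parsed:
    print(gene_fields[3], tad_ids if tad_ids else "no TAD")
```

A gene that spans a TAD boundary (geneB here) ends up with more than one ID, so downstream code has to decide whether to count it in both TADs or drop it.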

From here, it should be a simple matter to do set operations on genes which do and do not have associations with TAD IDs, and then do the respective signal analysis on subsets.
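As a rough sketch of those set operations, the Python below compares, for each TAD, the median expression of genes overlapping it with the median of all remaining genes. The record layout (gene ID, expression value, list of TAD IDs), the function name `inside_vs_outside` and the numbers are assumptions made for illustration. The median is only a placeholder summary; a proper statistical test would be a separate choice.

```python
from statistics import median


def inside_vs_outside(records):
    """Return {tad_id: (median inside, median outside)}.

    records: iterable of (gene_id, expression, tad_ids) tuples, where
    tad_ids is a list of TAD IDs overlapping the gene (possibly empty).
    "Outside" here means every gene that does not overlap the given TAD.
    """
    records = list(records)
    all_tads = sorted({tad for _, _, tad_ids in records for tad in tad_ids})
    summary = {}
    for tad in all_tads:
        inside = [expr for _, expr, tad_ids in records if tad in tad_ids]
        outside = [expr for _, expr, tad_ids in records if tad not in tad_ids]
        summary[tad] = (median(inside), median(outside) if outside else None)
    return summary


# Hypothetical records; in practice these would come from answer.bed.
records = [
    ("geneA", 12.5, ["1"]),
    ("geneB", 3.0, ["1", "2"]),
    ("geneC", 7.0, []),
    ("geneD", 9.0, ["2"]),
]

summary = inside_vs_outside(records)
for tad, (inside_median, outside_median) in summary.items():
    print(f"TAD {tad}: inside={inside_median} outside={outside_median}")
```

Genes overlapping several TADs are counted inside each of them in this sketch, which may or may not be what a given analysis wants.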