FIMO output to bed
15 months ago
rbronste • 240

I'm trying to figure out a good way to convert FIMO output files, such as the GFF file, to a standard BED4-6 file for use elsewhere.

I was doing the following, but it seems to generate an unusually structured BED file:

sortBed -i fimo.gff | gff2bed > fimo.bed

After answering, I found you've already asked the question

FIMO GFF output to standard BED


My apologies, I neglected to search back; however, your response adds to my understanding of how to do this, so thanks.

18 months ago
venu 6.2k
Germany

A typical FIMO output looks like the following (NOTE: this is from version 4.11.4; other versions may differ slightly):

#pattern name   sequence name   start   stop    strand  score   p-value q-value matched sequence
Homeodomain.UP00163_1   CLYBL|chr13|100529567|100530067 231     247     -       11.982  4.03e-05                TTCTTTAATTAATACAA
Homeodomain.UP00163_1   CLYBL|chr13|100529567|100530067 232     248     +       14.036  9.32e-06                TGTATTAATTAAAGAAT
Homeodomain.UP00163_1   CLYBL|chr13|100450472|100450972 247     263     +       10.5405 9.06e-05                TAACCTAATTAGATTCT

You can try the following to get a BED file

cat fimo_result.txt | grep -v pattern | cut -f2 | tr '|' '\t' | cut -f2-4 | sort -k1,1V -k2,2n > fimo_to_bed.bed

output:

chr1    202997705       202998205
chr2    207998147       207998647
chr3    140986720       140987220
chr13   100450472       100450972
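Note that the command above reports the coordinates of the scanned sequence windows, not of the individual motif hits. If per-hit genomic intervals are wanted, a minimal sketch with awk could look like the following. It assumes the tab-separated layout shown above, sequence names of the form gene|chrom|start|end, and that the start embedded in the name is 0-based (BED-style); FIMO start/stop are treated as 1-based and inclusive, hence the -1 on the BED start. The function name fimo_txt_to_bed6 is only illustrative, and the raw FIMO score is copied into column 5 as-is, which strict BED parsers may not accept.

```shell
# Sketch: old-style FIMO text output -> BED6 with per-hit genomic coordinates.
# Assumptions: tab-separated input; sequence name looks like GENE|chrom|start|end;
# the window start in the name is 0-based; FIMO start/stop are 1-based inclusive.
fimo_txt_to_bed6() {
  awk -F'\t' 'BEGIN { OFS = "\t" }
    /^#/ || NF < 6 { next }          # skip the header line and short/blank lines
    {
      split($2, loc, "[|]")          # loc[2] = chrom, loc[3] = window start
      # BED start = window start + (hit start - 1); BED end = window start + hit stop
      print loc[2], loc[3] + $3 - 1, loc[3] + $4, $1, $6, $5
    }' "$1" | LC_ALL=C sort -k1,1 -k2,2n
}
```

Usage would be something like `fimo_txt_to_bed6 fimo_result.txt > fimo_hits.bed`. With GNU sort, `-k1,1V` can be swapped in to get the natural chromosome order used above.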

Unless I'm mistaken, this describes the txt file, not the GFF file.

The GFF file needs parsing to get the p- and q-values out.

Why don't you just use the fimo.txt file?

Just cut the first 2 columns from it. The txt file contains all the information in the GFF file, but is also filtered by the p/q-value you selected when running FIMO.
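For anyone who does need to start from the GFF file, the parsing mentioned above can be sketched roughly as follows. This relies only on the generic GFF column layout (seqname, source, feature, start, end, score, strand, frame, attributes) with 1-based inclusive coordinates. It assumes the attribute column holds GFF3-style key=value pairs separated by ";" and that a Name= key is present; both are assumptions about the file, and gff_to_bed6 is just an illustrative name.

```shell
# Sketch: generic GFF -> BED6.
# Assumptions: tab-separated GFF with 9 columns; attributes are key=value pairs
# separated by ";"; the Name= attribute (if any) is used as the BED name.
gff_to_bed6() {
  awk -F'\t' 'BEGIN { OFS = "\t" }
    /^#/ || NF < 9 { next }                    # skip directives, comments, short lines
    {
      name = "."                               # placeholder when no Name= is found
      n = split($9, attrs, ";")
      for (i = 1; i <= n; i++) {
        if (attrs[i] ~ /^Name=/) { name = substr(attrs[i], 6) }
      }
      # GFF start is 1-based, BED start is 0-based; the end stays the same
      print $1, $4 - 1, $5, name, $6, $7
    }' "$1" | LC_ALL=C sort -k1,1 -k2,2n
}
```

Run it as `gff_to_bed6 fimo.gff > fimo.bed`. The intervals come out relative to whatever sequence names the GFF uses, so if those are extracted windows rather than whole chromosomes, the window offset still has to be added.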


Your command just pulls out the factor name instead of the interval. It seems the current FIMO output format is a little different. Can you suggest how to adjust it? Thanks!

 # motif_id motif_alt_id    sequence_name   start   stop    strand  score   p-value q-value matched_sequence
MA0032.2    FOXC1   chr5:29248245-29248604  202 212 +   15.44   4.51e-07    0.18    tatgtaaacat
MA0032.2    FOXC1   chr18:47026418-47027081 24  34  +   15.3    9.99e-07    0.18    TATGTAAATAT
MA0032.2    FOXC1   chr11:17198451-17199123 170 180 +   15.3    9.99e-07    0.18    tatgtaaatat
MA0032.2    FOXC1   chr15:17197994-17198595 230 240 -   15.3    9.99e-07    0.18    TATGTAAATAT
MA0032.2    FOXC1   chr2:24838945-24839358  294 304 -   15.3    9.99e-07    0.18    TATGTAAATAT

Try this

cat fimo_result.txt | grep -v motif_alt_id | cut -f3 | tr ':-' '\t' | sort -k1,1V -k2,2n > fimo_to_bed.bed
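The one-liner above yields the scanned windows (chrom, start, end). If the motif hits themselves are wanted as BED6, a rough awk sketch for the newer column layout is given below. It assumes tab-separated input, sequence names of the form chrom:start-end with a 0-based start (as bedtools getfasta names them by default), chromosome names that contain no ":" or "-", and FIMO start/stop being 1-based inclusive. The function name fimo_tsv_to_bed6 is made up for illustration.

```shell
# Sketch: newer FIMO output (motif_id, motif_alt_id, sequence_name, ...) -> BED6.
# Assumptions: tab-separated; sequence_name looks like chrom:start-end with a
# 0-based window start; hit start/stop are 1-based inclusive within the window.
fimo_tsv_to_bed6() {
  awk -F'\t' 'BEGIN { OFS = "\t" }
    /^[ \t]*#/ || $1 == "motif_id" || NF < 10 { next }   # header, comments, blanks
    {
      split($3, loc, /[:-]/)                 # loc[1] = chrom, loc[2] = window start
      name = ($2 != "" ? $2 : $1)            # prefer the readable motif name
      print loc[1], loc[2] + $4 - 1, loc[2] + $5, name, $7, $6
    }' "$1" | LC_ALL=C sort -k1,1 -k2,2n
}
```

For example, `fimo_tsv_to_bed6 fimo_result.txt > fimo_hits.bed`. Column 5 carries the raw FIMO score; swap `$7` for `$8` or `$9` to carry the p- or q-value instead.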