FIMO output to bed
6.2 years ago
rbronste ▴ 420

I'm trying to figure out a good way to convert FIMO output files, such as the gff file, to a standard BED4-6 file for use elsewhere.

I was doing the following, but it seems to generate an unusually structured BED file:

sortBed -i fimo.gff | gff2bed > fimo.bed
fimo meme bedops bedtools • 3.1k views

After answering, I found that you had already asked this question:

FIMO GFF output to standard BED


My apologies, I neglected to search back. Your response does add to my understanding of how to do this, though, so thanks.

6.2 years ago
venu 7.1k

A typical FIMO output looks like the following (NOTE: this is from version 4.11.4; it might differ slightly in other versions):

#pattern name   sequence name   start   stop    strand  score   p-value q-value matched sequence
Homeodomain.UP00163_1   CLYBL|chr13|100529567|100530067 231     247     -       11.982  4.03e-05                TTCTTTAATTAATACAA
Homeodomain.UP00163_1   CLYBL|chr13|100529567|100530067 232     248     +       14.036  9.32e-06                TGTATTAATTAAAGAAT
Homeodomain.UP00163_1   CLYBL|chr13|100450472|100450972 247     263     +       10.5405 9.06e-05                TAACCTAATTAGATTCT

You can try the following to get a BED file

cat fimo_result.txt | grep -v pattern | cut -f2 | tr '|' '\t' | cut -f2-4 | sort -k1,1V -k2,2n > fimo_to_bed.bed

output:

chr1    202997705       202998205
chr2    207998147       207998647
chr3    140986720       140987220
chr13   100450472       100450972

Unless I'm mistaken, this describes the txt file, not the gff file.

The gff file needs parsing to get the p- and q-values out.

Why don't you just use the fimo.txt file?

Just cut the first 2 columns from it. The txt file contains all the information in the gff file, and it is also filtered by the p/q value you selected when running FIMO.


Your command just pulls out the factor name instead of the interval. It seems the current FIMO output format is a little different; can you suggest how to adjust it? Thanks!

 # motif_id motif_alt_id    sequence_name   start   stop    strand  score   p-value q-value matched_sequence
MA0032.2    FOXC1   chr5:29248245-29248604  202 212 +   15.44   4.51e-07    0.18    tatgtaaacat
MA0032.2    FOXC1   chr18:47026418-47027081 24  34  +   15.3    9.99e-07    0.18    TATGTAAATAT
MA0032.2    FOXC1   chr11:17198451-17199123 170 180 +   15.3    9.99e-07    0.18    tatgtaaatat
MA0032.2    FOXC1   chr15:17197994-17198595 230 240 -   15.3    9.99e-07    0.18    TATGTAAATAT
MA0032.2    FOXC1   chr2:24838945-24839358  294 304 -   15.3    9.99e-07    0.18    TATGTAAATAT

Try this

cat fimo_result.txt | grep -v motif_alt_id | cut -f3 | tr ':-' '\t' | sort -k1,1V -k2,2n > fimo_to_bed.bed
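
Note that the one-liner above returns the coordinates of the whole input region, not of the motif hit inside it. If you want one BED6 line per hit, something along these lines may work. It is only a sketch under assumptions not confirmed in this thread: that `sequence_name` has the form `chrom:start-end` with a 0-based start (as written by, e.g., `bedtools getfasta`), that FIMO's `start`/`stop` are 1-based and inclusive, and that chromosome names contain no `:` or `-`. The file names and the two sample rows (copied from the post above) are just for illustration.

```shell
# Sample input in the newer FIMO layout: header plus two rows from the post above.
printf '# motif_id\tmotif_alt_id\tsequence_name\tstart\tstop\tstrand\tscore\tp-value\tq-value\tmatched_sequence\n' > fimo_result.txt
printf 'MA0032.2\tFOXC1\tchr5:29248245-29248604\t202\t212\t+\t15.44\t4.51e-07\t0.18\ttatgtaaacat\n' >> fimo_result.txt
printf 'MA0032.2\tFOXC1\tchr2:24838945-24839358\t294\t304\t-\t15.3\t9.99e-07\t0.18\tTATGTAAATAT\n' >> fimo_result.txt

# ASSUMPTION: sequence_name is chrom:start-end with a 0-based region start,
# and FIMO start/stop are 1-based inclusive offsets into that region.
awk 'BEGIN { FS = OFS = "\t" }
     # Skip the header, comment lines and blank lines (column 4 is not a number).
     /^#/ || $4 !~ /^[0-9]+$/ { next }
     {
       split($3, region, /[-:]/)   # region[1] = chrom, region[2] = region start
       # BED6: chrom, 0-based hit start, hit end, name, score, strand
       print region[1], region[2] + $4 - 1, region[2] + $5, $2, $7, $6
     }' fimo_result.txt | sort -k1,1V -k2,2n > fimo_hits.bed

cat fimo_hits.bed
```

The score column here is the raw FIMO score, which is not an integer in the 0-1000 range, so tools that validate BED strictly may complain. If your regions were extracted with 1-based starts, drop the `- 1`.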