Parsing AME tsv file
5.5 years ago
rbronste

I am trying to find a quick and easy way to parse the sequences.tsv file generated by AME (true-positive sequences) and pull out just a 3-column BED. The format looks as follows; any ideas would be awesome, thanks!

    motif_DB  motif_ID  seq_ID                           FASTA_score  PWM_score  class
    Jaspar    MA0004.1  chr5:144788829-144789179_shuf_2  2183         12.7135    fp
    Jaspar    MA0004.1  chr5:112339537-112339887_shuf_1  1713         12.7131    tp
    Jaspar    MA0004.1  chr16:94739915-94740265_shuf_1   1668         12.712     tp
Tags: meme, ame, tsv, bed, motif
5.5 years ago
grep -v '^motif' in.tsv | cut -f 3 | cut -d '_' -f 1 | tr ':-' '\t'
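For comparison, the same conversion can be sketched as a single awk call. This is a minimal sketch, not part of the original answer: `ame_seqs_to_bed` is a hypothetical helper name, and it assumes the file is tab-separated and that every seq_ID has the form `chrom:start-end`, optionally followed by an underscore suffix such as `_shuf_2`. Because only the seq_ID field is split, other columns can be kept later without a `cut -d '_'` step truncating them.

```shell
# Hypothetical helper: convert an AME sequences.tsv (file argument or stdin)
# into a 3-column BED (chrom, start, end).
# Assumes tab-separated columns and seq_IDs shaped like chrom:start-end[_suffix].
ame_seqs_to_bed() {
  awk -F'\t' -v OFS='\t' '
    $1 == "motif_DB" || /^#/ || NF < 6 { next }  # skip header, any comment lines, blank lines
    {
      split($3, id, "_")           # "chr5:1-2_shuf_1" -> id[1] = "chr5:1-2"
      split(id[1], loc, "[:-]")    # "chr5:1-2" -> loc[1]=chr5, loc[2]=1, loc[3]=2
      print loc[1], loc[2], loc[3]
    }
  ' "$@"
}

# Example (assumed file name): ame_seqs_to_bed in.tsv > in.bed
```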

Thanks, very helpful! Is there also a way to include only the true-positive sequences (tp in the final column) in the output BED?

That will be another grep or awk in the command :)
I think you can figure out how to do that?

Maybe a hint? :) I'm not very familiar with awk, though I'm trying to learn.

I would add another grep to keep only the lines containing tp, before the cut.
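The awk route mentioned above could look like the following sketch, which tests the class column directly instead of matching `tp` anywhere on the line (a plain `grep -w tp` would also match `tp` appearing as a word in another column). `ame_tp_to_bed` is a hypothetical name, and the same assumptions apply: tab-separated columns and seq_IDs of the form `chrom:start-end` with an optional underscore suffix.

```shell
# Hypothetical helper: like the grep/cut/tr pipeline, but keeps only rows
# whose class column (column 6) is exactly "tp".
ame_tp_to_bed() {
  awk -F'\t' -v OFS='\t' '
    $6 != "tp" { next }          # drops the header ("class"), fp rows and blank lines
    {
      split($3, id, "_")         # strip any "_shuf_N" suffix from seq_ID
      split(id[1], loc, "[:-]")  # chrom:start-end -> chrom, start, end
      print loc[1], loc[2], loc[3]
    }
  ' "$@"
}

# Example (assumed file name): ame_tp_to_bed sequences.tsv > tp.bed
```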

OK, figured it out. This seems to work for true-positive intervals with a specific motif ID:

grep -v '^motif' sequences.tsv | grep -w tp | grep -w MA0258.2 | cut -f 2,3,6 | cut -d '_' -f 1 | tr ':-' '\t' | head

MA0258.2    chr12   15566967    15567317    tp
MA0258.2    chr11   88155633    88155983    tp
MA0258.2    chr15   51071410    51071760    tp
MA0258.2    chr14   22151488    22151838    tp

Thanks for your help.

Thanks, very helpful!

I used this to get the following:

grep -v '^motif' sequences.tsv | cut -f 2,3,6 | cut -d '_' -f 1 | tr ':-' '\t'

    MA0258.2    chr12   15566967    15567317    tp
    MA0258.2    chr11   88155633    88155983    tp
    MA0258.2    chr15   51071410    51071760    tp
    MA0258.2    chr14   22151488    22151838    tp

However, I can't quite figure out how to select only specific motif_IDs in the .tsv file, and only the tp (true positive) rows for those motif IDs.
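One way to handle several motif IDs at once, sketched under the same assumptions (tab-separated file, seq_IDs of the form `chrom:start-end` with an optional underscore suffix): pass the IDs as a comma-separated list and let awk compare the motif_ID and class columns exactly. Besides avoiding regex surprises (the `.` in `MA0258.2` matches any character in `grep`), this sidesteps a pitfall of the `cut -f 2,3,6 | cut -d '_' -f 1` step: `cut -d '_'` splits the whole line, so a seq_ID with a `_shuf_N` suffix would also drop the class column. The helper name `ame_motifs_tp_to_bed` is hypothetical, and it prints BED4 with the motif ID in the name column rather than the five columns shown above.

```shell
# Hypothetical helper: keep only "tp" rows for a comma-separated list of
# motif IDs and print BED4 (chrom, start, end, motif_ID).
# Usage: ame_motifs_tp_to_bed MA0258.2,MA0004.1 [sequences.tsv]
ame_motifs_tp_to_bed() {
  motif_ids=$1; shift
  awk -F'\t' -v OFS='\t' -v ids="$motif_ids" '
    BEGIN {
      n = split(ids, list, ",")
      for (i = 1; i <= n; i++) keep[list[i]] = 1   # exact-match lookup table
    }
    $6 == "tp" && ($2 in keep) {
      split($3, id, "_")          # strip any "_shuf_N" suffix from seq_ID
      split(id[1], loc, "[:-]")   # chrom:start-end -> chrom, start, end
      print loc[1], loc[2], loc[3], $2
    }
  ' "$@"
}
```

For a single motif, `ame_motifs_tp_to_bed MA0258.2 sequences.tsv > MA0258.2_tp.bed` would write that motif's true-positive intervals as BED4.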
