Reading PWM matrices from a txt file in Python
4.0 years ago
vschultz • 0

Hi!

I'm analyzing data for my work. I'm trying to read frequency (PWM) matrices from a txt file that contains multiple tables, so that I can build a consensus sequence from each one. In short, the file looks like this:

A
1872.00 590.00  3339.00 6805.00 0.00    0.00    6805.00 1917.00
1821.00 5138.00 1992.00 207.00  0.00    0.00    0.00    1391.00
2236.00 246.00  1386.00 192.00  0.00    0.00    0.00    2420.00
877.00  1667.00 87.00   0.00    6805.00 6805.00 0.00    1077.00
B
11369.00    11735.00    3157.00 1226.00 26720.00    29957.00    274.00  29221.00    30645.00    30125.00    13752.00    10200.00
6380.00 2568.00 2096.00 26587.00    3312.00 414.00  391.00  761.00  349.00  595.00  5299.00 6905.00
7434.00 8816.00 24214.00    607.00  184.00  1196.00 386.00  999.00  366.00  502.00  5884.00 5934.00
6843.00 8907.00 2559.00 3606.00 1810.00 459.00  30975.00    1045.00 666.00  804.00  7091.00 8987.00
C
1449.00 688.00  4036.00 8832.00 0.00    96.00   8832.00 2770.00
3929.00 5585.00 2483.00 194.00  0.00    0.00    0.00    2369.00
2290.00 103.00  2197.00 1078.00 0.00    0.00    0.00    2417.00
1164.00 3247.00 116.00  66.00   8832.00 8832.00 0.00    1276.00

To start, I wrote this to read the file:

```python
with open("t.txt") as tx:
    for line in tx:
        # split() with no argument splits on any run of tabs or spaces
        # and also drops the trailing newline
        values = line.split()
        print(values)
```

My plan was to build a matrix from this afterwards. The output prints correctly, but when I try to create a matrix, all of the ~1200 values end up merged into a single matrix. I need every 4 rows (under their label) to form a separate matrix, something like this:

C
1449.00 688.00  4036.00 8832.00 0.00    96.00   8832.00 2770.00
3929.00 5585.00 2483.00 194.00  0.00    0.00    0.00    2369.00
2290.00 103.00  2197.00 1078.00 0.00    0.00    0.00    2417.00
1164.00 3247.00 116.00  66.00   8832.00 8832.00 0.00    1276.00

but instead I get this:

['X']
['0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00', '0.00']
['0.00', '0.00', '0.00', '4.00', '2.00', '10.00', '0.00', '9.00']
['0.00', '0.00', '0.00', '6.00', '8.00', '0.00', '10.00', '1.00']
['10.00', '10.00', '10.00', '0.00', '0.00', '0.00', '0.00', '0.00']

And I can't figure out how :). I tried to access the lines as rows with print(values[0]), but that gave me the first element of every list.

How can I read the matrices correctly, so that I don't just get each row as a separate list and every matrix is kept separate from the others?
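One way to keep each matrix separate is to treat a line with a single non-numeric field as a label that starts a new matrix, and append every numeric row to the matrix currently being filled. The sketch below is a minimal illustration, assuming labels are unique and that every numeric row belongs to the most recent label; `read_count_matrices` and `_is_number` are hypothetical helper names, and the sample data is the first four columns of matrices A and C above.

```python
from typing import Dict, Iterable, List, Optional


def _is_number(text: str) -> bool:
    """Return True if text parses as a float (e.g. "1872.00")."""
    try:
        float(text)
    except ValueError:
        return False
    return True


def read_count_matrices(lines: Iterable[str]) -> Dict[str, List[List[float]]]:
    """Group labelled count matrices into {label: list of rows}.

    Assumes each matrix starts with a line holding one non-numeric label
    (e.g. "A", "B", "C") followed by whitespace-separated numeric rows,
    and that labels are unique (a repeated label replaces the earlier one).
    """
    matrices: Dict[str, List[List[float]]] = {}
    current: Optional[List[List[float]]] = None  # matrix being filled
    for line in lines:
        values = line.split()  # tabs or spaces; also drops the newline
        if not values:
            continue  # skip blank lines
        if len(values) == 1 and not _is_number(values[0]):
            current = []  # a lone label starts a new matrix
            matrices[values[0]] = current
        elif current is None:
            raise ValueError("numeric row found before any label line")
        else:
            current.append([float(v) for v in values])
    return matrices


# First four columns of matrices A and C from the question.
SAMPLE = """A
1872.00 590.00 3339.00 6805.00
1821.00 5138.00 1992.00 207.00
2236.00 246.00 1386.00 192.00
877.00 1667.00 87.00 0.00
C
1449.00 688.00 4036.00 8832.00
3929.00 5585.00 2483.00 194.00
2290.00 103.00 2197.00 1078.00
1164.00 3247.00 116.00 66.00
"""

matrices = read_count_matrices(SAMPLE.splitlines())
print(sorted(matrices))                   # ['A', 'C']
print(matrices["C"][0])                   # [1449.0, 688.0, 4036.0, 8832.0]
print([row[0] for row in matrices["C"]])  # [1449.0, 3929.0, 2290.0, 1164.0]
```

With the real file, the open handle can be passed directly (`with open("t.txt") as tx: matrices = read_count_matrices(tx)`), and `numpy.array(matrices["C"])` then gives a 4-by-8 array. The last line shows the difference between a row (`matrices["C"][0]`) and a column (`row[0]` for every row), which is what print(values[0]) was mixing up inside the loop.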

python matrix array numpy

This is not a bioinformatics question, strictly speaking.

You can check whether a line has only one field and that field is one character long, and skip such label lines:

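```python
# Inside the reading loop: skip label lines such as "A" (one field, one character)
if len(values) == 1 and len(values[0]) == 1:
    continue
```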

(The code is untested and my keywords might be wrong; I don't do a lot of Python.)


Sorry, I'm using these for transcription factor binding site analysis, so I thought it could count as bioinformatics :)

It didn't work, but thanks anyway :)!


> didn't work

Please give us more than "It did not work". What was expected and what actually happened?

I checked online and the keyword to use is continue, not next. I hope that helps.
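Why `next` failed silently: in Python, `next` is a built-in function, so writing it alone on a line only evaluates the function object without calling it, and the loop carries on to the next statement. A small comparison on made-up lines in the question's format; `keep_rows` is a hypothetical name used only for this illustration.

```python
def keep_rows(lines, use_continue):
    """Collect split lines; label lines are skipped only when continue runs."""
    rows = []
    for line in lines:
        values = line.split()
        if len(values) == 1 and len(values[0]) == 1:
            if use_continue:
                continue  # jump straight to the next line of input
            next  # only names the built-in next() function; has no effect
        rows.append(values)
    return rows


lines = ["A", "1.00 2.00", "C", "3.00 4.00"]
print(keep_rows(lines, use_continue=True))   # [['1.00', '2.00'], ['3.00', '4.00']]
print(keep_rows(lines, use_continue=False))  # the label rows ['A'] and ['C'] are kept too
```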

  1. str.split() returns a list. That's why you get lists.
  2. Use pandas to work with tables in Python.
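pandas will not split the labelled blocks for you, but once the rows of one matrix are collected, a DataFrame makes the per-position arithmetic short. A minimal sketch, assuming pandas is installed and that the four rows are A, C, G, T in that order (the file itself does not label them); the data is matrix C from the question.

```python
import pandas as pd

# Matrix C from the question; the A/C/G/T row order is an assumption.
rows = [
    [1449.0, 688.0, 4036.0, 8832.0, 0.0, 96.0, 8832.0, 2770.0],
    [3929.0, 5585.0, 2483.0, 194.0, 0.0, 0.0, 0.0, 2369.0],
    [2290.0, 103.0, 2197.0, 1078.0, 0.0, 0.0, 0.0, 2417.0],
    [1164.0, 3247.0, 116.0, 66.0, 8832.0, 8832.0, 0.0, 1276.0],
]

# One labelled table per matrix: rows are bases, columns are motif positions.
df = pd.DataFrame(rows, index=list("ACGT"), columns=range(1, len(rows[0]) + 1))

freqs = df / df.sum(axis=0)             # divide each column by its total count
consensus = "".join(df.idxmax(axis=0))  # base with the highest count per position
print(consensus)  # CCAATTAA
```

Keeping one DataFrame per label (for example in a dict keyed by "A", "B", "C") keeps the matrices separate while still allowing column-wise operations like the ones above.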

Your points do not amount to an answer; they are one basic fact and one broad concept. I'm moving this post to a comment.


I tried numpy but have never used pandas before. Thank you for the advice! :)

4.0 years ago

It's not completely clear what you're asking here, but I highly recommend using the Biopython motifs module (Bio.motifs) if possible, as it makes dealing with motifs in almost any format much, much easier.
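A minimal sketch of the Biopython route, assuming Biopython is installed and that the four rows of each matrix are A, C, G, T in that order (the file does not say). It builds a `Bio.motifs.Motif` directly from a counts dictionary (here matrix C from the question) and reads off its consensus; the dictionary could come from any parser that keeps the matrices separate.

```python
from Bio import motifs

# Matrix C from the question as {base: counts per position}; the A/C/G/T
# row order is an assumption, since the file does not label its rows.
counts = {
    "A": [1449, 688, 4036, 8832, 0, 96, 8832, 2770],
    "C": [3929, 5585, 2483, 194, 0, 0, 0, 2369],
    "G": [2290, 103, 2197, 1078, 0, 0, 0, 2417],
    "T": [1164, 3247, 116, 66, 8832, 8832, 0, 1276],
}

motif = motifs.Motif(counts=counts)  # build a motif directly from counts
print(motif.counts)     # position frequency matrix, one row per base
print(motif.consensus)  # most frequent base at each position
```

If frequencies are needed instead of raw counts, `motif.counts.normalize()` returns a position probability matrix built from the same counts.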


Thank you! I'll try :)
